Writing a simple web crawler in Python to scrape app information from Google Play

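I've been working on a project recently that needs to scrape app information from Google Play: each app's package name, icon, screenshots, reviews, and so on.
I've also long been drawn to Python's elegant syntax, so I decided to do the project in Python, if only for practice.
After looking at how the Google Play pages are laid out, I found that nearly everything I need is already in the static HTML. Happily, that means there is no JavaScript to analyze.
In fact, once you have an app's package name (something like com.xxx.xxx), you can jump straight to the app's detail page and scrape the information from there.
The tricky part is this: how do you collect as many package names as possible?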
First, take a look at this:
https://play.google.com/store/apps/category/GAME/collection/topselling_free?start=480&num=26
With a URL like this, Google automatically blocks the request once start goes above 499.
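As a rough sketch of how the listing pages could be enumerated without crossing that limit, the snippet below builds the URLs for one category. It is written for Python 3, and the names listing_urls, MAX_START and page_size are my own for illustration; only the URL pattern and the 499 cutoff come from the observation above.

```python
from urllib.parse import urlencode

BASE = "https://play.google.com/store/apps/category/{category}/collection/{collection}"
MAX_START = 499  # from the observation above: larger start values get blocked


def listing_urls(category="GAME", collection="topselling_free", page_size=26):
    """Yield listing URLs whose start offset never exceeds MAX_START."""
    url = BASE.format(category=category, collection=collection)
    for start in range(0, MAX_START + 1, page_size):
        yield url + "?" + urlencode({"start": start, "num": page_size})


urls = list(listing_urls())
for u in urls[:3]:
    print(u)
```

Even in the best case this only reaches a few hundred apps per category, which is why other ways of finding package names are needed.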
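I use two small tricks here:

1. Each app's detail page recommends other related apps, so you can follow those recommendations recursively.
2. Search by keyword to find matching apps, then recurse from those as well. (I haven't done this one yet, but a strategy like this can be added at any time.)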
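The first trick can be sketched roughly as follows (Python 3). This is not the crawler's actual code: it assumes that related apps show up in the page as links of the form /store/apps/details?id=<package>, it replaces the recursion with an explicit queue so that a long chain of recommendations cannot hit the recursion limit, and fetch_html is a hypothetical caller-supplied function that downloads a page.

```python
import re
from collections import deque

DETAIL_URL = "https://play.google.com/store/apps/details?id={}"
# Assumption: related apps are linked as /store/apps/details?id=<package>
PACKAGE_RE = re.compile(r"/store/apps/details\?id=([A-Za-z0-9_.]+)")


def extract_packages(html):
    """Return the package names linked from a page, in order, without duplicates."""
    seen, result = set(), []
    for name in PACKAGE_RE.findall(html):
        if name not in seen:
            seen.add(name)
            result.append(name)
    return result


def discover(seed, fetch_html, limit=1000):
    """Breadth-first walk over related-app links, starting from one package name.

    fetch_html(url) is supplied by the caller and returns the page HTML as a string.
    The walk stops once `limit` package names are known.
    """
    seen = {seed}
    pending = deque([seed])
    while pending and len(seen) < limit:
        package = pending.popleft()
        for name in extract_packages(fetch_html(DETAIL_URL.format(package))):
            if name not in seen and len(seen) < limit:
                seen.add(name)
                pending.append(name)
    return seen
```

The seen set is what keeps the walk from looping forever, since two apps will often recommend each other.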
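Following this idea, I set up a producer-consumer model in Python: newly found package names are added to a queue, then read out one by one to request the detail page and scrape the app's information.
There are many ways to implement a producer-consumer model in Python: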

Producer

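from Queue import Queue  # Python 2 module name; on Python 3 use "from queue import Queue"
# Singleton and GLCConsumer are defined elsewhere; import them from wherever they live.

class QueueState:
    stop = 0
    running = 1

class GLCManager(object):
    # A Python singleton idiom I found online, for classes that need to be singletons;
    # I'll look into how it works when I get the chance. (Python 2 metaclass syntax.)
    __metaclass__ = Singleton
    taskQueue = Queue()  # package names waiting to be scraped
    consumers = []
    queueState = QueueState.stop
    #game id Manager for consumer
    def __init__(self):
        self.queueState = QueueState.stop
        for i in range(1, 5):  # creates four consumer threads
            consumer = GLCConsumer()
            self.consumers.append(consumer)
    def __del__(self):
        self.stop()
    def addGameTask(self, id):
        # producer side: enqueue a newly found package name
        self.taskQueue.put(id)
    def start(self):
        if self.queueState == QueueState.stop:
            for consumer in self.consumers:
                consumer.start()
            self.queueState = QueueState.running
    def stop(self):
        if self.queueState == QueueState.running:
            # ask every consumer to finish, then wait for it to exit
            for consumer in self.consumers:
                consumer.isStop = True
                consumer.join()
            self.queueState = QueueState.stop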
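
The Singleton metaclass used above is not shown in this post. One common recipe looks roughly like the sketch below; it is an assumption about what such a class could look like, not the exact code I copied. It uses Python 3 syntax (metaclass=Singleton in place of __metaclass__), and DemoManager is a made-up class for demonstration only.

```python
class Singleton(type):
    """Metaclass that creates at most one instance per class."""
    _instances = {}

    def __call__(cls, *args, **kwargs):
        # Build the instance on the first call, then hand back the cached one.
        if cls not in cls._instances:
            cls._instances[cls] = super().__call__(*args, **kwargs)
        return cls._instances[cls]


class DemoManager(metaclass=Singleton):  # hypothetical class, for illustration
    def __init__(self):
        self.tasks = []


a = DemoManager()
b = DemoManager()
a.tasks.append("com.example.app")
print(a is b, b.tasks)
```

Note that this version takes no lock, so two threads calling the class for the very first time at the same moment could in principle each build an instance; creating the manager once before starting any threads avoids that.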

Consumer

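import logging
import threading
import time
from Queue import Empty  # Python 2 module name; on Python 3 use "from queue import Empty"
# GameInfoExtractor is defined elsewhere; it scrapes one app's detail page.

class GLCConsumer(threading.Thread):
    isStop = False
    def __init__(self):
        threading.Thread.__init__(self)
    def run(self):
        from Manager import GLCManager  # imported inside run() because Manager itself uses GLCConsumer
        taskManager = GLCManager()
        while not self.isStop or not taskManager.taskQueue.empty():
            try:
                # use a timeout: a plain get() blocks forever on an empty queue,
                # so the thread would never see isStop and join() would hang
                item = taskManager.taskQueue.get(timeout=1)
            except Empty:
                continue
            if item is not None:
                logging.debug("Extracting game: " + item)
                extractor = GameInfoExtractor(item)  # start scraping the info!
                extractor.run()
                time.sleep(0.05)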
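
Since there are many ways to build this pattern, here is one alternative as a minimal, self-contained sketch (Python 3). Instead of an isStop flag it pushes one sentinel value per worker onto the queue, so each thread exits on its own once the real work is done. The names run_pool, worker and record are invented for this example, and record merely stands in for the real scraping step.

```python
import queue
import threading

STOP = None  # sentinel: tells a worker thread to exit


def worker(tasks, handle):
    """Consume items until the sentinel arrives."""
    while True:
        item = tasks.get()
        if item is STOP:
            break
        handle(item)


def run_pool(items, handle, num_workers=4):
    """Process every item with handle() on a small pool of threads."""
    tasks = queue.Queue()
    threads = [threading.Thread(target=worker, args=(tasks, handle))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for item in items:       # producer side
        tasks.put(item)
    for _ in threads:        # one sentinel per worker, queued after the real items
        tasks.put(STOP)
    for t in threads:
        t.join()


results = []
lock = threading.Lock()


def record(package):
    # Stand-in for scraping one app; the lock protects the shared list.
    with lock:
        results.append(package.upper())


run_pool(["com.a", "com.b", "com.c"], record)
print(sorted(results))
```

Because the queue is first-in first-out, the sentinels are only reached after every real item has been taken, so nothing is dropped on shutdown.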