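I am trying to collect all topics starting from m.zhihu.com/topics. After clicking "More", I found that the page sends a request to 'https://m.zhihu.com/node/TopicsPlazzaListV2', and the FormData it sends is built as follows: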
    def get_topic_url(self, response):
        topics = response.css('.item .blk > a[target=_blank]::attr(href)').extract()
        _xsrf = response.css('input[name="_xsrf"]::attr(value)').extract()[0]
        for topic in topics:
            print topic
        data = response.css('.zh-general-list::attr(data-init)').extract()
        import json
        param = json.loads(data[0])
        topic_id = param['params']['topic_id']
        hash_id = param['params']['hash_id']
        offset = param['params']['offset']
        yield scrapy.FormRequest(
            url="https://m.zhihu.com/node/TopicsPlazzaListV2",
            headers=headers,
            formdata={
                "method": "next",
                "params": {
                    "topic_id": topic_id,
                    "offset": offset,
                    "hash_id": hash_id,
                },
                "_xsrf": _xsrf,
            },
            meta={
                "proxy": proxy,
                "cookiejar": response.meta["cookiejar"],
            },
            callback=self.get_topic_url,
        )
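However, the server responds with a 400 status code. Is there a mistake somewhere in my code? Any advice would be appreciated.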
    2016-05-08 10:43:52 [scrapy] DEBUG: Retrying https://m.zhihu.com/node/TopicsPlazzaListV2> (failed 1 times): 400 Bad Request
    2016-05-08 10:43:53 [scrapy] DEBUG: Retrying https://m.zhihu.com/node/TopicsPlazzaListV2> (failed 2 times): 400 Bad Request
    2016-05-08 10:43:53 [scrapy] DEBUG: Gave up retrying https://m.zhihu.com/node/TopicsPlazzaListV2> (failed 3 times): 400 Bad Request
    2016-05-08 10:43:53 [scrapy] DEBUG: Crawled (400) https://m.zhihu.com/node/TopicsPlazzaListV2> (referer: https://m.zhihu.com/topics)
    2016-05-08 10:43:53 [scrapy] DEBUG: Ignoring response <400 https://m.zhihu.com/node/TopicsPlazzaListV2>: HTTP status code is not handled or not allowed
Try setting the request headers to those of a mobile browser.
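As a rough illustration of that suggestion, here is a minimal sketch using only the standard library. The User-Agent string and the X-Requested-With header are assumed examples, not values confirmed for this site, and `build_formdata` is a hypothetical helper. It also serializes the nested "params" dict to a JSON string, on the assumption that the endpoint expects it that way: Scrapy's FormRequest works with flat string values, so the nested dict in the question is probably not being encoded as intended. Whether this alone removes the 400 has not been verified.

```python
import json
from urllib.parse import urlencode, parse_qs

# Assumed example of mobile browser headers; substitute the headers
# your own mobile browser actually sends.
MOBILE_HEADERS = {
    "User-Agent": ("Mozilla/5.0 (iPhone; CPU iPhone OS 9_1 like Mac OS X) "
                   "AppleWebKit/601.1.46 (KHTML, like Gecko) "
                   "Version/9.0 Mobile/13B143 Safari/601.1"),
    "X-Requested-With": "XMLHttpRequest",  # assumption: mimics an AJAX call
    "Referer": "https://m.zhihu.com/topics",
}


def build_formdata(topic_id, offset, hash_id, xsrf):
    """Hypothetical helper: return a flat dict of strings for formdata."""
    params = {"topic_id": topic_id, "offset": offset, "hash_id": hash_id}
    return {
        "method": "next",
        # Assumption: the server expects "params" as a JSON string.
        "params": json.dumps(params),
        "_xsrf": xsrf,
    }


# Placeholder values, only to show the resulting shape.
formdata = build_formdata(1, 20, "abc", "token")
body = urlencode(formdata)  # the urlencoded form body that would be posted
```

In the spider, these would be passed as `headers=MOBILE_HEADERS` and `formdata=build_formdata(topic_id, offset, hash_id, _xsrf)` in the `scrapy.FormRequest` call inside `get_topic_url`.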