In Scrapy, how do I set the interval between a spider's requests? Crawling too fast easily gets my IP banned.

1 answer

AllenJiang

You can set this in the project's settings file (settings.py):

DOWNLOAD_DELAY = 0.25  # 250 ms of delay

Here is what the official documentation says:

The amount of time (in secs) that the downloader should wait before downloading consecutive pages from the same spider. This can be used to throttle the crawling speed to avoid hitting servers too hard. Decimal numbers are supported. Example:

DOWNLOAD_DELAY = 0.25  # 250 ms of delay

This setting is also affected by the RANDOMIZE_DOWNLOAD_DELAY setting (which is enabled by default). By default, Scrapy doesn't wait a fixed amount of time between requests, but uses a random interval between 0.5 * DOWNLOAD_DELAY and 1.5 * DOWNLOAD_DELAY.
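
To get a feel for what that randomized interval means in practice, here is a small standalone sketch. It is only an illustration of the documented range, not Scrapy's own code; the helper name randomized_delay is made up for this example.

```python
import random

DOWNLOAD_DELAY = 0.25  # seconds, same value as in the answer above


def randomized_delay(base_delay: float) -> float:
    """Hypothetical helper: pick a wait time between 0.5x and 1.5x the base delay."""
    return random.uniform(0.5 * base_delay, 1.5 * base_delay)


# Draw many samples to see the range the waits fall into.
samples = [randomized_delay(DOWNLOAD_DELAY) for _ in range(1000)]
print(f"shortest wait: {min(samples):.3f}s, longest wait: {max(samples):.3f}s")
```

With DOWNLOAD_DELAY = 0.25, every wait falls between 0.125 s and 0.375 s, so the average spacing stays around 250 ms while the exact timing varies from request to request.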

You can also change this setting per spider.
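
As a rough sketch of the per-spider variant: a spider can override project settings through its custom_settings class attribute. The spider name, start URL and parse logic below are placeholders invented for this example, and the 2-second delay is just a sample value.

```python
import scrapy


class SlowSpider(scrapy.Spider):
    # The name and start URL are hypothetical placeholders.
    name = "slow_spider"
    start_urls = ["https://example.com/"]

    # These values override settings.py for this spider only.
    custom_settings = {
        "DOWNLOAD_DELAY": 2,  # base delay in seconds (sample value)
        "RANDOMIZE_DOWNLOAD_DELAY": True,  # already the default, shown for clarity
    }

    def parse(self, response):
        # Placeholder callback: yield the page URL and its <title> text.
        yield {"url": response.url, "title": response.css("title::text").get()}
```

Other spiders in the same project keep using the value from settings.py, so only the spider that hits a sensitive site needs to be slowed down.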