Command Run:
wayback-machine-scraper -f 20231201 -t 20231220 http://breitbart.com/ads.txt
Output:
2024-01-20 09:40:43 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: scrapybot)
2024-01-20 09:40:43 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.13, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.9.12 (main, Apr 5 2022, 01:52:34) - [Clang 12.0.0 ], pyOpenSSL 22.0.0 (OpenSSL 1.1.1o 3 May 2022), cryptography 37.0.1, Platform macOS-13.4-arm64-arm-64bit
2024-01-20 09:40:43 [scrapy.addons] INFO: Enabled addons: []
2024-01-20 09:40:43 [py.warnings] WARNING: /opt/miniconda3/lib/python3.9/site-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy. See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
  return cls(crawler)
2024-01-20 09:40:43 [scrapy.extensions.telnet] INFO: Telnet Password: b9b190d843dfdca9
2024-01-20 09:40:43 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.throttle.AutoThrottle']
2024-01-20 09:40:43 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
 'AUTOTHROTTLE_START_DELAY': 1,
 'AUTOTHROTTLE_TARGET_CONCURRENCY': 10.0,
 'LOG_LEVEL': 'INFO',
 'USER_AGENT': 'Wayback Machine Scraper/1.0.8 '
               '(+https://github.com/sangaline/scrapy-wayback-machine)'}
2024-01-20 09:40:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy_wayback_machine.WaybackMachineMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-01-20 09:40:44 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-01-20 09:40:44 [scrapy.middleware] INFO: Enabled item pipelines: []
2024-01-20 09:40:44 [scrapy.core.engine] INFO: Spider opened
2024-01-20 09:40:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-01-20 09:40:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-01-20 09:40:44 [scrapy.core.scraper] ERROR: Error downloading <GET https://web.archive.org/cdx/search/cdx?url=http%3A//breitbart.com/ads.txt&output=json&fl=timestamp,original,statuscode,digest>
Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.9/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/opt/miniconda3/lib/python3.9/site-packages/scrapy/core/downloader/middleware.py", line 68, in process_response
    method(request=request, response=response, spider=spider)
  File "/opt/miniconda3/lib/python3.9/site-packages/scrapy_wayback_machine/__init__.py", line 83, in process_response
    self.crawler.engine.schedule(snapshot_request, spider)
AttributeError: 'ExecutionEngine' object has no attribute 'schedule'
2024-01-20 09:40:44 [scrapy.core.engine] INFO: Closing spider (finished)
2024-01-20 09:40:44 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 370,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 2366,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 0.54403,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2024, 1, 20, 17, 40, 44, 757760, tzinfo=datetime.timezone.utc),
 'httpcompression/response_bytes': 9831,
 'httpcompression/response_count': 1,
 'log_count/ERROR': 1,
 'log_count/INFO': 10,
 'log_count/WARNING': 1,
 'memusage/max': 69681152,
 'memusage/startup': 69681152,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2024, 1, 20, 17, 40, 44, 213730, tzinfo=datetime.timezone.utc)}
2024-01-20 09:40:44 [scrapy.core.engine] INFO: Spider closed (finished)
See #18