
'ExecutionEngine' object has no attribute 'schedule' #20

Open
Yash-Vekaria opened this issue Jan 20, 2024 · 1 comment
Command Run: wayback-machine-scraper -f 20231201 -t 20231220 http://breitbart.com/ads.txt

Output:

2024-01-20 09:40:43 [scrapy.utils.log] INFO: Scrapy 2.11.0 started (bot: scrapybot)
2024-01-20 09:40:43 [scrapy.utils.log] INFO: Versions: lxml 4.9.2.0, libxml2 2.9.13, cssselect 1.2.0, parsel 1.8.1, w3lib 2.1.2, Twisted 22.10.0, Python 3.9.12 (main, Apr  5 2022, 01:52:34) - [Clang 12.0.0 ], pyOpenSSL 22.0.0 (OpenSSL 1.1.1o  3 May 2022), cryptography 37.0.1, Platform macOS-13.4-arm64-arm-64bit
2024-01-20 09:40:43 [scrapy.addons] INFO: Enabled addons:
[]
2024-01-20 09:40:43 [py.warnings] WARNING: /opt/miniconda3/lib/python3.9/site-packages/scrapy/utils/request.py:254: ScrapyDeprecationWarning: '2.6' is a deprecated value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting.

It is also the default value. In other words, it is normal to get this warning if you have not defined a value for the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting. This is so for backward compatibility reasons, but it will change in a future version of Scrapy.

See the documentation of the 'REQUEST_FINGERPRINTER_IMPLEMENTATION' setting for information on how to handle this deprecation.
  return cls(crawler)

2024-01-20 09:40:43 [scrapy.extensions.telnet] INFO: Telnet Password: b9b190d843dfdca9
2024-01-20 09:40:43 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.throttle.AutoThrottle']
2024-01-20 09:40:43 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
 'AUTOTHROTTLE_START_DELAY': 1,
 'AUTOTHROTTLE_TARGET_CONCURRENCY': 10.0,
 'LOG_LEVEL': 'INFO',
 'USER_AGENT': 'Wayback Machine Scraper/1.0.8 '
               '(+https://github.com/sangaline/scrapy-wayback-machine)'}
2024-01-20 09:40:44 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy_wayback_machine.WaybackMachineMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2024-01-20 09:40:44 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2024-01-20 09:40:44 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2024-01-20 09:40:44 [scrapy.core.engine] INFO: Spider opened
2024-01-20 09:40:44 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2024-01-20 09:40:44 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2024-01-20 09:40:44 [scrapy.core.scraper] ERROR: Error downloading <GET https://web.archive.org/cdx/search/cdx?url=http%3A//breitbart.com/ads.txt&output=json&fl=timestamp,original,statuscode,digest>
Traceback (most recent call last):
  File "/opt/miniconda3/lib/python3.9/site-packages/twisted/internet/defer.py", line 1697, in _inlineCallbacks
    result = context.run(gen.send, result)
  File "/opt/miniconda3/lib/python3.9/site-packages/scrapy/core/downloader/middleware.py", line 68, in process_response
    method(request=request, response=response, spider=spider)
  File "/opt/miniconda3/lib/python3.9/site-packages/scrapy_wayback_machine/__init__.py", line 83, in process_response
    self.crawler.engine.schedule(snapshot_request, spider)
AttributeError: 'ExecutionEngine' object has no attribute 'schedule'
2024-01-20 09:40:44 [scrapy.core.engine] INFO: Closing spider (finished)
2024-01-20 09:40:44 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 370,
 'downloader/request_count': 1,
 'downloader/request_method_count/GET': 1,
 'downloader/response_bytes': 2366,
 'downloader/response_count': 1,
 'downloader/response_status_count/200': 1,
 'elapsed_time_seconds': 0.54403,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2024, 1, 20, 17, 40, 44, 757760, tzinfo=datetime.timezone.utc),
 'httpcompression/response_bytes': 9831,
 'httpcompression/response_count': 1,
 'log_count/ERROR': 1,
 'log_count/INFO': 10,
 'log_count/WARNING': 1,
 'memusage/max': 69681152,
 'memusage/startup': 69681152,
 'scheduler/dequeued': 2,
 'scheduler/dequeued/memory': 2,
 'scheduler/enqueued': 2,
 'scheduler/enqueued/memory': 2,
 'start_time': datetime.datetime(2024, 1, 20, 17, 40, 44, 213730, tzinfo=datetime.timezone.utc)}
2024-01-20 09:40:44 [scrapy.core.engine] INFO: Spider closed (finished)
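For context on the traceback above: `scrapy_wayback_machine/__init__.py` calls `self.crawler.engine.schedule(snapshot_request, spider)`, but recent Scrapy releases no longer expose `ExecutionEngine.schedule()`; requests are enqueued through `ExecutionEngine.crawl()` instead. A minimal sketch of a version-tolerant helper that patched middleware code could call — `schedule_request` and the dummy engine classes below are illustrative, not part of Scrapy or scrapy-wayback-machine:

```python
def schedule_request(engine, request, spider):
    """Enqueue a request on whichever engine API this Scrapy version has."""
    if hasattr(engine, "schedule"):
        # Older Scrapy: ExecutionEngine.schedule(request, spider)
        engine.schedule(request, spider)
    else:
        # Newer Scrapy: schedule() was removed; crawl(request) enqueues instead
        engine.crawl(request)


# Stand-in engines to demonstrate both code paths without Scrapy installed
class OldEngine:
    def __init__(self):
        self.calls = []

    def schedule(self, request, spider):
        self.calls.append(("schedule", request, spider))


class NewEngine:
    def __init__(self):
        self.calls = []

    def crawl(self, request):
        self.calls.append(("crawl", request))


old, new = OldEngine(), NewEngine()
schedule_request(old, "req-1", "spider-1")  # dispatches to schedule()
schedule_request(new, "req-2", "spider-2")  # dispatches to crawl()
```

If the deprecated path is not needed, replacing the `engine.schedule(...)` call in the middleware with `engine.crawl(snapshot_request)` on an up-to-date Scrapy is the simpler fix; pinning an older Scrapy is the other workaround.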

laulaz commented Jan 29, 2024

See #18
