Scraper is eating our CPU. #234

mooncalfskb · 2018-12-10T06:37:05Z

Dear Congress Folk:
As the year progresses and the Congress XML files on govinfo.gov get bigger and bigger, I am finding that downloading some files eats 80-90% of our CPU, causing server lag. Some example problem files are:

https://www.govinfo.gov/sitemap/BILLS_2018_sitemap.xml
https://www.govinfo.gov/sitemap/bulkdata/BILLSTATUS/115hr/sitemap.xml

I would humbly suggest that you need to put a stream handler in the download function of utils in order to download the files in chunks. Examples suggested here:
https://stackoverflow.com/questions/16694907/how-to-download-large-file-in-python-with-requests-py
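
For illustration, here is a minimal sketch of what a chunked download could look like. It is not the project's actual download function in utils: the name `download_in_chunks` and the `chunk_size` default are hypothetical, and it uses the standard library's `urllib.request` rather than the `requests` streaming approach from the linked answer, so that it runs without extra dependencies.

```python
import urllib.request


def download_in_chunks(url, destination, chunk_size=64 * 1024):
    """Copy the response body for `url` to `destination` one chunk at a time.

    Hypothetical helper: only `chunk_size` bytes of the body are held in
    memory at once, instead of reading the whole file before writing it.
    Returns the total number of bytes written.
    """
    bytes_written = 0
    with urllib.request.urlopen(url) as response, open(destination, "wb") as out:
        while True:
            chunk = response.read(chunk_size)
            if not chunk:  # an empty read means the body is exhausted
                break
            out.write(chunk)
            bytes_written += len(chunk)
    return bytes_written
```

Something like `download_in_chunks("https://www.govinfo.gov/sitemap/BILLS_2018_sitemap.xml", "BILLS_2018_sitemap.xml")` would then write the sitemap to disk piece by piece. Whether this fully resolves the CPU load would still need to be measured, since parsing the XML afterwards may also be expensive.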

Thanks
Sherrod

konklone · 2018-12-10T06:44:03Z

@mooncalfskb Thanks for identifying this. Would you be up for submitting a pull request which refits our download method to use a streaming handler?
