Uses the Python requests_futures library to download Wikipedia pages asynchronously.
```
pip3 install --user wiki-futures
```
Get 10 random Wikipedia articles. `get_content` returns a dictionary of (title, content) pairs, one per article:

```python
from wiki_futures import dispatcher

content = dispatcher.get_content(10)
```
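Since the result is a dictionary of (title, content) pairs, you can iterate over it like any other dict; for example:

```python
# content maps article title -> article text
for title, text in content.items():
    print(f"{title}: {len(text)} characters")
```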
If you want specific articles instead, you can pass in custom titles:

```python
content = dispatcher.get_content(titles=["Python", "GitHub"])
```
Pass in a `workers` value to change the number of concurrent workers from the default, which is 8:

```python
content = dispatcher.get_content(10, workers=4)
```
Get a list of random titles. This is helpful because the limit for non-bot requests is 500 titles per call:

```python
titles = dispatcher.get_titles(1000)
```
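The title list can then be fed back into `get_content` to fetch more than 500 random articles in one go; a small sketch, assuming `get_content` accepts the list that `get_titles` returns (matching the `titles=` usage above):

```python
titles = dispatcher.get_titles(1000)
# fetch content for more random articles than a single random-fetch call allows
content = dispatcher.get_content(titles=titles)
```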
Most of my NLP projects revolve around downloading random Wikipedia articles. I wanted a quick way to download them concurrently, and I found the Python wikipedia package too cumbersome to work with.
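For the curious, this is roughly the requests_futures pattern the package builds on. The sketch below is illustrative, not the package's actual internals: the MediaWiki endpoint and query parameters are real, but the function name and structure are assumptions.

```python
from requests_futures.sessions import FuturesSession

API = "https://en.wikipedia.org/w/api.php"

def fetch_extracts(titles, workers=8):
    """Illustrative only: fire one request per title, then collect the results."""
    session = FuturesSession(max_workers=workers)
    # each session.get() returns immediately with a Future
    futures = {
        title: session.get(API, params={
            "action": "query",
            "prop": "extracts",
            "explaintext": 1,
            "format": "json",
            "titles": title,
        })
        for title in titles
    }
    content = {}
    for title, future in futures.items():
        # .result() blocks until that request has finished
        pages = future.result().json()["query"]["pages"]
        # pages maps page id -> page data; take the single page's extract
        content[title] = next(iter(pages.values())).get("extract", "")
    return content
```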