Wikigraph

The file wikigraph.py implements classes for finding paths between Wikipedia articles, along with other related functions, using the Wikimedia API. A path is built by following the links each article contains, just like the Wikipedia game. See the blog post https://winstonjay.github.io/posts/homunculus for more on the project's motivations.
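To make the idea of a link path concrete, here is a minimal sketch that runs a breadth-first search over a small made-up link graph. It is not the code in wikigraph.py: the `LINKS` dictionary, the `shortest_link_path` function and its `get_links` parameter are all hypothetical names chosen for illustration, and breadth-first search is assumed only because it returns a path with the fewest steps.

```python
from collections import deque

# Toy link graph standing in for Wikipedia (hypothetical titles and links).
LINKS = {
    "Car": ["Engine", "Road"],
    "Engine": ["Fuel"],
    "Road": ["City"],
    "City": ["Home"],
    "Fuel": [],
    "Home": [],
}


def shortest_link_path(start, end, get_links=lambda title: LINKS.get(title, [])):
    """Return a shortest list of titles from start to end, or None if unreachable."""
    if start == end:
        return [start]
    parents = {start: None}  # maps each visited title to the title that linked to it
    queue = deque([start])
    while queue:
        title = queue.popleft()
        for linked in get_links(title):
            if linked in parents:
                continue  # already reached by a path that is at least as short
            parents[linked] = title
            if linked == end:
                # Walk the parent pointers back to the start, then reverse.
                path = [end]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(linked)
    return None


print(" -> ".join(shortest_link_path("Car", "Home")))
# Car -> Road -> City -> Home
```

In a real search, `get_links` would be replaced by a function that asks the Wikimedia API for the links on a page.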

Basic Use

Install the requirements, then run:

python findpath.py --start="Car" --end="Home"

Example session:

The main method find_path is best run in a shell session or as part of a batch collection: it memoizes results, so later searches in the same run get faster and make fewer requests to the Wikimedia API.

>>> import wikigraph
>>> w = wikigraph.WikiGraph()
>>> path = w.find_path(start="Tom Hanks", end="Kevin Bacon")
>>> print(path)
<wikigraph.Path: Tom Hanks -> Kevin Bacon>
>>> print(path.info)
Path:
        Path:        Tom Hanks -> Kevin Bacon
        Separation:  1 steps
        Time Taken:  0.578131 seconds
        Requests:    2

>>> path.data
{'start': 'Tom Hanks', 'end': 'Kevin Bacon', 'path': 'Tom Hanks->Kevin Bacon', 'degree': 1}
>>> print(path.json(indent=2))
{
  "start": "Tom Hanks",
  "end": "Kevin Bacon",
  "path": "Tom Hanks->Kevin Bacon",
  "degree": 1
}
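The effect of memoization on request counts can be illustrated with a small sketch. This is not how wikigraph.py is implemented: `FAKE_WIKI`, `stats` and `fetch_links` are hypothetical stand-ins, and `functools.lru_cache` is simply one convenient way to cache results within a single Python session.

```python
from functools import lru_cache

# Stand-in for the remote API (hypothetical data); no network access is needed.
FAKE_WIKI = {
    "Tom Hanks": ("Kevin Bacon", "Apollo 13"),
    "Kevin Bacon": ("Footloose",),
}
stats = {"requests": 0}


@lru_cache(maxsize=None)
def fetch_links(title):
    """Pretend to request a page's links; only cache misses reach this body."""
    stats["requests"] += 1
    return FAKE_WIKI.get(title, ())


fetch_links("Tom Hanks")    # miss: counts as a request
fetch_links("Tom Hanks")    # hit: served from the cache
fetch_links("Kevin Bacon")  # miss: counts as a request
print(stats["requests"])    # 2
```

Because the cache lives in memory, it only helps while the same process keeps running, which is why a long shell session or batch run benefits and a one-off command does not.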

collectbatch.py

For a given sample of start articles, find a path from each to a central end article and save the output to a given CSV file. If no start list is specified, the program defaults to collecting a random sample of k articles generated by the Wikimedia API. For more information, see the command-line argument details below.
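The batch loop described above might look roughly like the following sketch. The column names are an assumption borrowed from the `path.data` dictionary shown in the example session, and `collect_batch` and `fake_find_path_data` are hypothetical names; the real collectbatch.py may lay out its CSV differently.

```python
import csv
import io

# Assumed columns, mirroring the keys of path.data in the example session.
FIELDS = ["start", "end", "path", "degree"]


def collect_batch(starts, center, find_path_data, outfile):
    """Write one CSV row per start article, each searched towards `center`."""
    writer = csv.DictWriter(outfile, fieldnames=FIELDS)
    writer.writeheader()
    for start in starts:
        writer.writerow(find_path_data(start, center))


def fake_find_path_data(start, end):
    """Hypothetical stand-in for a real search; always reports a direct link."""
    return {"start": start, "end": end, "path": f"{start}->{end}", "degree": 1}


buffer = io.StringIO()  # an open file object would work the same way
collect_batch(["Tom Hanks", "Footloose"], "Kevin Bacon", fake_find_path_data, buffer)
print(buffer.getvalue())
```

Passing the search function in as an argument keeps the sketch runnable without network access; in practice it would wrap a call such as `w.find_path(start=..., end=...).data`.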

usage:

-h, --help            show this help message and exit
-o OUTFILE, --outfile OUTFILE
                        Filename to save the results to.
-x CENTER, --center CENTER
                        Title of valid wiki page to center all nodes on
-k SAMPLE_SIZE, --sample_size SAMPLE_SIZE
                        Sample size of k pages to search from. (Only applies
                        when sample source is not given)
-s SAMPLE_SOURCE, --sample_source SAMPLE_SOURCE
                        Filename containing newline delimited list of valid
                        wiki article titles if not specified sample defaults
                        to random selection from wikimedia api.
-v                    add to display titles of page requests made.
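For readers who want to see how options like these are typically declared, here is a sketch using `argparse` from the standard library. The flag names and help strings are taken from the usage text above; the `int` type for `-k`, the `verbose` destination name, the absence of defaults and the `build_parser` function itself are assumptions, not details confirmed by collectbatch.py.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented flags (types are assumed)."""
    parser = argparse.ArgumentParser(prog="collectbatch.py")
    parser.add_argument("-o", "--outfile",
                        help="Filename to save the results to.")
    parser.add_argument("-x", "--center",
                        help="Title of valid wiki page to center all nodes on")
    parser.add_argument("-k", "--sample_size", type=int,
                        help="Sample size of k pages to search from.")
    parser.add_argument("-s", "--sample_source",
                        help="Filename containing newline delimited list of "
                             "valid wiki article titles.")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="add to display titles of page requests made.")
    return parser


# Illustrative argument values only; they are not defaults of the real script.
args = build_parser().parse_args(["-o", "results.csv", "-x", "Kevin Bacon", "-k", "50"])
print(args.outfile, args.center, args.sample_size, args.sample_source, args.verbose)
# results.csv Kevin Bacon 50 None False
```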

Requirements: requests