HTTP API for evaluation of Bacon Numbers of actors.
The service uses a Postgres database as persistent storage and a disk cache for quick startup.
Clone the repository:
git clone https://github.com/greatvovan/bacon-number
Launch the services:
cd bacon-number
docker-compose up -d --build
Hint: on subsequent launches you can omit the --build flag to speed up startup.
Now initialize the database. The service uses a Postgres database as the primary storage of relationships between actors. The container named init parses the dataset from CSV files and stores it in the database.
docker-compose exec init unzip dataset/*.zip -d dataset
docker-compose exec init python dataset_to_db.py
You need to do this only once. When it is done (in around 1.5 minutes), you can remove the init container from docker-compose.yaml, as it is no longer needed.
The HTTP API service (in the container named httpapi) launches in a waiting state: it blocks until data population in Postgres is complete. As soon as that is done, the service begins building a NetworkX graph of connections between actors, which takes around 2 minutes. The built graph is then dumped to disk, so subsequent launches take just seconds.
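The dump-and-reload behaviour described above can be sketched roughly as follows. This is an illustration rather than the service's actual code: the function name load_or_build, the cache file name, and the use of pickle are assumptions, and a plain dict stands in for the NetworkX graph so that the example runs with the standard library only.

```python
import pickle
from pathlib import Path
from typing import Callable, Dict, Set

# Stand-in for the NetworkX graph (assumption): actor name -> set of co-stars.
Graph = Dict[str, Set[str]]


def load_or_build(cache_path: Path, build: Callable[[], Graph]) -> Graph:
    """Return the cached graph if a dump exists, otherwise build and dump it."""
    if cache_path.exists():
        # Fast path: later launches only deserialize the dump.
        with cache_path.open("rb") as f:
            return pickle.load(f)

    # Slow path: the first launch builds the graph from the primary storage.
    graph = build()

    # Write to a temporary file first, then rename, so that an interrupted
    # launch does not leave a partial dump behind.
    tmp_path = cache_path.with_suffix(".tmp")
    with tmp_path.open("wb") as f:
        pickle.dump(graph, f, protocol=pickle.HIGHEST_PROTOCOL)
    tmp_path.replace(cache_path)
    return graph
```

With this pattern the expensive build callable runs only when no dump is present, so deleting the cache file (or its volume) forces a rebuild on the next launch.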
Ensure that the API has started:
docker-compose logs -f httpapi
...
...
...
httpapi_1 | [2020-10-15 10:43:27 +0000] [8] [INFO] Waiting for application startup.
httpapi_1 | [2020-10-15 10:43:41 +0000] [8] [INFO] Application startup complete.
Run tests:
docker-compose exec httpapi pytest
Run the benchmark:
$ docker-compose exec httpapi python benchmark.py
Got 5000 random actors
5000 Bacon Numbers calculated in 4.7 (1064/s)
Got 10000 random actors
5000 random pair distances calculated in 4.8 (1038/s)
Play with some actors you know:
$ curl 'http://localhost:8080/bn?name=Tom+Hanks'
{"dist":1}
$ curl 'http://localhost:8080/dist?name1=Jennifer+Aniston&name2=Davood+Goodarzi&path=true'
{"dist":8,"path":["Jennifer Aniston","Olivia Munn","Christopher Maleki","Mahmoud Behraznia","Parviz Parastui","Esmail Soltanian","Moharram Zaynalzadeh","Mohsen Makhmalbaf","Davood Goodarzi"]}
$ curl 'http://localhost:8080/dist?name1=Davood+Goodarzi&name2=Grey+Evans'
{"dist":-1}
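The same requests can be issued from Python. The sketch below uses only the standard library; the helper names build_url and fetch are hypothetical, and the base URL is taken from the curl examples above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8080"  # host and port as used in the curl examples


def build_url(endpoint: str, **params: str) -> str:
    """Build a request URL; urlencode turns spaces into '+', as in the curl calls."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params)}"


def fetch(endpoint: str, **params: str) -> dict:
    """Send the GET request and decode the JSON body (needs the running service)."""
    with urlopen(build_url(endpoint, **params)) as resp:
        return json.load(resp)
```

For example, build_url("bn", name="Tom Hanks") produces the URL of the first curl call, and fetch("dist", name1="Jennifer Aniston", name2="Davood Goodarzi", path="true") corresponds to the second one.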
You can shut down everything with:
docker-compose down
docker-compose will create volumes named pgdata and graphcache to persist the data, so the next time you launch docker-compose up in the same directory, all the data (and the graph dump) will be in place.
If you want to delete the volumes (and the data), run:
docker volume rm bacon-number_pgdata bacon-number_graphcache
Returns the Bacon Number of an actor.
HTTP request
GET /bn
Query parameters
name: actor name,
path: optional, true/false to indicate that you want to see the connection path, too.
Response codes
200
: OK, check the response data,404
: actor was not found in the database,500
: unexpected error occured,503
: service is initializing, retry later.
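Since 503 means the service is still initializing, a client may want to retry instead of failing immediately. The following is one possible sketch: the function name get_when_ready, the retry count, and the delay are assumptions, and the opener parameter exists only to make the function easy to exercise without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def get_when_ready(url: str, attempts: int = 10, delay: float = 5.0,
                   opener: Callable = urllib.request.urlopen) -> bytes:
    """GET url, retrying while the service answers 503 (still initializing)."""
    for attempt in range(attempts):
        try:
            with opener(url) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            # Only 503 is retried; 404 and 500 are re-raised to the caller,
            # as is the last 503 once the attempts are used up.
            if err.code != 503 or attempt == attempts - 1:
                raise
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")
```

A fixed delay keeps the sketch short; given that the first graph build takes around two minutes, the total waiting time (attempts times delay) should be chosen accordingly.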
Response body
A JSON object with the fields dist (integer) and, when path is requested, path (array of strings).
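To illustrate what dist and path represent, here is a small breadth-first search over a toy co-star graph. It is a sketch under assumptions, not the service's implementation (which works on a NetworkX graph): the names shortest_path and bacon_number, the toy data, and the placeholder actors "Actor X" and "Actor Y" are invented for the example, and returning -1 for an unreachable actor mirrors the /dist example shown earlier.

```python
from collections import deque
from typing import Dict, List, Optional, Set

# Toy graph (assumption): actor -> co-stars. "Actor X"/"Actor Y" are placeholders.
TOY_GRAPH: Dict[str, Set[str]] = {
    "Kevin Bacon": {"Tom Hanks"},
    "Tom Hanks": {"Kevin Bacon", "Actor X"},
    "Actor X": {"Tom Hanks"},
    "Actor Y": set(),  # never appeared with anyone in this toy data
}


def shortest_path(graph: Dict[str, Set[str]], source: str,
                  target: str) -> Optional[List[str]]:
    """Breadth-first search; returns the actors from source to target, or None."""
    if source not in graph or target not in graph:
        raise KeyError("unknown actor")  # the API reports this case as 404
    parents: Dict[str, Optional[str]] = {source: None}
    queue = deque([source])
    while queue:
        actor = queue.popleft()
        if actor == target:
            # Walk the parent links back to the source, then reverse.
            path: List[str] = []
            node: Optional[str] = actor
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for costar in graph[actor]:
            if costar not in parents:
                parents[costar] = actor
                queue.append(costar)
    return None


def bacon_number(graph: Dict[str, Set[str]], name: str,
                 center: str = "Kevin Bacon") -> int:
    """Number of edges on the shortest path to the center, or -1 if unreachable."""
    path = shortest_path(graph, name, center)
    return -1 if path is None else len(path) - 1
```

In this toy graph, bacon_number(TOY_GRAPH, "Tom Hanks") is 1, and the path for "Actor X" has three names for a distance of 2, which matches the relation between dist and path in the responses.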
Returns the distance between two actors.
HTTP request
GET /dist
Query parameters
name1: actor name,
name2: actor name,
path: optional, true/false to indicate that you want to see the connection path, too.
Response codes
200
: OK, check the response data,404
: actor was not found in the database,500
: unexpected error occured,503
: service is initializing, retry later.
Response body
A JSON object with the fields dist (integer) and, when path is requested, path (array of strings).
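On the client side, the response body can be decoded into a small typed object. The sketch below is an assumption about how one might do it: the Distance class and parse_distance function are hypothetical names, and treating a negative dist as "no connection found" is inferred from the {"dist":-1} example above rather than from a documented contract.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Distance:
    dist: int
    path: Optional[List[str]] = None  # absent unless the path was requested

    @property
    def connected(self) -> bool:
        # Assumption: a negative dist (as in {"dist":-1}) means no connection.
        return self.dist >= 0


def parse_distance(body: str) -> Distance:
    """Decode a /bn or /dist response body."""
    data = json.loads(body)
    return Distance(dist=int(data["dist"]), path=data.get("path"))
```

When a path is present, it lists both endpoints, so it contains dist + 1 names; a client can use that as a cheap consistency check.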