This project generates anonymous randomized data and serves it to other AbBOT projects through a RESTful API.
The API and how it runs can be customized through the following environment variables.
Name | Acceptable Values | Default |
---|---|---|
`MODEL_NAME` | `str`, one of `"gpt2"`, `"4chan"` | `"gpt2"` |
`API_BATCH_SIZE` | `int` greater than 0 | `5` |
`ALLOWED_APIS` | `List[str]` (comma separated list or asterisk) | `"*"` |
`API_HOST` | `str` IP address | `"0.0.0.0"` |
`API_PORT` | `int` port number | `5000` |
- `MODEL_NAME` determines which model generates the random text used by the APIs. Warning: the `4chan` option is NSFW and should be used with caution.
- `API_BATCH_SIZE` determines how many entries to generate from the model and put into the queue while the APIs are waiting for requests.
- `ALLOWED_APIS` determines which API paths will be loaded. For example, if the file is `api/generators/prolifewhistleblower.py`, you can load only that API by setting `ALLOWED_APIS="prolifewhistleblower"`.
- `API_HOST` determines which interfaces to host the REST API on.
- `API_PORT` determines which TCP port to host the REST API on.
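To make the table and descriptions above concrete, here is a minimal sketch of how these variables could be read and validated in Python. It is not the project's actual loading code: the names `Settings`, `load_settings`, and `is_api_allowed` are hypothetical, and only the variable names, accepted values, and defaults come from the table.

```python
import os
from typing import List, Mapping, NamedTuple, Optional


class Settings(NamedTuple):
    """Hypothetical container for the documented environment variables."""
    model_name: str
    api_batch_size: int
    allowed_apis: List[str]
    api_host: str
    api_port: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env

    model_name = env.get("MODEL_NAME", "gpt2")
    if model_name not in ("gpt2", "4chan"):
        raise ValueError(f"MODEL_NAME must be 'gpt2' or '4chan', got {model_name!r}")

    api_batch_size = int(env.get("API_BATCH_SIZE", "5"))
    if api_batch_size <= 0:
        raise ValueError("API_BATCH_SIZE must be greater than 0")

    # Either "*" (every API) or a comma separated list of API names.
    raw_apis = env.get("ALLOWED_APIS", "*")
    allowed_apis = [name.strip() for name in raw_apis.split(",") if name.strip()]

    api_port = int(env.get("API_PORT", "5000"))
    if not 0 < api_port < 65536:
        raise ValueError("API_PORT must be a valid TCP port number")

    api_host = env.get("API_HOST", "0.0.0.0")
    return Settings(model_name, api_batch_size, allowed_apis, api_host, api_port)


def is_api_allowed(name: str, settings: Settings) -> bool:
    """Assumed rule: an API loads if it is listed or if "*" is present."""
    return "*" in settings.allowed_apis or name in settings.allowed_apis
```

With no variables set, `load_settings({})` yields the defaults from the table; with `ALLOWED_APIS="prolifewhistleblower"`, only that API name passes `is_api_allowed`.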
If you're not sure how to set environment variables in your environment, check out these guides and documentation:
- Set environment variables - Docker (Docker documentation)
- Environment variable assignment - Unix (Wikipedia)
- Environment variable assignment - Windows (Wikipedia)
First, build the docker image.
docker image build -t dev-abbot-model .
After that, start the inference engine.
# Normal
docker container run -p 5000:5000 --rm dev-abbot-model
# Specify environment variables
docker container run -e MODEL_NAME='gpt2' -e API_BATCH_SIZE='5' -e ALLOWED_APIS='*' -e API_PORT='8080' -p 8080:8080 --rm dev-abbot-model
After the container starts up, you're ready to go.
If you don't already have Python installed, you can check out the Python Beginner's Guide's instructions for installation.
After you have Python installed, make sure you have Pip installed by running the following command.
pip3 --version
# or
python3 -m pip --version
# or, if pip is missing, install the copy bundled with Python
python3 -m ensurepip --default-pip
If you don't have Pip installed, you can download the `get-pip.py` script and run it with `python3`.
curl 'https://bootstrap.pypa.io/get-pip.py' -o ./get-pip.py
python3 ./get-pip.py
First we'll need to install Pipenv.
pip3 install --user pipenv
Then we can use `pipenv` to install all of the dependencies of this project.
pipenv install
Before we can run the program, there's one script we need to run to download some runtime dependencies.
python3 ./download_weights.py
Now that all dependencies are installed and downloaded, you can run the program (`main.py`) and set environment variables as you wish.
# Normal
python3 ./main.py
# or make it executable (Unix/Linux/macOS only)
chmod +x ./main.py
./main.py
# Specify environment variables (Bourne shell and related shells)
MODEL_NAME='gpt2' API_BATCH_SIZE='5' ALLOWED_APIS='*' API_PORT='8080' python3 ./main.py
# or set environment variables for whole session
export MODEL_NAME='gpt2' API_BATCH_SIZE='5' ALLOWED_APIS='*' API_PORT='8080'
python3 ./main.py
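If you would rather not depend on shell syntax (on Windows, for example), the same thing can be done from Python. This is only a sketch: `build_env` and `run_main` are hypothetical helper names, and it assumes `main.py` is in the current directory as in the commands above.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Optional


def build_env(overrides: Mapping[str, str],
              base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of the base environment with the overrides applied."""
    merged = dict(os.environ if base is None else base)
    merged.update(overrides)
    return merged


def run_main(overrides: Mapping[str, str]) -> int:
    """Start main.py with the given variables and wait for it to exit."""
    completed = subprocess.run([sys.executable, "./main.py"],
                               env=build_env(overrides))
    return completed.returncode
```

For example, `run_main({"MODEL_NAME": "gpt2", "API_PORT": "8080"})` starts the program with those two variables set and everything else inherited from the current environment.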
You can test the API with a simple curl command. If you add `-i`, you can also see the headers. Bear in mind that the text generation models take several minutes to start up and fill the queues.
curl http://localhost:5000/prolifewhistleblower/anonymous-form
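Because of that start-up delay, a client may want to retry until the API answers. The sketch below uses only the Python standard library. The function name `fetch_when_ready` and the retry counts are hypothetical, and it assumes that an API that is not ready yet shows up as a connection-level failure; how the server actually behaves while the queues fill is not documented here.

```python
import json
import time
import urllib.error
import urllib.request
from typing import Any, Callable


def fetch_when_ready(url: str, attempts: int = 30, delay: float = 10.0,
                     opener: Callable[..., Any] = urllib.request.urlopen,
                     sleep: Callable[[float], None] = time.sleep) -> Any:
    """Request `url` until it answers, then return the decoded JSON body."""
    last_error: Exception = RuntimeError("no attempts were made")
    for attempt in range(attempts):
        try:
            with opener(url, timeout=30) as response:
                return json.loads(response.read().decode("utf-8"))
        except urllib.error.HTTPError:
            # The server is up but returned an error status: do not retry.
            raise
        except (urllib.error.URLError, ConnectionError, TimeoutError) as error:
            # Connection-level failure: assume the API is still starting up.
            last_error = error
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

Calling `fetch_when_ready("http://localhost:5000/prolifewhistleblower/anonymous-form")` then returns the response body as a Python dictionary once the server responds.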
The data will be sent back through the response body as JSON. This JSON is generated by each API endpoint individually, so the format varies between API paths. For example, the following is the output of the `curl -i http://localhost:5000/prolifewhistleblower/anonymous-form` command.
HTTP/1.0 200 OK
Content-Type: application/json
Content-Length: 624
Server: Werkzeug/2.0.1 Python/3.9.6
Date: Sat, 11 Sep 2021 22:41:18 GMT
{
"violation": "Dear Gov. Abbott, if you want a new bill to be read at all, we're getting it. It's all very well but now we have a bill that can't be read.\"\n\nMr. Abbott said he was happy that the measure that he introduced had passed.\n\n\"This would have been a much better bill,\" he said.\n\nWrite to John F. Stokes at [email protected] and David L. Bock at [email protected]",
"obtained_evidence_from": "His wife told me",
"clinic_or_doctor": "Dr. Trevor Hernandez",
"city": "Aquilla",
"state": "Texas",
"zip_code": "76622",
"county": "Hill",
"ip_address": "67.10.46.59",
"elected_to_public_office": "no"
}
Currently, the only error handling we have for issues with the API itself is an HTTP 500 catch-all. It extracts the error message from the actual error that occurred on the server and puts it into a JSON object as `message`.
HTTP/1.0 500 INTERNAL SERVER ERROR
Content-Type: application/json
Content-Length: 24
Server: Werkzeug/2.0.1 Python/3.9.6
Date: Sat, 11 Sep 2021 22:57:01 GMT
{
"message": "<Error message here>"
}
An HTTP 404 is returned when the requested API path does not exist.
HTTP/1.0 404 NOT FOUND
Content-Type: application/json
Content-Length: 53
Server: Werkzeug/2.0.1 Python/3.9.6
Date: Sat, 11 Sep 2021 23:03:46 GMT
{
"message": "The requested API path does not exist."
}
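Since both error responses carry a JSON body with a `message` field, a client can handle them uniformly. The following is a minimal sketch using Python's standard `urllib`. The helper name `get_json` is hypothetical, and it assumes every error body is valid JSON as in the examples above.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Callable, Tuple


def get_json(url: str,
             opener: Callable[..., Any] = urllib.request.urlopen) -> Tuple[int, Any]:
    """Return (status code, decoded JSON body) for success and error responses."""
    try:
        with opener(url) as response:
            return response.status, json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as error:
        # urllib raises HTTPError for 4xx/5xx statuses; the body is still readable.
        return error.code, json.loads(error.read().decode("utf-8"))
```

A caller can then branch on the status code and show `body["message"]` whenever the status is not 200.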
- Texas ZIP code data was obtained from World Population Review.
TODO: add instructions on how to contribute.
Before each commit, please format your code with `yapf`. If you think the style should be different, please open an issue or PR about the `.style.yapf` file. VS Code and other editors can be configured to use `yapf`. If your editor does not support automatic formatting with `yapf`, you can run it directly with `yapf -ri .`.