Dockerised llamafile
This repository contains a Dockerised version of llamafile with support for CPU and GPU builds. The Dockerfile.gpu is based on the official NVIDIA CUDA image.
The scripts directory contains a utility script to download models from the Hugging Face model hub and to run the Docker image. The script can be used as follows:
- Download a model from the Hugging Face model hub:
python scripts/main.py download https://huggingface.co/jartine/llava-v1.5-7B-GGUF/blob/main/llava-v1.5-7b-Q4_K.gguf
To pass raw arguments straight to the llamafile executable, use the --extra-args flag with the run command and place any flags you want to forward after it.
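For example, to set the context size (assuming the bundled llamafile accepts the llama.cpp-style -c flag; the value here is illustrative):
python scripts/main.py run -m <path-to-model> --extra-args -c 4096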
The Docker images are available on Docker Hub at gauransh/llamafile-docker. The images are tagged latest and latest-gpu for the CPU and GPU builds respectively. Currently, latest tracks the v0.6 release of llamafile.
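To fetch an image ahead of time, pull the tag you need, for example:
docker pull gauransh/llamafile-docker:latest
docker pull gauransh/llamafile-docker:latest-gpu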
- Install Docker on your host machine.
- Install nvidia-container-toolkit on your host machine (only needed for GPU usage).
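As an optional sanity check that Docker can see the GPU, a command along these lines should print your GPU details (any CUDA base image will do; the tag here is just an example):
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi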
- CPU ONLY:
docker run -v <host-path>:/app/models -p <host-port>:<container-port> gauransh/llamafile-docker:latest run -m <path-to-model>
Example:
docker run -v ./models:/app/models -p 7777:8080 gauransh/llamafile-docker run -m https://huggingface.co/TheBloke/Code-13B-GGUF/blob/main/code-13b.Q2_K.gguf
- GPU:
docker run --gpus all -v <host-path>:/app/models -p <host-port>:<container-port> gauransh/llamafile-docker:latest-gpu run --gpu <layers-to-offload> -m <path-to-model>
Example:
docker run --gpus all -v ./models:/app/models -p 7777:8080 gauransh/llamafile-docker:latest-gpu run --gpu 33 -m https://huggingface.co/TheBloke/Code-13B-GGUF/blob/main/code-13b.Q2_K.gguf
Model Persistence: The model weights are saved in the /app/models directory inside the container. To persist the models, attach a volume to the container at this path. Check Usage above for an example.
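Alternatively, a named Docker volume can be mounted at the same path (the volume name below is arbitrary):
docker volume create llamafile-models
docker run -v llamafile-models:/app/models -p 7777:8080 gauransh/llamafile-docker:latest run -m <path-to-model>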
The following parameters can be passed to the script:
python scripts/main.py -h
usage: Llamafile Docker Utility [-h] {run,download} ...
positional arguments:
{run,download}
run Run the program
download Download artifacts from the hugging face model hub
options:
-h, --help show this help message and exit
python scripts/main.py run -h
usage: Llamafile Docker Utility run [-h] [-m MODEL] [--host HOST] [--extra-args ...] [--gpu GPU]
options:
-h, --help show this help message and exit
-m MODEL, --model MODEL
The name of the model to run. E.g. url/path to model gguf. If not provided, the default model will be used.
--host HOST Specify the host address for the llamafile server
--extra-args ... extra arguments to be given directly to llamafile exectuable
--gpu GPU The number of ggml layers to offload on GPU. E.g. 1. If not provided, the default model will be used
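Put together, a direct invocation of the run subcommand might look like the following (the model path and flag values are illustrative, and -c is assumed to be a supported llamafile flag):
python scripts/main.py run -m /app/models/llava-v1.5-7b-Q4_K.gguf --host 0.0.0.0 --gpu 33 --extra-args -c 2048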
python scripts/main.py download -h
usage: Llamafile Docker Utility download [-h] [--filename FILENAME] url
positional arguments:
url The URL to download E.g. https://huggingface.co/jartine/llava-v1.5-7B-GGUF/blob/main/llava-v1.5-7b-Q4_K.gguf
options:
-h, --help show this help message and exit
--filename FILENAME  The filename to save the model as. E.g. mixtral-8x7b-v0.1.Q2_K.gguf. If not provided, the filename will be inferred from the URL and saved in the models directory.
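For example, to download a model and choose the filename explicitly:
python scripts/main.py download --filename llava-v1.5-7b-Q4_K.gguf https://huggingface.co/jartine/llava-v1.5-7B-GGUF/blob/main/llava-v1.5-7b-Q4_K.gguf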
The llamafile server exposes an OpenAI-compatible API, so it can be queried with the openai Python client (adjust the base_url host and port to match your setup):
from openai import OpenAI

# Point the client at the local llamafile server; no real API key is required.
client = OpenAI(
    base_url="http://localhost:8080/v1",  # "http://<Your api-server IP>:port"
    api_key="sk-no-key-required"
)
completion = client.chat.completions.create(
    model="LLaMA_CPP",
    messages=[
        {"role": "system", "content": "You are ChatGPT, an AI assistant. Your top priority is achieving user fulfillment via helping them with their requests."},
        {"role": "user", "content": "Write a limerick about python exceptions"}
    ]
)
print(completion.choices[0].message)
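The same endpoint can also be exercised without the Python client; assuming the server is reachable on localhost:8080, a raw curl request along these lines should work:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "LLaMA_CPP", "messages": [{"role": "user", "content": "Write a limerick about python exceptions"}]}'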