llm-serving

Here are 76 public repositories matching this topic...

ray-project / ray

Ray is an AI compute engine. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.

Updated Dec 23, 2024
Python

vllm-project / vllm

Sponsor

Star

A high-throughput and memory-efficient inference and serving engine for LLMs

amd cuda inference pytorch transformer llama gpt rocm model-serving tpu hpu mlops xpu llm inferentia llmops llm-serving trainium

Updated Dec 22, 2024
Python

liguodongiot / llm-action

Star

本项目旨在分享大模型相关技术原理以及实战经验（大模型工程化、大模型应用落地）

llm llmops llm-serving llm-training llm-inference

Updated Dec 17, 2024
HTML

bentoml / OpenLLM

Star

Run any open-source LLMs, such as Llama, Mistral, as OpenAI compatible API endpoint in the cloud.

llama mistral fine-tuning mlops bentoml vicuna llm model-inference llmops llm-serving llm-inference open-source-llm llama2 openllm llm-ops llama3-1 llama3-2 llama3-2-vision

Updated Dec 19, 2024
Python

bentoml / BentoML

Star

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering ai-inference llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated Dec 20, 2024
Python

skypilot-org / skypilot

Star

SkyPilot: Run AI and batch jobs on any infra (Kubernetes or 12+ clouds). Get unified execution, cost savings, and high GPU availability via a simple interface.

Updated Dec 22, 2024
Python

sgl-project / sglang

Star

SGLang is a fast serving framework for large language models and vision language models.

cuda inference pytorch transformer moe llama vlm llm llm-serving llava llama2 llama3 llama3-1

Updated Dec 23, 2024
Python

superduper-io / superduper

Star

Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools - without migrating your data.

Updated Dec 21, 2024
Python

predibase / lorax

Star

Multi-LoRA inference server that scales to 1000s of fine-tuned LLMs

transformers pytorch llama gpt lora model-serving fine-tuning llm llmops llm-serving llm-inference

Updated Dec 20, 2024
Python

microsoft / aici

Star

AICI: Prompts as (Wasm) Programs

rust ai wasm inference transformer language-model model-serving wasmtime llm llmops llm-serving llm-inference llm-framework

Updated Nov 10, 2024
Rust

ray-project / ray-llm

Star

RayLLM - LLMs on Ray

distributed-systems transformers ray serving large-language-models llm llmops llm-serving llm-inference

Updated May 28, 2024
Python

mosecorg / mosec

Star

A high-performance ML model serving framework, offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

python rust machine-learning deep-learning mxnet tensorflow gpu cv pytorch tts hacktoberfest model-serving nerual-network machine-learning-platform jax mlops llm llm-serving

Updated Dec 12, 2024
Python

efeslab / Nanoflow

Star

A throughput-oriented high-performance serving framework for LLMs

cuda inference model-serving llm llm-serving llama2

Updated Sep 21, 2024
Cuda

alibaba / rtp-llm

Star

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

inference llama gpt model-serving llm llmops llm-serving

Updated Oct 14, 2024
C++

rohan-paul / LLM-FineTuning-Large-Language-Models

Star

LLM (Large Language Model) FineTuning

pytorch gpt-3 large-language-models llm llm-serving gpt3-turbo llm-training llm-inference open-source-llm llama2 llm-finetuning mistral-7b

Updated May 19, 2024
Jupyter Notebook

hpcaitech / SwiftInfer

Star

Efficient AI Inference & Serving

deep-learning inference artificial-intelligence llama gpt llm-serving llm-inference llama2

Updated Jan 8, 2024
Python

zhihu / ZhiLight

Star

A highly optimized LLM inference acceleration engine for Llama and its variants.

cuda pytorch llama gpt cpm inference-engine llm llm-serving qwen minicpm

Updated Dec 16, 2024
C++

Multi-node production GenAI stack. Run the best of open source AI easily on your own servers. Easily add knowledge from documents and scrape websites. Create your own AI by fine-tuning open source models. Integrate LLMs with APIs. Run gptscript securely on the server

Updated Dec 22, 2024
Go

ray-project / ray-educational-materials

Star

This is suite of the hands-on training materials that shows how to scale CV, NLP, time-series forecasting workloads with Ray.

deep-learning ray distributed-machine-learning ray-tune ray-train ray-distributed llm generative-ai ray-serve ray-data llm-serving llm-inference

Updated Feb 13, 2024
Jupyter Notebook

galeselee / Awesome_LLM_System-PaperList

Star

Since the emergence of chatGPT in 2022, the acceleration of Large Language Model has become increasingly important. Here is a list of papers on accelerating LLMs, currently focusing mainly on inference acceleration, and related works will be gradually added in the future. Welcome contributions!

system papers paperlist llm-serving llm-inference

Updated Dec 22, 2024

Improve this page

Add a description, image, and links to the llm-serving topic page so that developers can more easily learn about it.

Curate this topic

Add this topic to your repo

To associate your repository with the llm-serving topic, visit your repo's landing page and select "manage topics."

Learn more

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

llm-serving

Here are 76 public repositories matching this topic...

ray-project / ray

vllm-project / vllm

liguodongiot / llm-action

bentoml / OpenLLM

bentoml / BentoML

skypilot-org / skypilot

sgl-project / sglang

superduper-io / superduper

predibase / lorax

microsoft / aici

ray-project / ray-llm

mosecorg / mosec

efeslab / Nanoflow

alibaba / rtp-llm

rohan-paul / LLM-FineTuning-Large-Language-Models

hpcaitech / SwiftInfer

zhihu / ZhiLight

helixml / helix

ray-project / ray-educational-materials

galeselee / Awesome_LLM_System-PaperList

Improve this page

Add this topic to your repo