Documentation for running inference or a pre-built inference server #417
-
Hi @Ben-Epstein, thank you for the feedback! Yes, we do have the RESTful server feature on our roadmap and we are discussing the delivery timeframe for it. In the meantime, I am moving this issue into our Discussions forum so that we can continue the conversation there.
-
Hi @natke, just curious: is this on the near-term roadmap, or further out? We are planning to start on this work internally and would love to collaborate if that's of interest!
-
Hi @natke - kindly asking if there are any updates on the server component?
-
@natke do you have open discussions of your roadmap that the community can join, listen in on, or comment on? It would be great to know whether this is coming short term (this year) or longer term.
-
This library is great. I've been testing phi-3-mini-128k, and this is by far the fastest runtime for it. For a non-ONNX model I'd use TGI, but presumably you have a more optimized setup for ONNX models?
Do you have documentation on best practices for deploying the model and handling things like batching, streaming, etc.? Or are you planning on building a RESTful server that can be deployed through a Docker image? (A rough sketch of the kind of wrapper I mean is below.)
This might be related to #313, but it's not clear what that issue is asking for.
Thanks!
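For anyone landing here before an official server ships, here is a minimal sketch of the kind of wrapper being discussed: a single /generate endpoint that streams tokens over HTTP. Assumptions worth flagging: FastAPI/uvicorn are just one convenient choice, not anything official from this repo; the onnxruntime-genai calls mirror the older Phi-3 tutorial examples, so the exact generator API may differ between package versions; and MODEL_DIR is a placeholder for a local ONNX model folder.

```python
# Minimal sketch of a streaming HTTP wrapper around onnxruntime-genai.
# Assumptions: FastAPI is one convenient choice, not an official server;
# the og.* calls follow the older Phi-3 tutorial examples and may differ
# between onnxruntime-genai versions; MODEL_DIR is a placeholder path.
import threading

import onnxruntime_genai as og
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

MODEL_DIR = "phi-3-mini-128k-instruct-onnx"  # placeholder: local model folder

app = FastAPI()
model = og.Model(MODEL_DIR)
tokenizer = og.Tokenizer(model)
_lock = threading.Lock()  # crude serialization: one generation at a time


class GenerateRequest(BaseModel):
    prompt: str
    max_length: int = 1024


@app.post("/generate")
def generate(req: GenerateRequest):
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=req.max_length)
    params.input_ids = tokenizer.encode(req.prompt)
    generator = og.Generator(model, params)
    stream = tokenizer.create_stream()

    def token_stream():
        # Hold the lock for the whole generation so concurrent requests
        # queue up instead of interleaving on the single model instance.
        with _lock:
            while not generator.is_done():
                generator.compute_logits()
                generator.generate_next_token()
                new_token = generator.get_next_tokens()[0]
                yield stream.decode(new_token)

    return StreamingResponse(token_stream(), media_type="text/plain")
```

Run it with uvicorn (e.g. `uvicorn server:app` if the file is saved as server.py) and stream the output with `curl -N`. Note what this sketch deliberately leaves out: there is no request batching, so concurrent requests simply queue behind a lock. Continuous batching, KV-cache management, and multi-GPU scheduling are exactly the pieces a TGI-style server provides, which is why a pre-built server (or official deployment guidance) would be valuable here.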