Documentation for running inference or a pre-built inference server #417
-
Hi @Ben-Epstein, thank you for the feedback! Yes, we do have the RESTful server feature on our roadmap and we are discussing the delivery timeframe for it. In the meantime, I am moving this issue into our Discussions forum so that we can continue the conversation there.
-
Hi @natke, just curious: is this on the near-term roadmap, or further out? We are planning to start on this work internally and would love to collaborate if that's of interest!
-
Hi @natke - kindly asking if there are any updates on the server component?
-
@natke do you have open discussions of your roadmap that the community can join, listen in on, or comment on? It would be great to know whether this is coming short term (this year) or longer term.
-
This library is great. I've been testing phi-3-mini-128k, and this is by far the fastest runtime for it. For a non-ONNX model I'd use TGI, but presumably you have a more optimized setup for ONNX models?
Do you have documentation on best practices for deploying the model and handling things like batching, streaming, etc.? Or are you planning on building a RESTful server that can be deployed through a Docker image? (A rough sketch of the kind of wrapper I mean is below.)
This might be related to #313, but it's not clear what that issue is asking for.
Thanks!
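For anyone landing here before an official server ships, here is a minimal sketch of the kind of wrapper being discussed: a single /generate endpoint that streams tokens over HTTP. Assumptions worth flagging: FastAPI/uvicorn are just one convenient choice, not anything official from this repo; the onnxruntime-genai calls mirror the older Phi-3 tutorial examples, so the exact generator API may differ between package versions; and MODEL_DIR is a placeholder for a local ONNX model folder.

```python
# Minimal sketch of a streaming HTTP wrapper around onnxruntime-genai.
# Assumptions: FastAPI is one convenient choice, not an official server;
# the og.* calls follow the older Phi-3 tutorial examples and may differ
# between onnxruntime-genai versions; MODEL_DIR is a placeholder path.
import threading

import onnxruntime_genai as og
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

MODEL_DIR = "phi-3-mini-128k-instruct-onnx"  # placeholder: local model folder

app = FastAPI()
model = og.Model(MODEL_DIR)
tokenizer = og.Tokenizer(model)
_lock = threading.Lock()  # crude serialization: one generation at a time


class GenerateRequest(BaseModel):
    prompt: str
    max_length: int = 1024


@app.post("/generate")
def generate(req: GenerateRequest):
    params = og.GeneratorParams(model)
    params.set_search_options(max_length=req.max_length)
    params.input_ids = tokenizer.encode(req.prompt)
    generator = og.Generator(model, params)
    stream = tokenizer.create_stream()

    def token_stream():
        # Hold the lock for the whole generation so concurrent requests
        # queue up instead of interleaving on the single model instance.
        with _lock:
            while not generator.is_done():
                generator.compute_logits()
                generator.generate_next_token()
                new_token = generator.get_next_tokens()[0]
                yield stream.decode(new_token)

    return StreamingResponse(token_stream(), media_type="text/plain")
```

Run it with uvicorn (e.g. `uvicorn server:app` if the file is saved as server.py) and stream the output with `curl -N`. Note what this sketch deliberately leaves out: there is no request batching, so concurrent requests simply queue behind a lock. Continuous batching, KV-cache management, and multi-GPU scheduling are exactly the pieces a TGI-style server provides, which is why a pre-built server (or official deployment guidance) would be valuable here.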