v0.9.5
Features, Fixes, and Improvements
- Fixed the automated pypi deploys by @paulguerrie in #126
- Fixed broken docs links for entities by @paulguerrie in #127
- revert accidental change to makefile by @sberan in #128
- Update compatability_matrix.md by @capjamesg in #129
- Model Validation On Load by @paulguerrie in #131
- Use Simple Docker Commands in Tests by @paulguerrie in #132
- No Exception Raised By Model Manager Remove Model by @paulguerrie in #134
- Noted that inference stream only supports object detection by @stellasphere in #136
- Fix URL in docs image by @capjamesg in #138
- Deduce API keys from logs by @PawelPeczek-Roboflow in #140
- Fix problem with BGR->RGB and RGB->BGR conversions by @PawelPeczek-Roboflow in #137
- Update default API key parameter for get_roboflow_model function by @SkalskiP in #142
- Documentation improvements by @capjamesg in #133
- Hosted Inference Bug Fixes by @paulguerrie in #143
- Introduce Active Learning by @PawelPeczek-Roboflow in #130
- Update HTTP inference docs by @capjamesg in #145
- Speed Regression Fix - Remove Numpy Range Validation by @paulguerrie in #146
- Introduce additional active learning sampling strategies by @PawelPeczek-Roboflow in #148
- Add stub endpoints to allow data collection without model by @PawelPeczek-Roboflow in #141
- Fix CLIP example by @capjamesg in #150
- Fix outdated warning with 'inference' upgrade suggestion by @PawelPeczek-Roboflow in #154
- Allow setting cv2 camera capture props from .env file by @sberan in #152
- Wrap pingback url by @robiscoding in #151
- Introduce new stream interface by @PawelPeczek-Roboflow in #156
- Clarify Enterprise License by @yeldarby in #158
- Async Model Manager by @probicheaux in #111
- Peter/async model manager by @probicheaux in #159
- Fix Critical and High Vulnerabilities in Docker Images by @paulguerrie in #157
- Split Requirements For Unit vs. Integration Tests by @paulguerrie in #160
Full Changelog: v0.9.3...v0.9.5.rc2
New inference.Stream interface
We are excited to introduce the upgraded version of our stream interface: InferencePipeline. Additionally, the WebcamStream class has evolved into a more versatile VideoSource.
This new abstraction is not only faster and more stable but also provides more granular control over the entire inference process.
Can I still use inference.Stream?
Absolutely! The old components remain unchanged for now. However, be aware that this abstraction is slated for deprecation over time. We encourage you to explore the new InferencePipeline interface and take advantage of its benefits.
What has been improved?
- Performance: Experience a significant boost in throughput, up to 5 times, and improved latency for online inference on video streams using the YOLOv8n model.
- Stability: InferencePipeline can now automatically re-establish a connection for online video streams if the connection is lost.
- Prediction Sinks: New prediction sinks simplify the utilization of predictions without the need for custom code.
- Control Over Inference Process: InferencePipeline intelligently adapts to the type of video source, whether a file or a stream. Video files are processed frame by frame, while online streams prioritize real-time processing and drop frames that cannot be handled in real time (a sketch follows this list).
- Observability: Gain insights into the processing state through events exposed by InferencePipeline. Reference implementations that let you monitor processing are also available.
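For illustration, the sketch below points the same InferencePipeline.init call shown in the migration section at a video file instead of a webcam. It is a minimal sketch assuming video_reference accepts a device id, a file path, or a stream URL interchangeably; the model ID and RTSP URL are placeholders.
from inference.core.interfaces.stream.inference_pipeline import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes

# a minimal sketch: the same pipeline can consume a video file (processed frame
# by frame) or an online stream (processed in real time, with late frames dropped);
# the model ID and RTSP URL are placeholders
pipeline = InferencePipeline.init(
    model_id="rock-paper-scissors-sxsw/11",
    video_reference="file.mp4",  # or "rtsp://0.0.0.0:8000/password", or 0 for a webcam
    on_prediction=render_boxes,
)
pipeline.start()
pipeline.join()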
How to Migrate to the new Inference Stream interface?
Migrating to the new interface requires changing only a few lines of code.
Below is an example that shows the old interface:
import inference
def on_prediction(predictions, image):
    pass

inference.Stream(
    source="webcam",  # or "rtsp://0.0.0.0:8000/password" for an RTSP stream, or "file.mp4" for a video file
    model="rock-paper-scissors-sxsw/11",  # from Universe
    output_channel_order="BGR",
    use_main_thread=True,  # for opencv display
    on_prediction=on_prediction,
)
Here is the same code expressed in the new interface:
from inference.core.interfaces.stream.inference_pipeline import InferencePipeline
from inference.core.interfaces.stream.sinks import render_boxes
pipeline = InferencePipeline.init(
    model_id="rock-paper-scissors-sxsw/11",  # from Universe
    video_reference=0,  # device id of the default webcam
    on_prediction=render_boxes,  # built-in sink that renders predictions on frames
)
pipeline.start()
pipeline.join()
Note the slight change in the on_prediction handler, from:
import numpy as np

def on_prediction(predictions: dict, image: np.ndarray) -> None:
    pass
Into:
from inference.core.interfaces.camera.entities import VideoFrame

def on_prediction(predictions: dict, video_frame: VideoFrame) -> None:
    pass
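A custom handler built on this signature can do anything with the predictions. The sketch below is illustrative only: it assumes the predictions dict contains a "predictions" list and that VideoFrame exposes a frame_id attribute, neither of which is spelled out in these notes.
from inference.core.interfaces.camera.entities import VideoFrame

def count_detections(predictions: dict, video_frame: VideoFrame) -> None:
    # assumption: the payload carries a "predictions" list and VideoFrame has frame_id
    detections = predictions.get("predictions", [])
    print(f"frame {video_frame.frame_id}: {len(detections)} detections")
A function like this can be passed as on_prediction to InferencePipeline.init in place of the built-in render_boxes sink.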
Want to know more?
Here are useful references:
Parallel Roboflow Inference Server
The Roboflow Inference Server supports concurrent processing. This version of the server accepts and processes requests asynchronously, running the web server, preprocessing, auto-batching, inference, and post-processing in separate threads to increase server FPS throughput. Separate requests to the same model are batched on the fly as allowed by $MAX_BATCH_SIZE, and response handling then occurs independently. Images are passed via Python's SharedMemory module to maximize throughput.
These changes result in as much as a 76% speedup on one measured workload.
Note
Currently, only Object Detection, Instance Segmentation, and Classification models are supported by this module. Core models are not enabled.
Important
We require a Roboflow Enterprise License to use this in production. See inference/enterprise/LICENSE.txt for details.
How To Use Concurrent Processing
You can build the server using ./inference/enterprise/parallel/build.sh and run it using ./inference/enterprise/parallel/run.sh.
We provide a container at Docker Hub that you can pull using docker pull roboflow/roboflow-inference-server-gpu-parallel:latest. If you are pulling a pinned tag, be sure to change the $TAG variable in run.sh.
This is a drop-in replacement for the old server, so you can send requests using the same API calls you were using previously.
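For example, a request along the lines of the sketch below should work unchanged. It is a minimal sketch assuming the parallel server listens on localhost:9001 and exposes an /infer/object_detection route with this request schema; the model ID, API key, and image URL are placeholders.
import requests

# a minimal sketch, assuming the parallel server listens on localhost:9001 and
# exposes an /infer/object_detection route with this request schema; the model
# ID, API key, and image URL are placeholders
response = requests.post(
    "http://localhost:9001/infer/object_detection",
    json={
        "model_id": "rock-paper-scissors-sxsw/11",
        "api_key": "YOUR_ROBOFLOW_API_KEY",
        "image": {"type": "url", "value": "https://example.com/image.jpg"},
    },
)
print(response.json())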
Performance
We measure and report performance across a variety of different task types by selecting random models found on Roboflow Universe.
Methodology
The following metrics were taken on a machine with eight cores and one GPU. The FPS metrics reflect the best of three trials. The column labeled 0.9.5.parallel reflects the latest concurrent FPS metrics. Instance segmentation metrics are calculated using "mask_decode_mode": "fast" in the request body. Requests are posted concurrently with a parallelism of 1000.
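The benchmark client itself is not included in these notes; the sketch below only illustrates the shape of such a run, assuming an /infer/instance_segmentation route whose request body accepts "mask_decode_mode", with the model ID, API key, and image URL as placeholders.
from concurrent.futures import ThreadPoolExecutor

import requests

# illustrative only: posts instance-segmentation requests concurrently, assuming
# an /infer/instance_segmentation route that accepts "mask_decode_mode" in the
# body; the model ID, API key, and image URL are placeholders
def send_request(_: int) -> dict:
    response = requests.post(
        "http://localhost:9001/infer/instance_segmentation",
        json={
            "model_id": "water-08xpr/1",
            "api_key": "YOUR_ROBOFLOW_API_KEY",
            "image": {"type": "url", "value": "https://example.com/image.jpg"},
            "mask_decode_mode": "fast",
        },
    )
    return response.json()

with ThreadPoolExecutor(max_workers=1000) as pool:
    results = list(pool.map(send_request, range(1000)))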
Results
| Workspace | Model | Model Type | Split | 0.9.5.rc FPS | 0.9.5.parallel FPS |
|---|---|---|---|---|---|
| senior-design-project-j9gpp | nbafootage/3 | object-detection | train | 30.2 fps | 44.03 fps |
| niklas-bommersbach-jyjff | dart-scorer/8 | object-detection | train | 26.6 fps | 47.0 fps |
| geonu | water-08xpr/1 | instance-segmentation | valid | 4.7 fps | 6.1 fps |
| university-of-bradford | detecting-drusen_1/2 | instance-segmentation | train | 6.2 fps | 7.2 fps |
| fy-project-y9ecd | cataract-detection-viwsu/2 | classification | train | 48.5 fps | 65.4 fps |
| hesunyu | playing-cards-ir0wr/1 | classification | train | 44.6 fps | 57.7 fps |