A simple C++ class for performing object detection with YOLOv8 and ONNX Runtime

Introduction

Object detection is a foundational building block across industries around the world, powering applications in fields such as security and manufacturing. As part of my FOCUS project, I wanted to be able to perform object detection inside my C++ application, at high speed, and on any hardware. The project took much longer than initially anticipated, but the result is a blazing-fast, flexible solution that’s ready for real-world applications.

The Problem

Traditional object detection relies on techniques such as edge detection and colour mapping. These simple techniques can be very effective in certain scenarios, but struggle in general applications, and this is where AI-based object detection has recently taken hold. YOLO (You Only Look Once) is a powerful and popular AI object detection algorithm which has gone through many generations. There are also many runtimes which can execute the model, such as ONNX Runtime, OpenVINO, and PyTorch, and each comes with its own set of implementation challenges. The problem extends to the integration libraries built on these runtimes: some sacrifice ease of use for speed, whereas others lack the flexibility to be used in diverse applications. Neither compromise suited my demands, so I found myself in need of a solution that could achieve the following:

  1. Perform inference on both CPU and GPU (see the provider sketch after this list)
  2. Run at maximum speed, being optimised for looping applications
  3. Be easily modified for various use-cases
  4. Integrate smoothly with ONNX Runtime
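
For reference, switching between CPU and GPU in ONNX Runtime usually comes down to the session options. The following is a minimal sketch, assuming a build that ships the CUDA execution provider; the helper name makeSession is illustrative, not part of the project:

Ort::Session makeSession(Ort::Env& env, const char* model_path, bool use_gpu) {
    Ort::SessionOptions options;
    if (use_gpu) {
        // Requires an ONNX Runtime build with CUDA support; operators the
        // GPU cannot handle fall back to the CPU provider at runtime.
        OrtCUDAProviderOptions cuda_options{};
        options.AppendExecutionProvider_CUDA(cuda_options);
    }
    // With no provider appended, the session simply runs on the CPU.
    // (On Windows the model path must be a wide string.)
    return Ort::Session(env, model_path, options);
}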

The Solution

Enter the YOLOv8-ONNXRUNTIME-CPP engine. This implementation addresses the above challenges and is built around a core “YoloInferencer” class that handles the entire detection pipeline, from pre-processing to post-processing, with a clean and intuitive API.

Technical Deep Dive

The core of this implementation lies in the “YoloInferencer” class. Here are its key components (a class outline is sketched after the list):

  1. Initialisation: The constructor sets up the ONNX Runtime session and extracts the metadata from the model
  2. Pre-processing: The “preprocess()” function handles image resizing and normalisation
  3. Inference: The “forward()” method runs the actual ONNX model inference on either the CPU or GPU (depending on the initialisation)
  4. Post-processing: The “postprocess()” function applies NMS (Non-Maximum Suppression) and converts the raw outputs into “Detection“s
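
Put together, the public surface of the class might look something like the following sketch. The member names beyond those mentioned above, and the Detection fields, are my own illustrative assumptions rather than the project’s exact declarations:

#include <onnxruntime_cxx_api.h>
#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

struct Detection {
    cv::Rect box;        // bounding box in the original frame
    float confidence;    // post-NMS confidence score
    int class_id;        // index into the model's class names
};

class YoloInferencer {
public:
    YoloInferencer(const std::string& model_path, bool use_gpu);

    // Full pipeline: preprocess -> forward -> postprocess
    std::vector<Detection> infer(cv::Mat& frame, float conf_threshold, float iou_threshold);

private:
    std::vector<Ort::Value> preprocess(cv::Mat& frame);
    std::vector<Ort::Value> forward(std::vector<Ort::Value>& inputTensors);
    std::vector<Detection> postprocess(std::vector<Ort::Value>& outputTensors,
                                       float conf_threshold, float iou_threshold);

    Ort::Env env_;
    Ort::Session session_{nullptr};          // created in the constructor
    cv::Size input_size_;                    // taken from model metadata
    std::vector<std::string> class_names_;   // parsed from model metadata
};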

A key aspect of this implementation is how it handles the model’s metadata. This allows the engine to adapt to different YOLOv8 models without requiring changes to the code.
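As an illustration, the Ultralytics exporter writes keys such as “names” and “imgsz” into the model’s custom metadata map, which recent versions of the ONNX Runtime C++ API can read as sketched below. This is a hypothetical snippet placed inside the constructor; the key names are those used by the Ultralytics export, and the parsing step is left out:

// Assumed to run in the constructor, after session_ has been created.
Ort::AllocatorWithDefaultOptions allocator;
Ort::ModelMetadata metadata = session_.GetModelMetadata();

// Ultralytics YOLOv8 exports store class names in the custom metadata
// map, e.g. names = "{0: 'person', 1: 'bicycle', ...}".
Ort::AllocatedStringPtr names =
    metadata.LookupCustomMetadataMapAllocated("names", allocator);
if (names) {
    std::string raw = names.get();  // parse into class_names_
}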

Code Showcase

As an example, this is the key part of the implementation – the infer() method:

std::vector<Detection> YoloInferencer::infer(cv::Mat& frame, float conf_threshold, float iou_threshold) {
    // Resize and normalise the frame into the model's input tensor
    std::vector<Ort::Value> inputTensors = preprocess(frame);

    // Execute the ONNX model on the configured provider (CPU or GPU)
    std::vector<Ort::Value> outputTensors = forward(inputTensors);

    // Apply NMS and convert the raw output into Detection structs
    std::vector<Detection> detections = postprocess(outputTensors, conf_threshold, iou_threshold);

    return detections;
}

This function encapsulates the entire inference pipeline while remaining publicly accessible and extremely simple. Its design is easy to understand and to modify if needed.
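Typical usage in a capture loop might then look like this hypothetical sketch; the model path, webcam index, and thresholds are placeholders:

int main() {
    YoloInferencer inferencer("yolov8n.onnx", /*use_gpu=*/true);
    cv::VideoCapture cap(0);  // default webcam

    cv::Mat frame;
    while (cap.read(frame)) {
        // One call per frame: the session and buffers persist between
        // iterations, which is what makes looping applications cheap.
        std::vector<Detection> detections = inferencer.infer(frame, 0.25f, 0.45f);

        for (const Detection& det : detections) {
            cv::rectangle(frame, det.box, cv::Scalar(0, 255, 0), 2);
        }
        cv::imshow("detections", frame);
        if (cv::waitKey(1) == 27) break;  // Esc to quit
    }
    return 0;
}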

Challenges Faced

Some of the biggest issues lay with optimising the pre-processing and post-processing. In a blocking implementation like this, the overall speed of the function is heavily affected by how well these stages are optimised. To address this, I used vectorised operations where applicable and leveraged many of OpenCV’s functions, which are already well-optimised.
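As one concrete example of this kind of vectorisation, OpenCV’s blobFromImage can resize, rescale, and reorder channels in a single optimised call. This is a sketch rather than the project’s exact pre-processing; the 640x640 size is just the common YOLOv8 default:

#include <opencv2/dnn.hpp>

// One vectorised call replaces hand-written per-pixel loops: resize to the
// network input, scale 0-255 to 0-1, swap BGR to RGB, and emit an NCHW
// float blob ready to be wrapped in an Ort::Value.
cv::Mat blob = cv::dnn::blobFromImage(frame,
                                      1.0 / 255.0,
                                      cv::Size(640, 640),
                                      cv::Scalar(),
                                      /*swapRB=*/true,
                                      /*crop=*/false);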

Another key issue was making sure that the engine could handle models other than just my own. Whilst hard-coded model parameters make testing easy, they can cause many issues when switching models in the future. This was solved by using the metadata included in an ONNX model, so that the engine can adapt. Where metadata wasn’t available, the engine infers the values from the shape of the model.
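For instance, the input dimensions can be recovered from the graph itself when no usable metadata exists. The following sketch assumes a single NCHW image input:

// Fall back to the graph's own input shape when metadata is missing.
// YOLOv8 image inputs are NCHW, so indices 2 and 3 hold height and width
// (dynamic axes come back as -1 and would need a chosen default).
Ort::TypeInfo type_info = session_.GetInputTypeInfo(0);
auto tensor_info = type_info.GetTensorTypeAndShapeInfo();
std::vector<int64_t> shape = tensor_info.GetShape();  // e.g. {1, 3, 640, 640}

int input_height = static_cast<int>(shape[2]);
int input_width  = static_cast<int>(shape[3]);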

Future Improvements

Whilst the current implementation is sufficient for my purposes, there is still room to grow. Possible improvements include:

  1. Non-blocking processing (sketched after this list)
  2. Support for other tasks such as segmentation or pose estimation
  3. Further optimisations
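
For instance, non-blocking processing could start as simply as wrapping the existing blocking call, as in this hypothetical std::async sketch (thresholds are placeholders):

#include <future>

// Hand the blocking call to a worker thread so the capture loop can keep
// grabbing frames while the previous one is still being processed.
std::future<std::vector<Detection>> pending = std::async(
    std::launch::async,
    [&inferencer](cv::Mat owned) { return inferencer.infer(owned, 0.25f, 0.45f); },
    frame.clone());  // clone so the worker owns its own copy of the frame

// ... later, collect the result once it is ready:
std::vector<Detection> detections = pending.get();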

Conclusion

YOLOv8-ONNXRUNTIME-CPP is a simple yet effective inference engine for AI object detection tasks. I hope this project is helpful for those looking to accomplish something similar. Please check out the project on GitHub and give it a star. If you run into problems, please create an issue, and if you have improvements, open a pull request.
