# FPGA-Based Object Detection for Autonomous Vehicles on the Kria KR260

## Abstract
Real-time vision systems require fast and accurate object detection and classification, especially in applications where low latency is critical. With the increasing demand for deploying deep learning models on edge devices, efficient and power-optimized inference solutions are essential.
This project presents the implementation of convolutional neural network (CNN) models on an FPGA-based edge platform using the AMD Kria KR260 and Vitis AI. Image classification using a ResNet model and object detection using a YOLOv5 model are developed and deployed individually on a Deep Learning Processor Unit (DPU), with both models quantized to INT8 for efficient inference.
The system processes input from a USB camera and demonstrates real-time performance with low latency and improved energy efficiency compared to CPU-based approaches, making it suitable for edge AI applications.
---
## Aim
- Design and implement real-time image classification and object detection systems using an FPGA-based platform
- Deploy a ResNet-based model for image classification and a YOLOv5-based model for object detection on a DPU accelerator, with each model implemented and evaluated separately
- Achieve low-latency processing of input images from a live video stream
- Develop a modular hardware–software co-design architecture supporting multiple CNN models
---
## Introduction
The motivation for this project is to accelerate image processing tasks in autonomous systems using FPGA-based hardware for low-latency and power-efficient performance. In such systems, the ability to quickly and accurately classify and detect objects such as pedestrians, vehicles, and traffic signs is critical, as even small delays can impact safety and decision-making.
This project addresses this challenge by leveraging FPGA acceleration to speed up both image classification and object detection processes. A ResNet-based model is used for classification, while a YOLOv5-based model is used for detection, with each implemented and evaluated separately. By offloading computationally intensive operations to dedicated hardware, the system achieves faster performance compared to traditional software-based approaches.
This balance of speed and efficiency makes the solution well-suited for real-time autonomous and intelligent systems.
---
## Technologies Used
- FPGA Platform – AMD Kria KR260 Robotics Starter Kit
- Hardware Design Tool – Vivado (for DPU configuration and bitstream generation)
- Embedded OS – PetaLinux (custom Linux image for the board)
- AI Toolchain – Vitis AI (for model quantization and compilation)
- Deep Learning Models – ResNet (image classification), YOLOv5 (object detection)
- Programming Languages – Python
- Libraries & Frameworks – OpenCV, NumPy, VART, XIR
- Communication – TCP/IP (for data transfer and monitoring)
- Input Device – USB Webcam (V4L2 interface)
---
## Background Theory
### 1) ResNet
ResNet (Residual Network) is a deep learning model widely used for image classification tasks. It introduces residual connections that help in training very deep neural networks by addressing the vanishing gradient problem. Instead of detecting objects, ResNet classifies the entire image into predefined categories, making it suitable for fast and efficient inference in real-time systems.
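For illustration (this block is not from the project's code), a minimal PyTorch residual block shows the skip connection described above: the block's input is added back to its convolutional output, giving gradients an identity path during training.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic ResNet-style block: the skip connection adds the input back to
    the convolutional output, so gradients can flow through the identity path."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # residual (skip) connection

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)  # torch.Size([1, 64, 56, 56])
```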
### 2) YOLOv5
YOLOv5 (You Only Look Once) is a deep learning model used for object detection. Unlike classification models, it identifies and localizes multiple objects within an image by predicting bounding boxes along with class labels. It performs detection in a single forward pass, making it highly suitable for real-time applications requiring both speed and accuracy.
### 3) Processing System (PS)
The Processing System (PS) refers to the ARM-based processor on the KR260 board. It runs the operating system (PetaLinux) and handles high-level tasks such as control flow, data management, and post-processing operations like interpreting classification and detection outputs. The PS also manages communication, including capturing camera input and transmitting results.
### 4) Programmable Logic (PL)
The Programmable Logic (PL) is the FPGA fabric where custom hardware is implemented. In this project, the DPU (Deep Learning Processor Unit) is deployed in the PL to accelerate the computationally intensive parts of the neural networks. By offloading inference to hardware, the PL enables faster and more efficient execution compared to running entirely on the PS.
### 5) PetaLinux
PetaLinux is a Linux-based embedded operating system used on AMD/Xilinx FPGA platforms like the KR260. It is customized with the required drivers, libraries, and tools, including DPU support and the Vitis AI runtime. Running on the PS, it manages device control, communication, and execution of the inference applications, acting as a bridge between hardware (PL) and software.
### 6) Quantization (Vitis AI)
Quantization converts a model from high precision (FP32) to lower precision (INT8) to reduce computational complexity and memory usage. This is essential for deploying models on the DPU, enabling faster inference with minimal accuracy loss, which is critical for real-time edge applications.
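A minimal sketch of this flow with the Vitis AI PyTorch quantizer (`pytorch_nndct`, shipped with the Vitis AI toolchain), using a torchvision ResNet and random calibration batches as stand-ins for the project's trained model and dataset:

```python
import torch
from torchvision.models import resnet18
from pytorch_nndct.apis import torch_quantizer  # Vitis AI PyTorch quantizer

# Stand-in for the trained FP32 model (the real flow loads the project's .pth).
model = resnet18(weights=None).eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Pass 1 ("calib"): run representative data through the wrapped model so the
# quantizer can observe activation ranges. Random batches are placeholders;
# real calibration uses preprocessed images from the training distribution.
quantizer = torch_quantizer("calib", model, (dummy_input,), output_dir="quant_out")
quant_model = quantizer.quant_model
with torch.no_grad():
    for _ in range(8):
        quant_model(torch.randn(4, 3, 224, 224))
quantizer.export_quant_config()

# Pass 2 ("test"): evaluate the INT8 model and export the deployable xmodel.
quantizer = torch_quantizer("test", model, (dummy_input,), output_dir="quant_out")
with torch.no_grad():
    quantizer.quant_model(dummy_input)
quantizer.export_xmodel(output_dir="quant_out", deploy_check=False)
```

The exported quantized model is then compiled against the board's DPU architecture file (step 7 of the methodology below) to produce the final `.xmodel`, typically with the Vitis AI `vai_c_xir` compiler.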
---
## Methodology
### Project Flow
1. Set up development environment
Install Vivado, PetaLinux, and Vitis AI on the host system. Configure all dependencies required for hardware and software development.
2. Design and configure DPU in Vivado
Create the hardware design by integrating the DPU into the FPGA. Generate the bitstream and export the hardware description (.xsa).
3. Build custom PetaLinux image
Use the .xsa file to build a Linux image with DPU support. Include required drivers, libraries, and Vitis AI runtime.
4. Flash image and boot KR260
Write the PetaLinux image to an SD card and boot the board. Initialize the system and prepare it for deployment.
5. Train CNN models
Train the required models using appropriate datasets in PyTorch. A ResNet model is used for image classification, and a YOLOv5 model is used for object detection, with each trained and handled separately. Generate trained model files (.pt/.pth).
6. Quantize models (INT8)
Convert the trained models from FP32 to INT8 using Vitis AI. Reduce computation and make them suitable for DPU execution.
7. Compile models to .xmodel
Compile the quantized models using the DPU architecture file. Generate hardware-specific executable models (.xmodel).
8. Deploy models to KR260
Transfer the compiled models to the FPGA board using SSH/SCP. Prepare the system for running inference.
9. Capture live video
Use a USB webcam to capture real-time input frames. Feed the video stream into the processing pipeline.
10. Perform inference on DPU
Run the deployed models on the DPU for fast inference. ResNet performs image classification, while YOLOv5 performs object detection, executed separately depending on the application.
11. Display/stream output
Send the processed frames to the laptop via TCP (a minimal streaming sketch follows this list). Display real-time results, including class labels for classification and bounding boxes for detection.
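As a sketch of step 11 (the address, port, and JPEG quality below are illustrative placeholders, not values from the project), the sender JPEG-encodes each processed frame and length-prefixes it so the receiver can split the TCP byte stream back into frames:

```python
import socket
import struct

import cv2

# Sender running on the KR260. Laptop address and port are placeholders.
LAPTOP_ADDR = ("192.168.1.100", 9000)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(LAPTOP_ADDR)

cap = cv2.VideoCapture(0)  # USB webcam via V4L2
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # JPEG-encode the processed frame to keep the TCP payload small.
        ok, jpeg = cv2.imencode(".jpg", frame, [cv2.IMWRITE_JPEG_QUALITY, 80])
        if not ok:
            continue
        data = jpeg.tobytes()
        # Length-prefix each frame so the receiver can split the byte stream.
        sock.sendall(struct.pack(">I", len(data)) + data)
finally:
    cap.release()
    sock.close()
```

On the laptop, the receiver reads the 4-byte length header, then exactly that many bytes, and reconstructs the frame with `cv2.imdecode` before displaying it.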
---
## System Architecture and Data Flow
The system implements separate pipelines for image classification and object detection, with each executed independently based on the application.
### Image Classification Pipeline (ResNet)
- The PS captures live video input from a USB webcam using OpenCV.
- Captured frames are preprocessed (resizing and normalization) before inference.
- The processed frames are sent to the DPU in the PL for execution.
#### DPU (in PL)
- Performs deep learning inference on input frames
- Generates classification outputs (class probabilities)
- Outputs are sent back to the PS
- The PS interprets the predicted labels
- Results are displayed or streamed (a VART inference sketch follows below)
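The sketch below condenses this classification pipeline into the VART Python API calls involved. The model file name, camera index, and preprocessing are placeholders; in particular, the normalization must match whatever the model was trained with.

```python
import cv2
import numpy as np
import vart
import xir

# Load the compiled model and pick out the DPU subgraph.
# "resnet_kr260.xmodel" is a placeholder for the compiled model's file name.
graph = xir.Graph.deserialize("resnet_kr260.xmodel")
subgraphs = graph.get_root_subgraph().toposort_child_subgraph()
dpu_subgraph = next(s for s in subgraphs
                    if s.has_attr("device") and s.get_attr("device").upper() == "DPU")
runner = vart.Runner.create_runner(dpu_subgraph, "run")

in_tensor = runner.get_input_tensors()[0]
out_tensor = runner.get_output_tensors()[0]
in_scale = 2 ** in_tensor.get_attr("fix_point")     # FP32 -> INT8 input scale
out_scale = 2 ** -out_tensor.get_attr("fix_point")  # INT8 -> FP32 output scale
_, height, width, _ = tuple(in_tensor.dims)         # DPU tensors are NHWC

cap = cv2.VideoCapture(0)  # USB webcam via V4L2
ret, frame = cap.read()
cap.release()

# Preprocess: resize, BGR -> RGB, scale to [0, 1], then quantize to INT8.
img = cv2.cvtColor(cv2.resize(frame, (width, height)), cv2.COLOR_BGR2RGB)
img = (img.astype(np.float32) / 255.0) * in_scale
input_data = np.expand_dims(img.astype(np.int8), axis=0)

# Run one inference job on the DPU and wait for it to finish.
output_data = np.empty(tuple(out_tensor.dims), dtype=np.int8)
job_id = runner.execute_async([input_data], [output_data])
runner.wait(job_id)

# Post-process on the PS: rescale the INT8 logits and take the top class.
logits = output_data.reshape(-1).astype(np.float32) * out_scale
print("Predicted class index:", int(np.argmax(logits)))
```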
---
### Object Detection Pipeline (YOLOv5)
- The PS captures live video input from the USB webcam
- Captured frames are preprocessed (resizing and normalization) before inference
- The processed frames are sent to the DPU in the PL for execution
#### DPU (in PL)
- Performs deep learning inference on input frames
- Generates raw feature maps (bounding-box and class predictions)
- Outputs are sent back to the PS
- The PS performs post-processing:
  - Decoding of the raw predictions into boxes and scores
  - Non-Maximum Suppression (NMS) to discard overlapping detections (see the sketch below)
- Final frames with bounding boxes are displayed
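The NMS stage on the PS can be implemented in a few lines of NumPy. The version below is a generic greedy-suppression sketch, not the project's exact code; the 0.45 IoU threshold is a common default rather than a value from this report.

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy NMS. boxes: (N, 4) array of [x1, y1, x2, y2]; scores: (N,)."""
    order = scores.argsort()[::-1]  # indices sorted by descending confidence
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the current best box with every remaining box.
        x1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        y1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        x2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        y2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        areas = ((boxes[order[1:], 2] - boxes[order[1:], 0])
                 * (boxes[order[1:], 3] - boxes[order[1:], 1]))
        iou = inter / (area_i + areas - inter)
        order = order[1:][iou <= iou_thresh]  # drop boxes that overlap too much
    return keep

# Example: the second box heavily overlaps the first and is suppressed.
boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]],
                 dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms(boxes, scores))  # [0, 2]
```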
---
## Results
### YOLO Model Results
- Successfully deployed on the DPU
- Real-time detection achieved, with multiple objects detected per frame
- Latency: 240–250 ms per frame (~4 FPS)
- The DPU handles the convolutional layers; the PS handles post-processing (decoding and NMS)
- Full pipeline implemented end to end, though further optimization is needed for real-world use
---
### ResNet Model Results
- Stable real-time performance
- Latency: 16–17 ms per frame
- Throughput: ~58–62 FPS
- High-confidence predictions (0.85–0.98)
- Good temporal stability across frames
- Incorrect predictions are rare
---
## Performance Comparison (Hardware vs Software)
- FPGA (DPU): ~16–17 ms per frame (~58–62 FPS)
- CPU: ~5–15 FPS
Compared to the CPU baseline, the FPGA provides:
- Higher throughput at lower per-frame latency
- Lower power consumption
---
## Conclusion
This project demonstrated the deployment of CNN models on the AMD Kria KR260 using Vitis AI.
- ResNet → efficient real-time image classification
- YOLOv5 → real-time object detection
The YOLO pipeline ran at a lower frame rate, but it demonstrated that a full detection workload can be accelerated on the FPGA.
Overall, the system offers:
- A strong performance-energy tradeoff
- An effective edge AI solution
---
## Future Scope
- Use more efficient detection models
- Explore Binary Neural Networks (BNNs)
---
## Team Members
### Mentors
- Adithya A
- Adithya Santhanam
- Prateek Goel
### Mentees
- Shouryadip Chakraborty
- Daksh Singh
- Abhishek Agrawal
- Abhinav S Rao