Technology for Traffic Regulation: YOLOv8 and Tesseract-OCR for Helmet Violation Fines


Originally Posted On: https://medium.datadriveninvestor.com/technology-for-traffic-regulation-yolov8-and-tesseract-ocr-for-helmet-violation-fines-f47bfb1b4209

Background

Several countries fine people who ride motorcycles without helmets. In countries like India, police take pictures of drivers who violate traffic rules, upload them to an electronic portal, and impose fines (challans); offenders then pay the fine through this portal. This is a manual process that needs a workforce to capture pictures, note down vehicle registration numbers, and update the portal. Several countries have already automated this process using various technologies. In this article we will see how to implement such a system using one of the latest advancements in the object detection field, YOLOv8.

Approach:

We will get a real-time video stream from a camera installed at the roadside. We use OpenCV to capture the video and work with the image frames. To find traffic violations in the video stream, we have to implement four subparts, listed below.

  1. Detect people who are driving a motorcycle: To implement this we will use a dataset from Kaggle that contains 795 images of people riding various types of motorcycles, taken from various directions and distances.
  2. Detect whether the driver is wearing a helmet: To implement this we will use the bike-helmet-detection-2vdjo dataset from Roboflow, which contains nearly 1,376 images of people with and without helmets.
  3. Detect the license plate and extract the registration number: To implement license plate detection we will use the vehicle-registration-plates-trudk dataset from Roboflow, which contains 8,823 images of vehicle registration plates captured in various situations, locations, and environmental conditions. Extracting the number from the plate image can be done using Tesseract-OCR.
  4. Store the violation details in a database: We will use sqlite3 for this purpose, which lets us work with a database without installing any additional software. Sometimes images captured from the camera are blurred or unclear; in these cases Tesseract-OCR cannot extract accurate text, so along with the extracted text we also store the image of the person on the bike.

For .NET developers building similar traffic enforcement systems, IronOCR can help address the blurred image challenge. It includes built-in preprocessing specifically designed for license plate recognition scenarios, handling low contrast, motion blur, and varying lighting conditions.

using IronOcr;

var ocr = new IronTesseract();
using var input = new OcrInput();
input.LoadImage(croppedPlateImage);
input.Sharpen();
input.EnhanceResolution();
var result = ocr.Read(input);
string plateNumber = result.Text.Trim();

The preprocessing chain can improve extraction accuracy for plates captured from camera feeds, reducing the need to store images for manual verification when OCR fails.

All three detection models are trained using YOLOv8. Now let's start the implementation.

Approach: automated rider detection, helmet detection, license number extraction, and database integration (source: image generated using prompthunt.com)

Part-1: Detect people who are driving a motorcycle:

YOLO (You Only Look Once) is an object detection algorithm that is preferred when we need to work in a real-time environment and get fast results with high accuracy. YOLO uses a CNN architecture to perform its tasks. YOLOv8, released in January 2023, is the most recent version of YOLO and has several advancements over its predecessors.

Dataset format: To train a YOLOv8 model, we need to provide a set of images and a corresponding label/annotation file for each image. Annotations mark our regions of interest in an image. Each annotation file contains the class of every region of interest and its coordinates, normalized between 0 and 1. Annotations are generally created using tools like LabelImg or Make Sense, where we just draw a bounding box around the objects and the tool generates the annotation files. A single image's annotation file can contain multiple classes and coordinate sets. We divide our root dataset folder into train, test, and valid subfolders, each of which in turn contains images and labels subfolders.
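
For example, an annotation file for an image with one object might contain a single line like the one below (values illustrative): the class index first, then the normalized center x, center y, width, and height of the bounding box.

0 0.512 0.431 0.285 0.640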

Let's begin the dataset preparation. The dataset from Kaggle is not in the format YOLOv8 requires: all the images and annotations sit in a single folder, and the file names are unclear. So I created a Python script that separates the annotation/text files and images into two different folders and renames them in numerical order (1.jpg, 2.jpg, …, 1.txt, 2.txt, …).

import os
import shutil

# Set the path to your folder containing the images and text files
folder_path = "/path/to/your/folder"

# Create the output folders for images and text files
image_folder = "/path/to/output/image/folder"
text_folder = "/path/to/output/text/folder"
os.makedirs(image_folder, exist_ok=True)
os.makedirs(text_folder, exist_ok=True)

# Walk the image files in sorted order so that each image and its annotation
# (same base name, .txt extension) receive the same number
image_extensions = ('.jpg', '.jpeg', '.png', '.gif')
image_files = sorted(f for f in os.listdir(folder_path)
                     if f.lower().endswith(image_extensions))

for index, filename in enumerate(image_files, start=1):
    base_name, extension = os.path.splitext(filename)

    # Move and rename the image (1.jpg, 2.jpg, ...)
    shutil.move(os.path.join(folder_path, filename),
                os.path.join(image_folder, str(index) + extension))

    # Move and rename the matching annotation file (1.txt, 2.txt, ...)
    annotation_path = os.path.join(folder_path, base_name + ".txt")
    if os.path.exists(annotation_path):
        shutil.move(annotation_path,
                    os.path.join(text_folder, str(index) + ".txt"))

print("Files have been sorted successfully!")

After running the above code we have two folders: images and the corresponding labels. Now we have to divide this data into three subfolders: train, test, and valid. In my case I split the 795 images 595/100/100. Below is my folder structure.

├── Root folder
│   ├── train
│   │   ├── images (595 training images)
│   │   └── labels (595 corresponding training annotations/labels)
│   ├── test
│   │   ├── images (100 testing images)
│   │   └── labels (100 corresponding testing annotations/labels)
│   └── valid
│       ├── images (100 validation images)
│       └── labels (100 corresponding validation annotations/labels)
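
If you prefer to script the split, here is a minimal sketch using the same 595/100/100 ratio. The paths are placeholders, and it assumes the numbered image/label pairs produced by the previous script.

import os
import shutil

# Placeholder paths: adjust to your own layout
image_folder = "/path/to/output/image/folder"
text_folder = "/path/to/output/text/folder"
root = "/path/to/root/dataset/folder"

# (subfolder name, number of image/label pairs it receives)
splits = [("train", 595), ("test", 100), ("valid", 100)]

# Sort numerically (1.jpg, 2.jpg, ...) so each image stays with its label
images = sorted(os.listdir(image_folder),
                key=lambda f: int(os.path.splitext(f)[0]))

start = 0
for split_name, count in splits:
    for image_name in images[start:start + count]:
        stem = os.path.splitext(image_name)[0]
        for src_dir, sub, name in [(image_folder, "images", image_name),
                                   (text_folder, "labels", stem + ".txt")]:
            dest_dir = os.path.join(root, split_name, sub)
            os.makedirs(dest_dir, exist_ok=True)
            shutil.move(os.path.join(src_dir, name),
                        os.path.join(dest_dir, name))
    start += count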

For training I used Google Colab. One of the key benefits of YOLOv8 is its ease of implementation: in just a few lines of code we can implement an entire object detection model. I uploaded the dataset to Google Drive and connected the drive to Colab by running the snippet below.

from google.colab import drive
drive.mount('/content/drive')

Now create a configuration file (data.yaml) in Colab which contains the paths to the train, test, and valid folders, along with the number of classes and their names.

path:  Path to root dataset folder
train: Path to dataset training folder
test:  Path to dataset testing folder 
val: Path to dataset validation folder

nc: 1 # Number of classes

names: ['Person_Bike'] #classes names
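
A filled-in version might look like this, assuming the dataset was uploaded to Drive at the illustrative path below:

path: /content/drive/MyDrive/person_bike_dataset
train: train/images
test: test/images
val: valid/images

nc: 1 # Number of classes

names: ['Person_Bike'] # class names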

Now install and import the ultralytics package, which provides the YOLO module used to implement YOLOv8.

!pip install ultralytics
from ultralytics import YOLO

Let's start training by running the command below. task=detect states that we are training a detection model (YOLO also provides classification algorithms). model=yolov8m.pt specifies the medium model; there are several preconfigured YOLOv8 models, nano (n), small (s), medium (m), large (l), and extra large (x), with different parameter and layer settings. The data parameter takes the path to the configuration file. Due to GPU and time constraints in Google Colab, I trained the medium model for 50 epochs with an image size of 224; you can change these values according to your resources. After the 50 epochs complete, we get a path containing two PyTorch files: best.pt (the weights at the checkpoint that achieved the best results) and last.pt (the weights at the last checkpoint). Download and save the best.pt file; we can use it on our local systems for detection. Here is the link to my Colab training file.

!yolo task=detect mode=train model=yolov8m.pt data=/content/runs/data.yaml epochs=50 imgsz=224
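
Once best.pt is downloaded, a minimal sketch of running it locally with the ultralytics Python API (file names are placeholders):

from ultralytics import YOLO

# Load the trained weights locally (path is a placeholder)
model = YOLO("best.pt")

# Run detection on a test image; each result carries boxes, classes, confidences
results = model.predict("test.jpg", imgsz=224)
for box in results[0].boxes:
    print(box.xyxy, box.conf, box.cls)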

Below are the results of the YOLO person-on-motorcycle detection model, illustrating class loss, DFL loss, precision, recall, and mean average precision over the training and validation epochs. At its best-precision checkpoint the model achieved 70.4% precision; since we need a balance between precision and recall, we chose the checkpoint with 47.5% precision and 43% recall. Increasing the dataset size and the number of epochs should yield better results.

Loss, precision, and recall of the trained YOLOv8 person-with-bike model (image by author)

Below are a few test images with the bounding boxes and probabilities drawn by the YOLOv8 model. The model performs better when the bike and rider are clearly visible; as stated earlier, increasing the dataset size with more diverse images taken from various angles and distances should make the model perform even better.

Bounding boxes with probabilities given by YOLOv8 (testing images from Kaggle)

Part-2: Detect whether driver is wearing a helmet or not:

This is the second step in our system. After finding the bounding box of the bike with the person, we crop that region using the bounding box coordinates and pass the cropped image into this step, which identifies whether the person is wearing a helmet; a minimal cropping sketch follows below.
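
Here is a small sketch of the cropping step, assuming a trained weights file from Part-1 (the paths and file names are placeholders):

import cv2
from ultralytics import YOLO

person_bike_model = YOLO("person_bike_best.pt")  # placeholder weights path

frame = cv2.imread("frame.jpg")
results = person_bike_model.predict(frame)

# Crop each detected person-on-bike region using its bounding box coordinates
for box in results[0].boxes:
    x1, y1, x2, y2 = map(int, box.xyxy[0])
    cropped = frame[y1:y2, x1:x2]  # this crop is passed to the helmet model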

The previous model's dataset contains a single class, Person_Bike, but this time we have two classes: With_Helmet and Without_Helmet. Also, unlike the previous model, where we had to download the dataset from Kaggle and sort it into the YOLOv8 format, here we use a Roboflow dataset, which provides an API key and dataset download code. In Roboflow we can select the model we are training (YOLOv5, v7, or v8) and it automatically provides the data in that format. The best part is that we don't even need to download the dataset, upload it to Drive, and connect it to Colab: we just run the download code Roboflow gives us in Colab and the data is loaded automatically. Go to the dataset, click on YOLOv8, then select "show download code" and save the code that appears in the pop-up.

Roboflow dataset download code (image taken in Roboflow)

Install the roboflow module and run the download code. Below is the corresponding code.

!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="xxxxxxxxxxx")
project = rf.workspace("bike-helmets").project("bike-helmet-detection-2vdjo")
dataset = project.version(1).download("yolov8")

Now start the training process by running the command below; this time we don't need to create a data.yaml file. Change the epochs and image size based on your resources and time constraints. I chose 100 epochs with an image size of 224; the training process took me nearly an hour.

!yolo task=detect mode=train model=yolov8m.pt data={dataset.location}/data.yaml epochs=100 imgsz=224 plots=True

As with the previous model, after the 100 epochs complete we get a path to the .pt files. Go to that path and save the best.pt file. Here is the link to my Colab training file.

Below are the results of helmet detection. The model achieved 76.4% precision, 71.4% recall, and a mean average precision of 77.2%. As discussed earlier, increasing the dataset size and the number of epochs should yield even better results.

Loss, precision, and recall of the trained YOLOv8 helmet detection model (image by author)

Below are a few test images with the bounding boxes and probabilities drawn by the YOLOv8 model. The model gives accurate results even when the images are unclear, because this time we trained it on 1.3K+ images for 100 epochs.

Bounding boxes with probabilities given by YOLOv8 (testing images from Roboflow)

Part-3: Detect the license plate and extract the registration number

This is the third step in our system. If step 2 finds a person riding a bike without a helmet, we send that image to this step, where we create an object detection model that can detect and crop the number plate from the image.

Let's begin by getting the download code from Roboflow. As in the previous step, click on YOLOv8 and select "show download code", then load the dataset by running it.

!pip install roboflow

from roboflow import Roboflow
rf = Roboflow(api_key="xxxxxxxxxxx")
project = rf.workspace("augmented-startups").project("vehicle-registration-plates-trudk")
dataset = project.version(1).download("yolov8")

Now start the training process using the command below. I selected the medium model with 100 epochs and an image size of 224.

!yolo task=detect mode=train model=yolov8m.pt data={dataset.location}/data.yaml epochs=100 imgsz=224 plots=True

Here is the link to my training file. Below are the accuracy, recall, and loss details of the trained vehicle plate detection model.

Graphs illustrating accuracy, recall, and loss of the trained model (image by author)

Below are a few test images with bounding boxes drawn around the number plates by the YOLOv8 model. The model performs well and can predict the number plate location even when the images are unclear.

Testing images for vehicle number plate detection (testing images from Roboflow)

Next we crop the number plate using the coordinates given by the YOLOv8 model and pass the cropped image to Tesseract-OCR to extract the number on it. Tesseract-OCR is an open-source optical character recognition engine for extracting text from images and documents, and it is among the most popular libraries for text extraction. To work with Tesseract-OCR in Python, simply install the pytesseract library using pip. One more dependency: it needs the Tesseract executable; go to this link and download that file. Below is sample code for extracting text from an image.

import pytesseract
from PIL import Image

# Point pytesseract at the Tesseract executable (Windows example)
pytesseract.pytesseract.tesseract_cmd = "Path to tesseract.exe"

# Open the cropped number plate image and extract its text
image = Image.open("Image Path")
print(pytesseract.image_to_string(image))
print("Text extraction completed!")

Part-4: Store the violation details in a database:

After passing the cropped license plate image to Tesseract-OCR and extracting the text, it is important to store these details in a database so that fine receipts can be sent to the corresponding people. For this purpose we use the sqlite3 module in Python, which provides a simple, easy-to-use on-disk database. First we create a table with two columns (vehicle number and image of the driver), then we write code to insert records into the table. Storing images directly in SQLite is not recommended, as it can hurt application performance, so instead of storing images directly we upload them to a server and store the path here. Below is the code implementing this functionality.

import sqlite3

# Function to create the table if it doesn't exist
def create_table():
    # Connect to the SQLite database
    conn = sqlite3.connect("vehicle_data.db")
    # Create a cursor object to execute SQL statements
    c = conn.cursor()
    # Create the "vehicles" table with two columns: "vehicle_number" and "bike_image_path"
    # The "IF NOT EXISTS" clause ensures that the table is only created if it doesn't already exist
    c.execute("CREATE TABLE IF NOT EXISTS vehicles (vehicle_number TEXT, bike_image_path TEXT)")
    # Commit the changes to the database
    conn.commit()
    # Close the database connection
    conn.close()

# Function to insert a record into the "vehicles" table
def insert_record(vehicle_number, bike_image_path):
    # Connect to the SQLite database
    conn = sqlite3.connect("vehicle_data.db")
    # Create a cursor object to execute SQL statements
    c = conn.cursor()
    # Insert the record into the "vehicles" table using parameterized SQL statement
    c.execute("INSERT INTO vehicles VALUES (?, ?)", (vehicle_number, bike_image_path))
    # Commit the changes to the database
    conn.commit()
    # Close the database connection
    conn.close()

# Example usage
# Create the "vehicles" table if it doesn't exist
create_table()
# Insert two records into the "vehicles" table
insert_record('ABC123', '/path/to/bike1.jpg')
insert_record('XYZ789', '/path/to/bike2.jpg')
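
To read the stored violations back later, for example when generating challans, a short sketch:

import sqlite3

# Read back all stored violation records
conn = sqlite3.connect("vehicle_data.db")
for vehicle_number, bike_image_path in conn.execute("SELECT * FROM vehicles"):
    print(vehicle_number, bike_image_path)
conn.close()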

Integration and Final Script:

All four parts above are combined to create our final system. We end up with three trained YOLOv8 weights files. The first model returns a single class (Person_Bike in my case) and its coordinates; we crop the person-with-bike region using those coordinates and send it to the second model. The second model returns one of two classes (with helmet and without helmet); if it is without helmet, we send the crop to the third model. The third model detects the number plate, which we crop using its bounding box coordinates and send to Tesseract-OCR for text extraction. Finally, we save the extracted text and the path to the image of the person on the bike in the database using sqlite3. To make our code work on a real-time video stream, we use OpenCV's cv2.VideoCapture() and send one frame at a time through the system.

import cv2

# Set up video capture (0 = default camera, or a video file path / camera index)
video_capture = cv2.VideoCapture(0)

while True:
    # Capture frame-by-frame
    ret, frame = video_capture.read()
    if not ret:  # end of stream or camera error
        break
    # OpenCV captures BGR frames; YOLOv8 and PIL expect RGB
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # ... run the three models on img here (see the sketch below) ...

video_capture.release()
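
Putting everything together, here is a hedged sketch of one frame's journey through the pipeline. The weights paths are placeholders, the helmet class name should be adjusted to your dataset's actual labels, and insert_record comes from Part-4; the full script is in the repository linked below.

import pytesseract
from PIL import Image
from ultralytics import YOLO

# Placeholder paths to the three sets of trained weights
person_bike_model = YOLO("person_bike_best.pt")
helmet_model = YOLO("helmet_best.pt")
plate_model = YOLO("plate_best.pt")

def process_frame(frame_rgb, frame_path):
    """Run one RGB frame through the full violation pipeline.
    frame_path assumes the frame was already saved to disk for the database."""
    for box in person_bike_model.predict(frame_rgb)[0].boxes:
        x1, y1, x2, y2 = map(int, box.xyxy[0])
        rider = frame_rgb[y1:y2, x1:x2]

        helmet_result = helmet_model.predict(rider)[0]
        for hbox in helmet_result.boxes:
            # Class names come from the helmet dataset; adjust if yours differ
            if helmet_result.names[int(hbox.cls)] != "Without_Helmet":
                continue
            # Violation found: locate and crop the number plate, then OCR it
            for pbox in plate_model.predict(rider)[0].boxes:
                px1, py1, px2, py2 = map(int, pbox.xyxy[0])
                plate = rider[py1:py2, px1:px2]
                text = pytesseract.image_to_string(Image.fromarray(plate)).strip()
                insert_record(text, frame_path)  # insert_record from Part-4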

Below is the link to the final script and training files.

GitHub: VinayEdula/Real-Time-Helmet-Detection-and-License-Number-Extraction-for-Traffic-Rule-Enforcement (github.com)

Final Results:

The images below are taken from Google. Each image is sent through the three object detection models (person-bike, helmet, license plate). The models successfully drew bounding boxes around nearly the correct regions of interest; extracting accurate text from the number plate depends on the clarity of the captured image.

Testing image-1 (image source: semanticscholar)
Testing image-2 (image source: DC)
Testing image-3 (image source: betterindia.com)
Testing image-4 (image source: knocksense)
Testing image-5 (image source: thenewsminute)
Testing image-6 (image source: deccanherald)

You can view the SQLite database using a tool called DB Browser for SQLite, which you can download from here. Below is my database screenshot after testing. An empty string means Tesseract-OCR was unable to extract text from that number plate image.

Database image (image by author)
