Building a CLI-Based People Tracking and Dwell Time Analytics System Using YOLOv8 and DeepSORT

Introduction

Tracking people across video frames and analyzing their behavior (like dwell time) is a crucial task for many real-world applications: retail analytics, security surveillance, smart cities, etc.

In this article, we will build a full command-line interface (CLI) application that:

  • Detects people in a video using YOLOv8.
  • Tracks each detected person across frames using DeepSORT.
  • Calculates each person’s dwell time (time spent visible).
  • Generates an annotated output video with:
  • Bounding boxes
  • Unique IDs
  • Live people count

We’ll optimize for speed and simplicity, while maintaining accuracy and extensibility.

Let’s dive deep.

Git Repo: faheemkhaskheli9/PeopleTracking

Project Setup

First, install the following dependencies:

pip install ultralytics deep_sort_realtime opencv-python

Libraries used:

  • ultralytics: Official YOLOv8 implementation.
  • deep_sort_realtime: Real-time DeepSORT tracking.
  • opencv-python: For reading, writing, and processing videos.

Step-by-Step Code Breakdown

Here’s the full explanation of the project code:

1. CLI Argument Parsing

We start by creating a flexible CLI interface:

import argparse

parser = argparse.ArgumentParser(description="Person dwell time tracker with output video")
parser.add_argument("--video", required=True, help="Path to input video file")
parser.add_argument("--conf", type=float, default=0.3, help="YOLO confidence threshold")
parser.add_argument("--max-age", type=int, default=5, help="DeepSORT max_age")
args = parser.parse_args()

This allows users to pass:

  • --video: Input video file.
  • --conf: Detection confidence threshold (default 0.3).
  • --max-age: Number of frames DeepSORT keeps a lost track alive before deleting it (default 5).

✅ Good CLI = easy to reuse and configure later.
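
As a quick sanity check, the parser above can be exercised with a hand-built argument list instead of `sys.argv` (the file name `clip.mp4` is just a placeholder):

```python
import argparse

parser = argparse.ArgumentParser(description="Person dwell time tracker with output video")
parser.add_argument("--video", required=True, help="Path to input video file")
parser.add_argument("--conf", type=float, default=0.3, help="YOLO confidence threshold")
parser.add_argument("--max-age", type=int, default=5, help="DeepSORT max_age")

# Parse a sample argv to confirm the defaults kick in when flags are omitted.
args = parser.parse_args(["--video", "clip.mp4"])
print(args.video, args.conf, args.max_age)  # clip.mp4 0.3 5
```

Note that argparse converts the `--max-age` flag into the attribute `args.max_age` automatically.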

2. Load YOLOv8 and DeepSORT

from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

model = YOLO("yolov8n.pt")

# Note: restrict detection to the 'person' class (COCO class 0) by passing
# classes=[0] at inference time; assigning model.classes has no effect in
# the ultralytics YOLOv8 API.

tracker = DeepSort(max_age=args.max_age, n_init=3, max_iou_distance=0.7)
  • We load YOLOv8n (Nano model) — fastest for real-time.
  • We configure DeepSORT tracker for robust ID assignment.

✅ YOLO = detection; DeepSORT = tracking and re-identification.

3. Prepare Input and Output Videos

import cv2
import os

cap = cv2.VideoCapture(args.video)
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

out_path = os.path.splitext(args.video)[0] + "_output.mp4"
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out_vid = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
  • Open the input video.
  • Read its properties (FPS, width, height).
  • Set up an output video writer to save the processed frames.

✅ Output video will show real-time detection and tracking results.
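
The output path derivation is simple string surgery on the input name; a minimal check, using a hypothetical file name:

```python
import os

video = "my_video.mp4"

# splitext splits off the extension, so we can append a suffix cleanly.
out_path = os.path.splitext(video)[0] + "_output.mp4"
print(out_path)  # my_video_output.mp4
```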

4. Frame-by-Frame Processing

We loop over every frame:

dwell_frames = {}
frame_idx = 0

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame_idx += 1

We maintain a dictionary dwell_frames to count how many frames each ID is visible.
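
The counting pattern used later is the classic `dict.get` accumulator; a standalone sketch with made-up track IDs:

```python
dwell_frames = {}

# Simulate seeing track ID 1 in three frames and ID 2 in one frame.
for track_id in [1, 2, 1, 1]:
    dwell_frames[track_id] = dwell_frames.get(track_id, 0) + 1

print(dwell_frames)  # {1: 3, 2: 1}
```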

5. Detect People

results = model(frame, classes=[0])[0]  # detect only the 'person' class
detections = []

for det in results.boxes.data.tolist():
    x1, y1, x2, y2, conf, cls = det
    if conf < args.conf:
        continue
    xmin, ymin = int(x1), int(y1)
    w, h = int(x2 - x1), int(y2 - y1)
    detections.append(([xmin, ymin, w, h], float(conf), int(cls)))
  • Run YOLO on the frame.
  • Extract bounding boxes.
  • Filter detections based on confidence.
  • Format detection for DeepSORT.

✅ Only confident person detections are passed to the tracker.
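
The format change above is worth isolating: YOLO emits corner coordinates (x1, y1, x2, y2), while deep_sort_realtime expects [left, top, width, height]. A small helper (the function name is ours, not part of either library) makes the conversion explicit:

```python
def xyxy_to_ltwh(x1, y1, x2, y2):
    """Convert a corner-format box to [left, top, width, height]."""
    return [int(x1), int(y1), int(x2 - x1), int(y2 - y1)]

print(xyxy_to_ltwh(100.0, 50.0, 180.0, 250.0))  # [100, 50, 80, 200]
```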

6. Track People Across Frames

tracks = tracker.update_tracks(detections, frame=frame)
  • DeepSORT assigns a persistent ID to each detected person.
  • Even if someone is temporarily occluded, DeepSORT tries to maintain their ID.

✅ Smooth tracking without frequent ID-switching.

7. Draw Bounding Boxes and Calculate Dwell Time

live_ids = []
for track in tracks:
if not track.is_confirmed():
continue
track_id = track.track_id
bbox = track.to_ltrb()
x1, y1, x2, y2 = map(int, bbox)

dwell_frames[track_id] = dwell_frames.get(track_id, 0) + 1
live_ids.append(track_id)

cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(frame, f"ID {track_id}", (x1, y1 - 10),
cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

For each confirmed track:

  • Draw green bounding box.
  • Label with ID.
  • Update the frame count for that ID.

✅ Each person gets their own persistent ID across frames.

8. Show Live People Count

live_people = len(set(live_ids))
cv2.putText(frame, f"Live People: {live_people}", (20, 40),
            cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 0, 0), 3)

We count unique IDs detected in the current frame and show it.

✅ Live people counter shows how many people are actively visible.
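
Using `set` here is defensive deduplication: confirmed tracks should already carry unique IDs within a frame, but the set makes the count robust either way. A tiny sketch with hypothetical IDs:

```python
live_ids = [4, 7, 4]  # IDs collected in the current frame (4 appears twice)
live_people = len(set(live_ids))
print(live_people)  # 2
```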

9. Save Annotated Frame

out_vid.write(frame)

Write the processed frame to the output video file.

✅ Output video = proof of working detection, tracking, analytics.

10. Cleanup

cap.release()
out_vid.release()

Always release video objects to avoid resource leaks.

11. Print Final Dwell Time Stats

print("\n--- Dwell Times ---")
for track_id, frames in dwell_frames.items():
    dwell_seconds = frames / fps
    print(f"ID {track_id}: {dwell_seconds:.2f} seconds")
  • After processing, we calculate how many seconds each ID spent in view.
  • Frames → Seconds conversion using video’s FPS.

✅ Each person’s dwell time is printed cleanly.
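
The conversion is simple arithmetic; a worked example with assumed numbers (a 30 FPS video and a track visible for 75 frames):

```python
fps = 30.0
frames_visible = 75

# Dwell time in seconds = frame count divided by frames per second.
dwell_seconds = frames_visible / fps
print(f"{dwell_seconds:.2f} seconds")  # 2.50 seconds
```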

Example CLI Usage

python dwell_time_tracker_with_video.py --video my_video.mp4

Outputs:

  • my_video_output.mp4 with boxes, IDs, people count.
  • Console printout of dwell time per person.
