Building a CLI-Based People Tracking and Dwell Time Analytics System Using YOLOv8 and DeepSORT

Introduction

Tracking people across video frames and analyzing their behavior (like dwell time) is a crucial task for many real-world applications: retail analytics, security surveillance, smart cities, etc.

In this article, we will build a full command-line interface (CLI) application that:

  • Detects people in a video using YOLOv8.
  • Tracks each detected person across frames using DeepSORT.
  • Calculates each person’s dwell time (time spent visible).
  • Generates an annotated output video with:
  • Bounding boxes
  • Unique IDs
  • Live people count

We’ll optimize for speed and simplicity, while maintaining accuracy and extensibility.

Let’s dive deep.

Git Repo: faheemkhaskheli9/PeopleTracking

Project Setup

First, install the following dependencies:

pip install ultralytics deep_sort_realtime opencv-python

Libraries used:

  • ultralytics: Official YOLOv8 implementation.
  • deep_sort_realtime: Real-time DeepSORT tracking.
  • opencv-python: For reading, writing, and processing videos.

Step-by-Step Code Breakdown

Here’s the full explanation of the project code:

1. CLI Argument Parsing

We start by creating a flexible CLI interface:

import argparse

parser = argparse.ArgumentParser(description="Person dwell time tracker with output video")
parser.add_argument("--video", required=True, help="Path to input video file")
parser.add_argument("--conf", type=float, default=0.3, help="YOLO confidence threshold")
parser.add_argument("--max-age", type=int, default=5, help="DeepSORT max_age")
args = parser.parse_args()

This allows users to pass:

  • --video: Input video file.
  • --conf: Detection confidence threshold (default 0.3).
  • --max-age: Number of frames DeepSORT keeps a lost track alive before deleting it (default 5).

✅ Good CLI = easy to reuse and configure later.
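
As a quick sanity check, the parser above can be exercised with a hand-built argument list instead of `sys.argv` (the file name `clip.mp4` is just a placeholder):

```python
import argparse

parser = argparse.ArgumentParser(description="Person dwell time tracker with output video")
parser.add_argument("--video", required=True, help="Path to input video file")
parser.add_argument("--conf", type=float, default=0.3, help="YOLO confidence threshold")
parser.add_argument("--max-age", type=int, default=5, help="DeepSORT max_age")

# Parse a sample argv to confirm the defaults kick in when flags are omitted.
args = parser.parse_args(["--video", "clip.mp4"])
print(args.video, args.conf, args.max_age)  # clip.mp4 0.3 5
```

Note that argparse converts the `--max-age` flag into the attribute `args.max_age` automatically.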

2. Load YOLOv8 and DeepSORT

from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort

model = YOLO("yolov8n.pt")

# Note: restrict detection to the 'person' class (COCO class 0) by passing
# classes=[0] at inference time; assigning model.classes has no effect in
# the ultralytics YOLOv8 API.

tracker = DeepSort(max_age=args.max_age, n_init=3, max_iou_distance=0.7)
  • We load YOLOv8n (Nano model) — fastest for real-time.
  • We configure DeepSORT tracker for robust ID assignment.

✅ YOLO = detection; DeepSORT = tracking and re-identification.

3. Prepare Input and Output Videos

import cv2
import os

cap = cv2.VideoCapture(args.video)
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))

out_path = os.path.splitext(args.video)[0] + "_output.mp4"
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out_vid = cv2.VideoWriter(out_path, fourcc, fps, (width, height))
  • Open the input video.
  • Read its properties (FPS, width, height).
  • Set up an output video writer to save the processed frames.

✅ Output video will show real-time detection and tracking results.
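
The output path derivation is simple string surgery on the input name; a minimal check, using a hypothetical file name:

```python
import os

video = "my_video.mp4"

# splitext splits off the extension, so we can append a suffix cleanly.
out_path = os.path.splitext(video)[0] + "_output.mp4"
print(out_path)  # my_video_output.mp4
```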

4. Frame-by-Frame Processing

We loop over every frame:

dwell_frames = {}
frame_idx = 0

while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame_idx += 1

We maintain a dictionary dwell_frames to count how many frames each ID is visible.
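
The counting pattern used later is the classic `dict.get` accumulator; a standalone sketch with made-up track IDs:

```python
dwell_frames = {}

# Simulate seeing track ID 1 in three frames and ID 2 in one frame.
for track_id in [1, 2, 1, 1]:
    dwell_frames[track_id] = dwell_frames.get(track_id, 0) + 1

print(dwell_frames)  # {1: 3, 2: 1}
```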

5. Detect People

results = model(frame, classes=[0])[0]  # detect only the 'person' class
detections = []

for det in results.boxes.data.tolist():
    x1, y1, x2, y2, conf, cls = det
    if conf < args.conf:
        continue
    xmin, ymin = int(x1), int(y1)
    w, h = int(x2 - x1), int(y2 - y1)
    detections.append(([xmin, ymin, w, h], float(conf), int(cls)))
  • Run YOLO on the frame.
  • Extract bounding boxes.
  • Filter detections based on confidence.
  • Format detection for DeepSORT.

✅ Only confident person detections are passed to the tracker.
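
The format change above is worth isolating: YOLO emits corner coordinates (x1, y1, x2, y2), while deep_sort_realtime expects [left, top, width, height]. A small helper (the function name is ours, not part of either library) makes the conversion explicit:

```python
def xyxy_to_ltwh(x1, y1, x2, y2):
    """Convert a corner-format box to [left, top, width, height]."""
    return [int(x1), int(y1), int(x2 - x1), int(y2 - y1)]

print(xyxy_to_ltwh(100.0, 50.0, 180.0, 250.0))  # [100, 50, 80, 200]
```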

6. Track People Across Frames

tracks = tracker.update_tracks(detections, frame=frame)
  • DeepSORT assigns a persistent ID to each detected person.
  • Even if someone is temporarily occluded, DeepSORT tries to maintain their ID.

✅ Smooth tracking without frequent ID-switching.

7. Draw Bounding Boxes and Calculate Dwell Time

live_ids = []
for track in tracks:
if not track.is_confirmed():
continue
track_id = track.track_id
bbox = track.to_ltrb()
x1, y1, x2, y2 = map(int, bbox)

dwell_frames[track_id] = dwell_frames.get(track_id, 0) + 1
live_ids.append(track_id)

cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
cv2.putText(frame, f"ID {track_id}", (x1, y1 - 10),
cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

For each confirmed track:

  • Draw green bounding box.
  • Label with ID.
  • Update the frame count for that ID.

✅ Each person gets their own persistent ID across frames.

8. Show Live People Count

live_people = len(set(live_ids))
cv2.putText(frame, f"Live People: {live_people}", (20, 40),
            cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 0, 0), 3)

We count unique IDs detected in the current frame and show it.

✅ Live people counter shows how many people are actively visible.
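
Using `set` here is defensive deduplication: confirmed tracks should already carry unique IDs within a frame, but the set makes the count robust either way. A tiny sketch with hypothetical IDs:

```python
live_ids = [4, 7, 4]  # IDs collected in the current frame (4 appears twice)
live_people = len(set(live_ids))
print(live_people)  # 2
```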

9. Save Annotated Frame

out_vid.write(frame)

Write the processed frame to the output video file.

✅ Output video = proof of working detection, tracking, analytics.

10. Cleanup

cap.release()
out_vid.release()

Always release video objects to avoid resource leaks.

11. Print Final Dwell Time Stats

print("\n--- Dwell Times ---")
for track_id, frames in dwell_frames.items():
    dwell_seconds = frames / fps
    print(f"ID {track_id}: {dwell_seconds:.2f} seconds")
  • After processing, we calculate how many seconds each ID spent in view.
  • Frames → Seconds conversion using video’s FPS.

✅ Each person’s dwell time is printed cleanly.
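
The conversion is simple arithmetic; a worked example with assumed numbers (a 30 FPS video and a track visible for 75 frames):

```python
fps = 30.0
frames_visible = 75

# Dwell time in seconds = frame count divided by frames per second.
dwell_seconds = frames_visible / fps
print(f"{dwell_seconds:.2f} seconds")  # 2.50 seconds
```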

Example CLI Usage

python dwell_time_tracker_with_video.py --video my_video.mp4

Outputs:

  • my_video_output.mp4 with boxes, IDs, people count.
  • Console printout of dwell time per person.
