Introduction
Tracking people across video frames and analyzing their behavior (such as dwell time) is a crucial task for many real-world applications: retail analytics, security surveillance, smart cities, and more.
In this article, we will build a full command-line interface (CLI) application that:
- Detects people in a video using YOLOv8.
- Tracks each detected person across frames using DeepSORT.
- Calculates each person’s dwell time (time spent visible).
- Generates an annotated output video with:
- Bounding boxes
- Unique IDs
- Live people count
We’ll optimize for speed and simplicity, while maintaining accuracy and extensibility.
Let’s dive deep.
Git Repo: faheemkhaskheli9/PeopleTracking
Project Setup
First, install the following dependencies:

pip install ultralytics deep_sort_realtime opencv-python

Libraries used:
- ultralytics: Official YOLOv8 implementation.
- deep_sort_realtime: Real-time DeepSORT tracking.
- opencv-python: For reading, writing, and processing videos.
Step-by-Step Code Breakdown
Here’s the full explanation of the project code:
1. CLI Argument Parsing
We start by creating a flexible CLI interface:
import argparse
parser = argparse.ArgumentParser(description="Person dwell time tracker with output video")
parser.add_argument("--video", required=True, help="Path to input video file")
parser.add_argument("--conf", type=float, default=0.3, help="YOLO confidence threshold")
parser.add_argument("--max-age", type=int, default=5, help="DeepSORT max_age")
args = parser.parse_args()

This allows users to pass:
- --video: Input video file.
- --conf: Detection confidence threshold (default 0.3).
- --max-age: DeepSORT's tracker age-out time in frames (default 5).
✅ Good CLI = easy to reuse and configure later.
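A quick sanity check: argparse accepts an explicit argument list, which is handy for verifying defaults and flag names without a real command line (note that --max-age becomes args.max_age; mall.mp4 is just a made-up filename):

```python
import argparse

# Mirror of the parser defined above, exercised with an explicit argv list.
parser = argparse.ArgumentParser(description="Person dwell time tracker with output video")
parser.add_argument("--video", required=True, help="Path to input video file")
parser.add_argument("--conf", type=float, default=0.3, help="YOLO confidence threshold")
parser.add_argument("--max-age", type=int, default=5, help="DeepSORT max_age")

args = parser.parse_args(["--video", "mall.mp4", "--conf", "0.5"])
print(args.video, args.conf, args.max_age)  # mall.mp4 0.5 5
```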
2. Load YOLOv8 and DeepSORT
from ultralytics import YOLO
from deep_sort_realtime.deepsort_tracker import DeepSort
model = YOLO("yolov8n.pt")
# Note: in the ultralytics API, class filtering is passed per call
# (model(frame, classes=[0])) rather than set as a model attribute.
tracker = DeepSort(max_age=args.max_age, n_init=3, max_iou_distance=0.7)

- We load YOLOv8n (the Nano model), the fastest variant for real-time use.
- We configure the DeepSORT tracker for robust ID assignment.
✅ YOLO = detection; DeepSORT = tracking and re-identification.
3. Prepare Input and Output Videos
import cv2
import os
cap = cv2.VideoCapture(args.video)
fps = cap.get(cv2.CAP_PROP_FPS)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
out_path = os.path.splitext(args.video)[0] + "_output.mp4"
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out_vid = cv2.VideoWriter(out_path, fourcc, fps, (width, height))

- Open the input video.
- Read its properties (FPS, width, height).
- Set up an output video writer to save the processed frames.
✅ Output video will show real-time detection and tracking results.
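One caveat: some containers and codecs make cap.get(cv2.CAP_PROP_FPS) return 0, which would later break the frames-to-seconds conversion (division by zero). A small guard helps; safe_fps below is our own sketch, not an OpenCV API:

```python
def safe_fps(reported_fps, default=30.0):
    """Return a usable FPS value.

    Some containers report 0 (or a negative value) for FPS; in that case
    fall back to a sensible default so the later frame-to-seconds
    conversion never divides by zero.
    """
    if reported_fps and reported_fps > 0:
        return reported_fps
    return default

# Usage with the capture opened above:
# fps = safe_fps(cap.get(cv2.CAP_PROP_FPS))
```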
4. Frame-by-Frame Processing
We loop over every frame:
dwell_frames = {}
frame_idx = 0
while True:
    ret, frame = cap.read()
    if not ret:
        break
    frame_idx += 1

We maintain a dictionary dwell_frames to count how many frames each ID is visible.
5. Detect People
results = model(frame, classes=[0])[0]  # detect only the 'person' class
detections = []
for det in results.boxes.data.tolist():
    x1, y1, x2, y2, conf, cls = det
    if conf < args.conf:
        continue
    xmin, ymin = int(x1), int(y1)
    w, h = int(x2 - x1), int(y2 - y1)
    detections.append(([xmin, ymin, w, h], float(conf), int(cls)))

- Run YOLO on the frame, restricted to the 'person' class.
- Extract bounding boxes.
- Filter detections by confidence.
- Format each detection for DeepSORT.
✅ Only confident person detections are passed to the tracker.
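The per-detection conversion is easy to get wrong (DeepSORT expects [left, top, width, height], not [x1, y1, x2, y2]), so it can help to isolate it in a small pure function. to_deepsort_detection is our own helper sketch, not part of either library:

```python
def to_deepsort_detection(det, conf_thresh):
    """Convert one YOLO result row (x1, y1, x2, y2, conf, cls) into
    DeepSORT's ([left, top, width, height], confidence, class) tuple,
    or None when the detection falls below the confidence threshold."""
    x1, y1, x2, y2, conf, cls = det
    if conf < conf_thresh:
        return None
    return ([int(x1), int(y1), int(x2 - x1), int(y2 - y1)], float(conf), int(cls))

print(to_deepsort_detection([10, 20, 110, 220, 0.9, 0], 0.3))
# ([10, 20, 100, 200], 0.9, 0)
```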
6. Track People Across Frames
tracks = tracker.update_tracks(detections, frame=frame)

- DeepSORT assigns a persistent ID to each detected person.
- Even if someone is temporarily occluded, DeepSORT tries to maintain their ID.
✅ Smooth tracking without frequent ID-switching.
7. Draw Bounding Boxes and Calculate Dwell Time
live_ids = []
for track in tracks:
    if not track.is_confirmed():
        continue
    track_id = track.track_id
    bbox = track.to_ltrb()
    x1, y1, x2, y2 = map(int, bbox)
    dwell_frames[track_id] = dwell_frames.get(track_id, 0) + 1
    live_ids.append(track_id)
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, f"ID {track_id}", (x1, y1 - 10),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)

For each confirmed track:
- Draw a green bounding box.
- Label it with its track ID.
- Update the frame count for that ID.
✅ Each person gets their own persistent ID across frames.
8. Show Live People Count
live_people = len(set(live_ids))
cv2.putText(frame, f"Live People: {live_people}", (20, 40),
            cv2.FONT_HERSHEY_SIMPLEX, 1.0, (255, 0, 0), 3)

We count the unique IDs visible in the current frame and draw that number on the frame.
✅ Live people counter shows how many people are actively visible.
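Note that len(set(live_ids)) counts only the people visible in the current frame. Because dwell_frames keeps one entry per ID ever confirmed, it also gives the cumulative number of distinct people seen so far, should you want both numbers (a sketch with made-up state):

```python
# Example mid-video state: dwell_frames maps track ID -> visible-frame count.
dwell_frames = {1: 120, 2: 45, 7: 300}
live_ids = [2, 7]                     # IDs confirmed in the current frame

live_people = len(set(live_ids))      # people visible right now
total_people = len(dwell_frames)      # distinct people seen so far
print(live_people, total_people)      # 2 3
```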
9. Save Annotated Frame
out_vid.write(frame)

Write the processed frame to the output video file.
✅ Output video = proof of working detection, tracking, analytics.
10. Cleanup
cap.release()
out_vid.release()

Always release video objects to avoid resource leaks.
11. Print Final Dwell Time Stats
print("\n--- Dwell Times ---")
for track_id, frames in dwell_frames.items():
    dwell_seconds = frames / fps
    print(f"ID {track_id}: {dwell_seconds:.2f} seconds")

- After processing, we calculate how many seconds each ID spent in view.
- The frames-to-seconds conversion uses the video's FPS.
✅ Each person’s dwell time is printed cleanly.
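If you want the stats in machine-readable form as well as the console printout, the same frames-to-seconds conversion can drive a CSV export. export_dwell_times is a hypothetical helper, not part of the project:

```python
import csv

def export_dwell_times(dwell_frames, fps, path):
    """Write one (track_id, dwell_seconds) row per tracked person.

    `dwell_frames` maps track IDs to counts of frames in which the ID
    was visible; seconds = frames / fps, rounded to two decimals.
    """
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["track_id", "dwell_seconds"])
        for track_id, frames in sorted(dwell_frames.items()):
            writer.writerow([track_id, round(frames / fps, 2)])

# Example: export_dwell_times(dwell_frames, fps, "dwell_times.csv")
```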
Example CLI Usage
python dwell_time_tracker_with_video.py --video my_video.mp4

Outputs:
- my_video_output.mp4 with bounding boxes, IDs, and the live people count.
- A console printout of dwell time per person.