Human Following Robot

Integrating YOLO and SLAM with a Clearpath Robotics Jackal to locate and track a specific person while listening to gesture-based commands.

Summary

Implemented autonomous human-following on a Clearpath Jackal mobile robot. The system uses ROS 2 with a full Nav2 stack and YOLO pose detection to dynamically track, pursue, and navigate toward a designated person in real time, without relying on wearable sensors or tags.

ROS 2 · Python · YOLO26 · Nav2 · SLAM Toolbox · Clearpath Jackal · Intel RealSense · Velodyne LiDAR

System Architecture

Vision-Based Tracking

  • Deploys YOLO to identify the target person and continuously orients the robot to keep them centered within the camera frame.
  • Streams aligned RGB and depth frames from an Intel RealSense D435i to estimate the robot's distance to the person.
Robot point of view while tracking.
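The centering-plus-range logic described above can be sketched as pure geometry: the bounding-box center is converted to a bearing via the camera's horizontal field of view, and a proportional yaw command rotates the robot to cancel the offset. The frame width, FOV, and gain below are illustrative assumptions, not the project's calibrated values.

```python
import math

# Assumed D435i-like RGB stream parameters (illustrative, not calibrated).
IMAGE_WIDTH = 640                     # px
HORIZONTAL_FOV = math.radians(69.0)   # approximate D435i RGB horizontal FOV

def bearing_to_target(bbox_center_x: float) -> float:
    """Angle (rad) from the optical axis to the bbox center; positive = target to the right."""
    offset = bbox_center_x - IMAGE_WIDTH / 2
    return offset / IMAGE_WIDTH * HORIZONTAL_FOV

def angular_command(bearing: float, gain: float = 1.5) -> float:
    """Proportional yaw-rate command (rad/s) that turns toward the target.

    Negative sign: in ROS convention positive yaw is counter-clockwise (left),
    so a target on the right (positive bearing) needs a clockwise turn.
    """
    return -gain * bearing

# A target centered in the frame needs no correction.
print(angular_command(bearing_to_target(IMAGE_WIDTH / 2)))  # 0.0
```

In the real node, the depth value sampled at the bounding-box center would supply the range term that drives the forward velocity.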

Gesture Control

  • Recognizes and responds to visual gesture commands, allowing the user to initiate or halt the following behavior hands-free.
  • Maps individual joint keypoints to specific control commands through YOLO pose detection, trained on the Hand Keypoints dataset.
Initiate follow.
Halt command.
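A mapping from detected hand keypoints to commands might look like the following heuristic sketch. It assumes the 21-point hand layout of the Hand Keypoints dataset (wrist at index 0, fingertips at 4, 8, 12, 16, 20); the extension threshold and the open-palm/fist gesture assignments are hypothetical stand-ins for the trained model's classes.

```python
import math

FINGERTIPS = (8, 12, 16, 20)  # index..pinky tips (thumb excluded)
KNUCKLES = (5, 9, 13, 17)     # matching MCP joints

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def classify_gesture(kpts):
    """kpts: list of 21 (x, y) hand keypoints. Returns 'follow', 'halt', or None.

    A finger counts as extended if its tip is clearly farther from the wrist
    than its knuckle is (threshold 1.3 is illustrative).
    """
    wrist = kpts[0]
    extended = sum(
        dist(kpts[tip], wrist) > 1.3 * dist(kpts[knuckle], wrist)
        for tip, knuckle in zip(FINGERTIPS, KNUCKLES)
    )
    if extended >= 3:
        return "follow"  # open palm -> initiate following
    if extended == 0:
        return "halt"    # closed fist -> stop
    return None          # ambiguous pose -> ignore
```

In practice the pose model outputs the keypoints directly, and debouncing over several frames would prevent a single noisy detection from toggling the behavior.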

Autonomous Navigation & Obstacle Avoidance

  • Safely maneuvers through environments, calculating dynamic paths to the target while preventing collisions.
  • Remains approximately 1 meter away from the person to give them space and keep them in the camera frame.
  • Integrates the Nav2 stack, SLAM Toolbox, and a Velodyne LiDAR sensor to achieve reliable spatial awareness.
Robot planning and navigation.
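The ~1 m standoff can be expressed as a small geometry helper: given the person's position in the robot's base frame (from the bearing and depth estimates), compute a goal pose one meter short of them, oriented to face them. The function name and the hold-in-place behavior inside the standoff radius are assumptions for illustration.

```python
import math

STANDOFF = 1.0  # meters kept between robot and person

def standoff_goal(person_x: float, person_y: float):
    """Return (goal_x, goal_y, goal_yaw) one STANDOFF short of the person,
    expressed in the robot's base frame."""
    rng = math.hypot(person_x, person_y)
    yaw = math.atan2(person_y, person_x)  # face the person
    if rng <= STANDOFF:
        # Already inside the standoff radius: hold position, just face them.
        return 0.0, 0.0, yaw
    scale = (rng - STANDOFF) / rng        # shrink the vector by one meter
    return person_x * scale, person_y * scale, yaw
```

In the full system this pose would be stamped into the map frame and handed to Nav2 (for example as a NavigateToPose goal), letting the planner handle obstacle-aware pathing to the standoff point.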

Search & Reacquisition

  • If line of sight is broken, the system navigates to the target's last known position and autonomously explores to reacquire visual contact.
  • Initiates a rotational search pattern in the direction the person exited the frame if they are no longer detected at their last known coordinates.
Autonomous search pattern.
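The rotational search direction can be sketched as a one-line decision on the last observed bounding-box position: spin toward the side of the frame the person exited from. The frame width and yaw rate below are illustrative assumptions.

```python
IMAGE_WIDTH = 640       # px, assumed camera stream width
SEARCH_YAW_RATE = 0.5   # rad/s, assumed search spin speed

def search_yaw_rate(last_bbox_center_x: float) -> float:
    """Yaw rate for the rotational search. Image x grows rightward and
    positive yaw is counter-clockwise (left) in ROS convention, so an exit
    on the left half of the frame means spin left (positive)."""
    exited_left = last_bbox_center_x < IMAGE_WIDTH / 2
    return SEARCH_YAW_RATE if exited_left else -SEARCH_YAW_RATE
```

In the real node this value would be published as the angular component of a Twist command until the detector reports the person again.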

Discussion

The tracking performed reliably, especially considering that all processing ran onboard a 4th-generation Intel Core i5 CPU. The project demonstrated the feasibility of dynamic, markerless human tracking using only onboard vision and SLAM. Future iterations would benefit from upgraded compute hardware to allow higher-frequency pose processing and more aggressive dynamic tracking.

View Code on GitHub