GVRSF 2026 · Swarm Robotics · Physical AI

Artificial collective intelligence built from local signals, digital pheromones, and real robots.

Inspired by carpenter ants, this project explores how decentralized agents can learn to coordinate through shared environmental memory rather than centralized control. It spans theory, simulator design, reinforcement learning, Mission Control infrastructure, and a physical robotic platform.

23D
Per-frame agent observation

Structured local state, stacked through time.
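One way to picture "stacked through time" is a fixed-size buffer of recent 23-D frames concatenated into the policy input. This is a minimal sketch; the stack depth of 4 and the use of simple concatenation are illustrative assumptions, not the project's confirmed configuration.

```python
from collections import deque

import numpy as np

OBS_DIM = 23  # per-frame observation size, as stated above
STACK = 4     # stack depth is an illustrative assumption


class ObservationStack:
    """Keeps the most recent STACK frames of a 23-D local observation."""

    def __init__(self):
        zero = np.zeros(OBS_DIM, dtype=np.float32)
        self.frames = deque([zero] * STACK, maxlen=STACK)

    def push(self, obs: np.ndarray) -> np.ndarray:
        """Append the newest frame and return the stacked policy input."""
        self.frames.append(obs.astype(np.float32))
        return np.concatenate(self.frames)  # shape: (STACK * OBS_DIM,)


stack = ObservationStack()
x = stack.push(np.ones(OBS_DIM))
print(x.shape)  # (92,)
```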

MAPPO
Learning approach

Recurrent decentralized execution with curriculum training.

Sim → Real
Deployment path

Mission Control, camera sensing, and robot runtime integration.


Project Overview

A swarm robotics system grounded in environmental coordination.

The project studies how collective intelligence can emerge from decentralized agents with limited local sensing. Instead of relying on a central controller, the system combines local policy inference with a shared digital pheromone field and a physical robotic testbed.

Theory

Stigmergy

Agents do not need a global planner. They coordinate through changes in a shared environment, inspired by how ant colonies build route structure from local action.

Shared Memory

Digital Pheromones

Mission Control maintains an authoritative field that robots can query in simulator-compatible form, giving physical agents access to a shared, decaying memory.
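A decaying shared field of this kind can be sketched as a small grid that Mission Control updates each tick. The grid size, clamp, and decay rate below are assumptions for illustration, not the project's actual parameters.

```python
import numpy as np


class PheromoneField:
    """Authoritative 2-D pheromone grid with exponential decay,
    roughly as a Mission Control process might maintain it."""

    def __init__(self, width=64, height=64, decay=0.99):
        self.grid = np.zeros((height, width), dtype=np.float32)
        self.decay = decay

    def deposit(self, x: int, y: int, amount: float = 1.0) -> None:
        """Add pheromone at a cell, clamped to a maximum of 1.0."""
        self.grid[y, x] = min(self.grid[y, x] + amount, 1.0)

    def step(self) -> None:
        """Apply one tick of decay so stale trails fade away."""
        self.grid *= self.decay

    def sense(self, x: int, y: int) -> float:
        """Answer a SENSE-style query for one grid cell."""
        return float(self.grid[y, x])


field = PheromoneField()
field.deposit(3, 3)
field.step()
print(field.sense(3, 3))
```

Because every robot reads the same decaying grid, a trail laid down by one agent remains useful to later agents without any direct message passing.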

Embodiment

Sim-to-Real Transfer

A learned decentralized policy is deployed onto a physical platform with BLE, TCP, relay, camera sensing, and a real operator-facing command center.

Theory

From carpenter ants to artificial stigmergy.

Ant colonies demonstrate coordination without central supervision. The key idea is stigmergy: agents change the environment, and later agents read those changes as actionable information. This project translates that idea into a computational and robotic system.

Each agent is intentionally limited. It does not hold a global map, a broadcast state, or a centralized planner. Instead, coordinated behavior emerges from repeated local interactions with geometry, targets, and a shared digital field.

That matters because resilience often comes from redundancy and local decision-making. In fragile environments, centralized communication can fail. Stigmergic systems offer a different model: coordination through persistent environmental structure.

Placeholder Stigmergy Diagram
Replace with a polished diagram showing agents discovering, returning, depositing, and exploiting shared route memory.

Simulation and Learning

A simulator designed for decentralized policy learning.

The simulation environment models targets, obstacles, lidar, movement, local observations, and pheromone dynamics. The training path uses recurrent MAPPO and a staged curriculum to teach progressively richer behaviors.

System

Simulation Environment

A custom multi-agent environment models targets, obstacles, lidar, agent-local observations, and pheromone dynamics for decentralized policy learning.

System

Learning Stack

The project trains recurrent MAPPO policies under a staged curriculum, preserving decentralized execution while using richer training-time structure.
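A staged curriculum can be expressed as an ordered list of training stages keyed to the update counter. The stage names, settings, and schedule below are hypothetical; the project's real curriculum is not specified in this write-up.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    n_targets: int
    obstacles: bool
    pheromones: bool


# Hypothetical curriculum: each stage adds task structure.
CURRICULUM = [
    Stage("reach_single_target", 1, False, False),
    Stage("avoid_obstacles",     1, True,  False),
    Stage("multi_target",        3, True,  False),
    Stage("route_reuse",         3, True,  True),
]


def stage_for(update: int, updates_per_stage: int = 500) -> Stage:
    """Pick the active stage from the training-update counter."""
    idx = min(update // updates_per_stage, len(CURRICULUM) - 1)
    return CURRICULUM[idx]


print(stage_for(1200).name)  # multi_target
```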

System

Mission Control

A live desktop command center visualizes robots, maintains the digital pheromone field, and bridges simulator semantics into physical deployment.

System

Robot Runtime

The physical stack runs policy inference on-device and integrates sensing, camera-based target cues, communication transport, and motion control.

Operational Flow

How the system behaves step by step.

  1. Each agent perceives only local structure: lidar, nearby target cues, neighbor geometry, heading, speed, and pheromone samples.
  2. The learned policy selects one discrete action combining throttle, turning, and optional deposit behavior.
  3. Mission Control updates the global digital pheromone field and answers SENSE requests with simulator-compatible forward samples.
  4. Successful return paths become reusable environmental information rather than explicit agent-to-agent messages.
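The per-tick loop implied by steps 1–4 can be sketched as local observation in, one discrete action out. The action factorization below (3 throttle levels × 3 turn directions × deposit flag) is an assumption chosen to match "throttle, turning, and optional deposit", not the project's confirmed action space.

```python
import numpy as np

# Assumed factorization: 3 throttle x 3 turn x 2 deposit = 18 discrete actions.
THROTTLE = [0.0, 0.5, 1.0]
TURN = [-1.0, 0.0, 1.0]


def decode_action(a: int):
    """Unpack one discrete action index into (throttle, turn, deposit)."""
    deposit = bool(a % 2)
    turn = TURN[(a // 2) % 3]
    throttle = THROTTLE[a // 6]
    return throttle, turn, deposit


def agent_step(policy, obs: np.ndarray):
    """One decentralized tick: the policy sees only local state."""
    a = policy(obs)
    throttle, turn, deposit = decode_action(a)
    # deposit=True would trigger a pheromone write via Mission Control.
    return throttle, turn, deposit


print(decode_action(17))  # (1.0, 1.0, True)
```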

Placeholder Sim-to-Real Bridge
Replace with a visual connecting simulator observations, Mission Control pheromone responses, and physical robot runtime execution.

Sim-to-Real

The bridge from simulation to embodied robotics.

The project does not stop at policy training. It carries the learned behavior into a physical stack with Mission Control, real transport layers, robot firmware, and camera-based target cues.

Policy Interface

Observation Contract

The deployed system preserves the simulator’s structured observation layout rather than replacing it with raw image input, which keeps sim-to-real alignment tractable.
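A fixed-slot observation contract can be kept explicit by naming each slice of the vector. The slot boundaries below are hypothetical (the real 23-D layout is not specified here); the point is that deployment fills the same slots the simulator defined.

```python
import numpy as np

# Hypothetical slot layout for the 23-D observation vector.
SLOTS = {
    "lidar":      slice(0, 8),    # 8 range readings
    "target":     slice(8, 12),   # compact target cues
    "neighbors":  slice(12, 18),  # nearby-agent geometry
    "self_state": slice(18, 21),  # heading, speed, ...
    "pheromone":  slice(21, 23),  # forward pheromone samples
}


def build_obs(**parts) -> np.ndarray:
    """Assemble the fixed-layout vector the policy was trained on;
    unfilled slots stay at zero."""
    obs = np.zeros(23, dtype=np.float32)
    for name, values in parts.items():
        obs[SLOTS[name]] = values
    return obs


obs = build_obs(lidar=np.ones(8), pheromone=[0.2, 0.7])
print(obs.shape)  # (23,)
```

Keeping the contract in one place like this means a camera detector, a lidar driver, and the simulator can all write into the same named slots.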

Connectivity

Transport Flexibility

The system supports BLE, direct TCP, and relay-based communication, allowing the same core protocol to operate across laptop, simulator, and Raspberry Pi robot workflows.
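One common way to get that flexibility is a thin transport interface that the protocol layer codes against, with BLE, TCP, and relay links as interchangeable backends. The interface and the in-memory loopback below are illustrative assumptions, not the project's actual class names.

```python
from abc import ABC, abstractmethod


class Transport(ABC):
    """Common interface so the core protocol is independent of the
    link (BLE, direct TCP, or relay). Method names are illustrative."""

    @abstractmethod
    def send(self, payload: bytes) -> None: ...

    @abstractmethod
    def recv(self) -> bytes: ...


class LoopbackTransport(Transport):
    """In-memory stand-in, useful for laptop and simulator testing."""

    def __init__(self):
        self._queue = []

    def send(self, payload: bytes) -> None:
        self._queue.append(payload)

    def recv(self) -> bytes:
        return self._queue.pop(0)


t = LoopbackTransport()
t.send(b"SENSE 3 4")
print(t.recv())  # b'SENSE 3 4'
```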

Perception

Camera Integration

A Pi camera path now supplies compact target features into the policy. The implementation stays backward compatible by falling back safely when detection is unavailable.
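The fallback behavior can be as simple as zero-filling the target slots whenever the detector is missing or returns nothing, so the policy always receives a well-formed vector. The slot count and function shape here are assumptions for illustration.

```python
import numpy as np

TARGET_SLOTS = 4  # number of target-cue entries; an assumption


def target_features(detector=None, frame=None) -> np.ndarray:
    """Return compact target cues, falling back to zeros when the
    camera detector is unavailable or finds no target."""
    if detector is None or frame is None:
        return np.zeros(TARGET_SLOTS, dtype=np.float32)
    det = detector(frame)
    if det is None:  # detector ran but found nothing
        return np.zeros(TARGET_SLOTS, dtype=np.float32)
    return np.asarray(det, dtype=np.float32)[:TARGET_SLOTS]


print(target_features())  # [0. 0. 0. 0.]
```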

Physical Robot

A real platform for testing decentralized swarm ideas.

The hardware stack turns the project from a simulator into a deployable system. The robot runtime runs policy inference on-device, integrates sensing and transport, and plugs into Mission Control without changing the core policy contract.

Placeholder Hero Robot Photo
Drop in the best photograph of the assembled robot here.

Compute

Onboard Compute

A Raspberry Pi hosts the runtime, communication clients, and camera integration. This keeps the deployed robot close to the simulator’s structured decision loop.

Sensing

Sensing and Perception

The physical stack combines lidar-style range information, robot-local state, and an optional camera detector that fills compact target-related policy slots.

Control

Command and Coordination

Mission Control acts as the live operator-facing field model, maintaining digital pheromones and visualizing robot state, trails, targets, and transport health.

Software Architecture

A layered system from training code to desktop coordination.

The codebase spans environment modeling, learning, inference, desktop infrastructure, and hardware deployment. The architecture is intentionally modular so simulator semantics can survive the move to physical robots.

01

Environment

Custom swarm environment with pheromone dynamics, obstacle geometry, target logic, and structured local observations.

02

Training

Curriculum-driven recurrent MAPPO training for decentralized policies with richer behavior shaping and long-horizon coordination.

03

Inference

Robot-local policy execution preserves the decentralized contract instead of shifting control back to a central planner.

04

Mission Control

Desktop command center for live state aggregation, pheromone field updates, sensing replies, target markers, and operator visualization.

Lessons and Limitations

What the project learned by building the full stack.

The value of the work is not only in the successes. It is also in the places where training, perception, and physical deployment exposed the real boundaries of decentralized robotic intelligence.

The project is strongest when it treats collective behavior as an emergent systems problem rather than a centralized planning problem.

Sim-to-real transfer depends less on flashy models than on preserving observation semantics and building reliable transport and runtime infrastructure.

Negative or mixed training results are informative: they exposed where route-reuse tasks, curriculum design, and detector robustness become the real bottlenecks.