Structured local state, stacked through time.
Artificial collective intelligence built from local signals, digital pheromones, and real robots.
Inspired by carpenter ants, this project explores how decentralized agents can learn to coordinate through shared environmental memory rather than centralized control. It spans theory, simulator design, reinforcement learning, Mission Control infrastructure, and a physical robotic platform.
Recurrent decentralized execution with curriculum training.
Mission Control, camera sensing, and robot runtime integration.
Project Overview
A swarm robotics system grounded in environmental coordination.
The project studies how collective intelligence can emerge from decentralized agents with limited local sensing. Instead of relying on a central controller, the system combines local policy inference with a shared digital pheromone field and a physical robotic testbed.
Stigmergy
Agents do not need a global planner. They coordinate through changes in a shared environment, inspired by how ant colonies build route structure from local actions.
Digital Pheromones
Mission Control maintains an authoritative field that robots can query in simulator-compatible form, giving physical agents access to a shared, decaying memory.
Sim-to-Real Transfer
A learned decentralized policy is deployed onto a physical platform with BLE, TCP, and relay transports, camera sensing, and a real operator-facing command center.
Theory
From carpenter ants to artificial stigmergy.
Ant colonies demonstrate coordination without central supervision. The key idea is stigmergy: agents change the environment, and later agents read those changes as actionable information. This project translates that idea into a computational and robotic system.
Each agent is intentionally limited. It does not hold a global map, a broadcast state, or a centralized planner. Instead, coordinated behavior emerges from repeated local interactions with geometry, targets, and a shared digital field.
That matters because resilience often comes from redundancy and local decision-making. In fragile environments, centralized communication can fail. Stigmergic systems offer a different model: coordination through persistent environmental structure.
Simulation and Learning
A simulator designed for decentralized policy learning.
The simulation environment models targets, obstacles, lidar, movement, local observations, and pheromone dynamics. The training path uses recurrent MAPPO and a staged curriculum to teach progressively richer behaviors.
Simulation Environment
A custom multi-agent environment models targets, obstacles, lidar, agent-local observations, and pheromone dynamics for decentralized policy learning.
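The core mechanic of the pheromone dynamics can be illustrated with a minimal sketch: agents deposit into a shared grid, and the field decays each tick so that stale routes fade. The class and method names below are illustrative assumptions, not the project's actual API.

```python
class PheromoneField:
    """Minimal sketch of a decaying 2-D pheromone grid (illustrative names)."""

    def __init__(self, width, height, decay=0.99):
        self.decay = decay
        self.grid = [[0.0] * width for _ in range(height)]

    def deposit(self, x, y, amount=1.0):
        # Agents write to the shared field at their current cell,
        # saturating at 1.0 so repeated deposits cannot grow unbounded.
        self.grid[y][x] = min(self.grid[y][x] + amount, 1.0)

    def step(self):
        # Exponential decay: the field acts as a fading environmental memory.
        for row in self.grid:
            for x in range(len(row)):
                row[x] *= self.decay

    def sense(self, x, y):
        # Agents read only a local sample, never the whole field.
        return self.grid[y][x]
```

The decay constant controls how long routes persist: values near 1.0 preserve trails for many steps, while smaller values force agents to keep reinforcing paths they want to reuse.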
Learning Stack
The project trains recurrent MAPPO policies under a staged curriculum, preserving decentralized execution while using richer training-time structure.
Mission Control
A live desktop command center visualizes robots, maintains the digital pheromone field, and bridges simulator semantics into physical deployment.
Robot Runtime
The physical stack runs policy inference on-device and integrates sensing, camera-based target cues, communication transport, and motion control.
Operational Flow
How the system behaves step by step.
- Each agent perceives only local structure: lidar, nearby target cues, neighbor geometry, heading, speed, and pheromone samples.
- The learned policy selects one discrete action combining throttle, turning, and optional deposit behavior.
- Mission Control updates the global digital pheromone field and answers SENSE requests with simulator-compatible forward samples.
- Successful return paths become reusable environmental information rather than explicit agent-to-agent messages.
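The steps above can be sketched as one tick of the agent loop. All names here are hypothetical stand-ins for the real interfaces; the sketch only shows the shape of the perceive, act, deposit cycle.

```python
def agent_step(policy, state, sense_fn, deposit_fn):
    """One tick of the decentralized loop: perceive locally, act, optionally
    deposit. Names are hypothetical; the real interfaces differ."""
    # 1. Perceive only local structure (no global map, no broadcast state).
    obs = {
        "lidar": state["lidar"],              # local range readings
        "heading": state["heading"],
        "speed": state["speed"],
        "pheromone": sense_fn(state["pos"]),  # SENSE query to the shared field
    }
    # 2. The policy selects one discrete action: (throttle, turn, deposit?).
    throttle, turn, deposit = policy(obs)
    # 3. Optionally write back to the shared environment instead of
    #    messaging other agents directly.
    if deposit:
        deposit_fn(state["pos"])
    return throttle, turn
```

The important property is that all coordination flows through `sense_fn` and `deposit_fn`: agents never address each other, only the field.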
Sim-to-Real
The bridge from simulation to embodied robotics.
The project does not stop at policy training. It carries the learned behavior into a physical stack with Mission Control, real transport layers, robot firmware, and camera-based target cues.
Observation Contract
The deployed system preserves the simulator’s structured observation layout rather than replacing it with raw image input, which keeps sim-to-real alignment tractable.
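One way to picture an observation contract like this is a fixed slot layout that both the simulator and the robot must fill identically. The slot names and sizes below are assumptions for illustration; the project's actual layout may differ.

```python
# Hypothetical slot layout mirroring the idea of a fixed observation
# contract; the actual slot order and sizes are assumptions.
OBS_LAYOUT = [
    ("lidar", 16),        # range readings
    ("target_cues", 4),   # camera-derived target features
    ("neighbors", 6),     # relative neighbor geometry
    ("self_state", 3),    # heading, speed, etc.
    ("pheromone", 3),     # forward pheromone samples
]

def pack_observation(parts):
    """Flatten named parts into one vector in the contracted order,
    validating each slot's width so sim and robot stay aligned."""
    vec = []
    for name, size in OBS_LAYOUT:
        values = parts[name]
        if len(values) != size:
            raise ValueError(f"slot {name!r}: expected {size}, got {len(values)}")
        vec.extend(values)
    return vec
```

Validating widths at pack time turns a silent sim-to-real mismatch into an immediate, debuggable error.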
Transport Flexibility
The system supports BLE, direct TCP, and relay-based communication, allowing the same core protocol to operate across laptop, simulator, and Raspberry Pi robot workflows.
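A common way to keep one protocol portable across BLE, TCP, and relay links is to frame messages independently of the carrier, so any byte transport can carry them unchanged. The framing below is an illustrative sketch, not the project's actual wire format.

```python
def frame(payload: bytes) -> bytes:
    """Length-prefix a message so any byte transport (BLE, TCP, relay)
    can carry the same protocol. Illustrative framing, not the real format."""
    return len(payload).to_bytes(4, "big") + payload

def unframe(buf: bytes):
    """Split one framed message off the front of a buffer; returns
    (payload, remaining_bytes)."""
    n = int.from_bytes(buf[:4], "big")
    return buf[4:4 + n], buf[4 + n:]
```

With framing done above the transport, swapping a laptop TCP socket for a BLE characteristic or a relay hop changes only how bytes move, not what they mean.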
Camera Integration
A Pi camera path now feeds compact target features to the policy. The implementation stays backward compatible by falling back safely when detection is unavailable.
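The safe-fallback pattern can be sketched as follows: the camera-derived policy slots get neutral values whenever the detector is absent, fails, or returns malformed output, so the policy always sees a valid observation. Function and parameter names are illustrative assumptions.

```python
def target_features(detector, n_slots=4, fill=0.0):
    """Fill the camera-derived policy slots, degrading gracefully.
    Illustrative sketch; names are assumptions, not the project's API."""
    if detector is None:
        return [fill] * n_slots      # no camera attached: neutral slots
    try:
        feats = detector()
    except Exception:
        return [fill] * n_slots      # detector crashed: fall back safely
    if feats is None or len(feats) != n_slots:
        return [fill] * n_slots      # malformed output: fall back safely
    return list(feats)
```

Because the fallback produces the same slot widths as a real detection, the observation contract is preserved whether or not the camera is working.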
Physical Robot
A real platform for testing decentralized swarm ideas.
The hardware stack turns the project from a simulator into a deployable system. The robot runtime runs policy inference on-device, integrates sensing and transport, and plugs into Mission Control without changing the core policy contract.
Onboard Compute
A Raspberry Pi hosts the runtime, communication clients, and camera integration. This keeps the deployed robot close to the simulator’s structured decision loop.
Sensing and Perception
The physical stack combines lidar-style range information, robot-local state, and an optional camera detector that fills compact target-related policy slots.
Command and Coordination
Mission Control acts as the live operator-facing field model, maintaining digital pheromones and visualizing robot state, trails, targets, and transport health.
Software Architecture
A layered system from training code to desktop coordination.
The codebase spans environment modeling, learning, inference, desktop infrastructure, and hardware deployment. The architecture is intentionally modular so simulator semantics can survive the move to physical robots.
Environment
Custom swarm environment with pheromone dynamics, obstacle geometry, target logic, and structured local observations.
Training
Curriculum-driven recurrent MAPPO training for decentralized policies with richer behavior shaping and long-horizon coordination.
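A staged curriculum of this kind is often expressed as an ordered list of stage configs with an advancement criterion. The stage names, difficulty knobs, and thresholds below are hypothetical, meant only to show the shape of the idea.

```python
# Hypothetical staged curriculum: each stage unlocks richer behavior once
# a success criterion is met. Stage names and thresholds are assumptions.
CURRICULUM = [
    {"stage": "reach_target",    "obstacles": 0, "pheromones": False, "advance_at": 0.8},
    {"stage": "avoid_obstacles", "obstacles": 5, "pheromones": False, "advance_at": 0.7},
    {"stage": "route_reuse",     "obstacles": 5, "pheromones": True,  "advance_at": 0.6},
]

def next_stage(idx, success_rate):
    """Advance to the next stage when the rolling success rate clears
    the current stage's threshold; stay put otherwise."""
    if idx + 1 < len(CURRICULUM) and success_rate >= CURRICULUM[idx]["advance_at"]:
        return idx + 1
    return idx
```

Gating each stage on measured success, rather than a fixed schedule, keeps the policy from being pushed into route-reuse tasks before the simpler navigation behaviors are reliable.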
Inference
Robot-local policy execution preserves the decentralized contract instead of shifting control back to a central planner.
Mission Control
Desktop command center for live state aggregation, pheromone field updates, sensing replies, target markers, and operator visualization.
Lessons and Limitations
What the project learned by building the full stack.
The value of the work is not only in the successes. It is also in the places where training, perception, and physical deployment exposed the real boundaries of decentralized robotic intelligence.
The project is strongest when it treats collective behavior as an emergent systems problem rather than a centralized planning problem.
Sim-to-real transfer depends less on flashy models than on preserving observation semantics and building reliable transport and runtime infrastructure.
Negative or mixed training results are informative: they exposed where route-reuse tasks, curriculum design, and detector robustness become the real bottlenecks.