Interactive demo

Formation Game

theory developed by Máté Badó Miklós

A mean-field formalization of a multi-agent formation task. Read the math, then run agents on a 30×30 grid under any of six coupling variants and watch them self-organise into a target pattern from random initial positions.

1. The setup

Many agents occupy locations on a grid. The goal is for the population to take on a particular spatial shape (a ring, a letter, a target density). Each agent only pays attention to local payoffs and moves at a small cost. The basic question: when can such a population organise into a target pattern under purely local, decentralised updates?

The key modelling trick is to stop tracking individual agents and instead track their empirical distribution:

m^{N}_{t} = \tfrac{1}{N} \sum_{i=1}^{N} \delta_{x_i(t)}

where $x_i(t)$ is the location of agent $i$ . In the large-population limit $m^{N}_{t} \rightharpoonup m_{t}$ , with $m_t \in \mathcal{P}(\Omega)$ a density on the spatial domain $\Omega$ . The desired shape is a target density $\rho \in \mathcal{P}(\Omega)$ , and the formation objective is to make $m_t$ approximate $\rho$ .

2. Finite-agent formation game

Let $V$ be a finite grid. Each agent $i$ occupies a cell $x_i \in V$ . The occupation count is $n_v = \#\{ i : x_i = v \}$ . A natural formation error compares the population to the target:

\mathcal{E}_{N}(n) = \tfrac{1}{2} \sum_{v \in V} w_v (n_v - N\rho_v)^{2}

Suppose agent $i$ moves from $u$ to $v$ . The marginal utility of the move is

U_{i}(u \to v;\, n) = \mathcal{E}_N(n) - \mathcal{E}_N(n - e_u + e_v) - \lambda\, d(u, v)

The first two terms are the global error reduction; $\lambda \geq 0$ is the movement cost. Defining $\mathcal{V}_N(n) = -\mathcal{E}_N(n)$ gives a potential game: any unilateral payoff-improving move increases $\mathcal{V}_N$ . The local best-response update is

x_i(t + 1) \in \arg\max_{v \in \mathcal{N}_r(x_i(t))}\, U_i(x_i(t) \to v;\, n(t))

where $\mathcal{N}_r(x) = \{ y \in V : d(x, y) \leq r \}$ is the local action set within perception/movement radius $r$ .

Below: $N = 32$ agents on a 12×12 grid, moving one cell per tick under exactly this rule. The shading is a centred Gaussian target.

N = 32 agents r = 1 (4-neighbour) step 0

Each agent moves to the neighbouring cell that most reduces the global formation error, minus a small movement cost. Coloured shading is the target density ρ.

3. The large-population limit

In the limit $N \to \infty$ , individual identities don't matter — only the density does. Write a representative agent's state $x_t \in \Omega$ and control $u_t \in U$ , with mean field $m_t \in \mathcal{P}(\Omega)$ :

dx_t = f(x_t, u_t, m_t)\, dt + \sigma(x_t)\, dw_t

Each agent has an individual cost $J^{N}_{i}(u_i, u_{-i}) = \mathbb{E}[\int_0^T c(x_i, u_i, m^{N}_{t};\, \rho) dt + g(x_i(T), m^{N}_{T};\, \rho)]$ , where $c$ and $g$ take the target $\rho$ as a parameter.

4. Mean-field control

If agents are cooperative and share an objective, the limit is mean-field control: a central planner picks a feedback $u(t, x)$ for the whole population, and the density evolves by Fokker–Planck:

\partial_t m_t = -\nabla \cdot (f(x, u, m_t) m_t) + \tfrac{1}{2} \sum_{i,j} \partial^{2}_{x_i x_j}(a_{ij}(x) m_t)

For target-density formation, a natural running cost is

c(x, u, m;\, \rho) = \tfrac{1}{2}|u|^{2} + \tfrac{\gamma}{2}((Km)(x) - \rho(x))^{2}

where $K$ is a perception/smoothing kernel $(Km)(x) = \int_\Omega K(x, y)\, m(dy)$ . If $K$ is local, agents only see their neighbourhood, matching the local-information regime explored in the simulator below.

5. Mean-field game (HJB–FP)

If agents are non-cooperative, the limit is a mean-field game: each agent solves its own optimisation while the mean field $m_t$ is taken as given. An equilibrium $(u^\star, m^\star)$ satisfies the coupled Hamilton–Jacobi–Bellman / Fokker–Planck system:

\begin{aligned}-\partial_t v &= \tfrac{1}{2}\,\mathrm{tr}(a D^{2}_{x} v) + H(x, D_x v, m_t) \\ \partial_t m_t &= -\nabla \cdot (f(x, \hat{u}, m_t) m_t) + \tfrac{1}{2} \sum_{i,j} \partial^{2}_{x_i x_j}(a_{ij} m_t)\end{aligned}

The decentralised feedback $u_i(t) = u^\star(t, x_i(t))$ used by each finite agent yields an $\varepsilon_N$ -Nash equilibrium with $\varepsilon_N \to 0$ as $N \to \infty$ . That's the game-theoretic justification for the mean-field limit: instead of solving an intractable $N$ -agent strategic problem, solve a representative-agent fixed point and get approximately-Nash decentralised strategies for free.

6. Cost variants

Different formation tasks come from different choices of $c$ and $g$ . The simulator below lets you swap between four of these.

§6.1 Target-density. Match a prescribed $\rho$ exactly.

c = \tfrac{1}{2}|u|^2 + \tfrac{\gamma}{2}((Km)(x) - \rho(x))^2

§6.2 Congestion-avoiding. Move toward useful regions but punish crowding.

c = \tfrac{1}{2}|u|^2 + \gamma(Km)(x) - \eta\rho(x)

§6.3 Local information. Replace $K$ with a localised kernel $K_r$ whose support is a ball of radius $r$ — directly the "perception radius" slider below.

§6.5 Coverage. Spread out to cover a domain weighted by importance $q(x)$ :

c = \tfrac{1}{2}|u|^2 - q(x) + \gamma(Km)(x)

§6.9 Minority / anti-coordination. Prefer underused cells; useful when $\rho$ is uniform or when you want soft target-following.

Simulator

The grid is $30 \times 30$ . Pick a target shape, pick a cost variant, hit play, and watch the agents self-organise from random initial positions. The formation error in the readout normalises $\mathcal{E}_N$ to $[0, 1]$ so you can compare runs.

target pattern

cost variant

agents 250

perception radius r 5

movement cost λ 0.05

step 0

formation error 1.000

target density / agents

What to look for. Compare target-density against anti-coord on the ring pattern — both produce something ring-ish, but the anti-coord variant tolerates much fuzzier convergence. Try a small perception radius ( $r = 1$ or $2$ ) on the T pattern: agents get stuck in local optima because they can't see far enough to "find" the shape.

Source

Based on Mean-Field Formalization of a Formation Game, theory developed by Máté Badó Miklós, June 2026. The simulator implements the finite-agent best-response rule from §2 with four of the coupling variants from §6. Perception-radius slider exposes the local-information regime (§6.3).