skip to content
Interactive demo

Formation Game

theory developed by Máté Badó Miklós

A mean-field formalization of a multi-agent formation task. Read the math, then run agents on a 30×30 grid under any of six coupling variants and watch them self-organise into a target pattern from random initial positions.

1. The setup

Many agents occupy locations on a grid. The goal is for the population to take on a particular spatial shape (a ring, a letter, a target density). Each agent only pays attention to local payoffs and moves at a small cost. The basic question: when can such a population organise into a target pattern under purely local, decentralised updates?

The key modelling trick is to stop tracking individual agents and instead track their empirical distribution:

mtN=1Ni=1Nδxi(t)m^{N}_{t} = \tfrac{1}{N} \sum_{i=1}^{N} \delta_{x_i(t)}

where xi(t)x_i(t) is the location of agent ii. In the large-population limit mtNmtm^{N}_{t} \rightharpoonup m_{t}, with mtP(Ω)m_t \in \mathcal{P}(\Omega) a density on the spatial domain Ω\Omega. The desired shape is a target density ρP(Ω)\rho \in \mathcal{P}(\Omega), and the formation objective is to make mtm_t approximate ρ\rho.

2. Finite-agent formation game

Let VV be a finite grid. Each agent ii occupies a cell xiVx_i \in V. The occupation count is nv=#{i:xi=v}n_v = \#\{ i : x_i = v \}. A natural formation error compares the population to the target:

EN(n)=12vVwv(nvNρv)2\mathcal{E}_{N}(n) = \tfrac{1}{2} \sum_{v \in V} w_v (n_v - N\rho_v)^{2}

Suppose agent ii moves from uu to vv. The marginal utility of the move is

Ui(uv;n)=EN(n)EN(neu+ev)λd(u,v)U_{i}(u \to v;\, n) = \mathcal{E}_N(n) - \mathcal{E}_N(n - e_u + e_v) - \lambda\, d(u, v)

The first two terms are the global error reduction; λ0\lambda \geq 0 is the movement cost. Defining VN(n)=EN(n)\mathcal{V}_N(n) = -\mathcal{E}_N(n) gives a potential game: any unilateral payoff-improving move increases VN\mathcal{V}_N. The local best-response update is

xi(t+1)argmaxvNr(xi(t))Ui(xi(t)v;n(t))x_i(t + 1) \in \arg\max_{v \in \mathcal{N}_r(x_i(t))}\, U_i(x_i(t) \to v;\, n(t))

where Nr(x)={yV:d(x,y)r}\mathcal{N}_r(x) = \{ y \in V : d(x, y) \leq r \} is the local action set within perception/movement radius rr.

Below: N=32N = 32 agents on a 12×12 grid, moving one cell per tick under exactly this rule. The shading is a centred Gaussian target.

N = 32 agents r = 1 (4-neighbour) step 0
Each agent moves to the neighbouring cell that most reduces the global formation error, minus a small movement cost. Coloured shading is the target density ρ.

3. The large-population limit

In the limit NN \to \infty, individual identities don't matter — only the density does. Write a representative agent's state xtΩx_t \in \Omega and control utUu_t \in U, with mean field mtP(Ω)m_t \in \mathcal{P}(\Omega):

dxt=f(xt,ut,mt)dt+σ(xt)dwtdx_t = f(x_t, u_t, m_t)\, dt + \sigma(x_t)\, dw_t

Each agent has an individual cost JiN(ui,ui)=E[0Tc(xi,ui,mtN;ρ)dt+g(xi(T),mTN;ρ)]J^{N}_{i}(u_i, u_{-i}) = \mathbb{E}[\int_0^T c(x_i, u_i, m^{N}_{t};\, \rho) dt + g(x_i(T), m^{N}_{T};\, \rho)], where cc and gg take the target ρ\rho as a parameter.

4. Mean-field control

If agents are cooperative and share an objective, the limit is mean-field control: a central planner picks a feedback u(t,x)u(t, x) for the whole population, and the density evolves by Fokker–Planck:

tmt=(f(x,u,mt)mt)+12i,jxixj2(aij(x)mt)\partial_t m_t = -\nabla \cdot (f(x, u, m_t) m_t) + \tfrac{1}{2} \sum_{i,j} \partial^{2}_{x_i x_j}(a_{ij}(x) m_t)

For target-density formation, a natural running cost is

c(x,u,m;ρ)=12u2+γ2((Km)(x)ρ(x))2c(x, u, m;\, \rho) = \tfrac{1}{2}|u|^{2} + \tfrac{\gamma}{2}((Km)(x) - \rho(x))^{2}

where KK is a perception/smoothing kernel (Km)(x)=ΩK(x,y)m(dy)(Km)(x) = \int_\Omega K(x, y)\, m(dy). If KK is local, agents only see their neighbourhood, matching the local-information regime explored in the simulator below.

5. Mean-field game (HJB–FP)

If agents are non-cooperative, the limit is a mean-field game: each agent solves its own optimisation while the mean field mtm_t is taken as given. An equilibrium (u,m)(u^\star, m^\star) satisfies the coupled Hamilton–Jacobi–Bellman / Fokker–Planck system:

tv=12tr(aDx2v)+H(x,Dxv,mt)tmt=(f(x,u^,mt)mt)+12i,jxixj2(aijmt)\begin{aligned}-\partial_t v &= \tfrac{1}{2}\,\mathrm{tr}(a D^{2}_{x} v) + H(x, D_x v, m_t) \\ \partial_t m_t &= -\nabla \cdot (f(x, \hat{u}, m_t) m_t) + \tfrac{1}{2} \sum_{i,j} \partial^{2}_{x_i x_j}(a_{ij} m_t)\end{aligned}

The decentralised feedback ui(t)=u(t,xi(t))u_i(t) = u^\star(t, x_i(t)) used by each finite agent yields an εN\varepsilon_N-Nash equilibrium with εN0\varepsilon_N \to 0 as NN \to \infty. That's the game-theoretic justification for the mean-field limit: instead of solving an intractable NN-agent strategic problem, solve a representative-agent fixed point and get approximately-Nash decentralised strategies for free.

6. Cost variants

Different formation tasks come from different choices of cc and gg. The simulator below lets you swap between four of these.

§6.1 Target-density. Match a prescribed ρ\rho exactly.

c=12u2+γ2((Km)(x)ρ(x))2c = \tfrac{1}{2}|u|^2 + \tfrac{\gamma}{2}((Km)(x) - \rho(x))^2

§6.2 Congestion-avoiding. Move toward useful regions but punish crowding.

c=12u2+γ(Km)(x)ηρ(x)c = \tfrac{1}{2}|u|^2 + \gamma(Km)(x) - \eta\rho(x)

§6.3 Local information. Replace KK with a localised kernel KrK_r whose support is a ball of radius rr — directly the "perception radius" slider below.

§6.5 Coverage. Spread out to cover a domain weighted by importance q(x)q(x):

c=12u2q(x)+γ(Km)(x)c = \tfrac{1}{2}|u|^2 - q(x) + \gamma(Km)(x)

§6.9 Minority / anti-coordination. Prefer underused cells; useful when ρ\rho is uniform or when you want soft target-following.

Simulator

The grid is 30×3030 \times 30. Pick a target shape, pick a cost variant, hit play, and watch the agents self-organise from random initial positions. The formation error in the readout normalises EN\mathcal{E}_N to [0,1][0, 1] so you can compare runs.

target pattern

cost variant

agents 250
perception radius r 5
movement cost λ 0.05
step 0
formation error 1.000
target density / agents

What to look for. Compare target-density against anti-coord on the ring pattern — both produce something ring-ish, but the anti-coord variant tolerates much fuzzier convergence. Try a small perception radius (r=1r = 1 or 22) on the T pattern: agents get stuck in local optima because they can't see far enough to "find" the shape.

Source

Based on Mean-Field Formalization of a Formation Game, theory developed by Máté Badó Miklós, June 2026. The simulator implements the finite-agent best-response rule from §2 with four of the coupling variants from §6. Perception-radius slider exposes the local-information regime (§6.3).