"The cat sat on the mat"

Token  Position
The    1
cat    2
sat    3
on     4
the    5
mat    6

6 tokens × 6 tokens = 36 computations
Self-Attention Matrix
(rows = query token, columns = key token; each row sums to 1)

        The   cat   sat   on    the   mat
The     0.30  0.25  0.10  0.10  0.15  0.10
cat     0.12  0.08  0.35  0.05  0.05  0.35
sat     0.06  0.34  0.10  0.10  0.05  0.35
on      0.05  0.05  0.10  0.10  0.30  0.40
the     0.10  0.05  0.05  0.30  0.10  0.40
mat     0.15  0.25  0.15  0.20  0.15  0.10
n × n = n²
O(n²) Compute Explosion

Recursive Language Models

The Next Evolution in AI Scaling

An AI doesn't read words; it tokenizes them

Tokens are the smallest atomic units a language model reads.
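The tokenization step can be sketched in a few lines of Python. Splitting on whitespace is a deliberate simplification for illustration: real models use subword tokenizers (e.g. BPE), so a single word may map to several tokens.

```python
# A minimal sketch: whitespace splitting stands in for a real
# subword tokenizer (such as BPE) purely for illustration.
sentence = "The cat sat on the mat"
tokens = sentence.split()
for position, token in enumerate(tokens, start=1):
    print(f"{token:>4}  {position}")
```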

The model computes an attention score

One score is computed between "cat" and every other token. Higher score = more relevant context.
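The scoring step can be sketched with a dot product followed by a softmax. The 2-d embeddings below are made-up toy values, not the numbers in the matrix above; real models derive query and key vectors from learned weight matrices in far higher dimensions.

```python
import math

# A toy sketch of attention scoring. The embeddings are invented
# illustration values, not learned parameters.
tokens = ["The", "cat", "sat", "on", "the", "mat"]
embeddings = {
    "The": [0.1, 0.3], "cat": [0.9, 0.2], "sat": [0.4, 0.8],
    "on":  [0.2, 0.1], "the": [0.1, 0.3], "mat": [0.8, 0.3],
}

def attention_weights(query_token):
    """Score one query token against every token, then normalize."""
    q = embeddings[query_token]
    # Raw relevance: dot product of the query with each token's vector.
    raw = [sum(a * b for a, b in zip(q, embeddings[t])) for t in tokens]
    # Softmax turns raw scores into positive weights that sum to 1.
    exps = [math.exp(s) for s in raw]
    total = sum(exps)
    return [e / total for e in exps]

weights = attention_weights("cat")
for token, weight in zip(tokens, weights):
    print(f"{token:>4}  {weight:.2f}")
```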

Every single token does this simultaneously

6 tokens × 6 attention scores = 36 computations.

This creates an attention matrix

Every cell is one computation. n tokens = n² computations.
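Extending the single-query sketch to every token gives the full matrix: one softmax row per query token, one column per key token. The toy vectors are again invented; only the shape of the computation matters here.

```python
import math

# A sketch of the full n x n attention matrix: every token is
# scored against every token. Toy 2-d vectors stand in for
# learned embeddings.
vectors = [[0.1, 0.3], [0.9, 0.2], [0.4, 0.8],
           [0.2, 0.1], [0.1, 0.3], [0.8, 0.3]]

def softmax(row):
    exps = [math.exp(x) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# One row per query token, one column per key token: n * n cells.
matrix = [
    softmax([sum(a * b for a, b in zip(q, k)) for k in vectors])
    for q in vectors
]

cells = len(matrix) * len(matrix[0])
print(cells)  # 6 * 6 = 36 score cells
```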

The Quadratic Bottleneck

As context size grows, compute grows quadratically: the fundamental wall of self-attention.
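The quadratic wall is easy to see with back-of-the-envelope arithmetic: doubling the context length quadruples the number of attention score cells.

```python
# Illustrative arithmetic only: number of attention score cells
# (n * n) at a few context lengths, showing quadratic growth.
for n in [1_000, 10_000, 100_000, 1_000_000]:
    print(f"{n:>9,} tokens -> {n * n:>22,} attention cells")
```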