"The cat sat on the mat"

Token  Position
The    1
cat    2
sat    3
on     4
the    5
mat    6

6 tokens × 6 tokens = 36 computations
Self-Attention Matrix
(rows = query token, columns = key token; each row sums to 1)

        The   cat   sat   on    the   mat
The     0.30  0.25  0.10  0.10  0.15  0.10
cat     0.12  0.08  0.35  0.05  0.05  0.35
sat     0.06  0.34  0.10  0.10  0.05  0.35
on      0.05  0.05  0.10  0.10  0.30  0.40
the     0.10  0.05  0.05  0.30  0.10  0.40
mat     0.15  0.25  0.15  0.20  0.15  0.10
n × n = n²
O(n²) Compute Explosion

Recursive Language Models

The Next Evolution in AI Scaling

An AI doesn't read words; it tokenizes them

Tokens are the smallest atomic units a language model reads.
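The tokenization step can be sketched in a few lines of Python. Splitting on whitespace is a deliberate simplification for illustration: real models use subword tokenizers (e.g. BPE), so a single word may map to several tokens.

```python
# A minimal sketch: whitespace splitting stands in for a real
# subword tokenizer (such as BPE) purely for illustration.
sentence = "The cat sat on the mat"
tokens = sentence.split()
for position, token in enumerate(tokens, start=1):
    print(f"{token:>4}  {position}")
```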

The model computes an attention score

One score is computed between "cat" and every other token. Higher score = more relevant context.
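The scoring step can be sketched with a dot product followed by a softmax. The 2-d embeddings below are made-up toy values, not the numbers in the matrix above; real models derive query and key vectors from learned weight matrices in far higher dimensions.

```python
import math

# A toy sketch of attention scoring. The embeddings are invented
# illustration values, not learned parameters.
tokens = ["The", "cat", "sat", "on", "the", "mat"]
embeddings = {
    "The": [0.1, 0.3], "cat": [0.9, 0.2], "sat": [0.4, 0.8],
    "on":  [0.2, 0.1], "the": [0.1, 0.3], "mat": [0.8, 0.3],
}

def attention_weights(query_token):
    """Score one query token against every token, then normalize."""
    q = embeddings[query_token]
    # Raw relevance: dot product of the query with each token's vector.
    raw = [sum(a * b for a, b in zip(q, embeddings[t])) for t in tokens]
    # Softmax turns raw scores into positive weights that sum to 1.
    exps = [math.exp(s) for s in raw]
    total = sum(exps)
    return [e / total for e in exps]

weights = attention_weights("cat")
for token, weight in zip(tokens, weights):
    print(f"{token:>4}  {weight:.2f}")
```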

Every single token does this simultaneously

6 tokens × 6 attention scores = 36 computations.

This creates an attention matrix

Every cell is one computation. n tokens = n² computations.
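Extending the single-query sketch to every token gives the full matrix: one softmax row per query token, one column per key token. The toy vectors are again invented; only the shape of the computation matters here.

```python
import math

# A sketch of the full n x n attention matrix: every token is
# scored against every token. Toy 2-d vectors stand in for
# learned embeddings.
vectors = [[0.1, 0.3], [0.9, 0.2], [0.4, 0.8],
           [0.2, 0.1], [0.1, 0.3], [0.8, 0.3]]

def softmax(row):
    exps = [math.exp(x) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# One row per query token, one column per key token: n * n cells.
matrix = [
    softmax([sum(a * b for a, b in zip(q, k)) for k in vectors])
    for q in vectors
]

cells = len(matrix) * len(matrix[0])
print(cells)  # 6 * 6 = 36 score cells
```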

The Quadratic Bottleneck

As context size grows, compute grows quadratically: the fundamental wall of self-attention.
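The quadratic wall is easy to see with back-of-the-envelope arithmetic: doubling the context length quadruples the number of attention score cells.

```python
# Illustrative arithmetic only: number of attention score cells
# (n * n) at a few context lengths, showing quadratic growth.
for n in [1_000, 10_000, 100_000, 1_000_000]:
    print(f"{n:>9,} tokens -> {n * n:>22,} attention cells")
```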