Seven is enough

I went down the rabbit hole once, trying to figure out why numpy's matrix multiplication is so fast. The trail ends up at a library called BLAS, and following that name reveals one axis of the answer. Hardware-side optimisations like cache-tiling, SIMD, parallelism. There's another axis tho. The algorithmic cost model itself turns out to be lower than what we were first taught. Matrix multiplication is not $O(n^3)$ .

More precisely, the best algorithm of our era runs in $O(n^{2.371339})$ . The era when $n^3$ looked like the obvious answer lasted until 1969. After that, fifty years of human work shaved roughly 0.6 off the exponent. Then in 2022, an RL agent left a fresh footprint along the trail. We still haven't reached the end of it.

This essay walks back to where the trail begins. To Volker Strassen's short 1969 discovery. The discovery was the act of removing just one multiplication, and that one removal shaped fifty years of the landscape that followed. The starting point is a handful of algebra you can verify by hand.

Say we want to multiply two $2 \times 2$ matrices. By definition, each entry of the result is the sum of products from one row of $A$ and one column of $B$ .

\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}

Count the scalar multiplications and you get 4 entries $\times 2 = 8$ . The natural answer. Open any textbook and you'll find the same number, because the definition of matrix multiplication forces it.

A short 1969 paper breaks that naturalness. Volker Strassen's Gaussian Elimination is not Optimal, three pages long. He shows how to multiply two $2 \times 2$ matrices using only seven multiplications.

First he constructs seven intermediate products.

M_1 = (A_{11} + A_{22})(B_{11} + B_{22})

M_2 = (A_{21} + A_{22}) B_{11}

M_3 = A_{11} (B_{12} - B_{22})

M_4 = A_{22} (B_{21} - B_{11})

M_5 = (A_{11} + A_{12}) B_{22}

M_6 = (A_{21} - A_{11}) (B_{11} + B_{12})

M_7 = (A_{12} - A_{22}) (B_{21} + B_{22})

Each $M_i$ uses one multiplication. Additions and subtractions are cheap relative to multiplication, so we don't count them. This follows a hardware model in which multiplication is the dominant cost for large matrices.

The four entries of the result are then assembled from these $M_i$ using additions and subtractions.

C_{11} = M_1 + M_4 - M_5 + M_7

C_{12} = M_3 + M_5

C_{21} = M_2 + M_4

C_{22} = M_1 - M_2 + M_3 + M_6

Seven multiplications. That's it.

The first time you see these seven formulas, they look arbitrary. Why does $M_1 = (A_{11} + A_{22})(B_{11} + B_{22})$ even show up? Why pair $A_{11} + A_{22}$ with $B_{11} + B_{22}$ ? There's no natural intuition that leads here.

And yet it works. The careful cancellation across these seven products yields the correct result. That's the elegance of Strassen's discovery. It isn't intuitive but it's right. And once you have the result, you can trace why it's right by hand.

Checking the algebra

Let's actually expand one entry by hand to see if the seven formulas really produce $C = AB$ . Take $C_{11}$ .

C_{11} = M_1 + M_4 - M_5 + M_7

Substituting each $M_i$ by its definition.

C_{11} = (A_{11} + A_{22})(B_{11} + B_{22}) + A_{22}(B_{21} - B_{11}) - (A_{11} + A_{12})B_{22} + (A_{12} - A_{22})(B_{21} + B_{22})

Expanding all the terms.

\begin{aligned} C_{11} &= A_{11}B_{11} + A_{11}B_{22} + A_{22}B_{11} + A_{22}B_{22} \\ &\quad + A_{22}B_{21} - A_{22}B_{11} \\ &\quad - A_{11}B_{22} - A_{12}B_{22} \\ &\quad + A_{12}B_{21} + A_{12}B_{22} - A_{22}B_{21} - A_{22}B_{22} \end{aligned}

Now collect like terms to see what cancels.

$A_{11}B_{11}$ : appears once in line 1 → Survives

$A_{11}B_{22}$ : +1 in line 1, -1 in line 3. → Cancels

$A_{22}B_{11}$ : +1 in line 1, -1 in line 2 → Cancels

$A_{22}B_{22}$ : +1 in line 1, -1 in line 4 → Cancels

$A_{22}B_{21}$ : +1 in line 2, -1 in line 4 → Cancels

$A_{12}B_{22}$ : -1 in line 3, +1 in line 4 → Cancels

$A_{12}B_{21}$ : appears once in line 4 → Survives

Collecting only the survivors.

C_{11} = A_{11}B_{11} + A_{12}B_{21}

This matches the definition of $C_{11}$ exactly.

The other three entries verify the same way. What used to be expressed with eight multiplications is reconstructed exactly from combinations of seven, through addition and subtraction. Follow the hand calculation through and you start to see how this cancellation works.

But hand calculation only shows correctness. It doesn't show the path to discovery. How Strassen found these seven formulas is a separate question. There's no intuitive derivation, no accident of luck. From the tensor rank perspective, there's a theorem that the rank of the $2 \times 2$ matrix multiplication tensor is exactly 7. Hopcroft and Kerr proved it in 1971, and Landsberg and others extended the result later. But knowing the rank and finding the decomposition that achieves it are different problems. The path of discovery hides behind the algebra.

Doing all four entries by hand gets tedious, so let's also confirm with code.

Python

agree: 100 / 100

For all 100 random matrices, the result of the seven-product Strassen formula agrees with the definition. Doing it by hand showed the cancellation algebraically for one entry. The code confirms empirically over random matrices that the same cancellation holds throughout.

The weight of recursion

At the $2 \times 2$ level the reduction from 8 to 7 is just one. $7/8 = 0.875$ , about 12.5% faster. Interesting, but it doesn't really change the bigger picture of matrix multiplication.

Things shift once you apply it recursively.

Cut an $n \times n$ matrix (assume $n$ is power of 2) into four $(n/2) \times (n/2)$ blocks.

A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}

Each $A_{ij}$ and $B_{ij}$ is now a smaller matrix. Apply Strassen's seven-product formulas at the block level. For example, in $M1 = (A_{11} + A_{22})(B_{11} + B_{22})$ , the $A_{11} + A_{22}$ is a matrix addition between small matrices, and the product of that result with $B_{11} + B_{22}$ is a matrix multiplication between small matrices.

The key is that we can call Strassen recursively. Computing each $M_i$ requires one matrix multiplication at half the size, and that smaller multiplication can also be done by Strassen, yielding seven even smaller multiplications. $\frac{n}{2} \to \frac{n}{4} \to \frac{n}{8} \to \cdots \to 1$

A recurrence falls out naturally. Let $T(n)$ be the number of scalar multiplications needed to multiply $n \times n$ matrices.

T(n) = 7 \cdot T\left(\frac{n}{2}\right) + O(n^2)

Seven smaller matrix multiplications plus $O(n^2)$ additions and subtractions. Asymptotically, the dominant term is the multiplication count.

The Master theorem solves this.

T(n) = O\left(n^{\log_2 7}\right) \approx O(n^{2.807})

Compared to the naive $O(n^3)$ , the exponent drops from 3 to 2.807.

How big is this one-line change? At $n = 1000$ , the naive count is roughly $10^9$ multiplications, while Strassen drops it to roughly $1000^{2.807} \approx 4 \times 10^8$ . Roughly 2.5x difference. As $n$ grows the gap widens exponentially. A single removed multiplication, amplified through recursion.

Let's actually count the multiplications in code.

Python

     n         naive      strassen     ratio
------  ------------  ------------  --------
     2             8             7     1.14x
     4            64            49     1.31x
     8           512           343     1.49x
    16          4096          2401     1.71x
    32         32768         16807     1.95x
    64        262144        117649     2.23x
   128       2097152        823543     2.55x
   256      16777216       5764801     2.91x

Each time $n$ doubles, the ratio grows by roughly $8/7 \approx 1.143$ . A small one-time reduction blooms across an exponential lattice. That's the weight of recursion.

One practical caveat. In real libraries, Strassen typically starts outperforming the naive algorithm only around $n \approx 1000$ . For smaller $n$ , the addition overhead and poor cache locality make the naive method faster. The table above counts multiplications only, not wall time. For wall time, hardware-side optimisations in BLAS like cache-tiling, SIMD, and parallelism matter more than the multiplication count.

Fifty years of chasing $\omega$

After Strassen's discovery, mathematicians start chasing the matrix multiplication complexity exponent $\omega$ . That's the smallest exponent such that some algorithm runs in $O(n^\omega)$ . Fifty years of tightening the upper bound.

Year	Discoverer(s)	Bound $\omega \leq$	Key technique
1969	Strassen	$\log_2 7 \approx 2.807$	7 product for 2*2
1978	Pan	2.781	partial matrix multiplication
1979	Bini et al.	2.78	approximate matrix mult, border rank
1981	Schönhage	2.522	τ-theorem, partial decomposition
1986	Strassen	2.479	early laser method
1990	Coppersmith, Winograd	2.376	arithmetic progressions + laser(held 30 years)
2014	Le Gall	2.3728639	tighter laser method analysis
2020	Alman, Williams	2.3728596	refined laser method(SODA 2021)
2022	Duan, Wu, Zhou	2.371866	asymmetric hashing
2024	Williams, Xu, Xu, Zhou	2.371552	new laser method variant(SODA 2024)
2024	Alman, Duan, Williams, Xu, Xu, Zhou	2.371339	further asymmetry(SODA 2025)

Each improvement isn't a simple subtraction like 7 → 6. It's a refinement of asymptotic analysis. Accumulated algebra about how to construct tensor decompositions better in the large $n$ limit. Coppersmith-Winograd's 2.376 holding as the best result for 24 years speaks to the depth of that analysis.

A particularly interesting moment is the silence between 1990 and 2014. The reading at the time was that Coppersmith-Winograd's laser method had given everything it could. Then Le Gall 2014 shaved off another 0.003 with a finer analysis of the same method. After 24 years of quiet, this was the discovery that more could be squeezed out within the method's limits. After that came 2020's refined laser by Alman and Williams, 2022's asymmetric hashing by Duan, Wu, and Zhou, 2024's new variant by Williams, Xu, Xu, and Zhou, and the same year's further asymmetry by Alman, Duan, and four others. Each shaved a little more, landing us at the current best of $\omega \leq 2.371339$ .

The mathematical grain of this march is an accumulation of asymptotic function analysis pointing toward the true value of $\omega$ . Not intuitive algebra, but precise tracking of limits. Fifty years of human work, poured into a 0.43 improvement in the exponent.

One note on the lower-bound side. The trivial lower bound is $\omega \geq 2$ . Every entry has to be read at least once, so at least $n^2$ operations are required. Since Winograd's 1979 result, no tighter lower bound has been found. So we've been shaving the upper bound while the lower bound has stayed put for 50 years.

AlphaTensor 2022, the moment AI becomes a discoverer

In October 2022, DeepMind publishes a paper in Nature titled Discovering faster matrix multiplication algorithms with reinforcement learning.

The core idea is to reduce matrix multiplication to a tensor decomposition problem. An $n \times n$ matrix multiplication is represented as a 3-dimensional tensor $T_n$ with $n^3$ entries, and the rank of this tensor equals the minimum number of multiplications needed for that multiplication. Strassen's seven-product discovery is exactly equivalent to the fact that the rank of the $2 \times 2$ matrix multiplication tensor $T_2$ is at most 7.

Finding a way to make the tensor's rank lower than what's known is the same as making a faster algorithm. But finding tensor rank is NP-hard even for fixed nn. Brute-force search doesn't work because the solution space grows exponentially.

DeepMind's approach treats tensor decomposition as a game. They train an RL agent from the AlphaZero family, called AlphaTensor, to play it. Each turn, the agent picks one rank-1 component and subtracts it from the tensor. The game ends when the tensor reaches zero. Fewer moves mean a lower-rank decomposition.

Here's what AlphaTensor found, in the paper's Fig. 3 and Supplementary.

Case	Previous best	AlphaTensor	Note
$4{\times}4 \cdot 4{\times}4$ (standard)	49 (Strassen recursion)	47	record broken after 50 years
$4{\times}5 \cdot 5{\times}5$ (standard)	80	76
$5{\times}5 \cdot 5{\times}5$ (mod 2)	98	96
$4{\times}4 \cdot 4{\times}4$ (mod 2)	49	47

AlphaTensor also matched or beat the previous best in dozens of other cases.

Three nuances to mark here.

First, mod 2 arithmetic. Matrix multiplication over the boolean field $\mathbb{F}_2$ is not the same as multiplication over the reals. In booleans, $-1 = 1$ , so signs disappear. Over the reals, signs matter. So some of the paper's mod 2 results can't be ported straight into practical matrix multiplication libraries. The $4 \times 4$ 47-product formula works over standard arithmetic, but a few cases hold only over mod 2.

Next, AlphaTensor didn't crack the exponent $\omega$ itself. It found better decompositions for fixed $n$ ( $4 \times 4$ , $5 \times 5$ , and so on), but the asymptotic upper bound on $\omega$ stays where humans left it in 2024, at 2.371339. What AlphaTensor found were concrete decompositions at small $n$ , not improvements to the asymptotic estimate of $\omega$ . Two kinds of discovery on different layers.

Finally, the discovered decompositions don't show algebraic patterns a human can read. Strassen's seven-product formula can be hand-traced for why it works, through cancellation we walked through earlier. The human-found 49-product Strassen-Smirnov is similar, with the pattern visible. AlphaTensor's 47-product formula has no such visible pattern. It's just what the RL agent found, and the intuition for why 47 is enough stays opaque. Verification still works. Plug in random mod 2 matrices and you get exactly $AB$ . But the algebra of discovery is buried inside the training curve.

Still, AI broke a 50-year human record. The 49-product decomposition for $4 \times 4$ dates back to Strassen 1969, his $2 \times 2$ trick applied recursively, with Smirnov refining it in 1986. For over fifty years, no human found a faster way. The RL agent pinned down a 47-product formula after a few days of training. The 47 formulas themselves are open in the paper's supplementary, so anyone can verify them over mod 2.

What we don't know

The true value of $\omega$ stays unknown.

Lower bound: $\omega \geq 2$ , the naive lower bound. Every entry has to be read at least once, so at least $n^2$ operations are required. Winograd formalised this in 1979, and no tighter lower bound has been found since.

Upper bound is $\omega \leq 2.371339$

Gap is about 0.371339.

We don't know what lives inside that gap. The question of how fast matrix multiplication can intrinsically be has stayed open for over 50 years.

The march of the upper bound may converge at some point, or it may keep dropping. If $\omega$ turns out to be the true answer, the strong $\omega$ conjecture, it means every matrix multiplication runs as fast as just reading the entries. A hard conclusion to accept, but 50 years of falling upper bounds point that way.

If $\omega > 2$ , meaning a strict lower bound exists, that lower bound has to be proved. We don't even know what such a proof would look like. The existing tools of circuit complexity and communication complexity don't apply to this problem directly.

AlphaTensor pinned down one thing in particular. Even at small $n$ , the decompositions we believed were best may not be optimal. The 49-product formula for $4 \times 4$ was best for over fifty years, but it wasn't actually the best. We just hadn't found better. What we call the upper bound may, in fact, be lower than we know.

The path toward $\omega$ 's true value is no longer a human-only path. What 50 years of pursuit have shown is the shape of the path, not its endpoint. The endpoint is still somewhere we don't know.

n naive strassen ratio ------ ------------ ------------ -------- 2 8 7 1.14x 4 64 49 1.31x 8 512 343 1.49x 16 4096 2401 1.71x 32 32768 16807 1.95x 64 262144 117649 2.23x 128 2097152 823543 2.55x 256 16777216 5764801 2.91x

Year

Discoverer(s)

Bound

\omega \leq

Key technique

1969

Strassen

\log_2 7 \approx 2.807

7 product for 2*2

1978

Pan

2.781

partial matrix multiplication

1979

Bini et al.

2.78

approximate matrix mult, border rank

1981

Schönhage

2.522

τ-theorem, partial decomposition

1986

Strassen

2.479

early laser method

1990

Coppersmith, Winograd

2.376

arithmetic progressions + laser(held 30 years)

2014

Le Gall

2.3728639

tighter laser method analysis

2020

Alman, Williams

2.3728596

refined laser method(SODA 2021)

2022

Duan, Wu, Zhou

2.371866

asymmetric hashing

2024

Williams, Xu, Xu, Zhou

2.371552

new laser method variant(SODA 2024)

2024

Alman, Duan, Williams, Xu, Xu, Zhou

2.371339

further asymmetry(SODA 2025)

Case

Previous best

AlphaTensor

Note

4{\times}4 \cdot 4{\times}4

(standard)

49 (Strassen recursion)

record broken after 50 years

4{\times}5 \cdot 5{\times}5

(standard)

5{\times}5 \cdot 5{\times}5

(mod 2)

4{\times}4 \cdot 4{\times}4

(mod 2)

Checking the algebra

The weight of recursion

Fifty years of chasing ω\omegaω

AlphaTensor 2022, the moment AI becomes a discoverer

What we don't know

Checking the algebra

The weight of recursion

Fifty years of chasing ω\omegaω

AlphaTensor 2022, the moment AI becomes a discoverer

What we don't know

Fifty years of chasing $\omega$

Fifty years of chasing $\omega$