DokaLab
  • About
  • Blog
  • Contact
DokaLab
  • About
  • Blog
  • Contact

© 2026 DokaLab · Soho, London · No tracking, no sign-ups.

Set in Inter, JetBrains Mono, and Bricolage Grotesque.

AboutContactPrivacy

Formal›Algorithms

Seven is enough

2026-05-1512 min
  • AI Discovery
  • Complexity
  • Matrix Multiplication

I went down the rabbit hole once, trying to figure out why numpy's matrix multiplication is so fast. The trail ends up at a library called BLAS, and following that name reveals one axis of the answer. Hardware-side optimisations like cache-tiling, SIMD, parallelism. There's another axis tho. The algorithmic cost model itself turns out to be lower than what we were first taught. Matrix multiplication is not O(n3)O(n^3)O(n3).

More precisely, the best algorithm of our era runs in O(n2.371339)O(n^{2.371339})O(n2.371339). The era when n3n^3n3 looked like the obvious answer lasted until 1969. After that, fifty years of human work shaved roughly 0.6 off the exponent. Then in 2022, an RL agent left a fresh footprint along the trail. We still haven't reached the end of it.

This essay walks back to where the trail begins. To Volker Strassen's short 1969 discovery. The discovery was the act of removing just one multiplication, and that one removal shaped fifty years of the landscape that followed. The starting point is a handful of algebra you can verify by hand.

Say we want to multiply two 2×22 \times 22×2 matrices. By definition, each entry of the result is the sum of products from one row of AAA and one column of BBB.

(A11A12A21A22)(B11B12B21B22)=(A11B11+A12B21A11B12+A12B22A21B11+A22B21A21B12+A22B22)\begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} = \begin{pmatrix} A_{11}B_{11} + A_{12}B_{21} & A_{11}B_{12} + A_{12}B_{22} \\ A_{21}B_{11} + A_{22}B_{21} & A_{21}B_{12} + A_{22}B_{22} \end{pmatrix}(A11​A21​​A12​A22​​)(B11​B21​​B12​B22​​)=(A11​B11​+A12​B21​A21​B11​+A22​B21​​A11​B12​+A12​B22​A21​B12​+A22​B22​​)

Count the scalar multiplications and you get 4 entries ×2=8\times 2 = 8×2=8. The natural answer. Open any textbook and you'll find the same number, because the definition of matrix multiplication forces it.

A short 1969 paper breaks that naturalness. Volker Strassen's Gaussian Elimination is not Optimal, three pages long. He shows how to multiply two 2×22 \times 22×2 matrices using only seven multiplications.

First he constructs seven intermediate products.

M1=(A11+A22)(B11+B22)M_1 = (A_{11} + A_{22})(B_{11} + B_{22})M1​=(A11​+A22​)(B11​+B22​)
M2=(A21+A22)B11M_2 = (A_{21} + A_{22}) B_{11}M2​=(A21​+A22​)B11​
M3=A11(B12−B22)M_3 = A_{11} (B_{12} - B_{22})M3​=A11​(B12​−B22​)
M4=A22(B21−B11)M_4 = A_{22} (B_{21} - B_{11})M4​=A22​(B21​−B11​)
M5=(A11+A12)B22M_5 = (A_{11} + A_{12}) B_{22}M5​=(A11​+A12​)B22​
M6=(A21−A11)(B11+B12)M_6 = (A_{21} - A_{11}) (B_{11} + B_{12})M6​=(A21​−A11​)(B11​+B12​)
M7=(A12−A22)(B21+B22)M_7 = (A_{12} - A_{22}) (B_{21} + B_{22})M7​=(A12​−A22​)(B21​+B22​)

Each MiM_iMi​ uses one multiplication. Additions and subtractions are cheap relative to multiplication, so we don't count them. This follows a hardware model in which multiplication is the dominant cost for large matrices.

The four entries of the result are then assembled from these MiM_iMi​ using additions and subtractions.

C11=M1+M4−M5+M7C_{11} = M_1 + M_4 - M_5 + M_7C11​=M1​+M4​−M5​+M7​
C12=M3+M5C_{12} = M_3 + M_5C12​=M3​+M5​
C21=M2+M4C_{21} = M_2 + M_4C21​=M2​+M4​
C22=M1−M2+M3+M6C_{22} = M_1 - M_2 + M_3 + M_6C22​=M1​−M2​+M3​+M6​

Seven multiplications. That's it.

The first time you see these seven formulas, they look arbitrary. Why does M1=(A11+A22)(B11+B22)M_1 = (A_{11} + A_{22})(B_{11} + B_{22})M1​=(A11​+A22​)(B11​+B22​) even show up? Why pair A11+A22A_{11} + A_{22}A11​+A22​ with B11+B22B_{11} + B_{22}B11​+B22​? There's no natural intuition that leads here.

And yet it works. The careful cancellation across these seven products yields the correct result. That's the elegance of Strassen's discovery. It isn't intuitive but it's right. And once you have the result, you can trace why it's right by hand.

Checking the algebra

Let's actually expand one entry by hand to see if the seven formulas really produce C=ABC = ABC=AB. Take C11C_{11}C11​.

C11=M1+M4−M5+M7C_{11} = M_1 + M_4 - M_5 + M_7C11​=M1​+M4​−M5​+M7​

Substituting each MiM_iMi​ by its definition.

C11=(A11+A22)(B11+B22)+A22(B21−B11)−(A11+A12)B22+(A12−A22)(B21+B22)C_{11} = (A_{11} + A_{22})(B_{11} + B_{22}) + A_{22}(B_{21} - B_{11}) - (A_{11} + A_{12})B_{22} + (A_{12} - A_{22})(B_{21} + B_{22})C11​=(A11​+A22​)(B11​+B22​)+A22​(B21​−B11​)−(A11​+A12​)B22​+(A12​−A22​)(B21​+B22​)

Expanding all the terms.

C11=A11B11+A11B22+A22B11+A22B22+A22B21−A22B11−A11B22−A12B22+A12B21+A12B22−A22B21−A22B22\begin{aligned} C_{11} &= A_{11}B_{11} + A_{11}B_{22} + A_{22}B_{11} + A_{22}B_{22} \\ &\quad + A_{22}B_{21} - A_{22}B_{11} \\ &\quad - A_{11}B_{22} - A_{12}B_{22} \\ &\quad + A_{12}B_{21} + A_{12}B_{22} - A_{22}B_{21} - A_{22}B_{22} \end{aligned}C11​​=A11​B11​+A11​B22​+A22​B11​+A22​B22​+A22​B21​−A22​B11​−A11​B22​−A12​B22​+A12​B21​+A12​B22​−A22​B21​−A22​B22​​

Now collect like terms to see what cancels.

A11B11A_{11}B_{11}A11​B11​: appears once in line 1 → Survives

A11B22A_{11}B_{22}A11​B22​: +1 in line 1, -1 in line 3. → Cancels

A22B11A_{22}B_{11}A22​B11​: +1 in line 1, -1 in line 2 → Cancels

A22B22A_{22}B_{22}A22​B22​: +1 in line 1, -1 in line 4 → Cancels

A22B21A_{22}B_{21}A22​B21​: +1 in line 2, -1 in line 4 → Cancels

A12B22A_{12}B_{22}A12​B22​: -1 in line 3, +1 in line 4 → Cancels

A12B21A_{12}B_{21}A12​B21​: appears once in line 4 → Survives

Collecting only the survivors.

C11=A11B11+A12B21C_{11} = A_{11}B_{11} + A_{12}B_{21}C11​=A11​B11​+A12​B21​

This matches the definition of C11C_{11}C11​ exactly.

The other three entries verify the same way. What used to be expressed with eight multiplications is reconstructed exactly from combinations of seven, through addition and subtraction. Follow the hand calculation through and you start to see how this cancellation works.

But hand calculation only shows correctness. It doesn't show the path to discovery. How Strassen found these seven formulas is a separate question. There's no intuitive derivation, no accident of luck. From the tensor rank perspective, there's a theorem that the rank of the 2×22 \times 22×2 matrix multiplication tensor is exactly 7. Hopcroft and Kerr proved it in 1971, and Landsberg and others extended the result later. But knowing the rank and finding the decomposition that achieves it are different problems. The path of discovery hides behind the algebra.

Doing all four entries by hand gets tedious, so let's also confirm with code.

Python
123456789101112131415161718192021222324252627
agree: 100 / 100

For all 100 random matrices, the result of the seven-product Strassen formula agrees with the definition. Doing it by hand showed the cancellation algebraically for one entry. The code confirms empirically over random matrices that the same cancellation holds throughout.

The weight of recursion

At the 2×22 \times 22×2 level the reduction from 8 to 7 is just one. 7/8=0.8757/8 = 0.8757/8=0.875, about 12.5% faster. Interesting, but it doesn't really change the bigger picture of matrix multiplication.

Things shift once you apply it recursively.

Cut an n×nn \times nn×n matrix (assume nnn is power of 2) into four (n/2)×(n/2)(n/2) \times (n/2)(n/2)×(n/2) blocks.

A=(A11A12A21A22),B=(B11B12B21B22)A = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix}, \quad B = \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix}A=(A11​A21​​A12​A22​​),B=(B11​B21​​B12​B22​​)

Each AijA_{ij}Aij​ and BijB_{ij}Bij​ is now a smaller matrix. Apply Strassen's seven-product formulas at the block level. For example, in M1=(A11+A22)(B11+B22)M1 = (A_{11} + A_{22})(B_{11} + B_{22})M1=(A11​+A22​)(B11​+B22​), the A11+A22A_{11} + A_{22}A11​+A22​ is a matrix addition between small matrices, and the product of that result with B11+B22B_{11} + B_{22}B11​+B22​ is a matrix multiplication between small matrices.

The key is that we can call Strassen recursively. Computing each MiM_iMi​ requires one matrix multiplication at half the size, and that smaller multiplication can also be done by Strassen, yielding seven even smaller multiplications. n2→n4→n8→⋯→1\frac{n}{2} \to \frac{n}{4} \to \frac{n}{8} \to \cdots \to 12n​→4n​→8n​→⋯→1

A recurrence falls out naturally. Let T(n)T(n)T(n) be the number of scalar multiplications needed to multiply n×nn \times nn×n matrices.

T(n)=7⋅T(n2)+O(n2)T(n) = 7 \cdot T\left(\frac{n}{2}\right) + O(n^2)T(n)=7⋅T(2n​)+O(n2)

Seven smaller matrix multiplications plus O(n2)O(n^2)O(n2) additions and subtractions. Asymptotically, the dominant term is the multiplication count.

The Master theorem solves this.

T(n)=O(nlog⁡27)≈O(n2.807)T(n) = O\left(n^{\log_2 7}\right) \approx O(n^{2.807})T(n)=O(nlog2​7)≈O(n2.807)

Compared to the naive O(n3)O(n^3)O(n3), the exponent drops from 3 to 2.807.

How big is this one-line change? At n=1000n = 1000n=1000, the naive count is roughly 10910^9109 multiplications, while Strassen drops it to roughly 10002.807≈4×1081000^{2.807} \approx 4 \times 10^810002.807≈4×108. Roughly 2.5x difference. As nnn grows the gap widens exponentially. A single removed multiplication, amplified through recursion.

Let's actually count the multiplications in code.

Python
1234567891011121314151617
     n         naive      strassen     ratio
------  ------------  ------------  --------
     2             8             7     1.14x
     4            64            49     1.31x
     8           512           343     1.49x
    16          4096          2401     1.71x
    32         32768         16807     1.95x
    64        262144        117649     2.23x
   128       2097152        823543     2.55x
   256      16777216       5764801     2.91x

Each time nnn doubles, the ratio grows by roughly 8/7≈1.1438/7 \approx 1.1438/7≈1.143. A small one-time reduction blooms across an exponential lattice. That's the weight of recursion.

One practical caveat. In real libraries, Strassen typically starts outperforming the naive algorithm only around n≈1000n \approx 1000n≈1000. For smaller nnn, the addition overhead and poor cache locality make the naive method faster. The table above counts multiplications only, not wall time. For wall time, hardware-side optimisations in BLAS like cache-tiling, SIMD, and parallelism matter more than the multiplication count.

Fifty years of chasing ω\omegaω

After Strassen's discovery, mathematicians start chasing the matrix multiplication complexity exponent ω\omegaω. That's the smallest exponent such that some algorithm runs in O(nω)O(n^\omega)O(nω). Fifty years of tightening the upper bound.

Year
Discoverer(s)
Bound ω≤\omega \leqω≤
Key technique
1969
Strassen
log⁡27≈2.807\log_2 7 \approx 2.807log2​7≈2.807
7 product for 2*2
1978
Pan
2.781
partial matrix multiplication
1979
Bini et al.
2.78
approximate matrix mult, border rank
1981
Schönhage
2.522
τ-theorem, partial decomposition
1986
Strassen
2.479
early laser method
1990
Coppersmith, Winograd
2.376
arithmetic progressions + laser(held 30 years)
2014
Le Gall
2.3728639
tighter laser method analysis
2020
Alman, Williams
2.3728596
refined laser method(SODA 2021)
2022
Duan, Wu, Zhou
2.371866
asymmetric hashing
2024
Williams, Xu, Xu, Zhou
2.371552
new laser method variant(SODA 2024)
2024
Alman, Duan, Williams, Xu, Xu, Zhou
2.371339
further asymmetry(SODA 2025)

Each improvement isn't a simple subtraction like 7 → 6. It's a refinement of asymptotic analysis. Accumulated algebra about how to construct tensor decompositions better in the large nnn limit. Coppersmith-Winograd's 2.376 holding as the best result for 24 years speaks to the depth of that analysis.

A particularly interesting moment is the silence between 1990 and 2014. The reading at the time was that Coppersmith-Winograd's laser method had given everything it could. Then Le Gall 2014 shaved off another 0.003 with a finer analysis of the same method. After 24 years of quiet, this was the discovery that more could be squeezed out within the method's limits. After that came 2020's refined laser by Alman and Williams, 2022's asymmetric hashing by Duan, Wu, and Zhou, 2024's new variant by Williams, Xu, Xu, and Zhou, and the same year's further asymmetry by Alman, Duan, and four others. Each shaved a little more, landing us at the current best of ω≤2.371339\omega \leq 2.371339ω≤2.371339.

The mathematical grain of this march is an accumulation of asymptotic function analysis pointing toward the true value of ω\omegaω. Not intuitive algebra, but precise tracking of limits. Fifty years of human work, poured into a 0.43 improvement in the exponent.

One note on the lower-bound side. The trivial lower bound is ω≥2\omega \geq 2ω≥2. Every entry has to be read at least once, so at least n2n^2n2 operations are required. Since Winograd's 1979 result, no tighter lower bound has been found. So we've been shaving the upper bound while the lower bound has stayed put for 50 years.

AlphaTensor 2022, the moment AI becomes a discoverer

In October 2022, DeepMind publishes a paper in Nature titled Discovering faster matrix multiplication algorithms with reinforcement learning.

The core idea is to reduce matrix multiplication to a tensor decomposition problem. An n×nn \times nn×n matrix multiplication is represented as a 3-dimensional tensor TnT_nTn​ with n3n^3n3 entries, and the rank of this tensor equals the minimum number of multiplications needed for that multiplication. Strassen's seven-product discovery is exactly equivalent to the fact that the rank of the 2×22 \times 22×2 matrix multiplication tensor T2T_2T2​ is at most 7.

Finding a way to make the tensor's rank lower than what's known is the same as making a faster algorithm. But finding tensor rank is NP-hard even for fixed nn. Brute-force search doesn't work because the solution space grows exponentially.

DeepMind's approach treats tensor decomposition as a game. They train an RL agent from the AlphaZero family, called AlphaTensor, to play it. Each turn, the agent picks one rank-1 component and subtracts it from the tensor. The game ends when the tensor reaches zero. Fewer moves mean a lower-rank decomposition.

Here's what AlphaTensor found, in the paper's Fig. 3 and Supplementary.

Case
Previous best
AlphaTensor
Note
4×4⋅4×44{\times}4 \cdot 4{\times}44×4⋅4×4 (standard)
49 (Strassen recursion)
47
record broken after 50 years
4×5⋅5×54{\times}5 \cdot 5{\times}54×5⋅5×5 (standard)
80
76
5×5⋅5×55{\times}5 \cdot 5{\times}55×5⋅5×5 (mod 2)
98
96
4×4⋅4×44{\times}4 \cdot 4{\times}44×4⋅4×4 (mod 2)
49
47
AlphaTensor also matched or beat the previous best in dozens of other cases.

Three nuances to mark here.

First, mod 2 arithmetic. Matrix multiplication over the boolean field F2\mathbb{F}_2F2​ is not the same as multiplication over the reals. In booleans, −1=1-1 = 1−1=1, so signs disappear. Over the reals, signs matter. So some of the paper's mod 2 results can't be ported straight into practical matrix multiplication libraries. The 4×44 \times 44×4 47-product formula works over standard arithmetic, but a few cases hold only over mod 2.

Next, AlphaTensor didn't crack the exponent ω\omegaω itself. It found better decompositions for fixed nnn(4×44 \times 44×4, 5×55 \times 55×5, and so on), but the asymptotic upper bound on ω\omegaω stays where humans left it in 2024, at 2.371339. What AlphaTensor found were concrete decompositions at small nnn, not improvements to the asymptotic estimate of ω\omegaω. Two kinds of discovery on different layers.

Finally, the discovered decompositions don't show algebraic patterns a human can read. Strassen's seven-product formula can be hand-traced for why it works, through cancellation we walked through earlier. The human-found 49-product Strassen-Smirnov is similar, with the pattern visible. AlphaTensor's 47-product formula has no such visible pattern. It's just what the RL agent found, and the intuition for why 47 is enough stays opaque. Verification still works. Plug in random mod 2 matrices and you get exactly ABABAB. But the algebra of discovery is buried inside the training curve.

Still, AI broke a 50-year human record. The 49-product decomposition for 4×44 \times 44×4 dates back to Strassen 1969, his 2×22 \times 22×2 trick applied recursively, with Smirnov refining it in 1986. For over fifty years, no human found a faster way. The RL agent pinned down a 47-product formula after a few days of training. The 47 formulas themselves are open in the paper's supplementary, so anyone can verify them over mod 2.

What we don't know

The true value of ω\omegaω stays unknown.

Lower bound: ω≥2\omega \geq 2ω≥2, the naive lower bound. Every entry has to be read at least once, so at least n2n^2n2 operations are required. Winograd formalised this in 1979, and no tighter lower bound has been found since.

Upper bound is ω≤2.371339\omega \leq 2.371339ω≤2.371339

Gap is about 0.371339.

We don't know what lives inside that gap. The question of how fast matrix multiplication can intrinsically be has stayed open for over 50 years.

The march of the upper bound may converge at some point, or it may keep dropping. If ω\omegaω turns out to be the true answer, the strong ω\omegaω conjecture, it means every matrix multiplication runs as fast as just reading the entries. A hard conclusion to accept, but 50 years of falling upper bounds point that way.

If ω>2\omega > 2ω>2, meaning a strict lower bound exists, that lower bound has to be proved. We don't even know what such a proof would look like. The existing tools of circuit complexity and communication complexity don't apply to this problem directly.

AlphaTensor pinned down one thing in particular. Even at small nnn, the decompositions we believed were best may not be optimal. The 49-product formula for 4×44 \times 44×4 was best for over fifty years, but it wasn't actually the best. We just hadn't found better. What we call the upper bound may, in fact, be lower than we know.

The path toward ω\omegaω's true value is no longer a human-only path. What 50 years of pursuit have shown is the shape of the path, not its endpoint. The endpoint is still somewhere we don't know.

Contents

  • Checking the algebra
  • The weight of recursion
  • Fifty years of chasing \omega
  • AlphaTensor 2022, the moment AI becomes a discoverer
  • What we don't know