Neural Learning of Fast Matrix Multiplication Algorithms: A StrassenNet Approach

Abstract

Fast matrix multiplication can be framed as a search for low-rank decompositions of the matrix-multiplication tensor. We design a neural architecture, StrassenNet, which reproduces the Strassen algorithm for 2×2 multiplication. Across many independent runs the network always converges to a rank-7 tensor, thus numerically recovering Strassen's optimal algorithm. We then train the same architecture on 3×3 multiplication with rank r ∈ {19, …, 23}. Our experiments reveal a clear numerical threshold: models with r = 23 attain significantly lower validation error than those with r ≤ 22, suggesting that r = 23 could be the smallest effective rank of the 3×3 matrix-multiplication tensor. We also sketch an extension of the method to border-rank decompositions via an ε-parametrisation and report preliminary results consistent with the known bounds for the border rank of the 3×3 matrix-multiplication tensor.
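
To make the tensor-decomposition view concrete, the following minimal NumPy sketch checks that the classical Strassen algorithm is an exact rank-7 decomposition of the 2×2 matrix-multiplication tensor. It is an illustration only, not the StrassenNet architecture or its training procedure; the names `matmul_tensor`, `bilinear_multiply`, `U`, `V` and `W` are chosen here for the example, and matrices are assumed to be flattened in row-major order.

```python
import numpy as np

def matmul_tensor(n):
    """Tensor T with T[a, b, c] = 1 iff A-entry a times B-entry b contributes to C-entry c."""
    T = np.zeros((n * n, n * n, n * n), dtype=int)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                # C[i, j] += A[i, k] * B[k, j], with row-major flattening
                T[i * n + k, k * n + j, i * n + j] = 1
    return T

# Strassen's seven products M1..M7 as linear forms in (A11, A12, A21, A22) ...
U = np.array([[ 1, 0, 0,  1],   # M1: A11 + A22
              [ 0, 0, 1,  1],   # M2: A21 + A22
              [ 1, 0, 0,  0],   # M3: A11
              [ 0, 0, 0,  1],   # M4: A22
              [ 1, 1, 0,  0],   # M5: A11 + A12
              [-1, 0, 1,  0],   # M6: A21 - A11
              [ 0, 1, 0, -1]])  # M7: A12 - A22
# ... and in (B11, B12, B21, B22)
V = np.array([[ 1, 0, 0,  1],   # M1: B11 + B22
              [ 1, 0, 0,  0],   # M2: B11
              [ 0, 1, 0, -1],   # M3: B12 - B22
              [-1, 0, 1,  0],   # M4: B21 - B11
              [ 0, 0, 0,  1],   # M5: B22
              [ 1, 1, 0,  0],   # M6: B11 + B12
              [ 0, 0, 1,  1]])  # M7: B21 + B22
# Rows give (C11, C12, C21, C22) as combinations of M1..M7
W = np.array([[1,  0, 0, 1, -1, 0, 1],   # C11 = M1 + M4 - M5 + M7
              [0,  0, 1, 0,  1, 0, 0],   # C12 = M3 + M5
              [0,  1, 0, 1,  0, 0, 0],   # C21 = M2 + M4
              [1, -1, 1, 0,  0, 1, 0]])  # C22 = M1 - M2 + M3 + M6

# Sum of 7 rank-one terms: T[a, b, c] = sum_r U[r, a] * V[r, b] * W[c, r]
T_strassen = np.einsum('ra,rb,cr->abc', U, V, W)
exact = np.array_equal(T_strassen, matmul_tensor(2))

def bilinear_multiply(A, B):
    """Multiply two 2x2 matrices using only the 7 products encoded by (U, V, W)."""
    m = (U @ A.reshape(-1)) * (V @ B.reshape(-1))  # the 7 scalar multiplications
    return (W @ m).reshape(2, 2)
```

Here `exact` compares the reconstructed tensor with the true one entry by entry, and `bilinear_multiply(A, B)` can be compared against `A @ B`. In a learned setting, the factor matrices would be trainable and of shape r × n² (or n² × r), with r playing the role of the rank varied in the 3×3 experiments.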
