Abstract: The Transformer architecture, despite its scaling law, faces expensive computational cost challenges as the number of parameters increases. Quantization methods like Ternary-BERT and BitNet ...
Abstract: We propose an efficient quantum subroutine for matrix multiplication that computes a state vector encoding the entries of the product of two matrices in superposition. The subroutine ...