Sparse matrix-matrix multiplication (SpMM) is a crucial kernel in various applications, including sparse deep neural networks [1]–[6], graph analytics [7], triangle counting [8], and linear algebra ...
Abstract: Sparse-sparse matrix multiplication (SpGEMM) is a well-studied problem on CPUs, GPUs, accelerators (e.g., FPGAs), and distributed systems. The main computational bottleneck in SpGEMM is the ...
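For readers unfamiliar with the operation, the following is a minimal sketch of SpGEMM using the classic row-wise (Gustavson-style) formulation. It is not taken from the work quoted above. The dict-of-dicts storage and the function name `spgemm` are assumptions chosen for readability, not performance.

```python
from collections import defaultdict


def spgemm(a, b):
    """Multiply two sparse matrices stored as {row: {col: value}}.

    Illustrative row-wise formulation: output row i is built by scaling
    and summing the rows of b selected by the nonzeros in row i of a.
    """
    c = {}
    for i, a_row in a.items():
        acc = defaultdict(float)  # sparse accumulator for output row i
        for k, a_ik in a_row.items():
            # Only rows of b that match a nonzero column of a contribute.
            for j, b_kj in b.get(k, {}).items():
                acc[j] += a_ik * b_kj
        if acc:
            c[i] = dict(acc)
    return c


# Small example: A is 2x3, B is 3x2, both mostly zero.
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
C = spgemm(A, B)
```

Only stored entries are visited, so the work scales with the number of nonzero products rather than with the dense dimensions. Entries that cancel to exactly zero are kept as explicit zeros in this sketch.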
Because our sparse attention is implemented with FlexAttention, we recommend running a warm-up inference first; subsequent inferences will be faster. To better demonstrate the ...
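The warm-up recommendation can be sketched as a small timing helper. This is a generic pattern rather than the project's own benchmarking code: `time_with_warmup` and `run_inference` are hypothetical names, and `run_inference` is a stand-in for whatever call actually runs the model.

```python
import time


def time_with_warmup(fn, warmup=1, iters=3):
    """Call fn `warmup` times untimed, then return `iters` timed durations."""
    for _ in range(warmup):
        fn()  # absorbs one-time costs such as compilation or cache setup
    timings = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


calls = []


def run_inference():
    # Hypothetical stand-in for a real model call; replace with your own.
    calls.append(1)
    return sum(range(1000))


timings = time_with_warmup(run_inference, warmup=1, iters=3)
```

Excluding the first call from the measurement keeps one-time setup cost out of the reported latency. When timing on a GPU, the device would typically also need to be synchronized before each timer read, which this sketch omits.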