We took this version of HeCBench and are modifying it to build the CUDA and OMP codes to gather their roofline performance data. So far we have a large portion of the CUDA and OMP codes building ...
This project is a step-by-step learning journey where we implement various types of Triton kernels—from the simplest examples to more advanced applications—while exploring GPU programming with Triton.
Abstract: Matrix-matrix multiplication is one of the most important kernels in linear algebra operations, with a multitude of applications in scientific and engineering computing. Sparse matrix ...
Abstract: Sparse General Matrix-Matrix Multiplication (SpGEMM) is a core operation in high-performance computing applications such as algebraic multigrid solvers, machine learning, and graph ...
From the UCSB The Current article "Innovative Hardware for Rapidly Solving High-order Optimization Problems": The rise of AI, graphic processing, combinatorial optimization, and other data-intensive ...