MapReduce Matrix Multiplication in Java

Can Large Language Models Predict Parallel Code Performance?

We took this version of HeCBench and are modifying it to build the CUDA and OMP codes to gather their roofline performance data. So far we have a large portion of the CUDA and OMP codes building ...

GitHub

A curated list of resources for learning and exploring Triton, OpenAI's programming language for writing efficient GPU code.

This project is a step-by-step learning journey where we implement various types of Triton kernels—from the simplest examples to more advanced applications—while exploring GPU programming with Triton.

IEEE

Multiplication of Sparse Matrices and their Transpose Using Compressed Sparse Diagonals

Abstract: Matrix-matrix multiplication is one of the most important kernels in linear algebra operations with a multitude of applications in scientific and engineering computing. Sparse matrix ...

IEEE

MH-SpGEMM: Efficient Sparse General Matrix-Matrix Multiplication on Modern GPUs via Masking and Hashing Cooperative Optimization

Abstract: Sparse General Matrix-Matrix Multiplication (SpGEMM) is a core operation in high-performance computing applications such as algebraic multigrid solvers, machine learning, and graph ...

ece.ucsb.edu

Bhattacharya – HW for Rapidly Solving High-order Optimization Problems

From the UCSB The Current article "Innovative Hardware for Rapidly Solving High-order Optimization Problems": The rise of AI, graphic processing, combinatorial optimization, and other data-intensive ...
