Abstract: We propose COSMA: a parallel matrix-matrix multiplication algorithm that is near communication-optimal for all combinations of matrix dimensions, processor counts, and memory sizes. The key ...
This repository contains the artifact for the SC '25 paper submission "KAMI: Communication-Avoiding General Matrix Multiplication within a Single GPU." The NVIDIA GH200 system runs Ubuntu 22.04 ...
Orthopedic surgeons are living in an era of unprecedented technological advancement. Robotic-assisted surgery, AI-driven preoperative planning and patient-specific instrumentation have transformed the ...
Quantum-inspired adaptive tiling for high-performance matrix multiplication. It uses WKB tunneling physics and the golden ratio to derive optimal tile sizes from real-time CPU state, reporting gains of 15% or more on ...
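The last entry builds on loop tiling, a general technique that a short Python sketch can illustrate. This is a generic blocked matrix multiply. It is not the repository's WKB/golden-ratio tile-selection method. The function name `tiled_matmul` and its fixed `tile` parameter are hypothetical and chosen only for illustration. The repository derives tile sizes at runtime, whereas this sketch simply takes one as an argument.

```python
def tiled_matmul(a, b, tile=32):
    """Multiply a (n x k) by b (k x m) using square blocks of size `tile`.

    Illustrative sketch only: `tile` is a plain fixed parameter here,
    not derived from CPU state as the project above describes.
    """
    n, k = len(a), len(a[0])
    k2, m = len(b), len(b[0])
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = [[0.0] * m for _ in range(n)]
    # Outer loops step over tile-sized blocks of the i, p (inner), and j dimensions.
    for i0 in range(0, n, tile):
        for p0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                # Inner loops update one block of c using one block of a and b;
                # min() clips the last block when a dimension is not a multiple of tile.
                for i in range(i0, min(i0 + tile, n)):
                    row_a = a[i]
                    row_c = c[i]
                    for p in range(p0, min(p0 + tile, k)):
                        a_ip = row_a[p]
                        row_b = b[p]
                        for j in range(j0, min(j0 + tile, m)):
                            row_c[j] += a_ip * row_b[j]
    return c


print(tiled_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], tile=1))
# prints [[19.0, 22.0], [43.0, 50.0]]
```

The block size controls how much of each operand is reused while it is still in cache. Picking that size well for a given machine is the problem the tile-selection heuristic above addresses.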