Abstract: Model compression techniques such as pruning and quantization have been proposed to address the high computational and memory demands of deep neural networks (DNNs). However, determining an ...