Quantization in LLMs - Search Videos

Quantization in modern LLMs - Advanced Quantization Techniques for Large Language Models Video Tutorial | LinkedIn Learning, formerly Lynda.com

Local LLMs on Consumer Hardware: GLM-4.7-Flash Performance | Hammad Armghan, PhD posted on the topic | LinkedIn

1 view - 1 month ago

MLX MiniMax 2.5 running LOCALLY on a single M3 Ultra 512GB! Writing a poem on LLMs at 6bit quantization! 🔥 Let's start some coding, context and distributed tests! Generation: 40.2 tokens-per-sec Peak memory: 186 GB Source: Ivan Fioravanti | Thanh Hoang

1.1K views - 4 weeks ago

Facebook - Thanh Hoang

What is Quantization? | IBM

LLMs can take gigabytes of memory to store, which limits what can be run on consumer hardware. But quantization can dramatically compress models, making a wider selection of models available to developers. You can often reduce model size by 4x or more while maintaining reasonable performance. In our new short course Quantization Fundamentals taught by Hugging Face's Younes Belkada and Marc Sun, you'll: - Learn how to quantize nearly any open source model - Use int8 and bfloat16 (Brain float 16)

6.8K views - Apr 15, 2024

Facebook - Andrew Ng

Hugging Face SafeTensors LLMs in… - Partner

"Fine-tuning LLMs on AMD Strix Halo with Framework Desktop" | Donato Capitella posted on the topic | LinkedIn

Optimize LLMs for faster AI inference

351 views - 1 month ago

[LoRA] Unsloth Fine-Tuning: LoRA and QLoRA Guide. Efficient LLM fi…

389 views - 1 month ago

YouTube - AI Podcast Series. Byte Goose AI.

LLM Inference on a Budget: Speed vs. Cost! #llm #inference #optimiz…

YouTube - The Code Architect

Run Giant AI Models on Your Laptop 🚀 (INT8 Explained)

6 views - 2 months ago

YouTube - Forward Logic

🤯 Run LLMs on Your Laptop?! The Quantization Secret! #Shorts

YouTube - CodeTapasya

[IDSL Seminar'25] M-ANT: Efficient Low-bit Group Quantization for LL…

20 views - 3 months ago

Qwen3.5 Fine-Tuning Guide. Qwen3.5 Medium Size Model Run I…

1 view - 5 days ago

YouTube - AI Podcast Series. Byte Goose AI.

Why Your LLM Crashes Google Colab | VRAM, Quantization Explai…

208 views - 1 month ago

YouTube - Analytics Vidhya

What Is Quantization | Quantization | TensorTeach

300 views - Nov 20, 2024

YouTube - TensorTeach

Understanding Symmetric Quantization | Quantization | Tens…

276 views - Nov 20, 2024

YouTube - TensorTeach

SmoothQuant

4.3K views - Oct 25, 2023

YouTube - MIT HAN Lab

LLM Distillation and Compression

558 views - Dec 17, 2024

YouTube - MLOps.community

Host a AI Server

453 views - Mar 27, 2024

YouTube - AI Arcade

What is LLM Quantization ?

3K views - 1 year ago

YouTube - New Machina

LLMs Naming Convention Explained

1.8K views - Sep 15, 2023

YouTube - AI Readme

LLMs On The Edge

1.6K views - 9 months ago

YouTube - Semiconductor Engineering

Optimize Your AI - Quantization Explained

406.9K views - Dec 28, 2024

YouTube - Matt Williams

LLM Explained | What is LLM

399.7K views - Aug 22, 2023

YouTube - codebasics

What is LLM quantization?

25.6K views - Nov 6, 2023

YouTube - Airtrain AI

MR-GPTQ: Better FP4 Microscaling for LLMs

109 views - 5 months ago

YouTube - AI Research Roundup

Quantization in Deep Learning (LLMs)

11.5K views - Sep 22, 2023

YouTube - AI Bites

BitNet Distillation: 1.58‑bit LLMs from FP16

171 views - 4 months ago

YouTube - AI Research Roundup

AGI Dreams Podcast – October 01, 2025

2 views - 5 months ago

YouTube - Robert Lee
