Textbook in PDF format
Elevate your AI system performance capabilities with this definitive guide to unlocking peak efficiency across every layer of your AI infrastructure. In today's era of ever-growing generative models, AI Systems Performance Engineering equips professionals with actionable strategies to co-optimize hardware, software, and algorithms for high-performance and cost-effective AI systems. Authored by Chris Fregly, a performance-focused engineering and product leader, this comprehensive resource transforms complex systems into streamlined, high-impact AI solutions.

Contents:
Preface
Introduction and AI System Overview (available)
AI System Hardware Overview (available)
OS, Docker, and Kubernetes Tuning for GPU-based Environments (available)
Distributed Communication and I/O Optimizations (available)
CUDA Programming, Profiling, and Debugging (unavailable)
Optimizing CUDA Performance (unavailable)
PyTorch Profiling and Tuning (unavailable)
Distributed Training at Ultra-Scale (unavailable)
Multi-Node Inference Optimizations (unavailable)
AI System Optimization Case Studies (available)
Future Trends in Ultra-Scale AI Systems Performance Engineering (available)
AI Systems Performance Checklist (175+ Items) (available)