Enables readers to develop foundational and advanced vectorization skills for scalable data science and machine learning and to address real-world problems.

Offering insights across domains such as computer vision and natural language processing (NLP), Vectorization covers the fundamental topics of vectorization, including array and tensor operations, data wrangling, and batch processing. The book shows how these principles lead to successful outcomes in machine learning projects, with each chapter including practical case studies and code implementations using NumPy, TensorFlow, and PyTorch.

Each chapter contains one or both of two types of content: an introduction to, or comparison of, specific operations in the numerical libraries (presented as tables), and case study examples that apply the concepts introduced to solve a practical problem (presented as code blocks and figures). Readers can approach the material by reading the text, running the code blocks, or examining the figures. The case studies in every chapter ground the concepts in practical scenarios and are accompanied by detailed code implementations with line-by-line explanations. Most of the implementations use only the knowledge acquired within the chapter. Naturally, as readers progress, they may come up with better implementations by leveraging more advanced concepts.

The book covers vectorization techniques using NumPy, TensorFlow, and PyTorch (and, for certain operations, Pandas as well). When introducing the operations, I will include tables that enumerate equivalent operations in these three libraries, in the hope that readers who are familiar with one of the libraries can also learn to translate algorithms and models implemented in another library.
Most case study code examples will be implemented in one of the libraries (often NumPy), but readers are encouraged to implement the same functionality using alternative libraries by referencing equivalent operations.

Written by the developer of the first recommendation system on the Peacock streaming platform, Vectorization explores sample topics including:
• Basic tensor operations and the art of tensor indexing, elucidating how to access individual or subsets of tensor elements
• Vectorization in tensor multiplications and common linear algebraic routines, which form the backbone of many machine learning algorithms
• Masking and padding, concepts which come into play when handling data of non-uniform sizes, and string processing techniques for natural language processing (NLP)
• Sparse matrices and their data structures and integral operations, and ragged or jagged tensors and the nuances of processing them

From the essentials of vectorization to the subtleties of advanced data structures, Vectorization is an ideal one-stop resource for both beginners and experienced practitioners, including researchers, data scientists, statisticians, and other professionals in industry, who seek academic success and career advancement.