A+ for ML with R: Lantz’s 4th Edition

TL;DR – This book is a great value. I’d pay $100 for it, even though I already have a well-worn 3rd edition (and even a 1st edition that I started using with undergrads in 2015).

Want to learn how to write R code to develop and deploy machine learning models? If so, use caution: ANYONE can “turn the crank” and write code to build machine learning models, especially now that tools like GitHub Copilot are so easy to access and use. Companies like mine will increasingly be looking for data analysts and data scientists who can articulate the context of a problem and understand when methods and models are appropriate to apply. While hundreds of books focus on the mechanics of turning the crank, or the details of the packages used to build the models, Brett Lantz takes a first-principles approach.

Several years ago, rather than writing my own book, I adopted an early edition of Lantz’s book for a course I regularly taught for undergraduate juniors and seniors in STEM. I never regretted that decision. Students routinely rated my choice of textbook highly, and said they appreciated the clear examples. This book is ideal to support an introductory two-course sequence for upper-level undergrads or graduate students who need a solid foundation in machine learning with R. It is also ideal for entry-level and mid-level data analysts and data scientists who want to build solid competencies. The book does not cover topics like tidymodels, deployment issues (especially to the cloud), model maintenance, or integrating R and Python through R Markdown and Quarto, BUT I think if it had, it would have taken away from the simple style and approach that make this book so unique and powerful. I’m glad the table of contents covers what it does, and nothing more.

The 4th edition maintains the same standard of excellence as previous editions. Lantz’s prose is clear, his examples have no missing steps, and his discussions are straightforward and comprehensive. For each method, he lays out the background and rationale for using it, illustrates the inputs, process, and outputs with simple yet compelling examples, and explains the results in the context of how you might evaluate them. He also includes modernized deeper dives on topics like lift, model tuning, feature engineering, stacking, and outliers (what exactly ARE outliers in a group of images?). Chapters 11-15 are largely new, and make up the bulk of the additional 300 pages, making this a 700+ page reference that will surely find its permanent home in a prominent position on your desk.