The Best Book Ever on Machine Learning (and Intelligent Systems) in R

Dear Brett (Lantz),

In short: your book, Machine Learning with R, is the book I’ve been dreaming about for years. Everyone who applies machine learning techniques for their work, teaches applied machine learning at a university, or just loves R and wants to know more about these super cool algorithms should buy and use your book.

I’ve been teaching a course called “Intelligent Systems” (ISAT/CS 344 at JMU) for the past few years. I inherited a syllabus and course description from professors who had taught the course from the mid-1990s until 2009, so I started out following their lead and broadly covering expert/knowledge-based systems, simple neural networks for regression, and some elements of robotics. We used a commercial package to build the expert systems (rather than a declarative language like Prolog), which was fine, but we also used a commercial package for the neural networks. I was unsatisfied for two reasons: first, I knew that far more “stuff” was going on in the world of intelligent systems that we weren’t sharing with our students, and second, I knew there were tons of free packages for the R statistical software that could perform the same tasks… and more. I started a yearlong process of soul-searching and creating new materials… determined to bring R to the classroom, along with neural networks for both classification and regression, classification using k-nearest neighbors and Naive Bayes approaches, clustering with k-means, and some text mining and analysis to show students what you could do with unstructured data.

I also wanted to compare and contrast neural network regression with simple linear regression, classification algorithms in general with logistic regression, and share how to evaluate and improve model performance using metrics like precision, recall, and F1. (I mean, who cares about developing an intelligent software system if you can’t evaluate and continually improve its performance?) In addition, I’ve dreamed about adding a module on decision trees, in particular focusing on the C5.0 algorithm. But I haven’t found the time to explore or create new course materials on this topic. So I knew it would be even harder to compile all of my course materials into a book for my students to reference.

But you, in the meantime, have saved my life. I’ve explored tons of books on machine learning and intelligent systems that focus more on the practical applications of the techniques than on the theory… and I had not found one that meets my standards, until now. In a friendly and conversational manner (that’s not overfriendly, condescending, or flippant) you have managed to cover pretty much all of the topics I want to share in my intelligent systems class, in a way that I’m comfortable with.

Chapters 1 (Introduction to Machine Learning) and 2 (Managing and Understanding Data) provide a great, simplified introduction to what machine learning is all about and highlight the data structures and R commands that might be the most useful for these purposes. Chapters 3 and 4 cover classification… first with k-nearest neighbors, then with Naive Bayes. Chapter 5 covers decision trees and C5.0. Chapter 6 covers regression in general, but with applications to decision trees (yeah!). In Chapter 7 (Black Box Methods – Neural Networks and Support Vector Machines) there’s a great example based on Optical Character Recognition (which will pair nicely with the lab exercise I already use). Chapter 8 covers Apriori, Chapter 9 introduces clustering with k-means, and Chapters 10 and 11 specifically deal with evaluating and improving model performance.

As the cherry on top of the cake that is this book, Chapter 12 provides an overview of the most-used ways to acquire data (e.g., using RCurl, XML, and JSON) and even introduces parallel computing.

I am eternally grateful to you for writing the book that’s been in my head in a way I (think I!) would have written it. It’s not PERFECT (I would have spent more time on concepts like overfitting, and maybe given examples… and maybe some prose on the Turing Test and Reverse Turing Test) — but I can easily use your book as a required text and then provide supplemental materials on the side.

Thank you, Brett!

Sincerely and with a world of gratitude,

Nicole
