Category Archives: Data Science

Imperfect Action is Better Than Perfect Inaction: What Harry Truman Can Teach Us About Loss Functions (with an intro to ggplot)

One of the heuristics we use at Intelex to guide decision making is former US President Truman’s advice that “imperfect action is better than perfect inaction.” What it means is — don’t wait too long to take action, because you don’t want to miss opportunities. Good advice, right?

When I share this with colleagues, I often hear a response like: “that’s dangerous!” To which my answer is “well sure, sometimes, but it can be really valuable depending on how you apply it!” The trick is: knowing how and when.

Here’s how it can be dangerous. For example, statistical process control (SPC) exists to keep us from tampering with processes — from taking imperfect action based on random variation, which will not only get us nowhere, but can exacerbate the problem we were trying to solve. The secret is to apply Truman’s heuristic based on an understanding of exactly how imperfect is OK with your organization, based on your risk appetite. And this is where loss functions can help.

Along the way, we’ll demonstrate how to do a few important things related to plotting with the ggplot package in R, gradually adding in new elements to the plot so you can see how it’s layered, including:

  • Plot a function based on its equation
  • Add text annotations to specific locations on a ggplot
  • Draw horizontal and vertical lines on a ggplot
  • Draw arrows on a ggplot
  • Add extra dots to a ggplot
  • Eliminate axis text and axis tick marks

What is a Loss Function?

A loss function quantifies how unhappy you’ll be based on the accuracy or effectiveness of a prediction or decision. In the simplest case, you control one variable (x) which leads to some cost or loss (y). For the case we’ll examine in this post, the variables are:

  • How much time and effort you put in to scoping and characterizing the problem (x); we assume that time+effort invested leads to real understanding
  • How much it will cost you (y); can be expressed in terms of direct costs (e.g. capex + opex) as well as opportunity costs or intangible costs (e.g. damage to reputation)

Here is an example of what this might look like, if you have a situation where overestimating (putting in too much x) OR underestimating (putting in too little x) are both equally bad. In this case, x=10 is the best (least costly) decision or prediction:

plot of a typical squared loss function
# describe the equation we want to plot
parabola <- function(x) ((x-10)^2)+10  

# initialize ggplot with a dummy dataset
library(ggplot)
p <- ggplot(data = data.frame(x=0), mapping = aes(x=x)) 

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) +
     xlab("x = the variable you can control") + 
     ylab("y = cost of loss ($$)")

In regression (and other techniques where you’re trying to build a model to predict a quantitative dependent variable), mean square error is a squared loss function that helps you quantify error. It captures two facts: the farther away you are from the correct answer the worse the error is — and both overestimating and underestimating is bad (which is why you square the values). Across this and related techniques, the loss function captures these characteristics:

From http://www.cs.cornell.edu/courses/cs4780/2015fa/web/lecturenotes/lecturenote10.html

Not all loss functions have that general shape. For classification, for example, the 0-1 loss function tells the story that if you get a classification wrong (x < 0) you incur all the penalty or loss (y=1), whereas if you get it right (x > 0) there is no penalty or loss (y=0):

# set up data frame of red points
d.step <- data.frame(x=c(-3,0,0,3), y=c(1,1,0,0))

# note that the loss function really extends to x=-Inf and x=+Inf
ggplot(d.step) + geom_step(mapping=aes(x=x, y=y), direction="hv") +
     geom_point(mapping=aes(x=x, y=y), color="red") + 
     xlab("y* f(x)") + ylab("Loss (Cost)") +  
     ggtitle("0-1 Loss Function for Classification")

Use the Loss Function to Make Strategic Decisions

So let’s get back to Truman’s advice. Ideally, we want to choose the x (the amount of time and effort to invest into project planning) that results in the lowest possible cost or loss. That’s the green point at the nadir of the parabola:

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen")

Costs get higher as we move up the x-axis:

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen") +
     annotate(geom="text", x=0, y=100, label="$$$$$", color="green") +
     annotate(geom="text", x=0, y=75, label="$$$$", color="green") +
     annotate(geom="text", x=0, y=50, label="$$$", color="green") +
     annotate(geom="text", x=0, y=25, label="$$", color="green") +
     annotate(geom="text", x=0, y=0, label="$ 0", color="green")

And time+effort grows as we move along the x-axis (we might spend minutes on a problem at the left of the plot, or weeks to years by the time we get to the right hand side):

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen") +
     annotate(geom="text", x=0, y=100, label="$$$$$", color="green") +
     annotate(geom="text", x=0, y=75, label="$$$$", color="green") +
     annotate(geom="text", x=0, y=50, label="$$$", color="green") +
     annotate(geom="text", x=0, y=25, label="$$", color="green") +
     annotate(geom="text", x=0, y=0, label="$ 0", color="green") +
     annotate(geom="text", x=2, y=0, label="minutes\nof effort", size=3) +
     annotate(geom="text", x=20, y=0, label="months\nof effort", size=3)

Planning too Little = Planning too Much = Costly

What this means is — if we don’t plan, or we plan just a little bit, we incur high costs. We might make the wrong decision! Or miss critical opportunities! But if we plan too much — we’re going to spend too much time, money, and/or effort compared to the benefit of the solution we provide.


p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen") +
     annotate(geom="text", x=0, y=100, label="$$$$$", color="green") +
     annotate(geom="text", x=0, y=75, label="$$$$", color="green") +
     annotate(geom="text", x=0, y=50, label="$$$", color="green") +
     annotate(geom="text", x=0, y=25, label="$$", color="green") +
     annotate(geom="text", x=0, y=0, label="$ 0", color="green") +
     annotate(geom="text", x=2, y=0, label="minutes\nof effort", size=3) +
     annotate(geom="text", x=20, y=0, label="months\nof effort", size=3) +
     annotate(geom="text",x=3, y=85, label="Little (or no) Planning\nHIGH COST", color="red") +
     annotate(geom="text", x=18, y=85, label="Paralysis by Planning\nHIGH COST", color="red") +
     geom_vline(xintercept=0, linetype="dotted") + geom_hline(yintercept=0, linetype="dotted")

The trick is to FIND THAT CRITICAL LEVEL OF TIME and EFFORT invested to gain information and understanding about your problem… and then if you’re going to err, make sure you err towards the left — if you’re going to make a mistake, make the mistake that costs less and takes less time to make:

arrow.x <- c(10, 10, 10, 10)
arrow.y <- c(35, 50, 65, 80)
arrow.x.end <- c(6, 6, 6, 6)
arrow.y.end <- arrow.y
d <- data.frame(arrow.x, arrow.y, arrow.x.end, arrow.y.end)

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen") +
     annotate(geom="text", x=0, y=100, label="$$$$$", color="green") +
     annotate(geom="text", x=0, y=75, label="$$$$", color="green") +
     annotate(geom="text", x=0, y=50, label="$$$", color="green") +
     annotate(geom="text", x=0, y=25, label="$$", color="green") +
     annotate(geom="text", x=0, y=0, label="$ 0", color="green") +
     annotate(geom="text", x=2, y=0, label="minutes\nof effort", size=3) +
     annotate(geom="text", x=20, y=0, label="months\nof effort", size=3) +
     annotate(geom="text",x=3, y=85, label="Little (or no) Planning\nHIGH COST", color="red") +
     annotate(geom="text", x=18, y=85, label="Paralysis by Planning\nHIGH COST", color="red") +
     geom_vline(xintercept=0, linetype="dotted") + 
     geom_hline(yintercept=0, linetype="dotted") +
     geom_vline(xintercept=10) +
     geom_segment(data=d, mapping=aes(x=arrow.x, y=arrow.y, xend=arrow.x.end, yend=arrow.y.end),
     arrow=arrow(), color="blue", size=2) +
     annotate(geom="text", x=8, y=95, size=2.3, color="blue",
     label="we prefer to be\non this side of the\nloss function")

Moral of the Story

The moral of the story is… imperfect action can be expensive, but perfect action is ALWAYS expensive. Spend less to make mistakes and learn from them, if you can! This is one of the value drivers for agile methodologies… agile practices can help improve communication and coordination so that the loss function is minimized.

## FULL CODE FOR THE COMPLETELY ANNOTATED CHART ##
# If you change the equation for the parabola, annotations may shift and be in the wrong place.
parabola <- function(x) ((x-10)^2)+10

my.title <- expression(paste("Imperfect Action Can Be Expensive. But Perfect Action is ", italic("Always"), " Expensive."))

arrow.x <- c(10, 10, 10, 10)
arrow.y <- c(35, 50, 65, 80)
arrow.x.end <- c(6, 6, 6, 6)
arrow.y.end <- arrow.y
d <- data.frame(arrow.x, arrow.y, arrow.x.end, arrow.y.end)

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen") +
     annotate(geom="text", x=0, y=100, label="$$$$$", color="green") +
     annotate(geom="text", x=0, y=75, label="$$$$", color="green") +
     annotate(geom="text", x=0, y=50, label="$$$", color="green") +
     annotate(geom="text", x=0, y=25, label="$$", color="green") +
     annotate(geom="text", x=0, y=0, label="$ 0", color="green") +
     annotate(geom="text", x=2, y=0, label="minutes\nof effort", size=3) +
     annotate(geom="text", x=20, y=0, label="months\nof effort", size=3) +
     annotate(geom="text",x=3, y=85, label="Little (or no) Planning\nHIGH COST", color="red") +
     annotate(geom="text", x=18, y=85, label="Paralysis by Planning\nHIGH COST", color="red") +
     geom_vline(xintercept=0, linetype="dotted") + 
     geom_hline(yintercept=0, linetype="dotted") +
     geom_vline(xintercept=10) +
     geom_segment(data=d, mapping=aes(x=arrow.x, y=arrow.y, xend=arrow.x.end, yend=arrow.y.end),
     arrow=arrow(), color="blue", size=2) +
     annotate(geom="text", x=8, y=95, size=2.3, color="blue",
     label="we prefer to be\non this side of the\nloss function") +
     ggtitle(my.title) +
     theme(axis.text.x=element_blank(), axis.ticks.x=element_blank(),
     axis.text.y=element_blank(), axis.ticks.y=element_blank()) 

Now sometimes you need to make this investment! (Think nuclear power plants, or constructing aircraft carriers or submarines.) Don’t get caught up in getting your planning investment perfectly optimized — but do be aware of the trade-offs, and go into the decision deliberately, based on the risk level (and regulatory nature) of your industry, and your company’s risk appetite.

Object of Type Closure is Not Subsettable

I started using R in 2004. I started using R religiously on the day of the annular solar eclipse in Madrid (October 3, 2005) after being inspired by David Hunter’s talk at ADASS.

It took me exactly 4,889 days to figure out what this vexing error means, even though trial and error helped me move through it most every time it happened! I’m so moved by what I learned (and embarrassed that it took so long to make sense), that I have to share it with you.

This error happens when you’re trying to treat a function like a list, vector, or data frame. To fix it, start treating the function like a function.

Let’s take something that’s very obviously a function. I picked the sampling distribution simulator from a 2015 blog post I wrote. Cut and paste it into your R console:

sdm.sim <- function(n,src.dist=NULL,param1=NULL,param2=NULL) {
   r <- 10000  # Number of replications/samples - DO NOT ADJUST
   # This produces a matrix of observations with  
   # n columns and r rows. Each row is one sample:
   my.samples <- switch(src.dist,
	"E" = matrix(rexp(n*r,param1),r),
	"N" = matrix(rnorm(n*r,param1,param2),r),
	"U" = matrix(runif(n*r,param1,param2),r),
	"P" = matrix(rpois(n*r,param1),r),
        "B" = matrix(rbinom(n*r,param1,param2),r),
	"G" = matrix(rgamma(n*r,param1,param2),r),
	"X" = matrix(rchisq(n*r,param1),r),
	"T" = matrix(rt(n*r,param1),r))
   all.sample.sums <- apply(my.samples,1,sum)
   all.sample.means <- apply(my.samples,1,mean)   
   all.sample.vars <- apply(my.samples,1,var) 
   par(mfrow=c(2,2))
   hist(my.samples[1,],col="gray",main="Distribution of One Sample")
   hist(all.sample.sums,col="gray",main="Sampling Distribution\nof
	the Sum")
   hist(all.sample.means,col="gray",main="Sampling Distribution\nof the Mean")
   hist(all.sample.vars,col="gray",main="Sampling Distribution\nof
	the Variance")
}

The right thing to do with this function is use it to simulate a bunch of distributions and plot them using base R. You write the function name, followed by parenthesis, followed by each of the four arguments the function needs to work. This will generate a normal distribution with mean of 20 and standard deviation of 3, along with three sampling distributions, using a sample size of 100 and 10000 replications:

sdm.sim(100, src.dist="N", param1=20, param2=3)

(You should get four plots, arranged in a 2×2 grid.)

But what if we tried to treat sdm.sim like a list, and call the 3rd element of it? Or what if we tried to treat it like a data frame, and we wanted to call one of the variables in the column of the data frame?

> sdm.sim[3]
Error in sdm.sim[3] : object of type 'closure' is not subsettable

> sdm.sim$values
Error in sdm.sim$values : object of type 'closure' is not subsettable

SURPRISE! Object of type closure is not subsettable. This happens because sdm.sim is a function, and its data type is (shockingly) something called “closure”:

> class(sdm.sim)
[1] "function"

> typeof(sdm.sim)
[1] "closure"

I had read this before on Stack Overflow a whole bunch of times, but it never really clicked until I saw it like this! And now that I had a sense for why the error was occurring, turns out it’s super easy to reproduce with functions in base R or functions in anyfamiliar packages you use:

> ggplot$a
Error in ggplot$a : object of type 'closure' is not subsettable

> table$a
Error in table$a : object of type 'closure' is not subsettable

> ggplot[1]
Error in ggplot[1] : object of type 'closure' is not subsettable

> table[1]
Error in table[1] : object of type 'closure' is not subsettable

As a result, if you’re pulling your hair out over this problem, check and see where in your rogue line of code you’re treating something like a non-function. Or maybe you picked a variable name that competes with a function name. Or maybe you got your operations out of order. In any case, change your notation so that your function is doing function things, and your code should start working.

But as Luke Smith pointed out, this is not true for functional sequences (which you can also write as functions). Functional sequences are those chains of commands you write when you’re in tidyverse mode, all strung together with %>% pipes:

Luke’s code that you can cut and paste (and try), with the top line that got cut off by Twitter, is:

fun1 <- . %>% 
    group_by(col1) %>% 
    mutate(n=n+h) %>% 
    filter(n==max(n)) %>% 
    ungroup() %>% 
    summarize(new_col=mean(n))

fun2 <- fun1[-c(2,5)] 

Even though Luke’s fun1 and fun2 are of type closure, they are subsettable because they contain a sequence of functions:

> typeof(fun1)
[1] "closure"

> typeof(fun2)
[1] "closure"

> fun1[1]
Functional sequence with the following components:

 1. group_by(., col1)

Use 'functions' to extract the individual functions. 

> fun1[[1]]
function (.) 
group_by(., col1)

Don’t feel bad! This problem has plagued all of us for many, many hours (me: years), and yet it still happens to us more often than we would like to admit. Awareness of this issue will not prevent you from attempting things that give you this error in the future. It’s such a popular error that there have been memes about it and sad valentines written about it:

SCROLL DOWN PAST STEPH’S TWEET TO SEE THE JOKE!!

(Also, if you’ve made it this far, FOLLOW THESE GOOD PEOPLE ON TWITTER: @stephdesilva @djnavarro @lksmth – They all share great information on data science in general, and R in particular. Many thanks too to all the #rstats crowd who shared in my glee last night and didn’t make me feel like an idiot for not figuring this out for ALMOST 14 YEARS. It seems so simple now.

Steph is also a Microsoft whiz who you should definitely hire if you need anything R+ Microsoft. Thanks to all of you!)

Quality 4.0: Reveal Hidden Insights with Data Sci & Machine Learning (Webinar)

Quality Digest

What’s Quality 4.0, why is it important, and how can you use it to gain competitive advantage? Did you know you can benefit from Quality 4.0 even if you’re not a manufacturing organization? That’s right. I’ll tell you more next week.

Sign up for my 50-minute webinar at 2pm ET on Tuesday, October 16, 2018 — hosted by Dirk Dusharme and Mike Richman at Quality Digest. This won’t be your traditional “futures” talk to let you know about all of the exciting technology on the horizon… I’ve actually been doing and teaching data science, and applying machine learning to practical problems in quality improvement, for over a decade.

Come to this webinar if:

  1. You have a LOT of data and you don’t know where to begin
  2. You’re kind of behind… you still use paper and Excel and you’re hoping you don’t miss the opportunities here
  3. You’re a data scientist and you want to find out about quality and process improvement
  4. You’re a quality professional and you want to find out more about data science
  5. You’re a quality engineer and you want some professional preparation for what’s on the horizon
  6. You want to be sure you get on our Quality 4.0 mailing list to receive valuable information assets for the next couple years to help you identify and capture opportunities

Register Here! See you on Tuesday. If you can’t make it, we’ll also be at the ASQ Quality 4.0 Summit in Dallas next month sharing more information about the convergence of quality and Big Data.

Quality 4.0: Let’s Get Digital

Want to find out what Quality 4.0 really is — and start realizing the benefits for your organization? If so, check out the October 2018 issue of ASQ’s Quality Progress, where my new article (“Let’s Get Digital“) does just that.

Quality 4.0 asks how we can leverage connected, intelligent, automated (C-I-A) technologies to increase efficiency, effectiveness, and satisfaction: “As connected, intelligent and automated systems are more widely adopted, we can once again expect a renaissance in quality tools and methods.” In addition, we’re working to bring this to the forefront of quality management and quality engineering practice at Intelex.

Quality 4.0 Evolution

The progression can be summarized through four themes. We’re in the “quality as discovery” stage today:

  • Quality as inspection: In the early days, quality assurance relied on inspecting bad quality out of items produced. Walter A. Shewhart’s methods for statistical process control (SPC) helped operators determine whether variation was due to random or special causes.
  • Quality as design: Next, more holistic methods emerged for designing quality in to processes. The goal is to prevent quality problems before they occur. These movements were inspired by W. Edwards Deming’s push to cease dependence on inspection, and Juran’s Quality by Design.
  • Quality as empowerment: By the 1990’s, organizations adopting TQM and Six Sigma advocated a holistic approach to quality. Quality is everyone’s responsibility and empowered individuals contribute to continuous improvement.
  • Quality as discovery: Because of emerging technologies, we’re at a new frontier. In an adaptive, intelligent environment, quality depends on how:
    • quickly we can discover and aggregate new data sources,
    • effectively we can discover root causes and
    • how well we can discover new insights about ourselves, our products and our organizations.”

Read more at http://asq.org/quality-progress/2018/10/basic-quality/lets-get-digital.html  or download the PDF (http://asq.org/quality-progress/2018/10/basic-quality/lets-get-digital.pdf)

A Simple Intro to Q-Learning in R: Floor Plan Navigation

This example is drawn from “A Painless Q-Learning Tutorial” at http://mnemstudio.org/path-finding-q-learning-tutorial.htm which explains how to manually calculate iterations using the updating equation for Q-Learning, based on the Bellman Equation (image from https://www.is.uni-freiburg.de/ressourcen/business-analytics/13_reinforcementlearning.pdf):

The example explores path-finding through a house:

The question to be answered here is: What’s the best way to get from Room 2 to Room 5 (outside)? Notice that by answering this question using reinforcement learning, we will also know how to find optimal routes from any room to outside. And if we run the iterative algorithm again for a new target state, we can find out the optimal route from any room to that new target state.

Since Q-Learning is model-free, we don’t need to know how likely it is that our agent will move between any room and any other room (the transition probabilities). If you had observed the behavior in this system over time, you might be able to find that information, but it many cases it just isn’t available. So the key for this problem is to construct a Rewards Matrix that explains the benefit (or penalty!) of attempting to go from one state (room) to another.

Assigning the rewards is somewhat arbitrary, but you should give a large positive value to your target state and negative values to states that are impossible or highly undesirable. Here’s the guideline we’ll use for this problem:

  • -1 if “you can’t get there from here”
  • 0 if the destination is not the target state
  • 100 if the destination is the target state

We’ll start constructing our rewards matrix by listing the states we’ll come FROM down the rows, and the states we’ll go TO in the columns. First, let’s fill the diagonal with -1 rewards, because we don’t want our agent to stay in the same place (that is, move from Room 1 to Room 1, or from Room 2 to Room 2, and so forth). The final one gets a 100 because if we’re already in Room 5, we want to stay there.

Next, let’s move across the first row. Starting in Room 0, we only have one choice: go to Room 4. All other possibilities are blocked (-1):

Now let’s fill in the row labeled 1. From Room 1, you have two choices: go to Room 3 (which is not great but permissible, so give it a 0) or go to Room 5 (the target, worth 100)!

Continue moving row by row, determining if you can’t get there from here (-1), you can but it’s not the target (0), or it’s the target(100). You’ll end up with a final rewards matrix that looks like this:

Now create this rewards matrix in R:

[code language=”r”]
R <- matrix(c(-1, -1, -1, -1, 0, -1,
-1, -1, -1, 0, -1, 100,
-1, -1, -1, 0, -1, -1,
-1, 0, 0, -1, 0, -1,
0, -1, -1, 0, -1, 100,
-1, 0, -1, -1, 0, 100), nrow=6, ncol=6, byrow=TRUE)
[/code]

And run the code. Notice that we’re calling the target state 6 instead of 5 because even though we have a room labeled with a zero, our matrix starts with a 1s so we have to adjust:

[code language=”r”]
source("https://raw.githubusercontent.com/NicoleRadziwill/R-Functions/master/qlearn.R")

results <- q.learn(R,10000,alpha=0.1,gamma=0.8,tgt.state=6)
> round(results)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 80 0
[2,] 0 0 0 64 0 100
[3,] 0 0 0 64 0 0
[4,] 0 80 51 0 80 0
[5,] 64 0 0 64 0 100
[6,] 0 80 0 0 80 100
[/code]

You can read this table of average value to obtain policies. A policy is a “path” through the states of the system:

  • Start at Room 0 (first row, labeled 1): Choose Room 4 (80), then from Room 4 choose Room 5 (100)
  • Start at Room 1: Choose Room 5 (100)
  • Start at Room 2: Choose Room 3 (64), from Room 3 choose Room 1 or Room 4 (80); from 1 or 4 choose 5 (100)
  • Start at Room 3: Choose Room 1 or Room 4 (80), then Room 5 (100)
  • Start at Room 4: Choose Room 5 (100)
  • Start at Room 5: Stay at Room 5 (100)

To answer the original problem, we would take route 2-3-1-5 or 2-3-4-5 to get out the quickest if we started in Room 2. This is easy to see with a simple map, but is much more complicated when the maps get bigger.

Reinforcement Learning: Q-Learning with the Hopping Robot

Overview: Reinforcement learning uses “reward” signals to determine how to navigate through a system in the most valuable way. (I’m particularly interested in the variant of reinforcement learning called “Q-Learning” because the goal is to create a “Quality Matrix” that can help you make the best sequence of decisions!) I found a toy robot navigation problem on the web that was solved using custom R code for reinforcement learning, and I wanted to reproduce the solution in different ways than the original author did. This post describes different ways that I solved the problem described at http://bayesianthink.blogspot.com/2014/05/hopping-robots-and-reinforcement.html

The Problem: Our agent, the robot, is placed at random on a board of wood. There’s a hole at s1, a sticky patch at s4, and the robot is trying to make appropriate decisions to navigate to s7 (the target). The image comes from the blog post linked above.

To solve a problem like this, you can use MODEL-BASED approaches if you know how likely it is that the robot will move from one state to another (that is, the transition probabilities for each action) or MODEL-FREE approaches (you don’t know how likely it is that the robot will move from state to state, but you can figure out a reward structure).

  • Markov Decision Process (MDP) – If you know the states, actions, rewards, and transition probabilities (which are probably different for each action), you can determine the optimal policy or “path” through the system, given different starting states. (If transition probabilities have nothing to do with decisions that an agent makes, your MDP reduces to a Markov Chain.)
  • Reinforcement Learning (RL) – If you know the states, actions, and rewards (but not the transition probabilities), you can still take an unsupervised approach. Just randomly create lots of hops through your system, and use them to update a matrix that describes the average value of each hop within the context of the system.

Solving a RL problem involves finding the optimal value functions (e.g. the Q matrix in Attempt 1) or the optimal policy (the State-Action matrix in Attempt 2). Although there are many techniques for reinforcement learning, we will use Q-learning because we don’t know the transition probabilities for each action. (If we did, we’d model it as a Markov Decision Process and use the MDPtoolbox package instead.) Q-Learning relies on traversing the system in many ways to update a matrix of average expected rewards from each state transition. This equation that it uses is from https://www.is.uni-freiburg.de/ressourcen/business-analytics/13_reinforcementlearning.pdf:

For this to work, all states have to be visited a sufficient number of times, and all state-action pairs have to be included in your experience sample. So keep this in mind when you’re trying to figure out how many iterations you need.

Attempt 1: Quick Q-Learning with qlearn.R

  • Input: A rewards matrix R. (That’s all you need! Your states are encoded in the matrix.)
  • Output: A Q matrix from which you can extract optimal policies (or paths) to help you navigate the environment.
  • Pros: Quick and very easy. Cons: Does not let you set epsilon (% of random actions), so all episodes are determined randomly and it may take longer to find a solution. Can take a long time to converge.

Set up the rewards matrix so it is a square matrix with all the states down the rows, starting with the first and all the states along the columns, starting with the first:

[code language=”r”]
hopper.rewards <- c(-10, 0.01, 0.01, -1, -1, -1, -1,
-10, -1, 0.1, -3, -1, -1, -1,
-1, 0.01, -1, -3, 0.01, -1, -1,
-1, -1, 0.01, -1, 0.01, 0.01, -1,
-1, -1, -1, -3, -1, 0.01, 100,
-1, -1, -1, -1, 0.01, -1, 100,
-1, -1, -1, -1, -1, 0.01, 100)

HOP <- matrix(hopper.rewards, nrow=7, ncol=7, byrow=TRUE)
> HOP
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] -10 0.01 0.01 -1 -1.00 -1.00 -1
[2,] -10 -1.00 0.10 -3 -1.00 -1.00 -1
[3,] -1 0.01 -1.00 -3 0.01 -1.00 -1
[4,] -1 -1.00 0.01 -1 0.01 0.01 -1
[5,] -1 -1.00 -1.00 -3 -1.00 0.01 100
[6,] -1 -1.00 -1.00 -1 0.01 -1.00 100
[7,] -1 -1.00 -1.00 -1 -1.00 0.01 100
[/code]

Here’s how you read this: the rows represent where you’ve come FROM, and the columns represent where you’re going TO. Each element 1 through 7 corresponds directly to S1 through S7 in the cartoon above. Each cell contains a reward (or penalty, if the value is negative) if we arrive in that state.

The S1 state is bad for the robot… there’s a hole in that piece of wood, so we’d really like to keep it away from that state. Location [1,1] on the matrix tells us what reward (or penalty) we’ll receive if we start at S1 and stay at S1: -10 (that’s bad). Similarly, location [2,1] on the matrix tells us that if we start at S2 and move left to S1, that’s also bad and we should receive a penalty of -10. The S4 state is also undesirable – there’s a sticky patch there, so we’d like to keep the robot away from it. Location [3,4] on the matrix represents the action of going from S3 to S4 by moving right, which will put us on the sticky patch

Now load the qlearn command into your R session:

[code language=”r”]
qlearn <- function(R, N, alpha, gamma, tgt.state) {
# Adapted from https://stackoverflow.com/questions/39353580/how-to-implement-q-learning-in-r
Q <- matrix(rep(0,length(R)), nrow=nrow(R))
for (i in 1:N) {
cs <- sample(1:nrow(R), 1)
while (1) {
next.states <- which(R[cs,] > -1) # Get feasible actions for cur state
if (length(next.states)==1) # There may only be one possibility
ns <- next.states
else
ns <- sample(next.states,1) # Or you may have to pick from a few
if (ns > nrow(R)) { ns <- cs }
# NOW UPDATE THE Q-MATRIX
Q[cs,ns] <- Q[cs,ns] + alpha*(R[cs,ns] + gamma*max(Q[ns, which(R[ns,] > -1)]) – Q[cs,ns])
if (ns == tgt.state) break
cs <- ns
}
}
return(round(100*Q/max(Q)))
}
[/code]

Run qlearn with the HOP rewards matrix, a learning rate of 0.1, a discount rate of 0.8, and a target state of S7 (the location to the far right of the wooden board). I did 10,000 episodes (where in each one, the robot dropped randomly onto the wooden board and has to get to S7):

[code language=”r”]
r.hop <- qlearn(HOP,10000,alpha=0.1,gamma=0.8,tgt.state=7)
> r.hop
[,1] [,2] [,3] [,4] [,5] [,6] [,7]
[1,] 0 51 64 0 0 0 0
[2,] 0 0 64 0 0 0 0
[3,] 0 51 0 0 80 0 0
[4,] 0 0 64 0 80 80 0
[5,] 0 0 0 0 0 80 100
[6,] 0 0 0 0 80 0 100
[7,] 0 0 0 0 0 80 100
[/code]

The Q-Matrix that is presented encodes the best-value solutions from each state (the “policy”). Here’s how you read it:

  • If you’re at s1 (first row), hop to s3 (biggest value in first row), then hop to s5 (go to row 3 and find biggest value), then hop to s7 (go to row 5 and find biggest value)
  • If you’re at s2, go right to s3, then hop to s5, then hop to s7
  • If you’re at s3, hop to s5, then hop to s7
  • If you’re at s4, go right to s5 OR hop to s6, then go right to s7
  • If you’re at s5, hop to s7
  • If you’re at s6, go right to s7
  • If you’re at s7, stay there (when you’re in the target state, the value function will not be able to pick out a “best action” because the best action is to do nothing)

Alternatively, the policy can be expressed as the best action from each of the 7 states: HOP, RIGHT, HOP, RIGHT, HOP, RIGHT, (STAY PUT)

Attempt 2: Use ReinforcementLearning Package

I also used the ReinforcementLearning package by Nicholas Proellochs (6/19/2017) described in https://cran.r-project.org/web/packages/ReinforcementLearning/ReinforcementLearning.pdf.

  • Input: 1) a definition of the environment, 2) a list of states, 3) a list of actions, and 4) control parameters alpha (the learning rate; usually 0.1), gamma (the discount rate which describes how important future rewards are; often 0.9 indicating that 90% of the next reward will be taken into account), and epsilon (the probability that you’ll try a random action; often 0.1)
  • Output: A State-Action Value matrix, which attaches a number to how good it is to be in a particular state and take an action. You can use it to determine the highest value action from each state. (It contains the same information as the Q-matrix from Attempt 1, but you don’t have to infer the action from the destination it brings you to.)
  • Pros: Relatively straightforward. Allows you to specify epsilon, which controls the proportion of random actions you’ll explore as you create episodes and explore your environment. Cons: Requires manual setup of all state transitions and associated rewards.

First, I created an “environment” that describes 1) how the states will change when actions are taken, and 2) what rewards will be accrued when that happens. I assigned a reward of -1 to all actions that are not special, e.g. landing on S1, landing on S4, or landing on S7. To be perfectly consistent with Attempt 1, I could have used 0.01 instead of -1, but the results will be similar. The values you choose for rewards are sort of arbitrary, but you do need to make sure there’s a comparatively large positive reward at your target state and “negative rewards” for states you want to avoid or are physically impossible.

[code language=”r”]
my.env <- function(state,action) {
next_state <- state
if (state == state("s1") && action == "right") { next_state <- state("s2") }
if (state == state("s1") && action == "hop") { next_state <- state("s3") }

if (state == state("s2") && action == "left") {
next_state <- state("s1"); reward <- -10 }
if (state == state("s2") && action == "right") { next_state <- state("s3") }
if (state == state("s2") && action == "hop") {
next_state <- state("s4"); reward <- -3 }

if (state == state("s3") && action == "left") { next_state <- state("s2") }
if (state == state("s3") && action == "right") {
next_state <- state("s4"); reward <- -3 }
if (state == state("s3") && action == "hop") { next_state <- state("s5") }

if (state == state("s4") && action == "left") { next_state <- state("s3") }
if (state == state("s4") && action == "right") { next_state <- state("s5") }
if (state == state("s4") && action == "hop") { next_state <- state("s6") }

if (state == state("s5") && action == "left") {
next_state <- state("s4"); reward <- -3 }
if (state == state("s5") && action == "right") { next_state <- state("s6") }
if (state == state("s5") && action == "hop") {
next_state <- state("s7"); reward <- 10 }

if (state == state("s6") && action == "left") { next_state <- state("s5") }
if (state == state("s6") && action == "right") {
next_state <- state("s7"); reward <- 10 }

if (next_state == state("s7") && state != state("s7")) {
reward <- 10
} else {
reward <- -1
}
out <- list(NextState = next_state, Reward = reward)
return(out)
}
[/code]

Next, I installed and loaded up the ReinforcementLearning package and ran the RL simulation:

[code language=”r”]
install.packages("ReinforcementLearning")
library(ReinforcementLearning)
states <- c("s1", "s2", "s3", "s4", "s5", "s6", "s7")
actions <- c("left","right","hop")
data <- sampleExperience(N=3000,env=my.env,states=states,actions=actions)
control <- list(alpha = 0.1, gamma = 0.8, epsilon = 0.1)
model <- ReinforcementLearning(data, s = "State", a = "Action", r = "Reward",
s_new = "NextState", control = control)
[/code]

Now we can see the results:

[code language=”r”]
> print(model)
State-Action function Q
hop right left
s1 2.456741 1.022440 1.035193
s2 2.441032 2.452331 1.054154
s3 4.233166 2.469494 1.048073
s4 4.179853 4.221801 2.422842
s5 6.397159 4.175642 2.456108
s6 4.217752 6.410110 4.223972
s7 -4.602003 -4.593739 -4.591626

Policy
s1 s2 s3 s4 s5 s6 s7
"hop" "right" "hop" "right" "hop" "right" "left"

Reward (last iteration)
[1] 223
[/code]

The recommended policy is: HOP, RIGHT, HOP, RIGHT, HOP, RIGHT, (STAY PUT)

If you tried this example and it didn’t produce the same response, don’t worry! Model-free reinforcement learning is done by simulation, and when you used the sampleExperience function, you generated a different set of state transitions to learn from. You may need more samples, or to tweak your rewards structure, or both.)

A Newbie’s Install of Keras & Tensorflow on Windows 10 with R

This weekend, I decided it was time: I was going to update my Python environment and get Keras and Tensorflow installed so I could start doing tutorials (particularly for deep learning) using R. Although I used to be a systems administrator (about 20 years ago), I don’t do much installing or configuring so I guess that’s why I’ve put this task off for so long. And it wasn’t unwarranted: it took me the whole weekend to get the install working. Here are the steps I used to get things running on Windows 10, leveraging clues in about 15 different online resources — and yes (I found out the hard way), the order of operations is very important. I do not claim to have nailed the order of operations here, but definitely one that works.

Step 0: I had already installed the tensorflow and keras packages within R, and had been wondering why they wouldn’t work. “Of course!” I finally realized, a few weeks later. “I don’t have Python on this machine, and both of these packages depend on a Python install.” Turns out they also depend on the reticulate package, so install.packages(“reticulate”) if you have not already.

Step 1: Installed Anaconda3 to C:/Users/User/Anaconda3 (from https://www.anaconda.com/download/)

Step 2: Opened “Anaconda Prompt” from Windows Start Menu. First, to create an “environment” specifically for use with tensorflow and keras in R called “tf-keras” with a 64-bit version of Python 3.5 I typed:

conda create -n tf-keras python=3.5 anaconda

… and then after it was done, I did this:

activate tf-keras

Step 3: Install TensorFlow from Anaconda prompt. Using the instructions at https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.1.0-cp35-cp35m-win_amd64.whl I typed this:

pip install --ignore-installed --upgrade

I didn’t know whether this worked or not — it gave me an error saying that it “can not import html5lib”, so I did this next:

conda install -c conda-forge html5lib

I tried to run the pip command again, but there was an error so I consulted https://www.tensorflow.org/install/install_windows. It told me to do this:

pip install --ignore-installed --upgrade tensorflow

This failed, and told me that the pip command had an error. I searched the web for an alternative to that command, and found this, which worked!!

conda install -c conda-forge tensorflow

 

Step 4: From inside the Anaconda prompt, I opened python by typing “python”. Next, I did this, line by line:

import tensorflow as tf
 hello = tf.constant('Hello, TensorFlow!')
 sess = tf.Session()
 print(sess.run(hello))

It said “b’Hello, TensorFlow!'” which I believe means it works. (Ctrl-Z then Enter will then get you out of Python and back to the Anaconda prompt.) This means that my Python installation of TensorFlow was functional.

Step 5: Install Keras. I tried this:

pip install keras

…but I got the same error message that pip could not be installed or found or imported or something. So I tried this, which seemed to work:

conda install -c conda-forge keras

 

Step 6: Load them up from within R. First, I opened a 64-bit version of R v3.4.1 and did this:

library(tensorflow)
install_tensorflow(conda="tf=keras")

It took a couple minutes but it seemed to work.

library(keras)

 

Step 7: Try a tutorial! I decided to go for https://www.analyticsvidhya.com/blog/2017/06/getting-started-with-deep-learning-using-keras-in-r/ which guides you through developing a classifier for the MNIST handwritten image database — a very popular data science resource. I loaded up my dataset and checked to make sure it loaded properly:

data <- data_mnist()
str(data)
List of 2
 $ train:List of 2
 ..$ x: int [1:60000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
 ..$ y: int [1:60000(1d)] 5 0 4 1 9 2 1 3 1 4 ...
 $ test :List of 2
 ..$ x: int [1:10000, 1:28, 1:28] 0 0 0 0 0 0 0 0 0 0 ...
 ..$ y: int [1:10000(1d)] 7 2 1 0 4 1 4 9 5 9 ...

 

Step 8: Here is the code I used to prepare the data and create the neural network model. This didn’t take long to run at all.

trainx<-data$train$x
trainy<-data$train$y
testx<-data$test$x
testy<-data$test$y

train_x <- array(train_x, dim = c(dim(train_x)[1], prod(dim(train_x)[-1]))) / 255

test_x <- array(test_x, dim = c(dim(test_x)[1], prod(dim(test_x)[-1]))) / 255

train_y<-to_categorical(train_y,10)
test_y<-to_categorical(test_y,10)

model %>% 
layer_dense(units = 784, input_shape = 784) %>% 
layer_dropout(rate=0.4)%>%
layer_activation(activation = 'relu') %>% 
layer_dense(units = 10) %>% 
layer_activation(activation = 'softmax')

model %>% compile(
loss = 'categorical_crossentropy',
optimizer = 'adam',
metrics = c('accuracy')
)

 

Step 9: Train the network. THIS TOOK ABOUT 12 MINUTES on a powerful machine with 64GB high-performance RAM. It looks like it worked, but I don’t know how to find or evaluate the results yet.

model %>% fit(train_x, train_y, epochs = 100, batch_size = 128)
 loss_and_metrics <- model %>% evaluate(test_x, test_y, batch_size = 128)

str(model)
Model
___________________________________________________________________________________
Layer (type) Output Shape Param #
===================================================================================
dense_1 (Dense) (None, 784) 615440
___________________________________________________________________________________
dropout_1 (Dropout) (None, 784) 0
___________________________________________________________________________________
activation_1 (Activation) (None, 784) 0
___________________________________________________________________________________
dense_2 (Dense) (None, 10) 7850
___________________________________________________________________________________
activation_2 (Activation) (None, 10) 0
===================================================================================
Total params: 623,290
Trainable params: 623,290
Non-trainable params: 0

 

Step 10: Next, I wanted to try the tutorial at https://cran.r-project.org/web/packages/kerasR/vignettes/introduction.html. Turns out this uses the kerasR package, not the keras package:

X_train <- mnist$X_train
Y_train <- mnist$Y_train
X_test <- mnist$X_test
Y_test <- mnist$Y_test

> dim(X_train)
[1] 60000 28 28

X_train <- array(X_train, dim = c(dim(X_train)[1], prod(dim(X_train)[-1]))) / 255
X_test <- array(X_test, dim = c(dim(X_test)[1], prod(dim(X_test)[-1]))) / 255

To check and see what’s in any individual image, type:

image(X_train[1,,])

At this point, the to_categorical function stopped working. I was supposed to do this but got an error:

Y_train <- to_categorical(mnist$Y_train, 10)

So I did this instead:

mm <- model.matrix(~ Y_train)

Y_train <- to_categorical(mm[,2])

mod <- Sequential()  # THIS IS THE EXCITING PART WHERE YOU USE KERAS!! :)

But then I tried this, and it was clear I was stuck again — it wouldn’t work:

mod$add(Dense(units = 512, input_shape = dim(X_train)[2]))

Stack Overflow recommended grabbing a version of kerasR from GitHub, so that’s what I did next:

install.packages("devtools")
library(devtools)
devtools::install_github("statsmaths/kerasR")
library(kerasR)

I got an error in R which told me to go to the Anaconda prompt (which I did), and type this:

conda install m2w64-toolchain

Then I went back into R and this worked fantastically:

mod <- Sequential()

mod$Add would still not work though, and this is where my patience expired for the evening. I’m pretty happy though — Python is up, keras and tensorflow are up on Python, all three (keras, tensorflow, and kerasR) are up in R, and some tutorials seem to be working.

« Older Entries