Image Credit: Shutterstock, from http://asq.org/blog/2015/02/why-should-quality-go-global/

Do you have, or have you had, a supplier selection problem to solve? I have some algorithms I’ve been working on to help you make better decisions about what suppliers to choose — and how to monitor performance over time. I’d like to test and refine them on real data. If anyone has data that you’ve used to select suppliers in the past 10 years, or have data that you’re working with right now to select suppliers, or have a colleague who may be able to share this data — that’s what I’m interested in sourcing.

Because this data can sometimes be proprietary and confidential, feel free to blind the names or identifying information for the suppliers — or I can do this myself (no suppliers, products, or parts will be named when I publish the results). I just need to be able to tell them apart. Tags like Supplier A or Part1SupplierA are fine. I’d prefer if you blinded the data, but I can also write scripts to do this and have you check them before I move forward.

Desired data format is CSV or Excel. Text files are also OK, as long as they clearly identify the criteria that you used for supplier selection. Email me at myfirstname dot mylastname at gmail if you can help out — and maybe I can help you out too! Thanks.

What’s Quality 4.0, why is it important, and how can you use it to gain competitive advantage? Did you know you can benefit from Quality 4.0 even if you’re not a manufacturing organization? That’s right. I’ll tell you more next week.

Sign up for my 50-minute webinar at 2pm ET on Tuesday, October 16, 2018 — hosted by Dirk Dusharme and Mike Richman at Quality Digest. This won’t be your traditional “futures” talk to let you know about all of the exciting technology on the horizon… I’ve actually been doing and teaching data science, and applying machine learning to practical problems in quality improvement, for over a decade.

You have a LOT of data and you don’t know where to begin

You’re kind of behind… you still use paper and Excel and you’re hoping you don’t miss the opportunities here

You’re a data scientist and you want to find out about quality and process improvement

You’re a quality professional and you want to find out more about data science

You’re a quality engineer and you want some professional preparation for what’s on the horizon

You want to be sure you get on our Quality 4.0 mailing list to receive valuable information assets for the next couple years to help you identify and capture opportunities

Register Here! See you on Tuesday. If you can’t make it, we’ll also be at the ASQ Quality 4.0 Summit in Dallas next month sharing more information about the convergence of quality and Big Data.

Want to find out what Quality 4.0 really is — and start realizing the benefits for your organization? Check out this month’s issue of ASQ’s Quality Progress, where my new article (“Let’s Get Digital“) does just that. Quality 4.0 — which we’re working to bring to the practice of quality management and quality engineering at Intelex — asks how we can leverage connected, intelligent, automated (C-I-A) technologies to increase efficiency, effectiveness, and satisfaction: “As connected, intelligent and automated systems are more widely adopted, we can once again expect a renaissance in quality tools and methods. The progression can be summarized through four themes:

Quality as inspection: In the early days, quality assurance relied on inspecting bad quality out of the total items produced. Walter A. Shewhart’s methods for statistical process control helped operators determine whether variation was due to random or special causes.

Quality as design: Inspired by W. Edwards Deming’s recommendation to cease dependence on inspection, more holistic methods emerged for designing quality into processes to prevent quality problems before they occurred.

Quality as empowerment: TQM and Six Sigma advocate a holistic approach to quality, making it everyone’s responsibility and empowering individuals to contribute to continuous improvement.

Quality as discovery: In an adaptive, intelligent environment, quality depends on how quickly we can discover and aggregate new data sources, how effectively we can discover root causes and how well we can discover new insights about ourselves, our products and our organizations.”

The example explores path-finding through a house:

The question to be answered here is: What’s the best way to get from Room 2 to Room 5 (outside)? Notice that by answering this question using reinforcement learning, we will also know how to find optimal routes from any room to outside. And if we run the iterative algorithm again for a new target state, we can find out the optimal route from any room to that new target state.

Since Q-Learning is model-free, we don’t need to know how likely it is that our agent will move between any room and any other room (the transition probabilities). If you had observed the behavior in this system over time, you might be able to find that information, but it many cases it just isn’t available. So the key for this problem is to construct a Rewards Matrix that explains the benefit (or penalty!) of attempting to go from one state (room) to another.

Assigning the rewards is somewhat arbitrary, but you should give a large positive value to your target state and negative values to states that are impossible or highly undesirable. Here’s the guideline we’ll use for this problem:

-1 if “you can’t get there from here”

0 if the destination is not the target state

100 if the destination is the target state

We’ll start constructing our rewards matrix by listing the states we’ll come FROM down the rows, and the states we’ll go TO in the columns. First, let’s fill the diagonal with -1 rewards, because we don’t want our agent to stay in the same place (that is, move from Room 1 to Room 1, or from Room 2 to Room 2, and so forth). The final one gets a 100 because if we’re already in Room 5, we want to stay there.

Next, let’s move across the first row. Starting in Room 0, we only have one choice: go to Room 4. All other possibilities are blocked (-1):

Now let’s fill in the row labeled 1. From Room 1, you have two choices: go to Room 3 (which is not great but permissible, so give it a 0) or go to Room 5 (the target, worth 100)!

Continue moving row by row, determining if you can’t get there from here (-1), you can but it’s not the target (0), or it’s the target(100). You’ll end up with a final rewards matrix that looks like this:

And run the code. Notice that we’re calling the target state 6 instead of 5 because even though we have a room labeled with a zero, our matrix starts with a 1s so we have to adjust:

You can read this table of average value to obtain policies. A policy is a “path” through the states of the system:

Start at Room 0 (first row, labeled 1): Choose Room 4 (80), then from Room 4 choose Room 5 (100)

Start at Room 1: Choose Room 5 (100)

Start at Room 2: Choose Room 3 (64), from Room 3 choose Room 1 or Room 4 (80); from 1 or 4 choose 5 (100)

Start at Room 3: Choose Room 1 or Room 4 (80), then Room 5 (100)

Start at Room 4: Choose Room 5 (100)

Start at Room 5: Stay at Room 5 (100)

To answer the original problem, we would take route 2-3-1-5 or 2-3-4-5 to get out the quickest if we started in Room 2.This is easy to see with a simple map, but is much more complicated when the maps get bigger.

Overview: Reinforcement learning uses “reward” signals to determine how to navigate through a system in the most valuable way. (I’m particularly interested in the variant of reinforcement learning called “Q-Learning” because the goal is to create a “Quality Matrix” that can help you make the best sequence of decisions!) I found a toy robot navigation problem on the web that was solved using custom R code for reinforcement learning, and I wanted to reproduce the solution in different ways than the original author did. This post describes different ways that I solved the problem described at http://bayesianthink.blogspot.com/2014/05/hopping-robots-and-reinforcement.html

The Problem: Our agent, the robot, is placed at random on a board of wood. There’s a hole at s1, a sticky patch at s4, and the robot is trying to make appropriate decisions to navigate to s7 (the target). The image comes from the blog post linked above.

To solve a problem like this, you can use MODEL-BASED approaches if you know how likely it is that the robot will move from one state to another (that is, the transition probabilities for each action) or MODEL-FREE approaches (you don’t know how likely it is that the robot will move from state to state, but you can figure out a reward structure).

Markov Decision Process (MDP) – If you know the states, actions, rewards, and transition probabilities (which are probably different for each action), you can determine the optimal policy or “path” through the system, given different starting states. (If transition probabilities have nothing to do with decisions that an agent makes, your MDP reduces to a Markov Chain.)

Reinforcement Learning (RL) – If you know the states, actions, and rewards (but not the transition probabilities), you can still take an unsupervised approach. Just randomly create lots of hops through your system, and use them to update a matrix that describes the average value of each hop within the context of the system.

Solving a RL problem involves finding the optimal value functions (e.g. the Q matrix in Attempt 1) or the optimal policy (the State-Action matrix in Attempt 2). Although there are many techniques for reinforcement learning, we will use Q-learning because we don’t know the transition probabilities for each action. (If we did, we’d model it as a Markov Decision Process and use the MDPtoolbox package instead.) Q-Learning relies on traversing the system in many ways to update a matrix of average expected rewards from each state transition. This equation that it uses is from https://www.is.uni-freiburg.de/ressourcen/business-analytics/13_reinforcementlearning.pdf:

For this to work, all states have to be visited a sufficient number of times, and all state-action pairs have to be included in your experience sample. So keep this in mind when you’re trying to figure out how many iterations you need.

Attempt 1: Quick Q-Learning with qlearn.R

Input: A rewards matrix R. (That’s all you need! Your states are encoded in the matrix.)

Output: A Q matrix from which you can extract optimal policies (or paths) to help you navigate the environment.

Pros: Quick and very easy.Cons: Does not let you set epsilon (% of random actions), so all episodes are determined randomly and it may take longer to find a solution. Can take a long time to converge.

Set up the rewards matrix so it is a square matrix with all the states down the rows, starting with the first and all the states along the columns, starting with the first:

Here’s how you read this: the rows represent where you’ve come FROM, and the columns represent where you’re going TO. Each element 1 through 7 corresponds directly to S1 through S7 in the cartoon above. Each cell contains a reward (or penalty, if the value is negative) if we arrive in that state.

The S1 state is bad for the robot… there’s a hole in that piece of wood, so we’d really like to keep it away from that state. Location [1,1] on the matrix tells us what reward (or penalty) we’ll receive if we start at S1 and stay at S1: -10 (that’s bad). Similarly, location [2,1] on the matrix tells us that if we start at S2 and move left to S1, that’s also bad and we should receive a penalty of -10. The S4 state is also undesirable – there’s a sticky patch there, so we’d like to keep the robot away from it. Location [3,4] on the matrix represents the action of going from S3 to S4 by moving right, which will put us on the sticky patch

Now load the qlearn command into your R session:

[code language=”r”]
qlearn <- function(R, N, alpha, gamma, tgt.state) {
# Adapted from https://stackoverflow.com/questions/39353580/how-to-implement-q-learning-in-r
Q <- matrix(rep(0,length(R)), nrow=nrow(R))
for (i in 1:N) {
cs <- sample(1:nrow(R), 1)
while (1) {
next.states <- which(R[cs,] > -1) # Get feasible actions for cur state
if (length(next.states)==1) # There may only be one possibility
ns <- next.states
else
ns <- sample(next.states,1) # Or you may have to pick from a few
if (ns > nrow(R)) { ns <- cs }
# NOW UPDATE THE Q-MATRIX
Q[cs,ns] <- Q[cs,ns] + alpha*(R[cs,ns] + gamma*max(Q[ns, which(R[ns,] > -1)]) – Q[cs,ns])
if (ns == tgt.state) break
cs <- ns
}
}
return(round(100*Q/max(Q)))
}
[/code]

Run qlearn with the HOP rewards matrix, a learning rate of 0.1, a discount rate of 0.8, and a target state of S7 (the location to the far right of the wooden board). I did 10,000 episodes (where in each one, the robot dropped randomly onto the wooden board and has to get to S7):

The Q-Matrix that is presented encodes the best-value solutions from each state (the “policy”). Here’s how you read it:

If you’re at s1 (first row), hop to s3 (biggest value in first row), then hop to s5 (go to row 3 and find biggest value), then hop to s7 (go to row 5 and find biggest value)

If you’re at s2, go right to s3, then hop to s5, then hop to s7

If you’re at s3, hop to s5, then hop to s7

If you’re at s4, go right to s5 OR hop to s6, then go right to s7

If you’re at s5, hop to s7

If you’re at s6, go right to s7

If you’re at s7, stay there (when you’re in the target state, the value function will not be able to pick out a “best action” because the best action is to do nothing)

Alternatively, the policy can be expressed as the best action from each of the 7 states: HOP, RIGHT, HOP, RIGHT, HOP, RIGHT, (STAY PUT)

Input: 1) a definition of the environment, 2) a list of states, 3) a list of actions, and 4) control parameters alpha (the learning rate; usually 0.1), gamma (the discount rate which describes how important future rewards are; often 0.9 indicating that 90% of the next reward will be taken into account), and epsilon (the probability that you’ll try a random action; often 0.1)

Output: A State-Action Value matrix, which attaches a number to how good it is to be in a particular state and take an action. You can use it to determine the highest value action from each state. (It contains the same information as the Q-matrix from Attempt 1, but you don’t have to infer the action from the destination it brings you to.)

Pros: Relatively straightforward. Allows you to specify epsilon, which controls the proportion of random actions you’ll explore as you create episodes and explore your environment. Cons: Requires manual setup of all state transitions and associated rewards.

First, I created an “environment” that describes 1) how the states will change when actions are taken, and 2) what rewards will be accrued when that happens. I assigned a reward of -1 to all actions that are not special, e.g. landing on S1, landing on S4, or landing on S7. To be perfectly consistent with Attempt 1, I could have used 0.01 instead of -1, but the results will be similar. The values you choose for rewards are sort of arbitrary, but you do need to make sure there’s a comparatively large positive reward at your target state and “negative rewards” for states you want to avoid or are physically impossible.

The recommended policy is: HOP, RIGHT, HOP, RIGHT, HOP, RIGHT, (STAY PUT)

If you tried this example and it didn’t produce the same response, don’t worry! Model-free reinforcement learning is done by simulation, and when you used the sampleExperience function, you generated a different set of state transitions to learn from. You may need more samples, or to tweak your rewards structure, or both.)

This weekend, I decided it was time: I was going to update my Python environment and get Keras and Tensorflow installed so I could start doing tutorials (particularly for deep learning) using R. Although I used to be a systems administrator (about 20 years ago), I don’t do much installing or configuring so I guess that’s why I’ve put this task off for so long. And it wasn’t unwarranted: it took me the whole weekend to get the install working. Here are the steps I used to get things running on Windows 10, leveraging clues in about 15 different online resources — and yes (I found out the hard way), the order of operations is very important. I do not claim to have nailed the order of operations here, but definitely one that works.

Step 0: I had already installed the tensorflow and keras packages within R, and had been wondering why they wouldn’t work. “Of course!” I finally realized, a few weeks later. “I don’t have Python on this machine, and both of these packages depend on a Python install.” Turns out they also depend on the reticulate package, so install.packages(“reticulate”) if you have not already.

Step 2: Opened “Anaconda Prompt” from Windows Start Menu. First, to create an “environment” specifically for use with tensorflow and keras in R called “tf-keras” with a 64-bit version of Python 3.5 I typed:

It said “b’Hello, TensorFlow!'” which I believe means it works. (Ctrl-Z then Enter will then get you out of Python and back to the Anaconda prompt.) This means that my Python installation of TensorFlow was functional.

Step 5: Install Keras. I tried this:

pip install keras

…but I got the same error message that pip could not be installed or found or imported or something. So I tried this, which seemed to work:

conda install -c conda-forge keras

Step 6: Load them up from within R. First, I opened a 64-bit version of R v3.4.1 and did this:

Step 9: Train the network. THIS TOOK ABOUT 12 MINUTES on a powerful machine with 64GB high-performance RAM. It looks like it worked, but I don’t know how to find or evaluate the results yet.

model %>% fit(train_x, train_y, epochs = 100, batch_size = 128)
loss_and_metrics <- model %>% evaluate(test_x, test_y, batch_size = 128)

I got an error in R which told me to go to the Anaconda prompt (which I did), and type this:

conda install m2w64-toolchain

Then I went back into R and this worked fantastically:

mod <- Sequential()

mod$Add would still not work though, and this is where my patience expired for the evening. I’m pretty happy though — Python is up, keras and tensorflow are up on Python, all three (keras, tensorflow, and kerasR) are up in R, and some tutorials seem to be working.

Every student learns how to look up areas under the normal curve using Z-Score tables in their first statistics class. But what is less commonly covered, especially in courses where calculus is not a prerequisite, is where those Z-Score tables come from: figuring out the area under the normal curve for all possible places you could chop it into two, then making a table from it.

You get the z-score by evaluating the integral of the equation for the bell-shaped normal curve, usually from -Inf to the z-score of interest. This is the same thing that the R command pnorm does when you provide it with a z-score. Here is the slide presentation I put together to explain the use and origin of the Z-Score table, and how it relates to pnorm and qnorm (the command that lets you input an area to find the z-score at which the area to the left is swiped out). It’s free to use under Creative Commons, and is part of the course materials that is available for use with this 2017 book.

One of the fun things I did was to make my own z-score table in R. I don’t know why anyone would WANT to do this — they are easy to find in books, and online, and if you know how to use pnorm and qnorm, you don’t need one at all. But, you can, and here’s how.

First, let’s create a z-score table just with left-tail areas. Using symmetry, we can also use this to get any areas in the right tail, because the area to the left of any -z is the same as any area to the right of any +z. Even though the z-score table contains areas in its cells, our first step is to create a table just of the z-scores that correspond to each cell:

Now that we have slots for all the z-scores, we can use pnorm to transform all those values into the areas that are swiped out to the left of that z-score. This part is easy, and only takes one line. The remaining three lines format and display the z-score table: