Tag Archives: data science

Imperfect Action is Better Than Perfect Inaction: What Harry Truman Can Teach Us About Loss Functions (with an intro to ggplot)

One of the heuristics we use at Intelex to guide decision making is former US President Truman’s advice that “imperfect action is better than perfect inaction.” What it means is — don’t wait too long to take action, because you don’t want to miss opportunities. Good advice, right?

When I share this with colleagues, I often hear a response like: “that’s dangerous!” To which my answer is “well sure, sometimes, but it can be really valuable depending on how you apply it!” The trick is: knowing how and when.

Here’s how it can be dangerous. For example, statistical process control (SPC) exists to keep us from tampering with processes — from taking imperfect action based on random variation, which will not only get us nowhere, but can exacerbate the problem we were trying to solve. The secret is to apply Truman’s heuristic based on an understanding of exactly how imperfect is OK with your organization, based on your risk appetite. And this is where loss functions can help.

Along the way, we’ll demonstrate how to do a few important things related to plotting with the ggplot package in R, gradually adding in new elements to the plot so you can see how it’s layered, including:

  • Plot a function based on its equation
  • Add text annotations to specific locations on a ggplot
  • Draw horizontal and vertical lines on a ggplot
  • Draw arrows on a ggplot
  • Add extra dots to a ggplot
  • Eliminate axis text and axis tick marks

What is a Loss Function?

A loss function quantifies how unhappy you’ll be based on the accuracy or effectiveness of a prediction or decision. In the simplest case, you control one variable (x) which leads to some cost or loss (y). For the case we’ll examine in this post, the variables are:

  • How much time and effort you put in to scoping and characterizing the problem (x); we assume that time+effort invested leads to real understanding
  • How much it will cost you (y); can be expressed in terms of direct costs (e.g. capex + opex) as well as opportunity costs or intangible costs (e.g. damage to reputation)

Here is an example of what this might look like, if you have a situation where overestimating (putting in too much x) OR underestimating (putting in too little x) are both equally bad. In this case, x=10 is the best (least costly) decision or prediction:

plot of a typical squared loss function
# describe the equation we want to plot
parabola <- function(x) ((x-10)^2)+10  

# initialize ggplot with a dummy dataset
library(ggplot)
p <- ggplot(data = data.frame(x=0), mapping = aes(x=x)) 

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) +
     xlab("x = the variable you can control") + 
     ylab("y = cost of loss ($$)")

In regression (and other techniques where you’re trying to build a model to predict a quantitative dependent variable), mean square error is a squared loss function that helps you quantify error. It captures two facts: the farther away you are from the correct answer the worse the error is — and both overestimating and underestimating is bad (which is why you square the values). Across this and related techniques, the loss function captures these characteristics:

From http://www.cs.cornell.edu/courses/cs4780/2015fa/web/lecturenotes/lecturenote10.html

Not all loss functions have that general shape. For classification, for example, the 0-1 loss function tells the story that if you get a classification wrong (x < 0) you incur all the penalty or loss (y=1), whereas if you get it right (x > 0) there is no penalty or loss (y=0):

# set up data frame of red points
d.step <- data.frame(x=c(-3,0,0,3), y=c(1,1,0,0))

# note that the loss function really extends to x=-Inf and x=+Inf
ggplot(d.step) + geom_step(mapping=aes(x=x, y=y), direction="hv") +
     geom_point(mapping=aes(x=x, y=y), color="red") + 
     xlab("y* f(x)") + ylab("Loss (Cost)") +  
     ggtitle("0-1 Loss Function for Classification")

Use the Loss Function to Make Strategic Decisions

So let’s get back to Truman’s advice. Ideally, we want to choose the x (the amount of time and effort to invest into project planning) that results in the lowest possible cost or loss. That’s the green point at the nadir of the parabola:

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen")

Costs get higher as we move up the x-axis:

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen") +
     annotate(geom="text", x=0, y=100, label="$$$$$", color="green") +
     annotate(geom="text", x=0, y=75, label="$$$$", color="green") +
     annotate(geom="text", x=0, y=50, label="$$$", color="green") +
     annotate(geom="text", x=0, y=25, label="$$", color="green") +
     annotate(geom="text", x=0, y=0, label="$ 0", color="green")

And time+effort grows as we move along the x-axis (we might spend minutes on a problem at the left of the plot, or weeks to years by the time we get to the right hand side):

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen") +
     annotate(geom="text", x=0, y=100, label="$$$$$", color="green") +
     annotate(geom="text", x=0, y=75, label="$$$$", color="green") +
     annotate(geom="text", x=0, y=50, label="$$$", color="green") +
     annotate(geom="text", x=0, y=25, label="$$", color="green") +
     annotate(geom="text", x=0, y=0, label="$ 0", color="green") +
     annotate(geom="text", x=2, y=0, label="minutes\nof effort", size=3) +
     annotate(geom="text", x=20, y=0, label="months\nof effort", size=3)

Planning too Little = Planning too Much = Costly

What this means is — if we don’t plan, or we plan just a little bit, we incur high costs. We might make the wrong decision! Or miss critical opportunities! But if we plan too much — we’re going to spend too much time, money, and/or effort compared to the benefit of the solution we provide.


p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen") +
     annotate(geom="text", x=0, y=100, label="$$$$$", color="green") +
     annotate(geom="text", x=0, y=75, label="$$$$", color="green") +
     annotate(geom="text", x=0, y=50, label="$$$", color="green") +
     annotate(geom="text", x=0, y=25, label="$$", color="green") +
     annotate(geom="text", x=0, y=0, label="$ 0", color="green") +
     annotate(geom="text", x=2, y=0, label="minutes\nof effort", size=3) +
     annotate(geom="text", x=20, y=0, label="months\nof effort", size=3) +
     annotate(geom="text",x=3, y=85, label="Little (or no) Planning\nHIGH COST", color="red") +
     annotate(geom="text", x=18, y=85, label="Paralysis by Planning\nHIGH COST", color="red") +
     geom_vline(xintercept=0, linetype="dotted") + geom_hline(yintercept=0, linetype="dotted")

The trick is to FIND THAT CRITICAL LEVEL OF TIME and EFFORT invested to gain information and understanding about your problem… and then if you’re going to err, make sure you err towards the left — if you’re going to make a mistake, make the mistake that costs less and takes less time to make:

arrow.x <- c(10, 10, 10, 10)
arrow.y <- c(35, 50, 65, 80)
arrow.x.end <- c(6, 6, 6, 6)
arrow.y.end <- arrow.y
d <- data.frame(arrow.x, arrow.y, arrow.x.end, arrow.y.end)

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen") +
     annotate(geom="text", x=0, y=100, label="$$$$$", color="green") +
     annotate(geom="text", x=0, y=75, label="$$$$", color="green") +
     annotate(geom="text", x=0, y=50, label="$$$", color="green") +
     annotate(geom="text", x=0, y=25, label="$$", color="green") +
     annotate(geom="text", x=0, y=0, label="$ 0", color="green") +
     annotate(geom="text", x=2, y=0, label="minutes\nof effort", size=3) +
     annotate(geom="text", x=20, y=0, label="months\nof effort", size=3) +
     annotate(geom="text",x=3, y=85, label="Little (or no) Planning\nHIGH COST", color="red") +
     annotate(geom="text", x=18, y=85, label="Paralysis by Planning\nHIGH COST", color="red") +
     geom_vline(xintercept=0, linetype="dotted") + 
     geom_hline(yintercept=0, linetype="dotted") +
     geom_vline(xintercept=10) +
     geom_segment(data=d, mapping=aes(x=arrow.x, y=arrow.y, xend=arrow.x.end, yend=arrow.y.end),
     arrow=arrow(), color="blue", size=2) +
     annotate(geom="text", x=8, y=95, size=2.3, color="blue",
     label="we prefer to be\non this side of the\nloss function")

Moral of the Story

The moral of the story is… imperfect action can be expensive, but perfect action is ALWAYS expensive. Spend less to make mistakes and learn from them, if you can! This is one of the value drivers for agile methodologies… agile practices can help improve communication and coordination so that the loss function is minimized.

## FULL CODE FOR THE COMPLETELY ANNOTATED CHART ##
# If you change the equation for the parabola, annotations may shift and be in the wrong place.
parabola <- function(x) ((x-10)^2)+10

my.title <- expression(paste("Imperfect Action Can Be Expensive. But Perfect Action is ", italic("Always"), " Expensive."))

arrow.x <- c(10, 10, 10, 10)
arrow.y <- c(35, 50, 65, 80)
arrow.x.end <- c(6, 6, 6, 6)
arrow.y.end <- arrow.y
d <- data.frame(arrow.x, arrow.y, arrow.x.end, arrow.y.end)

p + stat_function(fun=parabola) + xlim(-2,23) + ylim(-2,100) + 
     xlab("Time Spent and Information Gained (e.g. person-weeks)") + ylab("$$ COST $$") +
     annotate(geom="text", x=10, y=5, label="Some Effort, Lowest Cost!!", color="darkgreen") +
     geom_point(aes(x=10, y=10), colour="darkgreen") +
     annotate(geom="text", x=0, y=100, label="$$$$$", color="green") +
     annotate(geom="text", x=0, y=75, label="$$$$", color="green") +
     annotate(geom="text", x=0, y=50, label="$$$", color="green") +
     annotate(geom="text", x=0, y=25, label="$$", color="green") +
     annotate(geom="text", x=0, y=0, label="$ 0", color="green") +
     annotate(geom="text", x=2, y=0, label="minutes\nof effort", size=3) +
     annotate(geom="text", x=20, y=0, label="months\nof effort", size=3) +
     annotate(geom="text",x=3, y=85, label="Little (or no) Planning\nHIGH COST", color="red") +
     annotate(geom="text", x=18, y=85, label="Paralysis by Planning\nHIGH COST", color="red") +
     geom_vline(xintercept=0, linetype="dotted") + 
     geom_hline(yintercept=0, linetype="dotted") +
     geom_vline(xintercept=10) +
     geom_segment(data=d, mapping=aes(x=arrow.x, y=arrow.y, xend=arrow.x.end, yend=arrow.y.end),
     arrow=arrow(), color="blue", size=2) +
     annotate(geom="text", x=8, y=95, size=2.3, color="blue",
     label="we prefer to be\non this side of the\nloss function") +
     ggtitle(my.title) +
     theme(axis.text.x=element_blank(), axis.ticks.x=element_blank(),
     axis.text.y=element_blank(), axis.ticks.y=element_blank()) 

Now sometimes you need to make this investment! (Think nuclear power plants, or constructing aircraft carriers or submarines.) Don’t get caught up in getting your planning investment perfectly optimized — but do be aware of the trade-offs, and go into the decision deliberately, based on the risk level (and regulatory nature) of your industry, and your company’s risk appetite.

Object of Type Closure is Not Subsettable

I started using R in 2004. I started using R religiously on the day of the annular solar eclipse in Madrid (October 3, 2005) after being inspired by David Hunter’s talk at ADASS.

It took me exactly 4,889 days to figure out what this vexing error means, even though trial and error helped me move through it most every time it happened! I’m so moved by what I learned (and embarrassed that it took so long to make sense), that I have to share it with you.

This error happens when you’re trying to treat a function like a list, vector, or data frame. To fix it, start treating the function like a function.

Let’s take something that’s very obviously a function. I picked the sampling distribution simulator from a 2015 blog post I wrote. Cut and paste it into your R console:

sdm.sim <- function(n,src.dist=NULL,param1=NULL,param2=NULL) {
   r <- 10000  # Number of replications/samples - DO NOT ADJUST
   # This produces a matrix of observations with  
   # n columns and r rows. Each row is one sample:
   my.samples <- switch(src.dist,
	"E" = matrix(rexp(n*r,param1),r),
	"N" = matrix(rnorm(n*r,param1,param2),r),
	"U" = matrix(runif(n*r,param1,param2),r),
	"P" = matrix(rpois(n*r,param1),r),
        "B" = matrix(rbinom(n*r,param1,param2),r),
	"G" = matrix(rgamma(n*r,param1,param2),r),
	"X" = matrix(rchisq(n*r,param1),r),
	"T" = matrix(rt(n*r,param1),r))
   all.sample.sums <- apply(my.samples,1,sum)
   all.sample.means <- apply(my.samples,1,mean)   
   all.sample.vars <- apply(my.samples,1,var) 
   par(mfrow=c(2,2))
   hist(my.samples[1,],col="gray",main="Distribution of One Sample")
   hist(all.sample.sums,col="gray",main="Sampling Distribution\nof
	the Sum")
   hist(all.sample.means,col="gray",main="Sampling Distribution\nof the Mean")
   hist(all.sample.vars,col="gray",main="Sampling Distribution\nof
	the Variance")
}

The right thing to do with this function is use it to simulate a bunch of distributions and plot them using base R. You write the function name, followed by parenthesis, followed by each of the four arguments the function needs to work. This will generate a normal distribution with mean of 20 and standard deviation of 3, along with three sampling distributions, using a sample size of 100 and 10000 replications:

sdm.sim(100, src.dist="N", param1=20, param2=3)

(You should get four plots, arranged in a 2×2 grid.)

But what if we tried to treat sdm.sim like a list, and call the 3rd element of it? Or what if we tried to treat it like a data frame, and we wanted to call one of the variables in the column of the data frame?

> sdm.sim[3]
Error in sdm.sim[3] : object of type 'closure' is not subsettable

> sdm.sim$values
Error in sdm.sim$values : object of type 'closure' is not subsettable

SURPRISE! Object of type closure is not subsettable. This happens because sdm.sim is a function, and its data type is (shockingly) something called “closure”:

> class(sdm.sim)
[1] "function"

> typeof(sdm.sim)
[1] "closure"

I had read this before on Stack Overflow a whole bunch of times, but it never really clicked until I saw it like this! And now that I had a sense for why the error was occurring, turns out it’s super easy to reproduce with functions in base R or functions in anyfamiliar packages you use:

> ggplot$a
Error in ggplot$a : object of type 'closure' is not subsettable

> table$a
Error in table$a : object of type 'closure' is not subsettable

> ggplot[1]
Error in ggplot[1] : object of type 'closure' is not subsettable

> table[1]
Error in table[1] : object of type 'closure' is not subsettable

As a result, if you’re pulling your hair out over this problem, check and see where in your rogue line of code you’re treating something like a non-function. Or maybe you picked a variable name that competes with a function name. Or maybe you got your operations out of order. In any case, change your notation so that your function is doing function things, and your code should start working.

But as Luke Smith pointed out, this is not true for functional sequences (which you can also write as functions). Functional sequences are those chains of commands you write when you’re in tidyverse mode, all strung together with %>% pipes:

Luke’s code that you can cut and paste (and try), with the top line that got cut off by Twitter, is:

fun1 <- . %>% 
    group_by(col1) %>% 
    mutate(n=n+h) %>% 
    filter(n==max(n)) %>% 
    ungroup() %>% 
    summarize(new_col=mean(n))

fun2 <- fun1[-c(2,5)] 

Even though Luke’s fun1 and fun2 are of type closure, they are subsettable because they contain a sequence of functions:

> typeof(fun1)
[1] "closure"

> typeof(fun2)
[1] "closure"

> fun1[1]
Functional sequence with the following components:

 1. group_by(., col1)

Use 'functions' to extract the individual functions. 

> fun1[[1]]
function (.) 
group_by(., col1)

Don’t feel bad! This problem has plagued all of us for many, many hours (me: years), and yet it still happens to us more often than we would like to admit. Awareness of this issue will not prevent you from attempting things that give you this error in the future. It’s such a popular error that there have been memes about it and sad valentines written about it:

SCROLL DOWN PAST STEPH’S TWEET TO SEE THE JOKE!!

(Also, if you’ve made it this far, FOLLOW THESE GOOD PEOPLE ON TWITTER: @stephdesilva @djnavarro @lksmth – They all share great information on data science in general, and R in particular. Many thanks too to all the #rstats crowd who shared in my glee last night and didn’t make me feel like an idiot for not figuring this out for ALMOST 14 YEARS. It seems so simple now.

Steph is also a Microsoft whiz who you should definitely hire if you need anything R+ Microsoft. Thanks to all of you!)

Supplier Quality Management: Seeking Test Data

Image Credit: Shutterstock, from http://asq.org/blog/2015/02/why-should-quality-go-global/

Do you have, or have you had, a supplier selection problem to solve? I have some algorithms I’ve been working on to help you make better decisions about what suppliers to choose — and how to monitor performance over time. I’d like to test and refine them on real data. If anyone has data that you’ve used to select suppliers in the past 10 years, or have data that you’re working with right now to select suppliers, or have a colleague who may be able to share this data — that’s what I’m interested in sourcing.

Because this data can sometimes be proprietary and confidential, feel free to blind the names or identifying information for the suppliers — or I can do this myself (no suppliers, products, or parts will be named when I publish the results). I just need to be able to tell them apart. Tags like Supplier A or Part1SupplierA are fine. I’d prefer if you blinded the data, but I can also write scripts to do this and have you check them before I move forward.

Desired data format is CSV or Excel. Text files are also OK, as long as they clearly identify the criteria that you used for supplier selection. Email me at myfirstname dot mylastname at gmail if you can help out — and maybe I can help you out too! Thanks.

 

Happy 10th Birthday!

10 years ago today, this blog published its first post: “How Do I Do a Lean Six Sigma (LSS) Project?” Looking back, it seems like a pretty simple place to have started. I didn’t know whether it would even be useful to anyone, but I was committed to making my personal PDSA cycles high-impact: I was going to export things I learned, or things I found valuable. (As it turns out, many people did appreciate the early posts even though it would take a few years for that to become evident!)

Since then, hundreds more have followed to help people understand more about quality and process improvement in theory and in practice. I started writing because I was in the middle of my PhD dissertation in the Quality Systems program at Indiana State, and I was discovering so many interesting nuggets of information that I wanted to share those with the world – particularly practitioners, who might not have lots of time (or even interest) in sifting through the research. In addition, I was using data science (and some machine learning, although at the time, it was much more difficult to implement) to explore quality-related problems, and could see the earliest signs that this new paradigm for problem solving might help fuel data-driven decision making in the workplace… if only we could make the advanced techniques easy for people in busy jobs to use and apply.

We’re not there yet, but as ASQ and other organizations recognize Quality 4.0 as a focus area, we’re much closer. As a result, I’ve made it my mission to help bring insights from research to practitioners, to make these new innovations real. If you are developing or demonstrating any new innovative techniques that relate to making people, processes, or products better, easier, faster, or less expensive — or reducing risks and building individual and organizational capabilities — let me know!

I’ve also learned a lot in the past decade, most of which I’ve spent helping undergraduate students develop and refine their data-driven decision making skills, and more recently at Intelex (provider of integrated environment, health & safety, and quality management EHSQ software to enterprises and smaller organizations). Here are some of the big lessons:

  1. People are complex. They have multidimensional lives, and work should support and enrich those lives. Any organization that cares about performance — internally and in the market — should examine how it can create complete and meaningful experiences. This applies not only to customers, but to employees and partners and suppliers. It also applies to anyone an organization has the power and potential to impact, no matter how small.
  2. Everybody wants to do a good job (and be recognized for it). How can we create environments where each person is empowered to contribute in all the areas where they have talent and interest? How can these same environments be designed with empathy as a core capability?
  3. Your data are your most valuable assets. It sounds trite, but data is becoming as valuable as warehouses, inventory, and equipment. I was involved in a project a few years ago where we digitized data that had been collected for three years — and by analyzing it, we uncovered improvement opportunities that when implemented, saved thousands of dollars a week. We would not have been able to do that if the data had remained scratched in pencil on thousands of sheets of well-worn legal paper.
  4. Nothing beats domain expertise (especially where data science is concerned). I’ve analyzed terabytes of data over the past decade, and in many cases, the secrets are subtle. Any time you’re using data to make decisions, be sure to engage the people with practical, on-the-ground experience in the area you’re studying.
  5. Self-awareness must be cultivated. The older you get, and the more experience you gain, the more you know what you don’t know. Many of my junior colleagues (and yours) haven’t reached this point yet, and will need some help from senior colleagues to gain this awareness. At the same time, those of you who are senior have valuable lessons to learn from your junior colleagues, too! Quality improvement is grounded in personal and organizational learning, and processes should help people help each other uncover blind spots and work through them — without fear.

Most of all, I discovered that what really matters is learning. We can spend time supporting human and organizational performance, developing and refining processes that have quality baked in, and making sure that products meet all their specifications. But what’s going on under the surface is more profound: people are learning about themselves, they are learning about how to transform inputs into outputs in a way that adds value, and they are learning about each other and their environment. Our processes just encapsulate that organizational knowledge that we develop as we learn.

Quality 4.0: Let’s Get Digital

Want to find out what Quality 4.0 really is — and start realizing the benefits for your organization? If so, check out the October 2018 issue of ASQ’s Quality Progress, where my new article (“Let’s Get Digital“) does just that.

Quality 4.0 asks how we can leverage connected, intelligent, automated (C-I-A) technologies to increase efficiency, effectiveness, and satisfaction: “As connected, intelligent and automated systems are more widely adopted, we can once again expect a renaissance in quality tools and methods.” In addition, we’re working to bring this to the forefront of quality management and quality engineering practice at Intelex.

Quality 4.0 Evolution

The progression can be summarized through four themes. We’re in the “quality as discovery” stage today:

  • Quality as inspection: In the early days, quality assurance relied on inspecting bad quality out of items produced. Walter A. Shewhart’s methods for statistical process control (SPC) helped operators determine whether variation was due to random or special causes.
  • Quality as design: Next, more holistic methods emerged for designing quality in to processes. The goal is to prevent quality problems before they occur. These movements were inspired by W. Edwards Deming’s push to cease dependence on inspection, and Juran’s Quality by Design.
  • Quality as empowerment: By the 1990’s, organizations adopting TQM and Six Sigma advocated a holistic approach to quality. Quality is everyone’s responsibility and empowered individuals contribute to continuous improvement.
  • Quality as discovery: Because of emerging technologies, we’re at a new frontier. In an adaptive, intelligent environment, quality depends on how:
    • quickly we can discover and aggregate new data sources,
    • effectively we can discover root causes and
    • how well we can discover new insights about ourselves, our products and our organizations.”

Read more at http://asq.org/quality-progress/2018/10/basic-quality/lets-get-digital.html  or download the PDF (http://asq.org/quality-progress/2018/10/basic-quality/lets-get-digital.pdf)

What is Quality 4.0?

My first post of the year addresses an idea that’s just starting to gain traction – one you’ll hear a lot more about from me in 2018 and beyond: Quality 4.0.  It’s not a fad or trend, but a reminder that the business environment is changing, and that performance excellence in the future will depend on how well you adapt, change, and transform in response.

Although we started building community around this concept at the ASQ Quality 4.0 Summits on Disruption, Innovation, and Change in 2017 and 2018, the truly revolutionary work is yet to come.

What is Quality 4.0?

Quality 4.0 = Connectedness + Intelligence + Automation (C-I-A)

for Performance Innovation

The term “Quality 4.0” comes from “Industry 4.0” – the “fourth industrial revolution” originally addressed at the Hannover (Germany) Fair in 2011. That meeting emphasized the increasing intelligence and interconnectedness in “smart” manufacturing systems, and reflected on the newest technological innovations in historical context.

The Industrial Revolutions

  • In the first industrial revolution (late 1700’s), steam and water power made it possible for production facilities to scale up and expanded the potential locations for production.
  • By the late 1800’s, the discovery of electricity and development of associated infrastructure enabled the development of machines for mass production. In the US, the expansion of railways made it easier to obtain supplies and deliver finished goods. The availability of power also sparked a renaissance in computing, and digital computing emerged from its analog ancestor.
  • The third industrial revolution came at the end of the 1960’s, with the invention of the Programmable Logic Controller (PLC). This made it possible to automate processes like filling and reloading tanks, turning engines on and off, and controlling sequences of events based on changing conditions.

The Fourth Industrial Revolution

Although the growth and expansion of the internet accelerated innovation in the late 1990’s and 2000’s, we are just now poised for another industrial revolution. What’s changing?

  • Production & Availability of Information: More information is available because people and devices are producing it at greater rates than ever before. Falling costs of enabling technologies like sensors and actuators are catalyzing innovation in these areas.
  • Connectivity: In many cases, and from many locations, that information is instantly accessible over the internet. Improved network infrastructure is expanding the extent of connectivity, making it more widely available and more robust. (And unlike the 80’s and 90’s, there are far fewer communications protocols that are commonly encountered so it’s a lot easier to get one device to talk to another device on your network.)
  • Intelligent Processing: Affordable computing capabilities (and computing power!) are available to process that information so it can be incorporated into decision making. High-performance software libraries for advanced processing and visualization of data are easy to find, and easy to use. (In the past, we had to write our own… now we can use open-source solutions that are battle tested.
  • New Modes of Interaction: The way in which we can acquire and interact with information are also changing, in particular through new interfaces like Augmented Reality (AR) and Virtual Reality (VR), which expand possibilities for training and navigating a hybrid physical-digital environment with greater ease.
  • New Modes of Production: 3D printing, nanotechnology, and gene editing (CRISPR) are poised to change the nature and means of production in several industries. Technologies for enhancing human performance (e.g. exoskeletons, brain-computer interfaces, and even autonomous vehicles) will also open up new mechanisms for innovation in production. (Roco & Bainbridge (2002) describe many of these, and their prescience is remarkable.) New technologies like blockchain have the potential to change the nature of production as well, by challenging ingrained perceptions of trust, control, consensus, and value.

The fourth industrial revolution is one of intelligence: smart, hyperconnected cyber-physical systems that help humans and machines cooperate to achieved shared goals, and use data to generate value.

Enabling Technologies are Physical, Digital, and Biological

These enabling technologies include:

  • Information (Generate & Share)
    • Affordable Sensors and Actuators
    • Big Data infrastructure (e.g. MapReduce, Hadoop, NoSQL databases)
  • Connectivity
    • 5G Networks
    • IPv6 Addresses (which expand the number of devices that can be put online)
    • Internet of Things (IoT)
    • Cloud Computing
  • Processing
    • Predictive Analytics
    • Artificial Intelligence
    • Machine Learning (incl. Deep Learning)
    • Data Science
  • Interaction
    • Augmented Reality (AR)
    • Mixed Reality (MR)
    • Virtual Reality (VR)
    • Diminished Reality (DR)
  • Construction
    • 3D Printing
    • Additive Manufacturing
    • Smart Materials
    • Nanotechnology
    • Gene Editing
    • Automated (Software) Code Generation
    • Robotic Process Automation (RPA)
    • Blockchain

Today’s quality profession was born during the middle of the second industrial revolution, when methods were needed to ensure that assembly lines ran smoothly – that they produced artifacts to specifications, that the workers knew how to engage in the process, and that costs were controlled. As industrial production matured, those methods grew to encompass the design of processes which were built to produce to specifications. In the 1980’s and 1990’s, organizations in the US started to recognize the importance of human capabilities and active engagement in quality as essential, and TQM, Lean, and Six Sigma gained in popularity. 

How will these methods evolve in an adaptive, intelligent environment? The question is largely still open, and that’s the essence of Quality 4.0.

Roco, M. C., & Bainbridge, W. S. (2002). Converging technologies for improving human performance: Integrating from the nanoscale. Journal of nanoparticle research4(4), 281-295. (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.465.7221&rep=rep1&type=pdf)

A Simple Intro to Q-Learning in R: Floor Plan Navigation

This example is drawn from “A Painless Q-Learning Tutorial” at http://mnemstudio.org/path-finding-q-learning-tutorial.htm which explains how to manually calculate iterations using the updating equation for Q-Learning, based on the Bellman Equation (image from https://www.is.uni-freiburg.de/ressourcen/business-analytics/13_reinforcementlearning.pdf):

The example explores path-finding through a house:

The question to be answered here is: What’s the best way to get from Room 2 to Room 5 (outside)? Notice that by answering this question using reinforcement learning, we will also know how to find optimal routes from any room to outside. And if we run the iterative algorithm again for a new target state, we can find out the optimal route from any room to that new target state.

Since Q-Learning is model-free, we don’t need to know how likely it is that our agent will move between any room and any other room (the transition probabilities). If you had observed the behavior in this system over time, you might be able to find that information, but it many cases it just isn’t available. So the key for this problem is to construct a Rewards Matrix that explains the benefit (or penalty!) of attempting to go from one state (room) to another.

Assigning the rewards is somewhat arbitrary, but you should give a large positive value to your target state and negative values to states that are impossible or highly undesirable. Here’s the guideline we’ll use for this problem:

  • -1 if “you can’t get there from here”
  • 0 if the destination is not the target state
  • 100 if the destination is the target state

We’ll start constructing our rewards matrix by listing the states we’ll come FROM down the rows, and the states we’ll go TO in the columns. First, let’s fill the diagonal with -1 rewards, because we don’t want our agent to stay in the same place (that is, move from Room 1 to Room 1, or from Room 2 to Room 2, and so forth). The final one gets a 100 because if we’re already in Room 5, we want to stay there.

Next, let’s move across the first row. Starting in Room 0, we only have one choice: go to Room 4. All other possibilities are blocked (-1):

Now let’s fill in the row labeled 1. From Room 1, you have two choices: go to Room 3 (which is not great but permissible, so give it a 0) or go to Room 5 (the target, worth 100)!

Continue moving row by row, determining if you can’t get there from here (-1), you can but it’s not the target (0), or it’s the target(100). You’ll end up with a final rewards matrix that looks like this:

Now create this rewards matrix in R:

[code language=”r”]
R <- matrix(c(-1, -1, -1, -1, 0, -1,
-1, -1, -1, 0, -1, 100,
-1, -1, -1, 0, -1, -1,
-1, 0, 0, -1, 0, -1,
0, -1, -1, 0, -1, 100,
-1, 0, -1, -1, 0, 100), nrow=6, ncol=6, byrow=TRUE)
[/code]

And run the code. Notice that we’re calling the target state 6 instead of 5 because even though we have a room labeled with a zero, our matrix starts with a 1s so we have to adjust:

[code language=”r”]
source("https://raw.githubusercontent.com/NicoleRadziwill/R-Functions/master/qlearn.R")

results <- q.learn(R,10000,alpha=0.1,gamma=0.8,tgt.state=6)
> round(results)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0 0 0 0 80 0
[2,] 0 0 0 64 0 100
[3,] 0 0 0 64 0 0
[4,] 0 80 51 0 80 0
[5,] 64 0 0 64 0 100
[6,] 0 80 0 0 80 100
[/code]

You can read this table of average value to obtain policies. A policy is a “path” through the states of the system:

  • Start at Room 0 (first row, labeled 1): Choose Room 4 (80), then from Room 4 choose Room 5 (100)
  • Start at Room 1: Choose Room 5 (100)
  • Start at Room 2: Choose Room 3 (64), from Room 3 choose Room 1 or Room 4 (80); from 1 or 4 choose 5 (100)
  • Start at Room 3: Choose Room 1 or Room 4 (80), then Room 5 (100)
  • Start at Room 4: Choose Room 5 (100)
  • Start at Room 5: Stay at Room 5 (100)

To answer the original problem, we would take route 2-3-1-5 or 2-3-4-5 to get out the quickest if we started in Room 2. This is easy to see with a simple map, but is much more complicated when the maps get bigger.

« Older Entries