Category Archives: Education

Happy 10th Birthday!

10 years ago today, this blog published its first post: “How Do I Do a Lean Six Sigma (LSS) Project?” Looking back, it seems like a pretty simple place to have started. I didn’t know whether it would even be useful to anyone, but I was committed to making my personal PDSA cycles high-impact: I was going to share the things I learned, and the things I found valuable. (As it turns out, many people did appreciate the early posts, even though it would take a few years for that to become evident!)

Since then, hundreds more have followed to help people understand more about quality and process improvement in theory and in practice. I started writing because I was in the middle of my PhD dissertation in the Quality Systems program at Indiana State, and I was discovering so many interesting nuggets of information that I wanted to share them with the world – particularly practitioners, who might not have lots of time (or even the interest) to sift through the research. In addition, I was using data science (and some machine learning, although at the time, it was much more difficult to implement) to explore quality-related problems, and could see the earliest signs that this new paradigm for problem solving might help fuel data-driven decision making in the workplace… if only we could make the advanced techniques easy for people in busy jobs to use and apply.

We’re not there yet, but as ASQ and other organizations recognize Quality 4.0 as a focus area, we’re much closer. As a result, I’ve made it my mission to help bring insights from research to practitioners, to make these new innovations real. If you are developing or demonstrating any new innovative techniques that relate to making people, processes, or products better, easier, faster, or less expensive — or reducing risks and building individual and organizational capabilities — let me know!

I’ve also learned a lot in the past decade, most of which I’ve spent helping undergraduate students develop and refine their data-driven decision making skills, and more recently at Intelex (provider of integrated environment, health & safety, and quality management EHSQ software to enterprises and smaller organizations). Here are some of the big lessons:

  1. People are complex. They have multidimensional lives, and work should support and enrich those lives. Any organization that cares about performance — internally and in the market — should examine how it can create complete and meaningful experiences. This applies not only to customers, but to employees and partners and suppliers. It also applies to anyone an organization has the power and potential to impact, no matter how small.
  2. Everybody wants to do a good job (and be recognized for it). How can we create environments where each person is empowered to contribute in all the areas where they have talent and interest? How can these same environments be designed with empathy as a core capability?
  3. Your data are your most valuable assets. It sounds trite, but data are becoming as valuable as warehouses, inventory, and equipment. I was involved in a project a few years ago where we digitized data that had been collected for three years — and by analyzing it, we uncovered improvement opportunities that, when implemented, saved thousands of dollars a week. We would not have been able to do that if the data had remained scratched in pencil on thousands of sheets of well-worn legal paper.
  4. Nothing beats domain expertise (especially where data science is concerned). I’ve analyzed terabytes of data over the past decade, and in many cases, the secrets are subtle. Any time you’re using data to make decisions, be sure to engage the people with practical, on-the-ground experience in the area you’re studying.
  5. Self-awareness must be cultivated. The older you get, and the more experience you gain, the more you know what you don’t know. Many of my junior colleagues (and yours) haven’t reached this point yet, and will need some help from senior colleagues to gain this awareness. At the same time, those of you who are senior have valuable lessons to learn from your junior colleagues, too! Quality improvement is grounded in personal and organizational learning, and processes should help people help each other uncover blind spots and work through them — without fear.

Most of all, I discovered that what really matters is learning. We can spend time supporting human and organizational performance, developing and refining processes that have quality baked in, and making sure that products meet all their specifications. But what’s going on under the surface is more profound: people are learning about themselves, they are learning about how to transform inputs into outputs in a way that adds value, and they are learning about each other and their environment. Our processes just encapsulate that organizational knowledge that we develop as we learn.

Writing a Great Article Review

We’re teaching a class on blockchain and cryptocurrencies this semester, and since the field is so new and changing rapidly, we’ve asked our students to make finding and reviewing articles part of their learning practice. This is a particularly challenging subject for the task because there’s so much hype, marketing, and fluff around it. We want to slice through that, and improve the signal-to-noise ratio for people new to learning about blockchain and cryptocurrencies. Here are some tips I just prepared for our students — they may be helpful to anyone writing article reviews, especially for technology-related areas.

0 – Type of Source. Reviews of articles from arXiv and Google Scholar were strong; reviews of articles from Coindesk and CNN were weak; reviews of articles from the WSJ and Hacker Noon went both ways. Here are two submissions that were publishable with only minor edits:

1 – Spelling & Grammar. Most of you are college seniors, and the few who aren’t… are juniors. Please use complete sentences that make sense, with words that are spelled correctly! If this is hard for you, remember that every one of you has spell check. One easy way to do this is to draft your posts in Word, run spell check there, and then copy and paste what you wrote into WordPress.

2 – Your job is to create the TL;DR. What’s the essential substance of the source you’re reviewing? What are the main lessons or findings? If you were taking notes for an exam, what elements would you capture? (Using this perspective, commentary about how good or bad you think the article was, or what it didn’t cover well, would not help you on an exam.)

3 – Choose solid source material — primary sources, e.g. research papers, if possible. If the article is less than ~400-500 words, it’s probably not detailed enough to write a 250-300 word summary/analysis. Your job in this class is to break down complex topics and help people understand them. If your article is short and already very easy to understand, there’s nothing for you to do.

4 – Avoid “weasel words” (phrases or sentences that sound like marketing or clickbait but actually say nothing) and words/sentences that sound like you’re writing a Yelp or Amazon review rather than a critical academic review. Here are a couple of weaselly examples drawn from this week’s draft posts (see if you can spot what’s wrong):

  • It is clear how beneficial blockchain can be to smaller businesses.
  • Blockchain has the potential to change the world.
  • Each other the topics covered in the article deserve their own piece and could be augmented upon greatly.
  • There is a degree of uncertainty that comes with an emerging technology.
  • Blockchain can bring them into the 21st century to compete with larger corporations.
  • Many people are scared of the changes, and governments will seek to regulate it.

5 – Answer the “so what” question. Why is this topic interesting or compelling?

6 – Choose information-rich tags. For example, in our class, don’t include blockchain as a tag… pretty much everything we do will be related to blockchain, and everyone will tend to use it, so there won’t be much information contained in the tag.

Where Do Z-Score Tables Come From? (+ how to make them in R)

Every student learns how to look up areas under the normal curve using Z-Score tables in their first statistics class. But what is less commonly covered, especially in courses where calculus is not a prerequisite, is where those Z-Score tables come from: you find the area under the normal curve to the left of every place you could chop it in two, and then arrange those areas into a table.

You get each area by evaluating the integral of the equation for the bell-shaped normal curve, usually from -Inf to the z-score of interest. This is exactly what the R command pnorm does when you provide it with a z-score. Here is the slide presentation I put together to explain the use and origin of the Z-Score table, and how it relates to pnorm and qnorm (the command that lets you input an area and find the z-score at which that area to the left is swept out). It’s free to use under Creative Commons, and is part of the course materials available for use with this 2017 book.
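For example, here’s a quick sketch you can run at the R console to see the relationship (the specific z-scores are just illustrative):

[code language="r"]
pnorm(-1.96)    # area to the left of z = -1.96: about 0.025
qnorm(0.025)    # z-score with 2.5% of the area to its left: about -1.96
1 - pnorm(1.5)  # area to the right of z = +1.5...
pnorm(-1.5)     # ...equals the area to the left of z = -1.5, by symmetry
[/code]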

One of the fun things I did was to make my own z-score table in R. I don’t know why anyone would WANT to do this — they are easy to find in books, and online, and if you know how to use pnorm and qnorm, you don’t need one at all. But, you can, and here’s how.

First, let’s create a z-score table just with left-tail areas. Using symmetry, we can also use this to get areas in the right tail, because the area to the left of -z is the same as the area to the right of +z. Even though the z-score table contains areas in its cells, our first step is to create a table of the z-scores that correspond to each cell:

[code language=”r”]
c0 <- seq(-3.4,0,.1)
c1 <- seq(-3.41,0,.1)
c2 <- seq(-3.42,0,.1)
c3 <- seq(-3.43,0,.1)
c4 <- seq(-3.44,0,.1)
c5 <- seq(-3.45,0,.1)
c6 <- seq(-3.46,0,.1)
c7 <- seq(-3.47,0,.1)
c8 <- seq(-3.48,0,.1)
c9 <- seq(-3.49,0,.1)
z <- cbind(c0,c1,c2,c3,c4,c5,c6,c7,c8,c9)

c0 c1 c2 c3 c4 c5 c6 c7 c8 c9
[1,] -3.4 -3.41 -3.42 -3.43 -3.44 -3.45 -3.46 -3.47 -3.48 -3.49
[2,] -3.3 -3.31 -3.32 -3.33 -3.34 -3.35 -3.36 -3.37 -3.38 -3.39
[3,] -3.2 -3.21 -3.22 -3.23 -3.24 -3.25 -3.26 -3.27 -3.28 -3.29
[4,] -3.1 -3.11 -3.12 -3.13 -3.14 -3.15 -3.16 -3.17 -3.18 -3.19
[5,] -3.0 -3.01 -3.02 -3.03 -3.04 -3.05 -3.06 -3.07 -3.08 -3.09
[6,] -2.9 -2.91 -2.92 -2.93 -2.94 -2.95 -2.96 -2.97 -2.98 -2.99
[7,] -2.8 -2.81 -2.82 -2.83 -2.84 -2.85 -2.86 -2.87 -2.88 -2.89
[8,] -2.7 -2.71 -2.72 -2.73 -2.74 -2.75 -2.76 -2.77 -2.78 -2.79
[9,] -2.6 -2.61 -2.62 -2.63 -2.64 -2.65 -2.66 -2.67 -2.68 -2.69
[10,] -2.5 -2.51 -2.52 -2.53 -2.54 -2.55 -2.56 -2.57 -2.58 -2.59
[11,] -2.4 -2.41 -2.42 -2.43 -2.44 -2.45 -2.46 -2.47 -2.48 -2.49
[12,] -2.3 -2.31 -2.32 -2.33 -2.34 -2.35 -2.36 -2.37 -2.38 -2.39
[13,] -2.2 -2.21 -2.22 -2.23 -2.24 -2.25 -2.26 -2.27 -2.28 -2.29
[14,] -2.1 -2.11 -2.12 -2.13 -2.14 -2.15 -2.16 -2.17 -2.18 -2.19
[15,] -2.0 -2.01 -2.02 -2.03 -2.04 -2.05 -2.06 -2.07 -2.08 -2.09
[16,] -1.9 -1.91 -1.92 -1.93 -1.94 -1.95 -1.96 -1.97 -1.98 -1.99
[17,] -1.8 -1.81 -1.82 -1.83 -1.84 -1.85 -1.86 -1.87 -1.88 -1.89
[18,] -1.7 -1.71 -1.72 -1.73 -1.74 -1.75 -1.76 -1.77 -1.78 -1.79
[19,] -1.6 -1.61 -1.62 -1.63 -1.64 -1.65 -1.66 -1.67 -1.68 -1.69
[20,] -1.5 -1.51 -1.52 -1.53 -1.54 -1.55 -1.56 -1.57 -1.58 -1.59
[21,] -1.4 -1.41 -1.42 -1.43 -1.44 -1.45 -1.46 -1.47 -1.48 -1.49
[22,] -1.3 -1.31 -1.32 -1.33 -1.34 -1.35 -1.36 -1.37 -1.38 -1.39
[23,] -1.2 -1.21 -1.22 -1.23 -1.24 -1.25 -1.26 -1.27 -1.28 -1.29
[24,] -1.1 -1.11 -1.12 -1.13 -1.14 -1.15 -1.16 -1.17 -1.18 -1.19
[25,] -1.0 -1.01 -1.02 -1.03 -1.04 -1.05 -1.06 -1.07 -1.08 -1.09
[26,] -0.9 -0.91 -0.92 -0.93 -0.94 -0.95 -0.96 -0.97 -0.98 -0.99
[27,] -0.8 -0.81 -0.82 -0.83 -0.84 -0.85 -0.86 -0.87 -0.88 -0.89
[28,] -0.7 -0.71 -0.72 -0.73 -0.74 -0.75 -0.76 -0.77 -0.78 -0.79
[29,] -0.6 -0.61 -0.62 -0.63 -0.64 -0.65 -0.66 -0.67 -0.68 -0.69
[30,] -0.5 -0.51 -0.52 -0.53 -0.54 -0.55 -0.56 -0.57 -0.58 -0.59
[31,] -0.4 -0.41 -0.42 -0.43 -0.44 -0.45 -0.46 -0.47 -0.48 -0.49
[32,] -0.3 -0.31 -0.32 -0.33 -0.34 -0.35 -0.36 -0.37 -0.38 -0.39
[33,] -0.2 -0.21 -0.22 -0.23 -0.24 -0.25 -0.26 -0.27 -0.28 -0.29
[34,] -0.1 -0.11 -0.12 -0.13 -0.14 -0.15 -0.16 -0.17 -0.18 -0.19
[35,] 0.0 -0.01 -0.02 -0.03 -0.04 -0.05 -0.06 -0.07 -0.08 -0.09

Now that we have slots for all the z-scores, we can use pnorm to transform all those values into the areas that are swept out to the left of each z-score. This part is easy, and only takes one line; the remaining lines format and display the z-score table:

zscore.df <- round(pnorm(z),4)
row.names(zscore.df) <- sprintf("%.2f", c0)
colnames(zscore.df) <- seq(0,0.09,0.01)

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-3.40 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.30 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.20 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.10 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.00 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.90 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.80 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.70 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.60 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.50 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.40 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.30 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.20 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.10 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.00 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.90 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.80 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.70 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.60 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.50 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.40 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.30 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.20 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.10 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.00 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.90 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.80 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.70 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.60 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.50 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.40 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.30 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.20 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.10 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.00 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641

You can also draw a picture to go along with your z-score table, so that people remember which area they are looking up:

x <- seq(-4,4,0.1)
y <- dnorm(x)
plot(x, y, type="l", col="black", lwd=3)
polygon(c(x[1:31],rev(x[1:31])), c(rep(0,31),rev(y[1:31])), col="lightblue")

It looks like this:


In the slides, code to produce the z-score table for positive z-scores (where the areas are greater than 50%) is also provided.

Innovation Tips for Strategic Planning

Image Credit: Doug Buckley

Over the past 15 years, I’ve helped several organizations with continuous improvement initiatives at the strategic, executive level. A lot of themes keep appearing and reappearing, so the purpose of this post is to call out just a few and provide some insights into how to deal with them!

These come up when you are engaged in strategic planning and when you are planning operations (to ensure that processes and procedures ultimately satisfy strategic goals), and are especially prominent when you’re trying to develop or use Key Performance Indicators (KPIs) and other metrics or analytics.


1) How do you measure innovation? Before you pick metrics, recognize that the answer to this question depends on how you articulate the strategic goals for your innovation outcomes. Do you want to:

  • Keep up with changing technology?
  • Develop a new product/technology?
  • Lead your industry in developing best practices?
  • Pioneer new business models?
  • Improve quality of life for a particular group of people?

All of these will be measured in different ways! And it’s OK to not strategically innovate in one area or another… for example, you might not want to innovate your business model if technology development is your forte. Innovation is one of those things where you really don’t want to be everything to everyone… by design.


2) Do you distinguish between improving productivity and generating impact?

Improving quality (the ability to satisfy stated and implied needs) is good. Improving productivity (that is, what you can produce given the resources that you use) is also good. Reducing defects, reducing waste, and reducing variation (sometimes) are all very good things to do, and to report on. 

But who really cares about any improvements at all unless they have impact? It’s always necessary to tie your KPIs, which are often measures of outcomes, to metrics or analytics that can tell the story about why a particular improvement was useful — in the short term, and (hopefully also) in the long term.

You also have to balance productivity and impact. For example, maybe you run an ultra-efficient 24/7 Help Desk. Your effectiveness is exemplary… when someone submits a request, it’s always satisfied within 8 hours. But you discover that no tickets come in between Friday at 5pm and Monday at 8am. So all that time you spend staffing that Help Desk on the weekend? It’s non-value-added time, and could be eliminated to improve your productivity… but won’t influence your impact at all.

We just worked on a project where we consciously had to think about how all of the following interact… and you should too:

  • Organizational Productivity: did your improvement help increase the capacity or capability for part of your organization? If so, then it could contribute to technical productivity or business productivity.
  • Technical Productivity: did the improvement remove a technical barrier to getting work done, or make it faster or less error-prone?
  • Business Productivity: did the improvement help you get the needs of the business satisfied faster or better?
  • Business Impact: Did the improvements that yielded organizational productivity benefits, technical productivity benefits, or business productivity benefits make a difference at the strategic level? (This answers the “so what” question. So you improved your throughput by 83%… so what? Who really cares, and why does this matter to them? Long-term, why does this awesome thing you did really matter?)
  • Educational/Workforce Development Impact: Were the lessons learned captured, fed back into the organization’s processes to close the loop on learning, or maybe even used to educate people who may become part of your workforce pipeline?

All of the categories above are interrelated. I don’t think you can have a comprehensive, innovation-focused analytics approach unless you address all of these.


3) Do you distinguish between participation and engagement?

Participation means you showed up. Engagement means you got involved, you stayed involved, your mission was advanced, or maybe you used this experience to help society. Too often, I see organizations that want to improve engagement, and all the metrics they select are really good at characterizing participation.

I’m writing a paper on this topic right now, but in the meantime (if you want to get a REALLY good sense of the difference between participation and engagement), read The Participatory Museum by Nina Simon. Yes, it is “about museums” — and yes, I know you’re in business or industry — and YES, this book really will provide you with amazing management insights. So read it!

Analytic Hierarchy Process (AHP) using preferenceFunction in ahp

Yesterday, I wrote about how to use gluc‘s new ahp package on a simple Tom-Dick-Harry one-level decision-making problem using the Analytic Hierarchy Process (AHP). One of the cool things about that package is that in addition to specifying the pairwise comparisons directly using Saaty’s scale…


…you can also describe each of the Alternatives in terms of descriptive variables which you can use inside a function to make the pairwise comparisons automatically. This is VERY helpful if you have lots of criteria, subcriteria, or alternatives to evaluate!! For example, I used preferenceFunction to compare 55 alternatives using 6 criteria and 4 subcriteria, and was very easily able to create functions to represent my judgments. This was much easier than manually entering all the comparisons.
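To make the idea concrete, here’s a tiny standalone sketch in plain R (separate from the ahp package’s YAML format) of what one of these preference functions does, using the experience scores from the example below:

[code language="r"]
# each alternative is just a named list of descriptive variables
tom  <- list(experience = 7)
dick <- list(experience = 10)

# turn a pair of alternatives into a Saaty-scale judgment
ExperiencePreference <- function(a1, a2) {
  if (a1$experience < a2$experience) return (1/ExperiencePreference(a2, a1))
  ratio <- a1$experience / a2$experience
  if (ratio < 1.05) return (1)
  if (ratio < 1.2) return (2)
  if (ratio < 1.5) return (3)
  return (4)
}

ExperiencePreference(tom, dick)   # 1/3 -- Dick is moderately preferred on experience
[/code]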

This post shows HOW I replaced some of my manual comparisons with automated comparisons using preferenceFunction. (The full YAML file is included at the bottom of this post for you to use if you want to run this example yourself.) First, recall that the YAML file starts with specifying the alternatives that you are trying to choose from (at the bottom level of the decision hierarchy) and some variables that characterize those alternatives. I used the descriptions in the problem statement to come up with some assessments between 1=not great and 10=great:

[code language=”bash” gutter=”false”]
# Alternatives Section
# THIS IS FOR The Tom, Dick, & Harry problem at
Alternatives: &alternatives
# 1= not well; 10 = best possible
# Your assessment based on the paragraph descriptions may be different.
  Tom:
    age: 50
    experience: 7
    education: 4
    leadership: 10
  Dick:
    age: 60
    experience: 10
    education: 6
    leadership: 6
  Harry:
    age: 30
    experience: 5
    education: 8
    leadership: 6
# End of Alternatives Section

Here is a snippet from my original YAML file specifying my AHP problem manually:

[code language=”bash” gutter=”false”]
# Experience comparisons
- [Tom, Dick, 1/4]
- [Tom, Harry, 4]
- [Dick, Harry, 9]
children: *alternatives
# Education comparisons
- [Tom, Dick, 3]
- [Tom, Harry, 1/5]
- [Dick, Harry, 1/7]
children: *alternatives

And here is what I changed that snippet to, so that it would do my pairwise comparisons automatically. The functions are written in standard R (fortunately), and each function has access to a1 and a2 (the two alternatives being compared). Recursion is supported, which makes this capability particularly useful. I tried to write a function using two of the characteristics in the decision (a1$age and a1$experience) but this didn’t seem to work; I’m not sure whether the package supports it or not. Here are my comparisons rewritten as functions:

[code language=”bash” gutter=”false”]
preferenceFunction: >
ExperiencePreference <- function(a1, a2) {
if (a1$experience < a2$experience) return (1/ExperiencePreference(a2, a1))
ratio <- a1$experience / a2$experience
if (ratio < 1.05) return (1)
if (ratio < 1.2) return (2)
if (ratio < 1.5) return (3)
if (ratio < 1.8) return (4)
if (ratio < 2.1) return (5)
return (6)
}
children: *alternatives
Education:
preferenceFunction: >
EducPreference <- function(a1, a2) {
if (a1$education < a2$education) return (1/EducPreference(a2, a1))
ratio <- a1$education / a2$education
if (ratio < 1.05) return (1)
if (ratio < 1.15) return (2)
if (ratio < 1.25) return (3)
if (ratio < 1.35) return (4)
if (ratio < 1.55) return (5)
return (5)
children: *alternatives

To run the AHP with functions in R, I used this code (I am including the part that gets the ahp package, in case you have not done that yet). BE CAREFUL and make sure, like in FORTRAN, that you line things up so that the words START in the appropriate columns. For example, the “p” in preferenceFunction MUST be immediately below the 7th character of your criterion’s variable name.

[code language=”bash” gutter=”false”]
devtools::install_github("gluc/ahp", build_vignettes = TRUE)
library(ahp)

nofxnAhp <- LoadFile("tomdickharry.txt")
fxnAhp <- LoadFile("tomdickharry-fxns.txt")

# compute the AHP weights before printing them
Calculate(nofxnAhp)
Calculate(fxnAhp)

print(nofxnAhp, "weight")
print(fxnAhp, "weight")

You can see that the weights are approximately the same, indicating that I did a good job at developing functions that represent the reality of how I used the variables attached to the Alternatives to make my pairwise comparisons. The results show that Dick is now the best choice, although there is some inconsistency in our judgments for Experience that we should examine further. (I have not examined this case to see whether rank reversal could be happening).

[code language=”bash” gutter=”false”]
> print(nofxnAhp, "weight")
levelName weight
1 Choose the Most Suitable Leader 1.00000000
2 ¦–Experience 0.54756924
3 ¦ ¦–Tom 0.21716561
4 ¦ ¦–Dick 0.71706504
5 ¦ °–Harry 0.06576935
6 ¦–Education 0.12655528
7 ¦ ¦–Tom 0.18839410
8 ¦ ¦–Dick 0.08096123
9 ¦ °–Harry 0.73064467
10 ¦–Charisma 0.26994992
11 ¦ ¦–Tom 0.74286662
12 ¦ ¦–Dick 0.19388163
13 ¦ °–Harry 0.06325174
14 °–Age 0.05592555
15 ¦–Tom 0.26543334
16 ¦–Dick 0.67162545
17 °–Harry 0.06294121

> print(fxnAhp, "weight")
levelName weight
1 Choose the Most Suitable Leader 1.00000000
2 ¦–Experience 0.54756924
3 ¦ ¦–Tom 0.25828499
4 ¦ ¦–Dick 0.63698557
5 ¦ °–Harry 0.10472943
6 ¦–Education 0.12655528
7 ¦ ¦–Tom 0.08273483
8 ¦ ¦–Dick 0.26059839
9 ¦ °–Harry 0.65666678
10 ¦–Charisma 0.26994992
11 ¦ ¦–Tom 0.74286662
12 ¦ ¦–Dick 0.19388163
13 ¦ °–Harry 0.06325174
14 °–Age 0.05592555
15 ¦–Tom 0.26543334
16 ¦–Dick 0.67162545
17 °–Harry 0.06294121

> ShowTable(fxnAhp)


Here is the full YAML file for the “with preferenceFunction” case.

[code language=”bash” gutter=”false”]
# Alternatives Section
# THIS IS FOR The Tom, Dick, & Harry problem at
Alternatives: &alternatives
# 1= not well; 10 = best possible
# Your assessment based on the paragraph descriptions may be different.
  Tom:
    age: 50
    experience: 7
    education: 4
    leadership: 10
  Dick:
    age: 60
    experience: 10
    education: 6
    leadership: 6
  Harry:
    age: 30
    experience: 5
    education: 8
    leadership: 6
# End of Alternatives Section
# Goal Section
# A Goal HAS preferences (within-level comparison) and HAS Children (items in level)
name: Choose the Most Suitable Leader
# preferences are defined pairwise
# 1 means: A is equal to B
# 9 means: A is highly preferrable to B
# 1/9 means: B is highly preferrable to A
- [Experience, Education, 4]
- [Experience, Charisma, 3]
- [Experience, Age, 7]
- [Education, Charisma, 1/3]
- [Education, Age, 3]
- [Age, Charisma, 1/5]
# Experience criterion, compared automatically via preferenceFunction
preferenceFunction: >
ExperiencePreference <- function(a1, a2) {
if (a1$experience < a2$experience) return (1/ExperiencePreference(a2, a1))
ratio <- a1$experience / a2$experience
if (ratio < 1.05) return (1)
if (ratio < 1.2) return (2)
if (ratio < 1.5) return (3)
if (ratio < 1.8) return (4)
if (ratio < 2.1) return (5)
return (6)
}
children: *alternatives
Education:
preferenceFunction: >
EducPreference <- function(a1, a2) {
if (a1$education < a2$education) return (1/EducPreference(a2, a1))
ratio <- a1$education / a2$education
if (ratio < 1.05) return (1)
if (ratio < 1.15) return (2)
if (ratio < 1.25) return (3)
if (ratio < 1.35) return (4)
if (ratio < 1.55) return (5)
return (5)
children: *alternatives
# Charisma comparisons
- [Tom, Dick, 5]
- [Tom, Harry, 9]
- [Dick, Harry, 4]
children: *alternatives
# Age comparisons
- [Tom, Dick, 1/3]
- [Tom, Harry, 5]
- [Dick, Harry, 9]
children: *alternatives
# End of Goal Section

A Discrete Time Markov Chain (DTMC) SIR Model in R

Image Credit: Doug Buckley

There are many different techniques that can be used to model physical, social, economic, and conceptual systems. The purpose of this post is to show how the Kermack-McKendrick (1927) formulation of the SIR Model for studying disease epidemics (where S stands for Susceptible, I stands for Infected, and R for Recovered) can be easily implemented in R as a discrete time Markov Chain using the markovchain package.

A Discrete Time Markov Chain (DTMC) is a model for a random process where one or more entities can change state between distinct timesteps. For example, in SIR, people can be labeled as Susceptible (haven’t gotten a disease yet, but aren’t immune), Infected (they’ve got the disease right now), or Recovered (they’ve had the disease, but no longer have it, and can’t get it because they have become immune). If they get the disease, they change states from Susceptible to Infected. If they get well, they change states from Infected to Recovered. It’s impossible to change states between Susceptible and Recovered, without first going through the Infected state. It’s totally possible to stay in the Susceptible state between successive checks on the population, because there’s not a 100% chance you’ll actually be infected between any two timesteps. You might have a particularly good immune system, or maybe you’ve been hanging out by yourself for several days programming.

Discrete time means you’re not continuously monitoring the state of the people in the system. It would get really overwhelming if you had to ask them every minute “Are you sick yet? Did you get better yet?” It makes more sense to monitor individuals’ states on a discrete basis rather than continuously, for example, like maybe once a day. (Ozgun & Barlas (2009) provide a more extensive illustration of the difference between discrete and continuous modeling, using a simple queuing system.)

To create a Markov Chain in R, all you need are: 1) the transition probabilities, or the chance that an entity will move from one state to another between successive timesteps, 2) the initial state (that is, how many entities are in each of the states at time t=0), and 3) the markovchain package in R. Be sure to install markovchain before moving forward.
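If you haven’t installed the package yet, that part is just:

install.packages("markovchain")   # one-time install
library(markovchain)              # load the package for this session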

Imagine that there’s a 10% infection rate, and a 20% recovery rate. That implies that, between successive timesteps, 90% of Susceptible people will remain in the Susceptible state (and 10% will become Infected), and 20% of those who are Infected will move to the Recovered category (while the other 80% stay Infected). 100% of those Recovered will stay Recovered; none of the people who are Recovered will become Susceptible again.

Say that you start with a population of 100 people, and only 1 is infected. That means your “initial state” is that 99 are Susceptible, 1 is Infected, and 0 are Recovered. Here’s how you set up your Markov Chain:

mcSIR <- new("markovchain", states=c("S","I","R"),
    transitionMatrix=matrix(data=c(0.9, 0.1, 0.0,
                                   0.0, 0.8, 0.2,
                                   0.0, 0.0, 1.0),
    byrow=TRUE, nrow=3), name="SIR")

initialState <- c(99,1,0)

At this point, you can ask R to see your transition matrix, which shows the probability of moving FROM each of the three states (that form the rows) TO each of the three states (that form the columns).

> show(mcSIR)
 A  3 - dimensional discrete Markov Chain with following states
 S I R 
 The transition matrix   (by rows)  is defined as follows
    S   I   R
S 0.9 0.1 0.0
I 0.0 0.8 0.2
R 0.0 0.0 1.0

You can also plot your transition probabilities:
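A minimal way to do this, using the plot method that the markovchain package provides (it draws the chain as a graph), is:

plot(mcSIR)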



But all we’ve done so far is to create our model. We haven’t yet run a simulation, which would show us how many people are in each of the three states at each discrete timestep as the process unfolds. We can set up a data frame to contain labels for each timestep, and a count of how many people are in each state at each timestep. Then, we fill that data frame with the results after each timestep i, calculated by initialState*mcSIR^i:

timesteps <- 100

sir.df <- data.frame("timestep" = numeric(),
                     "S" = numeric(), "I" = numeric(),
                     "R" = numeric(), stringsAsFactors=FALSE)

for (i in 0:timesteps) {
  newrow <- as.list(c(i, round(as.numeric(initialState * mcSIR ^ i), 0)))
  sir.df[nrow(sir.df) + 1, ] <- newrow
}

Now that we have a data frame containing our SIR results (sir.df), we can display them to see what the values look like:

> head(sir.df)
  timestep  S  I  R
1        0 99  1  0
2        1 89 11  0
3        2 80 17  2
4        3 72 22  6
5        4 65 25 10
6        5 58 26 15

And then plot them to view our simulation results using this DTMC SIR Model:

# assumed setup call: plot the S counts first so the points() calls below
# have axes to draw on
plot(sir.df$timestep, sir.df$S, col="blue", xlab="timestep", ylab="number of people")
points(sir.df$timestep,sir.df$I, col="red")
points(sir.df$timestep,sir.df$R, col="green")


It’s also possible to use the markovchain package to identify elements of your system as it evolves over time:

> absorbingStates(mcSIR)
[1] "R"
> transientStates(mcSIR)
[1] "S" "I"
> steadyStates(mcSIR)
     S I R
[1,] 0 0 1

And you can calculate the first timestep that your Markov Chain reaches its steady state (the “time to absorption”), which your plot should corroborate:

> ab.state <- absorbingStates(mcSIR)
> row <- min(which(sir.df[,ab.state]==max(sir.df[,ab.state])))
> (sir.df[row,]$timestep)+1
[1] 58

You can use this code to change the various transition probabilities to see what the effects are on the outputs yourself (sensitivity analysis). Also, there are methods you can use to perform uncertainty analysis, e.g. putting confidence intervals around your transition probabilities. We won’t do either of these here, nor will we create a Shiny app to run this simulation, despite the significant temptation.
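For example, a quick sensitivity check might look something like this (the 20% infection / 10% recovery values here are purely illustrative):

# rebuild the chain with different rates, then re-run the same simulation
# loop from above with mcSIR2 in place of mcSIR
mcSIR2 <- new("markovchain", states=c("S","I","R"),
    transitionMatrix=matrix(data=c(0.8, 0.2, 0.0,
                                   0.0, 0.9, 0.1,
                                   0.0, 0.0, 1.0),
    byrow=TRUE, nrow=3), name="SIR2")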

My First (R) Shiny App: An Annotated Tutorial

Image Credit: Doug Buckley

I’ve been meaning to learn Shiny for 2 years now… and thanks to a fortuitous email from @ImADataGuy this morning and a burst of wild coding energy about 5 hours ago, I am happy to report that I have completely fallen in love again. The purpose of this post is to share how I got my first Shiny app up and running tonight on localhost, how I deployed it to the service, and how you can create a “Hello World” style program of your own that actually works on data that’s meaningful to you.

If you want to create a “Hello World!” app with Shiny (and your own data!) just follow these steps:

0. Install R 3.2.0+ first! This will save you time.
1. I signed up for an account at
2. Then I clicked the link in the email they sent me.
3. That allowed me to set up my location.
4. Then I followed the instructions there, but ran into a couple
of problems with devtools::install_github('rstudio/shinyapps') - Had to go 
into my R directory, manually delete RCurl and digest, then 
reinstall both RCurl and digest... then installing shinyapps worked.
Note: this last command they tell you to do WILL NOT WORK because you do not have an app yet! 
If you try it, this is what you'll see:
> shinyapps::deployApp('path/to/your/app')
Error in shinyapps::deployApp("path/to/your/app") : 
C:\Users\Nicole\Documents\path\to\your\app does not exist
5. Then I went to and installed rsconnect.
6. I clicked on my name and gravatar in the upper right hand corner of the window I had opened, and then clicked 
"tokens". I realized I'd already done this part, so I skipped down to read 
"A Demo App" on
7. Then, I re-installed ggplot2 and shiny using this command:
install.packages(c('ggplot2', 'shiny'))
8. I created a new directory (C:/Users/Nicole/Documents/shinyapps) and used
setwd to get to it.
9. I pasted the code at to create two files, 
server.R and ui.R, which I put into my new shinyapps directory 
under a subdirectory called demo. The subdirectory name IS your app name.
10. I typed runApp("demo") into my R console, and voila! The GUI appeared in 
my browser window on my localhost.
-- Don't just try to close the browser window to get the Shiny app 
to stop. R will hang. To get out of this, I had to use Task Manager and kill R.
--- Use the main menu, and do Misc -> Stop Current Computation
11. I did the same with the "Hello Shiny" code at 
But what I REALLY want is to deploy a hello world app with MY OWN data. You know, something that's 
meaningful to me. You probably want to do a test app with data that is meaningful to you... here's 
how you can do that.
12. A quick search shows that I need jennybc's (Github) googlesheets package to make 
data from Google Drive viewable in my new Shiny app.
13. So I tried to get the googlesheets package with this command:
but then found out it requires R version 3.2.0. If you already have 3.2.0 you can skip 
to step 16 now.
14. So I reinstalled R using the installr package (highly advised if you want to 
overcome the agony of upgrading on windows). 
for info -- all it requires is that you type installR() -- really!
15. After installing R I restarted my machine. This is probably the first time in a month that 
I've shut all my browser windows, documents, spreadsheets, PDFs, and R sessions. I got the feeling 
that this made my computer happy.
16. Then, I created a Google Sheet with my data. While viewing that document, I went to 
File -> "Publish to the Web". I also discovered that my DOCUMENT KEY is that 
looooong string in the middle of the address, so I copied it for later:
17. Then I created a new directory in C:/Users/Nicole/Documents/shinyapps to test out 
jennybc's googlesheets package, and called it jennybc
18. I copied and pasted the code in her server.R file and ui.R file
into files with the same names in my jennybc directory
19. I went into my R console, used getwd() to make sure I was in the
C:/Users/Nicole/Documents/shinyapps directory, and then typed runApp("jennybc")
20. A browser window popped up on localhost with her test Shiny app! I played with it, and then 
closed that browser tab.
21. When I went back into the R console, it was still hanging, so I went to the menu bar 
to Misc -> Stop Current Computation. This brought my R prompt back.
22. Now it was time to write my own app. I went to and
found a layout I liked, then copied the 
server.R and ui.R code into C:/Users/Nicole/Documents/shinyapps/my-hello -- 
and finally, tweaked the code and engaged in about 100 iterations of: 1) edit the two files, 
2) type runApp("my-hello") in the R console, 3) test my Shiny app in the 
browser window, 4) kill browser window, 5) do Misc -> Stop Current Computation 
in R. ALL of the computation happens in server.R, and all the display happens in ui.R:



library(shiny)
library(googlesheets)
library(DT)

my_key <- "1Bs0OH6F-Pdw5BG8yVo2t_VS9Wq1F7vb_VovOmnDSNf4"
my_ss <- gs_key(my_key)
my_data <- gs_read(my_ss)

shinyServer(function(input, output, session) {
  output$plot <- renderPlot({
    my_data$type <- ordered(my_data$type, levels=c("PRE","POST"))
    boxplot(score ~ type, data = my_data)  # assumed placeholder plot
  })
  output$summary <- renderPrint({
    aggregate(score ~ type, data = my_data, summary)
  })
  output$the_data <- renderDataTable({
    my_data
  })
})




# ui.R (the fluidPage/sidebarLayout wrappers are assumed)
library(shiny)

shinyUI(fluidPage(

  # Application title
  titlePanel("Nicole's First Shiny App"),

  sidebarLayout(
    sidebarPanel(
      helpText("This is my first Shiny app!! It grabs some of my data 
                from a Google Spreadsheet, and displays it here. I 
                also used lots of examples from"),
      h6(a("Click Here for a Tutorial on How It Was Made", 
           href="", target="_blank"))
    ),

    # Show a tabset that includes a plot, summary, and table view
    mainPanel(
      tabsetPanel(type = "tabs", 
        tabPanel("Plot", plotOutput("plot")), 
        tabPanel("Summary", verbatimTextOutput("summary")), 
        tabPanel("Table", DT::dataTableOutput("the_data"))
      )
    )
  )
))

23. Once I decided my app was good enough for my practice round, it was time to 
deploy it to the cloud.
24. This part of the process requires the shinyapps and dplyr 
packages, so be sure to install them:
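(One way to do that, based on the commands used earlier in these steps:)
devtools::install_github('rstudio/shinyapps')
install.packages('dplyr')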

25. To deploy, all I did was this: setwd("C:/Users/Nicole/Documents/shinyapps/my-hello/")
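and then, most likely, the deployApp() call from the note in step 4 -- this time run 
from inside the app directory, so no path argument is needed:
shinyapps::deployApp()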

