Category Archives: Education

Where Do Z-Score Tables Come From? (+ how to make them in R)

z-score-table-iconEvery student learns how to look up areas under the normal curve using Z-Score tables in their first statistics class. But what is less commonly covered, especially in courses where calculus is not a prerequisite, is where those Z-Score tables come from: figuring out the area under the normal curve for all possible places you could chop it into two, then making a table from it.

You get the z-score by evaluating the integral of the equation for the bell-shaped normal curve, usually from -Inf to the z-score of interest. This is the same thing that the R command pnorm does when you provide it with a z-score. Here is the slide presentation I put together to explain the use and origin of the Z-Score table, and how it relates to pnorm and qnorm (the command that lets you input an area to find the z-score at which the area to the left is swiped out). It’s free to use under Creative Commons, and is part of the course materials that is available for use with this 2017 book.

One of the fun things I did was to make my own z-score table in R. I don’t know why anyone would WANT to do this — they are easy to find in books, and online, and if you know how to use pnorm and qnorm, you don’t need one at all. But, you can, and here’s how.

First, let’s create a z-score table just with left-tail areas. Using symmetry, we can also use this to get any areas in the right tail, because the area to the left of any -z is the same as any area to the right of any +z. Even though the z-score table contains areas in its cells, our first step is to create a table just of the z-scores that correspond to each cell:

[code language=”r”]
c0 <- seq(-3.4,0,.1)
c1 <- seq(-3.41,0,.1)
c2 <- seq(-3.42,0,.1)
c3 <- seq(-3.43,0,.1)
c4 <- seq(-3.44,0,.1)
c5 <- seq(-3.45,0,.1)
c6 <- seq(-3.46,0,.1)
c7 <- seq(-3.47,0,.1)
c8 <- seq(-3.48,0,.1)
c9 <- seq(-3.49,0,.1)
z <- cbind(c0,c1,c2,c3,c4,c5,c6,c7,c8,c9)
z

c0 c1 c2 c3 c4 c5 c6 c7 c8 c9
[1,] -3.4 -3.41 -3.42 -3.43 -3.44 -3.45 -3.46 -3.47 -3.48 -3.49
[2,] -3.3 -3.31 -3.32 -3.33 -3.34 -3.35 -3.36 -3.37 -3.38 -3.39
[3,] -3.2 -3.21 -3.22 -3.23 -3.24 -3.25 -3.26 -3.27 -3.28 -3.29
[4,] -3.1 -3.11 -3.12 -3.13 -3.14 -3.15 -3.16 -3.17 -3.18 -3.19
[5,] -3.0 -3.01 -3.02 -3.03 -3.04 -3.05 -3.06 -3.07 -3.08 -3.09
[6,] -2.9 -2.91 -2.92 -2.93 -2.94 -2.95 -2.96 -2.97 -2.98 -2.99
[7,] -2.8 -2.81 -2.82 -2.83 -2.84 -2.85 -2.86 -2.87 -2.88 -2.89
[8,] -2.7 -2.71 -2.72 -2.73 -2.74 -2.75 -2.76 -2.77 -2.78 -2.79
[9,] -2.6 -2.61 -2.62 -2.63 -2.64 -2.65 -2.66 -2.67 -2.68 -2.69
[10,] -2.5 -2.51 -2.52 -2.53 -2.54 -2.55 -2.56 -2.57 -2.58 -2.59
[11,] -2.4 -2.41 -2.42 -2.43 -2.44 -2.45 -2.46 -2.47 -2.48 -2.49
[12,] -2.3 -2.31 -2.32 -2.33 -2.34 -2.35 -2.36 -2.37 -2.38 -2.39
[13,] -2.2 -2.21 -2.22 -2.23 -2.24 -2.25 -2.26 -2.27 -2.28 -2.29
[14,] -2.1 -2.11 -2.12 -2.13 -2.14 -2.15 -2.16 -2.17 -2.18 -2.19
[15,] -2.0 -2.01 -2.02 -2.03 -2.04 -2.05 -2.06 -2.07 -2.08 -2.09
[16,] -1.9 -1.91 -1.92 -1.93 -1.94 -1.95 -1.96 -1.97 -1.98 -1.99
[17,] -1.8 -1.81 -1.82 -1.83 -1.84 -1.85 -1.86 -1.87 -1.88 -1.89
[18,] -1.7 -1.71 -1.72 -1.73 -1.74 -1.75 -1.76 -1.77 -1.78 -1.79
[19,] -1.6 -1.61 -1.62 -1.63 -1.64 -1.65 -1.66 -1.67 -1.68 -1.69
[20,] -1.5 -1.51 -1.52 -1.53 -1.54 -1.55 -1.56 -1.57 -1.58 -1.59
[21,] -1.4 -1.41 -1.42 -1.43 -1.44 -1.45 -1.46 -1.47 -1.48 -1.49
[22,] -1.3 -1.31 -1.32 -1.33 -1.34 -1.35 -1.36 -1.37 -1.38 -1.39
[23,] -1.2 -1.21 -1.22 -1.23 -1.24 -1.25 -1.26 -1.27 -1.28 -1.29
[24,] -1.1 -1.11 -1.12 -1.13 -1.14 -1.15 -1.16 -1.17 -1.18 -1.19
[25,] -1.0 -1.01 -1.02 -1.03 -1.04 -1.05 -1.06 -1.07 -1.08 -1.09
[26,] -0.9 -0.91 -0.92 -0.93 -0.94 -0.95 -0.96 -0.97 -0.98 -0.99
[27,] -0.8 -0.81 -0.82 -0.83 -0.84 -0.85 -0.86 -0.87 -0.88 -0.89
[28,] -0.7 -0.71 -0.72 -0.73 -0.74 -0.75 -0.76 -0.77 -0.78 -0.79
[29,] -0.6 -0.61 -0.62 -0.63 -0.64 -0.65 -0.66 -0.67 -0.68 -0.69
[30,] -0.5 -0.51 -0.52 -0.53 -0.54 -0.55 -0.56 -0.57 -0.58 -0.59
[31,] -0.4 -0.41 -0.42 -0.43 -0.44 -0.45 -0.46 -0.47 -0.48 -0.49
[32,] -0.3 -0.31 -0.32 -0.33 -0.34 -0.35 -0.36 -0.37 -0.38 -0.39
[33,] -0.2 -0.21 -0.22 -0.23 -0.24 -0.25 -0.26 -0.27 -0.28 -0.29
[34,] -0.1 -0.11 -0.12 -0.13 -0.14 -0.15 -0.16 -0.17 -0.18 -0.19
[35,] 0.0 -0.01 -0.02 -0.03 -0.04 -0.05 -0.06 -0.07 -0.08 -0.09
[/code]

Now that we have slots for all the z-scores, we can use pnorm to transform all those values into the areas that are swiped out to the left of that z-score. This part is easy, and only takes one line. The remaining three lines format and display the z-score table:

[code]
zscore.df <- round(pnorm(z),4)
row.names(zscore.df) <- sprintf("%.2f", c0)
colnames(zscore.df) <- seq(0,0.09,0.01)
zscore.df

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
-3.40 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002
-3.30 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0003
-3.20 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005
-3.10 0.0010 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0008 0.0007 0.0007
-3.00 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
-2.90 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
-2.80 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
-2.70 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
-2.60 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
-2.50 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
-2.40 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
-2.30 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
-2.20 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
-2.10 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
-2.00 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
-1.90 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
-1.80 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
-1.70 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
-1.60 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
-1.50 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
-1.40 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
-1.30 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
-1.20 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
-1.10 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
-1.00 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
-0.90 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
-0.80 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
-0.70 0.2420 0.2389 0.2358 0.2327 0.2296 0.2266 0.2236 0.2206 0.2177 0.2148
-0.60 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
-0.50 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
-0.40 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
-0.30 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
-0.20 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
-0.10 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.00 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
[/code]

You can also draw a picture to go along with your z-score table, so that people remember which area they are looking up:

[code]
x <- seq(-4,4,0.1)
y <- dnorm(x)
plot(x,dnorm(x),type="l", col="black", lwd=3)
abline(v=-1,lwd=3,col="blue")
abline(h=0,lwd=3,col="black")
polygon(c(x[1:31],rev(x[1:31])), c(rep(0,31),rev(y[1:31])), col="lightblue")
[/code]

It looks like this:

z-score-table-icon-small

In the slides, code to produce a giant-tail z-score table is also provided (where the areas are > 50%).

Innovation Tips for Strategic Planning

Image Credit: Doug Buckley of http://hyperactive.to

Image Credit: Doug Buckley of http://hyperactive.to

Over the past 15 years, I’ve helped several organizations with continuous improvement initiatives at the strategic, executive level. There are a lot of themes that keep appearing and reappearing, so the purpose of this post is to call out just a few and provide some insights in how to deal with them! 

These come up when you are engaged in strategic planning and when you are planning operations (to ensure that processes and procedures ultimately satisfy strategic goals), and are especially prominent when you’re trying to develop or use Key Performance Indicators (KPIs) and other metrics or analytics.

 

1) How do you measure innovation? Before you pick metrics, recognize that the answer to this question depends on how you articulate the strategic goals for your innovation outcomes. Do you want to:

  • Keep up with changing technology?
  • Develop a new product/technology?
  • Lead your industry in developing best practices?
  • Pioneer new business models?
  • Improve quality of life for a particular group of people?

All of these will be measured in different ways! And it’s OK to not strategically innovate in one area or another… for example, you might not want to innovate your business model if technology development is your forte. Innovation is one of those things where you really don’t want to be everything to everyone… by design.

 

2) Do you distinguish between improving productivity and generating impact?

Improving quality (the ability to satisfy stated and implied needs) is good. Improving productivity (that is, what you can produce given the resources that you use) is also good. Reducing defects, reducing waste, and reducing variation (sometimes) are all very good things to do, and to report on. 

But who really cares about any improvements at all unless they have impact? It’s always necessary to tie your KPIs, which are often measures of outcomes, to metrics or analytics that can tell the story about why a particular improvement was useful — in the short term, and (hopefully also) in the long term.

You also have to balance productivity and impact. For example, maybe you run an ultra-efficient 24/7 Help Desk. Your effectiveness is exemplary… when someone submits a request, it’s always satisfied within 8 hours. But you discover that no tickets come in between Friday at 5pm and Monday at 8am. So all that time you spend staffing that Help Desk on the weekend? It’s non-value-added time, and could be eliminated to improve your productivity… but won’t influence your impact at all.

We just worked on a project where we had to consciously had to think about how all the following interact… and you should too:

  • Organizational Productivity: did your improvement help increase the capacity or capability for part of your organization? If so, then it could contribute to technical productivity or business productivity.
  • Technical Productivity: did the improvement remove a technical barrier to getting work done, or make it faster or less error-prone?
  • Business Productivity: did the improvement help you get the needs of the business satisfied faster or better?
  • Business Impact: Did the improvements that yielded organizational productivity benefits, technical productivity benefits, or business productivity benefits make a difference at the strategic level? (This answers the “so what” question. So you improved your throughput by 83%… so what? Who really cares, and why does this matter to them? Long-term, why does this awesome thing you did really matter?)
  • Educational/Workforce Development Impact: Were the lessons learned captured, fed back into the organization’s processes to close the loop on learning, or maybe even used to educate people who may become part of your workforce pipeline?

All of the categories above are interrelated. I don’t think you can have a comprehensive, innovation-focused analytics approach unless you address all of these.

 

3) Do you distinguish between participation and engagement?

Participation means you showed up. Engagement means you got involved, you stayed involved, your mission was advanced, or maybe you used this experience to help society. Too often, I see organizations that want to improve engagement, and all the metrics they select are really good at characterizing participation.

I’m writing a paper on this topic right now, but in the meantime (if you want to get a REALLY good sense of the difference between participation and engagement), read The Participatory Museum by Nina Simon. Yes, it is “about museums” — and yes, I know you’re in business or industry — and YES, this book really will provide you with amazing management insights. So read it!

Analytic Hierarchy Process (AHP) using preferenceFunction in ahp

Yesterday, I wrote about how to use gluc‘s new ahp package on a simple Tom-Dick-Harry one level decision making problem using Analytic Hierarchy Process (AHP). One of the cool things about that package is that in addition to specifying the pairwise comparisons directly using Saaty’s scale (below, from https://kristalaace2014.wordpress.com/2014/05/14/w12_al_vendor-evaluation/)…

saaty-scale

…you can also describe each of the Alternatives in terms of descriptive variables which you can use inside a function to make the pairwise comparisons automatically. This is VERY helpful if you have lots of criteria, subcriteria, or alternatives to evaluate!! For example, I used preferenceFunction to compare 55 alternatives using 6 criteria and 4 subcriteria, and was very easily able to create functions to represent my judgments. This was much easier than manually entering all the comparisons.

This post shows HOW I replaced some of my manual comparisons with automated comparisons using preferenceFunction. (The full YAML file is included at the bottom of this post for you to use if you want to run this example yourself.) First, recall that the YAML file starts with specifying the alternatives that you are trying to choose from (at the bottom level of the decision hierarchy) and some variables that characterize those alternatives. I used the descriptions in the problem statement to come up with some assessments between 1=not great and 10=great:

[code language=”bash” gutter=”false”]
#########################
# Alternatives Section
# THIS IS FOR The Tom, Dick, & Harry problem at
# https://en.wikipedia.org/wiki/Analytic_hierarchy_process_%E2%80%93_leader_example
#
Alternatives: &alternatives
# 1= not well; 10 = best possible
# Your assessment based on the paragraph descriptions may be different.
Tom:
age: 50
experience: 7
education: 4
leadership: 10
Dick:
age: 60
experience: 10
education: 6
leadership: 6
Harry:
age: 30
experience: 5
education: 8
leadership: 6
#
# End of Alternatives Section
#####################################
[/code]

Here is a snippet from my original YAML file specifying my AHP problem manually ():

[code language=”bash” gutter=”false”]
children:
Experience:
preferences:
– [Tom, Dick, 1/4]
– [Tom, Harry, 4]
– [Dick, Harry, 9]
children: *alternatives
Education:
preferences:
– [Tom, Dick, 3]
– [Tom, Harry, 1/5]
– [Dick, Harry, 1/7]
children: *alternatives
[/code]

And here is what I changed that snippet to, so that it would do my pairwise comparisons automatically. The functions are written in standard R (fortunately), and each function has access to a1 and a2 (the two alternatives). Recursion is supported which makes this capability particularly useful. I tried to write a function using two of the characteristics in the decision (a1$age and a1$experience) but this didn’t seen to work. I’m not sure whether the package supports it or not. Here are my comparisons rewritten as functions:

[code language=”bash” gutter=”false”]
children:
Experience:
preferenceFunction: >
ExperiencePreference <- function(a1, a2) {
if (a1$experience < a2$experience) return (1/ExperiencePreference(a2, a1))
ratio <- a1$experience / a2$experience
if (ratio < 1.05) return (1)
if (ratio < 1.2) return (2)
if (ratio < 1.5) return (3)
if (ratio < 1.8) return (4)
if (ratio < 2.1) return (5) return (6) } children: *alternatives Education: preferenceFunction: >
EducPreference <- function(a1, a2) {
if (a1$education < a2$education) return (1/EducPreference(a2, a1))
ratio <- a1$education / a2$education
if (ratio < 1.05) return (1)
if (ratio < 1.15) return (2)
if (ratio < 1.25) return (3)
if (ratio < 1.35) return (4)
if (ratio < 1.55) return (5)
return (5)
}
children: *alternatives
[/code]

To run the AHP with functions in R, I used this code (I am including the part that gets the ahp package, in case you have not done that yet). BE CAREFUL and make sure, like in FORTRAN, that you line things up so that the words START in the appropriate columns. For example, the “p” in preferenceFunction MUST be immediately below the 7th character of your criterion’s variable name.

[code language=”bash” gutter=”false”]
devtools::install_github("gluc/ahp", build_vignettes = TRUE)
install.packages("data.tree")

library(ahp)
library(data.tree)

setwd("C:/AHP/artifacts")
nofxnAhp <- LoadFile("tomdickharry.txt")
Calculate(nofxnAhp)
fxnAhp <- LoadFile("tomdickharry-fxns.txt")
Calculate(fxnAhp)

print(nofxnAhp, "weight")
print(fxnAhp, "weight")
[/code]

You can see that the weights are approximately the same, indicating that I did a good job at developing functions that represent the reality of how I used the variables attached to the Alternatives to make my pairwise comparisons. The results show that Dick is now the best choice, although there is some inconsistency in our judgments for Experience that we should examine further. (I have not examined this case to see whether rank reversal could be happening).

[code language=”bash” gutter=”false”]
> print(nofxnAhp, "weight")
levelName weight
1 Choose the Most Suitable Leader 1.00000000
2 ¦–Experience 0.54756924
3 ¦ ¦–Tom 0.21716561
4 ¦ ¦–Dick 0.71706504
5 ¦ °–Harry 0.06576935
6 ¦–Education 0.12655528
7 ¦ ¦–Tom 0.18839410
8 ¦ ¦–Dick 0.08096123
9 ¦ °–Harry 0.73064467
10 ¦–Charisma 0.26994992
11 ¦ ¦–Tom 0.74286662
12 ¦ ¦–Dick 0.19388163
13 ¦ °–Harry 0.06325174
14 °–Age 0.05592555
15 ¦–Tom 0.26543334
16 ¦–Dick 0.67162545
17 °–Harry 0.06294121

> print(fxnAhp, "weight")
levelName weight
1 Choose the Most Suitable Leader 1.00000000
2 ¦–Experience 0.54756924
3 ¦ ¦–Tom 0.25828499
4 ¦ ¦–Dick 0.63698557
5 ¦ °–Harry 0.10472943
6 ¦–Education 0.12655528
7 ¦ ¦–Tom 0.08273483
8 ¦ ¦–Dick 0.26059839
9 ¦ °–Harry 0.65666678
10 ¦–Charisma 0.26994992
11 ¦ ¦–Tom 0.74286662
12 ¦ ¦–Dick 0.19388163
13 ¦ °–Harry 0.06325174
14 °–Age 0.05592555
15 ¦–Tom 0.26543334
16 ¦–Dick 0.67162545
17 °–Harry 0.06294121

> ShowTable(fxnAhp)
[/code]

tomdick-ahp-fxns

Here is the full YAML file for the “with preferenceFunction” case.

[code language=”bash” gutter=”false”]
#########################
# Alternatives Section
# THIS IS FOR The Tom, Dick, & Harry problem at
# https://en.wikipedia.org/wiki/Analytic_hierarchy_process_%E2%80%93_leader_example
#
Alternatives: &alternatives
# 1= not well; 10 = best possible
# Your assessment based on the paragraph descriptions may be different.
Tom:
age: 50
experience: 7
education: 4
leadership: 10
Dick:
age: 60
experience: 10
education: 6
leadership: 6
Harry:
age: 30
experience: 5
education: 8
leadership: 6
#
# End of Alternatives Section
#####################################
# Goal Section
#
Goal:
# A Goal HAS preferences (within-level comparison) and HAS Children (items in level)
name: Choose the Most Suitable Leader
preferences:
# preferences are defined pairwise
# 1 means: A is equal to B
# 9 means: A is highly preferrable to B
# 1/9 means: B is highly preferrable to A
– [Experience, Education, 4]
– [Experience, Charisma, 3]
– [Experience, Age, 7]
– [Education, Charisma, 1/3]
– [Education, Age, 3]
– [Age, Charisma, 1/5]
children:
Experience:
preferenceFunction: >
ExperiencePreference <- function(a1, a2) {
if (a1$experience < a2$experience) return (1/ExperiencePreference(a2, a1))
ratio <- a1$experience / a2$experience
if (ratio < 1.05) return (1)
if (ratio < 1.2) return (2)
if (ratio < 1.5) return (3)
if (ratio < 1.8) return (4)
if (ratio < 2.1) return (5) return (6) } children: *alternatives Education: preferenceFunction: >
EducPreference <- function(a1, a2) {
if (a1$education < a2$education) return (1/EducPreference(a2, a1))
ratio <- a1$education / a2$education
if (ratio < 1.05) return (1)
if (ratio < 1.15) return (2)
if (ratio < 1.25) return (3)
if (ratio < 1.35) return (4)
if (ratio < 1.55) return (5)
return (5)
}
children: *alternatives
Charisma:
preferences:
– [Tom, Dick, 5]
– [Tom, Harry, 9]
– [Dick, Harry, 4]
children: *alternatives
Age:
preferences:
– [Tom, Dick, 1/3]
– [Tom, Harry, 5]
– [Dick, Harry, 9]
children: *alternatives
#
# End of Goal Section
#####################################
[/code]

A Discrete Time Markov Chain (DTMC) SIR Model in R

Image Credit: Doug Buckley of http://hyperactive.to

Image Credit: Doug Buckley of http://hyperactive.to

There are many different techniques that be used to model physical, social, economic, and conceptual systems. The purpose of this post is to show how the Kermack-McKendrick (1927) formulation of the SIR Model for studying disease epidemics (where S stands for Susceptible, I stands for Infected, and R for Recovered) can be easily implemented in R as a discrete time Markov Chain using the markovchain package.

A Discrete Time Markov Chain (DTMC) is a model for a random process where one or more entities can change state between distinct timesteps. For example, in SIR, people can be labeled as Susceptible (haven’t gotten a disease yet, but aren’t immune), Infected (they’ve got the disease right now), or Recovered (they’ve had the disease, but no longer have it, and can’t get it because they have become immune). If they get the disease, they change states from Susceptible to Infected. If they get well, they change states from Infected to Recovered. It’s impossible to change states between Susceptible and Recovered, without first going through the Infected state. It’s totally possible to stay in the Susceptible state between successive checks on the population, because there’s not a 100% chance you’ll actually be infected between any two timesteps. You might have a particularly good immune system, or maybe you’ve been hanging out by yourself for several days programming.

Discrete time means you’re not continuously monitoring the state of the people in the system. It would get really overwhelming if you had to ask them every minute “Are you sick yet? Did you get better yet?” It makes more sense to monitor individuals’ states on a discrete basis rather than continuously, for example, like maybe once a day. (Ozgun & Barlas (2009) provide a more extensive illustration of the difference between discrete and continuous modeling, using a simple queuing system.)

To create a Markov Chain in R, all you need to know are the 1) transition probabilities, or the chance that an entity will move from one state to another between successive timesteps, 2) the initial state (that is, how many entities are in each of the states at time t=0), and 3) the markovchain package in R. Be sure to install markovchain before moving forward.

Imagine that there’s a 10% infection rate, and a 20% recovery rate. That implies that 90% of Susceptible people will remain in the Susceptible state, and 80% of those who are Infected will move to the Recovered Category, between successive timesteps. 100% of those Recovered will stay recovered. None of the people who are Recovered will become Susceptible.

Say that you start with a population of 100 people, and only 1 is infected. That means your “initial state” is that 99 are Susceptible, 1 is Infected, and 0 are Recovered. Here’s how you set up your Markov Chain:

library(markovchain)
mcSIR <- new("markovchain", states=c("S","I","R"),
    transitionMatrix=matrix(data=c(0.9,0.1,0,0,0.8,0.2,0,0,1),
    byrow=TRUE, nrow=3), name="SIR")
initialState <- c(99,1,0)

At this point, you can ask R to see your transition matrix, which shows the probability of moving FROM each of the three states (that form the rows) TO each of the three states (that form the columns).

> show(mcSIR)
SIR
 A  3 - dimensional discrete Markov Chain with following states
 S I R 
 The transition matrix   (by rows)  is defined as follows
    S   I   R
S 0.9 0.1 0.0
I 0.0 0.8 0.2
R 0.0 0.0 1.0

You can also plot your transition probabilities:

plot(mcSIR,package="diagram")

dtmc-sir-transitionnetwork

But all we’ve done so far is to create our model. We haven’t yet done a simulation, which would show us how many people are in each of the three states as you move from one discrete timestep to many others. We can set up a data frame to contain labels for each timestep, and a count of how many people are in each state at each timestep. Then, we fill that data frame with the results after each timestep i, calculated by initialState*mcSIR^i:

timesteps <- 100
sir.df <- data.frame( "timestep" = numeric(),
 "S" = numeric(), "I" = numeric(),
 "R" = numeric(), stringsAsFactors=FALSE)
 for (i in 0:timesteps) {
newrow <- as.list(c(i,round(as.numeric(initialState * mcSIR ^ i),0)))
 sir.df[nrow(sir.df) + 1, ] <- newrow
 }

Now that we have a data frame containing our SIR results (sir.df), we can display them to see what the values look like:

> head(sir.df)
  timestep  S  I  R
1        0 99  1  0
2        1 89 11  0
3        2 80 17  2
4        3 72 22  6
5        4 65 25 10
6        5 58 26 15

And then plot them to view our simulation results using this DTMC SIR Model:

plot(sir.df$timestep,sir.df$S)
points(sir.df$timestep,sir.df$I, col="red")
points(sir.df$timestep,sir.df$R, col="green")

dtmc-sir-simulation

It’s also possible to use the markovchain package to identify elements of your system as it evolves over time:

> absorbingStates(mcSIR)
[1] "R"
> transientStates(mcSIR)
[1] "S" "I"
> steadyStates(mcSIR)
     S I R
[1,] 0 0 1

And you can calculate the first timestep that your Markov Chain reaches its steady state (the “time to absorption”), which your plot should corroborate:

> ab.state <- absorbingStates(mcSIR)
> occurs.at <- min(which(sir.df[,ab.state]==max(sir.df[,ab.state])))
> (sir.df[row,]$timestep)+1
[1] 58

You can use this code to change the various transition probabilities to see what the effects are on the outputs yourself (sensitivity analysis). Also, there are methods you can use to perform uncertainty analysis, e.g. putting confidence intervals around your transition probabilities. We won’t do either of these here, nor will we create a Shiny app to run this simulation, despite the significant temptation.

My First (R) Shiny App: An Annotated Tutorial

Image Credit: Doug Buckley of http://hyperactive.to

Image Credit: Doug Buckley of http://hyperactive.to

I’ve been meaning to learn Shiny for 2 years now… and thanks to a fortuitous email from @ImADataGuy this morning and a burst of wild coding energy about 5 hours ago, I am happy to report that I have completely fallen in love again. The purpose of this post is to share how I got my first Shiny app up and running tonight on localhost, how I deployed it to the http://shinyapps.io service, and how you can create a “Hello World” style program of your own that actually works on data that’s meaningful to you.

If you want to create a “Hello World!” app with Shiny (and your own data!) just follow these steps:

0. Install R 3.2.0+ first! This will save you time.
1. I signed up for an account at http://shinyapps.io.
2. Then I clicked the link in the email they sent me.
3. That allowed me to set up my https://radziwill.shinyapps.io location.
4. Then I followed the instructions at https://www.shinyapps.io/admin/#/dashboard
(This page has SPECIAL SECRET INFO CUSTOMIZED JUST FOR YOU ON IT!!) I had lots 
of problems with devtools::install_github('rstudio/shinyapps') - Had to go 
into my R directory, manually delete RCurl and digest, then 
reinstall both RCurl and digest... then installing shinyapps worked.
Note: this last command they tell you to do WILL NOT WORK because you do not have an app yet! 
If you try it, this is what you'll see:
> shinyapps::deployApp('path/to/your/app')
Error in shinyapps::deployApp("path/to/your/app") : 
C:\Users\Nicole\Documents\path\to\your\app does not exist
5. Then I went to http://shiny.rstudio.com/articles/shinyapps.html and installed rsconnect.
6. I clicked on my name and gravatar in the upper right hand corner of the 
https://www.shinyapps.io/admin/#/dashboard window I had opened, and then clicked 
"tokens". I realized I'd already done this part, so I skipped down to read 
"A Demo App" on http://shiny.rstudio.com/articles/shinyapps.html
7. Then, I re-installed ggplot2 and shiny using this command:
install.packages(c('ggplot2', 'shiny'))
8. I created a new directory (C:/Users/Nicole/Documents/shinyapps) and used
setwd to get to it.
9. I pasted the code at http://shiny.rstudio.com/articles/shinyapps.html to create two files, 
server.R and ui.R, which I put into my new shinyapps directory 
under a subdirectory called demo. The subdirectory name IS your app name.
10. I typed runApp("demo") into my R console, and voila! The GUI appeared in 
my browser window on my localhost.
-- Don't just try to close the browser window to get the Shiny app 
to stop. R will hang. To get out of this, I had to use Task Manager and kill R.
--- Use the main menu, and do Misc -> Stop Current Computation
11. I did the same with the "Hello Shiny" code at http://shiny.rstudio.com/articles/shinyapps.html. 
But what I REALLY want is to deploy a hello world app with MY OWN data. You know, something that's 
meaningful to me. You probably want to do a test app with data that is meaningful to you... here's 
how you can do that.
12. A quick search shows that I need jennybc's (Github) googlesheets package to get 
data from Google Drive viewable in my new Shiny app.
13. So I tried to get the googlesheets package with this command:
devtools::install_github('jennybc/googlesheets')
but then found out it requires R version 3.2.0. I you already have 3.2.0 you can skip 
to step 16 now.
14. So I reinstalled R using the installr package (highly advised if you want to 
overcome the agony of upgrading on windows). 
See http://www.r-statistics.com/2013/03/updating-r-from-r-on-windows-using-the-installr-package/
for info -- all it requires is that you type installR() -- really!
15. After installing R I restarted my machine. This is probably the first time in a month that 
I've shut all my browser windows, documents, spreadsheets, PDFs, and R sessions. I got the feeling 
that this made my computer happy.
16. Then, I created a Google Sheet with my data. While viewing that document, I went to 
File -> "Publish to the Web". I also discovered that my DOCUMENT KEY is that 
looooong string in the middle of the address, so I copied it for later:
1Bs0OH6F-Pdw5BG8yVo2t_VS9Wq1F7vb_VovOmnDSNf4
17. Then I created a new directory in C:/Users/Nicole/Documents/shinyapps to test out 
jennybc's googlesheets package, and called it jennybc
18. I copied and pasted the code in her server.R file and ui.R file
from https://github.com/jennybc/googlesheets/tree/master/inst/shiny-examples/01_read-public-sheet 
into files with the same names in my jennybc directory
19. I went into my R console, used getwd() to make sure I was in the
C:/Users/Nicole/Documents/shinyapps directory, and then typed
 runApp("jennybc")
20. A browser window popped up on localhost with her test Shiny app! I played with it, and then 
closed that browser tab.
21. When I went back into the R console, it was still hanging, so I went to the menu bar 
to Misc -> Stop Current Computation. This brought my R prompt back.
22. Now it was time to write my own app. I went to http://shiny.rstudio.com/gallery/ and
found a layout I liked (http://shiny.rstudio.com/gallery/tabsets.html), then copied the 
server.R and ui.R code into C:/Users/Nicole/Documents/shinyapps/my-hello -- 
and finally, tweaked the code and engaged in about 100 iterations of: 1) edit the two files, 
2) type runApp("my-hello") in the R console, 3) test my Shiny app in the 
browser window, 4) kill browser window, 5) do Misc -> Stop Current Computation 
in R. ALL of the computation happens in server.R, and all the display happens in ui.R:

server.R:

library(shiny)
library(googlesheets)
library(DT)

my_key <- "1Bs0OH6F-Pdw5BG8yVo2t_VS9Wq1F7vb_VovOmnDSNf4"
my_ss <- gs_key(my_key)
my_data <- gs_read(my_ss)

shinyServer(function(input, output, session) {
 output$plot <- renderPlot({
 my_data$type <- ordered(my_data$type,levels=c("PRE","POST"))
 boxplot(my_data$score~my_data$type,ylim=c(0,100),boxwex=0.6)
 })
 output$summary <- renderPrint({
 aggregate(score~type,data=my_data, summary)
 })
 output$the_data <- renderDataTable({
 datatable(my_data)
 })

})

ui.R:

library(shiny)
library(shinythemes)
library(googlesheets)

shinyUI(fluidPage(
 
 # Application title
 titlePanel("Nicole's First Shiny App"),
 
 # Sidebar with controls to select the random distribution type
 # and number of observations to generate. Note the use of the
 # br() element to introduce extra vertical spacing
 sidebarLayout(
 sidebarPanel(
     helpText("This is my first Shiny app!! It grabs some of my data 
from a Google Spreadsheet, and displays it here. I      
also used lots of examples from"),
     h6(a("http://shiny.rstudio.com/gallery/", 
href="http://shiny.rstudio.com/gallery/", target="_blank")),
     br(),
     h6(a("Click Here for a Tutorial on How It Was Made", 
href="http://qualityandinnovation.com/2015/12/08/my-first-shin     
y-app-an-annotated-tutorial/",
      target="_blank")),
      br()
 ),
 
 # Show a tabset that includes a plot, summary, and table view
 # of the generated distribution
 mainPanel(
    tabsetPanel(type = "tabs", 
    tabPanel("Plot", plotOutput("plot")), 
    tabPanel("Summary", verbatimTextOutput("summary")), 
    tabPanel("Table", DT::dataTableOutput("the_data"))
 )
 )
 )
))


23. Once I decided my app was good enough for my practice round, it was time to 
deploy it to the cloud.
24. This part of the process requires the shinyapps and dplyr 
packages, so be sure to install them:

devtools::install_github('hadley/dplyr')
library(dplyr)
devtools::install_github('rstudio/shinyapps')
library(shinyapps)
25. To deploy, all I did was this: setwd("C:/Users/Nicole/Documents/shinyapps/my-hello/")
deployApp()

CHECK OUT MY SHINY APP!!

What if Your Job Was Focused on Play?

james-siegal

James Siegal (picture from his Twitter profile, @jsiegal at http://twitter.com/jsiegal)

Last weekend, I had the opportunity to talk to James Siegal, the President of KaBOOM! – a non-profit whose mission is lighthearted, but certainly not frivolous: to bring balanced and active play into the daily lives of all kids! James is another new Business Innovation Factory (BIF) storyteller for 2015… and I wanted to find out how I could learn from his experiences to bring a sense of play into the work environment. (For me, that’s at a university, interacting with students on a daily basis.)

Over the past 20 years, KaBOOM! has built thousands of playgrounds, focusing on children growing up in poverty. By enlisting the help of over a million volunteers, James and his organization have mobilized communities using a model that starts with kids designing their dream playgrounds. It’s a form of crowdsourced placemaking.

Now, KaBOOM! is thinking about a vision that’s a little broader: driving social change at the city level. Doing this, they’ve found, requires answering one key question: How can you integrate play into the daily routine for kids and families? If play is a destination, there are “hassle factors” that must be overcome: safety, travel time, good lighting, and restroom facilities, for starters. So, in addition to building playgrounds, KaBOOM! is challenging cities to think about integrating play everywhere — on the sidewalk, at the bus stop, and beyond.

How can this same logic apply to organizations integrating play into their cultures? Although KaBOOM! focuses on kids, he had some more generalizable advice:

  • The desire for play has to be authentic, not forced. “We truly value kids, and we truly value families. Our policies and our culture strive to reflect that.” What does your organization value at its core? Seek to amplify the enjoyment of that.
  • We take our work really seriously,” he said. “We don’t take ourselves too seriously. You have to leave your ego at the door.” Can your organization engage in more playful collaboration?
  • We drive creativity out of kids as they grow older, he noted. “Kids expect to play everywhere,” and so even ordinary elements like sidewalks can turn into experiences. (This reminded me of how people decorate the Porta-Potties at Burning Man with lights and music… although I wouldn’t necessarily do the same thing to the restrooms at my university, it did make me think about how we might make ordinary places or situations more fun for our students.)

KaBOOM! is such a unique organization that I had to ask James: what’s the most amazing thing you’ve ever observed in your role as President? He says it’s something that hasn’t just happened once… but happens every time KaBOOM! organizes a new playground build. When people from diverse backgrounds come together with a strong shared mission, vision, and purpose, you foster intense community engagement that yields powerful, tangible results — and this is something that so many organizations strive to achieve.

If you haven’t made plans already to hear James and the other storytellers at BIF, there may be a few tickets left — but this event always sells out! Check the BIF registration page and share a memorable experience with the BIF community this year: http://www.businessinnovationfactory.com/summit/register

A Chat with Jaime Casap, Google’s Chief Education Evangelist

jaime-casap-head

“The classroom of the future does not exist!”

That’s the word from Jaime Casap (@jcasap), Google’s Chief Education Evangelist — and a highly anticipated new Business Innovation Factory (BIF) storyteller for 2015.  In advance of the summit which takes place on September 16 and 17, Morgan and I had the opportunity to chat with Jaime about a form of business model innovation that’s close to our hearts – improving education. He’s a native New Yorker, so he’s naturally outspoken and direct. But his caring and considerate tone makes it clear he’s got everyone’s best interests at heart.

At Google, he’s the connector and boundary spanner… the guy the organization trusts to “predict the future” where education is concerned. He makes sure that the channels of communication are open between everyone working on education-related projects. Outside of Google, he advocates smart and innovative applications of technology in education that will open up educational opportunities for everyone.  Most recently, he visited the White House on this mission.

jaime-quote-image

The current system educational system is not broken, he says. It’s doing exactly what it was designed to do: prepare workers for a hierarchical, industrialized production economy. The problem is that the system cannot be high-performing because it’s not doing what we need it to for the upcoming decades, which requires leveraging the skills and capabilities of everyone.

He points out that low-income minorities now have a 9% chance of graduating from college… whereas a couple decades ago, they had a 6% chance. This startling statistic reflects an underlying deficiency in how education is designed and delivered in this country today.

So how do we fix it?

“Technology gives us the ability to question everything,” he says.  As we shift to performance-based assessments, we can create educational experiences that are practical, iterative, and focused on continuous improvement — where we measure iteration, innovation, and sustained incremental progress.

Measuring these, he says, will be a lot more interesting than what we tend to measure now: whether a learner gets something right the first time — or how long it took for a competency to emerge. From this new perspective, we’ll finally be able to answer questions like: What is an excellent school? What does a high-performing educational system look (and feel) like?

Jaime’s opportunity-driven vision for inclusiveness  is an integral part of Google’s future. And you can hear more about his personal story and how it shaped this vision next month at BIF.

If you haven’t made plans already to hear Jaime and the other storytellers at BIF, there may be a few tickets left — but this event always sells out! Check the BIF registration page and share a memorable experience with the BIF community this year: http://www.businessinnovationfactory.com/summit/register

« Older Entries Recent Entries »