Monthly Archives: March 2015

Sampling Distributions and Central Limit Theorem in R

The Central Limit Theorem (CLT), and the concept of the sampling distribution, are critical for understanding why statistical inference works. There are at least a handful of problems that require you to invoke the Central Limit Theorem on every ASQ Certified Six Sigma Black Belt (CSSBB) exam. The CLT says that if you take many repeated samples from a population, and calculate the average or sum of each one, the collection of those averages (or sums) will be approximately normally distributed… and it doesn’t matter what the shape of the source distribution is! (Caveat: the data has to come from a distribution with finite variance, so the Cauchy distribution doesn’t count.)

I wrote some R code to help illustrate this principle for my students. This code allows you to choose a sample size (n), a source distribution, and parameters for that source distribution, and generate a plot of the sampling distributions of the mean, sum, and variance. (Note: the sampling distribution for the variance is a Chi-square distribution — if your source distribution is normal!)

sdm.sim <- function(n, src.dist=NULL, param1=NULL, param2=NULL) {
   r <- 10000  # Number of replications/samples - DO NOT ADJUST
   # This produces a matrix of observations with
   # n columns and r rows. Each row is one sample:
   my.samples <- switch(src.dist,
      "E" = matrix(rexp(n*r,param1),r),
      "N" = matrix(rnorm(n*r,param1,param2),r),
      "U" = matrix(runif(n*r,param1,param2),r),
      "P" = matrix(rpois(n*r,param1),r),
      "B" = matrix(rbinom(n*r,param1,param2),r),
      "G" = matrix(rgamma(n*r,param1,param2),r),
      "X" = matrix(rchisq(n*r,param1),r),
      "T" = matrix(rt(n*r,param1),r))
   all.sample.sums <- apply(my.samples,1,sum)
   all.sample.means <- apply(my.samples,1,mean)
   all.sample.vars <- apply(my.samples,1,var)
   par(mfrow=c(2,2))
   hist(my.samples[1,],col="gray",main="Distribution of One Sample")
   hist(all.sample.sums,col="gray",main="Sampling Distribution\nof the Sum")
   hist(all.sample.means,col="gray",main="Sampling Distribution\nof the Mean")
   hist(all.sample.vars,col="gray",main="Sampling Distribution\nof the Variance")
}

There are 8 population distributions to choose from: exponential (E), normal (N), uniform (U), Poisson (P), binomial (B), gamma (G), Chi-Square (X), and the Student’s t distribution (T). Note also that you have to provide either one or two parameters, depending upon what distribution you are selecting. For example, a normal distribution requires that you specify the mean and standard deviation to describe where it’s centered, and how fat or thin it is (that’s two parameters). A Chi-square distribution requires that you specify the degrees of freedom (that’s only one parameter). You can find out exactly what distributions require what parameters by going here: http://en.wikibooks.org/wiki/R_Programming/Probability_Distributions.

Here is an example that draws from an exponential distribution with a mean of 1/1 = 1 (the parameter you specify is the rate, which is the denominator of the mean):

sdm.sim(50,src.dist="E",param1=1)

The code above produces this sequence of plots:

[Figure: one sample from the exponential distribution, and the sampling distributions of the sum, mean, and variance]

You aren’t allowed to change the number of replications in this simulation because of the nature of the sampling distribution: it’s a theoretical model that describes the distribution of statistics from an infinite number of samples. As a result, if you increase the number of replications, you’ll see the mean of the sampling distribution bounce around until it converges on the mean of the population. This is just an artifact of the simulation process: it’s not a characteristic of the sampling distribution, because to be a sampling distribution, you’ve got to have an infinite number of samples. Watkins et al. have a great description of this effect that all statistics instructors should be aware of. I chose 10,000 for the number of replications because 1) it’s close enough to infinity to ensure that the mean of the sampling distribution is the same as the mean of the population, but 2) it’s far enough away from infinity to not crash your computer, even if you only have 4GB or 8GB of memory.

Here are some more examples to try. You can see that as you increase your sample size (n), the shapes of the sampling distributions become more and more normal, and the variance decreases, constraining your estimates of the population parameters more and more.

sdm.sim(10,src.dist="E",1)
sdm.sim(50,src.dist="E",1)
sdm.sim(100,src.dist="E",1)
sdm.sim(10,src.dist="X",14)
sdm.sim(50,src.dist="X",14)
sdm.sim(100,src.dist="X",14)
sdm.sim(10,src.dist="N",param1=20,param2=3)
sdm.sim(50,src.dist="N",param1=20,param2=3)
sdm.sim(100,src.dist="N",param1=20,param2=3)
sdm.sim(10,src.dist="G",param1=5,param2=5)
sdm.sim(50,src.dist="G",param1=5,param2=5)
sdm.sim(100,src.dist="G",param1=5,param2=5)

One-proportion z test in R

For quite a while, I’ve been confused by the behavior of the prop.test function in R: 1) the p-values always seem to be slightly off, and 2) the confidence intervals never seem to be symmetric around my estimate of the population proportion. Today, I decided I was going to figure out what was up… and whether I should trust my own analytical calculations for this test or what I get in R. I started by checking R Bloggers to see if anyone had explored this problem, but I only found one post about the two proportion z-test, and found none focusing on the one proportion z-test.

Here’s the essence of the test: you observe x subjects out of a sample size of n observations. Perhaps you just want to create a confidence interval to get a sense of where the true population proportion lies. Or maybe, you’d like to test the null hypothesis that the true population proportion is some value p0, against the alternative that it is really greater than, less than, or not equal to that value.

To do this, you can use the normal approximation to the binomial distribution to calculate a test statistic z. The estimated population proportion, p-hat, is just x/n. You have to find the standard error (SE) of your estimate as well to get the value of z:

z = (p-hat - p0) / SE, where SE = sqrt( p0 * (1 - p0) / n )

The confidence interval is constructed from the estimated population proportion, and the margin of error, which is determined by the standard error of the estimate and a scaling factor (z*) that establishes the width of the confidence interval:

CI = p-hat ± (margin of error) = p-hat ± z* × SE

For the one-proportion case, the standard error of the estimate is

SE = sqrt( p-hat * (1 - p-hat) / n ), so CI = p-hat ± z* × sqrt( p-hat * (1 - p-hat) / n )

Here’s an example. Let’s say I know that nationally, 90% of home buyers would select a home with energy efficient features, even paying a premium of 2-3%, to reduce their utility costs long term. As the hypothetical president of an engineering firm that installs these energy efficient features, I want to know if less than 90% of buyers in my local area feel this way (because if so, I might want to launch an education campaign to help my business). I randomly sample 360 potential buyers and find that 312 of them would indeed select a home with energy efficient features.

I want to test my sample against the null hypothesis that the true population proportion is 0.90, with the alternative that the proportion is less than 0.90. I use an estimated population proportion (p-hat) of 312/360 = 0.867, and using the equations above, find that my test statistic z turns out to be -2.108, with a corresponding p-value of 0.0175. I reject the null hypothesis that the true population proportion is 0.90 in favor of the alternative, and start making plans to launch my education program.

(I don’t get the same values when I use prop.test. But later in this post, I’ll show you why I shouldn’t have expected that in the first place!!)
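
Here’s a quick scratch calculation in base R that reproduces those numbers — this isn’t the function yet, just the arithmetic from the formulas above, with the confidence interval built from the standard error under the null (which is what my function below does too):

# Scratch check of the example, by hand:
phat <- 312/360                         # estimated proportion, about 0.867
SE <- sqrt(0.9 * (1 - 0.9) / 360)       # standard error under H0: p = 0.9
z <- (phat - 0.9) / SE                  # test statistic, about -2.108
p.val <- pnorm(z)                       # one-sided ("less") p-value, about 0.0175
cint <- phat + c(-1, 1) * qnorm(0.975) * SE   # about (0.836, 0.898)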

I wrote an R function to automate my z test (and also, to double check my calculations):

z.test <- function(x, n, p=NULL, conf.level=0.95, alternative="less") {
   ts.z <- NULL
   cint <- NULL
   p.val <- NULL
   phat <- x/n
   qhat <- 1 - phat
   # If you have p0 from the population or H0, use it.
   # Otherwise, use phat and qhat to find SE.phat:
   if(length(p) > 0) {
      q <- 1 - p
      SE.phat <- sqrt((p*q)/n)
      ts.z <- (phat - p)/SE.phat
      p.val <- pnorm(ts.z)                # one-sided, alternative = "less"
      if(alternative=="two.sided") {
         p.val <- 2 * pnorm(-abs(ts.z))   # both tails beyond |z|
      }
      if(alternative=="greater") {
         p.val <- 1 - pnorm(ts.z)
      }
   } else {
      # If all you have is your sample, use phat to find
      # SE.phat, and don't run the hypothesis test:
      SE.phat <- sqrt((phat*qhat)/n)
   }
   cint <- phat + c(
      -1*(qnorm(((1 - conf.level)/2) + conf.level)*SE.phat),
      (qnorm(((1 - conf.level)/2) + conf.level)*SE.phat) )
   return(list(estimate=phat, ts.z=ts.z, p.val=p.val, cint=cint))
}

When I run it on my example above, I get what I expect:

> z.test(312,360,p=0.9)
$estimate
[1] 0.867

$ts.z
[1] -2.11

$p.val
[1] 0.0175

$cint
[1] 0.836 0.898

But when I run prop.test, this isn’t what I see at all!

> prop.test(312,360,p=0.9)

 1-sample proportions test with continuity correction

data: 312 out of 360, null probability 0.9
X-squared = 4.08, df = 1, p-value = 0.04335
alternative hypothesis: true p is not equal to 0.9
95 percent confidence interval:
 0.826 0.899
sample estimates:
 p 
0.867 

Why are there differences? Here’s what I found out:

  • The prop.test function doesn’t even do a z test. It does a Chi-square test, based on there being one categorical variable with two states (success and failure)! Which I should have noticed immediately from the line that starts with “X-squared” 🙂
  • The Yates continuity correction, which adjusts for the differences that come from using a normal approximation to the binomial distribution, is also automatically applied. This removes 0.5/n from the lower limit of the confidence interval, and adds 0.5/n to the upper limit.
  • Most notably, I was really disturbed to see that the confidence interval given by prop.test wasn’t even centered around the sample estimate, p-hat. Oh no! But again, this is no concern, since prop.test uses the Wilson score interval to construct the confidence interval. This results in an asymmetric, but presumably more accurate (with respect to the real population) confidence interval.

Moral of the story? There are many ways to test an observed proportion against a standard, target, or recommended value. If you’re doing research, and especially if you’re using R, be sure to know what’s going on under the hood before you report your results.

(Other moral? Try binom.test.)
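
For the example above, those calls would look like this (a quick sketch — run them and compare against the hand calculation; turning the continuity correction off in prop.test should bring its results much closer to the z test):

# Exact binomial test of H0: p = 0.9 against the one-sided "less" alternative:
binom.test(312, 360, p=0.9, alternative="less")

# The same prop.test as before, but without the Yates continuity correction:
prop.test(312, 360, p=0.9, alternative="less", correct=FALSE)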

Writing about Business Model Innovation: Where Do You Start?

Image Credit: Doug Buckley of http://hyperactive.to

Innovation is what happens when we create new value by meeting needs. Sometimes, this comes in the form of a new process, product, or program. Other times, it comes in the form of reconfiguring the whole ecosystem for creating and sharing that value — also known as “business model innovation.”

A business model is a conceptual tool containing a set of objects, concepts and their relationships with the objective to express the business logic of a specific firm. Therefore we must consider which concepts and relationships allow a simplified description and representation of what value is provided to customers, how this is done and with which financial consequences. — Osterwalder, Pigneur, & Tucci (2005)

But if you’re an academic (or part of the BIF community), where do you find research on business model innovation, and where should you publish research and insights about business model innovation? I have a paper that’s in preparation, and I need to know where I should send it (mostly, just to know how many words I should have before I stop writing). So I went on a little journey to help figure this out, which also yielded a recommendation for three authors that you really should read if you’re publishing in this area.

Step 1: Read Zott, Amit & Massa (2011) – which provides a contemporary literature review of research on business models.

Step 2: Decide whether you’d like to publish in a traditional journal that covers business models and business model innovation, but is not solely dedicated to that pursuit. As its source material, the Zott article drew from 9 academic journals and 3 practitioner journals that meet this criterion. You can start your process by exploring business model research in these journals:

Academic Journals:

Practitioner-oriented journals:

Step 3: Consider journals that are new and/or primarily focused on business model innovation. Here are 3 that I found, with information about their publication and the publishing process. Enjoy exploring.

1. Journal of Business Models
http://journals.aau.dk/index.php/JOBM
Author guidelines at http://journals.aau.dk/index.php/JOBM/about/submissions#authorGuidelines
Sample Issue at: http://journals.aau.dk/index.php/JOBM/issue/view/106
PAGE CHARGES: Yes, but amount not specified

OPEN ACCESS POLICY/Creative Commons (CC BY-NC-ND 3.0)
Audience: Academics and Consultants
Scope: International
Format: 5000-8000 words, MS Word
References: Harvard style (examples in author guidelines)
Accepts: Research paper, viewpoint, technical paper, conceptual paper, case study, lit review, review

The array of perspectives presented above leads to the identification of 10 key theme areas for which the Journal of Business Models intends to have a special focus:

1. Business Model Design: designing, rejuvenating, innovating and facilitating
2. Implementing business models: the execution process
3. Commercialization and exploitation of ideas through business models: challenging entrepreneurial processes
4. Seeking the true benefits of a globalised world: how internationalization of activities affects business models
5. Business model archetypes and key components: integrating building blocks and typologies
6. The strategic partnerships of business models: Roles and relationships within and among business models
7. Business models and high-tech ventures
8. The performance of business models: Dilemmas and paradoxes of performance measurement consequences
9. Defining what business models are about: The epistemological and conceptual roots of business models and their differences with strategy, strategic management, organisation and business planning
10. Tools and techniques

“Soon we are opening a new section on book reviews. If you are interested in making a book review please send an e-mail to Christian Nielsen (chn@business.aau.dk). The first review is of Osterwalder and Pigneur’s new book “Value Proposition Design” which will be published soon.”

2. Long Range Planning (an International Journal of Strategic Management)
http://www.journals.elsevier.com/long-range-planning/
Author guidelines at: http://www.elsevier.com/journals/long-range-planning/0024-6301/guide-for-authors
Sample articles at: http://www.journals.elsevier.com/long-range-planning/open-access-articles/
PAGE CHARGES: Yes, $1800 USD, if you want your article to be open access.

OPEN ACCESS OPTION
Audience: Academics
Scope: International
Format: MS Word, length not specified
References: examples in author guidelines
Accepts: Research paper, technical paper

“The areas of work published by LRP include, among others: corporate strategy and governance, business strategy and new business models, international dimensions of strategy, strategies for emerging markets, entrepreneurship, innovation, organizational structure and design, corporate social responsibility, management of technology, methods for strategy research, and business processes.”

3. Open Journal of Business Model Innovation
http://www.scipublish.com/journals/BMI/
Author Guidelines at
Sample Article at: http://www.scipublish.com/journals/BMI/papers/1250 (can download PDF)
PAGE CHARGES: YES, but only after 3/31/05

OPEN ACCESS POLICY/IP sharing license not cited
Audience: Academics and Practitioners
Scope: International
Format: MS Word, “approximately 10 pages” with Cover Letter
References: examples in sample article
Accepts: Research papers, issue analysis, rigorous new insights that will advance the field

The Open Journal of Business Model Innovation is a peer-reviewed journal published by Scientific Online Publishing. It presents current academic research and practical findings in the field of business model innovation. Topics appropriate and related to business model innovation include the role of business models within corporations, the process and instruments for business model innovation, business models within several industries, social business models, and business models in emerging markets. Topics also include the quantitative and qualitative evaluation of business models. The journal addresses issues such as: What are the drivers for business model innovation? How do companies innovate their business models? How do companies evaluate existing and new business models? How do companies integrate business models in their corporations? How do companies manage multiple business models? Disciplinary boundaries that straddle business model innovation include strategic management, entrepreneurship, innovation management, and others.

Change vs. Transformation: What’s the Difference?

Transformation involves changing your frame of reference. Image Credit: Doug Buckley of http://hyperactive.to

Last week, on the Lean Six Sigma Worldwide discussion group, Gaurav Navula (CEO of Perky Pat India) asked us to reflect on the difference between change and transformation. Change management was a major thrust in the 1990’s and early 2000’s, but you don’t hear as much about it anymore. Today, the tools of change management (making the business case, aligning strategy with tactics, engaging stakeholders, instituting goal-directed training and education programs, etc.) have faded into the everyday landscape of management. Leaders seem to be focused more on “surviving and thriving” in the midst of rapid and disruptive innovation, which enhances the importance of transformation.

But what’s the difference? Just a couple months ago, Ron Ashkenas (on the Harvard Business Review blog) asserted that we don’t know the difference: “We really do know how to execute discrete changes. What we know much less about is how to engineer a transformation.”

But I think we do know how to engineer a transformation, and we can use this recipe. I’ll explain more towards the end of this post, but it acknowledges the relationship between larger-scale changes and transformation: that change is required for transformation, and all transformation involves change, but not all change is transformational. This is based on the idea that all observable changes come with “shifts in state” – from the quality management perspective, you can think of these as observed changes in system performance (cost savings, more efficient or effective use of time, increasing throughput, enhancing return on investment).

[Figure: transformation = a shift in your frame of reference, combined with the accumulated shifts in state from incremental changes]

What this says is: transformation is what you get when you adjust the frame of reference that you observe the world with, and then add to that new perspective the product of all the shifts in state that have occurred as a result of incremental changes. I say “when you adjust the reference frame,” but that’s somewhat misleading. Usually there is some sort of transformational experience… an “a-ha” moment or event… where the scales fall from your eyes and you see the world in a completely different way. The shift in reference frame always involves relationships: either your relationship to other people or other groups, or your relationship to yourself and how you see yourself, or maybe both. 

“My sense is that there’s an underlying semantic problem, stemming from confusion between what constitutes “change” versus “transformation.” Many managers don’t realize that the two are not the same. And while we’ve actually come a long way in learning how to manage change, we continue to struggle with transformation.” — Ron Ashkenas, HBR blog

Here are some of the qualitative descriptions that have been offered to further articulate the differences between change and transformation. Notice that they do not conflict with the expression for transformation above.

Change:

  • Finite initiatives which may or may not be cross-cutting (HBR)
  • Desire to improve the past directs what we do (Mohanty)
  • Makes the system better (Mohanty)
  • Any time an organization asks its people or systems to stop, start, or execute in a new way a process, behavior or location of performance (Holtz)
  • Making setups in different format within the given system to achieve improvements in performance (Bob Matthew)
  • Incremental (Anand)

Transformation:

  • A portfolio of open-ended initiatives which are necessarily cross-cutting (HBR)
  • The future directs your actions and only the limits of imagination and courage constrain possibilities (Mohanty)
  • Makes a better system (Mohanty)
  • The base of transformational is the word “formation” – the stuff things are made of or the structure – that needs to change for the change to be transformational. (du Plessis)
  • Encompasses bigger, more radical shifts (Holtz)
  • Makes a total change of system, procedure and a total mindset to get a better transparency and communication within the process owners including the customers. (Bob Matthew)
  • Should be informed by strategy (Kshirsagar)
  • Transformation is not a preference; it’s a necessity as a result of resistance to change. (Aydin)
  • Major; result of many changes (Anand)

Think about the last time you experienced a transformative change, perhaps even in your personal life. For example, think of a time when you were able to truly and completely forgive someone for some way they had wronged you. There were certainly a collection of changes in state that occurred — prior to, during, and after the forgiveness experience. But as a result, didn’t you also come to see the world in a completely different way? Your frame of reference with respect to that person… and probably, other people you have relationships with… also shifted.

Why the Ban on P-Values? And What Now?

Just recently, the editors of the academic journal Basic and Applied Social Psychology have decided to ban p-values: that’s right, the nexus for inferential decision making… gone! This has created quite a fuss among anyone who relies on significance testing and p-values to do research (especially those, presumably, in social psychology who were hoping to submit a paper to that journal any time soon). The Royal Statistical Society even shared six interesting letters from academics to see how they felt about the decision.

These letters all tell a similar story: yes, p-values can be mis-used and mis-interpreted, and we need to be more careful about how we plan for — and interpret the results of — just one study! But not everyone advocated throwing out the method in its entirety. I think a lot of the issues could be avoided if people had a better gut sense of what sampling error is… and how easy it is to encounter (and as a result, how easy it can be to accidentally draw a wrong conclusion in an inference study just based on sampling error). I wanted to do a simulation study to illustrate this for my students.

The Problem

You’re a student at a university in a class full of other students. I tell you to go out and randomly sample 100 students, asking them what their cumulative GPA is. You come back to me with 100 different values, and some mean value that represents the average of all the GPAs you went out and collected. You can also find the standard deviation of all the values in your sample of 100. Everyone has their own unique sample.

“It is misleading to emphasize the statistically significant findings of any single team. What matters is the totality of the evidence.” – John P. A. Ioannidis in Why Most Published Research Findings are False

It’s pretty intuitive that everyone will come back with a different sample… and thus everyone will have a different point estimate of the average GPA that students have at your university. But, according to the central limit theorem, we also know that if we take the collection of all the average GPAs and plot a histogram, it will be normally distributed with a peak around the real average GPA. Some students’ estimates will be really close to the real average GPA. Some students’ estimates will be much lower (for example, if you collected the data at a meeting for students who are on academic probation). Some students’ estimates will be much higher (for example, if you collected the data at a meeting for honors students). This is sampling error, which can lead to incorrect inferences during significance testing.

Inferential statistics is good because it lets us make decisions about a whole population just based on one sample. It would require a lot of time, or a lot of effort, to go out and collect a whole bunch of samples. Inferential statistics is bad if your sample size is too small (and thus you haven’t captured the variability in the population within your sample) or have one of these unfortunate too-high or too-low samples, because you can make incorrect inferences. Like this.

The Input Distribution

Let’s test this using simulation in R. Since we want to randomly sample the cumulative GPAs of students, let’s choose a distribution that reasonably reflects the distribution of all GPAs at a university. To do this, I searched the web to see if I could find data that might help me get this distribution. I found some data from the University of Colorado Boulder that describes GPAs and their corresponding percentile ranks. From this data, I could put together an empirical CDF, and then since the CDF is the integral of the PDF, I approximated the PDF by taking the numerical derivative (first differences) of the CDF. (I know this isn’t the most efficient way to do it, but I wanted to see both plots):

# GPA scores and their corresponding percentile ranks (from the Boulder data):
score <- c(.06,2.17,2.46,2.67,2.86,3.01,3.17,3.34,3.43,3.45,
3.46,3.48,3.5,3.52,3.54,3.56,3.58,3.6,3.62,3.65,3.67,3.69,
3.71,3.74,3.77,3.79,3.82,3.85,3.88,3.91,3.94,3.96,4.0,4.0)
perc.ranks <- c(0,10,20,30,40,50,60,70,75,76,77,78,79,80,81,
82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100)
fn <- ecdf(perc.ranks)   # empirical CDF built from the percentile ranks
xs <- score
ys <- fn(perc.ranks)     # cumulative proportion at each tabled point
# Approximate the PDF as the slope (first differences) of the CDF:
slope <- rep(NA,length(xs))
for (i in 2:length(xs)) {
 slope[i] <- (ys[i]-ys[i-1])/(xs[i]-xs[i-1])
}
slope[1] <- 0
slope[length(xs)] <- slope[length(xs)-1]   # last two scores are both 4.0, so reuse the previous slope

Then, I plotted them both together:

par(mfrow=c(1,2))
plot(xs,slope,type="l",main="Estimated PDF")
plot(xs,ys,type="l",main="Estimated CDF")
dev.off()

[Figure: estimated PDF (left) and estimated CDF (right) built from the Boulder GPA data]

I looked around for a distribution that might approximate what I saw. (Yes, I am eyeballing.) I found the Stable Distribution, and then played around with the parameters until I plotted something that looked like the empirical PDF from the Boulder data:

library(stabledist)   # dstable() and rstable() aren't in base R; the stabledist package provides them
x <- seq(0,4,length=100)
hx <- dstable(x, alpha=0.5, beta=0.75, gamma=1, delta=3.2)
plot(x,hx,type="l",lty=2,lwd=2)

[Figure: stable distribution density with alpha=0.5, beta=0.75, gamma=1, delta=3.2, roughly matching the empirical PDF]

The Simulation

First, I used pwr.t.test to do a power analysis to see what sample size I needed to obtain a power of 0.8, assuming a small but not tiny effect size, at a level of significance of 0.05. It told me I needed at least 89. So I’ll tell my students to each collect a sample of 100 other students.
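
Here’s a sketch of that kind of power analysis; the effect size d = 0.3 is my stand-in for “small but not tiny,” and with these assumed inputs the required sample size lands right around the 89 mentioned above:

library(pwr)   # provides pwr.t.test()
# How many observations do I need for a one-sample t-test to detect an
# effect of d = 0.3 (assumed) with 80% power at a significance level of 0.05?
pwr.t.test(d=0.3, sig.level=0.05, power=0.80, type="one.sample")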

Now that I have a distribution to sample from, I can pretend like I’m sending 10,000 students out to collect a sample of 100 students’ cumulative GPAs. I want each of my 10,000 students to run a one-sample t-test to evaluate the null hypothesis that the real cumulative GPA is 3.0 against the alternative hypothesis that the actual cumulative GPA is greater than 3.0. (Fortunately, R makes it easy for me to pretend I have all these students.)

sample.size <- 100
numtrials <- 10000
p.vals <- rep(NA,numtrials)
gpa.means <- rep(NA,numtrials)
compare.to <- 3.00
for (j in 1:numtrials) {
     r <- rstable(n=1000,alpha=0.5,beta=0.75,gamma=1,delta=3.2)
     meets.conds <- r[r>0 & r<4.001]
     my.sample <- round(meets.conds[1:sample.size],3)
     gpa.means[j] <- round(mean(my.sample),3)
     p.vals[j] <- t.test(my.sample,mu=compare.to,alternative="greater")$p.value
     if (p.vals[j] < 0.02) {
          # capture the last one of these data sets to look at later
          capture <- my.sample
     }
}

For all of my 10,000 students’ significance tests, look at the spread of p-values! They are all over the place! There are 46 students whose p-values came in under 0.05… and they rejected the null. One of the distributions of observed GPAs for a student who would have rejected the null is shown below, and it looks just fine (right?). Even though the bulk of the p-values are well over 0.05, and would have led to the accurate inference that you can’t reject the null in this case, there are still plenty of values that DO fall below that 0.05 threshold.

> summary(p.vals)
 Min. 1st Qu. Median Mean 3rd Qu. Max. 
0.005457 0.681300 0.870900 0.786200 0.959300 1.000000 
> p.vals.under.pointohfive <- p.vals[p.vals<0.05]
> length(p.vals.under.pointohfive)
[1] 46
> par(mfrow=c(1,2))
> hist(capture,main="One Rogue Sample",col="purple")
> boxplot(p.vals,main="All P-Values")

[Figure: histogram of one rogue sample (left) and boxplot of all 10,000 p-values (right)]

Even though the p-value shouted “reject the null!” from this rogue sample, a 99% confidence interval shows that the value I’m testing against… that average cumulative GPA of 3.0… is still contained within the confidence interval. So I really shouldn’t have ruled it out:

> mean(capture) + c(-1*qt(0.995,df=(sample.size-1))*(sd(capture)/sqrt(sample.size)),
+ qt(0.995,df=(sample.size-1))*(sd(capture)/sqrt(sample.size)))
[1] 2.989259 3.218201
> t.test(capture,mu=compare.to,alternative="greater")$p.value
[1] 0.009615011

If you’re doing real research, how likely are you to replicate your study so that you know if this happened to you? Not very likely at all, especially if collecting more data is costly (in terms of money or effort). Replication would alleviate the issues that can arise due to the inevitable sampling error… it’s just that we don’t typically do it ourselves, and we don’t make it easy for others to do it. Hence the p-value controversy.

What Now?

What can we do to improve the quality of our research so that we avoid the pitfalls associated with null hypothesis testing completely, or, to make sure that we’re using p-values more appropriately?

  • Make sure your sample size is big enough. This usually involves deciding what you want the power of the test to be, given a certain effect size that you’re trying to detect. A power of 0.80 means you’ll have an 80% chance of detecting an effect that’s actually there. However, knowing what your effect size is prior to your research can be difficult (if not impossible).
  • Be aware of biases that can be introduced by not having a random enough or representative enough sample.
  • Consider estimation instead of (or alongside) null hypothesis testing. In our example above we might ask “How much greater than 3.0 is the average cumulative GPA at our university?” (There’s a short sketch of this after the list.) Check out Geoff Cumming’s article entitled “The New Statistics: Why and How” for a roadmap that will help you think more in terms of estimation (using effect sizes, confidence intervals, and meta-analysis).
  • Support open science. Make it easy for others to replicate your study. If you’re a journal reviewer, consider accepting more articles that replicate other studies, even if they aren’t “novel enough”.
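
Here’s the small estimation sketch promised in the list above, using the capture sample saved in the simulation loop: instead of reporting a p-value, report how far above 3.0 the mean GPA appears to be, along with a confidence interval for that difference.

# Estimation-style summary for one sample (here, the captured rogue sample):
# how much greater than 3.0 does the mean GPA look, and how uncertain is that?
effect <- mean(capture) - 3.0
ci <- t.test(capture, conf.level=0.95)$conf.int - 3.0   # CI for (mean GPA - 3.0)
round(c(effect=effect, ci.lower=ci[1], ci.upper=ci[2]), 3)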

I am certain that my argument has holes, but it seems to be a good example for students to better embrace the notion of sampling error (and become scared of it… or at least more mindful). Please feel free to suggest alternatives that could make this a more informative example. Thank you!

[ALSO… this xkcd cartoon pretty much nails the whole process of why p-values can be misleading. Enjoy.]

A Linear Congruential Generator (LCG) in R

In my simulation classes, we talk about how to generate random numbers. One of the techniques we talk about is the Linear Congruential Generator (LCG). Starting with a seed, the LCG produces the first number in the sequence, and then uses that value to generate the second one. The second value is used to generate the third, the third to generate the fourth, and so on. The equation looks like this:

x[i+1] = (a * x[i] + c) mod m

where a is a multiplier, c is a shift, and m is a modulus. (Modulus just means “find me the remainder when you divide the stuff to the left of the mod operator by the stuff to the right”. Fortunately there is an easy way to do this in R: a mod b is expressed as a %% b.) When you use the LCG to generate a stream of random numbers, they will always be between 0 and (m-1), and the sequence will repeat after at most m values. When you take the sequence of values x and divide it by the modulus m, you get uniformly distributed random numbers between 0 and 1 (but never quite hitting 1). The LCG is a pseudorandom number generator because after a while, the sequence in the stream of numbers will begin to repeat.
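
A quick illustration of the %% operator, using the small example generator that shows up later in this post (a = 6, c = 7, m = 23, seed = 5):

17 %% 5             # 2: the remainder when 17 is divided by 5
(6 * 5 + 7) %% 23   # 14: one step of the example LCG below, starting from x = 5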

I wrote an exam last week with some LCG computations on it, but then I got lazy and didn’t want to do the manual calculations… so I wrote a function in R. Here it is. It returns two different vectors: the collection of numbers between 0 and (m – 1) that the LCG produces, and the collection of uniformly distributed random numbers between 0 and 1 that we get by dividing the first collection by m:

lcg <- function(a,c,m,run.length,seed) {
    x <- rep(0,run.length)
    x[1] <- seed
    for (i in 1:(run.length-1)) {
       x[i+1] <- (a * x[i] + c) %% m
    }
    U <- x/m # scale all of the x's to
             # produce uniformly distributed
             # random numbers between [0,1)
    return(list(x=x,U=U))
}

So for example, if your LCG looks like this:

x[i+1] = (6 * x[i] + 7) mod 23

Then, after loading in the function above, you could choose a seed (say, 5) and generate a stream of 20 random numbers:

> z <- lcg(6,7,23,20,5)
> z
$x
 [1] 5 14 22 1 13 16 11 4 8 9 15 5 14 22 1 13 16 11
[19] 4 8

$U
 [1] 0.21739130 0.60869565 0.95652174 0.04347826 0.56521739
 [6] 0.69565217 0.47826087 0.17391304 0.34782609 0.39130435
[11] 0.65217391 0.21739130 0.60869565 0.95652174 0.04347826
[16] 0.56521739 0.69565217 0.47826087 0.17391304 0.34782609

The values contained within z$x are the numbers that the LCG produces, and the values contained within z$U are the values of z$x scaled to fall between 0 and 1. Both z$x and z$U are uniformly distributed, although the pattern will become more apparent as you generate more and more random numbers in your stream. Here’s what they look like with 10000 numbers in the stream:

> z <- lcg(6,7,23,10000,5)
> par(mfrow=c(1,2))
> hist(z$x,col="red")
> hist(z$U,col="purple")

[Figure: histograms of z$x (left) and z$U (right) for a stream of 10,000 values]

But how good is your random number generator? First, you know you want a large (and preferably odd-valued) modulus, so that the numbers do not repeat too quickly. But there are also considerations dealing with what multiplier and shift you should use. I like checking the quality of my random number generator by plotting how the values move in sequence around a two-dimensional grid: the better the random number generator, the more completely it fills the space.

You can play around with these plots to find a “good” random number generator:

> z1 <- lcg(6,7,23,2000,5) # "Bad" LCG
> z2 <- lcg(1093,18257,86436,2000,12) # "Good" LCG
> par(mfrow=c(1,2))
> plot( z1$U[1:(length(z1$U)-1)], z1$U[2:length(z1$U)] , main="Bad")
> plot( z2$U[1:(length(z2$U)-1)], z2$U[2:length(z2$U)] , main="Good")

[Figure: plots of successive pairs (U[i], U[i+1]) for the “bad” LCG (left) and the “good” LCG (right)]

Although there are many heuristics you can use to avoid creating a “bad” random number generator, it’s really difficult to design a “good” one. People make careers out of this. I don’t claim to be an expert; I just know how to test whether a random number generator is suitable for my simulations. (I also really like the handout at http://www.sci.csueastbay.edu/~btrumbo/Stat3401/Hand3401/CongGenIntroB.pdf, which explains this graphing approach in more detail and also talks about how you have to look at patterns in three dimensions as well. I found it when I was searching around for a “good LCG that would give me a nice graph.”)

Quality Has Always Been Global

In his February post, ASQ CEO Bill Troy asks “Why Should Quality ‘Go Global’?” ASQ has, over the past several years, expanded its reach as a member organization… “going global” to expand awareness of quality tools and techniques. This is being done to more deeply realize ASQ’s mission to “increase the use and impact of quality in response to the diverse needs of the world”.

But this approach forgets that quality can’t go global… it already is global! The notion (and pursuit) of quality is evident in the history of water quality and sanitation dating back to Ancient Greece and Rome, the creation (over centuries) of the measuring instruments and standards that have made way for modern methods of industrial production, cave paintings discovered in Egypt that show quality assurance inspectors presiding over work, and other stories. Deming’s groundbreaking work took place in Japan, in the midst of a vastly different culture than Deming’s own. In 1990, Quality Progress ran a series of articles called “China’s Ancient History of Managing for Quality” that provides a very rich examination of quality practices in that region.

Sure, Frederick Winslow Taylor was an American manufacturer, meaning that the principles of scientific management were first tested and implemented here… but the chemist Le Chatelier very quickly translated Taylor’s work into French and introduced the principles to manufacturing plants during World War I. At the same time, Henri Fayol was conceptualizing similar techniques for assessing and managing quality in that country. The journal Quality Engineering ran a piece in 1999 that described the history of quality management in France in the 20th century, along with practices and trends from several other European countries.

Quality systems provide mechanisms for us to achieve and accomplish whatever it is that we value. Every culture has a long and vibrant history of using tools, techniques, and standards to make these things happen. Perhaps instead of aiming to simply push the message of quality beyond the United States, ASQ could also seek out the messages of quality that artisans, engineers, and citizens in vastly different environments and cultures have developed over the past several centuries, and offer them to quality professionals everywhere.