Top 10 Statistics Topics for the Six Sigma Black Belt (CSSBB) Exam

Image Credit: Doug Buckley (http://hyperactive.to)

[IMPORTANT NOTE! As of April 20, 2015 I now have a NEW FAVORITE introductory statistics textbook… the one I’ve always dreamed of having, but it just never existed before. But today it does!! <3]

Not too long ago, Darrah Turman from New Jersey contacted me for some additional insight into preparing for the ASQ Certified Six Sigma Black Belt (CSSBB) exam. He’s taking it in March, and like many prospective Black Belts, he’s most concerned about the statistics parts of the exam… it’s been many years since he’s had a statistics course.

As a result, I’m going to start a series of blog posts over the next two weeks that you can follow along with if you’re busily getting ready for your exam. Today, we’ll start with one of Darrah’s questions: How do I focus on the right statistics? I’ve decided to post my “Top 10 Statistics Topics” that seem to be featured heavily in Six Sigma.

Here are My Top 10 Six Sigma Statistics Topics!

Central Limit Theorem – This is the magic that serves as the foundation for so much of what quality professionals practice. In short, whenever you take many independent samples of a reasonably large size and compute a sum or an average over each sample, the distribution of the whole collection of sums or means is going to be approximately normal, even when the underlying data is not!! This is why when we’re spot checking parts or products in quality control, we take batch averages and know they’re going to be distributed approximately normally. Find out more here!
Know Your Distributions! – Distributions come in many shapes and sizes, and you should be familiar with how to describe and characterize them (also, be able to recognize their equations). Continuous, discrete, normal, Poisson, binomial, hypergeometric, exponential, Weibull, uniform, symmetric, unimodal, bimodal… you should be familiar with all the words that describe distributions.
Know Your Inference Tests! – It’s helpful to have a general sense of which inference test is appropriate for which kind of problem. For example, if you’re trying to figure out whether two categorical variables are independent, that’s a chi-square test of independence. If you’re trying to figure out whether a mean matches a particular standard, target, or recommended value, that’s a one-sample t-test. As part of knowing your inference tests, you should know the form of the null hypothesis for each test, as well as the form of each incarnation of the alternative hypothesis (there will be between one and three of them for each test).
Type I, Type II, and Power Analysis – [Book Chapter + PPT] – If you’re planning a statistical inference test, it’s important to know how big a sample you need so that your test has enough power to detect an effect of the size you care about. You’ll also need to balance the trade-offs between the two types of errors you can encounter: Type I (rejecting a null hypothesis that is actually true) and Type II (failing to reject a null hypothesis that is actually false). This chapter will help you do all that.
Computing Confidence Intervals – Just by knowing the average and standard deviation of a small sample, you can use the Student’s t distribution to quickly and easily compute a confidence interval for the mean, because these confidence intervals come in the form Estimate +/- Margin of Error, where the margin of error is the t value times the standard deviation divided by the square root of the sample size. The most complex part is learning how to look up the t value for the appropriate confidence level and degrees of freedom (the sample size minus one). (Confused as to whether you should use the normal distribution or the t distribution? Don’t be… when you’re estimating a mean and only have the sample standard deviation, use the t distribution. As your sample size gets bigger and bigger, the shape of the t distribution gets more and more like the shape of the standard normal distribution, until the two are practically indistinguishable.)
Using the Normal Model to Find Areas Under the Curve – [Book Chapter + PPT] – It’s really good to be familiar with z-score problems. In addition to making you more comfortable with the normal model, it’s a useful technique for finding the probability of observing values in a particular range.
Understanding Scatterplots, Correlation Coefficient (r), and Coefficient of Determination (R²) – Scatterplots help us see the relationship between values of two quantitative variables. The correlation coefficient tells us the strength and direction of the linear relationship (values near +1 or -1 mean the points cluster tightly around a line, and values near 0 mean there is little linear association), and the coefficient of determination tells us what proportion of the variability in the response is explained by a (typically linear) model.
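Process Capability Problems – You should be able to tell the difference between Cp and Cpk, and perform basic calculations. Cp compares the width of the specification limits to the spread of the process and ignores where the process is centered, while Cpk also accounts for how far the process mean sits from the nearer specification limit. Also know that if your data is not normal, you’re going to have to use some kind of data transformation before you determine process capability.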
Know Your Control Charts! – There are many different incarnations of control charts. You should be able to distinguish your variables from your attributes, and understand when to apply the various kinds of control chart (along with basic calculations).
Logit-Probit & Odds Ratios – These models help you deal with situations where there is a binary response variable. Basic familiarity with what the regression models do, and how to calculate odds, should be a part of your study plan.
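
To see the Central Limit Theorem from the first topic in action, here is a small simulation sketch in Python. The exponential population, the sample size of 40, and the 5,000 repetitions are all illustrative choices of mine, not anything taken from the exam. The idea is that even though individual exponential values are strongly skewed, the sample means should center on the population mean (1.0 here) with a spread close to sigma divided by the square root of the sample size.

```python
import random
import statistics

def sample_means(n_samples, sample_size, seed=42):
    """Draw n_samples samples from a skewed (exponential) population
    and return the mean of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(sample_size))
        for _ in range(n_samples)
    ]

# Illustrative settings: 5,000 batches of 40 parts each.
means = sample_means(n_samples=5000, sample_size=40)

center = statistics.fmean(means)    # should land near the population mean, 1.0
spread = statistics.stdev(means)    # should land near sigma / sqrt(n)
expected_spread = 1.0 / 40 ** 0.5   # sigma = 1.0 for this population

# For a normal distribution, roughly 68% of values fall within one SD.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means:   {center:.3f}")
print(f"SD of sample means:     {spread:.3f} (theory: {expected_spread:.3f})")
print(f"share within one SD:    {within_one_sd:.1%}")
```

Try shrinking the sample size to 2 or 3 and the batch averages will look noticeably less normal, which is why "reasonably large" samples matter.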
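
For the confidence interval topic, the Estimate +/- Margin of Error recipe can be sketched in a few lines of Python. The measurements below are made up for illustration, and the function name `t_confidence_interval` is just a hypothetical helper. The t value of 2.262 is the standard table entry for 95% confidence with 9 degrees of freedom; for a different sample size or confidence level you would look up a different value.

```python
import math
import statistics

def t_confidence_interval(data, t_critical):
    """Return (low, high) for the mean: estimate +/- t * s / sqrt(n).

    t_critical must be looked up for the chosen confidence level
    and n - 1 degrees of freedom.
    """
    n = len(data)
    estimate = statistics.fmean(data)
    margin = t_critical * statistics.stdev(data) / math.sqrt(n)
    return estimate - margin, estimate + margin

# Made-up sample of 10 measurements (so 9 degrees of freedom).
measurements = [10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 10.5]

T_95_DF9 = 2.262  # t table: 95% confidence, 9 degrees of freedom

low, high = t_confidence_interval(measurements, T_95_DF9)
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
```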
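
For the z-score topic, here is a minimal sketch of finding an area under the normal curve with Python's built-in `statistics.NormalDist` (available in Python 3.8 and later). The process mean of 10.0, standard deviation of 0.25, and the range from 9.5 to 10.5 are invented numbers; on the exam you would do the same thing by converting to z-scores and reading a table.

```python
from statistics import NormalDist

# Assumed (made-up) process: normal with mean 10.0 and SD 0.25.
process = NormalDist(mu=10.0, sigma=0.25)

lower, upper = 9.5, 10.5

# z-scores: how many standard deviations each bound is from the mean.
z_lower = (lower - process.mean) / process.stdev
z_upper = (upper - process.mean) / process.stdev

# Area under the curve between the two bounds.
probability = process.cdf(upper) - process.cdf(lower)

print(f"z-scores: {z_lower:.1f} and {z_upper:.1f}")
print(f"P({lower} < X < {upper}) = {probability:.4f}")
```

Because the bounds sit two standard deviations on either side of the mean, the result should agree with the familiar "about 95%" rule of thumb.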
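
For the process capability topic, the basic Cp and Cpk calculations look like this in Python. The specification limits, mean, and standard deviation are hypothetical values chosen so the process is slightly off-center, and the sketch assumes the data is approximately normal.

```python
def process_capability(mean, sigma, lsl, usl):
    """Return (Cp, Cpk) for a process with the given mean and SD.

    lsl / usl are the lower and upper specification limits.
    """
    # Cp: spec width versus process spread, ignoring centering.
    cp = (usl - lsl) / (6 * sigma)
    # Cpk: distance from the mean to the nearer spec limit.
    cpk = min(usl - mean, mean - lsl) / (3 * sigma)
    return cp, cpk

# Hypothetical process that runs a little above the middle of the spec.
cp, cpk = process_capability(mean=10.1, sigma=0.25, lsl=9.0, usl=11.0)
print(f"Cp  = {cp:.2f}")
print(f"Cpk = {cpk:.2f}")
```

Cpk can never be larger than Cp, and the two are equal only when the process is perfectly centered between the specification limits.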
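
For the odds ratio topic, here is a short Python sketch of how to calculate odds and an odds ratio from a 2x2 table. The counts are invented for illustration (defective versus good parts on two hypothetical machines), and the helper names `odds` and `odds_ratio` are my own.

```python
def odds(events, non_events):
    """Odds = count of events divided by count of non-events."""
    return events / non_events

def odds_ratio(events_a, non_events_a, events_b, non_events_b):
    """Odds of the event in group A relative to group B."""
    return odds(events_a, non_events_a) / odds(events_b, non_events_b)

# Invented counts: Machine A made 30 defective and 70 good parts,
# Machine B made 10 defective and 90 good parts.
odds_a = odds(30, 70)
odds_b = odds(10, 90)
ratio = odds_ratio(30, 70, 10, 90)

print(f"odds of a defect on A: {odds_a:.3f}")
print(f"odds of a defect on B: {odds_b:.3f}")
print(f"odds ratio (A vs. B):  {ratio:.2f}")
```

An odds ratio above 1 means the event is more likely in the first group, and a ratio of exactly 1 means the odds are the same in both groups. Note that odds are not the same as probabilities: Machine A's defect probability is 30%, but its odds are 30 to 70.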