Category Archives: Measurement

What Kentucky Derby Handicapping Can Teach Us About Organizational Metrics

[Image: My favorite, #10 Firing Line, with exercise rider Humberto Gomez at Churchill Downs, April 29, 2015. From http://www.telegraph.co.uk/sport/horseracing/11574821/Kentucky-Derby-Simon-Callaghan-has-Firing-Line-primed.html. Credit: Jamie Rhodes-USA TODAY Sports]

I love horse racing. More specifically, I love betting on the horses. Why? Because it’s a complex exercise in data science, requiring you to integrate (what feels like) hundreds of different kinds of performance measures — and environmental factors (like weather) — to predict which horse will come in first, second, third, and maybe even fourth (if you’re betting a superfecta). And, you can win actual money!

I spent most of the day yesterday handicapping for Kentucky Derby 2015, before stopping at the track to place my bets for today. As I was going through the handicapping process, I realized that I’m essentially following the analysis process that we use as Examiners when we review applications for the Malcolm Baldrige National Quality Award (MBNQA). We apply “LeTCI” — pronounced like “let’s see” — to determine whether an organization has constructed a robust, reliable, and relevant assessment program to evaluate their business and their results. (And if they haven’t, LeTCI can provide some guidance on how to continuously improve to get there).

LeTCI stands for “Levels, Trends, Comparisons, and Integration”. In Baldrige parlance, here’s what we mean by each of those (a toy numeric sketch follows the list):

  • Levels: This refers to categorical or quantitative values that “place or position an organization’s results and performance on a meaningful measurement scale. Performance levels permit evaluation relative to past performance, projections, goals, and appropriate comparisons.” [1] Your measured levels refer to where you’re at now — your current performance. 
  • Trends: These describe the direction and/or rate of your performance improvements, including the slope of the trend data (if appropriate) and the breadth of your performance results. [2] “A minimum of three data points is generally needed to begin to ascertain a trend.” [1]
  • Comparisons: This “refers to establishing the value of results by their relationship to similar or equivalent measures. Comparisons can be made to results of competitors, industry averages, or best-in-class organizations. The maturity of the organization should help determine what comparisons are most relevant.” [1] This also includes performance relative to benchmarks.
  • Integration: This refers to “the extent to which your results measures address important customer, product, market, process, and action plan performance requirements” and “whether your results are harmonized across processes and work units to support organization-wide goals.” [2]

(Quoted sections above come from http://www.dtic.mil/ndia/2008cmmi/Track7/TuesdayPM/7059olson.pdf, Slide 31 [1], and http://www.baldrige21.com/Baldrige%20Scoring%20System.html [2].)
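Since Levels, Trends, and Comparisons are all things you can compute, here’s a minimal sketch in R of what a LeTCI-style check on a single metric might look like. The metric, the numbers, and the benchmark are all made up for illustration:

```r
# Hypothetical metric: quarterly on-time-delivery rate, last five quarters
otd <- c(0.87, 0.89, 0.90, 0.93, 0.94)
benchmark <- 0.92                       # hypothetical industry average

level <- tail(otd, 1)                   # Level: where you are now
fit   <- lm(otd ~ seq_along(otd))       # Trend: fit a line through the series
slope <- coef(fit)[2]                   # positive slope = improving
gap   <- level - benchmark              # Comparison: distance to benchmark

cat(sprintf("Level: %.2f | Trend: %+.3f/quarter | Gap to benchmark: %+.2f\n",
            level, slope, gap))
```

Integration resists a one-liner: it’s about whether results like these, across many measures and work units, tell one harmonized story in support of organization-wide goals.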

Here’s a snapshot of my Kentucky Derby handicapping process, using LeTCI. (I also do it for other horse races, but the Derby has got to be one of the most challenging prediction tasks of the year.) Derby prediction is fascinating because all of the horses are excellent, for the most part, and what you’re trying to do is determine how likely each horse is to win on this particular day, against these particular competitors. Although my handicapping process is much more complex than what I lay out below, this should give you a sense of the process that I use, and how it relates to the Baldrige LeTCI approach:

  • Levels: First, I have to check out the current performance levels of each contender in the Derby. What’s the horse’s current Beyer speed figure or BRIS speed rating (that is, is he fast enough to win this race)? What are the recent exercise times? If a horse isn’t working 5 furlongs in under a minute, then I wonder (for example) whether he can handle the Derby pace. Has this horse raced on this particular track, or with this particular jockey? I can also check out the racing pedigree of the horse through metrics like “dosage”.
  • Trends: Next, I look at a few key trends. Have the horse’s past races been preparing him for the longer distance of the Derby? Ideally, I want to see that the two prior races were a mile and a sixteenth, and a mile and an eighth. Is their Beyer speed score increasing, at least over the past three races? Depending on the weather for Louisville, has this horse shown a liking for either fast or muddy tracks? Has the horse won a race recently? 
  • Comparisons: Is the horse paired with a jockey he has been successful with in the past? I spend a lot of time comparing the horses to each other as well. A horse doesn’t have to beat track records to win… he just has to beat the other horses. Even a slow horse will win if the other horses are slower. Additionally, you have to compare the horse’s performance to baselines provided by the other horses throughout the duration of the race. Does your horse tend to get out in front, and then burn out? Or does he stalk the other horses and then launch an attack at the end, pulling out in front as a closer? You have to compare the performance of the horse to the performance of the other horses longitudinally, because the relative performance will change as the race progresses.
  • Integration: What kind of story do all of these metrics tell together? That’s the real trick of handicapping horse races… the part where you have to bring everything together into a cohesive, coherent way. This is also the part where you have to apply intuition. Do I really think this horse is ready to pull off a victory today, at this particular track, against these contenders, embedded in the wild and festive Derby environment (which a horse may not have experienced yet)? (A toy sketch of this integration step follows.)
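To make that integration step concrete, here’s a hedged sketch in R of one way to roll Levels, Trends, and Comparisons into a single score per horse. The figures and weights below are hypothetical, and real handicapping weighs far more factors:

```r
# Made-up field of three horses with three inputs:
horses <- data.frame(
  name       = c("Firing Line", "Contender B", "Contender C"),
  speed      = c(102, 98, 105),      # Level: latest speed figure
  trend      = c(+4, -2, +1),        # Trend: change over the last three races
  jockey_win = c(0.22, 0.15, 0.18)   # Comparison: win rate with today's jockey
)
scale01 <- function(x) (x - min(x)) / (max(x) - min(x))   # rescale to [0, 1]
weights <- c(speed = 0.5, trend = 0.3, jockey_win = 0.2)  # subjective weights
horses$score <- weights["speed"]      * scale01(horses$speed) +
                weights["trend"]      * scale01(horses$trend) +
                weights["jockey_win"] * scale01(horses$jockey_win)
horses[order(-horses$score), c("name", "score")]          # ranked field
```

The weights are pure judgment, and that’s the point: the arithmetic is easy, but deciding what matters (and how much) is where the intuition lives.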

And what does this mean for organizational metrics? To me, it means that when I’m formulating and evaluating business metrics I should take a perspective that’s much more like handicapping a major horse race, because assessing performance is intricately tied to capabilities, context, the environment, and what’s likely to happen in the near future.

Analyzing Monthly Expenses with a Pareto Chart

This month, ASQ CEO Paul Borawski encourages us to share stories about “quality solutions in unexpected places.” This is such a fun question, because now I’ll be noticing these unexpected gems all month – and probably beyond!

Today’s gem comes from my former student Andy, who has heard me get excited about quality tools and continuous improvement – and the R statistical software – a LOT over the past few years! Even though he graduated in the spring of 2012, he’s still applying quality solutions to his own life – and this was a very unexpected place for me to find such a thing! I can’t hold back my own personal excitement for improvement and the pursuit of excellence, even as my standards for excellence evolve, and it’s so heartwarming to see how this has influenced Andy’s life.

A couple months ago, Andy posted about how he used a Pareto chart to explore his own monthly expenses, and brainstorm ways to improve his financial situation as a recent college graduate. Want to explore your own finances? Andy’s post can help you… and can also help you use R to produce nice charts and graphs to tell your story. Check it out!!
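If you want a feel for the technique before clicking through, here’s a minimal sketch of a Pareto chart in R using the qcc package; the expense categories and amounts below are made up (Andy’s post builds his own version with his real numbers):

```r
# install.packages("qcc")  # if not already installed
library(qcc)

# Hypothetical monthly expenses by category
expenses <- c(Rent = 900, Groceries = 320, Dining = 180,
              Transport = 120, Utilities = 110, Other = 95)

# Bars sorted by size, with a cumulative-percentage curve on top
pareto.chart(expenses, main = "Monthly Expenses")
```

The chart makes the classic 80/20 question visual: which two or three categories account for most of the spending, and are therefore worth improving first?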

Google Measures Energy to Conserve Energy

Why measure? Because measurement compels behavior. I’ve written about this previously in my article on the Trash Guy, but now Google is taking note:

“Studies show that being able to see your energy usage makes it easier to reduce it.”

This is the driver for their new Google PowerMeter project, which envisions a future where you can access energy informatics right from your desktop. The project, an initiative of Google.org (the philanthropic research arm of Google), provides this as their pitch:

“How much does it cost to leave your TV on all day? What about turning your air conditioning 1 degree cooler? Which uses more power every month — your fridge or your dishwasher? Is your household more or less energy efficient than similar homes in your neighborhood? … At Google we’re committed to helping enable a future where access to personal energy information helps everyone make smarter energy choices. To get started, we’re working on a tool called Google PowerMeter which will show consumers their electricity consumption in near real-time in a secure iGoogle Gadget. We think PowerMeter will offer more useful and actionable feedback than complicated monthly paper bills that provide little detail on consumption or how to save energy.”

I like it. I’ve always wanted a simple way to monitor my home energy usage that doesn’t require me to buy an expensive device like the Black & Decker EM100B Energy Saver Series Power Monitor, which probably doesn’t give me the granularity of information I’m looking for anyway.
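Google’s first question is easy to approximate on your own. Here’s a back-of-the-envelope version in R, with the TV’s power draw and the electricity rate both assumed:

```r
watts        <- 100    # assumed power draw of the TV
hours        <- 24     # left on all day
rate_per_kwh <- 0.12   # assumed electricity rate, USD per kWh

cost_per_day <- (watts / 1000) * hours * rate_per_kwh
cost_per_day           # about $0.29/day, or roughly $105/year
```

The point of PowerMeter is that you shouldn’t have to guess the inputs: near real-time measurement replaces the assumptions.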

Inspection, Abstraction and Shipping Containers

On my drive home tonight, a giant “Maersk Sealand” branded truck passed me on the highway. It got me thinking about the innovation of the shipping container, and how introducing a standard size and shape revolutionized the shipping industry and enabled a growing global economy. At least that’s the perspective presented by Marc Levinson in The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger. A synopsis of the story and a sample chapter are available; Wikipedia’s entry on containerization also presents a narrative describing the development and its impacts.

Here’s how impactlab.com describes it:

Indeed, it is hard to imagine how world trade could have grown so fast—quintupling in the last two decades—without the “intermodal shipping container,” to use the technical term. The invention of a standard-size steel box that can be easily moved from a truck to a ship to a railroad car, without ever passing through human hands, cut down on the work and vastly increased the speed of shipping. It represented an entirely new system, not just a new product. The dark side is that these steel containers are by definition black boxes, invisible to casual inspection, and the more of them authorities open for inspection, the more they undermine the smooth functioning of the system.

Although some people like to debate whether the introduction of the shipping container represented an incremental improvement or a breakthrough innovation, I’d like to point out an entirely different aspect of this story: a process improvement step yielded a plethora of benefits because the inspection step was eliminated. Inspection happened naturally the old way, without planning it explicitly; workers had to unpack all the boxes and crates from one truck and load them onto another truck, or a ship. It would be difficult to overlook a nuclear warhead or a few tons of pot.

To make the system work, the concept of what was being transported was abstracted away from the problem, making the shipping container a black box. If all parties are trustworthy and not using the system for a purpose other than what was intended, this is no problem. But once people start using the system for unintended purposes, everything changes.

This reflects what happens in software development as well: you code an application, abstracting away the complex aspects of the problem and attaching unit tests to those nuggets. You don’t have to inspect the code within the nuggets because either you’ve already fully tested them, or you don’t care – and either way, you don’t expect what’s in the nugget to change. Similarly, the shipping industry did not plan for the containers to be used to ship illegal cargo – that wasn’t one of the expectations of what could be within the black box. The lesson (to me)? The degree of abstraction within a system and the level of inspection of a system are related. When your expectations of what constitutes your components change, you need to revisit whether you need inspection (and how much).
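Here’s a tiny illustration of that idea in R, using a hypothetical shipping_cost() function: once it passes its tests, callers treat it as a sealed container and stop inspecting the internals, at least until the expectations about its inputs change.

```r
# Hypothetical "black box": a small, sealed, tested unit of logic
shipping_cost <- function(weight_kg, rate_per_kg = 2.5) {
  stopifnot(is.numeric(weight_kg), weight_kg >= 0)  # the box's contract
  weight_kg * rate_per_kg
}

# Unit tests seal the box; after they pass, no one re-opens it
stopifnot(
  shipping_cost(0) == 0,
  shipping_cost(10) == 25,
  shipping_cost(4, rate_per_kg = 3) == 12
)
```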

What is an Environmental Analysis?

An environmental analysis (or environmental assessment) is a decision-making tool, often applied in technology management to characterize the forces impacting an emerging technology or a new or existing product. The environmental analysis can help you determine the effects of a proposed project or policy, and proactively assess the impacts of a developing or emerging product or discipline. An environmental analysis also provides a really useful structure for learning about an area or theme that is new to you or your company, and for identifying what the “state of the art” is (e.g. petascale computing, nanotechnology, innovative composite materials).

To conduct an environmental analysis, you should investigate and outline:

  • CONTEXT. The technology of interest and the context in which it is (or will be) used
  • CHALLENGES. The challenges that are presently identifiable; what you know, and how it compares and contrasts with the unknowns
  • EXTERNAL ENVIRONMENT. How the competitive environment impacts the scenario. This can be done via SWOT analysis (strengths, weaknesses, opportunities, threats) and/or by examining Porter’s (1980) Five Forces (supplier power, barriers to entry, threat of substitutes, buyer power, degree of rivalry)
  • MACRO ENVIRONMENT. How broader themes influence and affect the scenario (e.g. via PEST analysis – political, economic, socio-cultural, technological impacts)
  • ALTERNATIVES. Alternatives to the scenario being evaluated, and the criteria (e.g. values, beliefs, project constraints, technical constraints) you might use to choose between competing alternatives in the future

Where can you get data for an environmental analysis? In addition to searching through resources from newspapers, magazines and trade journals, check the following:

Organization for Economic Cooperation and Development (OECD)

  • The OECD statistics portal contains international databases on agriculture, education, development, finance, labor, science and technology, energy, globalization, productivity, welfare, and transport
  • Their online library also contains environmental outlooks, news on economic policy reforms, and issues like work/life balance

World Economic Forum Global Competitiveness Report

  • The Growth Competitiveness Index (GCI) issued by the World Economic Forum is measured for more than a hundred countries every year, and routinely assesses four dimensions of global competitiveness: institutions, infrastructure, the macroeconomic environment, and health and education.
  • Because technology has the potential to impact productivity at many levels, and because it is embedded in each of these areas, the effects of technological change are implicit in macroeconomic measures of competitiveness.
  • You can learn more about the Global Competitiveness Report on Wikipedia
  • Or use the Analyzer to explore the data

National Science Foundation Solicitations for Research Proposals – The NSF solicitations are an excellent place to learn about the state of the art in various fields. The solicitations explain what topics are the most interesting to the experts today, and what they are willing to pay to know more about. Often, the solicitations will explain the most recent trends that may be difficult to ascertain from the industry and academic literature.

Google Tracks Spread of Flu

Is the flu spreading across your state? You can find out using Google Flu Trends, which projects the spread of influenza based on how people are using Google to search for health information. Check out the movie illustrating how search data appears to correlate with flu data from the Centers for Disease Control and Prevention (CDC).

The reason this interests me is that Google is using a tracer – examining where searches originate geographically to infer how disease might be spreading. They are not tracking diagnosis information or other “hard” data that would affirm the presence of disease; they are simply recognizing that people tend to be more interested in the flu when they’re trying to figure out whether they have it! (The most useful aspect of the search data is that it appears to serve as a leading indicator for the CDC data, which has a two-week lag.)
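One simple way to check whether one series really leads another is cross-correlation. Here’s a hedged sketch in base R with simulated data standing in for search volume and CDC counts; a peak correlation at a negative lag means the first series leads:

```r
set.seed(1)
flu    <- arima.sim(list(ar = 0.8), n = 52)  # simulated "true" flu activity
search <- flu + rnorm(52, sd = 0.3)          # searches track activity in real time
cdc    <- c(rep(NA, 2), head(flu, -2))       # reports arrive two weeks late

# Drop the NA weeks, then cross-correlate; expect a peak near lag -2
ccf(search[3:52], cdc[3:52], lag.max = 6,
    main = "Search volume vs. lagged CDC counts")
```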

Are any companies out there using patterns in the Google searches performed on their websites to infer what consumers or constituents are most interested in at any given time? It would be interesting to see what other “real” things Google search data can serve as a leading indicator for. I could see this as a useful technique for capturing the “voice of the customer” in a novel way.

Election 2008: Struggle Between Tradition and Innovation

Today is Monday, November 3rd. Election Day, when the U.S. picks its 44th President, is less than 24 hours away. And as of Saturday night, just 72 hours before the polls close, 27 MILLION early votes and absentee ballots had already been cast. This represents almost 13% of the total population that’s eligible to vote this year, and 22% of all the people who voted in 2004. (The numbers are from Michael McDonald’s dataset; he is an associate professor specializing in voting behavior. The VEP column in his table represents the total number of eligible voters over 18 who are not in prison, on probation, or on parole.)

Remember, long ago (or maybe more recently) in statistics class, when you learned that you could learn a lot about the properties of a population by taking a random sample? Having approximately 20% of the vote already in, from a turnout expected to be between 120 and 150 million, is extremely significant – remember, these are actual votes, not someone’s report of how they may or may not vote “for real”. Assuming that systematic errors have not played a large part in early voting behavior, the winner is already determined, and we just don’t know it yet.
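To put a number on that intuition: if (and it’s a big if) those 27 million early votes behaved like a simple random sample of the electorate, the sampling margin of error would be vanishingly small. A quick check in R:

```r
n <- 27e6                     # early votes already cast
p <- 0.5                      # worst case for the standard error
se  <- sqrt(p * (1 - p) / n)  # standard error of a sample proportion
moe <- 1.96 * se              # 95% margin of error
moe                           # ~0.0002, i.e. about 0.02 percentage points
```

With random sampling, the uncertainty would be negligible; the entire question is the systematic error, which is exactly the assumption flagged below.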

“We dance round in a ring and suppose, but the Secret sits in the middle and knows.” –Robert Frost

However, ignoring systematic error is indeed a significant assumption, one that’s discussed by Peter Norvig, Director of Research at Google, in his excellent explanation of the accuracy of polls. That is why the campaigns are rightly pushing EVERYONE to get out there and vote – to mitigate the impact of systematic errors. (After all, you don’t want to stop voting if the other side keeps voting.) So if you are reading this and you haven’t voted yet, DO IT! Go vote!

I see three potential scenarios:

  • Breakthrough: the decision has already been made, is accurately reflected in the actual sample of early votes, and the votes placed on Tuesday won’t change the pattern at all. The additional votes amount to nothing (other than beating down or insuring against systematic error).
  • Breakdown: a flood of voters overwhelms the capacity of the voting stations, the voting machines just can’t handle it, and the polls close before everyone can get through the door and get an error-free ballot submitted. I think there might be social unrest if this is the case.
  • Breakout: a single demographic (or two) comes out in droves to vote on Tuesday, breaking out of wherever they’ve been hiding, and shifting the balance of the race in a huge upset. Certainly a possibility.

Whatever happens, the 2008 Election reflects a mythical struggle between structure, order, hierarchy, stability, and tradition on one side, and revolution, dynamism, community, collaboration, and exploration on the other. One candidate clearly has more experience on one side of the coin, and the other is stronger on the opposite side; each has plenty of experience on the side he’s promoting. The difference will be how voters decide which standard a candidate’s experience should be measured against!

Why am I interested in all this? First, because polling is measurement, and quality assurance requires effective measurement. But more importantly, because the themes of this election parallel the struggle that many organizations face with quality and innovation – getting the job done reliably is paramount, and experience is important, but we cannot lose sight of the need to reinvent ourselves and our companies to continue being competitive. The wilder side, where structures are not sacrosanct and community is more productive than hierarchy, can be hard to swallow.

The old methods that tell us how to manage projects, do budgeting, evaluate employees, and manage change are incomplete in such a global, dynamic competitive environment. New organizational models that help us deal with complexity more effectively will be required, but will the 2008 Election usher one into the institution of government?

36 hours from now (hopefully), we’ll know.


Other Resources:

  • Peter Norvig, Director of Research at Google, keeps a 2008 Election site with the most comprehensive collection of data-based reports I’ve encountered
  • CNN’s early voting map shows how many early ballots were cast according to state and proportion of Democrats/Republicans voting
  • If the whole world could vote, according to the Economist, the “Global Electoral College” would be stacked.