That’s the way this process works. As a National Examiner, you will be frustrated, you may cry, and you may think your team of examiners will never come to consensus on the right words to say to the applicant! But because there is a structured process and a discipline, it always happens, and everyone learns.
I’ve been working with the Baldrige Excellence Framework (BEF) for almost 20 years. In the beginning, I used it as a template. Need to develop a Workforce Management Plan that’s solid and integrates well with leadership, governance, and operations? There’s a framework for that (Criterion 5). Need to beef up your strategic planning process so you do the right thing and get it done right? There’s a framework for that (Criterion 2).
Need to develop Standard Work in any area of your organization, and don’t know where to start (or want to make sure you’ve covered all the bases)? There’s a framework for that.
Once you become a National Examiner (my first year was 2009), you get to look at the Criteria Questions through a completely different lens. You start to see the rich layers of their structure. You begin to appreciate that this guidebook was carefully and iteratively crafted over three decades, drawing from the experiences of executives and senior leaders across a wide swath of industries, faced with both common and unique challenges.
The benefits to companies that are assessed for the award are clear and actionable, but helping others helps examiners, too. Yes, we put in a lot of volunteer hours on evenings and weekends (56 total, for me, this year), but I got to go deep with one more organization. I got to see how they think of themselves, how they designed their organization to meet their strategic goals, and how they act on that design. Our team of examiners discussed the strengths each of us noticed and the gaps that concerned us, and we worked together to reach consensus on the most useful and actionable recommendations for the applicant so they can advance to the next stage of quality maturity.
One of the things I learned this year was how well Baldrige complements other frameworks like ISO 9001 and lean. You may have a solid process in place for managing operations, leading continuous improvement events, and sustaining the improvements. You may have a robust strategic planning process, with clear connections between overall objectives and individual actions.
What Baldrige can help you do, even if you’re already a high-performance organization, is:
tighten the gaps
call out places where standard work should be defined
identify new breakthrough opportunities for improvement
help everyone in your workforce see and understand the connections between people, processes, and technologies
The whitespace, those connections and seams, is where the greatest opportunities for improvement and innovation are hiding. The Criteria Questions in the Baldrige Excellence Framework (BEF) can help you illuminate them.
I believe that the data scientist “unicorn” is hidden right in front of our faces; the purpose of this post is to help you find it. First, we’ll take a look at some models, and then I’ll present my version of what a data scientist is (and how this person can become “great”).
#1 Drew Conway’s popular “Data Science Venn Diagram” — created in 2010 — characterizes the data scientist as a person with some combination of skills and expertise in three categories (and preferably, depth in all of them): 1) Hacking, 2) Math and Statistics, and 3) Substantive Expertise (also called “domain knowledge”).
Later, he added that there was a critical missing element in the diagram: that effective storytelling with data is fundamental. The real value-add, he says, is being able to construct actionable knowledge that facilitates effective decision making. How to get the “actionable” part? Be able to communicate well with the people who have the responsibility and authority to act.
“To me, data plus math and statistics only gets you machine learning, which is great if that is what you are interested in, but not if you are doing data science. Science is about discovery and building knowledge, which requires some motivating questions about the world and hypotheses that can be brought to data and tested with statistical methods. On the flip-side, substantive expertise plus math and statistics knowledge is where most traditional researcher falls. Doctoral level researchers spend most of their time acquiring expertise in these areas, but very little time learning about technology. Part of this is the culture of academia, which does not reward researchers for understanding technology. That said, I have met many young academics and graduate students that are eager to bucking that tradition.” — Drew Conway, March 26, 2013
#2 In 2013, Harlan Harris (along with his two colleagues, Sean Patrick Murphy and Marck Vaisman) published a fantastic study where they surveyed approximately 250 professionals who self-identified with the “data science” label. Each person was asked to rank their proficiency in each of 22 skills (for example, Back-End Programming, Machine Learning, and Unstructured Data). Using clustering, they identified four distinct “personality types” among data scientists:
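Data Businesspeople, who are most focused on the information itself and how it is applied to business decisions (these people were least likely to identify with the “data scientist” label),
Data Creatives, the broad generalists who can take a project from raw data through analysis to a finished result,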
Data Developers, the wizards of the technical aspects of data management (accessing it, moving it around, archiving it, curating it), and
Data Researchers, those deeply familiar with the mathematical and statistical underpinnings of the work, who can develop new techniques as necessary (in addition to correctly selecting from available techniques).
As a manager, you might try to cut corners by hiring only Data Creatives (*). But then you wouldn’t benefit from the ultra-awareness that the theorists, the Data Researchers, provide. They can help you avoid inappropriate techniques, for example when your data violate the assumptions of a method. This is a big deal! You can generate completely bogus conclusions by using the wrong tool for the job. You also wouldn’t benefit from the stress relief that Data Developers provide to the rest of the data science team, or from the deep domain knowledge that Data Businesspeople can provide: the critical tacit and explicit knowledge that can save you from making a potentially disastrous decision.
“The data scientist’s skills – advanced analytics, data integration, software development, creativity, good communications skills and business acumen – often already exist in an organisation. Just not in a single person… likely to be spread over different roles, such as statisticians, bio-chemists, programmers, computer scientists and business analysts. And they’re easier to find and hire than data scientists.”
They cite British Airways as an exemplar:
“[British Airways] believes that data scientists are more effective and bring more value to the business when they work within teams. Innovation has usually been found to occur within team environments where there are multiple skills, rather than because someone working in isolation has a brilliant idea, as often portrayed in TV dramas.”
Their position is that you can’t get all those skills in one person, so don’t look for them. Just yesterday I realized that if I learn one new amazing thing in R every single day of my life, by the time I die I will probably be an expert in about 2% of the package (assuming it’s still around).
#4 Others have chimed in on this question and provided outlines of skill sets, such as:
The Udacity blog: basic tools (R, Python), software engineering, statistics, machine learning, multivariate calculus, linear algebra, data munging, data visualization and communication, and the ultimately nebulous “thinking like a data scientist”
IBM: “part analyst, part artist” skilled in “computer science and applications, modeling, statistics, analytics and math… [and] strong business acumen, coupled with the ability to communicate findings to both business and IT leaders in a way that can influence how an organization approaches a business challenge.”
SAS: “a new breed of analytical data expert who have the technical skills to solve complex problems – and the curiosity to explore what problems need to be solved. They’re part mathematician, part computer scientist and part trend-spotter.” (Doesn’t that sound exciting?)
DataJobs.Com: well, these guys just took Drew Conway’s Venn diagram and relabeled it.
#5 My Answer to “What is a Data Scientist?”: A data scientist is a sociotechnical boundary spanner who helps convert data and information into actionable knowledge.
Based on all of the perspectives above, I’d like to add that the data scientist must be aware of the context of the problems being solved: social, cultural, economic, political, and technological. Who are the stakeholders? What’s important to them? How are they likely to respond to the actions we take in response to the new knowledge data science brings our way? What’s best for everyone involved so that we can achieve sustainability and the effective use of our resources? And why the word “helps” in the definition above? It reflects my opinion that a single person can’t address the needs of a complex data science challenge. We need each other to be “great” at it.
A data scientist is someone who can effectively span the boundaries between
1) understanding social+ context,
2) correctly selecting and applying techniques from math and statistics,
3) leveraging hacking skills wherever necessary,
4) applying domain knowledge, and
5) creating compelling and actionable stories and connections that help decision-makers achieve their goals.
This person has a depth of knowledge and technical expertise in at least one of these five areas, and a high level of familiarity with each of the other areas (consistent with Harris’ T-model). They are able to work productively within a small team whose deep skills span all five areas.
It’s data-driven decision making embedded in a rich social, cultural, economic, political, and technological context… where the challenges may be complex, and the stakes (and ultimately, the benefits) may be high.
(*) Disclosure: I am a Data Creative!
(**) Quality professionals (like Six Sigma Black Belts) have been doing this for decades. How can we enhance, expand, and leverage our skills to address the growing need for data scientists?
As of today, I have a NEW FAVORITE introductory statistics textbook… the one I’ve always dreamed of having. I’ve been looking for a book to use in my classes for undergraduate sophomores and juniors, but none of the textbooks I considered over the past three years (and I’ve looked at over a hundred!) had all of the things I really, really wanted. So I had to go make it happen myself. These things are:
1) An integrated treatment of theory and practice. All of my stats textbooks have a lot of formulas, and no information about how to do what the formulas do in the R statistical software. All of my R textbooks have a lot of information about how to run the commands, but not really much information about what formulas are being used. I wanted a book that would show how to solve problems analytically (using the equations), and then show how they’re done in R. If there were discrepancies between the stats textbook answers and the R answers, I wanted to know why. A lot of times, the developers of R packages use very sophisticated adjustments and corrections, which I only became aware of because my analytical solutions didn’t match the R output. At first, I thought I was wrong. But later, I realized I was right, and R was right: we were just doing different things. I wanted my students to know what was going on under the hood, and have an awareness of exactly which methods R was using at every moment.
2) An easy way to develop research questions for observational studies and organize the presentation of results. We always do small research projects in my classes, and in my opinion, this is the best way for students to get a strong grasp of the fundamental statistical concepts. But they always have the same questions: Which statistical test should I use? How should I phrase my research question? What should I include in my report? I wanted a book that made developing statistical research questions easy. In fact, I know a lot of people I went to PhD school with who would have loved to have this book while they were proposing, conducting, and defending their dissertations.
3) A confidence interval cookbook. This is probably one of the most important things I want my students to remember when they leave my class: from whatever sample you collect, you can construct a confidence interval that gives you a range of plausible values for the true population parameter. You don’t even need to do a hypothesis test! But it can be difficult to remember which formula to use, so I wanted an easy reference where I could look things up and find out quickly how to use R to construct those confidence intervals for me. Furthermore, some of the confidence intervals that everyone is taught in an introductory statistics course are wildly inaccurate (their actual coverage can fall well short of the stated confidence level), and statisticians know this. But they hesitate to scare away novice data analysts with long, scary-looking equations, so students keep learning those inaccurate methods and believing they’re good. Since so many people never get beyond introductory statistics and still turn into researchers in other fields, I thought this was horrible. I want to make sure my students know the best way to do each confidence interval in their first class… even if the equations are not as friendly.
4) An inference test cookbook. I wanted a book that stepped me through each of the primary parametric inference tests analytically (using the equations), and then showed me how it was done in R. If there were discrepancies, I wanted to know why. I wanted an easy way to remember the assumptions for each test, and when to use a pooled standard deviation versus an unpooled one. There’s a lot to keep track of! I wanted a reference that would make it easy to keep track of all of it: assumptions, tests for assumptions, equations, R code, and diagnostic plots.
5) No step left behind. It’s really frustrating to me how so many R books assume you can do a psychic fill-in-the-blank for missing code. Since I’ve been using R for several years now, I’ve gotten to the point where my psychic abilities are pretty good, and at least 60% of the time I can figure out the missing pieces. But wow, what a waste of time! So I wanted a book that had all of the steps for each example. Even if it was a little repetitive. I may have missed this in a few places, but I think beginners will have a much easier time with this book. Also, I put all my data and functions on GitHub for people to run the examples with. I’m growing this slowly, but I don’t want people to be left in the lurch.
6) An easy way to produce any of the charts and graphs in the book. One of my pet peeves about R books is that the authors generate beautiful charts and graphs, and then you’re reading through the book and say “Yes!! Yes!! That’s the chart I need for my report… I want to do that… how did they do that?” and they don’t tell you anywhere how they did it. I did not want there to be any secrets in this book. If I generated a page of interesting looking simulated distributions, I wanted you to know how I did it (just in case you want to do it later).
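Point 1 above mentions hand calculations that don’t match R because R applies adjustments and corrections. One well-known case (offered here as an example; the post doesn’t say which adjustments were involved) is that for 2×2 tables, R’s `chisq.test()` applies Yates’ continuity correction by default (`correct = TRUE`). A textbook Pearson chi-square computed by hand therefore won’t match R’s default output. The Python sketch below computes both versions. The counts are hypothetical, and the helper names `expected_counts` and `chi_square_2x2` are made up for illustration.

```python
import math


def expected_counts(table):
    """Expected cell counts under independence for a 2x2 table given as two rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_2x2(table, yates=False):
    """Pearson chi-square statistic and its 1-df p-value for a 2x2 table.

    With yates=True, min(0.5, smallest |O - E|) is subtracted from every |O - E|
    before squaring (the capped Yates continuity correction).
    """
    expected = [e for row in expected_counts(table) for e in row]
    observed = [o for row in table for o in row]
    diffs = [abs(o - e) for o, e in zip(observed, expected)]
    correction = min(0.5, min(diffs)) if yates else 0.0
    stat = sum((d - correction) ** 2 / e for d, e in zip(diffs, expected))
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, made up for illustration only.
table = [[12, 5], [7, 15]]
pearson, p_plain = chi_square_2x2(table)
yates, p_yates = chi_square_2x2(table, yates=True)
print(f"textbook Pearson: X2 = {pearson:.4f}, p = {p_plain:.4f}")
print(f"Yates-corrected:  X2 = {yates:.4f}, p = {p_yates:.4f}")
```

In R, `chisq.test(matrix(c(12, 7, 5, 15), nrow = 2))` should report the corrected statistic, and adding `correct = FALSE` should give the uncorrected one. Neither answer is wrong. They are just different procedures.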
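Point 3’s claim about inaccurate textbook intervals is easy to check for one familiar case: the Wald interval for a proportion, p̂ ± z·√(p̂(1 − p̂)/n). This is our choice of example, since the post doesn’t name specific intervals. The sketch below compares the Wald interval with the Wilson score interval, one commonly recommended alternative. It computes each interval’s exact coverage, meaning the total binomial probability of the outcomes whose interval actually contains the true p. The values n = 20 and p = 0.1 are arbitrary assumptions, and the function names are hypothetical.

```python
import math

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def wald_interval(successes, n, z=Z95):
    """Textbook Wald interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


def wilson_interval(successes, n, z=Z95):
    """Wilson score interval (no continuity correction)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half, center + half


def coverage(interval_fn, n, p):
    """Exact coverage: sum of binomial probabilities of outcomes whose interval contains p."""
    total = 0.0
    for k in range(n + 1):
        lo, hi = interval_fn(k, n)
        if lo <= p <= hi:
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total


n, p = 20, 0.1  # arbitrary illustrative values
print(f"Wald coverage:   {coverage(wald_interval, n, p):.3f}")
print(f"Wilson coverage: {coverage(wilson_interval, n, p):.3f}")
```

With these settings, the Wald coverage should land well below the nominal 0.95, while the Wilson coverage stays close to it. Trying other values of `n` and `p` shows how erratic the Wald coverage can be. In R, `prop.test()` reports a score-based (Wilson-type) interval, with a continuity correction by default.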
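For point 4, the pooled-versus-unpooled choice can be sketched as follows. The pooled (Student) two-sample t statistic assumes the two populations share a common variance. Welch’s version drops that assumption and uses the Welch-Satterthwaite degrees of freedom. R’s `t.test()` uses Welch by default (`var.equal = FALSE`), which is another place where a hand calculation and R can legitimately disagree. The two samples are made-up numbers, and the function names are hypothetical.

```python
import math
from statistics import mean, variance  # variance() is the sample (n - 1) variance


def pooled_t(x, y):
    """Student's two-sample t statistic and df, assuming equal population variances."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (mean(x) - mean(y)) / se, nx + ny - 2


def welch_t(x, y):
    """Welch's t statistic with Welch-Satterthwaite df (no equal-variance assumption)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx**2 / (len(x) - 1) + vy**2 / (len(y) - 1))
    return t, df


# Hypothetical samples with visibly different spreads and sizes.
a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
b = [6.0, 3.9, 7.2, 4.4, 8.1, 5.5, 2.9, 6.8]
print("pooled: t = %.3f, df = %d" % pooled_t(a, b))
print("Welch:  t = %.3f, df = %.2f" % welch_t(a, b))
```

In R, `t.test(a, b)` should match the Welch line, and `t.test(a, b, var.equal = TRUE)` should match the pooled line. With equal sample sizes, the two t statistics coincide and only the degrees of freedom differ.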
GRANTED… I am sure it will not be perfect – no book is. (For example, Google Forms changes a lot and there are a couple examples that use it that will probably be outdated when the book gets to press… and I just found out this morning that you don’t need the source_https trick in R 3.2.0 and beyond.) [Note: data access has been fully updated in the 2nd Edition.] However, I will keep updating my blog with posts about useful things as they evolve.
In any case, I hope you enjoy my book as much as I’ve been enjoying using it as a reference for myself… it really is all my most important notes, neatly organized into just over 500 pages of everything I want to remember. And everything I want to make sure my students take with them after they leave my class.
Image Credit: Doug Buckley of http://hyperactive.to
Following in the footsteps of fellow ASQ Influential Voice John Hunter (who published Management Matters on LeanPub), I’ve had the intention for the past couple of years to write my next book using LeanPub, too.
There’s only one problem: LeanPub requires that you prepare and format your book in Markdown. I know Markdown is not that hard, but to move forward with it, I would have to find at least a couple of days without distractions to get my head into it and start flowing with that approach. With work, my kid’s school schedule, and my travel schedule, this has been darn near impossible.
It’s amazing how sometimes, just a tiny TINY little stir-of-the-consciousness can yield amazing insights.
That’s what just happened to me a few minutes ago. While scanning this morning’s Twitter feed, I saw this one:
It reminded me of an article I posted in early 2011 titled “Is Profit Waste?” where I posed the question of whether profit was just one of many kinds of waste – that is, overproduction of revenue. When companies talk about a desire to grow, usually they mean they need to figure out a way to grow their revenue stream (and often this means growing the organization, expanding the scope, or adding to product lines and service offerings). In fact, one of the strongest drivers for pushing innovation is that desire to grow.
But WHY? Why do you want to grow? It’s that question that the tweet above answered for me in less than 140 characters.
Here’s a company that’s not trying to sell you on the WHAT that they do. It’s a new company, so obviously they’re trying to get started, but they’re immediately clear about WHY they want to grow… they want to get more women into technology! And the clear outward sign of successful growth will be getting more women into technology. And oh – by the way – in order for us to pursue our PURPOSE of getting women into technology, we need to make some money, and to do this we’ve written our first app. And won’t you please buy it… because if you do, you can help us work to get more women into technology!
I love this. I think more of us should approach our business stories this way! Don’t focus on business growth or profit growth, focus on WHY you’re working in the first place and WHAT you want more revenue to spend on. If we support your mission, we’re more likely to support your product, even if it doesn’t meet all our needs. Furthermore, we’re more likely to want to work with you to enhance the products, expand your reach, and collaborate to achieve higher levels of quality and serendipitous innovation.
The other day I read a news article or blog post (or something; I can’t remember) that explained one reason we get irritating songs stuck in our heads. The post was based on a research paper by Williamson et al. (2011) in the journal Psychology of Music. Usually, when we catch one of these “earworms” because we’ve heard a snippet of a catchy and familiar song, we’ll walk away or turn off the song in the beginning or the middle of it.
The tune, however, like a rapid flesh-eating organism invading our very soul, continues without compunction. Because we stopped the song in the middle, our unconscious becomes fixed on the task of finishing it. And so it continues, on and on, all day!
The solution, we’re told, is to listen to the annoying song until it’s over… our unconscious, at that point, will be content that the tune is complete and will be happy to move on to other topics.
I didn’t think too much of this piece of trivia until I was reading an interview with Erik Larson, author of the fantastic 2003 book The Devil in the White City. His book provides an amazing account of the technology development and social context that went into organizing the 1893 World’s Fair in Chicago – it’s a totally satisfying read. When asked about his discipline for writing, and for avoiding writer’s block, he described a method that might actually leverage the same hold on the unconscious that earworms grab:
And I try to write a couple of pages. I’m not firm. I don’t have a specific goal. But the one thing I always adhere to is that I stop while I’m ahead. If I’m going to take that break for breakfast, I may stop in the middle of the sentence or the middle of the paragraph. Something I know how to finish. Because as any writer knows, it’s — that’s what kills you is when you just don’t know what to do when you come back. And all the demons accumulate. And then you go out for a cappuccino, that kind of thing.
If you want to avoid writer’s block, leave your unconscious a hook – an easy way back into your writing productivity!
If you want to avoid ramp-up time (or context switching time) to get your head back into a problem – which has been estimated, for software development at least, to be on average a full 15 minutes for every interruption – leave your unconscious an easy way back into productivity! A half-written module or subroutine… or a half-written sentence on your notepad!
These are just hypotheses, but they’re definitely testable. I’m going to try testing this out in my own life immediately.