Category Archives: Software Quality

How to Assess the Quality of a Chatbot

Image Credit: Doug Buckley of http://hyperactive.to

Quality is the “totality of characteristics of an entity that bear upon its ability to meet stated and implied needs.” (ISO 9001:2015, p.3.1.5) Quality assurance is the practice of assessing whether a particular product or service has the characteristics it needs to meet those stated and implied needs; through continuous improvement efforts, we use data to tell us whether we are adjusting those characteristics to meet the needs of our stakeholders more effectively.

But what if the entity is a chatbot?

In June 2017, we published a paper that explored that question. We mined the academic and industry literature to 1) determine what quality attributes others have used to assess chatbot quality, 2) organize those attributes according to efficiency, effectiveness, and satisfaction (using guidance from the ISO 9241 definition of usability), and 3) explore the utility of Saaty’s Analytic Hierarchy Process (AHP) to help organizations select between two or more versions of a chatbot based on quality considerations. (It’s sort of like A/B testing for chatbots; a small sketch of the idea appears after the excerpt below.)

“There are many ways for practitioners to apply the material in this article:

  • The quality attributes in Table 1 can be used as a checklist for a chatbot implementation team to make sure they have addressed key issues.
  • Two or more conversational systems can be compared by selecting the most significant quality attributes.
  • Systems can be compared at two points in time to see if quality has improved, which may be particularly useful for adaptive systems that learn as they are exposed to additional participants and topics.”
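
To make the comparison idea concrete, here is a minimal sketch of an AHP-style “A/B test” in R. The attribute set, the pairwise judgments, and the chatbot ratings below are invented for illustration – this is not the worked example from the paper.

    # Priority weights from a pairwise comparison matrix: the normalized
    # principal eigenvector (Saaty's eigenvector method)
    ahp_weights <- function(M) {
      w <- Re(eigen(M)$vectors[, 1])
      w / sum(w)
    }

    # Step 1: judge the relative importance of the quality attributes
    # (Saaty's scale: 1 = equally important, 3 = moderately more, 5 = strongly more)
    attrs <- c("effectiveness", "efficiency", "satisfaction")
    A <- matrix(c(1,   3,   5,
                  1/3, 1,   3,
                  1/5, 1/3, 1),
                nrow = 3, byrow = TRUE, dimnames = list(attrs, attrs))
    attr_w <- ahp_weights(A)

    # Step 2: rate each chatbot version on each attribute (hypothetical scores)
    scores <- rbind(chatbot_A = c(0.80, 0.60, 0.70),
                    chatbot_B = c(0.70, 0.90, 0.60))
    colnames(scores) <- attrs

    # Step 3: weighted overall scores drive the choice between versions
    scores %*% attr_w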

Data Quality is Key for Asset Management in Data Science

This post was motivated by two recent tweets by Dr. Diego Kuonen, Principal of Statoo Consulting in Switzerland (whom you should definitely follow if you don’t already – he’s one of the few other people in the world thinking about both data science and quality). First, he shared a slide show from CIO Insight with this clickbaity title, bound to capture the attention of any manager who cares about the bottom line (yeah, they’re unicorns):

“The Best Way to Use Data to Cut Costs? Delete It.”

I’m so happy this message is starting to enter corporate consciousness, because I lived it throughout the 2000s while working on data management for the National Radio Astronomy Observatory (NRAO). I published several papers during that time that present the following positions on this theme (the papers are listed at the bottom of this post):

  • First, storing data means you’ve saved it to physical media; archiving data implies that you are storing data over a longer (and possibly very long) time horizon.
  • Even though storage is cheap, don’t store (or archive) everything. Inventories have holding costs, and data warehouses are no different (even though those electrons are so, so tiny).
  • Archiving data that is of dubious quality is never advised. (It’s like piling your garage full of all those early drafts of every paper you’ve ever written… and having done this, I strongly recommend against it.)
  • Sometimes it can be hard to tell whether the raw data we’re collecting is fundamentally good or bad — but we have to try.
  • Data science provides fantastic techniques for learning what is meant by data quality, and then automating the classification process (a minimal sketch of this idea follows this list).
  • The intent of whoever collects the data is bound to be different from the intent of whoever uses the data in the future.
  • If we do not capture intent, we are significantly suppressing the potential that the data asset will have in the future.
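
As a minimal sketch of the classification idea above – with column names and thresholds invented purely for illustration, not taken from any NRAO system – the approach is to write down what “good enough to archive” means as explicit rules, and let those rules classify incoming records:

    # Hypothetical observation log with a made-up quality metric
    observations <- data.frame(
      id        = 1:5,
      rms_noise = c(0.8, 1.2, NA, 0.4, 9.7),  # invented quality metric
      duration  = c(300, 290, 310, 5, 305)    # seconds on source
    )

    # Encode "good enough to archive" as explicit, testable rules
    archivable <- with(observations,
                       !is.na(rms_noise) & rms_noise < 2.0 & duration >= 60)

    observations[archivable, ]   # worth the archive's holding cost
    observations[!archivable, ]  # flag for review (or repair) instead of archiving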

Although I hadn’t seen this when I was deeply enmeshed in the problem long ago, it totally warmed my heart when Diego followed up with this quote from Deming in 1942:

[image: quote from W. Edwards Deming, 1942]

 

In my opinion, the need for a dedicated focus on understanding what we mean by data quality (for our particular contexts) and then working to make sure we don’t load up our Big Data opportunities with Bad Data liabilities will be the difference between competitive and combustible in the future. Mind your data quality before your data science. It will also positively impact the sustainability of your data archive.

Papers where I talked about why NOT to archive all your data are here:

  1. Radziwill, N.M., 2006: Foundations for Quality Management of Scientific Data Products. Quality Management Journal, 13(2), April 2006, p. 7-21.
  2. Radziwill, N.M., 2006: Valuation, Policy and Software Strategy. Proc. SPIE, Orlando, FL, May 25-31, 2006.
  3. Radziwill, N.M. & DuPlain, R., 2005: A Framework for Telescope Data Quality Management. Proc. SPIE, Madrid, Spain, October 2-5, 2005.
  4. DuPlain, R.F. & Radziwill, N.M., 2006: Autonomous Quality Assurance and Troubleshooting. Proc. SPIE, Orlando, FL, May 25-31, 2006.
  5. DuPlain, R., Radziwill, N.M. & Shelton, A., 2007: A Rule-Based Data Quality Startup Using PyCLIPS. ADASS XVII, London, UK, September 2007.

 

Innovation Without Maintenance is (Potentially) Cyberterrorism

Image Credit: Lucy Glover Photography (http://lucyglover.com/)


On July 8th, Wall Street’s software failed (and the WSJ web site went down). United’s planes were grounded for two hours across the entire US. And this all happened only shortly after China’s stocks mysteriously plummeted. Odd coincidence, or carefully planned coordinated cyberattack? Bloggers say don’t worry… don’t panic. Probably not a big deal overall:

Heck, the whole United fleet was grounded last month too… NYSE is one stock exchange among many. The website of a newspaper isn’t important, and the Chinese stocks are volatile… we should not worry that this is a coordinated attack, especially of the dreaded “cyber-terrorist” kind…

The big problem we face isn’t coordinated cyber-terrorism; it’s that software sucks. Software sucks for many reasons, all of which go deep, are entangled with one another, and are expensive to fix. (Or: everything is broken, eventually.) This is a major headache, and a real worry as software eats more and more of the world.

In a large and complex system, something will ALWAYS be broken. Our job is to make sure we don’t let the wrong pieces get broken and stay broken… and we need to make sure our funding, our policies, and our quality systems reflect this priority.

Once upon a time in the early 2000s, I worked as a technology manager at a great big telescope called the GBT (not an acronym for “great big,” but for Green Bank… where it’s located in West Virginia).

It cost a lot to maintain and operate that telescope… nearly $10M every year. About 10-15% of this budget was spent on software development. Behind all great hardware and instrumentation, there’s great (or at least functional) software that helps you accomplish the goals that require the hardware in the first place. Even though we had to push forward on new capabilities to keep our telescope relevant to the scientists who used it to uncover new knowledge about the universe, we also had to keep maintaining the old software… or the whole telescope might malfunction.

It’s not popular to keep putting money into maintenance at the expense of funding innovation. But it’s necessary:

  • Without spending time and money to continuously firm up our legacy systems, we’re increasing the likelihood that they will crash (all on their own), producing devastating impacts (either individually or collectively).
  • Without spending time and money to continuously firm up our legacy systems, we’re also increasing the possibility that some rogue hacker (or completely legitimate domestic or foreign enemy) will be able to trigger some form of devastation that impacts the safety, security, or well-being of many people.

When we choose to support innovation at the expense of regular maintenance and continuous improvement, we’re terrorizing our future selves. Especially if our work involves maintaining software that connects or influences people and their devices. Isn’t that right, Amtrak?

Why I <3 R: My Not-So-Secret Valentine

My valentine is unique. It will not provide me with flowers, or chocolates, or a romantic dinner tonight, and will certainly not whisper sweet nothings into my good ear. And yet – I will feel no less loved. Instead, my valentine will probably give me some routines for identifying control limits on control charts, and maybe a way to classify time series. I’m really looking forward to spending some quality time today with this great positive force in my life that saves me so much time and makes me so productive.
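
For instance, one of the little love notes I’m hoping for looks something like this – a minimal sketch in base R, with made-up measurements, that computes control limits for an individuals (X) chart from the average moving range:

    # Made-up measurements, just to illustrate the calculation
    x  <- c(10.2, 9.8, 10.1, 10.4, 9.9, 10.0, 10.3, 9.7, 10.1, 10.2)
    mr <- abs(diff(x))               # moving ranges between consecutive points
    center <- mean(x)
    ucl <- center + 2.66 * mean(mr)  # 2.66 is the usual individuals-chart constant (3/d2, d2 = 1.128)
    lcl <- center - 2.66 * mean(mr)
    c(LCL = lcl, CL = center, UCL = ucl)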

Today, on Valentine’s Day, I am serenading one of the loves of my life – R. Technically, R is a statistical software package, but for me, it’s the nirvana of data analysis. I am not a hardcore geek programmer, you see. I don’t like to spend hours coding, admiring the elegance of the syntax and data structures, or finding more compact ways to get the job done. I just want to crack open my data and learn cool things about it, and the faster and more butter-like the better.

Here are a few of the reasons why I love R:

  • R did not play hard to get. The first time I downloaded R from http://www.r-project.org, it only took about 3 minutes, I was able to start playing with it immediately, and it actually worked without a giant installation struggle.
  • R is free. I didn’t have to pay to download it. I don’t have to pay its living expenses in the form of license fees, upgrade fees, or rental charges (like I did when I used SPSS). If I need more from R, I can probably download a new package, and get that too for free.
  • R blended into my living situation rather nicely, and if I decide to move, I’m confident that R will be happy in my new place. As a Windows user, I’m accustomed to having hellacious issues installing software, keeping it up to date, loading new packages, and so on. But R works well on Windows. And when I want to move to Linux, R works well there too. And on the days when I just want to get touchy feely with a Mac, R works well there too.
  • R gets a lot of exercise, so it’s always in pretty good shape. There is an enthusiastic global community of R users who number in the tens of thousands (and maybe more), and report issues to the people who develop and maintain the individual packages. It’s rare to run into an error with R, especially when you’re using a package that is very popular.
  • R is very social; in fact, it’s on Facebook. And if you friend R Bloggers, you’ll get updates about great things you can do with the software (some basic techniques, but some really advanced ones too). Most updates from R Bloggers come with working code.
  • Instead of just having ONE nice package, R has HUNDREDS of nice packages. And each performs a different and unique function, from graphics, to network analysis, to machine learning, to bioinformatics, to super hot-off-the-press algorithms that someone just developed and published. (I even learned how to use the “dtw” package over the weekend, which provides algorithms for time series clustering and classification using a technique called Dynamic Time Warping. Sounds cool, huh? There’s a short sketch after this list.) If you aren’t happy with one package, you can probably find a comparable package that someone else wrote that implements your desired functions in a different way.
  • (And if you aren’t satisfied by those packages, there’s always someone out there coding a new one.)
  • R helps me meditate. OK, so we can’t go to tai chi class together, but I do find it very easy to get into the flow (à la Csikszentmihalyi) when I’m using R.
  • R doesn’t argue with me for no reason. Most of the error messages actually make sense and mean something.
  • R always has time to spend with me. All I have to do is turn it on by double-clicking that nice R icon on my desktop. I don’t ever have to compete with other users or feel jealous of them. R never turns me down or says it’s got other stuff to do. R always makes me feel important and special, because it helps me accomplish great things that I would not be able to do on my own. R supports my personal and professional goals.
  • R has its own journal. Wow. Not only is it utilitarian and fun to be around, but it’s also got a great reputation and is recognized and honored as a solid citizen of the software community.
  • R always remembers me. I can save the image of my entire session with it and pick it up at a later time.
  • R will never leave me. (Well, I hope.)
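
Here’s roughly what that weekend flirtation with the dtw package looked like – a minimal sketch with toy data (not my real analysis), aligning a noisy sine wave against a shorter reference:

    # install.packages("dtw")   # one-time, and free (of course)
    library(dtw)

    # Toy series: a noisy sine and a shorter, smoother reference
    query     <- sin(seq(0, 2 * pi, length.out = 100)) + rnorm(100, sd = 0.05)
    reference <- sin(seq(0, 2 * pi, length.out = 80))

    alignment <- dtw(query, reference, keep.internals = TRUE)

    alignment$distance            # cumulative alignment cost (lower = more alike)
    alignment$normalizedDistance  # cost adjusted for series length
    plot(alignment, type = "threeway")   # visualize the warping path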

The most important reason I like R is that I just like spending time with it, learning more about it, and feeling our relationship deepen as it gently helps me analyze all my new data. (Seriously geeky – yeah, I know. At least I won’t be disappointed by the object of MY affection today : )

Maker’s Meeting, Manager’s Meeting

In July, Paul Graham posted an article called “Maker’s Schedule, Manager’s Schedule”. He points out that people who make things, like software engineers and writers, are on a completely different schedule from managers – and that imposing the manager’s schedule on developers carries a cost. Makers simply can’t be as productive on the manager’s schedule:

When you’re operating on the maker’s schedule, meetings are a disaster. A single meeting can blow a whole afternoon, by breaking it into two pieces each too small to do anything hard in. Plus you have to remember to go to the meeting. That’s no problem for someone on the manager’s schedule. There’s always something coming on the next hour; the only question is what. But when someone on the maker’s schedule has a meeting, they have to think about it.

For someone on the maker’s schedule, having a meeting is like throwing an exception. It doesn’t merely cause you to switch from one task to another; it changes the mode in which you work.

I find one meeting can sometimes affect a whole day. A meeting commonly blows at least half a day, by breaking up a morning or afternoon. But in addition there’s sometimes a cascading effect. If I know the afternoon is going to be broken up, I’m slightly less likely to start something ambitious in the morning.

This concept really resonated with us – we know about the costs of context switching, but this presented a nice model for how a developer’s day can be segmented so that there’s ample time for getting things done. As a result, we attempted to apply the concept to achieve more effective communication between technical staff and managers. And in at least one case, it worked extremely well.

Case: Ron DuPlain (@rduplain) and I frequently work together on technical projects. I am the manager; he is the developer. More often than we’d like, we run into problems communicating, but fortunately we are both always on the lookout for strategies to help us communicate better. We decided to apply the “makers vs. managers” concept to meetings, to see whether declaring up front whether a session would be a maker’s meeting or a manager’s meeting would improve our ability to communicate with one another.

And it did. We had a very effective maker’s meeting today, for example… explored some technical challenges, worked through a solution space, and talked about possible design options and background information. It was great. As a manager, I got to spend time thinking about a technical problem, but temporarily suspended my attachment to dates, milestones and artifacts. As a developer, Ron got the time and attention from me that he needed to explain his challenges, without the pressure of knowing that I was in a hurry and just needed the bottom line. As a result, Ron felt like I was able to understand the perspectives he was presenting more effectively, and get a better sense of the trade-offs he was exploring.

We had the opportunity to meet on the same terms, all because we declared the intent of our meeting up front in terms of “makers” and “managers”. Thanks Paul – this common language is proving to be a powerful concept for achieving a shared and immediate understanding of technical problems.

Disorganization and Changing Your Mind are Both Expensive

Ron DuPlain forwarded me an interesting post from November 2008 (via @duanegran, I believe) called “How much do websites cost?” It’s a great comprehensive overview of the different kinds of web sites that can be built – the spectrum of customization, interactivity, and intent that dictate whether a web site will cost $200 or $2 million. But what really struck me about this article was one tiny little section that talks about value, emphasizing the relationship between quality, waste, and the changeability of human wants, needs, and desires:

2.) A small company site that has 5 to 10 pages describing the products/services offered by the company. $500 to $2,000 depending on how prepared you are, and also on how clear in your own head you are about what you want. Disorganization and changing your mind are both expensive.

Disorganization is expensive because it blocks action. When your house is disorganized, you waste time and energy trying to find stuff. When the processes you use in the workplace are disorganized, time and physical energy are wasted on non-value-adding activities, and mental and emotional energy is wasted on unproductive communications. Wasting time and energy can generate short-term real costs (for example, moving parts or products around a factory or supply chain can delay time-to-market while costing more in fuel for transport), long-term real costs (e.g. reinforcing negative behaviors that lead to breakdowns in interpersonal relationships, teamwork, or morale), or opportunity costs.

Changing your mind is expensive for the same reason: it either blocks new action from taking place, or it eliminates the value that could have been added by prior work. A task is not actionable unless you have 1) the resources to do the job, 2) the information and interest to complete it, 3) the skills, capabilities, and clarity about what needs to be done to make it happen, and 4) an execution environment that supports getting things done. Changing your mind can erode #2 or #3.

To reduce the risk associated with development, and to control the costs of the project, find out:

  • How much do you really know about what you want?
  • What essential elements are you pretty sure you’ll still want or need in 3 months, 6 months, 1 year, 5 years?
  • What parts do you have the resources, information/interest, capabilities/skills/clarity, and execution environment to get done now? (I call this RICE to make it easier to remember)

The lesson: to get higher quality and lower costs (that is, greater value), focus on the parts of a project that are least likely to change, and do those first. This assumes, of course, that you have the luxury to be agile (highly regulated environments may impose restrictions or limits). Then stop – figure out what your experience working on those parts tells you about how you can approach the problem more systematically and effectively – and repeat the cycle until you iterate to the desired solution. This is the essence of applying organizational learning to your day-to-day tasks.

Achieving Quality when Something is Always Broken

In the quality profession, we are accustomed to thinking about product and component quality in terms of compliance (whether specifications are met), performance (e.g. whether requirements for reliability or availability are met), or other factors (like whether we are controlling for variation effectively, or “being lean,” which shows up in the costs to users and consumers). This morning I attended Ed Seidel’s keynote talk at TeraGrid 09, and was struck by one of his passing statements on the quality issues associated with a large supercomputer or grid of supercomputers.

He said (and I paraphrase, so this might be slightly off):

We are used to thinking about the reliability of one processor, or a small group of processors. But in some of these new facilities, there are hundreds of thousands of processors. Fault tolerance takes on a new meaning because there will be a failure somewhere in the system at all times.
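
A quick back-of-the-envelope calculation shows why (the availability figure below is my own made-up assumption, not Seidel’s): if each of N processors is independently “up” with probability p at any instant, the chance that something, somewhere, is broken is 1 - p^N, and that approaches certainty very quickly as N grows.

    p <- 0.9999                      # assume 99.99% per-processor availability
    N <- c(1, 100, 10000, 100000)    # system sizes
    data.frame(processors = N,
               p_some_failure = 1 - p^N)
    # roughly 0.0001, 0.01, 0.63, and 0.99995 -- with hundreds of thousands
    # of processors, a failure somewhere is all but certain at every moment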

Seidel’s observation immediately made me think of society: no matter how much “fault tolerance” a nation or society builds into its social systems and institutions, at the level of the individual there will always be someone, at any given time, who is dealing with a problem (in technical terms, “in a failure state”). Our programs that aim for quality on the scale of society should take this into account, and learn some lessons from how today’s researchers deal with fault tolerance in hugely complex technological systems.

It also makes me wonder whether there is any potential in exploring the idea of quality holography. In large-scale systems built of closely related components, is the quality of the whole system embodied in the quality of each individual part? And is there a way to measure or assess this or otherwise relate these two concepts operationally? Food for thought.
