Root Cause Analysis Saves Time and Money
Problems have a nasty way of coming back to haunt you. You think you have solved them, and, oops! they show up again. Problems in business are like acne on a teenager's face, I guess, or dandelions in a garden.
But if you dig all the way down and pull out a dandelion, root and all, it doesn't grow back. So, if we can find the deep root of a problem, we can solve it once and for all. And, usually, the deeper solution actually costs less than the surface remedies we tried first.
So, the first step in creating inexpensive, permanent solutions is to find the deep cause, or root cause of the problem. And the Five Whys - asking why? five times - is the easiest way to do it!
What Root Cause Analysis Can Do
Root cause analysis can solve the most mysterious and stickiest problems. If we don’t know why a problem is happening, we use root cause analysis to solve the mystery. If we thought we fixed a problem, and it keeps coming back, it’s a sticky problem. We use root cause analysis to discover the real, deepest (root) cause, and take care of that. Then we get the problem unstuck, and it doesn’t come back.
Root Cause Analysis Works in Two Directions
Root cause analysis can be used in two different ways. In engineering, it is used to get to the root of one particular problem. Knowing the root cause, we can eliminate the source of the defects once, and permanently. Usually, this is also the lowest-cost solution. Preventing a problem from ever occurring is usually less costly than letting it happen, and then dealing with it after the fact.
Root cause analysis can also be used at a management level to eliminate a problem and also prevent many other problems of a similar nature. Our case study will show how to use a root cause analysis both ways.
When we use a root cause analysis to both eliminate a problem and also eliminate many similar problems, we see a huge improvement in quality and value at very low cost. But, as you’ll see, many companies are not willing to use root cause analysis at a management level.
Good engineering requires very precise use of language. Even everyday words like "why" and "defect" and "error" are given precise meanings. Here are the terms you need to know.
Problem: something going wrong that creates bad products or delays or waste, costing us money.
Process: a series of steps of work, such as happen on an assembly line, or when processing an insurance claim.
Product, Result, Output, Deliverable: The result of a process or a step in a process. Often, the output of one process is the input to the next process. And the final output is the product for the customer.
Defect: A feature of a product that does not work, or that does not meet requirements or customer specifications.
Error: A flaw in a process that leads to, or can lead to, a defect.
Variability or Variation: A difference in process or result that may or may not be an error and may or may not lead to a defect. If it is not an error and does not lead to a defect, then it is acceptable variation.
Engineering: The technical work of defining and measuring processes to create products.
Quality Engineering: The technical work of preventing, eliminating, or reducing errors to an acceptable level so that the resulting products are either zero-defect, or have an acceptably low defect rate.
Cause (of an error): a difference in process that leads to a significant difference (an error), which in turn leads to a defect.
Why: What difference (in process) led to a significant difference (an error) that led to a defect?
The Five Whys: A Method for Root Cause Analysis
Root cause analysis begins by looking at a defect, that is, the failed result of a process. We then begin looking for the error, that is, the step of the process done in such a way that it produces a defect.
The key is that the defective result is different from the result where the product works. So, the question “why?” really means, precisely, “What is the difference in process that leads to a difference in result?” Or, “What error in process leads to a defective result?”
In examining a process, we look at inputs, work process, outputs, tools, techniques, resources, and the work environment. We know that, some of the time, we have a defective output. On those occasions, there must be some difference in the work process, or the inputs, or the techniques being used, or the tools, resources, or work environment. Once we have a difference that correlates with the production of the defect, we have a possible cause.
But why did that error occur? To find that out, we ask “why?” again, repeating the process. At some point, we come to a simple, obvious factor that can easily be managed and changed. That is usually at the point where we have asked “why?” five times. Sometimes we need fewer than five repetitions of the question “why?” Very rarely, we need more. So we call this technique of repeated investigation into causes the Five Whys.
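Here is a minimal sketch, in Python, of the search for a difference that correlates with the defect. Everything in it is an assumption made up for illustration: the record format, the factor names (tool, shift), the data, and the function names defect_rates and candidate_causes. It compares the defect rate for each value of each process factor and ranks the factors by how much that rate varies. A large spread only marks a possible cause to ask "why?" about; it proves nothing by itself.

```python
from collections import defaultdict

def defect_rates(records, factor):
    """Return {value: defect rate} for one process factor."""
    totals = defaultdict(int)
    defects = defaultdict(int)
    for rec in records:
        value = rec[factor]
        totals[value] += 1
        if rec["defective"]:
            defects[value] += 1
    return {value: defects[value] / totals[value] for value in totals}

def candidate_causes(records, factors):
    """Rank factors by the spread between their highest and lowest defect rates."""
    spreads = {}
    for factor in factors:
        rates = defect_rates(records, factor)
        spreads[factor] = max(rates.values()) - min(rates.values())
    return sorted(spreads.items(), key=lambda item: item[1], reverse=True)

# Made-up inspection records, for illustration only.
records = [
    {"tool": "A", "shift": "day", "defective": False},
    {"tool": "A", "shift": "night", "defective": False},
    {"tool": "B", "shift": "day", "defective": True},
    {"tool": "B", "shift": "night", "defective": True},
    {"tool": "A", "shift": "day", "defective": False},
    {"tool": "B", "shift": "night", "defective": False},
]

ranked = candidate_causes(records, ["tool", "shift"])
```

With these made-up records, the tool ranks ahead of the shift: every defect came from tool B, while both shifts had the same defect rate. So the next "why?" would be about tool B.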
Our Case Study: Cars on Fire
A colleague of mine and quality engineer, Jim Sorensen, was called in on a special engineering project that illustrates Root Cause Analysis. There was an unusual problem on an automobile assembly line: At random, cars were catching on fire. At a certain stage, some of the cars, the ones getting a clear coat, are driven into an oven to heat and fuse the paint coating on the body of the car. The baking is done in a paint oven with infrared radiation, and takes about eight minutes. There were three identical ovens working right next to each other. Some of the cars burst into flames in each oven. But the pattern appeared completely random: A few cars burned up in each oven, and no one could see what made those cars different from any other cars going into the oven, nor could they find a difference in the way the oven was working.
Jim was called in to find the problem and propose a solution.
Jim spent three weeks sitting on a stool, watching cars go into the oven. From this, he confirmed what others had seen: There was nothing different at the oven, or visible in the car, that explained why a few cars caught fire, and most did not. The only clue was that all the fires started at the same location, under the hood, on the engine.
Jim began to dig deeper. He traced the assembly line process backwards.
One of the stations did the work of assembling the layers of the hood. The inner, or bottom, layer of the hood was in place. A worker placed a thermal blanket on top of that layer. Then the top of the hood was fastened down over the thermal blanket. Jim noticed that every tenth blanket had a big orange X on it. He wondered why.
These thermal blankets were designed to radiate heat from the engine up into the air, and also to reflect the heat of sunlight away from the engine. So they only worked right if they were installed right-side up.
Jim traced the blankets back to the loading dock where they came from the sub-contractor who manufactured them. He found them in crates, with a note, “Every tenth blanket turned upside-down for counting purposes.” The back of every blanket was painted with a big orange X to say, “This Side Down.” But every tenth blanket was upside-down. And that was why Jim saw a big orange X on the thermal blanket going under the hood. Things began to make sense. A reversed thermal blanket would focus the heat of the oven downward, onto the engine. With more heat, the engine might catch fire.
Jim traced the blankets back to the sub-contractor. He found that the sub-contractor was delivering the blankets to specification: There was nothing in the contract that said that the blankets had to all be turned right-side up, only that the bottom side needed to have a big orange X on it, which it did.
So, here are the Five Whys in Jim’s process:
- Why are these cars catching fire, and not others? Answer: it’s not visible at the oven where the process occurs.
- Why is it not visible? It must be under the hood.
- Why is there a problem? What is under the hood, and how is it different? There is a thermal blanket that is upside down about one time in ten. This focuses the heat of the oven on the engine, sometimes causing a fire.
- Why is the blanket installed upside-down? Because it arrives upside-down from the manufacturer.
- Why doesn’t the installer turn it right-side up? No one told him that the blanket needed to be placed with the big orange X facing down.
This leads to a rather obvious solution: Tell the installer of the blanket to make sure that the side of the blanket with the big orange X is facing down, on the bottom side of the hood, towards the engine.
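It can help to write the chain down as we go. Here is one way to record Jim's five questions, as a small Python sketch. The Why record and the root_cause helper are names invented for this illustration, and the answers are paraphrased from the list above.

```python
from dataclasses import dataclass

@dataclass
class Why:
    question: str
    answer: str

def root_cause(chain):
    """Treat the answer to the last 'why' asked as the working root cause."""
    if not chain:
        raise ValueError("ask at least one why")
    return chain[-1].answer

chain = [
    Why("Why are these cars catching fire, and not others?",
        "The difference is not visible at the oven."),
    Why("Why is it not visible?",
        "It must be under the hood."),
    Why("What is under the hood, and how is it different?",
        "A thermal blanket is upside down about one time in ten."),
    Why("Why is the blanket installed upside-down?",
        "It arrives upside-down from the manufacturer."),
    Why("Why doesn't the installer turn it right-side up?",
        "No one told him the orange X must face down."),
]

print(root_cause(chain))
```

Each answer becomes the subject of the next question, and the last answer is the one we can act on.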
Asking the Right "Why"
"Why" may seem like a simple word, but it is not.
There are many ways of asking "why?" that give no useful answer.
- "Why me?" is victim thinking.
- "Why does God let this happen?" is a question about justice. It belongs to religion, not engineering.
- "Why do things always go wrong?" is too vague and general to be useful.
- "Why am I doing this?" can be a useful question if it means, "How does this benefit me or my customers?" But that is part of executive management, not quality management.
The "why" in quality management is very specific. What variation in process made "a difference that made a difference" (to borrow Gregory Bateson's definition of information)? What variation in process produced a result that led to a defect? If we can answer this question, we can change the process, prevent the error, and eliminate the defect.
More About the Solution
Actually, the solution is not as easy as it seems. Telling the installer is not enough. The written Standard Operating Procedure for that job must be re-written. Otherwise, the problem will recur if a new employee takes on that particular job.
There is another solution, as well. We could re-write the contract with the sub-contractor of the blankets, requiring him to deliver all blankets right-side up. He would have to come up with a different counting method, but that’s his problem.
In fact, it would probably be best to put both solutions in place. This does all we can do to reduce the chances of the problem recurring. And problems like this do tend to come back and haunt us.
The Mystery That Remains
If the reversed blanket was the only cause of the fires, then one car in every ten would have caught fire. But that didn’t happen. Why not?
The answer turned out to be that the increased heat from the reversed blanket heated up the engine. But it only heated it up enough to cause a fire if some other difference was also present.
A deeper look showed that, on top of the engine, there is a wiring harness, a bundle of electrical wires serving different features of the car. Depending on different added features, different harnesses were used. Only the harness that went right on top of the engine, closest to the hood, was in danger of catching fire. And some of those harnesses only had wire melt problems, not a visible fire. In fact, the manufacturer hunted down those cars and replaced the wiring before the cars were sold.
So, the only cars that caught fire had:
- A clear coat (so that they went into the oven)
- A particular wire harness (with wires close to the hood that might catch fire)
- An invisible upside-down thermal blanket sealed inside the hood
The wire harness and clear coat were standard options. Only the upside-down thermal blanket was a defect. So preventing that one error - upside-down installation of the thermal blanket - prevented cars from catching fire or having melted wires.
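The logic here can be sketched in a few lines of Python. The function name at_risk and the variable names are invented for this illustration. The two option shares are assumptions (the case study does not give them), and the rough estimate also assumes the three conditions are independent of one another.

```python
from itertools import product

def at_risk(clear_coat, top_harness, blanket_upside_down):
    """A car is at risk only when all three conditions are present."""
    return clear_coat and top_harness and blanket_upside_down

# Every combination of the three yes/no conditions (8 in all).
combos = list(product([False, True], repeat=3))
risky = [c for c in combos if at_risk(*c)]

# After the fix, no blanket is installed upside-down.
risky_after_fix = [c for c in combos if at_risk(c[0], c[1], False)]

# Rough share of cars at risk before the fix.
blanket_share = 1 / 10     # from the crates: every tenth blanket reversed
clear_coat_share = 0.5     # ASSUMPTION, not from the case study
top_harness_share = 0.2    # ASSUMPTION, not from the case study
at_risk_share = blanket_share * clear_coat_share * top_harness_share
```

Only one of the eight combinations is at risk, and none is at risk once every blanket goes in right-side up. With the assumed shares, roughly one car in a hundred would be at risk, far fewer than one in ten. That is how a regular one-in-ten error can look random at the oven.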
The End of Our Story
Jim brought the results of his analysis back to the Operations Manager who had hired him. The Operations Manager said, “Whose fault is this?”
The answer was quite clear, though Jim didn’t say it. The answer was, “Your fault, sir.” Only the Operations Manager can know enough about the flow of all operations to make sure that the output of one process is a correct input for the next process.
Note that the manager asked the wrong question. He did not ask, “Why?” He asked, “Who? Whose fault is this?”
Now, we will go on an imaginary journey. What if the manager had asked the right question: Why do problems like this occur in my factory? That would begin a management root cause analysis.
The Management Five Whys
Here we go:
- Why did we have random fires breaking out with no discernible cause? Because we did not know the details of our own process.
- Why did we not know the details of our own process? Because jobs were not defined with enough precision, and workers were not empowered: they were not told why they were doing what they were doing, and they were not told fully and correctly how to do it.
- Why did that happen? Because we made assumptions, instead of carefully documenting procedures at an engineering level in a clear, aware way.
- Why did we make assumptions? Because we are operating in a blame environment, where the most senior person asks, “Whose fault is it?”
- Why is the senior manager thinking in terms of blame? Because the corporate culture is focused on blame, and not on genuine understanding. With genuine understanding as a goal, we could cooperate as a team. Each worker could know how his or her job is supposed to be done, and be sure to do it right. We would have transparent processes and see why defects occur much more easily. We would be empowered to raise questions about variability in process early, instead of just doing our jobs in a mechanical way.
At a management level, the Five Whys consistently demonstrate that well-defined, repeatable processes are the beginning of effective operations that deliver high-quality results at low cost.
We take this further by working as a team to create continuous improvement. Whether we see this as implementing Six Sigma across the organization or growing in capability through the Capability Maturity Model (CMM), either way, we are improving operations by integrating the wisdom of what was originally called Total Quality Management (TQM) into our operations management.
Sid Kemp (author) from Boca Raton, Florida (near Miami and Palm Beach) on September 26, 2012:
Hi Cygnet! Well put! The "no blame environment" is the central feature of all my consulting work.
Cygnet Brown from Springfield, Missouri on September 26, 2012:
Well written, detailed article! I especially like the last paragraph where you talk about placing blame. The biggest problem with the idea of placing blame is that usually "the you know what" rolls downhill and the little guy who put the blanket in wrong gets "the blame". Ideally what should happen is that everyone from the "little guy at the bottom" to the president of the company "takes responsibility" and, as you suggest, instead of blame rolling downhill, responsibility goes up the chain.