Defect Elimination 101

Robert Latino | President, Prelical Solutions

In recent years, “Defect Elimination” has become a hot topic in the reliability world. But what exactly does defect elimination mean? How does it differ from other maintenance practices? Is it more than just a new way to describe planned and predictive maintenance? This article will seek to answer these questions in a very practical manner.


Let’s start by answering a more basic question. What is a defect? Merriam-Webster’s online dictionary defines a defect as:

“an imperfection that impairs worth or utility.”

The simple definition of a defect that The Manufacturing Game® (TMG) workshops have used for the past 25 years is:

“Anything that erodes value, reduces production, compromises health, safety or environmental performance or creates waste”.

Or in the words of the fictional character, Chance Brooks, plant manager turned corporate manager in the book Don’t Just Fix It, Improve It, “…I came to regard defects as my real enemy. I always thought of them as the little imperfections that caused all of our problems and upsets. Some were big and some small, but when they lined up in just the right way… kaboom!!! A catastrophe would hit.”


A lot of time and attention has been paid to breaking the reactive maintenance cycle –

1.   equipment fails

2.   operations calls maintenance

3.   maintenance repairs the equipment

4.   maintenance turns it back over to operations

5.   rinse and repeat

Many organizations now perform time-based Preventive Maintenance tasks, use predictive technologies, and perform operator rounds, all to find defects. Then effective planning (the ‘what, why and how’ of the job) and scheduling (the ‘when and by whom’ of the job) are employed to remove those defects prior to the functional failure or, perish the thought, the catastrophic failure of the equipment. Those are all characteristics of planned maintenance – simply put:

“finding defects and efficiently removing them before they cause a failure.”

“A much more efficient and cost-effective approach than fixing broken stuff!”

But what can be done to prevent defects from getting into the equipment in the first place? Like death and taxes, having some defects is a certainty – normal wear and tear happens. But most organizations have far more defects than can be ascribed to normal wear and tear. There is no doubt that “extra” defects sneak in based on how we operate and maintain the equipment. It is these defects that are NOT inevitable. And if we prevent them from ever getting into the equipment, we can avoid all that work required to detect and remove them.


Let us take a practical field example and analyze it to uncover the defects hidden in plain sight.

This case will reveal an issue with ‘Misalignment’, which is one of those “extra” defects that can be avoided through proper repair and installation techniques. Both of these are made significantly easier and more effective with the use of laser alignment tools. As this case will demonstrate, properly aligned equipment has less machine vibration, fewer bearing and coupling failures, and may even consume less energy.

In Figure 1, the EVENT is the reason we care enough to commission an RCA (Root Cause Analysis). In our example, the event is ‘Unexpected Downtime Due to Pump-235 Failure’. The MODES are how we have experienced such failures in the recent past. Most CMMSs can produce these high-level modes. In this case, much of the downtime due to this pump failure has been attributed primarily to the critical bearings, and we know that bearing failures on this pump account for the highest annual costs.


Figure 1: Event + Modes = Top Box [All must be FACTS]
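As a rough illustration of how the modes in the top box might be pulled from CMMS history, the Python sketch below totals repair cost per failure mode for one asset and ranks the modes from most to least costly. The record layout, the `rank_modes` helper, and every cost and downtime figure are hypothetical and invented purely for illustration; a real CMMS export will look different.

```python
from collections import defaultdict

# Hypothetical work-order records: (asset, failure mode, repair cost, downtime hours).
# All values are invented for illustration only.
work_orders = [
    ("Pump-235", "Bearing", 12000.0, 16.0),
    ("Pump-235", "Seal", 4000.0, 6.0),
    ("Pump-235", "Bearing", 9000.0, 12.0),
    ("Pump-235", "Motor", 3000.0, 4.0),
    ("Pump-235", "Bearing", 11000.0, 14.0),
]


def rank_modes(records, asset):
    """Total repair cost per failure mode for one asset, highest cost first."""
    totals = defaultdict(float)
    for rec_asset, mode, cost, _hours in records:
        if rec_asset == asset:
            totals[mode] += cost
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


modes = rank_modes(work_orders, "Pump-235")
for mode, cost in modes:
    print(f"{mode}: {cost:,.0f}")
```

With these made-up records, the bearing mode ranks first, which mirrors the situation described for Pump-235. The same ranking could be done on downtime hours instead of cost.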

The Mode level is where we consider our FACT LINE to start. If we start with facts, and support our hypotheses with sound validations, we will end with facts. Keep in mind we are traveling down the path of the physics of the failure, so we will continually ask the same question, ‘How Could’.

As you use a logic tree to explore the physics of failure, imagine you have the luxury of a video recorder in your head and you are watching the event as it is played in reverse. In our case, ‘be the bearing’. Ask yourself, ‘How could I have just failed?’. Move back in short increments of time. This type of thinking takes some getting used to, but that is the beauty of the logic tree: it guides us without any biases. This tool, when used properly, should be non-personal and non-threatening. We are interested in valid hypotheses, possibilities…that’s it. Then we will use evidence to demonstrate which hypotheses were true and which were not. We will only continue drilling down on those that were true.

In our case, based on the SMEs (Subject Matter Experts) on our team, we conclude there are only four (4) ways in which a component can fail: Erosion, Corrosion, Fatigue and Overload. So, we list them as shown in Figure 2.


Figure 2: Hypothesizing and Validation

In our example, we have our on-staff metallurgist (or resident materials guru) visually inspect the failed bearing. They determine with certainty from a visual review (although lab reviews are always more conclusive) that the bearing failed due to Fatigue. No additional exhaustive testing, such as scanning electron microscopy (SEM), was deemed necessary. This makes the other hypotheses NOT TRUE.

The same questioning continues: ‘How could we have fatigue of the failed bearing?’. The SMEs indicate it could be from either Thermal or Mechanical Fatigue. The metallurgist (or guru) confirms Mechanical Fatigue.

Our team now asks, ‘How can we have Mechanical Fatigue?’ The prevailing opinion is a sole hypothesis of High Vibration. A review of our PM histories demonstrates this hypothesis to be true.

Questions only beget more questions, as that’s what effective RCA analysts do for a living; they ask the right questions. So, ‘How could we have had High Vibration?’. Our RCA team members collectively come up with: Resonance, Misalignment, Imbalance and Looseness. Evidence is pooled together to validate these hypotheses, and the team determines that only Misalignment is valid. The journey continues!


Figure 3: Continued Hypothesizing and Root Labeling
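The hypothesize-and-validate pattern used so far can be sketched as a small tree structure in Python. This is only one possible representation, not the authors' tool: the `Node` class, its field names, and the `true_path` helper are assumptions made for illustration, and the evidence strings are shorthand for the validations described in the case. Each hypothesis is marked true or not true, and the drill-down only follows the branches that evidence confirmed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One block on the logic tree (class and field names are illustrative)."""
    label: str
    verified: Optional[bool] = None  # None = untested, True/False = what the evidence showed
    evidence: str = ""
    children: List["Node"] = field(default_factory=list)

    def add(self, *nodes: "Node") -> "Node":
        """Attach hypotheses beneath this node and return the node for chaining."""
        self.children.extend(nodes)
        return self


def true_path(node: Node) -> List[str]:
    """Collect labels depth-first, descending only into hypotheses verified as true."""
    path = [node.label]
    for child in node.children:
        if child.verified:
            path.extend(true_path(child))
    return path


# Tree for the bearing mode, using the hypotheses and outcomes from the case.
high_vibration = Node("High Vibration", True, "PM histories").add(
    Node("Resonance", False),
    Node("Misalignment", True, "pooled team evidence"),
    Node("Imbalance", False),
    Node("Looseness", False),
)
mechanical = Node("Mechanical Fatigue", True, "metallurgist review").add(high_vibration)
fatigue = Node("Fatigue", True, "visual inspection").add(
    Node("Thermal Fatigue", False),
    mechanical,
)
bearing = Node("Bearing Failure", True, "CMMS history").add(
    Node("Erosion", False),
    Node("Corrosion", False),
    fatigue,
    Node("Overload", False),
)

print(" -> ".join(true_path(bearing)))
# prints: Bearing Failure -> Fatigue -> Mechanical Fatigue -> High Vibration -> Misalignment
```

Hypotheses that were disproven stay on the tree with `verified=False`, so the record still shows what was considered and ruled out.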

How could we have ended up with misalignment? This is where we cross over from the physics of failure to the human and systems side of failure. Either someone misaligned the pump during initial installation or repair, OR it was aligned correctly and then became misaligned in operation. Vibration histories demonstrate that since the last installation, this pump has chronically had vibration issues. Notice here that we switched the label on the High Vibration node from a Hypothesis to a Physical Root. This is because it is the first visible consequence after the triggering decision.

Notice that after the decision point, everything is triggered on its own, as cause-and-effect linkages go into play. If there are no human interventions to break the error chain, then it will play out and contribute to the undesirable outcome (Event).

We are at a pivotal point in our Logic Tree at this time. Why? Because we have uncovered a decision point. The mechanic in our case chose to align the way they did, on that day. A ‘decision’ point is our cue to identify a human root, and to switch our questioning to ‘Why’ instead of ‘How Could’. We are not interested in learning of the infinite reasons the human ‘could have’ made the decision; we are interested in ‘why’ they did.

So, let’s drill down further and see if we can figure out what was going on in the mechanic’s mind that day!


Figure 4: Continued Root Labeling

So, in Figure 4, after talking with our fellow mechanic, we uncover many things that we did not know.

1. LTA Procedure: The current alignment procedures are less than adequate (LTA) because they are obsolete. They were not updated when new technologies were introduced to the operation.

2. LTA Tools: The alignment tools provided were less than adequate and represented old technologies.

3. LTA Skill Sets: The senior mechanic retired, and part of his duties was conveyed to the junior mechanic, who did not have the training and certifications to align properly. They did the best they could with the hand they were dealt.

Chances are, all of these ‘Systemic Roots’ have contributed to other failures as well, individually or in combination. This is because most systems are put in place for a multitude of people to use, under a variety of conditions. This particular combination of system flaws converged on this day to influence the well-intended mechanic’s decision.


Finally, let’s review the common defect categories, so it’s easier to track and trend the results of the overall defect elimination strategy across a company.

Figure 5 shows these common categories and provides examples within each.


Figure 5: Sources of Defects

Let’s refresh our memories on the SYSTEMIC CAUSES. We find that these defects are all related to ‘Workmanship’. This is an indicator that the ‘Workmanship’ bucket may warrant some deeper review in general. As a suggestion, if it was happening to this technician in this case, we should check the skill level of others who may be victims of their own systems as well. This is when we determine whether the recommended corrective action is isolated to a single case or more universal.

1. LTA Procedure: Workmanship

2. LTA Tools: Workmanship

3. LTA Skill Sets: Workmanship
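To track and trend results like these across many analyses, each systemic root can be tagged with its defect category and the tags counted. The Python sketch below shows the idea under simple assumptions: the `findings` list holds only the three roots from this case, and the `tally_by_category` helper is a hypothetical name, not part of any particular RCA or CMMS product.

```python
from collections import Counter

# Systemic roots from this RCA, each tagged with a defect-source category.
# "Workmanship" comes from the case; findings from other RCAs would be
# appended here, tagged with the categories shown in Figure 5.
findings = [
    ("LTA Procedure", "Workmanship"),
    ("LTA Tools", "Workmanship"),
    ("LTA Skill Sets", "Workmanship"),
]


def tally_by_category(items):
    """Count how many systemic roots fall into each defect category."""
    return Counter(category for _root, category in items)


counts = tally_by_category(findings)
for category, count in counts.most_common():
    print(f"{category}: {count}")
# prints: Workmanship: 3
```

As more analyses are added, the categories with the highest counts point to where a deeper, site-wide review may be warranted.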

This is just a simple example, but the opportunities don’t stop there. Tools that have historically been used in support of planned maintenance to find defects hidden from the human senses can also be used in a defect elimination capacity to avoid putting defects into the equipment in the first place.

For example, ultrasound solutions can be used to provide insight into the current health of an asset in a planned maintenance capacity. However, they can also be used in a proactive defect elimination capacity to ensure the proper amount of lubrication is administered. This avoids introducing defects related to over- or under-lubrication.

To truly achieve breakthrough performance, an organization must do more than improve its work efficiency through Planned Maintenance. It must find ways to make some of the work go away. That is where harnessing the power of a site-wide Defect Elimination strategy and available technologies can provide true leverage.

To achieve sustainable Reliability improvements, the frontline work force can make or break you. So, don’t just tell them about it, don’t just show them; instead, give everyone in the organization a role to play in the process so that they can truly understand and contribute. That brings true buy-in and with it, a fighting chance at sustainability. Go beyond Planned Maintenance…Don’t Just Fix It, Improve It!

About the Authors

Michelle Ledet Henley is President of TMG Frontline Solutions where she has spent the past 25 years helping hundreds of organizations navigate the difficult waters of organizational change using a game-based simulation.

Her enthusiastic facilitation style along with the innovative workshop design bring the workforce (even the most skeptical among them) energetically onboard with their site’s reliability improvement efforts.

Co-authoring various articles and the book Level 5 – Leadership at Work, the sequel to the popular Don’t Just Fix It, Improve It, Michelle has become a thought leader on the emerging and often misunderstood topic of defect elimination. Michelle can be contacted at

Robert J. Latino is currently a Principal at Prelical Solutions, LLC. Bob and his family are formerly the founders of Reliability Center, Inc. (RCI) a 50-year-old Reliability Consulting firm specializing in improving Equipment, Process and Human Reliability.

Bob has been consulting with his clientele around the world for over 38 years and has taught over 10,000 student analysts. Mr. Latino is author or co-author of ten (10) books related to RCA, Reliability, FMEA, and/or Human Error Reduction. Bob can be contacted at


About the Author

Robert Latino President, Prelical Solutions

Mr. Latino is an internationally recognized author, trainer, software developer, lecturer and practitioner of best practices in the field of Reliability Engineering and specifically in Root Cause Analysis & Investigation Management.