
RELIABILITY CENTERED MAINTENANCE AUDITS: Why Regular Auditing Better Ensures a Positive Return on Investment

Stephen Allen Bailey | SENIOR ANALYST, IVC Technologies

As advancements in technologies continue to change the face of the industry and its processes, the challenges faced by manufacturing companies in their pursuit of reliability remain the same:

Decreasing maintenance costs and downtime while simultaneously increasing production of high-quality goods, throughput and profits. Reliability programs designed to improve machine reliability play a critical role in achieving these goals.

When designed and implemented correctly, reliability programs provide a comprehensive, data-driven solution to improving machine reliability that is tailored to meet the specific needs and budgets of the manufacturing company for which it was created.

These programs encompass not only the specific machines to be monitored, but also the people, processes, and technologies involved in the implementation and ongoing management of the program.

A critical but often overlooked component of a strong reliability program is conducting regular audits. Maintenance operations change over time due to equipment being added and/or replaced, changes to maintenance personnel, and changes to suppliers, etc. Regular auditing of a reliability program sheds light on any new inefficiencies that have emerged, as well as any new opportunities for improvements, enabling a company to reduce maintenance costs, and maintain a positive return on their investment.

This paper will discuss the key components of a “world-class” reliability program that will include technologies utilized (i.e. the fault types they detect, their PF Curve impact, the pitfalls associated with their incorrect application and/or usage), appropriate management systems to have in place, and examples from previous system audits showing how the audit process unveiled hidden deficiencies within the plant’s reliability program.

What is Reliability Centered Maintenance?

Machines are not built to run forever. In fact, as sad a reality as it may be, just like a human technically begins the process of dying from the moment they are born, a machine begins its journey toward failure from the moment it is installed. And just as there are things a person can do to prolong the length and quality of their life, there are things that can be done to extend the life and improve the reliability of industrial machines.

Machine maintenance has been, and always will be, a key component of any industrial operation. However, progressive manufacturing companies today are realizing the importance of viewing maintenance as a significant part of a greater, more comprehensive reliability centered approach to maintaining the health of their machine assets while increasing production and decreasing maintenance costs.

Contrary to what the name may imply, reliability centered maintenance (RCM) is not an actual maintenance program in and of itself. Rather, it refers to a process for carefully analyzing machine assets on an individual basis to identify potential problems and the appropriate maintenance method to apply to ensure those assets continue to produce at maximum capacity.

RCM is about working smarter…not necessarily harder.

Where a preventive maintenance strategy involves checking all assets regularly, RCM looks at assets deemed most important/critical, asking “what could go wrong to cause it to fail”, and planning accordingly. Sure, it takes time to establish and implement an RCM program, but once it’s up and running, it will save a lot of time and money.

For an RCM program to be effective, it must be built on a foundation of solid condition based maintenance (CBM), which in turn relies on accurate data. Many predictive maintenance (PdM) and condition monitoring tools and technologies, such as vibration analysis, infrared thermography, and ultrasonics, are available today for detecting potential machine issues before they become major problems. But in order to use these tools, one must have a thorough understanding of how a machine breaks down in the first place.

The P-F Curve

Just because a machine is working right now does not mean that failures have not already begun within the system. In fact, most early signals generated by a machine cannot be detected without the use of the PdM and CBM tools referenced above. In addition, most machines will continue to run even after a failure has begun. However, once this incident occurs, it’s merely a matter of time before the machine fails catastrophically.

The P-F Curve is among the most important tools for an RCM plan due to the valuable insight it provides into the relationship between machine failure/breakdown, cost/time, and how it can be prevented. By examining the various elements that make up the P-F curve, it’s easy to see how vital and cost saving RCM truly is.

The P-F Curve Explained

Two axes create a plane upon which the P-F curve lies. The X-axis represents time elapsed. The beginning of the axis (left) is when failure starts to occur and the end (right) is when the machine actually fails. Along the X-axis there are many instances where faults can be detected before the point of failure. Unfortunately, the ones that are the most recognizable without professional, high-tech tools (e.g. heat, noise, smoke) will usually already mean costly repairs or replacements.

The Y-axis represents the machine’s condition. The assumption is that the machine is in top working condition up to the point where failure begins (top left of the graph). As time progresses from that point, the machine’s condition moves down the Y-axis until the machine ultimately fails. The P-F Curve illustration below shows the relationship between when failures are detected and the cost to repair them. Notice that the later they are detected, the more expensive the repair.

This is why RCM is so important. Since early detection indicators are not noticeable without the aid of technologies (e.g. ultrasound, vibration analysis, oil analysis and infrared thermography), implementing routine inspection along designated routes by a licensed professional as part of an overall RCM strategy is the best and most cost-effective approach to keeping machines running at optimum efficiency.
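As a rough illustration of why inspection frequency matters on the P-F curve, the sketch below checks whether a route interval is short enough to catch a fault between the earliest detectable point (P) and functional failure (F). The function names, the day counts, and the half-interval rule of thumb used as the default are assumptions for illustration; none of them come from the audits discussed in this paper.

```python
def max_inspection_interval(pf_interval_days: float, safety_factor: float = 0.5) -> float:
    """Longest inspection interval that still fits inside the P-F window.

    safety_factor=0.5 reflects the commonly cited rule of thumb of inspecting
    at no more than half the P-F interval (an assumption, not a source figure).
    """
    if pf_interval_days <= 0:
        raise ValueError("P-F interval must be positive")
    if not 0 < safety_factor <= 1:
        raise ValueError("safety_factor must be in (0, 1]")
    return pf_interval_days * safety_factor


def route_is_adequate(route_interval_days: float, pf_interval_days: float,
                      safety_factor: float = 0.5) -> bool:
    """True if the route visits the machine often enough for this failure mode."""
    return route_interval_days <= max_inspection_interval(pf_interval_days, safety_factor)


# Hypothetical failure mode that becomes detectable 90 days before failure.
print(max_inspection_interval(90))   # 45.0
print(route_is_adequate(30, 90))     # True: a monthly route fits inside the window
print(route_is_adequate(60, 90))     # False: a 60-day route could miss the fault
```

The P-F interval differs by failure mode and by detection technology, so in practice a check like this would be repeated for each technology applied to an asset.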

Technologies of Early Detection and Pitfalls of Incorrect Application

There are many different technologies available for identifying failure indicators before they snowball into costly repairs or, worse, a shutdown of production. However, if not performed correctly, the value of these technologies is significantly diminished. Let’s take a closer look at the “big players” in early detection and the pitfalls of not using them correctly.

Vibration Analysis

Most failure modes can cause an increase in vibration. That’s why vibration analysis is the predominant and most widely used method to determine equipment condition and predict failures.

“No single measurement can provide as much information about a machine as the vibration signature.” – Art Crawford, founder of IRD Mechanalysis

Common faults detected with vibration analysis:

• Misalignment
• Looseness
• Unbalance
• Bent shaft
• Bearing failure
• Gearbox failure
• Pump cavitation
• Electrical faults in motors
• Resonance and natural frequencies

Pitfalls of incorrect application

As powerful a tool as vibration analysis is for early fault detection, it is often the most misapplied technology of all the condition monitoring tools. Here are a few examples:

• Improper accelerometer mounting

Mounting an accelerometer correctly is key to obtaining reliable data. Too often, when placing the accelerometer, technicians allow the device’s magnet to pull the accelerometer out of their hand and slam it down onto the machine. Doing so does two things: it disturbs the settling time needed by the accelerometer before accurate data can be taken and it can be damaging to the accelerometer itself. The proper way to mount an accelerometer is to bring it close to the machine at a side angle so the magnet doesn’t pull the device from your hand and gently tip it onto the machine.

• Improper monitoring techniques for specific machine types

Many times, we find customers with a Vibration Analysis Program who are routinely collecting data on equipment, but whose databases are set up too generically. Database setups define the vibration test data captured while taking route-based samples and need to be specifically tailored for each machine type being monitored. All too often, only speed is used to differentiate measurement types from machine to machine, leading to improper Frequency Ranges and Lines of Resolution being collected for analysis. This leads to many missed diagnoses, more time spent taking additional “off-route” data, and a limited ability to detect faults early.

• Using vibration analysis more reactively than proactively

All too often, companies lose sight of the value of vibration analysis as a tool of PdM and use it as more of a reactive maintenance tool. For example, a company may simply use vibration analysis data to find a bearing defect in a machine and then schedule it for repairs without utilizing that information in a Root Cause Failure Analysis. One could argue that the technology “predicted” a potential catastrophic failure, but without the Root Cause Failure Analysis, you are still simply “reacting” to a failed bearing. The PdM data needs to be used to its fullest in order to obtain maximum cost reductions and payback to the facility by actually providing answers to why equipment fails in the first place. The information can then be used by the reliability program to implement changes that will actually “prevent” future failures of this type from occurring.
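To make the database setup pitfall above more concrete, the sketch below works through the arithmetic that links Frequency Range (Fmax) and Lines of Resolution to the spacing between spectral lines, and then checks whether two nearby fault frequencies would land far enough apart to be distinguished. The function names, the example frequencies, and the two-bin separation margin are assumptions for illustration, and windowing effects are ignored.

```python
def bin_width_hz(fmax_hz: float, lines: int) -> float:
    """Spacing between spectral lines: Fmax divided by lines of resolution."""
    if fmax_hz <= 0 or lines <= 0:
        raise ValueError("fmax_hz and lines must be positive")
    return fmax_hz / lines


def acquisition_time_s(fmax_hz: float, lines: int) -> float:
    """Length of one time record (no averaging or overlap): lines / Fmax."""
    if fmax_hz <= 0 or lines <= 0:
        raise ValueError("fmax_hz and lines must be positive")
    return lines / fmax_hz


def can_separate(f1_hz: float, f2_hz: float, fmax_hz: float, lines: int,
                 min_bins: float = 2.0) -> bool:
    """True if two frequencies are at least `min_bins` spectral lines apart.

    min_bins=2.0 is an assumed margin, not a published standard.
    """
    return abs(f1_hz - f2_hz) >= min_bins * bin_width_hz(fmax_hz, lines)


# Hypothetical case: peaks at 118 Hz and 120 Hz, collected with Fmax = 1000 Hz.
print(bin_width_hz(1000, 800))               # 1.25 Hz per line
print(can_separate(118, 120, 1000, 800))     # False: the peaks blur together
print(bin_width_hz(1000, 3200))              # 0.3125 Hz per line
print(can_separate(118, 120, 1000, 3200))    # True
print(acquisition_time_s(1000, 3200))        # 3.2 s per time record
```

The trade-off is visible in the last line: finer resolution costs collection time on every point, which is one reason a setup tailored to each machine type tends to work better than a single generic one.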

Oil Analysis

Oil analysis is the sampling and laboratory analysis of a lubricant’s properties, suspended contaminants, and wear debris (Vibration Institute). Since oil and other lubricant types can be considered the “Life Blood” of our equipment, it only makes sense that PdM and CBM tests can yield information regarding impending failures. Oil Analysis data contains information not only on the state of the lubricant being sampled, but also on the condition of the equipment from which it was pulled.

Common faults detected with oil analysis:

Lubrication Faults
• Viscosity problems
• Water incursion
• Additive pack depletion
• Other chemical properties (acid levels, etc.)
• Oil mixes of different types of lubricants
Machine Faults
• Gear wear
• Bearing failure
• Seal failures
• Filtration issues

Pitfalls of incorrect application

• Testing only when a problem occurs

Much like vibration analysis, oil analysis is a bit of a trending game. When oil analysis is used only when a problem occurs, no historical data is established during “normal” operating times. As a result, the diagnosis of issues is highly unreliable, and the technology shifts from a “predictive” tool to a “reactive” function.

• Testing too infrequently

Part of establishing historical data is performing oil analysis at regular and frequent intervals. While companies sometimes cut back on the frequency of oil analysis as a cost saving measure, they often find it a short-sighted solution with long term ramifications.

• Utilizing poor sampling techniques

Extracting oil samples is often the weakest link in the oil analysis process, as certain methods are more likely to result in contamination. For example, sampling from the drain port and using a drop-tube with a vacuum pump are probably the two most commonly used techniques for extracting oil samples. Unfortunately, they are also the two techniques most likely to produce contaminated samples. Dedicated sample valves installed in correct locations, along with consistently documented sampling procedures, are the more reliable way to go. For circulating lubrication systems, samples should be taken ahead of any filters.
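The trending point made in the pitfalls above can be sketched in a few lines: a new lab result is only meaningful when compared against a baseline built from samples taken during “normal” operation. The function name, the iron readings in ppm, and the three-standard-deviation limit are all hypothetical choices for illustration; laboratories and OEMs typically supply their own alarm limits.

```python
from statistics import mean, stdev


def exceeds_baseline(history, new_value, n_sigma=3.0):
    """Flag a result that sits above the historical mean by n_sigma deviations.

    `history` holds earlier results for the same machine and sample point.
    The n_sigma=3.0 default is an assumed limit, not a laboratory standard.
    """
    if len(history) < 3:
        raise ValueError("need at least three historical samples to form a baseline")
    limit = mean(history) + n_sigma * stdev(history)
    return new_value > limit


# Hypothetical iron (Fe) readings in ppm from five routine samples.
iron_history = [12, 14, 13, 15, 14]

print(exceeds_baseline(iron_history, 16))   # False: within normal scatter
print(exceeds_baseline(iron_history, 25))   # True: well above the established trend
```

Without the five routine samples there is nothing to compare the reading of 25 against, which is the practical cost of testing only when a problem occurs.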

Infrared Thermography (electrical or mechanical)

Nearly everything that uses or transmits power gets hot before it fails. Infrared Thermography is the ONLY diagnostic technology that enables instant visualization and verification of thermal performance when used by a qualified technician.

Common faults detected with infrared thermography:
• Loose connections
• High resistance connections
• Damaged contacts
• Overloaded circuits
• Faulty heat exchangers
• Plugged filters
• Coupling defects
• Bearing defects

Pitfalls of incorrect application

• Not scanning under load
If a machine is without load it won’t generate heat and as a result, faults will not be visible. This is valid for both mechanical and electrical surveys.

• Misreading reflections
A common mistake when using infrared thermography is to identify a “fault” that is actually a reflection. For example, if you’re standing in front of a metal surface conducting an infrared scan, chances are that the “hotspots” you see are really just reflections of heat from your own body or another heat source.

• No direct line of sight/field of view
IR cameras are not X-ray cameras…they can’t see through things. They only detect heat from the object you are looking at so if you do not have a direct line of sight, faults could be missed.

Electric Motor Testing (online and offline)

Photo provided by PdMA Corporation

Motors are literally the driving force behind many industrial processes. So, when they fail, it can cause production shutdown and millions in lost revenue. That’s why both online and offline testing should be considered when developing a reliability centered maintenance program.

Common faults detected with online motor testing
Online testing, also known as dynamic testing, is performed while a motor is running and provides data on the power quality and operating condition of a motor. Faults detected include:
• High/low voltage levels
• Voltage and amperage imbalances
• Rotor bar and air gap defects
• Bearing defects

Common faults detected with offline (static) motor testing
Offline testing, also known as static testing, is performed while the motor is not running and provides data on the electrical circuit and in some instances the rotor’s reactance to the windings. Faults detected include:
• High resistance connections
• Imbalances between the windings

Pitfalls of incorrect application

• Improper training
Though most online and offline test equipment comes with preset test regimens and Alert/Alarm settings, it is key for the person performing the tests to know the test tools, the equipment being tested, and basic electrical theory in order to effectively diagnose the results. All too often these tools are introduced to PdM departments with only the OEM’s basic training provided to a handful of technicians who have little to no electrical background.

• Improper periodicities
Motor testing, like any PdM technology, should be set up to provide meaningful data on equipment prior to a catastrophic failure occurring. A dedicated test schedule should be built up front in order to aggressively find problems prior to failures and guide the scheduling of necessary repairs. Program metrics will then need to be used to monitor the number of machines being tested vs. the number of findings being made to ensure money and resources are not wasted on unneeded testing.
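One simple way to express the tested-versus-findings metric described above is a find rate that is reviewed against upper and lower bounds. The sketch below is only an illustration: the function names, the 2% and 15% thresholds, and the wording of the recommendations are assumptions, not values taken from any audit.

```python
def find_rate(tests_performed: int, findings: int) -> float:
    """Findings per test performed over a review period."""
    if tests_performed <= 0:
        raise ValueError("tests_performed must be positive")
    if findings < 0:
        raise ValueError("findings cannot be negative")
    return findings / tests_performed


def review_periodicity(tests_performed: int, findings: int,
                       low: float = 0.02, high: float = 0.15) -> str:
    """Suggest a direction for the test interval from the find rate.

    `low` and `high` are hypothetical thresholds; each program would set its own.
    """
    rate = find_rate(tests_performed, findings)
    if rate < low:
        return "consider lengthening the test interval"
    if rate > high:
        return "consider shortening the test interval"
    return "keep the current interval"


# Hypothetical year of motor tests on one equipment class.
print(find_rate(200, 10))            # 0.05
print(review_periodicity(200, 10))   # keep the current interval
print(review_periodicity(200, 2))    # consider lengthening the test interval
print(review_periodicity(200, 40))   # consider shortening the test interval
```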

Ultrasonic Leak Testing
Ultrasonic leak testing is often one of the most overlooked types of CBM applications. Compressed air, gases, and vacuum systems are often some of the most expensive systems for a plant to operate due to inefficiencies created by leaks. Ultrasonic Leak Testing is one of the cheaper technologies that can be deployed at a facility, with the largest “instantaneous” payback and tangible results. Additionally, most PdM Technicians can be trained to use these tools in less than one day.

Below is an example of what recirculated cavitation looks and sounds like via ultrasonic testing

Common faults detected with ultrasonic leak testing
• Gasket leaks
• Faulty connections
• Damaged pipe/lines/hoses
• Vessel leaks
• Vacuum leaks

Pitfalls of incorrect application

• Routes, periodicities, and tagging methods not well defined

Without defined routes and setup periodicities, there really is no “program”. Going out and identifying leaks is by far easier than managing the findings. A well-defined tagging, reporting, and documenting procedure needs to be maintained to ensure program success.

• Cost per CFM on compressed air and gases not defined

The whole purpose of isolating and repairing leaks is to save the facility money. In order to justify equipment costs, manpower, and the cost of repairs, one must know the value associated with each leak found. The costs for “plant air” will be lower than the costs for “instrument air”, and the costs for other purchased gases at a facility will be even higher. Knowing these costs, calculating the value of each finding, and then showing the resulting “savings” accrued by performing the repairs must be maintained in order to obtain plant “buy-in” for these services.
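The leak valuation described above comes down to simple arithmetic once a unit cost is known. The sketch below assumes the cost is expressed per 1,000 cubic feet and that the system runs around the clock; the function name, the leak size, and the dollar figures are hypothetical and would need to be replaced with a facility’s own numbers.

```python
def annual_leak_cost(leak_cfm: float, cost_per_1000_cf: float,
                     hours_per_year: float = 8760.0) -> float:
    """Yearly cost of a leak flowing at `leak_cfm` cubic feet per minute.

    cost_per_1000_cf is the assumed cost of producing or buying 1,000 cubic
    feet of the gas; hours_per_year defaults to continuous operation.
    """
    if leak_cfm < 0 or cost_per_1000_cf < 0 or hours_per_year < 0:
        raise ValueError("inputs must be non-negative")
    cubic_feet_per_year = leak_cfm * 60.0 * hours_per_year
    return cubic_feet_per_year / 1000.0 * cost_per_1000_cf


# Hypothetical 5 CFM leak, priced as plant air and as instrument air.
plant_air = annual_leak_cost(5, 0.25)
instrument_air = annual_leak_cost(5, 0.40)
print(round(plant_air, 2))        # 657.0
print(round(instrument_air, 2))   # 1051.2
```

Summing this figure over all tagged leaks, and again over the leaks actually repaired, gives the “savings” number that supports plant buy-in.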

Management of Deployment Technologies

As amazing as today’s CBM technologies are at the early detection of potentially catastrophic machine problems, companies must precisely manage the deployment of these technologies to ensure that there is a clear understanding of their impact on reliability and that they are being utilized appropriately. That’s why a top-down reliability vision, with clearly defined goals and objectives, must be communicated, along with ample support and reasonable expectations for compliance. Our many years of PdM/CBM auditing experience have revealed the following key components to have in place to better ensure a maintenance strategy that will deliver its value of cost savings and an increase in machine reliability:

• Management support

Even before a reliability program is implemented, it is critical that everyone within an organization (e.g. maintenance, operations, purchasing, etc.) understands the equipment reliability process and is willing to work together to achieve reliability goals and objectives. Too many reliability programs fail because management and plant personnel lose confidence in the process and its ability to deliver the required results simply because they don’t clearly understand the process and the technologies involved. The success of any reliability program is dependent on management support and commitment to follow-through.

Therefore, another factor in successful programs is ensuring that the various technologies deployed are set up correctly, monitored at the appropriate time intervals with correct collection procedures followed, and the resulting data analyzed efficiently and effectively, documenting trends and abnormalities consistently.

• Functional effectiveness

Functional effectiveness is the ability of an asset or system to meet acceptable standards of performance. Identifying and evaluating failure modes and their effect on system function helps to prioritize and determine the appropriate maintenance strategy and is a key component to minimizing the likelihood of a system failure occurring. It is vital that the individuals doing this initial analysis have a familiarity with the machinery involved as well as a solid understanding of the process of determining failure modes, as it will be the foundation upon which the maintenance program is built.

• Linkage/integration

The maintenance tasks and recommendations resulting from the initial core analysis must be carefully and effectively implemented into a comprehensive maintenance program. A planning and scheduling program should be linked to the CBM program for acting on equipment flagged as exceptions in CBM reports. Follow-up measurements should be documented to ensure proper repair/replacement of faulty equipment.

• Training

Technicians and machine operators are a company’s first line of defense when it comes to detecting machine issues. They wrestle with the day-to-day machine problems, often sensing when a machine is not running normally by using their five senses. But it takes an experienced technologist to determine the source of the machinery problem. Many CBM technologies require formalized training and on the job training to become proficient in the technology. Trusting in the expertise and experience of trained personnel will improve confidence in CBM recommendations by maintenance and manufacturing organizations. After all, a maintenance program is only as good as the accurate data it provides and training of personnel is the key to a successful program.

• Staffing

When budgets need to be cut, maintenance is unfortunately often the first place management looks. Yet a solid maintenance program is the very thing that saves companies the most money in the long run. Selection of personnel should be performed by a qualification-based process to ensure the ability to learn the skills needed to become proficient in the technology. Each CBM technology should have minimum requirements for performing the tasks involved. Having adequate personnel on the reliability team is crucial to ensuring that the program continues to deliver the cost savings and machine reliability for which it was designed.

• Metrics

What gets measured gets improved. That’s why one of the most important decisions in any CBM program is choosing the right metrics. Metrics should be simple, straightforward, and aligned with tangible goals and short-term actions. Tracking and reporting on them allows the focus to be on areas that need improvement and provides upper management with the data needed to back it up.

“In God we trust. All others must bring data.” – Dr. W. Edwards Deming

Sample Data
A cross-section review of CBM / PdM Audit data taken in chemical, oil, building product, and steel manufacturers over the past few years was performed to try and highlight outliers within the audits that negatively affected the Management Systems and Technologies cumulative scores. Multiple key indicators that contribute to facilities scoring in the Bottom Quartile on their Management Systems Rankings were identified during this study.

Data reviewed found:
• 100% of the participants had glaring deficiencies in their Management Support, Target Development, and Metrics portions of the audits.
• 60% of the participants scored poorly in their Functional Effectiveness and Technology sections, with 40% of them receiving low marks for their Training review.
• Linkage/Integration scores were also found to be in the Bottom Quartile for 20% of the audits reviewed, and in the Bottom Middle Quartile for 40% of these facilities. Staffing was another section of the audit that negatively impacted Management Systems Overall Rankings in this study, with 80% of all facilities scoring in the Bottom Middle Quartile for this indicator.

Data reviewed in regards to the Technologies Section of these audits found:
• Reporting and Advanced Feature Usage were two items scored in the Bottom Quartile for 86% of facilities sampled.
• Database Completion was another section that negatively affected 57% of all facilities reviewed.
• Technician Certifications were found to be a problem 43% of the time.
• Other notable items included Technician Training, Technician Time Usage, and Equipment Used sections that were found to be in the Bottom Quartile for 29% of these facilities.

As manufacturing companies continue to push their production equipment to the brink to squeeze out every last ounce of capacity, the reliability of their machine assets becomes even more important. What separates successful maintenance programs that actually produce cost savings and increased machine reliability from unsuccessful ones that start out strong but then fail to deliver is ongoing assessment of the program itself. CBM/RCM programs need constant monitoring and tweaking to ensure that machines are being regularly maintained and redundancies eliminated. In addition, there should also be a system in place for tracking the program’s effectiveness. Regular auditing of CBM/RCM programs helps ensure that companies are receiving a positive return on what is surely a large but worthy investment.


About IVC Technologies
IVC Technologies is dedicated to helping our customers achieve optimal efficiencies through condition-based monitoring (CBM) utilizing our highly experienced and certified CBM analysts, cutting-edge PdM technologies, and equipment with unsurpassed analytic capabilities. Our Advanced Testing Group (ATG) is composed of leading experts in the diagnosis of the most complex problems plaguing industry today.


About the Author

Stephen Allen Bailey SENIOR ANALYST, IVC Technologies

Allen’s depth of knowledge and dedication to industrial maintenance stem from over 25 years of experience in the field, with eighteen of those being in Predictive Maintenance and Reliability Technologies. He is a Certified Level III Vibration Analyst specializing in Turbo Equipment testing, ODS & Experimental Modal Analysis, and Continuous Monitoring Systems.

Allen has extensive experience performing diagnostic testing in many different industrial environments. These environments have included petrochemical, power generation, steel manufacturing, wood products, mining, air separation, and offshore oil field development. Allen’s comprehensive industrial experience and extensive equipment knowledge will provide outstanding troubleshooting and analysis to all of our respected clients.