Five Minute Facts  

Reliability Engineering: Back to Basics

Howard Penrose | President, MotorDoc LLC

Reliability is a subordinate topic within industrial and manufacturing engineering that has increasingly branched out on its own over the past few decades, including a growing number of reliability engineer titles handed out within organizations.  The problem is that the work assigned to the reliability engineer is often not what it should be, and the title is frequently just another name for maintenance manager, maintenance engineer, or planner.  In the end, the opportunities that a true reliability engineering department would bring are lost to the facility, or company, as another box is checked off to state that ‘we are reliable.’

To understand the problem, we first have to understand what we are discussing, because the information available in the after-market space uses the terms reliability, maintenance, physical asset management, and even asset management almost interchangeably.  A review of the information available from different organizations and consulting groups turns up a mish-mash that adds to the confusion facing senior management, let alone those in the industry itself.

What is reliability?  The general definition is: the probability that a system or product (system) will perform in a satisfactory manner for a given period of time when used under specified operating conditions or operational context.  This gives us a measurable chance, with a confidence level, that the system will be available upon demand to perform as expected, not necessarily as designed, given an operating context.  As a result, the concept of reliability gives us the means and methods to: know what will occur if we operate outside the design thresholds, identify changes that would cause the system to no longer perform in a satisfactory manner, and perform tasks that will either maintain system operation or project a time to failure based upon tests or observations.
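The definition above can be made concrete with a small sketch.  The article does not name a failure distribution, so the two-parameter Weibull model used here is an assumption chosen for illustration, as are the function name and the example parameter values.

```python
import math


def weibull_reliability(t_hours, eta_hours, beta):
    """Probability that a unit survives to t_hours.

    Assumes a two-parameter Weibull life model (an illustrative assumption,
    not something stated in the article):
        R(t) = exp(-(t / eta) ** beta)
    eta_hours is the characteristic life and beta is the shape parameter.
    """
    if t_hours < 0 or eta_hours <= 0 or beta <= 0:
        raise ValueError("t_hours must be >= 0; eta_hours and beta must be > 0")
    return math.exp(-((t_hours / eta_hours) ** beta))


# Hypothetical values: characteristic life of 40,000 h, wear-out shape of 2.0.
for t in (0, 10_000, 20_000, 40_000):
    print(t, round(weibull_reliability(t, 40_000, 2.0), 4))
```

Under this model, survival at a time equal to the characteristic life is always exp(-1), roughly 37%, regardless of the shape parameter.  Changing the operating context amounts to changing the parameters, which is why the same machine can have very different reliability in different applications.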

Wind Turbine Study to Identify Component and Maintenance Practice Modifications to Improve Operational Availability

One concept discussed in the recent past is ‘inherent reliability.’  The term has never really been defined from the design and manufacturing perspective, but it has recently been used in marketing as the ‘design reliability’ of the system, which in industrial and design engineering is properly the availability (inherent, achieved, and operational).  For example, when participating in the design of a hybrid tractor, the work we performed on the electric machines was not to determine an ‘inherent reliability,’ but to determine how to optimally set the ‘availability’ so that a percentage of the systems would survive a period of time, to a specific level of confidence, in a specific operating context.  We can define that as the B20 at 20,000 hours, to a confidence level of 85%; that is, no more than 20% of the units are expected to fail within 20,000 hours.  Research into components and survival can then be performed, and a proper and cost-effective selection of materials and manufacturing processes can be developed.  We cannot do that using the variety of inherent reliability definitions that can be found.  Finally, the concept of inherent reliability itself puts extreme limits on the reliability engineer.  In effect, it creates a glass ceiling that says ‘you cannot pass this level of reliability’ with the system, which, by definition, is false.
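As a rough illustration of a target such as B20 at 20,000 hours with 85% confidence, the sketch below shows two common calculations.  Both are assumptions made for illustration rather than the method used on the tractor project: the first converts a B-life target into a Weibull characteristic life for a hypothetical shape parameter, and the second uses the success-run (zero-failure) relation to estimate how many units must all survive a test of the target duration.

```python
import math


def b_life(p_fail, eta_hours, beta):
    """Time by which a fraction p_fail of units has failed (0.20 for B20)."""
    return eta_hours * (-math.log(1.0 - p_fail)) ** (1.0 / beta)


def eta_for_b_life(p_fail, target_hours, beta):
    """Characteristic life needed so that the B-life equals target_hours."""
    return target_hours / (-math.log(1.0 - p_fail)) ** (1.0 / beta)


def success_run_sample_size(reliability, confidence):
    """Units that must all survive the test duration with zero failures.

    Uses the success-run relation: reliability ** n <= 1 - confidence.
    """
    return math.ceil(math.log(1.0 - confidence) / math.log(reliability))


# B20 at 20,000 h means 80% survival at 20,000 h.  beta = 2.0 is hypothetical.
eta = eta_for_b_life(0.20, 20_000, beta=2.0)
n_units = success_run_sample_size(reliability=0.80, confidence=0.85)
print(round(eta), n_units)
```

With these inputs the success-run relation calls for 9 units, since 0.8^9 is about 0.134 (below 0.15) while 0.8^8 is about 0.168.  Targets stated this way can be tested and costed, which is the point being made about loosely defined ‘inherent reliability.’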

The after-market reliability engineer, who becomes involved post-design, should determine the impact of operating the system outside of the designer’s operating context.  A system such as a pumping system is designed and developed, and information is made available so that it can be designed into an application that, most likely, will not operate within the manufacturer’s original design concepts.  A reliability engineer would be tasked with determining the operational availability of the system in the new operating context and, understanding that context, identifying options such as monitoring, re-engineering, or run to failure.  This is where tools such as Reliability-Centered Maintenance (RCM) come into play, as well as a variety of modeling systems available to the engineer from the manufacturer or elsewhere.
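The three availability measures mentioned earlier (inherent, achieved, and operational) are commonly written as simple steady-state ratios.  The sketch below uses those textbook forms; the function and parameter names and the example hours are hypothetical and are not taken from the projects described.

```python
def inherent_availability(mtbf, mttr):
    """Ai: counts corrective repair time only (no preventive work, no delays)."""
    return mtbf / (mtbf + mttr)


def achieved_availability(mtbm, mean_active_maintenance):
    """Aa: counts corrective and preventive maintenance time, but no delays."""
    return mtbm / (mtbm + mean_active_maintenance)


def operational_availability(mtbm, mean_downtime):
    """Ao: counts all downtime, including logistics and administrative delays."""
    return mtbm / (mtbm + mean_downtime)


# Hypothetical hours for a pumping system in a new operating context.
a_i = inherent_availability(mtbf=2_000, mttr=8)
a_a = achieved_availability(mtbm=500, mean_active_maintenance=6)
a_o = operational_availability(mtbm=500, mean_downtime=30)
print(f"Ai={a_i:.3f}  Aa={a_a:.3f}  Ao={a_o:.3f}")
```

Operational availability is the figure the site actually experiences, so it is the one that shifts when the operating context, spares logistics, or maintenance strategy changes.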

Field Testing Design Study Results at General Motors Engineering Laboratory

Design Study to Evaluate Hybrid Vehicle Improvements for Improved Inherent Availability

One of the options available to the reliability engineer is re-engineering, or modifying, the system to meet the expected operating context and the required availability.  If we are working with a product line that has a specific end-date for replacement or retirement, then expectations can be developed for the system, or for components within the system.  When involved in specification development for the product line, reliability engineers can determine the optimal costs associated with meeting the availability expected by the organization and provide options along with their financial impacts.  Once we know the expectations, we can take a closer look at the maintenance needs, from corrective to predictive maintenance practices.  This can even occur on the design side of the system.

Using the example of the hybrid tractor, the electric machine selected has specific design characteristics due to the motor manufacturer’s patents.  However, with knowledge of the tractor manufacturer’s reliability expectations, it is determined that the design characteristics cannot be met through the motor manufacturer’s process and that the material selection does not meet aging requirements in the operating context of the tractor.  The reliability engineer then works with the manufacturer to determine whether motor manufacturing can be modified, or the design can be modified, in order to meet the availability expectations for the system.  Alternatively, expectations can be modified or maintenance tasks developed to meet the manufacturer’s target.

644 Tractor Hybrid Prototype Being Prepared for Testing Circa 2010

Once the tractor arrives at the site, such as a quarry, it is quickly discovered that the operators have been tasked with moving a specific amount of product within a specified period of time, which the tractor is capable of but which exceeds the manufacturer’s design context.  A review by the owner’s reliability engineer and the manufacturer’s engineer determines that the result will be increased particulate in the cooling system, which is part of the hydraulic system, and that this would cause the motor bearings to fail or the windings to short, based upon product research.  Changing to a denser filter would restrict the cooling system to the point where the thermal life of the machine would be similar to the result of taking no action.  The final decision is to select a slightly denser filter and to perform the filter replacement task more frequently, while other modifications are made to the operation of the machine to allow the increased workload.  Under the new modifications and operating context, the operating temperatures of the motors and hydraulic fluid are closely monitored in order to identify early defects from wear, and the new inherent availability and overall reliability are determined.

Hybrid Tractor Testing and Improvements to Lower Reliability Components for Improvement and Aging Studies

Sometimes the issue surfaces as the result of a root-cause failure analysis (RCFA).  A flywheel energy storage system was designed with special characteristics that included direct exposure of the insulation system to a vacuum.  The 0.5 megawatt system was originally designed to provide power for a specific amount of time at 460 volts as the torsional energy of the flywheel was converted back to electrical energy through a control.  Per the design, an inverter sped the motor-generator up to a speed not to exceed 7000 RPM and maintained that speed, and the vacuum had to be held at less than 0.08 bar for both friction and insulation resistance purposes.  A sales engineer sold the system with the expectation of a longer run-time, based upon smaller units, which required an operating speed of 12,000 RPM.  The insulation systems started failing within a few hundred hours of operation.  At this point engineers decided to increase the operating pressure to 0.4 bar, which would have a minimal effect on the coast-down, and immediately noticed that the machines started failing catastrophically within tens of hours.  An RCFA was implemented.

Flywheel Control Trailer Testing as Part of RCFA

A review of the original design research identified that the maximum speed at which the machine could operate was 9000 RPM and that specific electrical and mechanical conditions existed past this point that would severely impact the reliability of the system.  It was also determined that increasing the pressure in the system caused additional problems per Paschen’s law, in which the winding became conductive to ground.  Another unexpected finding was that the output filters between the drive and motor-generator were home-made, resulting in significant stress on the winding.  In order to meet an acceptable operational availability that would allow a reasonable simple payback to the owner and investors, the proposal was to operate at 9000 RPM, add sine wave filters, and return the pressure to 0.08 bar.  While the increase in speed over the original design would reduce the overall expected life-cycle of the machine, the reliability of the system could be calculated, and the owners/investors could make a business decision between a lower return on investment and retiring the system.  The purpose of reliability engineering then becomes to investigate, assist in design review, and provide data back to the owners/investors so that a decision can be made.

Modified Vacuum Pressure Tank for Insulation Studies as Part of Flywheel RCFA and Reliability Improvements

In less dramatic cases, a reliability engineer would be investigating such things as greasing frequencies, the application of IoT systems, the development of cost-effective maintenance strategies, root cause analysis, design review, manufacturing process review, and other activities that can potentially have a significant impact on the organization.  According to the National Institute of Standards and Technology study, “Economics of Manufacturing Machinery Maintenance: A Survey and Analysis of US Costs and Benefits,” just the development of optimal maintenance practices can have a significant impact (in relation to NAICS 321-339 excluding NAICS 324 and 325 – petro-chem) per year:

  • Reduction of maintenance costs by up to $16.3 billion from unplanned failures plus buffer inventory costs of at least $0.9 billion, of the $74.5 billion in annual maintenance activity expenditures;
  • Avoidance of losses due to preventable maintenance issues of up to $119.1 billion: $18.1 billion due to downtime; $0.8 billion due to defects; and $100.2 billion due to lost sales from delays and defects;
  • The reduction of an estimated 16.03 injuries and 0.05 deaths per million employees;
  • Advanced maintenance strategy impacts of predictive maintenance: $6.5 billion from downtime reduction and $67.3 billion in increased sales;
  • Additional benefits where less than 50% of the strategy is reactive maintenance: 15% less downtime, 87% lower defect rate, 66% less inventory; and,
  • Energy cost improvements of at least 15%.
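The loss figures quoted above can be cross-checked with simple arithmetic.  The snippet uses only the numbers in the list; reading the $16.3 billion as a share of the $74.5 billion total is an interpretation made here for scale, not a figure quoted from the study.

```python
# Figures in billions of US dollars, copied from the list above.
downtime_losses = 18.1
defect_losses = 0.8
lost_sales = 100.2
preventable_losses_total = 119.1

unplanned_failure_costs = 16.3
annual_maintenance_spend = 74.5

# Round to one decimal place to avoid floating-point noise in the sum.
component_sum = round(downtime_losses + defect_losses + lost_sales, 1)
share_of_spend = unplanned_failure_costs / annual_maintenance_spend

print(component_sum == preventable_losses_total)
print(f"{share_of_spend:.1%}")
```

The three components add up to the quoted $119.1 billion, and $16.3 billion is roughly 22% of the $74.5 billion in annual maintenance expenditures.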

A properly selected and applied reliability engineer or reliability engineering department will have a positive impact on the company bottom line.  The reliability engineer provides the most significant impact when tasked with new and repaired equipment specifications, root cause analysis, and the development of optimized maintenance strategies.

For purposes of the article, the above reliability targets are fictional, with the exception of the NIST report, and not representative of the actual targets pursued by the projects described.

About the Author

Howard Penrose President, MotorDoc LLC

Howard is the President of MotorDoc® LLC and the 2018 Chair of SMRP. He has over 35 years of electric motor testing, repair and design experience, ranging from a US Navy motor repair job to advanced electric machinery design. Howard is also involved in legislation with the US Government regarding Cyber Security, Infrastructure, Energy, SmartGrid Education and Safety.