What is Reliability Centered Maintenance (RCM) and How Do You Get There?

What is RCM?

Reliability centered maintenance (RCM) is a study in how to balance requirements using risk ranking, optimized maintenance strategies, and a focus on worst offenders. RCM analysis is a review process that preserves a system’s function by identifying likely failure modes and assigning feasible proactive tasks. This overall strategy helps stakeholders work on the right asset, with the right strategy, at the right time, by the right resource, at the least cost. If this strategy succeeds, there should be less unplanned work, meaning fewer equipment failures.

But How Do You Get There?

Because RCM is an advanced process, there are several steps to “get there”:
  1. Perform benchmarking – Acquire the knowledge essential to asset management; you don’t know what you don’t know. Evaluate new/innovative ideas to see if they might add value to corporate goals/objectives.
  2. Identify the endgame – Create a policy statement that converts the mission/vision into an Asset Management Plan that establishes SLA requirements and tracks condition. Create a CMMS Utilization Plan that describes how the software will support asset reliability, workforce productivity, and job safety. Then, create a Long-Range Plan that pursues continuous improvement and identifies the shortest path to value.
  3. Select a CMMS that is configurable – The odds are high that you will need to configure your CMMS to support advanced processes, such as adding failure mode to the work order, storing the results of RCM analysis inside the CMMS, and creating the failure analytic. Becoming a reliability leader is a long journey.
  4. Identify the location hierarchy and asset registry – Apply a risk-based criticality ranking to all assets. Capture asset condition, useful life, replacement cost, install date, manufacturer, warranty expiry date, and nameplate data. Link each asset to its operating location and system.
  5. Establish a Reliability Team – Encourage asset management stakeholders to become certified reliability leaders (i.e., CRL by ReliabilityWeb) so they all speak the same language. This team will manage RCA, chronic failures, and defect elimination, and will review significant feedback. This team will leverage data in the CMMS to manage by exception.
  6. Conduct maintenance needs assessment – This could be a combination of RCM/FMEA analysis, PM optimization, OEM guidelines, and senior staff input. Identify optimum maintenance strategies including condition-based technology and wireless sensors.
  7. Set up job plans and PMs – These should store job instructions (what to do), craft requirements, materials, tools, hazard precautions, and permit requirements. Also, link job plans to PMs.
  8. Implement closed-loop processes to support continuous improvement:
    1. Work order feedback – Examples include maintainability, ergonomic issues, design flaws, safety issues, and inadequate PM-job plan instructions or frequency
    2. Defect elimination – Cross functional teams proactively look for defects on critical systems on a regular basis, and make corrections
    3. Root cause analysis (RCA) – Significant events require in depth study to prevent recurrence
    4. Chronic failure analysis – The reliability team runs an asset offender report to identify bad actors and drills down on failure modes
  9. Set up a formal planning/scheduling process – Implement a risk-based work prioritization scheme. Apply ranking to both the planning and scheduling backlogs. Also, develop integrated project tracking that includes a WBS to store budgets, work orders to capture actuals, and schedule activities to capture progress.
  10. Conduct audits and surveys for data and processes – The more advanced a system is, the more likely “things can go wrong” – especially data quality. If the goal is to make data-based decisions by leveraging data in the CMMS, then multiple techniques need to be deployed to ensure data quality, such as:
    1. Train staff not only on software navigation but also business rules and definitions. For example, is it clear what defines an asset? A weekly schedule? Reactive maintenance? Maintenance backlog?
    2. Train operators to identify failure modes, and implement an operator-driven reliability philosophy.
    3. Train working-level staff on precision maintenance.
    4. Install a CMMS gatekeeper role to perform work order quality grading. Also, apply consistent categorization/prioritization.
    5. Use Business Analyst surveys of the user community. Also, perform process/procedure audits.
    6. Run proactive error checks, such as a time-in-status report for in-progress (INPRG) work with “old actuals,” PM work orders that have been routinely cancelled or skipped, foundation data missing key attributes, and assets with no maintenance strategy. Also track stockouts, critical spares, and slow-moving inventory.
    7. Encourage staff feedback. Imagine the benefit of ongoing feedback by maintainers, operators, and even engineers to continuously refine maintenance strategies. Communication of this type makes the difference between average and best-in-class.
    8. Make it easy to capture/enter missing data elements such as failed component or asset. As far as “ease of use,” strongly consider mobile/handheld solutions.
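
Several of the proactive error checks in step 10 can be automated against a CMMS export. The Python sketch below is a minimal illustration, not a feature of any particular CMMS: the record layouts, field names, the 30-day “old actuals” threshold, and the three-cancellation threshold are all hypothetical assumptions chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional

# Hypothetical, simplified CMMS records; real exports use their own field names.
@dataclass
class WorkOrder:
    wonum: str
    status: str                  # e.g. "INPRG" (in progress), "COMP", "CAN"
    pm_id: Optional[str]         # set when the work order was generated by a PM
    last_actual: Optional[date]  # date of the most recent labor/material actual

@dataclass
class Asset:
    assetnum: str
    has_maintenance_strategy: bool

def inprg_with_old_actuals(wos: List[WorkOrder], today: date, max_age_days: int = 30) -> List[str]:
    """In-progress work orders whose latest actual is older than max_age_days (assumed threshold)."""
    cutoff = today - timedelta(days=max_age_days)
    return [wo.wonum for wo in wos
            if wo.status == "INPRG" and wo.last_actual is not None and wo.last_actual < cutoff]

def routinely_cancelled_pms(wos: List[WorkOrder], min_cancels: int = 3) -> List[str]:
    """PMs whose generated work orders were cancelled at least min_cancels times (assumed threshold)."""
    counts = Counter(wo.pm_id for wo in wos if wo.pm_id is not None and wo.status == "CAN")
    return sorted(pm for pm, n in counts.items() if n >= min_cancels)

def assets_without_strategy(assets: List[Asset]) -> List[str]:
    """Assets with no maintenance strategy on file."""
    return [a.assetnum for a in assets if not a.has_maintenance_strategy]

# Illustrative data only.
today = date(2024, 6, 1)
wos = [
    WorkOrder("1001", "INPRG", None, date(2024, 3, 15)),
    WorkOrder("1002", "INPRG", None, date(2024, 5, 28)),
    WorkOrder("1003", "CAN", "PM-10", None),
    WorkOrder("1004", "CAN", "PM-10", None),
    WorkOrder("1005", "CAN", "PM-10", None),
    WorkOrder("1006", "CAN", "PM-20", None),
]
assets = [Asset("PUMP-01", True), Asset("FAN-07", False)]

print(inprg_with_old_actuals(wos, today))  # ['1001']
print(routinely_cancelled_pms(wos))        # ['PM-10']
print(assets_without_strategy(assets))     # ['FAN-07']
```

Skipped PMs would need their own status or flag in a real export; the same counting approach applies.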

In Summary

The 10 steps above will move any organization from an implement/operate mode to a utilization/optimization mindset. And with a long-range plan in hand, organizations can take a step-by-step approach and be assured they are on the right path.

Introducing Universal Failure Codes

For many years, the CMMS community has stumbled over this design. Some user sites build failure code hierarchies that have over 20,000 boxes. Some believe failure codes should always be specific to the equipment classification, or failure class. Lastly, some are not familiar with the RCM standard SAE JA1011, where failure mode is clearly defined.

There are 3 Types of Assets

    1. System assets — are complex; can have close to a hundred components
    2. Simple assets — are basic in design; they have a short list of components that can fail (e.g. a Fire Door)
    3. Pseudo-Assets — are not real assets, but get worked on by Facility Maintenance (e.g. a regular door)

The above categorization is used to determine the component list. And the good news is, there is absolutely no reason why you cannot capture the failed component on all 3 types.
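
One way to picture this is a pick list of valid components per kind of asset, used to validate the failed component entered on a work order. The sketch below assumes, hypothetically, that component lists are keyed by asset type plus an equipment class; the class names and component names are illustrative only, not a prescribed code set.

```python
from enum import Enum
from typing import Dict, List, Tuple

class AssetType(Enum):
    SYSTEM = "system"  # complex; can have close to a hundred components
    SIMPLE = "simple"  # short list of components that can fail (e.g. a fire door)
    PSEUDO = "pseudo"  # not a real asset, but worked on by Facility Maintenance (e.g. a regular door)

# Hypothetical component lists, keyed by (asset type, equipment class).
COMPONENT_LISTS: Dict[Tuple[AssetType, str], List[str]] = {
    (AssetType.SYSTEM, "CHILLER"): ["COMPRESSOR", "CONDENSER", "EVAPORATOR", "EXPANSION VALVE", "CONTROLS"],
    (AssetType.SIMPLE, "FIRE DOOR"): ["CLOSER", "HINGE", "LATCH", "SEAL"],
    (AssetType.PSEUDO, "DOOR"): ["HANDLE", "HINGE", "LOCK"],
}

def valid_components(asset_type: AssetType, equipment_class: str) -> List[str]:
    """Component pick list offered when recording a failure."""
    return COMPONENT_LISTS.get((asset_type, equipment_class), [])

def is_valid_failed_component(asset_type: AssetType, equipment_class: str, component: str) -> bool:
    """Accept only components on the pick list, so failure data stays structured."""
    return component in valid_components(asset_type, equipment_class)

print(is_valid_failed_component(AssetType.SIMPLE, "FIRE DOOR", "CLOSER"))  # True
print(is_valid_failed_component(AssetType.PSEUDO, "DOOR", "COMPRESSOR"))   # False
```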

Key Design Elements

  • Maintenance is performed at the component level.
  • Failure code is not the same as failure mode.
  • It is not necessary to store Cause and Remedy at the bottom of a Failure Code Hierarchy.

Procrastination

Some put this effort off until later because of its complexity. As a result, they may be operational for years without any failure data. This is unfortunate, because it is very hard to go back in time and recover this data from narrative text fields.

The first step towards understanding this puzzle is to understand the different types of failure data. Some of it is structured (validated) and some is unstructured (text field). Both are needed for proper analysis; however, it is the failure mode that needs to be broken into 3 pieces:

  • Failed component
  • Component problem
  • Cause code
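
To make the three-piece structure concrete, here is a minimal Python sketch that stores the structured failure mode (failed component, component problem, cause code) alongside the unstructured remarks, then counts one asset's failures first by component and then by full failure mode. The field names, codes, and sample records are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class FailureMode:
    failed_component: str   # e.g. "SEAL"
    component_problem: str  # e.g. "LEAKING"
    cause_code: str         # e.g. "WEAR"

@dataclass
class FailureRecord:
    wonum: str
    assetnum: str
    mode: FailureMode       # structured (validated) data
    remarks: str = ""       # unstructured narrative, kept for context

# Hypothetical sample history.
records = [
    FailureRecord("2001", "PUMP-01", FailureMode("SEAL", "LEAKING", "WEAR"), "Seal weeping at gland"),
    FailureRecord("2002", "PUMP-01", FailureMode("SEAL", "LEAKING", "MISALIGNMENT")),
    FailureRecord("2003", "PUMP-01", FailureMode("BEARING", "NOISY", "LUBRICATION")),
    FailureRecord("2004", "FAN-07", FailureMode("BELT", "BROKEN", "WEAR")),
]

def drill_down(records: List[FailureRecord], assetnum: str) -> Tuple[Counter, Counter]:
    """Count one asset's failures by component, then by full failure mode."""
    mine = [r for r in records if r.assetnum == assetnum]
    by_component = Counter(r.mode.failed_component for r in mine)
    by_mode = Counter(r.mode for r in mine)  # FailureMode is frozen, so it is hashable
    return by_component, by_mode

by_component, by_mode = drill_down(records, "PUMP-01")
print(by_component.most_common())                       # [('SEAL', 2), ('BEARING', 1)]
print(by_mode[FailureMode("SEAL", "LEAKING", "WEAR")])  # 1
```

Keeping the three pieces as separate validated fields is what makes this kind of grouping possible; the same query against narrative text would require manual reading.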

The Failure Analytic is Based on What?

Speaking of analysis, the endgame is to make better decisions, not to build a hierarchy of codes. If you had thousands of assets, how do you know which one to focus on? How would you extract a bad actor list programmatically? I’ve listed some ideas below:

Extract those assets which have had

  • the most repair work performed, or
  • the highest cost to the asset, or
  • the most downtime, or
  • the smallest MTBF, or
  • asset criticality times asset condition
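
Each of these criteria is just a different sort key over the same asset statistics. The sketch below assumes hypothetical per-asset totals for one reporting period, a 1-to-5 criticality scale (5 = most critical), and a 1-to-5 condition scale (5 = worst condition); MTBF is computed as operating hours divided by the number of failures.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class AssetStats:
    """Hypothetical per-asset totals for one reporting period."""
    assetnum: str
    repair_wos: int       # corrective work orders
    cost: float           # total maintenance cost
    downtime_hours: float
    failures: int
    operating_hours: float
    criticality: int      # assumed scale: 1 (low) .. 5 (high)
    condition: int        # assumed scale: 1 (good) .. 5 (poor)

    @property
    def mtbf(self) -> float:
        """Mean time between failures: operating hours / number of failures."""
        return self.operating_hours / self.failures if self.failures else float("inf")

# metric name -> (sort key, sort descending?)
METRICS: Dict[str, Tuple[Callable[[AssetStats], float], bool]] = {
    "most_repairs": (lambda a: a.repair_wos, True),
    "highest_cost": (lambda a: a.cost, True),
    "most_downtime": (lambda a: a.downtime_hours, True),
    "smallest_mtbf": (lambda a: a.mtbf, False),
    "criticality_x_condition": (lambda a: a.criticality * a.condition, True),
}

def bad_actors(assets: List[AssetStats], metric: str, n: int = 10) -> List[str]:
    """Asset numbers of the top n offenders under the chosen metric."""
    key, descending = METRICS[metric]
    return [a.assetnum for a in sorted(assets, key=key, reverse=descending)[:n]]

# Made-up figures for illustration.
fleet = [
    AssetStats("PUMP-01", 12, 18000.0, 40.0, 6, 8000.0, 5, 4),
    AssetStats("FAN-07", 3, 2500.0, 5.0, 1, 8000.0, 2, 2),
    AssetStats("CHILLER-2", 7, 42000.0, 120.0, 2, 8000.0, 5, 3),
]
for metric in METRICS:
    print(metric, bad_actors(fleet, metric, n=2))
```

Even in this tiny example the lists differ by metric (CHILLER-2 leads on cost and downtime, PUMP-01 on repairs, MTBF, and risk), which is why the choice of metric matters.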

In other words, you need to think about the metric you will use to extract these bad actors. The metric I like the most is Average Annual Maintenance Cost divided by Replacement Cost. Once you get a top 10 list, then you need to be able to drill down on cause.
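
The cost ratio itself is simple arithmetic. A minimal sketch, assuming each asset has a total maintenance cost for every year in service (years with no maintenance should be recorded as zero, otherwise the average is overstated) and a known replacement cost; the asset numbers and figures are made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AssetCost:
    assetnum: str
    annual_maintenance_cost: Dict[int, float]  # year -> total maintenance cost (include zero years)
    replacement_cost: float

def cost_ratio(asset: AssetCost) -> float:
    """Average Annual Maintenance Cost divided by Replacement Cost."""
    years = asset.annual_maintenance_cost
    if not years or asset.replacement_cost <= 0:
        return 0.0  # not enough data to rank this asset
    aamc = sum(years.values()) / len(years)
    return aamc / asset.replacement_cost

def top_n(assets: List[AssetCost], n: int = 10) -> List[AssetCost]:
    """Bad-actor list: highest cost ratio first."""
    return sorted(assets, key=cost_ratio, reverse=True)[:n]

# Made-up figures for illustration.
assets = [
    AssetCost("PUMP-01", {2021: 6000.0, 2022: 9000.0, 2023: 12000.0}, 30000.0),
    AssetCost("CHILLER-2", {2021: 20000.0, 2022: 22000.0, 2023: 24000.0}, 400000.0),
    AssetCost("FAN-07", {2022: 500.0, 2023: 700.0}, 4000.0),
]
for a in top_n(assets):
    print(a.assetnum, round(cost_ratio(a), 3))
# PUMP-01 0.3
# FAN-07 0.15
# CHILLER-2 0.055
```

In this made-up data, CHILLER-2 has the highest annual cost but the lowest ratio, because it is also by far the most expensive asset to replace. Normalizing by replacement cost keeps large assets from dominating the top 10 list simply because of their size.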

There is one type of failure analysis that offers the greatest impact on reducing O&M costs, and it is shown in this picture. This design is further described in my book, Failure Modes to Failure Codes.

In addition, I am willing to visit your site to explain in detail exactly how this advanced design is implemented, or you can attend MaximoWorld 2019 and the 3-hour short course on Demanding Excellence from your Asset Management System.


Do You Have a CMMS Utilization Plan?

Most organizations struggle to capture actionable data inside the CMMS that can be used for analysis. But then again, maybe management never intended to. This all comes back to “how leadership plans to use the CMMS.” Without leadership involvement, the asset management design will suffer.

When questioned about advanced processes, some organizations quickly state that they don’t have the staff to perform advanced processes. The question then becomes, “What do you really expect from the CMMS?” Advanced processes have complexity, but they also offer the greatest return on investment.

  • You can have excellent procedures in place, but lack enforcement.
  • You can have excellent procedures in place, but lack staff or clear roles.
  • You can have staff in place, but lack vision. This includes understanding the endgame, with a roadmap to get there, coupled with an ongoing search for excellence. This also includes understanding advanced processes, which offer the greatest return on investment.

Key roles are identified in the graphic above, which match up to processes. Some are basic and some are more advanced. If your goal is to create and close work orders, then maybe you don’t need planners and schedulers. If your goal is to perform (chronic) failure analysis using the CMMS, then you will need supporting roles and processes.

Many organizations fail to create a utilization plan. This document takes the ISO 55000 SAMP (Strategic Asset Management Plan) one step further and documents how the CMMS can and should be used. Below are questions that may give insight into where you are versus where you want to be:

  • Is there an oversight Core Team? Do they perform benchmarking?
  • Is there missing foundation data?
  • Are you experiencing recurring failures? Are there high levels of reactive maintenance?
  • Is the maintenance backlog out-of-control?
  • Is work being performed without work orders? Are all actuals being captured?
  • Are you in charge of the assets, or, are they in charge of you?
  • Is tribal knowledge the only source for failure history? Can you run a report to identify bad actors and failure modes?
  • Do stakeholders have a strong understanding of asset management fundamentals?

Reliability Leaders Can Be Everywhere

The 4 Principles of a Reliability Leader

  1. Integrity — Doing the right thing when no one else is around.
  2. Authenticity — Being who you say you are.
  3. Responsibility — Following through on assigned actions.
  4. Aim — Pursuing goals greater than one’s self.

The above is from a Reliability Web CRL study guide. The point is that reliability is not only about assets or equipment. It is also about people at every level of the organization who seek continuous improvement in their individual performance as well as in processes, roles, and data.

Reliability Leaders Do the Following

  • Perform regular benchmarking activities (search for knowledge).
  • Acquire a common language for reliability through study.
  • Create a declaration of reliability so that others know where you stand.
  • Strive to take action to improve reliability.
  • Support defect elimination.
  • Help to empower others.

Example Roles Supporting Reliability

  • Senior Management sets goals for performance, identifies RCA projects, and manages the long-range plan.
  • Engineering is responsible for the design of (and often oversees the installation of) new equipment. Engineering also tracks asset performance, identifies likely failure modes, and tracks risk.
  • Purchasing procures new equipment, which, in turn, affects reliability.
  • Storeroom staff contribute to equipment reliability by ensuring that proper parts are available, staged, and properly managed inside the storeroom. Critical parts are categorized, and standardized descriptions support search.
  • Planners and schedulers focus on the right work at the right time based on criticality.
  • Operators are trained to discover failure modes. They also perform minor maintenance tasks, including condition-based monitoring. Operators report emergency/urgent repairs, but otherwise adhere to the schedule.
  • Maintenance staff perform scheduled work using the planned work instructions and adhere to job safety precautions. At work order completion, they also provide regular feedback by updating the CMMS with accurate information on PM-job plan accuracy, asset maintainability, design flaws, ergonomic issues, safety hazards, and energy inefficiencies. Plus, they can provide overall asset condition updates on monthly PMs.
  • The Reliability Action Team performs chronic failure analysis by extracting the top 10 worst performers using the CMMS failure analytic.
  • The Core Team manages the overall asset management information strategy by ensuring clear roles and processes, staff training, audits, and benchmarking activities. They ensure accurate data is in place to support analytical reports.
  • The Business Analyst works closely with the Core Team to capture feedback from the working level and minimize problems.
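
Focusing planners and schedulers on the right work “based on criticality” implies some ranking rule for the backlog, in the spirit of the risk-based prioritization scheme in step 9. One simple possibility, assumed here rather than taken from the text, is to multiply asset criticality by work order priority and break ties by age. The scales and work order numbers below are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class BacklogItem:
    wonum: str
    asset_criticality: int  # assumed scale: 1 (low) .. 5 (high)
    work_priority: int      # assumed scale: 1 (low) .. 5 (urgent)
    reported: date

    @property
    def rank(self) -> int:
        """Risk-based rank: asset criticality x work order priority."""
        return self.asset_criticality * self.work_priority

def rank_backlog(items: List[BacklogItem]) -> List[BacklogItem]:
    """Highest rank first; among equal ranks, the oldest work order first."""
    return sorted(items, key=lambda wo: (-wo.rank, wo.reported))

backlog = [
    BacklogItem("3001", 2, 3, date(2024, 5, 1)),
    BacklogItem("3002", 5, 4, date(2024, 5, 20)),
    BacklogItem("3003", 4, 5, date(2024, 5, 10)),
    BacklogItem("3004", 1, 5, date(2024, 4, 1)),
]
print([wo.wonum for wo in rank_backlog(backlog)])  # ['3003', '3002', '3001', '3004']
```

The same ranking can be applied to both the planning backlog and the scheduling backlog, so both roles work from one consistent definition of “most important.”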

How Culture and Strategy are Related

Communication is a Two-Way Street

The executive branch provides the vision, mission, and focus for an organization and defines upper-level goals and objectives. But it is up to lower-level staff to subdivide those goals into relevant tasks that are aligned with the company mission. Even before that, staff must clearly understand the objectives: what is expected, how it affects their role, and how it benefits the company. In turn, there must be trust in management that the path set forward is the optimal one for ensuring profitability and competitiveness.

Reactive, Planned, and Proactive Modes

Reactive: Even if the CMMS is operational, the organization could still be in “reactive mode.” The lower level may have already “given up” on leadership if:
  • The goals are unclear
  • The staff lacks precision maintenance skills
  • There is no planning/scheduling
  • There are no reliability leaders in place
  • There is no ability to leverage failure data in the CMMS to identify bad actors
Planned: The organization could be in a “planned state.” This means work orders have been planned, schedules have been created, but unplanned breakdowns are still occurring. In other words, the root cause of failure was never eliminated.

Proactive: In a “proactive mode,” goals are aligned, the staff feels empowered, feedback is routine, defects are being removed, and reliability leaders are able to identify bad actors and manage by exception.

Best-in-Class Organizations

The “best of the best” organizations are in “proactive mode.” They are able to sustain excellence using a variety of techniques, including autonomous maintenance, Kaizen sessions, work order feedback, chronic failure analysis, root cause analysis, ongoing benchmarking activities, and adherence to a long-range plan.