Each minute of our life is a lesson, but most of us fail to read it. I thought I would write down my daily lessons, along with the lessons I have learned by watching the people around me, so that they may be useful to you and serve as memories for me.
Article by Karthik.
RCA (Root Cause Analysis) is a method of analyzing incidents or defects to identify their cause. We brainstorm, read up on and dig into the incident or defect to determine whether it was due to a “testing miss”, a “development miss” or a “requirement or design miss”. RCA is simply about determining, very specifically, the when, the where, and the why of a problem at its source, before it can ripple out and affect the end user of an application or website a second time.
Doing RCA accurately helps prevent defects in later releases or phases. If we find that a defect was due to a design miss, we can review the design documents and take appropriate measures. Similarly, if we find that a defect was due to a testing miss, we can review our test cases or metrics and update them accordingly.
Root cause analysis works like a chain of events traced backward, from the last possible action to the one before it and so on, until you reach the start of the problem and the exact point at which the defect was introduced. This backward tracing is sometimes called reverse engineering. There are four major questions we need to ask: WHAT, WHY, WHEN and HOW. With these questions, we can dig into each phase of the software life cycle to track exactly where the defect originated and the point at which it was injected into the system.
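To make the backward walk concrete, here is a minimal Python sketch. It assumes every event has already been linked to the single event that directly caused it; the `caused_by` map, the `trace_back` helper and the checkout scenario are all hypothetical. Real incidents often have several contributing causes, which this simple chain does not model.

```python
from typing import Dict, List, Optional


def trace_back(symptom: str, caused_by: Dict[str, str]) -> List[str]:
    """Walk backward from the observed symptom to the earliest known cause.

    Returns the chain ordered from symptom to origin. The walk stops when an
    event has no recorded cause (treated as the origin) or a cycle is found.
    """
    chain = [symptom]
    seen = {symptom}
    current: Optional[str] = caused_by.get(symptom)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = caused_by.get(current)
    return chain


# Hypothetical incident: each event maps to the event that directly caused it.
caused_by = {
    "customer sees wrong total at checkout": "discount applied twice",
    "discount applied twice": "discount function called from two code paths",
    "discount function called from two code paths":
        "design did not define where discounts are applied",
}

chain = trace_back("customer sees wrong total at checkout", caused_by)
for step, event in enumerate(chain):
    print(f"{step}: {event}")
print("Injected at:", chain[-1])
```

In this made-up example the walk ends at a design decision, which is the kind of “design miss” the article describes.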
Goals: The primary goal of using RCA is to analyze problems or events to identify:
- What happened
- How it happened
- Why it happened, so that actions to prevent recurrence can be developed
Benefits: Implementing RCA will help the organization:
- Identify barriers and the causes of problems, so that permanent solutions can be found.
- Develop a logical approach to problem-solving, using data that already exists in the organization.
- Identify current and future needs for organizational improvement.
- Establish repeatable, step-by-step processes, in which one process can confirm the results of another.
Principles: A few principles underpin effective RCA:
- Focusing on corrective measures for root causes is more effective than simply treating the symptoms of a problem or event.
- RCA is most effective when carried out through a systematic process, with conclusions backed up by evidence.
- There is usually more than one root cause for a problem or event.
- The investigation should focus on WHY the event occurred, not on who made the error.
Roots: Root cause analysis is not a one-size-fits-all methodology. There are many different tools, processes, and philosophies for accomplishing RCA. In fact, it grew out of the need to analyze a variety of enterprise activities, such as:
- Accident analysis and occupational safety and health
- Quality control
- Business process efficiency
- Engineering and maintenance failure analysis
- Various systems-based processes, including change management and risk management
Applying RCA: Examples of events where RCA is used to solve problems and provide preventive actions include:
- Major accidents
- Everyday incidents
- Minor near-misses
- Human errors
- Maintenance problems
- Medical mistakes
- Productivity issues
- Manufacturing mistakes
- Environmental releases
- Risk analysis, risk mapping
Basic method: A basic RCA follows these steps:
- Define the problem.
- Gather information, data and evidence.
- Identify all issues and events that contributed to the problem.
- Determine root causes.
- Identify recommendations for eliminating or mitigating the recurrence of problems or events.
- Implement the identified solutions.
The aim of RCA is to identify all the factors that contributed to a problem or event, of which there are often several. This is most effectively accomplished through a structured analysis method. Some methods used in RCA include:
- The “5-Whys” Analysis — A simple problem-solving technique that helps users get to the root of a problem quickly. It was popularized in the 1970s by the Toyota Production System. The strategy involves looking at a problem and asking “why?” and “what caused this problem?”. Often the answer to the first “why” prompts a second “why”, and so on, which provides the basis for the “5-why” analysis.
- Barrier Analysis — Investigation or design method that involves the tracing of pathways by which a target is adversely affected by a hazard, including the identification of any failed or missing countermeasures that could or should have prevented the undesired effect(s).
- Change Analysis — Looks systematically for possible risk impacts and appropriate risk management strategies in situations where change is occurring. This includes situations in which system configurations are changed, operating practices or policies are revised, new or different activities will be performed, etc.
- Causal Factor Tree Analysis — An investigation and analysis technique used to record and display, in a logical, tree-structured hierarchy, all the actions and conditions that were necessary and sufficient for a given consequence to have occurred.
- Failure Mode and Effects Analysis — A systems-engineering process that examines the ways in which products or processes can fail (failure modes) and the effects of those failures.
- Fish-Bone Diagram or Ishikawa Diagram — Derived from the quality management process, it’s an analysis tool that provides a systematic way of looking at effects and the causes that create or contribute to those effects. Because of the function of the fishbone diagram, it may be referred to as a cause-and-effect diagram. The design of the diagram looks much like the skeleton of a fish—hence the designation “fishbone” diagram.
- Pareto Analysis — A statistical decision-making technique used to select the limited number of tasks that produce a significant overall effect. The premise is that roughly 80% of problems are produced by a few critical causes (roughly 20%).
- Fault Tree Analysis — The undesired event is placed at the root (the top event) of a “tree of logic”. Each situation that can cause the effect is added to the tree as a series of logic expressions, typically combined through AND and OR gates.
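A fault tree can be evaluated mechanically once the basic events are known. The Python sketch below is a minimal illustration under stated assumptions: the `Gate` class, the `occurs` function and the two-branch tree are hypothetical, only AND and OR gates are supported, and no failure probabilities are computed (a full quantitative fault tree analysis would do that).

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Gate:
    """A logic gate in the fault tree: "AND" needs every input, "OR" needs any one."""
    kind: str
    inputs: List["Node"]


# A node is either a gate or the name of a basic event.
Node = Union[Gate, str]


def occurs(node: Node, events: Dict[str, bool]) -> bool:
    """Return True if `node` occurs, given which basic events happened.

    Basic events missing from `events` are treated as not having happened.
    """
    if isinstance(node, str):
        return events.get(node, False)
    results = [occurs(child, events) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind}")


# Hypothetical top event: a defect reaches production.
top = Gate("OR", [
    Gate("AND", ["requirement ambiguous", "no requirement review"]),
    Gate("AND", ["coding error", "no test case for the scenario"]),
])

print(occurs(top, {"coding error": True, "no test case for the scenario": True}))  # True
print(occurs(top, {"coding error": True}))  # False: a test case would have caught it
```

Each AND branch reads as “this cause only leads to the top event if the matching safeguard is also missing”, which mirrors how the tree records the logic behind the event.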
Based on the results of RCA, you can determine which phases have problem areas. For example, if you find that most defects are traced to requirement misses, you can improve the requirement gathering and understanding phase by introducing more reviews or walk-through sessions.
Similarly, if you find that most defects are due to testing misses, you need to improve the testing process. You can introduce metrics such as requirement traceability metrics or test coverage metrics, or keep a check on the review process or any other step that you feel would improve the efficiency of testing. It is the responsibility of the entire team to sit together, analyze the defects, and contribute to product and process improvement.
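One way to see which phase dominates is to tally the RCA outcomes and apply the Pareto idea from the methods list. The Python sketch below assumes each analyzed defect has been tagged with a single root-cause category; the category labels and counts are hypothetical, and the 80% cut-off is the conventional rule of thumb rather than a fixed rule.

```python
from collections import Counter
from typing import List, Tuple


def pareto(causes: List[str], threshold: float = 0.8) -> List[Tuple[str, int, float]]:
    """Rank root-cause categories by frequency and keep the "vital few".

    Returns (category, count, cumulative share) rows, most frequent first,
    stopping at the first category whose cumulative share reaches `threshold`.
    """
    ranked = Counter(causes).most_common()
    total = sum(count for _, count in ranked)
    rows = []
    running = 0
    for category, count in ranked:
        running += count
        share = running / total
        rows.append((category, count, share))
        if share >= threshold:
            break
    return rows


# Hypothetical RCA outcomes for 20 defects.
causes = (["requirement miss"] * 9 + ["testing miss"] * 6
          + ["development miss"] * 3 + ["design miss"] * 2)

rows = pareto(causes)
for category, count, share in rows:
    print(f"{category:<18} {count:>3}  cumulative {share:.0%}")
```

With these made-up numbers, requirement misses alone account for 45% of the defects, which would point toward strengthening requirement reviews first.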
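A requirement traceability check can be as simple as confirming that every requirement is linked to at least one test case. The Python sketch below assumes such links are already recorded somewhere; the `traceability` helper and the REQ-/TC- identifiers are hypothetical, and treating an empty requirement list as fully covered is an arbitrary choice.

```python
from typing import Dict, List, Set, Tuple


def traceability(requirements: List[str],
                 tests: Dict[str, Set[str]]) -> Tuple[float, List[str]]:
    """Return (coverage, uncovered) for a requirement traceability check.

    `tests` maps each test case ID to the requirement IDs it verifies.
    Coverage is the fraction of requirements traced to at least one test.
    """
    covered: Set[str] = set()
    for req_ids in tests.values():
        covered |= req_ids
    uncovered = [req for req in requirements if req not in covered]
    if not requirements:
        return 1.0, []
    return (len(requirements) - len(uncovered)) / len(requirements), uncovered


# Hypothetical requirement and test case IDs.
requirements = ["REQ-1", "REQ-2", "REQ-3", "REQ-4"]
tests = {
    "TC-1": {"REQ-1"},
    "TC-2": {"REQ-1", "REQ-2"},
    "TC-3": {"REQ-4"},
}

coverage, uncovered = traceability(requirements, tests)
print(f"Requirement coverage: {coverage:.0%}")  # Requirement coverage: 75%
print("Untested requirements:", uncovered)      # Untested requirements: ['REQ-3']
```

Listing the uncovered requirements, not just the percentage, gives the team a concrete place to start when RCA keeps pointing at testing misses.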
Please feel free to share your story, and any lessons you have learned, experienced or come across in your life, in the comments below.