ITIL Problem Management Primer

Getting To The Root Of The Problem

What is Problem Management?

The objective of the Problem Management process, according to ITIL®, is: “To prevent problems and the resulting incidents from happening, to eliminate recurring incidents and to minimize the impact of incidents that cannot be prevented”.

However, many people I meet misunderstand the difference between an Incident and a Problem. A Problem is not a Major Incident or an outage; it’s simply “the underlying cause of one or more incidents”.

But what does this really mean? It means that Problem Management focuses on detecting and driving out the systemic issues that cause IT services to fall below service levels and expectations. It means looking at why incidents are occurring and getting to the root cause so that we can identify a permanent solution. Problem Management is about eliminating those causes and thereby preventing incidents from happening.

Why Problem Management?
Implementing the ITIL Problem Management process is one of the keys to improving overall IT service, since Problem Management focuses on driving out the recurring and systemic incidents that cause service outages and erode faith in the Service Desk. By concentrating on this discipline, you have the opportunity to improve all elements of your service, gain tangible business and operational benefits, and improve the customer’s perception of IT:

      • Reduced impact of incidents
      • Reduced number of recurring incidents; fewer incidents to handle
      • A greater percentage of incidents resolved by the first point of contact
      • Less frustration from handling recurring incidents
      • Greater ability to meet Service Level Agreements
      • A better quality of customer service with less downtime and fewer disruptions
      • Improved end-user satisfaction
      • A more cost-effective service

Root Cause: Ask WHY!
It amuses me when I encounter organizations that think they are performing Problem Management when their root cause analysis produces only workarounds and temporary solutions to address the incident. So, let’s talk about Root Cause Analysis. After all, it is at the heart of the discipline, and without fully understanding what Root Cause Analysis is, it’s hard to do Problem Management. A process that limits you to identifying workarounds, quick fixes, and circumventions isn’t real Problem Management.

There are quite a few Root Cause Analysis techniques to choose from, but one of the simpler and more popular is the 5 Whys.

The 5 Whys is a technique used in the Analyze phase of the Six Sigma DMAIC (Define, Measure, Analyze, Improve, Control) methodology. Basically, by repeatedly asking the question “Why” (typically five times), you peel away the layers of symptoms until you reach the root cause of the problem.

Start with a problem and ask “why” it is occurring. Make sure that your answer is grounded in fact, then ask “why” again. Continue the process until you reach the problem’s root cause and can identify a countermeasure that prevents it from recurring.
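
The chain of questions can be written down as a simple record. The Python below is only a sketch of the idea: the FiveWhys class, its method names, and the website scenario are all invented for illustration and are not part of ITIL or Six Sigma. It refuses any answer that comes without supporting evidence, which is one way to keep each step grounded in fact.

```python
from dataclasses import dataclass, field


@dataclass
class FiveWhys:
    """One 5 Whys chain for a single problem (hypothetical helper)."""
    problem: str
    chain: list = field(default_factory=list)  # (answer, evidence) pairs

    def ask_why(self, answer, evidence):
        # An answer only counts if it is grounded in fact.
        if not evidence.strip():
            raise ValueError("each answer needs supporting evidence")
        self.chain.append((answer, evidence))

    def root_cause(self):
        # The deepest answer so far is the candidate root cause.
        if not self.chain:
            raise ValueError("ask 'why' at least once first")
        return self.chain[-1][0]


# Invented scenario, for illustration only.
analysis = FiveWhys("The order website goes down every Monday morning")
analysis.ask_why("The web server runs out of disk space", "server monitoring alerts")
analysis.ask_why("Log files fill the disk over the weekend", "disk usage report")
analysis.ask_why("Old log files are never deleted", "log directory listing")
analysis.ask_why("Log rotation was never configured", "server configuration review")
analysis.ask_why("The server build checklist has no log rotation step", "build checklist")

print(analysis.root_cause())
```

In this made-up case, freeing disk space would only be a workaround; the countermeasure that stops the recurrence is fixing the build checklist.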

Make sure that these solutions are permanent and not band-aids. A problem management process whose outcome is a temporary quick fix that gets the customer back up and running and the incident closed is not true problem management. Go for permanent solutions so that you stop the bleeding and stop the incidents from recurring!

Once the root cause analysis has been completed and a solution has been identified, don’t forget that you should engage the Change Management process to get the root cause resolved. If not, you may (probably will) end up in a constant loop of problem, unauthorized change, more incidents, and further problems.

Problem Management Success Factors
There are many tips one can follow to ensure a successful Problem Management implementation, but these are the basics you want to have in place:

      • Make sure there is upper/executive management commitment and sponsorship. Make sure the organization understands why the Problem Management process is being implemented and that it is made a priority.
      • Identify and assign a Problem Manager with clearly defined roles and responsibilities. The Problem Manager is not necessarily a technical role, and the Problem Manager does not undertake problem investigation activities. Their primary responsibilities are administration, coordination, and facilitation.
      • Have a documented Problem Management Policy and Process. Make sure that everyone understands it.
      • Put in place appropriate process metrics to measure if you are meeting your objectives.
      • Create a Problem Management Team. Select the right people; get your A-TEAM in place. They will need to be persistent individuals with good skills in Problem Solving, Analysis, and Communication. Make sure the Problem Management team does not get dragged into Incident Management activities. I’ll expand on building a good team in a future blog post, so look out for it soon.
      • Focus on permanent solutions during the Root Cause Analysis phase: solutions that address the underlying cause of the incident(s) in such a way as to prevent the incident from occurring again!
      • Create a Knowledge Base that can be used by everyone, especially the Service Desk staff during Incident Management activities. The Problem Management team can document the details of Problems and their related workarounds / Known Errors and store them as Knowledge Articles in this repository.
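
To make the Knowledge Base idea concrete, here is a minimal Python sketch of how Service Desk staff might look up a Known Error while handling an incident. Everything in it is an assumption for illustration: the KnownError fields, the "PRB-" identifiers, the sample articles, and the naive keyword matching are invented, and a real ITSM tool would provide its own search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownError:
    """A Knowledge Article for one Problem (hypothetical structure)."""
    problem_id: str
    symptom_keywords: tuple  # lower-case words that describe the symptoms
    workaround: str


def find_known_errors(incident_summary, knowledge_base):
    # Naive matching: any shared word between the summary and the keywords.
    words = set(incident_summary.lower().split())
    return [ke for ke in knowledge_base if words & set(ke.symptom_keywords)]


# Invented sample articles, for illustration only.
knowledge_base = [
    KnownError("PRB-001", ("vpn", "timeout"), "Reconnect using the backup gateway"),
    KnownError("PRB-002", ("printer", "queue"), "Restart the print spooler"),
]

matches = find_known_errors("VPN drops with timeout after 10 minutes", knowledge_base)
for known_error in matches:
    print(known_error.problem_id, "->", known_error.workaround)
```

The workaround gets the user running again, while the link back to the Problem record keeps the permanent fix in view.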

To the Point
Implementing ITIL Problem Management allows us to view service outages and incidents with an eye on eliminating systemic issues. It demands that we review the incident or sets of related incidents to understand why they occurred. It asks us to find the underlying problems that caused the incidents so that we stop them from happening again and again.

Introducing ITIL Problem Management into your IT service management process will deliver huge benefits by:

    • Measurably increasing your service and application availability
    • Increasing the quality of customer service with less downtime and fewer disruptions
    • Reducing the cost of doing business

Author: Cristy Castano
Cristy Castano is an IT Service Management Consultant with Flycast Partners, Inc.  With over 22 years of working in the IT Service Management industry, Cristy has extensive knowledge of IT Service Management principles (ITILv3) and process improvement techniques.  If you think your organization can benefit from adopting ITSM / ITIL best practices, or if you’d like more help with Problem Management, then contact Flycast Partners and get a helping hand.