Incident Management

Incident Management and Problem Management are intimately related within the context of the ITIL framework. On the surface, these two processes may seem very similar; however, they work to achieve very different objectives within an organization. One of the purposes of the ITIL framework was to establish a common lexicon, an IT vernacular in which terms and definitions are standardized to support common understanding and communication.

To begin differentiating between these two processes, it’s beneficial to understand the definitions of both incidents and problems. An incident is any unplanned interruption to a service or reduction in the quality of a service. To define it more broadly, an incident is anything that can affect IT services. A problem, on the other hand, is the cause of one or more incidents, and that root cause is often not understood at first.

The purpose of Incident Management is ALWAYS to restore normal service operation as quickly as possible. Minimizing the impact on business operations is the primary function of Incident Management, whereas Problem Management is concerned with eliminating recurring incidents and minimizing the impact of incidents that cannot be prevented. Workarounds are acceptable in the incident management process because the goal is to restore service to your customers as quickly as possible. The problem management process begins when an organization has compiled several incidents that seem to be related to the same issue (the problem), warranting an investigation into the root cause.

Since Incident Management’s primary objective is to restore service to normal operations as quickly as possible, it’s imperative that organizations agree on timescales for handling incidents, based on the incident response and resolution targets outlined in their SLAs (Service Level Agreements). All support groups need to understand and adhere to these timescales to promote consistency throughout the incident management process.

Incidents often require additional resources to be resolved within the timescales outlined by your SLAs. The ITIL framework defines these activities as escalations and further differentiates between two primary types of escalation: functional and hierarchical. If the Service Desk is unable to resolve an incident, it’s crucial that the incident is escalated immediately to the appropriate support group so that you remain compliant with your SLAs. This is known as a functional escalation. If an incident is serious enough, it may be prudent to involve IT managers. This is known as a hierarchical escalation and is typically used for high-priority incidents or for situations with lengthy investigation and diagnosis.
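The two escalation types can be sketched as a small decision function. This is only an illustration of the rules described above, not an ITIL-defined algorithm: the Incident fields, the priority scale (1 is highest) and both thresholds are hypothetical, and real values would come from your own SLAs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Incident:
    priority: int                   # assumed scale: 1 is the highest priority
    minutes_open: int               # time spent so far on investigation and diagnosis
    service_desk_can_resolve: bool  # False when the Service Desk lacks the skills or access


# Hypothetical thresholds for illustration only.
HIGH_PRIORITY_MAX = 2
LENGTHY_INVESTIGATION_MINUTES = 240


def escalations(incident: Incident) -> List[str]:
    """Return the escalation types an incident currently warrants."""
    needed = []
    # Functional: hand the incident to a support group that can resolve it.
    if not incident.service_desk_can_resolve:
        needed.append("functional")
    # Hierarchical: involve IT managers for serious or long-running incidents.
    if (incident.priority <= HIGH_PRIORITY_MAX
            or incident.minutes_open >= LENGTHY_INVESTIGATION_MINUTES):
        needed.append("hierarchical")
    return needed


if __name__ == "__main__":
    outage = Incident(priority=1, minutes_open=10, service_desk_can_resolve=False)
    print(escalations(outage))
```

Note that the two checks are independent: an incident can warrant a functional escalation, a hierarchical escalation, both, or neither.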

To understand where each escalation is warranted, it’s prudent to understand the nature of priority. First and foremost, your support team, not your customer, should determine the priority level of every incident. If your support techs had a dollar for every time a customer told them that their incident was the top priority, they’d probably be enjoying a comfortable retirement right now. Instead, ITIL defines priority as a function of both impact and urgency. Impact is a measure of the effect of an incident, problem or change on the business. Urgency is a measure of how long it will take an incident, problem or change to adversely impact the business. In this way, priority is used both to identify the relative importance of an incident, problem or change as it relates to the business as a whole and to define the timescale for the actions necessary to correct the issue.
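One way to make "priority as a function of impact and urgency" concrete is a simple lookup. The sketch below assumes a three-level scale for both inputs and a five-level priority, a mapping often used for illustration; the scale, the formula and the resolution targets are assumptions, and each organization defines its own matrix and SLA targets.

```python
from enum import IntEnum


class Level(IntEnum):
    """Assumed rating scale for impact and urgency (1 is the most severe)."""
    HIGH = 1
    MEDIUM = 2
    LOW = 3


def priority(impact: Level, urgency: Level) -> int:
    """Combine impact and urgency into a priority from 1 (highest) to 5 (lowest)."""
    return int(impact) + int(urgency) - 1


# Hypothetical resolution targets in hours; real values come from the SLA.
TARGET_RESOLUTION_HOURS = {1: 1, 2: 8, 3: 24, 4: 48, 5: 120}


if __name__ == "__main__":
    # Print the full impact/urgency matrix with its resolution target.
    for impact in Level:
        for urgency in Level:
            p = priority(impact, urgency)
            print(f"impact={impact.name:<6} urgency={urgency.name:<6} "
                  f"priority={p} target={TARGET_RESOLUTION_HOURS[p]}h")
```

Because the support team rates impact and urgency separately, a customer’s insistence that an incident is "top priority" changes nothing unless one of the two inputs actually changes.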