Incident Management vs. Problem Management
I was talking to an executive at a Fortune 500 company who was struggling to put some SLAs in place for his IT organization. He wanted to show the business side that he was making operational improvements. One of the things he was struggling with was the Incidents and Problems that folks were capturing. He could not get his team to look at these separately, and he was talking about how Incident Management is related to Problem Management and how the two should be different.
I thought I would capture my point of view below. Whether you use ITIL terminology or not, here’s the difference:
- An Incident is any event that is not part of the standard operation of a service and that causes an interruption to, or a reduction in, the delivery of that service. Incident Management is concerned with restoring service to a user as quickly as possible whenever an incident occurs.
- A Problem is generally an application-related occurrence or event that causes some level of disruption to normal client business operations. For example, incidents with a major impact or of a repetitive nature may have an underlying problem. Problem Management determines the root cause of problems, identifies interim workarounds if available, and implements long-term solutions to prevent their recurrence or mitigate their impact.
An incident is like a car getting into a fender bender. A problem is like the timing belt not having been replaced for some time :-)
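To make the distinction concrete, here is a minimal Python sketch of how a team might flag candidate problems from its incident log, using the two signals mentioned above: repetition and major impact. The `Incident` fields, the `candidate_problems` helper, and the threshold of three repeats are all my own assumptions for illustration, not part of ITIL or any particular service management tool.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Incident:
    ticket_id: str
    service: str                # hypothetical field: the affected service
    major_impact: bool = False  # hypothetical flag set during prioritization


def candidate_problems(incidents, repeat_threshold=3):
    """Return services whose incidents hint at an underlying problem.

    A service qualifies if it has at least `repeat_threshold` incidents
    (repetitive nature) or any incident with major impact.
    The threshold is an assumed value, not a standard.
    """
    counts = Counter(i.service for i in incidents)
    major = {i.service for i in incidents if i.major_impact}
    return sorted(s for s in counts if counts[s] >= repeat_threshold or s in major)


incidents = [
    Incident("INC-1", "email"),
    Incident("INC-2", "email"),
    Incident("INC-3", "email"),
    Incident("INC-4", "payroll", major_impact=True),
    Incident("INC-5", "vpn"),
]
print(candidate_problems(incidents))  # ['email', 'payroll']
```

Each incident still gets restored on its own ticket; the flagged services are simply where a separate root cause investigation is likely worth opening.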
The typical process steps for Incident Management are as below:
- Detect & Record – the Incident / Service Request is detected and recorded in the service management tool
- Categorize & Prioritize – the appropriate ticket category is selected, and the impact and severity of the incident determine the priority, which is agreed with the user
- Provide Initial Service Request Support – wherever possible the request is fulfilled at the time of capture; otherwise it is routed to an appropriate support group. Security requests are routed to the Security Group and the Security Management process
- Provide Initial Incident Support – wherever possible the incident is resolved at the time of capture; otherwise it is routed to an appropriate support group
- Receive & Accept – the receiving support group confirms their ownership and initiates resolution of the incident or fulfilment of the service request in accordance with the priority service levels
- Investigate & Diagnose – knowledge bases are accessed to determine whether there is a related open or closed incident or problem, or a similar request. Incidents are assessed to identify a solution or workaround, and service requests are assessed to determine how to fulfil them
- Resolve / Fulfil – the solution is developed and confirmed as acceptable by the user before it is implemented and verified in production, which may invoke Change Management; if the incident is thought to be caused by an underlying problem, the Problem Management process is invoked
- Close – the ticket is updated with the resolution results, and positive confirmation from the user is recorded before the ticket is closed
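The steps above can be sketched as a simple state machine. The Python below is only an illustration under my own assumptions: the `Step` names, the 1-to-3 scales for impact and severity, the additive priority formula, and the shortcut from initial support straight to Close when a ticket is resolved at capture are all hypothetical choices, not something prescribed by ITIL or by any specific tool.

```python
from enum import Enum


class Step(Enum):
    DETECT_RECORD = 1
    CATEGORIZE_PRIORITIZE = 2
    INITIAL_SUPPORT = 3
    RECEIVE_ACCEPT = 4
    INVESTIGATE_DIAGNOSE = 5
    RESOLVE_FULFIL = 6
    CLOSE = 7


def priority(impact, severity):
    """Combine impact and severity (1 = highest, 3 = lowest) into a priority.

    Assumed formula: 1 is the most urgent priority, 5 the least.
    """
    if not (1 <= impact <= 3 and 1 <= severity <= 3):
        raise ValueError("impact and severity must be between 1 and 3")
    return impact + severity - 1


def next_step(step, resolved_at_capture=False):
    """Return the step that follows `step` in the workflow."""
    if step is Step.CLOSE:
        raise ValueError("a closed ticket has no next step")
    # Assumption: a ticket resolved at capture skips routing and goes to Close.
    if step is Step.INITIAL_SUPPORT and resolved_at_capture:
        return Step.CLOSE
    return Step(step.value + 1)


def walk(resolved_at_capture=False):
    """List every step a ticket passes through, from detection to closure."""
    step = Step.DETECT_RECORD
    path = [step]
    while step is not Step.CLOSE:
        step = next_step(step, resolved_at_capture)
        path.append(step)
    return path


print([s.name for s in walk(resolved_at_capture=True)])
print(priority(impact=1, severity=2))
```

Modelling the workflow this way makes it easy to report on where tickets spend their time, which is the kind of evidence an SLA conversation with the business usually needs.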