I will go through the main elements of the ITSM processes and give my comments as they come. I am working from the ITIL Quick Reference card I made in Excel while preparing for the ITIL Service Manager exam. I have been thinking lately about putting it on this site for download; please comment if you would like that.
Incident Management is usually the first ITIL process implemented in Support organizations.
ITIL Incident Definition: "any event which is not part of the standard operation of a service and which causes, or may cause, an interruption to, or a reduction in, the quality of that service"
So, an Incident is not a Problem, nor a Change Request. And not every Service Call is an incident. This seems very logical, but a lot of business organizations tend to forget it in practice from time to time. Example: a mail server is down. The Service Desk receives X phone calls. Is every call an incident, or is there only one incident, with all the rest being connected service calls? According to the above definition, there is only one event that interrupted the service, and therefore only one incident. Still, a lot of Incident Managers and ITIL theorists count them all as incidents.
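To make the mail server example concrete, here is a minimal Python sketch of "one incident, many linked service calls". The `Incident` class, the `register_call` helper and the idea of matching calls by service name are my own assumptions for illustration; a real tool would match calls to incidents in a smarter way.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """One service-affecting event, with every call about it attached."""
    service: str
    calls: list = field(default_factory=list)


def register_call(open_incidents: dict, service: str, caller: str) -> Incident:
    """Attach a call to the open incident for this service, creating it if needed.

    Matching by service name only is a simplifying assumption.
    """
    incident = open_incidents.setdefault(service, Incident(service))
    incident.calls.append(caller)
    return incident


# Three users phone in about the same outage (hypothetical callers).
open_incidents = {}
for caller in ["ann", "bob", "carol"]:
    register_call(open_incidents, "mail-server", caller)

print(len(open_incidents))                       # number of incidents: 1
print(len(open_incidents["mail-server"].calls))  # linked service calls: 3
```

Counted this way, the call volume still shows up in your reports, but the incident count reflects the number of events that actually interrupted a service.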
To restore normal service operation as quickly as possible and minimize the adverse impact on business operations.
No comment on this one, obviously. Reactive business is this IM.
- Incident Manager
This guy is usually the Service Desk manager. No conflict of interest here. He is the slave driver of the first and second line staff.
- First, second & third line support staff
Although mentioned in the same sentence, these people differ a lot. The first line are the least respected people in IT, and the last, the 3rd line, are usually holy cows.
|Picture: Incident Management process|
- Incident details sourced from Service Desk, networks or computer operations
- Configuration details from CMDB
What assets are involved, where are they, who is responsible for them and who uses them?
- Response from Incident matching against Problems and Known Errors
Are similar problems detected in Problem Management? If yes, then can we offer a Workaround or a Fix to the Customer?
- Resolution details
- Response on RFC to effect resolution for Incidents
If a Request for Change was initiated from an Incident, then the latter usually waits in "Pending RFC" status until the Change is implemented. Then you notify your Customer and resolve the incident.
- Incident detection and recording
One of the primary jobs of IM staff.
- Classification and initial support
The first and most important task of 1st line support is to put a name on an Incident, i.e. to categorize it properly. This will not only please the Management, but will also enable the escalation procedure to assign it to an adequate Queue or Assignment group in case it can't be solved on the first call.
- Investigation and diagnosis
This one is logical.
- Resolution and recovery
An incident can be resolved and "the service restored to its standard operation mode".
- Incident closure
A VERY good ITIL recommendation is this two-step incident closure. After the resolution, the Customer is notified, and only upon his confirmation is the incident CLOSED. This is good for both the customer and the IM people. Think why.
- Incident ownership, monitoring, tracking and communication
Wherever the incident is escalated, there is a person in IM that owns it and takes care of it. Incidents do not go to a fade-out.
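The two-step closure and the "Pending RFC" status mentioned above can be sketched as a small status machine. This is only an illustration: the status names and the allowed transitions are my assumptions, not an official ITIL list, and every tool defines its own.

```python
# Allowed status transitions; names and transitions are illustrative assumptions.
TRANSITIONS = {
    "NEW": {"IN_PROGRESS"},
    "IN_PROGRESS": {"PENDING_RFC", "RESOLVED"},
    "PENDING_RFC": {"IN_PROGRESS", "RESOLVED"},
    "RESOLVED": {"CLOSED", "IN_PROGRESS"},  # customer confirms, or reopens
    "CLOSED": set(),
}


def move(status: str, new_status: str) -> str:
    """Return new_status if the transition is allowed, else raise ValueError."""
    if new_status not in TRANSITIONS[status]:
        raise ValueError(f"cannot go from {status} to {new_status}")
    return new_status


# An incident that waits for a Change, gets resolved, then confirmed by the customer.
status = "NEW"
for step in ["IN_PROGRESS", "PENDING_RFC", "RESOLVED", "CLOSED"]:
    status = move(status, step)
print(status)
```

The point of the sketch is that there is no direct path from `IN_PROGRESS` to `CLOSED`: an incident has to pass through `RESOLVED`, where the customer gets a chance to confirm or to send it back.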
- RFC for Incident resolution
As we said, an incident can initiate an RFC.
- Updated Incident record (incl. resolution and/or Work-arounds)
- Resolved and closed Incidents
The main motive.
- Communication to Customers
- Management information (reports)
"This is what we did for you". Very important way of improving IM visibility.
For the business as a whole:
- Reduced business impact of Incidents by timely resolution, thereby increasing effectiveness
- The proactive identification of beneficial system enhancements and amendments
- The availability of business-focused management information related to the SLA
- Improved monitoring, allowing performance against SLAs to be accurately measured
- Improved management information on aspects of service quality
- Better staff utilization, leading to greater efficiency
- Elimination of lost or incorrect Incidents and service requests
- More accurate CMDB information (giving an ongoing audit while registering Incidents)
- Improved User and Customer satisfaction
Critical Success Factors
- Up-to-date CMDB
I have seen implementations of Incident Management without a real CMDB, even without a simple Inventory database. *sigh* Please, do not do that. Firefighting is probably your greatest pain during the implementation phase of IM, and it's easy to give in to the temptation of the Quick'n'Dirty methodology and implement a CMDB-less IM. The CMDB is a base function for all ITSM activities, and if you neglect this fact, you will suffer. If you are in a great hurry, at least implement a simple Inventory tool with some automatic discovery options, so you have something to build on later. It is OK to have an up-to-date CMDB for at least 70% of your CIs in the beginning.
- Knowledge Base
This is an interesting one. Knowledge was a major hype in the '90s. A number of methodologies and semi-religions were built around Knowledge Management, and some of them persisted due to a lack of better methods for managing collective knowledge. In practice, most IM people find the knowledge they need on the net, in vendor-specific databases or in old incidents. So, when you decide on an IM tool, do not fall for the usual Knowledge Management marketing mumbo-jumbo. Put yourself in your SD staff's shoes and think about how you would resolve various incidents. There are tools that handle knowledge better, with simple workflows, and that integrate easily with vendors' knowledge bases. Implement one of these or none at all; it doesn't matter much in the beginning.
- Automated Incident Management system
Tools? A lot of them out there. Hope that I will have time to review major ones here.
- Close link to SLM for SLA targets
Yess! Keep in mind what you agreed with your customers, and make all the people aware of the Service Level Agreement targets.
- Numbers of Incidents
- Mean time to resolution
- Percentage within agreed response time
- Average cost per Incident
- Percentage closed by Service Desk
- Incidents per SD workstation
- Percentage resolved without a visit
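A few of the metrics listed above are easy to compute once your incidents are properly recorded. Here is a minimal Python sketch; the records, the field names (`opened`, `resolved`, `responded_in_time`, `closed_by_service_desk`) and the `percentage` helper are all made up for illustration, so map them to whatever your tool actually exports.

```python
from datetime import datetime
from statistics import mean

# Hypothetical incident records; field names and values are assumptions.
incidents = [
    {"opened": datetime(2024, 1, 1, 9, 0), "resolved": datetime(2024, 1, 1, 10, 0),
     "responded_in_time": True, "closed_by_service_desk": True},
    {"opened": datetime(2024, 1, 1, 11, 0), "resolved": datetime(2024, 1, 1, 14, 0),
     "responded_in_time": False, "closed_by_service_desk": True},
    {"opened": datetime(2024, 1, 2, 8, 0), "resolved": datetime(2024, 1, 2, 10, 0),
     "responded_in_time": True, "closed_by_service_desk": False},
    {"opened": datetime(2024, 1, 2, 13, 0), "resolved": datetime(2024, 1, 2, 15, 0),
     "responded_in_time": True, "closed_by_service_desk": False},
]


def percentage(records, flag):
    """Share of records (in percent) where the boolean field `flag` is True."""
    return 100.0 * sum(1 for r in records if r[flag]) / len(records)


number_of_incidents = len(incidents)
# Mean time to resolution, in hours.
mttr_hours = mean(
    (r["resolved"] - r["opened"]).total_seconds() / 3600 for r in incidents
)
pct_in_time = percentage(incidents, "responded_in_time")
pct_closed_by_sd = percentage(incidents, "closed_by_service_desk")

print(number_of_incidents)  # 4
print(mttr_hours)           # 2.0
print(pct_in_time)          # 75.0
print(pct_closed_by_sd)     # 50.0
```

Average cost per Incident and Incidents per SD workstation follow the same pattern, as long as you have the cost and staffing figures to divide by.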
A lot has been said on Incident Management, a lot of good stuff can be found on the net, and a lot of discussions on newsgroups. Keep Googling, and if I get the time, I will review a few sites and share my thoughts and links with you these days.