ITIL Service Management: ITIL Major Incident

Mar 24, 2011

ITIL Major Incident - All you want to know

What is a Major Incident in ITIL? What are the roles and responsibilities? How do you avoid common mistakes? What do you do after the resolution?

Trust me, I know what I'm doing!
Sledge Hammer

What is a Major Incident?
The definition of a Major Incident has to be clear to every employee in Service Support. It should therefore be described in a separate document, the Major Incident Procedure.

What makes a Major Incident? It is usually defined by the impact the outage has, or could have, on the customer’s business process. It may also be determined by the priority of the incident or by its urgency.

Why isn’t impact always the only factor in defining a Major Incident? Because a high-impact incident can sometimes be resolved by the Service Desk through a simple resolution procedure, such as resetting a switch after a network-down event or connecting a backup provider after an internet-down event.

Both examples are definitely high impact, but we don’t have to recruit a bunch of higher-level people for them just yet. We just have to keep in mind that they are Priority 1 and have to be resolved ASAP. If they can’t be resolved by the standard procedure, THEN they can be marked as Major and handled with the appropriate procedure and policy. That’s why most leading Incident Management tools on the market have a separate Major or Hot incident checkbox.

That was all theory. In practice, to simplify the procedure and make it easier for Service Desk staff, this is what I usually advise: all Priority 1 incidents are Major Incidents unless they are exceptions. Exceptions can easily be defined for particular customers, contracts and incident categories. For example: Major Incidents are all Priority 1 incidents except cash register tickets, which are urgent but can be fixed by technicians, with no need to involve more senior people. Or: all categories except end user incidents. Simple.
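
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the Incident class, the is_major function and the exception list are hypothetical names, not part of ITIL or of any particular Incident Management tool, and a real setup would probably key exceptions on customer and contract as well as category.

```python
from dataclasses import dataclass

# Hypothetical exception list, taken from the examples in the text:
# urgent tickets that technicians can fix without a Major Incident Team.
EXCEPTION_CATEGORIES = frozenset({"cash register", "end user"})


@dataclass
class Incident:
    priority: int   # 1 is the highest priority
    category: str   # incident category as recorded by the Service Desk


def is_major(incident: Incident,
             exceptions: frozenset = EXCEPTION_CATEGORIES) -> bool:
    """All Priority 1 incidents are Major unless their category is an exception."""
    return incident.priority == 1 and incident.category not in exceptions


# Usage: a Priority 1 network outage is Major, a Priority 1 cash register ticket is not.
network_outage = Incident(priority=1, category="network")
cash_register = Incident(priority=1, category="cash register")
print(is_major(network_outage), is_major(cash_register))
```

Keeping the exceptions in one list means the Service Desk only has to check priority and category, which is the whole point of simplifying the procedure.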

Major Incident Team

OK, now we have determined it’s a Major Incident. What next? We establish a Major Incident Team. Members are:

Service Desk Manager: responsible for communication with the resolution team and timely reporting to the customer.
Incident Manager: in reasonable service organizations the Incident Manager is usually also the Service Desk Manager. If not, these two have to work closely together.
Major Incident Manager: a frequent mistake is to promote the Incident Manager or Service Desk Manager to Major Incident Manager. This doesn’t have to cause serious conflicts of interest, but it can: he has to survive somewhere between Incident Management, Problem Management, Business Management and the customer.
The Major Incident Manager has to be a liaison between all internal parties involved and must also be well acquainted with the technical aspects of the outage. So he will often be recruited from among people formerly engaged in a project, or those involved in defining the service catalogue.
Problem Manager: remember him? He will be most helpful in the investigative phase and towards the closure phase, and a life saver in post mortem reporting. Better keep him on our side. Mind you, a Major Incident is still an incident, but it usually has some underlying cause which will be recognized as a Problem. Hence the Incident Manager and the Problem Manager have to work closely together here, each with his own goal in mind (service restoration vs. underlying cause).
Other members of the Major Incident Team: representatives of everyone involved, such as impacted users, competent technical staff and vendors. Good practice is to choose these people the same way you would choose ECAB (Emergency Change Advisory Board) members. There is always a chance that you will be implementing an Emergency Change during the Major Incident resolution process.

Resolution Process

Major Incident resolution works under tight SLA parameters, and the Service Desk takes care of them. The Service Desk also handles ticket updates and frequent feedback to customers. Remember, customers hate being kept in the dark: even if the news is bad (no progress), they must be updated frequently.

The Major Incident Procedure has to define the escalation policy in case of an SLA breach. Usually the incident is escalated vertically to higher-level IT / Business management and to the vendors of the services or equipment underpinning the service.
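
As a rough illustration of such a policy, here is a minimal Python sketch. The function names, the single resolution target and the flat list of parties are my assumptions for the example; a real procedure would take these from the SLA and the underpinning contracts.

```python
from datetime import datetime, timedelta

# Parties named in the text; treating them as one flat list is an assumption.
ESCALATION_PARTIES = (
    "higher-level IT management",
    "business management",
    "vendors",
)


def sla_breached(opened_at: datetime, now: datetime,
                 sla_target: timedelta) -> bool:
    """True once the incident has been open longer than the SLA target."""
    return now - opened_at > sla_target


def escalation_targets(opened_at: datetime, now: datetime,
                       sla_target: timedelta) -> list:
    """Return who should be notified: everyone on breach, nobody before it."""
    if sla_breached(opened_at, now, sla_target):
        return list(ESCALATION_PARTIES)
    return []


# Usage with a hypothetical 4-hour resolution target.
opened = datetime(2011, 3, 24, 9, 0)
target = timedelta(hours=4)
print(escalation_targets(opened, datetime(2011, 3, 24, 12, 0), target))
print(escalation_targets(opened, datetime(2011, 3, 24, 13, 30), target))
```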

After the battle

After the resolution, the Incident Management Team stays “on call” and monitors the service for the period defined by the Major Incident Manager. The Major Incident Manager also schedules a short team meeting for the next day.

The Incident Review is performed at this meeting: points for improvement and lessons learned are defined, and the Post Mortem Major Incident Report is created.

The Incident Manager sends the report to the customer.

I have prepared a Major Incident Report template for you, free to download here.

Related posts:

Incident Management Elements
Key elements of Incident management.

Incident Management Mind Map
Download the incident management mind map.

All About Incident Classification
How to deal with incident categories.

What Everyone Should Know About Incident Priority
Incident prioritization in ITIL.

Hope this helps. Have a nice day!

12 comments:

Steve@itilnews.com said...: Sledge
One point I would reiterate is that the Major Incident Process and Procedures should be documented and agreed / signed up to in advance by all involved, especially third party suppliers / vendors. In addition, the documents should be cross-referenced in any OLAs/SLAs that exist.
Some great Major Incident nuggets.
Steve@itilnews.com (www.itilnews.com); 30/3/11 02:28
Anonymous said...: I read positive reviews of your resource on the site (computer problems). I didn't even believe them, but now I have seen for myself. It turns out I wasn't deceived.

Interesting site! Everything is stylishly done.; 10/4/11 09:25
Information Technology said...: Good Day,

So, now that there's a Major Incident Manager, will a new Super Major Incident Manager role be introduced, next?

Correct me if I'm wrong but this all sounds like an argument to further bloat what many would already consider to be bloated IT Support with even more headcount, rather than getting the existing resources to work more efficiently and drive the size of support downward. Increasing the size of an organization that often acts as a budget cost center, in many enterprises, is rarely an answer that will be accepted with open arms.

One of the biggest issues that many Business Leaders have is the bloated size and, therefore, the slow response time of their IT support organizations. The goal is to minimize the non-value-add roles in IT and, while we all understand that support is important, looking for excuses to keep adding headcount to IT support is probably not the best answer. Remember, to Business Leaders who would rather focus their funds on more productive value-add activities, fat IT is bad IT. This is what drives so much outsourcing.

My personal opinion is that if your existing support manager is not already effectively acting as your Incident Manager, for all incidents including those that are major or minor, then you already have a problem that is bigger than needing an extra resource to be dedicated to Major Incidents. One example might be that your quality is too low and that your enterprise is, therefore, experiencing so many incidents that your current Incident Manager is too overworked to handle any incoming Major Incidents.

Remember, this bloated approach to IT is often one of the key reasons that many Business Leaders shy away from allowing their IT organizations to do things like implementing ITIL. The operating and cost models seem to encourage "big IT", rather than smaller, leaner, and faster IT.

The world is demanding more of a Dev/Ops mentality that nurtures the Dev side and works hard to drive down the Ops side. Remember, Ops is considered to be a necessary evil and, in theory, shouldn't exist if quality is perfect and no outside factors could influence your apps. However, we all know this is not possible and some form of Ops is always required. However, and while I personally can't speak for all Business Leaders, I think that based on experience it would be fair to say that most Business Leaders who are forced to spend on IT will prefer to spend on "delivery" (i.e. development & engineering) because it is viewed as more of a value add activity, as opposed to spending on support.

Again, there will always be a need for support. However, just like the world keeps pushing developers to have to work faster, leaner and meaner, I believe the same should be expected of support and I could be wrong but it looks like hiring, yet another incident manager, doesn't appear to go in the direction of 'leaner.'

Besides, if you're a smart organization, you already planned for how you will have efficiently used your existing resources to handle all forms of incidents, major or minor. It's not as if you leave Major Incidents out when you plan to set up your escalation and handling procedures, right?

I hope this adds value.

My Best,

Frank Guerino
Chairman
The International Foundation for Information Technology (IF4IT); 19/4/11 04:46
doctor said...: Hi Frank,

thanks for this strong but very elaborated opinion :)

I've read the article again and found the potential problem: maybe it is not clear enough that Major Incident Manager is recruited the same way as other members of the Major Incident Team - similar to ECAB members recruitment.

So, SD Manager and Incident Manager are real job names (they can be, and often they are one person). Major Incident Manager is a function related to a specific major incident.

We usually appoint the Technical Account Manager to this function, but it can also be the architect who was the project manager designing the service. In any case, it has to be the person who has the most knowledge about the customer and the related service.

In smaller IT orgs, it will of course be the Incident Manager.

Hope this clarifies it.

Have a good day!; 19/4/11 08:49
Information Technology said...: It does. Thanks, very much, for taking the time to respond and explain.

My Best,

Frank Guerino
Chairman
The International Foundation for Information Technology (IF4IT); 19/4/11 10:44
Anonymous said...: Hey - I am definitely happy to discover this. great job!; 23/5/11 15:06
Amelia @ IT Management said...: Very informative post, thanks.

This is giving us a peek at the bigger picture. Given the complex market conditions, an evaluation of IT infrastructure can help companies determine the efficiency of their operations. IT managers need to comprehensively control their IT assets by implementing solutions to address challenges.; 20/7/11 16:23
Amelia @ IT Management said...: This is a great post and very helpful, too. I like the Major Incident Report template.

I was about to make a comment regarding the structure of the management team but Information Technology has done a very good job of talking about it. I'm glad to know that a major incident team can be flexible and one person can hold multiple positions at once to save cost and time.; 20/8/11 08:06
ajmal said...: Thanks for this great post! really helpful; 17/1/12 17:41
Reema Kapoor said...: Yes, I found it beneficial for me. I think ITIL v3 Certification is very much mandatory for one’s career growth both financially and career growth path wise. You can also refer ITIL Certification . Major audience who should attend this consist majorly of IT, Telecom & Civil Sectors. All the best for your future endeavors.; 29/1/12 22:24
D.Matte said...: Good article.

You may be interested in these articles:

Who closes the major incident

Should a Major Incident be declared if it was resolved automatically by the fail-over

D.; 17/11/14 05:21
incident management report template said...: A well written article, good job. It would be useful, though, if there were a robust incident management template that could track and manage the major incident.

Regards,

Stuart; 9/7/17 01:02
