
Incident Management Best Practices

Written by Charlotte D’Alfonso | Nov 1, 2022 8:50:00 AM

A company's users cannot access their reports. A company's website is down for 40% of its users. A new firewall rule causes integration with a channel partner to fail. A user cannot change their address in their profile. Incidents don't just impact your users. Your bottom line also takes a hit through lost data, lost employee time, and lost revenue. What is going on, and how do we stop it? The answer begins with a strong incident management process.

What is incident management?

What are incidents? Incidents are unplanned events that disrupt or reduce the quality of your service (or threaten to do so).

A major incident is a critical disruption to a service that requires an emergency response. It has high impact and takes many people to resolve. A minor incident has low impact and can be resolved by a front-line customer service agent.
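To make the major/minor distinction concrete, here is a minimal sketch of a classification rule. The `Incident` fields and the 25% threshold are assumptions chosen for illustration; they do not come from any standard or tool. Anything that does not meet the definition of a minor incident is treated as major, so that doubtful cases are escalated rather than missed.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    summary: str
    affected_user_pct: float    # share of users affected, 0-100 (hypothetical field)
    frontline_resolvable: bool  # can a front-line agent resolve it alone? (hypothetical field)


# Assumed cut-off for "high impact"; tune this to your own service.
MAJOR_IMPACT_THRESHOLD_PCT = 25.0


def classify(incident: Incident) -> str:
    """Return "minor" only for low-impact, front-line-resolvable incidents."""
    high_impact = incident.affected_user_pct >= MAJOR_IMPACT_THRESHOLD_PCT
    if high_impact or not incident.frontline_resolvable:
        return "major"
    return "minor"
```

Under these assumptions, a website outage affecting 40% of users classifies as major, while a single user who cannot change their address, and whom a service desk agent can help, classifies as minor.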

Incident management is the process of responding immediately when something goes wrong and restoring service to its operational state. This is one of the core IT Service Management guiding practices. Effective incident management requires a strong team culture, an incident management guiding practice and tools such as Atlassian's Jira Service Management and Opsgenie integration. These tools are purpose-built to address many of the challenges of incident management, which include:

  • Frequency of major incidents and outages
  • Multiple ticketing, monitoring, and communication systems, which can prevent effective automation, cause data loss, and make it difficult to learn from the incident
  • Alert overload potentially leading to long and undetected outages
  • Configuration management difficulties leading to long diagnostic cycles
  • Poor communication and visibility

Best Practices and Incident Management Life Cycle

  • Have a single source of truth.
  • Follow a process.
  • Utilize a workflow where you can put safeguards around each step.
  • Have a response team designated in advance so work is delegated to the right person/people.
  • Automate any activities, notifications, and alerts that will help shorten the process.
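The last two practices, a response team designated in advance and automated notifications, can be sketched together. The roster, role names, and the `send` callback below are all hypothetical; in practice a tool such as Opsgenie would hold the on-call schedule and deliver the messages.

```python
from typing import Callable, Dict, List

# Hypothetical roster, agreed on before any incident happens.
RESPONSE_TEAMS: Dict[str, List[str]] = {
    "major": ["incident-commander", "on-call-engineer", "communications-lead"],
    "minor": ["service-desk-agent"],
}


def notify_responders(severity: str, summary: str,
                      send: Callable[[str, str], None]) -> List[str]:
    """Message every pre-designated responder for this severity.

    Unknown severities fall back to the major roster (an assumption made
    here so that nothing goes unanswered). Returns who was notified.
    """
    responders = RESPONSE_TEAMS.get(severity, RESPONSE_TEAMS["major"])
    for person in responders:
        send(person, f"[{severity.upper()}] {summary}")
    return responders
```

Passing the delivery mechanism in as `send` keeps the routing rule separate from the channel, so the same roster can drive chat, email, or paging.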

The Lifecycle, with Tips to Improve

  1. Detect – Use monitoring and alerting tools that automatically detect an incident and inform your team before your customers even notice.
  2. Classify and Respond – Assess the incident's impact and classify it so that the appropriate team can respond. Prioritizing and categorizing incidents as major or minor lets you escalate immediately to the right people when an incident needs a swarm of people to tackle it.
  3. Communicate – Communicating quickly and regularly about incidents helps to build trust with customers. Automating communications can deliver a consistent message.
  4. Investigate and Diagnose – Leverage a Configuration Management Database (CMDB) for a faster resolution. A CMDB helps the response team understand the interdependencies and relationships within your IT infrastructure. Knowing these not only allows you to better diagnose potential causes of the incident but also to correct any domino effects of the incident. Set up an internal communication channel so that your response team can work together.
  5. Learn and Improve – Determine what actions were taken to mitigate and resolve the incident and what can be done to prevent similar incidents in the future. This review is called an "incident postmortem" or "post-incident review." It is also where you can identify service improvements and better ways of working across teams.
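The "domino effects" mentioned in the Investigate and Diagnose step can be illustrated with a small dependency walk. This is a sketch under stated assumptions: the `DEPENDENTS` map is a made-up CMDB extract, and a real CMDB would expose these relationships through its own data model or API.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical CMDB extract: each item maps to the items that depend on it.
DEPENDENTS: Dict[str, List[str]] = {
    "firewall": ["partner-integration", "web-frontend"],
    "database": ["reporting-service", "profile-service"],
    "reporting-service": ["web-frontend"],
    "profile-service": ["web-frontend"],
}


def impacted_by(failed_item: str, dependents: Dict[str, List[str]]) -> Set[str]:
    """Breadth-first walk returning every item downstream of the failed one."""
    impacted: Set[str] = set()
    queue = deque([failed_item])
    while queue:
        current = queue.popleft()
        for child in dependents.get(current, []):
            if child not in impacted:  # skip items already visited
                impacted.add(child)
                queue.append(child)
    return impacted
```

With this sample data, a database failure reaches the reporting service, the profile service, and the web front end, which tells the response team where to look for knock-on problems once the root cause is fixed.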

Conclusion

How do you completely eliminate future incidents? You don't! Trying to do so will slow your organization down. It will add complexity and too many checks to your software development process. The goal instead is to resolve incidents quickly and reduce future incidents by continuously learning and improving. Want to learn how to modernize your IT operations, facilitate collaboration, and deliver new services with agility? Download Praecipio's Blueprint for Success with ESM to learn which ITSM practices are essential for keeping up with today's fast-paced world and accelerating business transformation.

Engaging with an expert in full solutions will help you embed best practices into your organization and reduce incidents. Praecipio is here to help guide you in all steps of software development and best practices. If you'd like to chat with an expert, drop us a line; we'd be happy to help.