Make sure you know about your incidents before your customers do



Alexander Ling, System Developer - Transformation & Automation 

You expect the services you use to be up and running day and night without any problems. Your colleagues, customers and other stakeholders share this expectation. In reality, it is almost impossible to build a service that works perfectly 100% of the time. Unexpected events will occur, and services will suffer downtime or stability issues. When a service disruption does happen, the question among you and your colleagues will probably be: why today?

The Atlassian suite includes a product called Opsgenie, which reduces the negative impact a service outage can have on your own or your customers' organizations and makes the question "why today?" a bit less stressful. Opsgenie is a modern incident management platform designed to help your organization manage critical incidents and service disruptions. Often one person cannot resolve a critical incident alone and needs help from team members and/or other cross-functional teams. This is where Opsgenie shines: it keeps all information about the incident in one single location, notifies the right people and escalates the issue if it hasn't been acknowledged. As with the rest of the Atlassian products, collaboration is key, so you can easily work together to resolve service outages and disruptions as quickly as possible.

So how is this possible?

The first step is to integrate your services with Opsgenie and decide which events should trigger an incident. You can define which types of events trigger which type of incident (P1, P2, P3…) and which members, stakeholders or teams should be notified about the incident, and when. This can be specified in even more detail with the built-in scheduler, which lets you plan on-call schedules that are then used to notify the right people automatically. This kind of filtering prevents your team from getting overwhelmed by minor disruptions and lets the right people focus on the critical issues immediately. How team members want to be notified is a matter of personal preference. Push notifications, SMS, email, Slack, phone calls and more are all available (you can also choose all of them at the same time, which I managed to do once; I do not recommend it…), which makes it easy and flexible to set up your preferred notification settings.
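To make the routing idea concrete, here is a minimal sketch in plain Python of the logic described above: events are mapped to a priority, an on-call schedule decides who is responsible on a given day, and each person's own channel preferences decide how they are notified. This is not Opsgenie's API or configuration format; the event kinds, people, rotation and the rule that only P1/P2 page immediately are all hypothetical assumptions chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime

# Hypothetical rules: which event kind triggers which incident priority.
PRIORITY_RULES = {
    "service_down": "P1",
    "error_rate_high": "P2",
    "disk_usage_high": "P3",
}

# Hypothetical on-call rotation keyed by ISO weekday (1 = Monday ... 7 = Sunday).
ON_CALL = {1: "alice", 2: "alice", 3: "bob", 4: "bob", 5: "carol", 6: "carol", 7: "carol"}

# Hypothetical personal notification preferences.
NOTIFY_PREFS = {
    "alice": ["push", "sms"],
    "bob": ["phone"],
    "carol": ["email", "slack"],
    "team-lead": ["phone"],
}


@dataclass
class Event:
    kind: str
    service: str
    at: datetime


def priority_for(event: Event) -> str:
    """Map an event to a priority; unknown event kinds get the lowest one."""
    return PRIORITY_RULES.get(event.kind, "P5")


def route(event: Event) -> list[tuple[str, str]]:
    """Return (person, channel) pairs to notify right now.

    Only P1 and P2 page the on-call person immediately; P1 also pages the
    team lead. Lower priorities are left for normal triage, which is the
    filtering that keeps the team from being flooded by minor disruptions.
    """
    priority = priority_for(event)
    if priority not in ("P1", "P2"):
        return []
    people = [ON_CALL[event.at.isoweekday()]]
    if priority == "P1":
        people.append("team-lead")
    return [(person, channel) for person in people for channel in NOTIFY_PREFS[person]]
```

For example, a `service_down` event at 02:15 on Wednesday 3 January 2024 resolves to P1 and pages Bob (on call that day) and the team lead, both by phone, while a `disk_usage_high` event at the same time notifies nobody immediately.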

Monitoring tools

More advanced features include custom actions, which can, for example, restart an application with a single click from your phone. This is especially useful for incidents where the action that resolves them is generally the same every time: it saves a lot of time and makes repetitive tasks easier to handle. There are many options for actions, and each organization needs to explore them to make sure the automated actions fit its needs.
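As an illustration of what sits behind such a one-click action, the sketch below shows a small handler that receives an action request and maps it to a restart command. The payload shape (an `action` name plus an `alert` with an `alias`), the action names and the service names are hypothetical assumptions, not Opsgenie's documented webhook format. The handler defaults to a dry run so nothing is executed unless explicitly requested.

```python
from __future__ import annotations

import subprocess

# Hypothetical mapping from a custom action name to the service it restarts.
RESTART_TARGETS = {
    "Restart web server": "nginx",
    "Restart worker": "worker",
}


def build_restart_command(action: str) -> list[str]:
    """Translate an action name into a systemctl restart command."""
    service = RESTART_TARGETS.get(action)
    if service is None:
        raise ValueError(f"unknown action: {action!r}")
    return ["systemctl", "restart", service]


def handle_action(payload: dict, dry_run: bool = True) -> dict:
    """Handle an action request with an assumed payload shape.

    With dry_run=True (the default) the command is only built and returned,
    which makes the handler safe to test; with dry_run=False it is executed
    and a non-zero exit status raises CalledProcessError.
    """
    command = build_restart_command(payload["action"])
    if not dry_run:
        subprocess.run(command, check=True)
    return {
        "alias": payload["alert"]["alias"],
        "command": command,
        "executed": not dry_run,
    }
```

Keeping the action-to-command mapping in one explicit table is a deliberate choice in this sketch: only pre-approved, repeatable fixes can be triggered from a phone, and anything else is rejected instead of being run.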

Even though Opsgenie can be used on its own, its true potential is unleashed when it is used together with other Atlassian products such as Statuspage. This lets organizations focus on resolving the incident internally while giving external stakeholders a clear place to quickly find out whether there are any active incidents and what their status is.

To conclude, Opsgenie allows you and your organization to react quickly to service disruptions so you can take control of the situation. Automatically notifying all key people the moment an incident occurs gives you a clear head start compared with contacting each relevant individual manually. That edge can help you resolve incidents faster, ideally before your customers even notice that an incident has occurred. That is why, as the title says, it is important to know about your incidents before your customers do.
