
Global Media and Cable Company Offloads DevOps Toolchain Management to Praecipio

May 18, 2023

Overview

As with any change, the transition from managing your IT infrastructure internally to handing off those responsibilities to a Modern Service Management partner can be turbulent. It takes time to work out all of the historical issues that brought you to that partner in the first place. 

This was the case for a Jira Server instance that Praecipio managed for a global communications and technology company. Our customer's Jira instance was unstable and their IT team didn’t have the bandwidth to manage it along with all the other IT systems they were responsible for overseeing.

In this case study, we explore how we provided a managed hosting solution through Cumulus, our private cloud hosting platform. Cumulus allowed the global media company to offload its DevOps toolchain maintenance to Praecipio while also giving its instance a home that offers the convenience of Atlassian Cloud and the control of an on-premises instance.

Challenges

After the customer onboarded to Cumulus, we immediately noticed that Jira would crash every 30 hours like clockwork. As a stopgap, the global media company would restart Jira daily. The Jira logs and our telemetry showed that the Linux host was exceeding its open file handle limit, though it wasn’t clear why this was happening.
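To make the symptom concrete, here is a minimal Python sketch of how open file handles can be compared against a process limit on a Linux host. It is an illustration only, not the telemetry we ran on Cumulus. The function names are hypothetical, and the sketch assumes that /proc is readable for the target process.

```python
import os
from typing import Optional


def open_fd_count(pid: int) -> int:
    """Count open file descriptors by listing /proc/<pid>/fd (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def soft_fd_limit(pid: int) -> Optional[int]:
    """Read the soft 'Max open files' limit from /proc/<pid>/limits.

    Returns None if the kernel reports the limit as 'unlimited'.
    """
    with open(f"/proc/{pid}/limits") as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Fields: "Max", "open", "files", soft limit, hard limit, units
                soft = line.split()[3]
                return None if soft == "unlimited" else int(soft)
    raise RuntimeError("'Max open files' row not found")


def fd_usage_ratio(open_count: int, limit: int) -> float:
    """Fraction of the limit currently in use (pure helper, no I/O)."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return open_count / limit
```

A monitoring job could call `open_fd_count` and `soft_fd_limit` with the Jira process ID at a regular interval and record the ratio over time. A ratio that climbs steadily and never drops back is the signature of a handle leak rather than a load spike.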

To temporarily remediate the issue, the Praecipio team increased the file handle limit to give more headroom to whatever Jira was doing. We attempted this a few times, and each time Jira would stay up longer but eventually crash, with the time to failure growing in proportion to the file handle limit. It was clear this problem could not be outrun and instead needed to be addressed head-on.
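The arithmetic behind that observation can be sketched in a few lines of Python. This assumes a constant leak rate, which is a simplification. Every number in the example is hypothetical, chosen only so that the first case lands on the 30-hour cadence described above; the actual limits and leak rate are not part of this case study.

```python
def hours_until_exhaustion(limit: int, baseline: int, leak_per_hour: float) -> float:
    """Estimate hours until open handles reach the limit under a steady leak.

    limit         -- open file handle limit for the process
    baseline      -- handles open right after a restart
    leak_per_hour -- handles leaked per hour (assumed constant)
    """
    if leak_per_hour <= 0:
        return float("inf")  # no leak, so the limit is never reached
    return max(limit - baseline, 0) / leak_per_hour


# Hypothetical values: 1,000 handles at startup, 100 leaked per hour.
print(hours_until_exhaustion(limit=4000, baseline=1000, leak_per_hour=100))  # 30.0
print(hours_until_exhaustion(limit=7000, baseline=1000, leak_per_hour=100))  # 60.0
```

Under this model, raising the limit only moves the crash later. As long as the leak rate is above zero, no limit makes the crash go away.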

Solution

Our primary goals with this project were to keep Jira up and running while maximizing the overall experience for Jira users. We also wanted to uncover the root cause of why Jira was crashing and resolve the issue. 

As a first step in tackling our objectives, we set up recurring stand-ups with the client’s IT operations team to facilitate communication, provide updates, and address any roadblocks. 

Next on our list was getting to the bottom of the cable company’s unstable Jira instance. Given the predictable nature of the problem and the relative ease of monitoring it, we immediately moved the Jira instance to Data Center and added a second node. Now the operations team simply needed to monitor the open file handles on each node and restart the troubled node before it reached its limit.

Because the two nodes were restarted at different times, one node was always healthy, and the Jira Data Center deployment automatically routed users to it while the degraded node was restarted. After implementing this solution, users no longer experienced downtime issues with Jira. The positive user experience enabled the IT team to focus on resolving the technical problem instead of having to manage displeased users.
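The restart rule described above can be sketched as a small decision function. This is a hypothetical illustration in Python, not the runbook the operations team used. The `NodeStatus` fields, the node names, and the 80% threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NodeStatus:
    name: str
    open_fds: int             # current open file handles on the node
    fd_limit: int             # open file handle limit for the node
    restarting: bool = False  # True while the node is out of service


def pick_node_to_restart(nodes: List[NodeStatus], threshold: float = 0.8) -> Optional[str]:
    """Return the name of the one node to restart now, or None.

    Rules: never restart while another node is already restarting, never
    restart a lone node, and restart only the node closest to its limit.
    """
    if len(nodes) < 2 or any(n.restarting for n in nodes):
        return None
    at_risk = [n for n in nodes if n.open_fds / n.fd_limit >= threshold]
    if not at_risk:
        return None
    return max(at_risk, key=lambda n: n.open_fds / n.fd_limit).name
```

For example, with one node at 90% of its limit and the other at 40%, the function returns the first node's name. If either node is already restarting, it returns None so that one node always stays in service.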

Through thread dumps and additional investigation, we eventually tracked down the root cause of Jira crashing. Our team discovered the culprit was an export app that never closed temporary file handles. As a mitigation, we recommended that our customer temporarily disable the app, which they agreed to since it wasn’t critical to their operations.
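The class of bug is easy to reproduce in miniature. The Python sketch below is not the vendor's code, which we are not reproducing here. It is a generic illustration, with hypothetical function names, of a temporary file handle that is opened and never closed, next to a version that closes it.

```python
import os
import tempfile
from typing import Tuple


def leaky_export(data: bytes) -> Tuple[int, str]:
    """Anti-pattern: the temp file descriptor is opened and never closed."""
    fd, path = tempfile.mkstemp(suffix=".export")
    os.write(fd, data)
    # Missing os.close(fd): every call leaves one more descriptor open.
    return fd, path  # fd is returned only so a caller can inspect it


def safe_export(data: bytes) -> Tuple[int, str]:
    """Fix: wrap the descriptor so it is closed even if the write fails."""
    fd, path = tempfile.mkstemp(suffix=".export")
    with os.fdopen(fd, "wb") as handle:  # closes fd when the block exits
        handle.write(data)
    return fd, path  # fd is already closed at this point
```

Each call to `leaky_export` consumes one descriptor for the life of the process. A process that runs such a call on every export will eventually reach its open file handle limit, however high that limit is set.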

With Jira stable, the operations team was able to resume working without interruptions, and in the meantime, our team worked with the app vendor to patch their product. Within a week, a new version of the app was published to the Atlassian Marketplace and our customer regained full functionality in a more stable Jira Data Center instance.

This case study is a great example of a solid incident management strategy that was executed well thanks to a team proactively managing and monitoring the Jira instance. With a stable, high-performing environment, the operations team was able to focus on the user experience while developers were free to focus on delivering a quality product.

Our partnership with the global media company also demonstrated the importance of letting a team of experts manage your IT infrastructure and DevOps toolchain. When our client tapped into Praecipio’s Cumulus solution to host and manage their Jira instance, they modernized their IT operations and improved the developer experience.


Is your organization in need of a partner that helps you align your technology to support your evolving business strategies? Get in touch with our team of experts and we will help you build a connected enterprise that seamlessly unites your people, processes, and technology.
