An app crash is one of the most noticeable bugs in any mobile app, and often one with high business impact. Users might not complete a key flow, might grow frustrated and stop using the app (also referred to as churning), or might leave poor reviews.
Crashes are not a mobile-only concern: they are a major focus area on the backend, where monitoring uncaught exceptions or 5XX status codes is common practice. On the web, due to its nature - single-threaded execution within a sandbox - crashes are rarer than with mobile apps.
The first rule of crashes is that you need to track when they happen and capture sufficient debug information. Once you track crashes, you’ll want to report on what percentage of sessions end in a crash, and reduce this number as much as you can. At Uber, we tracked crash rates from the early days, working continuously to reduce the rate of crashed sessions.
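As a minimal sketch of the metric described above, the percentage of crashed sessions can be computed from session records. The `Session` type and its fields are hypothetical, made up for illustration; real crash reporting tools define their own session model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """One app session; `crashed` is True if the session ended in a crash."""
    session_id: str
    app_version: str
    crashed: bool


def crash_rate(sessions: list[Session]) -> float:
    """Return the percentage of sessions that ended in a crash (0.0 if there are none)."""
    if not sessions:
        return 0.0
    crashed = sum(1 for s in sessions if s.crashed)
    return 100.0 * crashed / len(sessions)


# Illustrative data only: four sessions, one of which crashed.
sessions = [
    Session("s1", "4.2.0", crashed=False),
    Session("s2", "4.2.0", crashed=True),
    Session("s3", "4.2.0", crashed=False),
    Session("s4", "4.2.0", crashed=False),
]
print(f"crash rate: {crash_rate(sessions):.1f}%")  # prints "crash rate: 25.0%"
```

Tracking this number per app version makes it easier to see whether a release moved the rate up or down.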
You can choose to build your own implementation of crash reporting, or use an off-the-shelf solution. As of 2021, most teams choose one of the many crash reporting solutions, such as Crashlytics or Bugsnag, for native apps.
On iOS, the device generates a crash report with every crash, and you can map these logs back to your code. Apple provides ways for developers to collect crash logs from users who opted to share this information via TestFlight or the App Store. This approach works well enough for smaller apps. On Android, Google Play also lets developers view crash stack traces through Android Vitals in the Google Play Console. As with Apple, only crashes from users who have opted in to send bug reports to developers are logged in this portal.
Third-party or custom-built crash reporting solutions offer several advantages on top of what the App Store and Google Play provide. Most mid-sized and larger apps either use a third-party solution or build their own to get the following benefits:
- More diagnostic information. You’ll often want to log additional information in your app on events that might lead up to a crash.
- Rich reporting. Third-party solutions usually offer grouping of reports and comparing iOS and Android crash rates.
- Monitoring and alerting capabilities. You can set up alerts for when a new type of crash appears or when certain crashes spike.
- Integrations with the rest of the development stack. You’ll often want to connect new crashes with your ticketing system or reference them in pull requests.
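To illustrate the grouping and alerting ideas in the list above, here is a rough sketch that groups crash reports and flags new or spiking groups between two time windows. Everything here is an assumption made for illustration: the report shape, the `fingerprint` rule (exception type plus top stack frame), and the `spike_factor` default. Commercial tools use more sophisticated grouping.

```python
from collections import Counter


def fingerprint(report: dict) -> str:
    """Group key: exception type plus the top stack frame (a simplifying assumption)."""
    return f"{report['exception']}@{report['stack'][0]}"


def find_alerts(baseline: list[dict], current: list[dict], spike_factor: float = 2.0) -> list[str]:
    """Compare two time windows and describe new crash groups and spiking ones."""
    before = Counter(fingerprint(r) for r in baseline)
    now = Counter(fingerprint(r) for r in current)
    alerts = []
    for group, count in sorted(now.items()):
        if group not in before:
            alerts.append(f"NEW {group} ({count} reports)")
        elif count >= spike_factor * before[group]:
            alerts.append(f"SPIKE {group} ({before[group]} -> {count} reports)")
    return alerts


# Hypothetical crash reports, for illustration only.
npe = {"exception": "NullPointerException", "stack": ["CheckoutActivity.onCreate", "Activity.performCreate"]}
oom = {"exception": "OutOfMemoryError", "stack": ["ImageCache.load", "FeedAdapter.bind"]}

baseline = [npe, npe]
current = [npe, npe, npe, npe, oom]
for alert in find_alerts(baseline, current):
    print(alert)
```

In this example the `NullPointerException` group doubles between windows and is reported as a spike, while the `OutOfMemoryError` group is reported as new.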
At Uber, we used third-party crash reporting from the early days. However, an in-house solution was built later. A shortcoming of many third-party crash reporting solutions is that they only collect health information on crashes and non-fatal errors, but not on app-not-responding (ANR) and memory problems. Organizations with many apps might also find the reporting not rich enough and might want to build their own reporting to compare health statuses across many apps. Integrating better with in-house project management and coding tools could also be a reason to go custom.
Reproducibility and debuggability of crashes are another pain point that impacts mobile teams more than backend or web teams. Especially in the Android world, users have a variety of devices that run a wide range of OS versions with a variety of app versions. If a crash can be reproduced on a simulator or on any device, you have no excuse not to fix the problem. But what if the crash only happens on specific devices?
Put a prioritization framework in place to define thresholds above which you’ll spend time investigating and fixing crashes. These thresholds will differ based on the nature of the crash (is it on a core flow or a less important one?), the customer lifetime value, and other business considerations. You need to weigh the cost of investigating and fixing against the upside of the fix and the opportunity cost of an engineer not spending that time on something else, like building revenue-generating functionality.
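One way to picture such a framework is a simple threshold check that is stricter for core flows. The sketch below is illustrative only: the `CrashGroup` type and both threshold values are hypothetical, and a real framework would also factor in customer lifetime value and the estimated cost of the fix.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashGroup:
    name: str
    affected_session_pct: float  # percentage of sessions hitting this crash
    on_core_flow: bool           # e.g. sign-up or checkout vs. a settings screen


# Hypothetical thresholds (% of sessions); tune them to your own business.
CORE_FLOW_THRESHOLD_PCT = 0.01
OTHER_FLOW_THRESHOLD_PCT = 0.1


def should_investigate(crash: CrashGroup) -> bool:
    """Return True if the crash affects enough sessions to justify engineering time."""
    threshold = CORE_FLOW_THRESHOLD_PCT if crash.on_core_flow else OTHER_FLOW_THRESHOLD_PCT
    return crash.affected_session_pct >= threshold


checkout_crash = CrashGroup("checkout crash", affected_session_pct=0.05, on_core_flow=True)
settings_crash = CrashGroup("settings crash", affected_session_pct=0.05, on_core_flow=False)

print(should_investigate(checkout_crash))  # prints "True"
print(should_investigate(settings_crash))  # prints "False"
```

The same crash frequency leads to different decisions here because the core flow has a lower tolerance for crashes.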