New Relic is a hosted web-application monitoring tool that we use at Atlassian for our production services. Brian Doll, an application performance engineer at New Relic, recently wrote up the architecture of their high-performance metric collection and analysis platform:
New Relic’s multitenant, SaaS web application monitoring service collects and persists over 100,000 metrics every second on a sustained basis, while still delivering an average page load time of 1.5 seconds.
- 20+ billion application metrics collected every day
- 1.7+ billion web page metrics collected every week
- Each “timeslice” metric is about 250 bytes
- 100k timeslice records inserted every second
- 7 billion new rows of data every day
- Data collection handled by 9 sharded MySQL servers
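The figures above can be cross-checked with some back-of-envelope arithmetic. The Java sketch below uses only the quoted numbers. It counts raw row bytes only, with no indexes, storage-engine overhead or replication. It also assumes, hypothetically, that inserts are spread evenly across the 9 shards, which the write-up does not state.

```java
// Back-of-envelope arithmetic on the quoted New Relic figures (illustrative only).
class TimesliceVolume {
    static final long ROWS_PER_DAY = 7_000_000_000L;   // "7 Billion new rows of data every day"
    static final long BYTES_PER_ROW = 250L;            // "about 250 bytes" per timeslice metric
    static final long INSERTS_PER_SECOND = 100_000L;   // "100k timeslice records inserted every second"
    static final int SHARDS = 9;                       // "9 sharded MySQL servers"
    static final double SECONDS_PER_DAY = 86_400.0;

    // Raw row payload per day, ignoring indexes and storage overhead.
    static long rawBytesPerDay() {
        return ROWS_PER_DAY * BYTES_PER_ROW;
    }

    // Average insert rate implied by the daily row count over a full 24 hours.
    static double rowsPerSecondFromDaily() {
        return ROWS_PER_DAY / SECONDS_PER_DAY;
    }

    // Per-shard load, assuming (hypothetically) a perfectly even spread across shards.
    static double insertsPerShardPerSecond() {
        return (double) INSERTS_PER_SECOND / SHARDS;
    }

    public static void main(String[] args) {
        System.out.printf("Raw row data per day: %.2f TB%n", rawBytesPerDay() / 1e12);
        System.out.printf("Average rows/s implied by the daily total: %.0f%n", rowsPerSecondFromDaily());
        System.out.printf("Inserts/s per shard at 100k/s: %.0f%n", insertsPerShardPerSecond());
    }
}
```

Under these assumptions it works out to about 1.75 TB of raw row data per day and an average of roughly 81,000 rows per second over a full day. That is the same order of magnitude as the quoted 100k/s insert rate. Each shard would handle about 11,000 inserts per second.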
Having so many tables with this amount of data in them makes schema migrations impossible. Instead, “template” tables are used from which new timeslice tables are created. New tables use the new definition while old tables are eventually purged from the system. The application code needs to be aware that multiple table definitions may be active at one time.
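The template-table approach can be sketched in Java as follows. This is a minimal illustration that assumes MySQL's `CREATE TABLE ... LIKE` statement is used to copy a template's definition. The per-day table names, the `timeslice_template_vN` template names, the column lists and the in-memory registry are all hypothetical, because the write-up does not say how New Relic names its tables or tracks which definition each one uses. The code only builds SQL strings, so it runs without a database.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: new tables are cloned from a versioned template, never ALTERed.
class TimesliceTables {
    // Hypothetical column lists for two generations of the template table.
    static final Map<Integer, List<String>> COLUMNS = Map.of(
        1, List.of("metric_id", "begin_time", "call_count", "total_time"),
        2, List.of("metric_id", "begin_time", "call_count", "total_time", "min_time"));

    static final DateTimeFormatter DAY = DateTimeFormatter.BASIC_ISO_DATE; // e.g. 20240101

    // Which template version each live table was created from (table name -> version).
    final Map<String, Integer> registry = new TreeMap<>();
    int currentTemplateVersion = 1;

    static String tableFor(LocalDate day) {
        return "timeslice_" + day.format(DAY);
    }

    // A new table copies the *current* template; existing tables keep their old definition.
    String createTableFor(LocalDate day) {
        String name = tableFor(day);
        registry.put(name, currentTemplateVersion);
        return "CREATE TABLE " + name + " LIKE timeslice_template_v" + currentTemplateVersion;
    }

    // Readers pick columns from the table's own version, so v1 and v2 tables can coexist.
    String selectFrom(String table) {
        Integer version = registry.get(table);
        if (version == null) {
            throw new IllegalArgumentException("unknown table: " + table);
        }
        return "SELECT " + String.join(", ", COLUMNS.get(version)) + " FROM " + table;
    }

    // Tables older than the cutoff are dropped rather than migrated.
    List<String> purgeOlderThan(LocalDate cutoff) {
        List<String> drops = new ArrayList<>();
        String cutoffName = tableFor(cutoff); // yyyyMMdd names sort chronologically
        for (String table : new ArrayList<>(registry.keySet())) {
            if (table.compareTo(cutoffName) < 0) {
                registry.remove(table);
                drops.add("DROP TABLE " + table);
            }
        }
        return drops;
    }

    public static void main(String[] args) {
        TimesliceTables tables = new TimesliceTables();
        System.out.println(tables.createTableFor(LocalDate.of(2024, 1, 1)));
        tables.currentTemplateVersion = 2; // template changed: only new tables get min_time
        System.out.println(tables.createTableFor(LocalDate.of(2024, 1, 2)));
        System.out.println(tables.selectFrom("timeslice_20240101"));
        System.out.println(tables.selectFrom("timeslice_20240102"));
        System.out.println(tables.purgeOlderThan(LocalDate.of(2024, 1, 2)));
    }
}
```

The design point is that a schema change costs nothing at deploy time: only the template changes, and the old definition disappears once the last table built from it ages out. The price, as the quoted text notes, is that readers must handle every definition that is still live.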
Use the right tech for the job. The main New Relic web application has always been a Rails app. The data collection tier was originally written in Ruby, but was eventually ported over to Java. The primary driver for this change was performance. This tier currently supports over 180k requests per minute and responds in around 2.5 milliseconds with plenty of headroom to go.
It’s an interesting read for anyone aspiring to build high-performance online services.