Preparing MySQL for millions of users

(This post is a part of a series. See original)

As the world’s largest marketplace for small services, Fiverr deals with gigantic amounts of rich user-generated content at any given moment. We run our own billing, messaging, search and order management systems. Our systems were designed for maximum data integrity, so most non ACID NoSQL solutions are out of the question. In addition, unlike blogs or news sites where data remains static after it has been posted, our data continues to change as users interact with their orders, sellers and gigs constantly. This makes our databases write-intensive and much harder to scale out.

The database stack is always our toughest challenge in scaling Fiverr. On our first year, we spent countless hours optimising the database to handle the rapid growth of the site. As it turns out, optimisation can only take you so far, and eventually you’ll reach a glass ceiling that can’t be crossed only by making your queries better. We hit that barrier on our second year. It was then, that we knew we had to rethink everything about how our database works.

If you’re starting a new project or getting closer to your own glass ceiling, it might be a good idea for you to check some of these pointers out:

  1. Manage your database connections in one place

Make sure your queries go through a centralised connection manager and that your production database connection config files is kept in a separate file, outside of your main codebase. While you’re at it, make sure your db config file only points to hostnames and not hardcoded IPs.

Scaling web apps through centralised configuration and data access

Having a separate, well configured, external db connection file will make adding new slaves and changing stuff much easier. Regardless of the size and complexity of your app, making it aware of architecture changes will only take seconds. (If you’re using Rails, this would be your database.yml file). I can’t stress enough how important this is for your app’s flexibility. We even try to maintain this approach with our NoSQL data sources such as MongoDB and Redis.

To achieve fine grain control over which servers handle specific writes and reads we use the db_charmer gem. This gem lets you very easily assign specific queries to specific slaves. For example, you might want to have dedicated slaves (that do not serve live traffic) handle your heavy queries asynchronously.

  1. Load balance your database servers

Real traffic spikes come unannounced, creating a ripple effect in your system. The first to get hit is usually the database. Overwhelmed by the amount of connections it has to serve, it simply fails to respond in time, causing a traffic jam of application server workers. At that time, users fail to get a response from your site and go into refresh frenzy. From this point, site downtime is imminent.

illustration of how to load balance database access from web applications

Going through a scenario like this without a having a database load balancer is a real bummer. At Fiverr, all queries go through a dedicated db balancer. In practice, this means that your application only gets to know two* db hostnames (one master and one slave). Each of these database hosts will eventually map to a set of slave servers and one master in the load balancer’s configuration. The distinction between the master and slaves is usually done using a designated port.

* In a logical sharding configuration this would look a little different, as you will have a master and a slave hostname for each shard.

Ouch Tip: Make sure you configure slaves for read-only operation. This will  protect you from replication hell if someone configures something wrong in the balancer.

screenshot of database.yml configured for sending queries to database through a load balancer

  1. ActiveRecord is for pussies

One of the first things people love about rails is ActiveRecord and they way it magically ‘writes’ database queries for them. It’s easy to learn, maintain and read. It automatically maps your data to models and really allow you to focus on functionality and less on how things actually work under the hood.

But wait, there’s a problem. ActiveRecord will never be as smart as you when it comes to building real-world queries. A good example for this is ActiveRecord’s poor utilisation of Indexes, specially in the case of Compound ones. A query written in an order that doesn’t match your Compound Index’s order will simply get ignored. You can try to force ActiveRecord to use indexes the way you want, but it’s not trivial and takes the charm out of the entire thing.

Named Scopes are also great for building stuff quickly. However, they strength is also their weakness, as they are so easy to temper with. At one occasion, a developer in our team changed an innocent looking Named Scope which was originally only meant for use in display. Turned out that the change affected something deeper which caused serious damage and took weeks to fix.

We are constantly optimising our most heaviest named-scopes by turning them into hand crafted optimised SQL. As a result, we see  dramatic improvements in performance, with queries dropping from several seconds to a few milliseconds.

Ouch Tip: Remember to sanitize your pure SQL queries before executing them to protect yourself against SQL injections. You can do that very easily using ActiveRecord sanitize features.

  1. Slicing and dicing

For write intensive applications like Fiverr, scaling eventually means having more than one master. A typical scenario is when write operations start taking longer and longer to complete with each day that goes by. This also has a tremendous effect on replication times as slaves fall behind the master.

When this happens, it might mean that you’ve reached the maximum potential of the master database. Scaling up the master server will not do much other than buy you some time.

There are several ways to scale-out MySQL masters. The two most common ones are master-master replication and sharding. For us, sharding was the lower hanging fruit and we chose to start with it. Moving to sharded master-slave trees was a game changer for Fiverr. Suddenly, MySQL is no longer the bottle neck.

illustration of database slicing to shards

There is more than one way to shard your data. We decided to approach sharding gradually. We started by creating Logical Database Trees (LDTs), each consisting of a load balancer, a master server and a bunch of slaves. Each LDT is responsible for a single logical application unit such as Gigs, Users and Orders.

Making this change on a live system was not easy, but we achieved it with absolutely no downtime thanks to the way our system is configured (see previous points)

  1. Monitor and Diagnose

Pareto’s principle claims that 80% of the problems come from 20% of a system. From our experience, in real life this is even more extreme.

Before you reach scalability nirvana, a single poorly written query can bring down your entire app. That’s why it’s so important to have eyes on your system at all times, and to be able to dig deeper when needed. At Fiverr, we rely on several tools to monitor our app. I’ll probably post a separate article about that, but let me just say that if I had to choose one tool to prevent us from running blind, that would be NewRelic.

monitoring, diagnostics big title text

New Relic allows us to diagnose, track and get alerts for our entire stack. Starting from the servers hardware, through database optimisation, ruby code, and up to client side rendering performance.

Conclusion

There is no magic pill to making your system scalable. These guidelines are just the tip of the iceberg. However, done right, they can get you a long way into making your app more durable and flexible for scale out changes – from day one.

10 thoughts on “Preparing MySQL for millions of users

  1. thank you for sharing Shai, lots of great inputs in this post.

    1. you’ve mentioned using hostnames instead of ip addresses in you database.yml file, whats the advantage in that?

    2. from my experience, a lot can be achieved by also tuning any MySql parameters (which i’m sure you have), can you please share some information about that, maybe even share the my.cnf file?

    3. another thing that I would love to learn from your experience is what kind of server infrastructure you use, e.g.: could based servers, dedicated server, etc. and why?

    keep on doing such an AMAZING job with Fiverr and sharing the knowledge, it really helps us a lot :)

  2. Ran, thanks!

    1. hostnames give you greater flexibility to switch cloud servers when they suddenly die on you. A new cloud server instance usually gets a new IP, which means that you’ll have to redeploy a modified configuration file to all your app servers and restart them to pick up the changes. Using hostnames lets you update the hosts file on the server without having to restart anything.

    2. You are correct. I chose not to get into this subject. Tuning MySQL is an art form, and should be done only by people who know what they’re doing. I might have one of our DevOps guys do a post on that subject.

    3. I’ll try to do a post on our infrastructure soon. Until then, I can just tell you that we use pure cloud infrastructure, but nothing out of the box (as in ready made load balancer instances etc.)

  3. As to the second item (“I might have one of our DevOps guys do a post on that subject.”) – look forward to.
    Very pleased “ActiveRecord is for pussies …” )))))
    If that – sorry for my bad english

  4. Nice post, thanks!
    Which Hardware / Software do you use for MySQL Load Balancing? Would be nice to know. Thanks.

  5. Hi Shai,

    Thanks for your post. Definitely learning some best practices from this post like using hostnames instead of IP addresses and further optmizing your SQL queries.

    Btw, Do you have any new posts in this series of talks about scalability in Rails ? Your original [post](http://hackingshmacking.com/category/web-scaleability/) had a link to this post, but I couldn’t find any link from this post that continues discussing further on this topic. Also, you mentioned that probably someone from the DevOps team would elaborate on the MySQL conf, any updates on that ?

  6. Hi Shai,

    Thanks for your post. I got to learn some best practices from your post like using hostnames instead of IP addresses and other things like further optmizing your SQL queries etc.,
    BTW, Do you have any new posts in this series of talks about scalability in Rails ? Your original [post](http://hackingshmacking.com/category/web-scaleability/) had a link to this post, but I couldn’t find any link from this post that continues discussing further on this topic. Also, you mentioned that probably someone from the DevOps team would elaborate on the MySQL conf, any updates on that ?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s