Database Scalability (and Drupal)

Disclaimer:

  1. I am not a DBA nor having any other database expert title attached.
  2. I am not Drupal expert either

Having spend some good days browsing the net and read like crazy, I managed to write up the following.

Database Scalability options:

  • Scale up
    • more powerful hardware: RAM, processors, storage
  • Scale out
    • Federation
      • MySQL 5.x support federated table (remote table). However its still have some issues: heavy traffic between federated server
      • Benefits: spreaded storage, relatively reducing main server load (delegated to federated server)
      • Disadvantages: heavy network traffic between federated server
      • Issues: network connection capacity between federated servers
    • Sharding (Partitioning)
      • Benefits: spreaded server load, spreaded storage
      • Disadvantages: relatively complicated since it’s involving application layer changes
      • Issues: Involves application layer

Federation is more transparent to developer as application can be totally unaware of the federation setup. However, its bottleneck for sure is giving a limit to scaling (out).
More to explore:

  • Replicating federated database.
  • Load balancing federated database;

Sharding on the other hand, while giving enourmous flexibility to scaling (out) options, is likely requiring ‘built-from-scratch’ application setup/environment. A special data hashing/sharding logic must be incorporated to the application layer in order to implement sharding, which sometimes against the intention of some framework.

What about Drupal then? Drupal depends so much on the node table. Though developer can new type of nodes which uses external table (for extra attributes), those nodes will still be related to the core node table (API restriction, eg: node_load, node_save, and other node_xxx)  From these points, the only avaliable scaling option for Drupal would be scaling up or using federated option (which is somehow felt like scaling up, eventually)

There has been rumour about ASQL (automated sharding proxy) but I haven’t found any available code yet.

PS:

  1. For load balancing, there are many options includes using hardware (BIG-IP) or software (mysqlproxy, sql-relay, sequoia)
  2. Sharding approach is similar to BigTable approach which spreads storage to localize (spread) load via colum based database structure.
  3. The more you want to scale (your app/database) the more you will love data redundancy and giving up normalization

References:
[1] http://en.wikipedia.org/wiki/Federated_database_system
[2] http://dev.mysql.com/tech-resources/articles/mysql-federated-storage.html
[3] http://www.onlamp.com/pub/a/databases/2006/08/10/mysql-federated-tables.html
[4] http://buytaert.net/scaling-with-mysql-replication
[5] http://www.johnandcailin.com/blog/john/scaling-drupal-step-four-database-segmentation-using-mysql-proxy
[6] http://mysqldba.blogspot.com/2006/11/unorthodox-approach-to-database-design.html
[7] http://sequoia.continuent.org/HomePage