Anti-RDBMS and distributed key-value stores
I’ve been comparing and playing with a bunch of distributed key-value stores lately. Here’s an annotated list of interesting articles:
Richard Jones (from Last.fm) has compiled a useful comparison of the various distributed key-value stores out there, including MemcacheDB, Cassandra and CouchDB. He seems most interested in Scalaris and Project Voldemort.
redis seems to be a very interesting key-value store. It behaves like memcached but supports strings, lists and sets as values. It also includes atomic operations like increments, push/pop, etc. They even wrote a Twitter clone using redis + PHP.
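To make that data model concrete, here is a minimal in-memory sketch of a store whose values can be strings, lists or sets, with increments and push/pop done atomically under a lock. It is only a toy for illustration: the `ToyStore` class and its method names are made up for this post and are not the redis API or its client library.

```python
import threading
from collections import defaultdict, deque


class ToyStore:
    """Hypothetical in-memory store with typed values (not the redis API)."""

    def __init__(self):
        self._lock = threading.Lock()        # one lock makes each operation atomic
        self._strings = {}                   # key -> string or counter
        self._lists = defaultdict(deque)     # key -> list of items
        self._sets = defaultdict(set)        # key -> set of members

    def set(self, key, value):
        with self._lock:
            self._strings[key] = value

    def get(self, key):
        with self._lock:
            return self._strings.get(key)

    def incr(self, key, amount=1):
        # Read-modify-write happens under the lock, so concurrent
        # callers never lose an increment.
        with self._lock:
            value = int(self._strings.get(key, 0)) + amount
            self._strings[key] = value
            return value

    def push(self, key, item):
        with self._lock:
            self._lists[key].append(item)
            return len(self._lists[key])

    def pop(self, key):
        # Returns the most recently pushed item, or None if the list is empty.
        with self._lock:
            items = self._lists.get(key)
            return items.pop() if items else None

    def sadd(self, key, member):
        # Returns True only when the member was not already in the set.
        with self._lock:
            members = self._sets[key]
            added = member not in members
            members.add(member)
            return added

    def smembers(self, key):
        with self._lock:
            return set(self._sets.get(key, ()))
```

A Twitter-style timeline, for example, could be a list per user (`push("timeline:alice", post_id)`) plus a set of followers per user (`sadd("followers:alice", "bob")`), with a counter handing out post ids via `incr("next_post_id")`. Those key names are likewise just an illustration.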
Bret Taylor from FriendFeed goes in-depth about how they built a ’schema-less’ MySQL schema, which attempts to solve the problems of extensibility and maintainability in most large RDBMS solutions. It doesn’t hurt that performance improves as a result.
As our database has grown, we have tried to iteratively deal with the scaling issues that come with rapid growth. We did the typical things, like using read slaves and memcache to increase read throughput and sharding our database to improve write throughput. However, as we grew, scaling our existing features to accommodate more traffic turned out to be much less of an issue than adding new features.
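For a rough feel of what ’schema-less’ on top of a relational database can mean, here is a small sketch: entities are stored as opaque serialized blobs keyed by id, and anything you need to query on gets its own narrow index table. This is my own illustration of the general idea, not FriendFeed’s code. SQLite, JSON, and every table, column and function name below are assumptions made for the example.

```python
import json
import sqlite3
import uuid


def open_store():
    """Create an in-memory database with one blob table and one index table."""
    db = sqlite3.connect(":memory:")
    # The entity body is an opaque JSON blob, so adding a new property
    # never requires an ALTER TABLE.
    db.execute("CREATE TABLE entities (id TEXT PRIMARY KEY, body TEXT NOT NULL)")
    # Hypothetical index table: one row per entity that has a user_id.
    db.execute(
        "CREATE TABLE index_user_id ("
        " user_id TEXT NOT NULL,"
        " entity_id TEXT NOT NULL,"
        " PRIMARY KEY (user_id, entity_id))"
    )
    return db


def save_entity(db, entity):
    """Insert or update an entity and keep the index table in sync."""
    entity = dict(entity)
    entity.setdefault("id", uuid.uuid4().hex)
    with db:  # commits on success, rolls back on error
        db.execute(
            "INSERT OR REPLACE INTO entities (id, body) VALUES (?, ?)",
            (entity["id"], json.dumps(entity)),
        )
        db.execute("DELETE FROM index_user_id WHERE entity_id = ?", (entity["id"],))
        if "user_id" in entity:
            db.execute(
                "INSERT INTO index_user_id (user_id, entity_id) VALUES (?, ?)",
                (entity["user_id"], entity["id"]),
            )
    return entity["id"]


def find_by_user(db, user_id):
    """Look up entities through the index table, then decode the blobs."""
    rows = db.execute(
        "SELECT e.body FROM index_user_id AS i"
        " JOIN entities AS e ON e.id = i.entity_id"
        " WHERE i.user_id = ? ORDER BY e.id",
        (user_id,),
    )
    return [json.loads(body) for (body,) in rows]
```

The point of the layout is that a new feature can start writing a new property (say, `link`) into the blob right away, and a new index table can be added later without touching the existing rows.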
Leonard Lin also has some notes on distributed key-value stores. He compares some current tools out there, but finding none viable, creates his own. He has some words of wisdom:
- The distributed stores out there are pretty half-baked at best right now. Your comfort-level running in prod may vary, but for most sane people, I doubt you’d want to.
- If you’re dealing w/ a reasonable number of items (<50M), Tokyo Tyrant is crazy fast. If you’re looking for a known quantity, MySQL is probably an acceptable solution.
- Don’t believe the hype. There’s a lot of talk, but I didn’t find any public project that came close to the (implied?) promise of tossing nodes in and having it figure things out.
- Based on the maturity of projects out there, you could write your own in less than a day. It’ll perform as well and at least when it breaks, you’ll be more fond of it. Alternatively, you could go on the conference circuit and talk about how awesome your half-baked distributed keystore is.
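The “tossing nodes in and having it figure things out” promise depends, among other things, on mapping keys to nodes in a way that moves only a fraction of the keys when a node joins. Consistent hashing is one well-known technique for that. The notes above don’t name it, so treat the following as a sketch of the general idea rather than what any of these projects actually does. The `HashRing` class, its method names and the choice of MD5 are assumptions for the example.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent-hash ring mapping keys to node names."""

    def __init__(self, nodes=(), replicas=100):
        self._replicas = replicas  # virtual points per node, smooths the load
        self._points = []          # sorted ring positions
        self._owners = {}          # ring position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self._replicas):
            point = _hash(f"{node}#{i}")
            if point in self._owners:
                continue  # extremely unlikely collision: keep the existing owner
            bisect.insort(self._points, point)
            self._owners[point] = node

    def node_for(self, key):
        """Return the node owning the first ring position after the key's hash."""
        if not self._points:
            raise LookupError("ring has no nodes")
        idx = bisect.bisect(self._points, _hash(key)) % len(self._points)
        return self._owners[self._points[idx]]
```

With a ring like this, adding a fourth node to a three-node cluster should only reassign the keys that now land on the new node; everything else stays where it was. What it does not give you is replication, failure detection or rebalancing of the data itself, which is where the hard parts live.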
The popular consensus seems to be that even though numerous solutions exist, none of them is widely adopted or production-proven yet.
July 1st, 2009 Update: Think Vitamin has a good article on reasons for switching to a non-relational database and also compares the tradeoffs between the different data storage options.
July 21st, 2009 Update: Evan Weaver (of Twitter) has a good intro article to getting up and running with Cassandra.
PrasSarkar.com