| 1 | <?page |
|---|
| 2 | wintitle=>a distributed memory object caching system |
|---|
| 3 | body<= |
|---|
| 4 | |
|---|
| 5 | <?h1 What is <?memd?>? h1?> |
|---|
| 6 | <p><?memd?> is a high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.</p> |
|---|
| 7 | <p><a href="http://www.danga.com/">Danga Interactive</a> developed <?memd?> to enhance the speed of <a href="http://www.livejournal.com/">LiveJournal.com</a>, a site which was already doing 20 million+ dynamic page views per day for 1 million users with a bunch of webservers and a bunch of database servers. <?memd?> dropped the database load to almost nothing, yielding faster page load times for users, better resource utilization, and faster access to the databases on a memcache miss.</p> |
|---|
| 8 | |
|---|
| 9 | <?h1 How it Works h1?> |
|---|
| 10 | <p>First, you start up the <?memd?> daemon on as many spare machines as you have. The daemon has no configuration file, just a few command line options, only 3 or 4 of which you'll likely use: |
|---|
| 11 | <pre class='example'># ./memcached -d -m 2048 -l 10.0.0.40 -p 11211</pre> |
|---|
| 12 | <p>This starts <?memd?> up as a daemon, using 2GB of memory, and listening on IP 10.0.0.40, port 11211. Because a 32-bit process can only address 4GB of virtual memory (usually significantly less, depending on your operating system), if you have a 32-bit server with 4-64GB of memory using PAE you can just run multiple processes on the machine, each using 2 or 3GB of memory.</p> |
|---|
| 13 | |
|---|
| 14 | <?h1 Porting the Application h1?> |
|---|
| 15 | <p>Now, in your application, wherever you go to do a database query, first check the memcache. If the memcache returns an undefined object, then go to the database, get what you're looking for, and put it in the memcache:</p> |
|---|
| 16 | <pre class='example'> |
|---|
| 17 | <div class='exampletitle'>Perl Example (see <a href="apis.bml">APIs page</a>)</div> |
|---|
| 18 | sub get_foo_object { |
|---|
| 19 | my $foo_id = int(shift); |
|---|
| 20 | my $obj = $::MemCache->get("foo:$foo_id"); |
|---|
| 21 | return $obj if $obj; |
|---|
| 22 | |
|---|
| 23 | $obj = $::db->selectrow_hashref("SELECT .... FROM foo f, bar b ". |
|---|
| 24 | "WHERE ... AND f.fooid=$foo_id"); |
|---|
| 25 | $::MemCache->set("foo:$foo_id", $obj); |
|---|
| 26 | return $obj; |
|---|
| 27 | }</pre> |
|---|
| 28 | |
|---|
| 29 | <p>(If your internal API was already clean enough, you should only have to do this in a few spots. Start with the queries that kill your database the most, then move to doing as much as possible.)</p> |
|---|
| 30 | |
|---|
| 31 | <p>You'll notice the data structure the server provides is just a dictionary. You assign values to keys, and you request values from keys.</p> |
|---|
| 32 | |
|---|
| 33 | <p>Now, what actually happens is that the API hashes your key to a unique server. (You define all the available servers and their weightings when initializing the API) Alternatively, the APIs also let you provide your own hash value. A good hash value for user-related data is the user's ID number. Then, the API maps that hash value onto a server (modulus number of server buckets, one bucket for each server IP/port, but some can be weighted heigher if they have more memory available).</p> |
|---|
| 34 | |
|---|
| 35 | <p>If a host goes down, the API re-maps that dead host's requests onto the servers that are available.</p> |
|---|
| 36 | |
|---|
| 37 | <?h1 Shouldn't the database do this? h1?> |
|---|
| 38 | <p>Regardless of what database you use (MS-SQL, Oracle, Postgres, MysQL-InnoDB, etc..), there's a lot of overhead in implementing <a href="http://www.wikipedia.org/wiki/ACID">ACID</a> properties in a RDBMS, especially when disks are involved, which means queries are going to block. For databases that aren't ACID-compliant (like MySQL-MyISAM), that overhead doesn't exist, but reading threads block on the writing threads.</p> |
|---|
| 39 | <p><?memd?> never blocks. See the "Is memcached fast?" question below.</p> |
|---|
| 40 | |
|---|
| 41 | <?h1 What about shared memory? h1?> |
|---|
| 42 | |
|---|
| 43 | <p>The first thing people generally do is cache things within their |
|---|
| 44 | web processes. But this means your cache is duplicated multiple |
|---|
| 45 | times, once for each mod_perl/PHP/etc thread. This is a waste of |
|---|
| 46 | memory and you'll get low cache hit rates. If you're using a |
|---|
| 47 | multi-threaded language or a shared memory API (IPC::Shareable, etc), |
|---|
| 48 | you can have a global cache for all threads, but it's per-machine. It doesn't scale to multiple machines. |
|---|
| 49 | Once you have 20 webservers, those 20 independent caches start to look |
|---|
| 50 | just as silly as when you had 20 threads with their own caches on a |
|---|
| 51 | single box. (plus, shared memory is typically laden with limitations)</p> |
|---|
| 52 | |
|---|
| 53 | <p>The <?memd?> server and clients work together to implement one |
|---|
| 54 | global cache across as many machines as you have. In fact, it's |
|---|
| 55 | recommended you run both web nodes (which are typically memory-lite |
|---|
| 56 | and CPU-hungry) and memcached processes (which are memory-hungry and |
|---|
| 57 | CPU-lite) on the same machines. This way you'll save network |
|---|
| 58 | ports.</p> |
|---|
| 59 | |
|---|
| 60 | <?h1 What about MySQL 4.x query caching? h1?> |
|---|
| 61 | <p>MySQL query caching is less than ideal, for a number of reasons:</p> |
|---|
| 62 | <ul class='spaced'> |
|---|
| 63 | |
|---|
| 64 | <li>MySQL's query cache destroys the entire cache for a given table whenever that table is changed. On a high-traffic site with updates happening many times per second, this makes the the cache practically worthless. In fact, it's often harmful to have it on, since there's a overhead to maintain the cache.</li> |
|---|
| 65 | |
|---|
| 66 | <li>On 32-bit architectures, the entire server (including the query cache) is limited to a 4 GB virtual address space. <?memd?> lets you run as many processes as you want, so you have no limit on memory cache size.</li> |
|---|
| 67 | |
|---|
| 68 | <li>MySQL has a query cache, not an object cache. If your objects require extra expensive construction after the data retrieval step, MySQL's query cache can't help you there.</li> |
|---|
| 69 | |
|---|
| 70 | </ul> |
|---|
| 71 | <p>If the data you need to cache is small and you do infrequent updates, MySQL's query caching should work for you. If not, use <?memd?>.</p> |
|---|
| 72 | |
|---|
| 73 | <?h1 What about database replication? h1?> |
|---|
| 74 | |
|---|
| 75 | <p>You can spread your reads with replication, and that helps a lot, |
|---|
| 76 | but you can't spread writes (they have to process on all machines) and |
|---|
| 77 | they'll eventually consume all your resources. You'll find yourself |
|---|
| 78 | adding replicated slaves at an ever-increasing rate to make up for the |
|---|
| 79 | diminishing returns each addition slave provides.</p> |
|---|
| 80 | |
|---|
| 81 | <p>The next logical step is to horizontally partition your dataset |
|---|
| 82 | onto different master/slave clusters so you can spread your writes, |
|---|
| 83 | and then teach your application to connect to the correct cluster |
|---|
| 84 | depending on the data it needs.</p> |
|---|
| 85 | |
|---|
| 86 | <p>While this strategy works, and is recommended, more databases (each |
|---|
| 87 | with a bunch of disks) statistically leads to more frequent hardware |
|---|
| 88 | failures, which are annoying.</p> |
|---|
| 89 | |
|---|
| 90 | <p>With <?memd?> you can reduce your database reads to a mere |
|---|
| 91 | fraction, leaving the databases to mainly do infrequent writes, and |
|---|
| 92 | end up getting much more bang for your buck, since your databases |
|---|
| 93 | won't be blocking themselves doing ACID bookkeeping or waiting on |
|---|
| 94 | writing threads.</p> |
|---|
| 95 | |
|---|
| 96 | <?h1 Is <?memd?> fast? h1?> <p>Very fast. It uses <a |
|---|
| 97 | href="http://www.monkey.org/~provos/libevent/">libevent</a> to scale |
|---|
| 98 | to any number of open connections</a> (using <a |
|---|
| 99 | href="http://www.xmailserver.org/linux-patches/nio-improve.html">epoll</a> |
|---|
| 100 | on Linux, if available at runtime), uses non-blocking network I/O, refcounts internal objects |
|---|
| 101 | (so objects can be in multiple states to multiple clients), and uses |
|---|
| 102 | its own slab allocator and hash table so virtual memory never gets |
|---|
| 103 | externally fragmented and allocations are guaranteed O(1).</p> |
|---|
| 104 | |
|---|
| 105 | <?h1 What about race conditions? h1?> |
|---|
| 106 | <p>You might wonder: <i>"What if the <tt>get_foo()</tt> function adds a stale version of the Foo object to the cache right as/after the user updates their Foo object via update_foo()?"</i></p> |
|---|
| 107 | <p>While the server and API only have one way to get data from the cache, there exists 3 ways to put data in:</p> |
|---|
| 108 | <ul class='spaced'> |
|---|
| 109 | <li><b>set</b> -- unconditionally sets a given key with a given value (<tt>update_foo()</tt> should use this)</li> |
|---|
| 110 | <li><b>add</b> -- adds to the cache, only if it doesn't already exist (<tt>get_foo()</tt> should use this)</li> |
|---|
| 111 | <li><b>replace</b> -- sets in the cache only if the key already exists (not as useful, only for completeness)</li> |
|---|
| 112 | </ul> |
|---|
| 113 | Additionally, all three support an expiration time. |
|---|
| 114 | |
|---|
| 115 | <h2>Need help? Have Questions?</h2> |
|---|
| 116 | <ul> |
|---|
| 117 | <li>Join <a href="http://lists.danga.com/mailman/listinfo/memcached">the mailing list</a>! We'd love to help you out.</li> |
|---|
| 118 | </ul> |
|---|
| 119 | |
|---|
| 120 | <=body |
|---|
| 121 | page?> |
|---|