root/branches/binary/server/doc/protocol.txt @ 751

Revision 751, 23.5 kB (checked in by dsallings, 21 months ago)

Merge commit 'trunk' into lbinary as of r750

Conflicts:

server/memcached.c

Protocol
--------

Clients of memcached communicate with the server through TCP connections.
(A UDP interface is also available; details are below under "UDP
protocol.") A given running memcached server listens on some
(configurable) port; clients connect to that port, send commands to
the server, read responses, and eventually close the connection.

There is no need to send any command to end the session. A client may
just close the connection at any moment it no longer needs it. Note,
however, that clients are encouraged to cache their connections rather
than reopen them every time they need to store or retrieve data.  This
is because memcached is especially designed to work very efficiently
with a very large number (many hundreds, more than a thousand if
necessary) of open connections. Caching connections will eliminate the
overhead associated with establishing a TCP connection (the overhead
of preparing for a new connection on the server side is insignificant
compared to this).

There are two kinds of data sent in the memcache protocol: text lines
and unstructured data.  Text lines are used for commands from clients
and responses from servers. Unstructured data is sent when a client
wants to store or retrieve data. The server will transmit back
unstructured data in exactly the same way it received it, as a byte
stream. The server doesn't care about byte order issues in
unstructured data and isn't aware of them. There are no limitations on
characters that may appear in unstructured data; however, the reader
of such data (either a client or a server) will always know, from a
preceding text line, the exact length of the data block being
transmitted.

Text lines are always terminated by \r\n. Unstructured data is _also_
terminated by \r\n, even though \r, \n or any other 8-bit characters
may also appear inside the data. Therefore, when a client retrieves
data from a server, it must use the length of the data block (which it
will be provided with) to determine where the data block ends, and not
the fact that \r\n follows the end of the data block, even though it
does.

Keys
----

Data stored by memcached is identified with the help of a key. A key
is a text string which should uniquely identify the data for clients
that are interested in storing and retrieving it.  Currently the
length limit of a key is set at 250 characters (of course, normally
clients wouldn't need to use such long keys); the key must not include
control characters or whitespace.
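
As a rough illustration of the key rules above, a client could check keys
before sending them. This is only a sketch: the name "is_valid_key" is
hypothetical, and it assumes that the 250-character limit is counted in
bytes and that "control characters or whitespace" means byte values up to
0x20 plus 0x7f.

```python
MAX_KEY_LENGTH = 250  # limit stated in the text above


def is_valid_key(key: bytes) -> bool:
    """Return True if key meets the length and character rules (assumed)."""
    if not 0 < len(key) <= MAX_KEY_LENGTH:
        return False
    # Reject space (0x20), control characters (< 0x20) and DEL (0x7f).
    return all(b > 0x20 and b != 0x7F for b in key)
```

A client would typically run such a check before building any command
line, since a key containing a space would be read by the server as two
separate parameters.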

Commands
--------

There are three types of commands.

Storage commands (there are six: "set", "add", "replace", "append",
"prepend" and "cas") ask the server to store some data identified by a
key. The client sends a command line, and then a data block; after
that the client expects one line of response, which will indicate
success or failure.

Retrieval commands (there are two: "get" and "gets") ask the server to
retrieve data corresponding to a set of keys (one or more keys in one
request). The client sends a command line, which includes all the
requested keys; after that for each item the server finds it sends to
the client one response line with information about the item, and one
data block with the item's data; this continues until the server
finishes with the "END" response line.

All other commands don't involve unstructured data. In all of them,
the client sends one command line, and expects (depending on the
command) either one line of response, or several lines of response
ending with "END" on the last line.

A command line always starts with the name of the command, followed by
parameters (if any) delimited by whitespace. Command names are
lower-case and are case-sensitive.

Expiration times
----------------

Some commands involve a client sending some kind of expiration time
(relative to an item or to an operation requested by the client) to
the server. In all such cases, the actual value sent may either be
Unix time (number of seconds since January 1, 1970, as a 32-bit
value), or a number of seconds starting from current time. In the
latter case, this number of seconds may not exceed 60*60*24*30 (number
of seconds in 30 days); if the number sent by a client is larger than
that, the server will consider it to be a real Unix time value rather
than an offset from current time.
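
The 30-day rule above can be sketched as a small function. The name
"absolute_expiry" is hypothetical, and the handling of 0 as "never
expires" is taken from the description of <exptime> under "Storage
commands"; this models the rule as documented, not the server's code.

```python
RELATIVE_LIMIT = 60 * 60 * 24 * 30  # number of seconds in 30 days


def absolute_expiry(exptime: int, now: int):
    """Return the Unix time at which an item expires, or None for never."""
    if exptime == 0:
        return None  # 0 means the item never expires
    if exptime > RELATIVE_LIMIT:
        return exptime  # too large for an offset: treated as Unix time
    return now + exptime  # offset in seconds from the current time
```

For example, with a current time of 1000, an exptime of 60 expires at
1060, while an exptime of 2592001 (one second more than 30 days) is
taken as the absolute Unix time 2592001.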


Error strings
-------------

Each command sent by a client may be answered with an error string
from the server. These error strings come in three types:

- "ERROR\r\n"

  means the client sent a nonexistent command name.

- "CLIENT_ERROR <error>\r\n"

  means some sort of client error in the input line, i.e. the input
  doesn't conform to the protocol in some way. <error> is a
  human-readable error string.

- "SERVER_ERROR <error>\r\n"

  means some sort of server error prevents the server from carrying
  out the command. <error> is a human-readable error string. In cases
  of severe server errors, which make it impossible to continue
  serving the client (this shouldn't normally happen), the server will
  close the connection after sending the error line. This is the only
  case in which the server closes a connection to a client.


In the descriptions of individual commands below, these error lines
are not again specifically mentioned, but clients must allow for their
possibility.
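
Since any reply may turn out to be one of these three error lines, a
client will usually test for them before interpreting a response. The
following is a minimal sketch; the function name "classify_error" is
hypothetical and the error message in the usage note is made up.

```python
def classify_error(line: bytes):
    """Return (kind, message) if line is an error reply, else None."""
    if not line.endswith(b"\r\n"):
        raise ValueError("reply line must end with \\r\\n")
    text = line[:-2].decode("ascii", errors="replace")
    if text == "ERROR":
        return ("ERROR", "")
    for kind in ("CLIENT_ERROR", "SERVER_ERROR"):
        if text.startswith(kind + " "):
            # Everything after the first space is the human-readable text.
            return (kind, text[len(kind) + 1:])
    return None  # not an error line; interpret as a normal reply
```

Calling it with b"CLIENT_ERROR bad input\r\n" gives the pair
("CLIENT_ERROR", "bad input"), while b"STORED\r\n" gives None.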


Storage commands
----------------

First, the client sends a command line which looks like this:

<command name> <key> <flags> <exptime> <bytes> [noreply]\r\n
cas <key> <flags> <exptime> <bytes> <cas unique> [noreply]\r\n

- <command name> is "set", "add", "replace", "append" or "prepend"

  "set" means "store this data".

  "add" means "store this data, but only if the server *doesn't* already
  hold data for this key".

  "replace" means "store this data, but only if the server *does*
  already hold data for this key".

  "append" means "add this data to an existing key after existing data".

  "prepend" means "add this data to an existing key before existing data".

  The append and prepend commands do not use flags or exptime.
  They update the existing data portion, and ignore the new flag and
  exptime settings.

  "cas" is a check and set operation which means "store this data but
  only if no one else has updated since I last fetched it."

- <key> is the key under which the client asks to store the data

- <flags> is an arbitrary 16-bit unsigned integer (written out in
  decimal) that the server stores along with the data and sends back
  when the item is retrieved. Clients may use this as a bit field to
  store data-specific information; this field is opaque to the server.
  Note that in memcached 1.2.1 and higher, flags may be 32 bits, instead
  of 16, but you might want to restrict yourself to 16 bits for
  compatibility with older versions.

- <exptime> is expiration time. If it's 0, the item never expires
  (although it may be deleted from the cache to make room for other
  items). If it's non-zero (either Unix time or offset in seconds from
  current time), it is guaranteed that clients will not be able to
  retrieve this item after the expiration time arrives (measured by
  server time).

- <bytes> is the number of bytes in the data block to follow, *not*
  including the delimiting \r\n. <bytes> may be zero (in which case
  it's followed by an empty data block).

- <cas unique> is a unique 64-bit value of an existing entry.
  Clients should use the value returned from the "gets" command
  when issuing "cas" updates.

- The optional "noreply" parameter instructs the server not to send the
  reply.  NOTE: if the request line is malformed, the server can't
  parse the "noreply" option reliably.  In this case it may send the
  error to the client, and not reading it on the client side will break
  things.  Clients should construct only valid requests.

After this line, the client sends the data block:

<data block>\r\n

- <data block> is a chunk of arbitrary 8-bit data of length <bytes>
  from the previous line.

After sending the command line and the data block, the client awaits
the reply, which may be:

- "STORED\r\n", to indicate success.

- "NOT_STORED\r\n" to indicate the data was not stored, but not
  because of an error. This normally means either that the
  condition for an "add" or a "replace" command wasn't met, or that the
  item is in a delete queue (see the "delete" command below).

- "EXISTS\r\n" to indicate that the item you are trying to store with
  a "cas" command has been modified since you last fetched it.

- "NOT_FOUND\r\n" to indicate that the item you are trying to store
  with a "cas" command did not exist or has been deleted.
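
To make the framing concrete, here is a sketch of how a client might
serialize a storage request (command line plus data block). The function
name "build_storage_command" and its parameter names are hypothetical;
only the wire layout comes from the description above.

```python
STORAGE_COMMANDS = ("set", "add", "replace", "append", "prepend", "cas")


def build_storage_command(command, key, data, flags=0, exptime=0,
                          cas_unique=None, noreply=False) -> bytes:
    """Serialize one storage request: command line, then the data block."""
    if command not in STORAGE_COMMANDS:
        raise ValueError("unknown storage command: %r" % command)
    if (command == "cas") != (cas_unique is not None):
        raise ValueError("cas_unique is required for cas and only for cas")
    # <bytes> counts the data only, not the \r\n that delimits it.
    fields = [command, key, str(flags), str(exptime), str(len(data))]
    if command == "cas":
        fields.append(str(cas_unique))
    if noreply:
        fields.append("noreply")
    return " ".join(fields).encode("ascii") + b"\r\n" + data + b"\r\n"
```

For instance, storing the five bytes "hello" under the key "greeting"
with default flags and exptime yields "set greeting 0 0 5\r\n" followed
by "hello\r\n". The data may itself contain \r\n, because the receiver
relies on <bytes> rather than on the terminator.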


Retrieval commands
------------------

The retrieval commands "get" and "gets" operate like this:

get <key>*\r\n
gets <key>*\r\n

- <key>* means one or more key strings separated by whitespace.

After this command, the client expects zero or more items, each of
which is received as a text line followed by a data block. After all
the items have been transmitted, the server sends the string

"END\r\n"

to indicate the end of response.

Each item sent by the server looks like this:

VALUE <key> <flags> <bytes> [<cas unique>]\r\n
<data block>\r\n

- <key> is the key for the item being sent

- <flags> is the flags value set by the storage command

- <bytes> is the length of the data block to follow, *not* including
  its delimiting \r\n

- <cas unique> is a unique 64-bit integer that uniquely identifies
  this specific item.

- <data block> is the data for this item.

If some of the keys appearing in a retrieval request are not sent back
by the server in the item list this means that the server does not
hold items with such keys (because they were never stored, or stored
but deleted to make space for more items, or expired, or explicitly
deleted by a client).
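
The response format above can be parsed as in the following sketch. It
assumes the whole response is already in memory (a real client would
read incrementally from the socket), and the function name
"parse_get_response" and the dictionary layout are hypothetical. Note
that each data block is cut out by its <bytes> length, never by
searching for \r\n.

```python
def parse_get_response(buf: bytes):
    """Parse a complete "get"/"gets" response into a list of item dicts."""
    items = []
    pos = 0
    while True:
        eol = buf.index(b"\r\n", pos)  # raises ValueError if incomplete
        line = buf[pos:eol].decode("ascii")
        pos = eol + 2
        if line == "END":
            return items
        parts = line.split()
        if not parts or parts[0] != "VALUE" or len(parts) not in (4, 5):
            raise ValueError("unexpected line: %r" % line)
        key, flags, nbytes = parts[1], int(parts[2]), int(parts[3])
        cas = int(parts[4]) if len(parts) == 5 else None
        # The block is length-delimited and may itself contain \r\n.
        data = buf[pos:pos + nbytes]
        if len(data) != nbytes or buf[pos + nbytes:pos + nbytes + 2] != b"\r\n":
            raise ValueError("truncated or malformed data block")
        pos += nbytes + 2
        items.append({"key": key, "flags": flags, "cas": cas, "data": data})
```

Keys that were requested but do not show up in the returned list are
simply misses; a response consisting of "END\r\n" alone parses to an
empty list.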


Deletion
--------

The command "delete" allows for explicit deletion of items:

delete <key> [<time>] [noreply]\r\n

- <key> is the key of the item the client wishes the server to delete

- <time> is the amount of time in seconds (or Unix time until which)
  the client wishes the server to refuse "add" and "replace" commands
  with this key. For this amount of time, the item is put into a
  delete queue, which means that it won't be possible to retrieve it by
  the "get" command, but "add" and "replace" commands with this key
  will also fail (the "set" command will succeed, however). After the
  time passes, the item is finally deleted from server memory.

  The parameter <time> is optional, and, if absent, defaults to 0
  (which means that the item will be deleted immediately and further
  storage commands with this key will succeed).

- The optional "noreply" parameter instructs the server not to send the
  reply.  See the note in Storage commands regarding malformed
  requests.

The response line to this command can be one of:

- "DELETED\r\n" to indicate success

- "NOT_FOUND\r\n" to indicate that the item with this key was not
  found.

See the "flush_all" command below for immediate invalidation
of all existing items.


Increment/Decrement
-------------------

Commands "incr" and "decr" are used to change data for some item
in-place, incrementing or decrementing it. The data for the item is
treated as decimal representation of a 64-bit unsigned integer. If the
current data value does not conform to such a representation, the
commands behave as if the value were 0. Also, the item must already
exist for incr/decr to work; these commands won't pretend that a
non-existent key exists with value 0; instead, they will fail.

The client sends the command line:

incr <key> <value> [noreply]\r\n

or

decr <key> <value> [noreply]\r\n

- <key> is the key of the item the client wishes to change

- <value> is the amount by which the client wants to increase/decrease
  the item. It is a decimal representation of a 64-bit unsigned integer.

- The optional "noreply" parameter instructs the server not to send the
  reply.  See the note in Storage commands regarding malformed
  requests.

The response will be one of:

- "NOT_FOUND\r\n" to indicate the item with this key was not found

- <value>\r\n , where <value> is the new value of the item's data,
  after the increment/decrement operation was carried out.

Note that underflow in the "decr" command is caught: if a client tries
to decrease the value below 0, the new value will be 0.  Overflow in
the "incr" command will wrap around the 64-bit mark.

Note also that decrementing a number such that it loses length isn't
guaranteed to decrement its returned length.  The number MAY be
space-padded at the end, but this is purely an implementation
optimization, so you also shouldn't rely on that.
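
The arithmetic rules above (non-numeric data counts as 0, "decr" stops
at 0, "incr" wraps at 64 bits) can be modelled as follows. This is a
sketch of the documented behaviour only, not the server's code; the
function names are hypothetical, the NOT_FOUND case is not modelled, and
treating a stored number of 2^64 or more as non-conforming is an
assumption.

```python
UINT64_MODULUS = 2 ** 64


def current_value(data: bytes) -> int:
    """Interpret item data as a decimal 64-bit unsigned integer, else 0."""
    digits = data.strip()  # tolerate the optional space padding
    if digits.isdigit() and int(digits) < UINT64_MODULUS:
        return int(digits)
    return 0


def incr(data: bytes, delta: int) -> int:
    """New value after "incr": overflow wraps around the 64-bit mark."""
    return (current_value(data) + delta) % UINT64_MODULUS


def decr(data: bytes, delta: int) -> int:
    """New value after "decr": underflow is caught and clamps to 0."""
    return max(current_value(data) - delta, 0)
```

So decrementing "10" by 15 gives 0, and incrementing
"18446744073709551615" (the largest 64-bit value) by 1 also gives 0.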

Statistics
----------

The command "stats" is used to query the server about statistics it
maintains and other internal data. It has two forms. Without
arguments:

stats\r\n

it causes the server to output general-purpose statistics and
settings, documented below.  In the other form it has some arguments:

stats <args>\r\n

Depending on <args>, various internal data is sent by the server. The
kinds of arguments and the data sent are not documented in this version
of the protocol, and are subject to change for the convenience of
memcache developers.


General-purpose statistics
--------------------------

Upon receiving the "stats" command without arguments, the server sends
a number of lines which look like this:

STAT <name> <value>\r\n

The server terminates this list with the line

END\r\n

In each line of statistics, <name> is the name of this statistic, and
<value> is the data.  The following is the list of all names sent in
response to the "stats" command, together with the type of the value
sent for this name, and the meaning of the value.

In the type column below, "32u" means a 32-bit unsigned integer, "64u"
means a 64-bit unsigned integer. '32u:32u' means two 32-bit unsigned
integers separated by a colon.


Name                  Type     Meaning
--------------------------------------
pid                   32u      Process id of this server process
uptime                32u      Number of seconds this server has been running
time                  32u      Current UNIX time according to the server
version               string   Version string of this server
pointer_size          32u      Default size of pointers on the host OS
                               (generally 32 or 64)
rusage_user           32u:32u  Accumulated user time for this process
                               (seconds:microseconds)
rusage_system         32u:32u  Accumulated system time for this process
                               (seconds:microseconds)
curr_items            32u      Current number of items stored by the server
total_items           32u      Total number of items stored by this server
                               ever since it started
bytes                 64u      Current number of bytes used by this server
                               to store items
curr_connections      32u      Number of open connections
total_connections     32u      Total number of connections opened since
                               the server started running
connection_structures 32u      Number of connection structures allocated
                               by the server
cmd_get               64u      Cumulative number of retrieval requests
cmd_set               64u      Cumulative number of storage requests
get_hits              64u      Number of keys that have been requested and
                               found present
get_misses            64u      Number of items that have been requested
                               and not found
evictions             64u      Number of valid items removed from cache
                               to free memory for new items
bytes_read            64u      Total number of bytes read by this server
                               from network
bytes_written         64u      Total number of bytes sent by this server to
                               network
limit_maxbytes        32u      Number of bytes this server is allowed to
                               use for storage.
threads               32u      Number of worker threads requested.
                               (see doc/threads.txt)
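
A client can turn the "STAT <name> <value>" lines into a dictionary
along the lines of the sketch below. The function name "parse_stats" is
hypothetical, the response is assumed to be fully buffered, and the
sample values in the usage note are made up. Values made up only of
decimal digits are converted to integers; everything else is kept as a
string.

```python
def parse_stats(buf: bytes) -> dict:
    """Parse a complete "stats" response into a name -> value dict."""
    stats = {}
    for raw in buf.split(b"\r\n"):
        line = raw.decode("ascii")
        if line == "END":
            return stats
        tag, name, value = line.split(" ", 2)
        if tag != "STAT":
            raise ValueError("unexpected line: %r" % line)
        # Keep strings such as the version; convert plain integers.
        stats[name] = int(value) if value.isdigit() else value
    raise ValueError("response is missing the END line")
```

Given "STAT pid 1234\r\nSTAT version 1.2.6\r\nEND\r\n", this returns a
dictionary mapping "pid" to 1234 and "version" to "1.2.6".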


Item statistics
---------------
CAVEAT: This section describes statistics which are subject to change in the
future.

The "stats" command with the argument of "items" returns information about
item storage per slab class. The data is returned in the format:

STAT items:<slabclass>:<stat> <value>\r\n

The server terminates this list with the line

END\r\n

The slabclass aligns with class ids used by the "stats slabs" command. Where
"stats slabs" describes size and memory usage, "stats items" shows higher
level information.

The following item values are defined as of writing.

Name                   Meaning
------------------------------
number                 Number of items presently stored in this class. Expired
                       items are not automatically excluded.
age                    Age of the oldest item in the LRU.
evicted                Number of times an item had to be evicted from the LRU
                       before it expired.
outofmemory            Number of times the underlying slab class was unable to
                       store a new item. This means you are running with -M or
                       an eviction failed.

Note this will only display information about slabs which exist, so an empty
cache will return an empty set.


Item size statistics
--------------------
CAVEAT: This section describes statistics which are subject to change in the
future.

The "stats" command with the argument of "sizes" returns information about the
general size and count of all items stored in the cache.
WARNING: This command WILL lock up your cache! It iterates over *every item*
and examines the size. While the operation is fast, if you have many items
you could prevent memcached from serving requests for several seconds.

The data is returned in the following format:

<size> <count>\r\n

The server terminates this list with the line

END\r\n

'size' is an approximate size of the item, within 32 bytes.
'count' is the number of items that exist within that 32-byte range.

This is essentially a display of all of your items as if there were a slab
class for every 32 bytes. You can use this to determine if adjusting the slab
growth factor would save memory overhead. For example: generating more classes
in the lower range could allow items to fit more snugly into their slab
classes, if most of your items are less than 200 bytes in size.


Slab statistics
---------------
CAVEAT: This section describes statistics which are subject to change in the
future.

The "stats" command with the argument of "slabs" returns information about
each of the slabs created by memcached during runtime. This includes per-slab
information along with some totals. The data is returned in the format:

STAT <slabclass>:<stat> <value>\r\n
STAT <stat> <value>\r\n

The server terminates this list with the line

END\r\n

Name                   Meaning
------------------------------
chunk_size             The amount of space each chunk uses. One item will use
                       one chunk of the appropriate size.
chunks_per_page        How many chunks exist within one page. A page by
                       default is one megabyte in size. Slabs are allocated per
                       page, then broken into chunks.
total_pages            Total number of pages allocated to the slab class.
total_chunks           Total number of chunks allocated to the slab class.
used_chunks            How many chunks have been allocated to items.
free_chunks            Chunks not yet allocated to items, or freed via delete.
free_chunks_end        Number of free chunks at the end of the last allocated
                       page.
active_slabs           Total number of slab classes allocated.
total_malloced         Total amount of memory allocated to slab pages.


Other commands
--------------

"flush_all" is a command with an optional numeric argument. It always
succeeds, and the server sends "OK\r\n" in response (unless "noreply"
is given as the last parameter). Its effect is to invalidate all
existing items immediately (by default) or after the expiration
specified.  After invalidation none of the items will be returned in
response to a retrieval command (unless it's stored again under the
same key *after* flush_all has invalidated the items). flush_all
doesn't actually free all the memory taken up by existing items; that
will happen gradually as new items are stored. The most precise
definition of what flush_all does is the following: it causes all
items whose update time is earlier than the time at which flush_all
was set to be executed to be ignored for retrieval purposes.

The intent of flush_all with a delay is that in a setting where you
have a pool of memcached servers, and you need to flush all content,
you have the option of not resetting all memcached servers at the
same time (which could e.g. cause a spike in database load with all
clients suddenly needing to recreate content that would otherwise
have been found in the memcached daemon).

The delay option allows you to have them reset in e.g. 10 second
intervals (by passing 0 to the first, 10 to the second, 20 to the
third, etc.).
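
The staggered reset described above amounts to sending each server in
the pool a different delay. A minimal sketch, assuming the delay is
passed as the single numeric argument of "flush_all"; the function name
"staggered_flush_commands" and the server labels are hypothetical.

```python
def staggered_flush_commands(servers, interval=10):
    """Pair each server with a flush_all line delayed by i * interval."""
    return [
        (server, ("flush_all %d\r\n" % (i * interval)).encode("ascii"))
        for i, server in enumerate(servers)
    ]
```

For a pool of three servers and the default interval this yields the
command lines "flush_all 0", "flush_all 10" and "flush_all 20", each to
be written to the connection of the corresponding server.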


"version" is a command with no arguments:

version\r\n

In response, the server sends

"VERSION <version>\r\n", where <version> is the version string for the
server.

"verbosity" is a command with a numeric argument. It always succeeds,
and the server sends "OK\r\n" in response (unless "noreply" is given
as the last parameter). Its effect is to set the verbosity level of
the logging output.

"quit" is a command with no arguments:

quit\r\n

Upon receiving this command, the server closes the
connection. However, the client may also simply close the connection
when it no longer needs it, without issuing this command.


UDP protocol
------------

For very large installations where the number of clients is high enough
that the number of TCP connections causes scaling difficulties, there is
also a UDP-based interface. The UDP interface does not provide guaranteed
delivery, so it should only be used for operations that aren't required to
succeed; typically it is used for "get" requests where a missing or
incomplete response can simply be treated as a cache miss.

Each UDP datagram contains a simple frame header, followed by data in the
same format as the TCP protocol described above. In the current
implementation, requests must be contained in a single UDP datagram, but
responses may span several datagrams. (The only common requests that would
span multiple datagrams are huge multi-key "get" requests and "set"
requests, both of which are more suitable to TCP transport for reliability
reasons anyway.)

The frame header is 8 bytes long, as follows (all values are 16-bit integers
in network byte order, high byte first):

0-1 Request ID
2-3 Sequence number
4-5 Total number of datagrams in this message
6-7 Reserved for future use; must be 0

The request ID is supplied by the client. Typically it will be a
monotonically increasing value starting from a random seed, but the client
is free to use whatever request IDs it likes. The server's response will
contain the same ID as the incoming request. The client uses the request ID
to differentiate between responses to outstanding requests if there are
several pending from the same server; any datagrams with an unknown request
ID are probably delayed responses to an earlier request and should be
discarded.

The sequence number ranges from 0 to n-1, where n is the total number of
datagrams in the message. The client should concatenate the payloads of the
datagrams for a given response in sequence number order; the resulting byte
stream will contain a complete response in the same format as the TCP
protocol (including terminating \r\n sequences).
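
The frame header and the reassembly rule can be sketched with Python's
"struct" module. The function names "pack_frame" and "reassemble" are
hypothetical, and returning None for an incomplete message (so that the
caller can treat it as a cache miss) is a design choice of this sketch.

```python
import struct

# Four 16-bit unsigned integers in network byte order: request ID,
# sequence number, total number of datagrams, reserved (must be 0).
HEADER = struct.Struct("!HHHH")


def pack_frame(request_id, seq, total, payload: bytes) -> bytes:
    """Prefix payload with the 8-byte frame header."""
    return HEADER.pack(request_id, seq, total, 0) + payload


def reassemble(datagrams, request_id):
    """Join the payloads for request_id in sequence order, or return None."""
    parts = {}
    total = None
    for dgram in datagrams:
        rid, seq, count, _reserved = HEADER.unpack_from(dgram)
        if rid != request_id:
            continue  # probably a delayed reply to an earlier request
        total = count
        parts[seq] = dgram[HEADER.size:]
    if total is None or set(parts) != set(range(total)):
        return None  # something is missing: treat as a cache miss
    return b"".join(parts[i] for i in range(total))
```

Datagrams may arrive in any order; sorting by sequence number before
concatenating restores the byte stream, which can then be parsed exactly
like a TCP response.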