| 1 | -- if mogilefsd runs out of file descriptors, it shouldn't crash. it might |
|---|
| 2 | now. needs a test. |
|---|
| 3 | |
|---|
| 4 | -- change debug level at runtime from mgmt port, propagate to children. |
|---|
| 5 | |
|---|
| 6 | -- MogileFS::Device ->make_directory on lighttpd isn't exactly right, as WebDAV |
|---|
| 7 | spec says MKCOL on an existing directory should return 405 (Method Not Allowed), |
|---|
| 8 | which I think is the same as a server without WebDAV enabled... we need to distinguish |
|---|
| 9 | between those two cases. (perhaps in Monitor job?) |
|---|
| 10 | 405 (Method Not Allowed) - MKCOL can only be executed on a deleted/non-existent resource. |
|---|
| 11 | |
|---|
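
One possible way to tell the two 405 cases apart, sketched in Python rather than in MogileFS's own Perl: send an OPTIONS request first and look for a `DAV` header, which WebDAV-compliant resources are expected to return. The function name `classify_mkcol` and its return strings are hypothetical, and whether lighttpd sends the header on the relevant path is an assumption that would need checking.

```python
def classify_mkcol(status, options_headers):
    """Interpret a MKCOL status code, using the headers of an earlier
    OPTIONS response to separate 'directory already exists' from
    'server has no WebDAV support'. Names here are illustrative."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    headers = {k.lower(): v for k, v in options_headers.items()}
    dav_enabled = bool(headers.get("dav", "").strip())

    if status == 201:
        return "created"
    if status == 405:
        # With WebDAV advertised, 405 most likely means the collection exists.
        return "exists" if dav_enabled else "no_webdav"
    return "error"
```

The OPTIONS result could be cached per host by the Monitor job, so make_directory would not need an extra round trip on every call.
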
| 12 | -- mogdbsetup should use /etc/mogilefs/mogilefsd.conf for upgrade dsn info |
|---|
| 13 | |
|---|
| 14 | -- fix 5/second in mogadm fsck status. hard-coded to 5. whoops. |
|---|
| 15 | |
|---|
| 16 | -- in MogileFS::Device, |
|---|
| 17 | + # FIXME: don't use local machine's time() for this. time sync |
|---|
| 18 | + # issues! instead, the monitor process should track this, |
|---|
| 19 | + # noting the difference in relative time between the server's |
|---|
| 20 | + # time (in Date: response header) and time in the usage.txt |
|---|
| 21 | + # file. |
|---|
| 22 | |
|---|
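
A minimal sketch of the idea in the FIXME above, in Python for illustration: compute the age of the usage data from the storage server's own `Date:` response header instead of the tracker's clock. It assumes the usage file carries an epoch timestamp written by the same server; the function name `usage_age_seconds` is hypothetical.

```python
from datetime import timezone
from email.utils import parsedate_to_datetime


def usage_age_seconds(date_header, usage_time):
    """Return how old the usage data is, in seconds, as seen by the
    storage server itself.

    date_header: value of the HTTP Date: header on the usage.txt response.
    usage_time:  epoch seconds recorded inside the usage file (assumed).
    """
    server_now = parsedate_to_datetime(date_header)
    # A "-0000" zone yields a naive datetime; HTTP dates are always UTC.
    if server_now.tzinfo is None:
        server_now = server_now.replace(tzinfo=timezone.utc)
    return server_now.timestamp() - usage_time
```

Because both timestamps come from the same machine, clock skew between the tracker and the storage node drops out of the comparison.
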
| 23 | -- make mogilefsd trackers speak FUSE, so we could mount all of mogilefs using, say: |
|---|
| 24 | http://noedler.de/projekte/wdfs/index.html |
|---|
| 25 | things like paths could be exposed as extended attributes, or as pseudo files: |
|---|
| 26 | cat /mnt/mogile/<domain>/paths/<key> |
|---|
| 27 | cat /mnt/mogile/<domain>/contents/<key> |
|---|
| 28 | |
|---|
| 29 | -- if create_open is called before monitor has run yet, block and wait for a round? better than 'no_devices' |
|---|
| 30 | might need some marker for 'end of monitoring'? or just wait until we have 1 or 3? |
|---|
| 31 | or if they asked for 3, only return 1 in that rare case, preferring latency to redundancy. |
|---|
| 32 | plus, we'd know that 1 is writable in last few seconds! |
|---|
| 33 | |
|---|
| 34 | -- update the 'repl' command for new file_to_replicate table |
|---|
| 35 | |
|---|
| 36 | -- replication policy error storm when a device is known to be observably down: |
|---|
| 37 | |
|---|
| 38 | [replicate(28037)] replication policy ran out of suggestions for us replicating fid 197214 |
|---|
| 39 | [replicate(28037)] replication policy ran out of suggestions for us replicating fid 197237 |
|---|
| 40 | [replicate(28037)] replication policy ran out of suggestions for us replicating fid 197219 |
|---|
| 41 | [replicate(28037)] replication policy ran out of suggestions for us replicating fid 197256 |
|---|
| 42 | [replicate(28037)] replication policy ran out of suggestions for us replicating fid 197226 |
|---|
| 43 | |
|---|
| 44 | -- telnetting to 7001 and sending "!"<enter> will crash the server. |
|---|
| 45 | |
|---|
| 46 | -- fsck job/command. need 'last fsck' table per fid? or column? |
|---|
| 47 | |
|---|
| 48 | -- fix haphazard CapitalStyle vs capital_style in ProcManager for class methods |
|---|
| 49 | |
|---|
| 50 | -- 'every' func should select on psock, to process parent-sent commands |
|---|
| 51 | during worker's breaks |
|---|
| 52 | |
|---|
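
The psock item above could be sketched roughly as follows, in Python for illustration: replace the plain sleep between rounds with a `select` on the parent socket, so the break ends early when the parent sends something. `wait_or_read` is a hypothetical helper, not an existing ProcManager or worker method.

```python
import select
import socket


def wait_or_read(psock, timeout):
    """Rest for up to `timeout` seconds, but return early with whatever
    bytes the parent wrote to `psock`. Returns b"" if nothing arrived."""
    readable, _, _ = select.select([psock], [], [], timeout)
    if readable:
        return psock.recv(4096)
    return b""
```

A worker loop would call this where it currently sleeps, then hand any returned bytes to its normal parent-command parser before starting the next round.
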
| 53 | -- create close could wake a replicate process. |
|---|
| 54 | |
|---|
| 55 | -- optional 'wait_until_replicated=1' flag to create close, so client doesn't |
|---|
| 56 | get success until file is everywhere. |
|---|
| 57 | |
|---|
| 58 | -- redo/reevaluate the 'unreachable_fids' logic: unreachable should only mean |
|---|
| 59 | host/device are up, but file is 404. |
|---|
| 60 | |
|---|
| 61 | -- test database failures |
|---|
| 62 | |
|---|
| 63 | -- identify idempotent commands and replay them 'n' times if query worker dies |
|---|
| 64 | during processing. |
|---|
| 65 | |
|---|
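
A rough sketch of the replay idea above, in Python for illustration. The `IDEMPOTENT` set is only an example (which commands are truly safe to replay still needs an audit), and `WorkerDied` and the `dispatch` callable are hypothetical stand-ins for however the parent learns that a query worker died.

```python
class WorkerDied(Exception):
    """Raised by the (hypothetical) dispatch function when a worker dies."""


# Illustrative only: read-only commands are the obvious candidates.
IDEMPOTENT = {"get_paths", "get_domains"}


def dispatch_with_replay(cmd, args, dispatch, max_replays=2):
    """Run `dispatch(cmd, args)`, replaying up to `max_replays` times if
    the worker dies, but only for commands known to be idempotent."""
    attempts = 1 + (max_replays if cmd in IDEMPOTENT else 0)
    for attempt in range(attempts):
        try:
            return dispatch(cmd, args)
        except WorkerDied:
            # Out of attempts (or not replayable): surface the failure.
            if attempt == attempts - 1:
                raise
```

Commands outside the set, such as create_open, get exactly one attempt, so a dead worker never causes a write to be issued twice.
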
| 66 | -- have query workers be able to broadcast back up to parent "can't parse this" |
|---|
| 67 | at which point parent parses it (e.g. "help" command), so admins don't |
|---|
| 68 | need to remember the "!" prefix. of course, "!" prefix can always be used to |
|---|
| 69 | reach parent faster. |
|---|
| 70 | |
|---|
| 71 | -- mb_asof handling in find_deviceid seems broken. less than max age? wrong units. |
|---|
| 72 | |
|---|
| 73 | -- make generic script to write out usage files for people not using mogstored |
|---|
| 74 | -- or, let mogstored be run in 'usage' file writing only mode |
|---|
| 75 | |
|---|
| 76 | -- wake up deleter process? totally overkill, but why not? |
|---|
| 77 | |
|---|
| 78 | * 404 storms during replicating: (1.5 year old email, might be fixed, verify) |
|---|
| 79 | |
|---|
| 80 | :: [replicate(12648)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693821.fid failed: HTTP 404 |
|---|
| 81 | :: [replicate(12648)] Copier failed replicating 15693821 |
|---|
| 82 | :: [replicate(12648)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693819.fid failed: HTTP 404 |
|---|
| 83 | :: [replicate(12648)] Copier failed replicating 15693819 |
|---|
| 84 | :: [replicate(12648)] Error: Resource http://10.0.0.81:7500/dev9/0/015/693/0015693844.fid failed: HTTP 404 |
|---|
| 85 | :: [replicate(12648)] Copier failed replicating 15693844 |
|---|
| 86 | :: [replicate(12646)] Copier failed replicating 15693846 |
|---|
| 87 | :: [replicate(12646)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693821.fid failed: HTTP 404 |
|---|
| 88 | :: [replicate(12646)] Copier failed replicating 15693821 |
|---|
| 89 | :: [replicate(12646)] Error: Resource http://10.0.0.81:7500/dev9/0/015/693/0015693844.fid failed: HTTP 404 |
|---|
| 90 | :: [replicate(12646)] Copier failed replicating 15693844 |
|---|
| 91 | :: [replicate(12648)] Error: Resource http://10.0.0.81:7500/dev3/0/015/693/0015693848.fid failed: HTTP 404 |
|---|
| 92 | :: [replicate(12650)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693819.fid failed: HTTP 404 |
|---|
| 93 | :: [replicate(12648)] Copier failed replicating 15693848 |
|---|
| 94 | ...... |
|---|
| 95 | |
|---|
| 96 | -- fsck should catch weird state where file exists in 'file' table with |
|---|
| 97 | devcount>0 but does not exist in file_on. Or count does not match. |
|---|
| 98 | |
|---|
| 99 | -- fsck for case where row from file_to_replicate(fid,fromdevid) does not exist |
|---|
| 100 | in file_on(fid,devid). This is a byproduct of a failed inject. |
|---|
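
The two fsck checks above can be expressed as queries. The sketch below uses Python's `sqlite3` purely for illustration and assumes only the columns named in these notes (`file.fid`, `file.devcount`, `file_on.fid`, `file_on.devid`, `file_to_replicate.fid`, `file_to_replicate.fromdevid`); the real schema has more columns and a different database backend, and `find_inconsistencies` is a hypothetical name.

```python
import sqlite3


def find_inconsistencies(db):
    """Return (devcount_mismatch, orphan_replicate) row lists."""
    # Files whose recorded devcount disagrees with the number of
    # file_on rows, including devcount > 0 with no rows at all.
    devcount_mismatch = db.execute(
        """
        SELECT f.fid, f.devcount, COUNT(fo.devid) AS actual
        FROM file f
        LEFT JOIN file_on fo ON fo.fid = f.fid
        GROUP BY f.fid, f.devcount
        HAVING f.devcount <> COUNT(fo.devid)
        ORDER BY f.fid
        """
    ).fetchall()

    # Replication queue rows whose source copy is not in file_on.
    orphan_replicate = db.execute(
        """
        SELECT r.fid, r.fromdevid
        FROM file_to_replicate r
        LEFT JOIN file_on fo
               ON fo.fid = r.fid AND fo.devid = r.fromdevid
        WHERE r.fromdevid IS NOT NULL AND fo.fid IS NULL
        ORDER BY r.fid
        """
    ).fetchall()
    return devcount_mismatch, orphan_replicate
```

On a large installation these would need to run in fid ranges rather than as single full-table scans.
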