root/tags/mogilefs-server-2.17/TODO

Revision 1054, 5.0 kB (checked in by bradfitz, 3 years ago)

todo

-- if we run out of file descriptors in mogilefsd, mogilefsd shouldn't crash.  it might
   now.  needs a test.

-- change debug level at runtime from mgmt port, propagate to children.

-- MogileFS::Device->make_directory on lighttpd isn't exactly right, as the WebDAV
   spec says MKCOL on an existing directory should return 405 (Method Not Allowed),
   which I think is also what a server without WebDAV enabled returns... we need to
   distinguish between those two cases.  (perhaps in Monitor job?)
   405 (Method Not Allowed) - MKCOL can only be executed on a deleted/non-existent resource.
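
One possible way to tell the two 405 cases apart, sketched in Python rather than the project's Perl: follow a 405 with an OPTIONS request and look for the `DAV` response header, which WebDAV-compliant servers send. The function name `classify_mkcol` and its return labels are made up for this sketch, and whether lighttpd's WebDAV module behaves this way would need to be verified.

```python
def classify_mkcol(status, options_headers=None):
    """Interpret a MKCOL response status.

    options_headers: headers from a follow-up OPTIONS request on the same
    URL, or None if no follow-up was made (hypothetical calling convention).
    """
    if status == 201:
        return "created"
    if status != 405:
        return "error"
    if options_headers is None:
        # A bare 405 cannot distinguish the two cases.
        return "ambiguous"
    # HTTP header names are case-insensitive.
    names = {name.lower() for name in options_headers}
    if "dav" in names:
        return "exists"      # WebDAV is on, so the directory was already there
    return "no_webdav"       # server does not advertise WebDAV at all
```

Keeping the decision in a pure function means the Monitor job (or whoever makes the OPTIONS request) can be tested without a live storage node.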

-- mogdbsetup should use /etc/mogilefs/mogilefsd.conf for upgrade dsn info

-- fix 5/second in mogadm fsck status.  hard-coded to 5.  whoops.

-- in MogileFS::Device,
 +        # FIXME: don't use local machine's time() for this.  time sync
 +        # issues!  instead, the monitor process should track this,
 +        # noting the difference in relative time between the server's
 +        # time (in Date: response header) and time in the usage.txt
 +        # file.
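
A minimal Python sketch of the idea in that FIXME: compute the usage file's age from the storage server's own clock (the `Date:` header of the response that served usage.txt) so the tracker's local clock never enters the calculation. The function name is hypothetical, and it assumes the usage file records its write time as epoch seconds.

```python
from email.utils import parsedate_to_datetime


def usage_age_seconds(date_header, usage_time):
    """Age of a usage file as measured by the storage server's clock.

    date_header: value of the HTTP Date: header on the usage.txt response.
    usage_time:  epoch seconds recorded inside usage.txt (assumed field).
    """
    server_now = parsedate_to_datetime(date_header).timestamp()
    return server_now - usage_time
```

Both timestamps come from the same machine, so clock skew between tracker and storage node cancels out.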

-- make mogilefsd trackers speak FUSE, so we could mount all of mogilefs using, say:
     http://noedler.de/projekte/wdfs/index.html
   things like paths could be exposed as extended attributes, or as pseudo files:
     cat /mnt/mogile/<domain>/paths/<key>
     cat /mnt/mogile/<domain>/contents/<key>

-- if create_open comes in before the monitor has run, block and wait for a round?  better than 'no_devices'.
   might need some marker for 'end of monitoring'?  or just wait until we have 1 or 3?
   or if they asked for 3, only return 1 in that rare case, preferring latency to redundancy.
   plus, we'd know that 1 is writable in last few seconds!
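
The "block and wait for a round" option could look roughly like this in Python. It uses a `threading.Event` as the 'end of monitoring' marker; mogilefsd uses separate worker processes, so the real marker would have to travel over the parent/child sockets instead. `MonitorState` and its methods are invented for the sketch.

```python
import threading


class MonitorState:
    """Tracks writable devices and whether a first monitor round has finished."""

    def __init__(self):
        self._first_round_done = threading.Event()
        self._lock = threading.Lock()
        self._writable = []

    def finish_round(self, writable_devids):
        """Called by the monitor at the end of each round."""
        with self._lock:
            self._writable = list(writable_devids)
        self._first_round_done.set()

    def pick_devices(self, wanted, timeout=5.0):
        """Wait (bounded) for the first round, then return up to `wanted` devices.

        Returns fewer than `wanted` when that is all we have, preferring
        latency to redundancy; returns [] if no round finished in time.
        """
        if not self._first_round_done.wait(timeout):
            return []  # caller would report no_devices
        with self._lock:
            return self._writable[:wanted]
```

The timeout keeps a stuck monitor from hanging clients forever; after it expires the caller falls back to today's 'no_devices' behaviour.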

-- update the 'repl' command for new file_to_replicate table

-- replication policy error storm when a device is known to be observably down:

[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197214
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197237
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197219
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197256
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197226

-- telnet to 7001 and send "!"<enter>: mogilefsd will crash.

-- fsck job/command.  need 'last fsck' table per fid?  or column?

-- fix haphazard CapitalStyle vs capital_style in ProcManager for class methods

-- 'every' func should select on psock, to process parent-sent commands
   during worker's breaks
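
A sketch of that last item in Python (the real 'every' is Perl; this only illustrates the shape): instead of sleeping through the break, wait in `select` on the parent socket with the remaining break time as the timeout. The `handle_command` callback and the return-on-EOF behaviour are assumptions of the sketch, and it does no line buffering.

```python
import select
import time


def every(interval, work, psock, handle_command):
    """Run work() every `interval` seconds, servicing psock during breaks.

    Returns when the parent closes its end of the socket.
    """
    while True:
        work()
        deadline = time.monotonic() + interval
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break  # break is over, run work() again
            readable, _, _ = select.select([psock], [], [], remaining)
            if not readable:
                break  # timed out with nothing from the parent
            data = psock.recv(4096)
            if not data:
                return  # parent went away
            handle_command(data)
```

A parent-sent command is handled as soon as it arrives, and the worker still resumes on schedule because the timeout shrinks on each pass.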

-- create close could wake a replicate process.

-- optional 'wait_until_replicated=1' flag to create close, so client doesn't
   get success until file is everywhere.

-- redo/reevaluate the 'unreachable_fids' logic:  unreachable should only mean
   host/device are up, but file is 404.

-- test database failures

-- identify idempotent commands and replay them 'n' times if query worker dies
   during processing.
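
The replay idea, sketched in Python under stated assumptions: the allow-list below is illustrative only (deciding which commands are really idempotent is the open part of this item), and `WorkerDied` and `dispatch` are hypothetical stand-ins for however the parent notices a dead query worker.

```python
class WorkerDied(Exception):
    """Raised by the (hypothetical) dispatcher when a query worker dies mid-request."""


# Illustrative allow-list of read-only commands; not an audited set.
IDEMPOTENT = {"get_paths", "get_domains", "get_hosts"}


def dispatch_with_replay(command, args, dispatch, max_replays=2):
    """Run a command, replaying it up to max_replays times if the worker dies.

    Commands outside the allow-list get exactly one attempt.
    """
    attempts = 1 + (max_replays if command in IDEMPOTENT else 0)
    for attempt in range(attempts):
        try:
            return dispatch(command, args)
        except WorkerDied:
            if attempt == attempts - 1:
                raise  # out of replays, surface the failure
```

An explicit allow-list fails safe: a command nobody has classified is never replayed.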

-- have query workers be able to broadcast back up to parent "can't parse this"
   at which point parent parses it (e.g. "help" command), so admins don't
   need to remember the "!" prefix.  of course, "!" prefix can always be used to
   reach parent faster.

-- mb_asof handling in find_deviceid seems broken.  less than max age?  wrong units.

-- make generic script to write out usage files for people not using mogstored
    -- or, let mogstored be run in 'usage' file writing only mode

-- wake up deleter process?  totally overkill, but why not?

* 404 storms during replicating:  (1.5 year old email, might be fixed, verify)

:: [replicate(12648)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693821.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693821
:: [replicate(12648)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693819.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693819
:: [replicate(12648)] Error: Resource http://10.0.0.81:7500/dev9/0/015/693/0015693844.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693844
:: [replicate(12646)] Copier failed replicating 15693846
:: [replicate(12646)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693821.fid failed: HTTP 404
:: [replicate(12646)] Copier failed replicating 15693821
:: [replicate(12646)] Error: Resource http://10.0.0.81:7500/dev9/0/015/693/0015693844.fid failed: HTTP 404
:: [replicate(12646)] Copier failed replicating 15693844
:: [replicate(12648)] Error: Resource http://10.0.0.81:7500/dev3/0/015/693/0015693848.fid failed: HTTP 404
:: [replicate(12650)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693819.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693848
......

-- fsck should catch weird state where file exists in 'file' table with
   devcount>0 but does not exist in file_on. Or count does not match.
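
The check described above can be expressed as a single query. This Python/sqlite3 sketch uses a cut-down schema with only the columns named in this item (the real tables have more columns and live in the tracker's database), and the function name is made up.

```python
import sqlite3


def devcount_mismatches(db):
    """Return (fid, devcount, actual) where devcount disagrees with file_on.

    The LEFT JOIN keeps files that have no file_on rows at all (actual = 0).
    """
    return db.execute(
        """
        SELECT f.fid, f.devcount, COUNT(fo.devid) AS actual
        FROM file f
        LEFT JOIN file_on fo ON fo.fid = f.fid
        GROUP BY f.fid, f.devcount
        HAVING f.devcount <> COUNT(fo.devid)
        ORDER BY f.fid
        """
    ).fetchall()


# Tiny in-memory example: fid 1 is consistent, fid 2 has no rows, fid 3 is short one.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE file (fid INTEGER PRIMARY KEY, devcount INTEGER NOT NULL);
    CREATE TABLE file_on (fid INTEGER NOT NULL, devid INTEGER NOT NULL);
    INSERT INTO file VALUES (1, 2), (2, 1), (3, 2);
    INSERT INTO file_on VALUES (1, 10), (1, 11), (3, 10);
    """
)
print(devcount_mismatches(db))  # [(2, 1, 0), (3, 2, 1)]
```

On a large 'file' table this would presumably need to run in fid-range batches rather than as one scan.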

-- fsck for case where row from file_to_replicate(fid,fromdevid) does not exist
   in file_on(fid,devid). This is a byproduct of a failed inject.
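
As a rough illustration of that check, again in Python/sqlite3 with a simplified schema holding only the columns named above (the function name is hypothetical): an anti-join finds file_to_replicate rows whose source copy has no matching file_on row.

```python
import sqlite3


def orphaned_replication_rows(db):
    """Return (fid, fromdevid) rows whose source copy is missing from file_on."""
    return db.execute(
        """
        SELECT r.fid, r.fromdevid
        FROM file_to_replicate r
        LEFT JOIN file_on fo
               ON fo.fid = r.fid AND fo.devid = r.fromdevid
        WHERE r.fromdevid IS NOT NULL
          AND fo.fid IS NULL
        ORDER BY r.fid
        """
    ).fetchall()


# fid 5 has its source copy on dev 10; fid 6 claims dev 11 but has no file_on row.
db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE file_on (fid INTEGER NOT NULL, devid INTEGER NOT NULL);
    CREATE TABLE file_to_replicate (fid INTEGER PRIMARY KEY, fromdevid INTEGER);
    INSERT INTO file_on VALUES (5, 10);
    INSERT INTO file_to_replicate VALUES (5, 10), (6, 11);
    """
)
print(orphaned_replication_rows(db))  # [(6, 11)]
```

Rows with a NULL fromdevid are skipped here on the assumption that they mean "no particular source"; that reading should be checked against the replicator.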