root/branches/server-newrepl/TODO

Revision 375, 3.4 kB (checked in by bradfitz, 2 years ago)

new note

-- if create_open is called before the monitor has completed a run, block and wait for a round?  better than returning 'no_devices'.
   might need some marker for 'end of monitoring'?  or just wait until we have 1 or 3?
   or if they asked for 3, only return 1 in that rare case, preferring latency to redundancy.
   plus, we'd know that the 1 was writable in the last few seconds!
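   a rough sketch of the "block and wait for a round" idea, in Python for brevity.
   everything named here (MonitorState, finish_round, devices_for_create, the
   timeout value) is hypothetical and not taken from the tracker code; it only
   shows a create_open-style caller waiting on an 'end of monitoring' marker and
   then accepting fewer devices than it asked for.

```python
import threading


class MonitorState:
    """Hypothetical stand-in for the monitor's view of writable devices."""

    def __init__(self):
        self._round_done = threading.Event()  # the 'end of monitoring' marker
        self._lock = threading.Lock()
        self._writable = []

    def finish_round(self, writable_devids):
        """Called by the monitor when one full round has completed."""
        with self._lock:
            self._writable = list(writable_devids)
        self._round_done.set()

    def devices_for_create(self, wanted, timeout):
        """Block until the first round is done, then return up to `wanted` devices.

        Returns None on timeout, where the caller would fall back to 'no_devices'.
        """
        if not self._round_done.wait(timeout):
            return None
        with self._lock:
            # Prefer latency to redundancy: hand back what we have, even if
            # it is fewer devices than were asked for.
            return self._writable[:wanted]


state = MonitorState()
# Simulate the monitor finishing its first round shortly after startup,
# having found a single writable device (id 7 is made up).
timer = threading.Timer(0.05, state.finish_round, args=([7],))
timer.start()
picked = state.devices_for_create(wanted=3, timeout=2.0)
timer.join()
print(picked)
```

   the caller asked for 3 and gets 1, but that 1 was seen writable within the
   last round.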

-- update the 'repl' command for the new file_to_replicate table

-- replication policy error storm when a device is observably down:

[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197214
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197237
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197219
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197256
[replicate(28037)] replication policy ran out of suggestions for us replicating fid 197226

-- telnetting to 7001 and sending "!"<enter> causes a crash.

-- fsck job/command.  need a 'last fsck' table per fid?  or a column?

-- fix haphazard CapitalStyle vs. capital_style naming of class methods in ProcManager

-- 'every' func should select on psock, to process parent-sent commands
   during worker's breaks
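   a minimal sketch of that shape, in Python rather than the server's own
   language. the signature of every(), the handler callback, and the
   'shutdown_soon' command are all made up for illustration; the point is only
   that the break between work rounds becomes a select() with a timeout
   instead of a plain sleep. line reassembly across recv() calls is left out.

```python
import select
import socket
import time


def every(interval, work, psock, handle_parent_cmd, rounds):
    """Run work() `rounds` times, pausing `interval` seconds between runs.

    During each pause, wait on psock so commands sent by the parent are
    handled right away instead of after the break.
    """
    for _round in range(rounds):
        work()
        deadline = time.monotonic() + interval
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                break
            readable, _w, _x = select.select([psock], [], [], remaining)
            if readable:
                data = psock.recv(4096)
                if not data:
                    return  # parent closed its end; stop working
                for line in data.decode().splitlines():
                    handle_parent_cmd(line)


# Demo with a socketpair standing in for the parent/worker pipe.
parent, child = socket.socketpair()
parent.sendall(b"shutdown_soon\n")
seen = []
runs = []
every(0.05, lambda: runs.append(1), child, seen.append, rounds=2)
parent.close()
child.close()
print(seen, len(runs))
```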

-- create_close could wake a replicate process.

-- optional 'wait_until_replicated=1' flag to create_close, so the client doesn't
   get success until the file is everywhere.

-- redo/reevaluate the 'unreachable_fids' logic:  unreachable should only mean
   the host/device are up, but the file is 404.
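   the intended rule, written out as a tiny Python sketch. FidStatus and
   classify_fid are hypothetical names and the inputs are assumed to come from
   the monitor and from an HTTP check; nothing here reflects the existing
   implementation.

```python
from enum import Enum


class FidStatus(Enum):
    OK = "ok"
    UNREACHABLE = "unreachable"  # host and device are up, but the file is 404
    UNKNOWN = "unknown"          # cannot tell; do not mark the fid unreachable


def classify_fid(host_up, device_up, http_status):
    """Classify one copy of a fid.

    http_status is the status code of the check, or None if no response
    was received at all.
    """
    if not (host_up and device_up):
        # A down host or device says nothing about whether the file exists.
        return FidStatus.UNKNOWN
    if http_status == 404:
        return FidStatus.UNREACHABLE
    if http_status == 200:
        return FidStatus.OK
    return FidStatus.UNKNOWN


print(classify_fid(True, True, 404))
print(classify_fid(True, False, None))
```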

-- test database failures

-- identify idempotent commands and replay them 'n' times if the query worker dies
   during processing.
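   one way the replay could look, sketched in Python. WorkerDied,
   run_with_replay and the dispatch callback are hypothetical, and the
   IDEMPOTENT set is only a placeholder: deciding which commands really are
   safe to replay is the open part of this item.

```python
class WorkerDied(Exception):
    """Raised by dispatch when the query worker died mid-command."""


# Placeholder classification; treat the membership as an assumption.
IDEMPOTENT = {"get_paths", "list_keys"}


def run_with_replay(cmd, args, dispatch, max_replays):
    """Run cmd once; replay up to max_replays times only if it is idempotent."""
    attempts = 1 + (max_replays if cmd in IDEMPOTENT else 0)
    for attempt in range(1, attempts + 1):
        try:
            return dispatch(cmd, args)
        except WorkerDied:
            if attempt == attempts:
                raise  # out of replays, or the command is not safe to replay


# Demo: a dispatcher whose worker dies on the first two attempts.
calls = {"n": 0}


def flaky_dispatch(cmd, args):
    calls["n"] += 1
    if calls["n"] < 3:
        raise WorkerDied(cmd)
    return "OK " + cmd


result = run_with_replay("get_paths", {"key": "k"}, flaky_dispatch, max_replays=3)
print(result, calls["n"])
```

   a command outside the set is tried exactly once and the failure is passed
   straight back to the caller.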

-- have query workers be able to broadcast back up to the parent "can't parse this",
   at which point the parent parses it (e.g. "help" command), so admins don't
   need to remember the "!" prefix.  of course, the "!" prefix can always be used to
   reach the parent faster.
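   the routing described above, as a small Python sketch. worker_handle,
   parent_handle, route, the CANT_PARSE marker and the reply strings are all
   invented for illustration; only the "!" prefix and the "help" example come
   from the note itself.

```python
CANT_PARSE = object()  # marker a worker sends back instead of a reply


def worker_handle(line):
    """Hypothetical query worker: knows only client commands."""
    words = line.split()
    if words and words[0] == "get_paths":
        return "OK paths"
    return CANT_PARSE


def parent_handle(line):
    """Hypothetical parent: knows only admin commands."""
    if line == "help":
        return "OK help text"
    return "ERR unknown_command"


def route(line):
    """Send '!' lines straight to the parent; otherwise try a worker first."""
    if line.startswith("!"):
        return parent_handle(line[1:].strip())
    reply = worker_handle(line)
    if reply is CANT_PARSE:
        # The worker gave up, so let the parent try the same line.
        return parent_handle(line.strip())
    return reply


print(route("help"))
print(route("!help"))
print(route("get_paths domain=x key=y"))
```

   both "help" and "!help" end up at the parent; the prefixed form just skips
   the round trip through a worker.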

-- mb_asof handling in find_deviceid seems broken.  less than max age?  wrong units.

-- make a generic script to write out usage files for people not using mogstored
    -- or, let mogstored be run in a 'usage'-file-writing-only mode

-- wake up the deleter process?  totally overkill, but why not?

* 404 storms during replication:  (1.5-year-old email, might be fixed, verify)

:: [replicate(12648)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693821.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693821
:: [replicate(12648)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693819.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693819
:: [replicate(12648)] Error: Resource http://10.0.0.81:7500/dev9/0/015/693/0015693844.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693844
:: [replicate(12646)] Copier failed replicating 15693846
:: [replicate(12646)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693821.fid failed: HTTP 404
:: [replicate(12646)] Copier failed replicating 15693821
:: [replicate(12646)] Error: Resource http://10.0.0.81:7500/dev9/0/015/693/0015693844.fid failed: HTTP 404
:: [replicate(12646)] Copier failed replicating 15693844
:: [replicate(12648)] Error: Resource http://10.0.0.81:7500/dev3/0/015/693/0015693848.fid failed: HTTP 404
:: [replicate(12650)] Error: Resource http://10.0.0.82:7500/dev15/0/015/693/0015693819.fid failed: HTTP 404
:: [replicate(12648)] Copier failed replicating 15693848
......