sybil ingest failing quietly
onlyjazz opened this issue · 5 comments
I'm sure I'm missing something here, but I can't seem to get this ingested.
Pulled and compiled with Go.
Seems to work with ingest from a MongoDB collection export.
{"record": {"time":1527019503, "alertcount":1, "metric":"Subject reports pre-treatment", "subject":"AE-012", "study":"Study SP301","site":"UCLA MC"} }
{"record": {"time":1527232204, "alertcount":1, "metric":"Subject reports pre-treatment", "subject":"BD-006", "study":"Study SP301", "site":"UCLA MC"} }
It's likely the samples are still in the write-ahead log (WAL). Pass -read-log
when querying to include the log. We should probably turn this flag on by default.
okay@chalk:~/tonka/src/sybil$ cat test.json
{"record": {"time":1527019503, "alertcount":1, "metric":"Subject reports pre-treatment", "subject":"AE-012", "study":"Study SP301","site":"UCLA MC"} }
{"record": {"time":1527232204, "alertcount":1, "metric":"Subject reports pre-treatment", "subject":"BD-006", "study":"Study SP301", "site":"UCLA MC"} }
okay@chalk:~/tonka/src/sybil$ sybil ingest -table test1 --debug < test.json
2020/12/23 12:27:43 EXCLUDING COLUMN
2020/12/23 12:27:44 CANT CREATE LOCK FILE: open db/test1/info.lock: no such file or directory
2020/12/23 12:27:44 LOCK FAIL! db/test1/info.lock
2020/12/23 12:27:44 LOAD TABLE INFO LOCK TAKEN
2020/12/23 12:27:44 PATH IS [$]
2020/12/23 12:27:44 KEY TABLE map[record_alertcount:3 record_metric:4 record_site:1 record_study:0 record_subject:5 record_time:2]
2020/12/23 12:27:44 KEY TYPES map[0:2 1:2 2:1 3:1 4:2 5:2]
2020/12/23 12:27:44 SAVING RECORDS 2 TO INGESTION LOG
2020/12/23 12:27:44 SERIALIZED INTO LOG db/test1/.ingest.temp/ingest_753861583.db 507 BYTES ( PER RECORD 253 )
2020/12/23 12:27:44 WRITING PID 2449428 TO LOCK db/test1/info.lock
2020/12/23 12:27:44 LOCKING db/test1/info.lock
2020/12/23 12:27:44 SERIALIZED TABLE INFO info INTO 1353 BYTES
2020/12/23 12:27:44 UNLOCKING db/test1/info.lock
okay@chalk:~/tonka/src/sybil$ sybil query -table test1
okay@chalk:~/tonka/src/sybil$ sybil query -table test1 -read-log
total 2
okay@chalk:~/tonka/src/sybil$ sybil query -table test1 -read-log -samples
RECORD &{[0 0 0 0 0 0 0] [0 0 1527019503 1 0 0 0] map[] [2 2 1 1 2 2 0] 0 0xc0000a2000}
2 record_time 1527019503
3 record_alertcount 1
0 record_study Study SP301
1 record_site UCLA MC
4 record_metric Subject reports pre-treatment
5 record_subject AE-012
RECORD &{[0 0 0 0 0 1 0] [0 0 1527232204 1 0 0 0] map[] [2 2 1 1 2 2 0] 0 0xc0000a2090}
2 record_time 1527232204
3 record_alertcount 1
0 record_study Study SP301
1 record_site UCLA MC
4 record_metric Subject reports pre-treatment
5 record_subject BD-006
okay@chalk:~/tonka/src/sybil$
(how many samples are you ingesting to check?)
I'm ingesting 10K samples for testing. The query works when I tell it to read the WAL, but returns zero records otherwise.
dannyl@carina go % sybil query -table c -info -read-log
String Columns
metric
site
study
subject
Integer Columns
alertcount
time
Set Columns
Stats
count 10669
storageSize 1 MB
avgObjSize 127.63 bytes
I guess we need to modify the snorkel.query wrapper. By the way, to make life easier I put sybil into /usr/local/bin.
Is there a way to execute a checkpoint and flush the WAL to disk?
sybil digest -table foo
should digest the contents of the WAL into columnar storage.
Modifying snorkel.query to always add -read-log sounds reasonable.
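
The wrapper change suggested here could look roughly like the following. This is a minimal Python sketch, not snorkel's actual code: `with_read_log` and `run_query` are hypothetical names, and it assumes the wrapper builds a `sybil query` argument list and that `sybil` is on the PATH (for example in /usr/local/bin). It only uses the `query` subcommand and the `-table` and `-read-log` flags that appear in this thread.

```python
import subprocess
from typing import List, Sequence

# Both spellings are checked because the commands in this thread mix
# single-dash (-table) and double-dash (--debug) flags.
READ_LOG_FLAGS = ("-read-log", "--read-log")


def with_read_log(args: Sequence[str]) -> List[str]:
    """Return a copy of args with -read-log appended unless already present."""
    out = list(args)
    if not any(flag in out for flag in READ_LOG_FLAGS):
        out.append("-read-log")
    return out


def run_query(table: str, extra_args: Sequence[str] = (), sybil_bin: str = "sybil") -> str:
    """Run `sybil query` so that it also reads the ingestion log.

    Hypothetical helper: the real snorkel.query wrapper may be structured differently.
    """
    cmd = [sybil_bin, "query", "-table", table] + with_read_log(extra_args)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

With this in place, `run_query("test1", ["-samples"])` would run the same command as `sybil query -table test1 -samples -read-log` from the session above.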
Digest works great!
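
For a test setup where plain queries should see the records without -read-log, the ingest can be followed by a digest right away. A minimal Python sketch, assuming `sybil` is on the PATH; the helper names (`ingest_cmd`, `digest_cmd`, `ingest_and_digest`) are hypothetical, and only the `ingest` and `digest` invocations shown in this thread are used.

```python
import subprocess
from typing import List


def ingest_cmd(table: str, sybil_bin: str = "sybil") -> List[str]:
    """Command that reads JSON records from stdin into the table's ingestion log."""
    return [sybil_bin, "ingest", "-table", table]


def digest_cmd(table: str, sybil_bin: str = "sybil") -> List[str]:
    """Command that moves the table's ingestion log into columnar storage."""
    return [sybil_bin, "digest", "-table", table]


def ingest_and_digest(table: str, json_path: str, sybil_bin: str = "sybil") -> None:
    """Ingest a file of JSON records, then digest immediately."""
    with open(json_path, "rb") as records:
        # Equivalent to: sybil ingest -table <table> < <json_path>
        subprocess.run(ingest_cmd(table, sybil_bin), stdin=records, check=True)
    # Equivalent to: sybil digest -table <table>
    subprocess.run(digest_cmd(table, sybil_bin), check=True)
```

Calling `ingest_and_digest("test1", "test.json")` mirrors the manual steps in this thread: the ingest command from the first session, then the digest command suggested above.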