logv/sybil

sybil ingest failing quietly

onlyjazz opened this issue · 5 comments

I'm sure I'm missing something here, but I can't seem to get this ingested.
I pulled it and compiled it with Go.
It seems to work when ingesting from a MongoDB collection export.

{"record": {"time":1527019503, "alertcount":1, "metric":"Subject reports pre-treatment", "subject":"AE-012", "study":"Study SP301","site":"UCLA MC"} }
{"record": {"time":1527232204, "alertcount":1, "metric":"Subject reports pre-treatment", "subject":"BD-006", "study":"Study SP301", "site":"UCLA MC"} }

It's likely the samples are still in the write-ahead log. Pass -read-log to query so the log is included. We should probably make this flag on by default.

okay@chalk:~/tonka/src/sybil$ cat test.json
{"record": {"time":1527019503, "alertcount":1, "metric":"Subject reports pre-treatment", "subject":"AE-012", "study":"Study SP301","site":"UCLA MC"} }
{"record": {"time":1527232204, "alertcount":1, "metric":"Subject reports pre-treatment", "subject":"BD-006", "study":"Study SP301", "site":"UCLA MC"} }
okay@chalk:~/tonka/src/sybil$ sybil ingest -table test1 --debug < test.json
2020/12/23 12:27:43 EXCLUDING COLUMN
2020/12/23 12:27:44 CANT CREATE LOCK FILE: open db/test1/info.lock: no such file or directory
2020/12/23 12:27:44 LOCK FAIL! db/test1/info.lock
2020/12/23 12:27:44 LOAD TABLE INFO LOCK TAKEN
2020/12/23 12:27:44 PATH IS [$]
2020/12/23 12:27:44 KEY TABLE map[record_alertcount:3 record_metric:4 record_site:1 record_study:0 record_subject:5 record_time:2]
2020/12/23 12:27:44 KEY TYPES map[0:2 1:2 2:1 3:1 4:2 5:2]
2020/12/23 12:27:44 SAVING RECORDS 2 TO INGESTION LOG
2020/12/23 12:27:44 SERIALIZED INTO LOG db/test1/.ingest.temp/ingest_753861583.db 507 BYTES ( PER RECORD 253 )
2020/12/23 12:27:44 WRITING PID 2449428 TO LOCK db/test1/info.lock
2020/12/23 12:27:44 LOCKING db/test1/info.lock
2020/12/23 12:27:44 SERIALIZED TABLE INFO info INTO  1353 BYTES
2020/12/23 12:27:44 UNLOCKING db/test1/info.lock
okay@chalk:~/tonka/src/sybil$ sybil query -table test1
okay@chalk:~/tonka/src/sybil$ sybil query -table test1 -read-log
total               2
okay@chalk:~/tonka/src/sybil$ sybil query -table test1 -read-log -samples
RECORD &{[0 0 0 0 0 0 0] [0 0 1527019503 1 0 0 0] map[] [2 2 1 1 2 2 0] 0 0xc0000a2000}
   2 record_time 1527019503
   3 record_alertcount 1
   0 record_study Study SP301
   1 record_site UCLA MC
   4 record_metric Subject reports pre-treatment
   5 record_subject AE-012
RECORD &{[0 0 0 0 0 1 0] [0 0 1527232204 1 0 0 0] map[] [2 2 1 1 2 2 0] 0 0xc0000a2090}
   2 record_time 1527232204
   3 record_alertcount 1
   0 record_study Study SP301
   1 record_site UCLA MC
   4 record_metric Subject reports pre-treatment
   5 record_subject BD-006
okay@chalk:~/tonka/src/sybil$

(How many samples are you ingesting to check?)

I'm ingesting 10K samples for testing. It works when I pass -read-log to read the WAL, but returns zero records otherwise.
dannyl@carina go % sybil query -table c -info -read-log

String Columns
metric
site
study
subject

Integer Columns
alertcount
time

Set Columns

Stats
count 10669
storageSize 1 MB
avgObjSize 127.63 bytes

I guess we need to modify the snorkel.query wrapper. By the way, to make life easier I put sybil into /usr/local/bin.

Is there a way to execute a checkpoint and flush the WAL to disk?

sybil digest -table foo should digest the contents of the WAL into columnar storage.
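
For anyone scripting this, a minimal Go sketch of the ingest-then-digest sequence might look like the following. It assumes the sybil binary is on PATH (as with /usr/local/bin above) and uses only the subcommands and flags that appear in this thread (ingest, digest, -table). The helper names digestArgs and ingestAndDigest are hypothetical, and the table and file names simply mirror the earlier session.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// digestArgs builds the arguments for `sybil digest -table <table>`,
// which folds the write-ahead log into columnar storage.
// (Hypothetical helper, not part of sybil.)
func digestArgs(table string) []string {
	return []string{"digest", "-table", table}
}

// ingestAndDigest pipes a newline-delimited JSON file into
// `sybil ingest -table <table>` and then runs a digest, so that a plain
// `sybil query` can see the records without -read-log.
// (Hypothetical helper; assumes `sybil` is on PATH.)
func ingestAndDigest(table, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	ingest := exec.Command("sybil", "ingest", "-table", table)
	ingest.Stdin = f
	ingest.Stdout = os.Stdout
	ingest.Stderr = os.Stderr
	if err := ingest.Run(); err != nil {
		return fmt.Errorf("ingest: %w", err)
	}

	digest := exec.Command("sybil", digestArgs(table)...)
	digest.Stdout = os.Stdout
	digest.Stderr = os.Stderr
	if err := digest.Run(); err != nil {
		return fmt.Errorf("digest: %w", err)
	}
	return nil
}

func main() {
	// Table and file names mirror the session earlier in this thread.
	if err := ingestAndDigest("test1", "test.json"); err != nil {
		fmt.Fprintln(os.Stderr, "error:", err)
	}
}
```

Digesting after each batch is one option; another is to ingest several batches and digest once, since the WAL stays queryable via -read-log in the meantime.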

Modifying snorkel.query to always add -read-log sounds reasonable.
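
The snorkel.query code itself isn't shown in this thread, so here is only a hypothetical Go sketch of the idea: a wrapper that always appends -read-log to a sybil query unless the caller already passed it. The names withReadLog and runQuery are illustrative assumptions, not part of sybil or snorkel, and the duplicate check only looks for the exact string "-read-log".

```go
package main

import (
	"fmt"
	"os/exec"
)

// withReadLog returns a copy of args with "-read-log" appended unless it is
// already present, so queries also see records still in the write-ahead log.
// (Hypothetical helper; only the exact spelling "-read-log" is detected.)
func withReadLog(args []string) []string {
	out := make([]string, 0, len(args)+1)
	out = append(out, args...)
	for _, a := range args {
		if a == "-read-log" {
			return out
		}
	}
	return append(out, "-read-log")
}

// runQuery invokes `sybil query` with the given flags plus -read-log and
// returns its combined stdout and stderr.
func runQuery(flags ...string) (string, error) {
	args := withReadLog(append([]string{"query"}, flags...))
	out, err := exec.Command("sybil", args...).CombinedOutput()
	return string(out), err
}

func main() {
	// Equivalent to: sybil query -table test1 -info -read-log
	out, err := runQuery("-table", "test1", "-info")
	if err != nil {
		fmt.Println("query failed:", err)
	}
	fmt.Print(out)
}
```

Copying the slice before appending keeps the caller's argument list unchanged, which matters if the same flags are reused across several queries.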

Digest works great!