Fall 2022 Research work for Notre Dame's Biometrics Research Grid
- set up miniconda
- python 3.7
- mysql-python-connector
- wrote 3 example queries to get comfortable with mysql-python-connector
- silentGeneration.py: select all users born before 1945
- subjectOnDay.py: select all face images of subject on set day
- nikonSensors.py: select all nikon sensors
- metadata
- query files and replicas tables for 10 fileids using mysql-python-connector
- convert tuple results to dictionary and store in chirpedFiles.json
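A minimal sketch of the tuple-to-dictionary step, assuming a standard DB-API cursor that has already executed the query. The helper names `rows_as_dicts` and `save_results` are hypothetical and not taken from the original scripts.

```python
import json


def rows_as_dicts(cursor):
    """Turn the tuples from an executed DB-API cursor into dictionaries."""
    # cursor.description holds one sequence per column; item 0 is the column name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def save_results(records, path="chirpedFiles.json"):
    """Write the records to a JSON file.

    default=str is a fallback for values json cannot encode (dates, Decimals).
    """
    with open(path, "w") as handle:
        json.dump(records, handle, indent=2, default=str)
```

Because only `cursor.description` and `fetchall()` are used, the same helper should work with any DB-API driver.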
- files
- chirp each file from a replica machine
- validate chirped file by computing md5sum and comparing with checksum in query results
- chirp files into directory following table/subjectid/date directory schema
- each file's corresponding query results are stored in directory's results.json
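The validation and directory steps above could look roughly like this. It is a sketch only: the record keys `subjectid` and `date` are assumed column names, and the function names are hypothetical.

```python
import hashlib
import os


def md5sum(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid(path, expected_checksum):
    """Compare the local file's digest with the checksum from the query results."""
    return md5sum(path) == expected_checksum.strip().lower()


def destination_dir(root, table, record):
    """Build the table/subjectid/date directory for one query result.

    The keys "subjectid" and "date" are assumed column names.
    """
    return os.path.join(root, table, str(record["subjectid"]), str(record["date"]))
```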
- iterate through replicas until successful chirp
- in previous week, only attempted chirp on one host
- increased number of files chirped into treed directories
- restructured code from week 3
- ran performance tests on chirping
- create preliminary version of "bxgrid in a box" (bxbox)
- allow user to query files from specified bxgrid table into filesystem with desired schema
- bxbox syntax: "materialize {tablename} as {schema for filesystem} {MySQL WHERE, ORDER BY, LIMIT clauses}"
- example results from bxbox materialization query in 'irises_still/'
- bxbox> materialize irises_still as date/weather/eye where subjectid = 'nd1S04473'
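One way the materialize syntax could be parsed is sketched below. The regular expression and the helper names are assumptions made for illustration, and the trailing SQL clauses are passed through unchanged rather than validated.

```python
import os
import re

# materialize {tablename} as {schema for filesystem} {optional SQL clauses}
COMMAND = re.compile(
    r"^\s*materialize\s+(?P<table>\w+)\s+as\s+(?P<schema>[\w/]+)\s*(?P<clauses>.*)$",
    re.IGNORECASE | re.DOTALL,
)


def parse_materialize(command):
    """Split a materialize command into (table, schema columns, SQL clauses)."""
    match = COMMAND.match(command)
    if match is None:
        raise ValueError("expected: materialize TABLE as SCHEMA [SQL clauses]")
    schema = [part for part in match.group("schema").split("/") if part]
    if not schema:
        raise ValueError("schema needs at least one column")
    return match.group("table"), schema, match.group("clauses").strip()


def build_query(table, clauses):
    """Assemble the SELECT statement; clauses are the user's own SQL."""
    return f"SELECT * FROM {table} {clauses}".strip()


def record_path(root, schema, record):
    """Map one result row onto the directory named by the schema columns."""
    return os.path.join(root, *(str(record[column]) for column in schema))
```

With the schema `date/weather/eye`, a row lands in `root/<date>/<weather>/<eye>/`.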
- convert preliminary materializer to take in command line arguments
- new query features:
- user may materialize into an already existing directory if the new and old schemas match by using the '-force' flag
- only get metadata into filesystem by using the '-dryrun' flag
- can specify root directory, otherwise name of top level directory follows TABLE + timestamp convention
- file changes:
- queried files may never be overwritten (even when using '-force' flag)
- large metadata file at top level includes subject data
- all files and directories created with mode 444
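A small sketch of the "never overwrite, read-only" rule for regular files, assuming exclusive-create mode is an acceptable way to enforce it. The name `write_once` is hypothetical, and directories are not covered here.

```python
import os


def write_once(path, data, mode=0o444):
    """Create path with data, refuse to overwrite, then make it read-only."""
    # "xb" is exclusive creation: open() raises FileExistsError if path exists.
    with open(path, "xb") as handle:
        handle.write(data)
    os.chmod(path, mode)
```

A second call with the same path raises `FileExistsError`, so an existing file is left untouched even when the caller passed '-force'.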
- change '-dryrun' flag to '-nofiles'
- improve function usage output
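The command line interface might be declared with argparse along these lines. The flags '-force' and '-nofiles' come from the notes above; the positional arguments, the '-root' option name, the program name, and the timestamp format are all assumptions.

```python
import argparse
import time


def build_parser():
    """Declare the materializer's arguments (layout is hypothetical)."""
    parser = argparse.ArgumentParser(
        prog="materialize",  # hypothetical program name
        description="Materialize a bxgrid table into a directory tree.",
    )
    parser.add_argument("table", help="bxgrid table to query")
    parser.add_argument("schema", help="directory schema, e.g. date/weather/eye")
    parser.add_argument("-force", action="store_true",
                        help="reuse an existing directory if the schemas match")
    parser.add_argument("-nofiles", action="store_true",
                        help="write metadata only, skip file retrieval")
    parser.add_argument("-root", default=None,
                        help="root directory (default: TABLE + timestamp)")
    return parser


def default_root(table, now=None):
    """Name the top-level directory as TABLE + timestamp (format assumed)."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.localtime(now))
    return f"{table}_{stamp}"
```

argparse also generates the usage text from these declarations, which is one way to keep the usage output in sync with the flags.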
- store user login credentials in $HOME/.bxgrid/credentials
- store query history in $HOME/.bxgrid/history.json
- show progress bar for materializations
- provide user with feedback on chirps
- Warning for failed chirp
- Error when all chirps fail
- push failed servers to back of queue for future chirps
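The failover behavior could be sketched with a deque that is shared across chirps. Here `fetch` is a hypothetical callable standing in for the real chirp call, and the sketch assumes every host in the queue holds a replica of the file.

```python
import logging
from collections import deque

log = logging.getLogger("bxbox")


class AllReplicasFailed(Exception):
    """Raised when no host could deliver the file."""


def fetch_with_failover(hosts, fetch):
    """Try hosts in queue order, demoting each failing host to the back.

    hosts is a deque reused for later chirps; fetch(host) is a hypothetical
    callable that returns the file data or raises on failure.
    """
    for _ in range(len(hosts)):
        host = hosts[0]
        try:
            return fetch(host)
        except Exception as error:
            # Warning for a single failed chirp.
            log.warning("chirp from %s failed: %s", host, error)
            hosts.rotate(-1)  # move the failed host to the back of the queue
    # Error once every host has been tried.
    raise AllReplicasFailed("every chirp attempt failed")
```

Because the deque is mutated in place, a host that just failed is tried last on the next file.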
- materialization history
- entries include command line input
- save last materialization query in latest_materializtion.json
- save the last 500 materialization queries (like GNU history) in history_materializations.json
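A capped history file could be maintained as below. This assumes the history is stored as a single JSON list, which the notes do not confirm, and `append_history` is a hypothetical name.

```python
import json
import os

HISTORY_LIMIT = 500


def append_history(path, entry, limit=HISTORY_LIMIT):
    """Append entry to the JSON list at path, keeping only the newest entries."""
    history = []
    if os.path.exists(path):
        with open(path) as handle:
            history = json.load(handle)
    history.append(entry)
    history = history[-limit:]  # drop the oldest entries beyond the limit
    with open(path, "w") as handle:
        json.dump(history, handle, indent=2)
    return history
```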
- SQL work (for group feature)
- wrote general query to select n entries from each group set by user
- work in querywork.txt
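The actual query lives in querywork.txt and is not reproduced here. As one possible approach, the sketch below builds a "first n rows per group" query with the ROW_NUMBER() window function, which requires MySQL 8.0 or later. The function name and the identifier check are assumptions.

```python
import re

IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def top_n_per_group(table, group_column, order_column, n):
    """Build SQL selecting the first n rows of each group via ROW_NUMBER()."""
    # Identifiers cannot be bound as query parameters, so check them first.
    for name in (table, group_column, order_column):
        if not IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    n = int(n)
    return (
        "SELECT * FROM ("
        "SELECT t.*, ROW_NUMBER() OVER ("
        f"PARTITION BY {group_column} ORDER BY {order_column}) AS rn "
        f"FROM {table} AS t) AS ranked "
        f"WHERE rn <= {n}"
    )
```

The extra `rn` column in the result gives each row's rank within its group.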
- after meeting, approach for next week is 2-phase materialization
- first phase is getting data (mandatory)
- second phase is file retrieval into directory tree (optional)
- separate command line materializer into two separate programs
- export
- store queried data from bxgrid in csv
- use syntax similar to previous materializer tool or custom SQL queries
- materialize
- read in data from csv
- store files and metadata in schema provided by user
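The hand-off between the two programs could be as simple as the following sketch, which assumes the query results are a list of dictionaries sharing the same keys. The function names are hypothetical.

```python
import csv


def export_csv(records, path):
    """Phase one: write query results (a list of dicts) to a CSV file."""
    if not records:
        raise ValueError("nothing to export")
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)


def read_csv(path):
    """Phase two: load the exported rows; every value comes back as a string."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))
```

Since CSV carries no type information, the materialize side has to treat values such as file ids as strings or convert them itself.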
- export
- switched to using Python chirp module
- Ben Tovar helped me get it working
- reuse downloaded files
- after chirping, store file path in $HOME/.bxgrid/chirpedFiles.json
- when getting files, first check the above JSON for the file if the '-smartchirp' flag is passed
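A sketch of the reuse check, assuming chirpedFiles.json maps a file id to the local path of an earlier download. That layout and the helper names are assumptions, not the tool's confirmed format.

```python
import json
import os

CACHE_PATH = os.path.join(os.path.expanduser("~"), ".bxgrid", "chirpedFiles.json")


def load_cache(path=CACHE_PATH):
    """Read the fileid -> local path mapping, or start empty."""
    if not os.path.exists(path):
        return {}
    with open(path) as handle:
        return json.load(handle)


def cached_copy(fileid, cache):
    """Return the path of an earlier download, or None if it is gone."""
    path = cache.get(str(fileid))
    if path and os.path.isfile(path):
        return path
    return None


def remember(fileid, file_path, cache, path=CACHE_PATH):
    """Record a freshly chirped file so later runs can reuse it."""
    cache[str(fileid)] = os.path.abspath(file_path)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as handle:
        json.dump(cache, handle, indent=2)
```

When `cached_copy` returns a path, the file can be copied or linked into the new tree instead of being chirped again.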