speediedan/deep_classiflie_db
Deep_classiflie_db is the backend data system for managing Deep Classiflie metadata, analyzing Deep Classiflie intermediate datasets, and orchestrating Deep Classiflie model training pipelines. It also includes the data scraping modules for the initial model data sources. Deep Classiflie depends on deep_classiflie_db for much of its analytical and dataset-generation functionality, but the data system is currently maintained as a separate repository to maximize architectural flexibility. Depending on how Deep Classiflie evolves (e.g., if it adds support for distributed data stores), it may make more sense to integrate deep_classiflie_db back into deep_classiflie. Currently, deep_classiflie_db releases are synchronized with deep_classiflie releases. To learn more, visit deepclassiflie.org.
Primary language: Jupyter Notebook
Issues
- create dcbot service instead of aliases (#37, opened by speediedan)
- Write core tests (#38, opened by speediedan)
- Add factbase tweet parser conditionally into initial tweet history loading path (#48, opened by speediedan)
- Perform dataset ablations (#34, opened by speediedan)
- Add static randomization seed to convergence subclass SQL and main convergence SQL for reproducibility (#33, opened by speediedan)
- Bokeh graph for most relevant summary metric chart (avg/std confidence by confusion matrix class) (#58, opened by speediedan)
- Add model analysis/viz notebook to repo (#51, opened by speediedan)
- Add view for confusion matrix by confidence (#54, opened by speediedan)
- Create test-set-only version of date-wise tweet, non-tweet, and all confusion matrices (#53, opened by speediedan)
- Update tweet_target_dist def (#47, opened by speediedan)
- Update pt_converged_tweet_truths/falsehoods defs to use date bounds like pt_converged_dt_truths/falsehoods (#43, opened by speediedan)
- Find appropriate "false truth" threshold (l2buckets) for tweets-only dataset (#45, opened by speediedan)
- Update secondaryft_converge_dist_subclasses, secondaryft_converge_class_dist, secondaryft_converge_dist_class_sql, secondaryft_converge_dist_class_card_sql, secondaryft_converge_dist_dt_class_bound_card_sql defs (#46, opened by speediedan)
- Create all_tweet_statements_tmp code path branch for filtering "false" truth tweets (truncate tweet temp table, etc.) (#40, opened by speediedan)
- Create tweet version of falsehood_data_driver on pt_converged_tweet_falsehoods, add secondaryft_dist_dt_bound_sql (#42, opened by speediedan)
- Add stmt source scalar to data pipeline (#31, opened by speediedan)
- Purchase GPU for running model daemon (#30, opened by speediedan)
- Add report retention period in days (#28, opened by speediedan)
- Add type hints to all relevant functions (#25, opened by speediedan)
- Score all falsehoods with the model, then remove sufficiently similar truth statements for each falsehood in the training set only, for further fine-tuning (#20, opened by speediedan)
- Get stmt timings and stmt sources (#21, opened by speediedan)
- Fine-tune model with tweet-only datasets (#22, opened by speediedan)
- Find tweet stream source (#17, opened by speediedan)
- Change truths view to canonical truths table/materialized view which includes sid (#18, opened by speediedan)
- Generate single truths sid-embedding cache and sid-falsehoods cache; find proper distance threshold for eliminating only falsehoods in truths table by truth sid (#19, opened by speediedan)
- Add a flag (and logic) to enable publishing of only projected falsehoods on main thread (#12, opened by speediedan)
- Refactor envconfig into class (#13, opened by speediedan)
- Remove dcbot inference_output dir (#10, opened by speediedan)
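Issue #33 above asks for a static randomization seed in the convergence SQL so that sampled subsets are reproducible across runs. A minimal Python sketch of the underlying idea (the function and variable names here are hypothetical illustrations, not identifiers from the repo; the actual fix would live in the SQL definitions):

```python
import random

def sample_statements(statements, k, seed=42):
    # Hypothetical helper: seeding a dedicated RNG instance means the
    # same subset of statements is drawn on every run, so downstream
    # convergence analysis is reproducible. In SQL the analogous fix
    # is passing a static seed to the randomization function.
    rng = random.Random(seed)
    return rng.sample(statements, k)

stmts = [f"stmt_{i}" for i in range(100)]
# Two calls with the same seed yield an identical sample.
assert sample_statements(stmts, 5) == sample_statements(stmts, 5)
```

Using a fresh `random.Random(seed)` per call (rather than seeding the global RNG) keeps the sampling deterministic regardless of what other code draws random numbers in between.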