speediedan/deep_classiflie_db
Deep_classiflie_db is the backend data system for managing Deep Classiflie metadata, analyzing Deep Classiflie intermediate datasets, and orchestrating Deep Classiflie model training pipelines. It also includes the data scraping modules for the initial model data sources. Deep Classiflie depends on deep_classiflie_db for much of its analytical and dataset-generation functionality, but the data system is currently maintained as a separate repository to maximize architectural flexibility. Depending on how Deep Classiflie evolves (e.g., if it adds support for distributed data stores), it may make more sense to integrate deep_classiflie_db back into deep_classiflie. Currently, deep_classiflie_db releases are synchronized with deep_classiflie releases. To learn more, visit deepclassiflie.org.
Primary language: Jupyter Notebook
Issues
- create dcbot service instead of aliases (#37, opened by speediedan)
- Write core tests (#38, opened by speediedan)
- Add factbase tweet parser conditionally into initial tweet history loading path (#48, opened by speediedan)
- Perform dataset ablations (#34, opened by speediedan)
- Add static randomization seed to convergence subclass SQL and main convergence SQL for reproducibility (#33, opened by speediedan)
- Bokeh graph for most relevant summary metric chart (avg/std confidence by confusion matrix class) (#58, opened by speediedan)
- Add model analysis/viz notebook to repo (#51, opened by speediedan)
- Add view for confusion matrix by confidence (#54, opened by speediedan)
- Create test-set-only version of date-wise tweet, non-tweet, and all confusion matrices (#53, opened by speediedan)
- Update tweet_target_dist def (#47, opened by speediedan)
- Update pt_converged_tweet_truths/falsehoods defs to use date bounds like pt_converged_dt_truths/falsehoods (#43, opened by speediedan)
- Find appropriate "false truth" threshold (l2buckets) for tweets-only dataset (#45, opened by speediedan)
- Update secondaryft_converge_dist_subclasses, secondaryft_converge_class_dist, secondaryft_converge_dist_class_sql, secondaryft_converge_dist_class_card_sql, secondaryft_converge_dist_dt_class_bound_card_sql defs (#46, opened by speediedan)
- Create all_tweet_statements_tmp code path branch for filtering "false" truth tweets (truncate tweet temp table, etc.) (#40, opened by speediedan)
- Create tweet version of falsehood_data_driver on pt_converged_tweet_falsehoods, add secondaryft_dist_dt_bound_sql (#42, opened by speediedan)
- Add stmt source scalar to data pipeline (#31, opened by speediedan)
- Purchase GPU for running model daemon (#30, opened by speediedan)
- Add report retention period in days (#28, opened by speediedan)
- Add type hints to all relevant functions (#25, opened by speediedan)
- Score all falsehoods with the model, then remove sufficiently similar truth statements for each falsehood in the training set only, for further fine-tuning (#20, opened by speediedan)
- Get stmt timings and stmt sources (#21, opened by speediedan)
- Fine-tune model with tweet-only datasets (#22, opened by speediedan)
- Find tweet stream source (#17, opened by speediedan)
- Change truths view to canonical truths table/materialized view which includes sid (#18, opened by speediedan)
- Generate single truths sid-embedding cache and sid-falsehoods cache; find proper distance threshold for eliminating only falsehoods in truths table by truth sid (#19, opened by speediedan)
- Add a flag (and logic) to enable publishing of only projected falsehoods on main thread (#12, opened by speediedan)
- Refactor envconfig into class (#13, opened by speediedan)
- Remove dcbot inference_output dir (#10, opened by speediedan)
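Issue #33 above asks for a static randomization seed in the convergence SQL so that sampled subsets are reproducible across runs. A minimal Python sketch of the underlying idea (the function and variable names here are hypothetical illustrations, not identifiers from the repo; the actual fix would live in the SQL definitions):

```python
import random

def sample_statements(statements, k, seed=42):
    # Hypothetical helper: seeding a dedicated RNG instance means the
    # same subset of statements is drawn on every run, so downstream
    # convergence analysis is reproducible. In SQL the analogous fix
    # is passing a static seed to the randomization function.
    rng = random.Random(seed)
    return rng.sample(statements, k)

stmts = [f"stmt_{i}" for i in range(100)]
# Two calls with the same seed yield an identical sample.
assert sample_statements(stmts, 5) == sample_statements(stmts, 5)
```

Using a fresh `random.Random(seed)` per call (rather than seeding the global RNG) keeps the sampling deterministic regardless of what other code draws random numbers in between.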