Resources

A common place to keep helpful resources for informatics, data science, and more!

Learning Resources

  • ML For Healthcare - MIT - Course website for HST.956 at MIT, Machine Learning for Healthcare. Includes slides, problem sets, and links to recorded lectures
  • StrataScratch - LeetCode for data science. An interactive platform for practicing SQL and Python to query and manipulate data tables

Research

  • Consensus - A search platform using AI to "Aggregate and distill" findings from scientific research

Regex

  • AutoRegex - An English-to-regex translator built on top of GPT-3
  • Regex Generator - A more traditional tool that generates regular expressions interactively from sample text
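To give a sense of what these tools produce, here is a small sketch using Python's built-in `re` module. The sample text, the date pattern, and the `find_dates` helper are hypothetical. The pattern is the kind of result you might get after marking "2023-01-15" as an example match, not actual output from either tool.

```python
import re

# Hypothetical sample text, like what you might paste into a regex generator
sample = "Order placed 2023-01-15, shipped 2023-01-18, delivered 2023-01-21."

# ISO-style dates (YYYY-MM-DD); \b keeps matches from starting or ending mid-number
DATE_PATTERN = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def find_dates(text: str) -> list[str]:
    """Return every YYYY-MM-DD substring found in text, in order of appearance."""
    return [m.group(0) for m in DATE_PATTERN.finditer(text)]


if __name__ == "__main__":
    print(find_dates(sample))
```

Whichever tool generates the pattern, it is worth checking the result against a few strings that should *not* match before relying on it.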

Machine Learning

  • BERTopic - A topic modeling algorithm that discovers topics using BERT transformer language models
  • Ray Tune - Automated hyperparameter tuning, implemented in Python, for libraries like PyTorch, TensorFlow, and scikit-learn
  • Opt_List - A list of hyperparameters compiled by Google for various machine learning libraries, which they tested and found to be effective
  • MCA - Multiple Correspondence Analysis, which is roughly PCA for categorical variables
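MCA is correspondence analysis applied to an indicator (one-hot) matrix of the categorical variables, much as PCA decomposes a numeric matrix. The sketch below covers only that first step and the column masses that weight each category. The decomposition itself (an SVD of the standardized residuals) is left to a library. The survey data and the function names are hypothetical.

```python
# Hypothetical survey responses: one dict per respondent, one key per categorical variable
rows = [
    {"smoker": "yes", "region": "north"},
    {"smoker": "no", "region": "south"},
    {"smoker": "no", "region": "north"},
]


def indicator_matrix(records, variables):
    """Build the complete disjunctive (one-hot) table that MCA takes as input.

    Returns (columns, matrix): each column is one (variable, category) pair, and
    each row has exactly one 1 per variable.
    """
    columns = []
    for var in variables:
        # Sort the categories so the column order is deterministic
        for level in sorted({r[var] for r in records}):
            columns.append((var, level))
    matrix = [[1 if r[var] == level else 0 for var, level in columns] for r in records]
    return columns, matrix


def column_masses(matrix):
    """Share of the grand total falling in each category column."""
    total = sum(sum(row) for row in matrix)
    return [sum(col) / total for col in zip(*matrix)]


if __name__ == "__main__":
    cols, Z = indicator_matrix(rows, ["smoker", "region"])
    print(cols)
    print(Z)
    print(column_masses(Z))
```

Every row sums to the number of variables, so the grand total is rows × variables. That constant row total is one reason MCA is treated as a special case of correspondence analysis.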

Python Specific

  • Polars - Performant, multi-threaded DataFrame manipulation library, intended as a replacement for pandas
  • PyGWalker - A drag-and-drop library for graphing a dataset during exploratory data analysis within a Python environment. Also has some more advanced features, e.g. for feature selection

Data Science

  • DuckDB - Creates an in-memory database that can be queried and mutated using SQL syntax. Provides APIs in Python, R, and Java
  • Steampipe - A command-line tool that lets you use SQL to access popular cloud services as well as useful APIs (Twitter, Reddit, etc.)
  • RATH - A more advanced, Tableau-like data visualization suite. A little tricky to set up as of this note (requires spinning it up via Node.js)
  • Tad - Open-source tabular data viewer. Uses DuckDB to handle millions of rows with a handy GUI
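The in-memory SQL workflow described for DuckDB (create a database in RAM, mutate it with SQL, then query it) can be sketched with Python's built-in `sqlite3` module as a stand-in. This sketch does not use DuckDB's own Python package or its API. The `sales` table, its columns, the data, and the `top_categories` helper are all hypothetical.

```python
import sqlite3


def top_categories(rows, limit=2):
    """Load rows into an in-memory SQL table, clean it with SQL, and aggregate."""
    # ":memory:" keeps the whole database in RAM; nothing is written to disk
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE sales (category TEXT, amount REAL)")
        con.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        # Mutate with SQL: drop refunds (negative amounts) before aggregating
        con.execute("DELETE FROM sales WHERE amount < 0")
        cur = con.execute(
            "SELECT category, SUM(amount) AS total FROM sales "
            "GROUP BY category ORDER BY total DESC LIMIT ?",
            (limit,),
        )
        return cur.fetchall()
    finally:
        con.close()


sales = [("books", 12.0), ("games", 30.0), ("books", 8.0), ("games", -5.0), ("music", 4.0)]

if __name__ == "__main__":
    print(top_categories(sales))  # [('games', 30.0), ('books', 20.0)]
```

DuckDB is built for analytical queries over large tables, so for millions of rows it is likely the better fit. The SQL itself would look much the same.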

Other

  • Penpot - Open-source design app; may be helpful for designing presentations, figures, etc.
  • mjml - Open-source tool for designing responsive emails