Pinned Repositories
pysequila
Python wrapper for SeQuiLa: Distributed analytics for genomics based on Apache Spark!
sequila
SeQuiLa: Distributed analytics for genomics based on Apache Spark!
sequila-cloud-recipes
SeQuiLa recipes, examples and other cloud-related content
google_cloud_mlflow
Experimental MLflow plugin for Google Cloud Vertex AI
pilosa
Pilosa is an open source, distributed bitmap index that dramatically accelerates queries across multiple, massive data sets.
spark-gotchas
Spark Gotchas. A subjective compilation of Apache Spark tips and tricks
mwiewior's Repositories
mwiewior/google_cloud_mlflow
Experimental MLflow plugin for Google Cloud Vertex AI
mwiewior/airflow-dag-action
mwiewior/algorithms-with-predictions.github.io
Overview website for research on Algorithms with Predictions (ALPS)
mwiewior/continue
⏩ The easiest way to code with any LLM—Continue is an open-source autopilot for VS Code and JetBrains
mwiewior/databricks-tpc-di
Databricks Implementation of the TPC-DI Specification using Traditional Notebooks and/or Delta Live Tables
mwiewior/dbt-learn-codespaces
mwiewior/dbt-tpcdi
A Python Snowpark CLI for loading the TPC-DI dataset into Snowflake. Additional dbt models for building the data warehouse.
mwiewior/disq
A library for manipulating bioinformatics sequencing formats in Apache Spark
mwiewior/dive-action
Runs dive as a GitHub Action to scan your Docker image for wasted disk space
mwiewior/GreenBST
GreenBST: Energy-efficient concurrent search tree
mwiewior/gtars
Performance-critical tools to manipulate, analyze, and process genomic interval data. Primarily focused on building tools for geniml, our genomic machine learning Python package.
mwiewior/Hadoop-BAM
Hadoop-BAM is a Java library for the manipulation of files in common bioinformatics formats using the Hadoop MapReduce framework
mwiewior/iitii
Implicit Interval Tree with Interpolation Index
mwiewior/interval
Generic, fast lookup on one-dimensional intervals. The implementation is based on treaps, augmented for intervals. Treaps are randomized self-balancing binary search trees.
mwiewior/kedro-starters
Templates for your Kedro projects.
mwiewior/levels-of-rag
mwiewior/Llama2-HPO-Normalization
Fine-tuning LLaMA 2 for rare disease concept normalization
mwiewior/LLM_SE_Papers_List
mwiewior/machine-learning-with-ontologies
mwiewior/mlflow-appengine-terraform
Terraform module for deploying MLflow on Google Cloud App Engine Flexible
mwiewior/ok-to-test
Example workflow configuration showing how to use GitHub Actions secrets in pull requests from forks 🍴🔑
mwiewior/phd-scientific-revolution-essay
mwiewior/phenotype_embedding
mwiewior/snowflake-terraform
mwiewior/sphinx-action
GitHub Action that builds docs using Sphinx and places errors inline
mwiewior/sql-eval
Evaluate the accuracy of LLM generated outputs
mwiewior/tbd-tpc-di
TPC-DI benchmark using Apache Spark and dbt
mwiewior/tbd-workshop-1-public-workshop
mwiewior/VarNote
Fast and scalable variant annotation tool
mwiewior/voice-activated-teleprompter
Free open-source web-based voice-activated teleprompter software that actually works