Pinned Repositories
42-CFR
Random scripts and work
hospital-chargemaster
hospital chargemaster lists for open source healthcare
nexmon_csi
Channel State Information Extraction on Various Broadcom Wi-Fi Chips
predicting-Paid-amount-for-Claims-Data
Introduction: The context is the 2016 public-use NH medical claims files obtained from NH CHIS (Comprehensive Health Care Information System). The dataset contains commercial insurance claims and a small fraction of Medicaid and Medicare payments for dually eligible people. The primary purpose of this assignment is to test machine learning (ML) skills in a real case-analysis setting. You are expected to clean and process the data and then apply various ML techniques, including linear and nonlinear models such as regularized regression, MARS, and partitioning methods. You are expected to use at least two of R, Python, and JMP.
Data details: The 2016 medical claims file contains ~17 million rows and ~60 columns, covering ~6.5 million individual medical claims. These are all commercial claims filed by healthcare providers in 2016 in the state of NH. About 88% of the claims were for NH residents and the remainder were for out-of-state visitors who sought care in NH. Each claim consists of one or more line items, each indicating a procedure done during the doctor’s visit. Two columns, the billed amount and the paid amount for the care provided, are of primary interest. The main objective is to predict “Paid amount per procedure” from the many features available in the dataset. You are also expected to create new features from the existing ones or from external data sources.
Objectives:
Step 1: Take a random sample of 1 million unique claims, such that all line items related to each claim are included in the sample. This will result in a little less than 3 million rows of data.
Step 2: Clean up the data, understand the distributions, and create new features if necessary.
Step 3: Run predictive models using a validation method of your choice.
Step 4: Write a descriptive report (less than 10 pages) describing the process and your findings.
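The sampling in Step 1 is done at the claim level rather than the row level, so that every line item of a chosen claim stays in the sample. A minimal sketch of that idea follows, using only the Python standard library. The column name `claim_id`, the helper `sample_claims`, and the toy rows are hypothetical and are not taken from the NH CHIS files; the real data would more likely be handled with a data-frame library.

```python
import random


def sample_claims(rows, n_claims, claim_key="claim_id", seed=0):
    """Return every line item belonging to a random sample of unique claims.

    `rows` is a list of dicts, one per line item. `claim_key` is a
    hypothetical column name identifying the claim a line item belongs to.
    """
    # Collect unique claim IDs in first-seen order so a fixed seed is reproducible.
    unique_ids = list(dict.fromkeys(row[claim_key] for row in rows))
    if n_claims > len(unique_ids):
        raise ValueError("n_claims exceeds the number of unique claims")

    # Sample claim IDs, not rows, so no claim is split across the boundary.
    rng = random.Random(seed)
    chosen = set(rng.sample(unique_ids, n_claims))

    # Keep all line items whose claim was selected.
    return [row for row in rows if row[claim_key] in chosen]


# Toy data (made up for illustration): three claims with 2, 1, and 3 line items.
rows = [
    {"claim_id": "A", "line": 1, "paid": 50.0},
    {"claim_id": "A", "line": 2, "paid": 20.0},
    {"claim_id": "B", "line": 1, "paid": 75.0},
    {"claim_id": "C", "line": 1, "paid": 10.0},
    {"claim_id": "C", "line": 2, "paid": 0.0},
    {"claim_id": "C", "line": 3, "paid": 35.0},
]

sample = sample_claims(rows, n_claims=2, seed=42)
```

Sampling claim IDs first and filtering afterwards is what makes the row count of the sample larger than the number of claims sampled, consistent with 1 million claims yielding a little under 3 million rows.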
PyTorchNLPBook
Code and data accompanying Natural Language Processing with PyTorch published by O'Reilly Media https://amzn.to/3JUgR2L
wsheffel's Repositories
wsheffel/42-CFR
Random scripts and work
wsheffel/2014-08-st-louis-county-segregation
Analysis and data notes for the August 20, 2014 BuzzFeed News article, "The Ferguson Area Is Even More Segregated Than You Probably Guessed"
wsheffel/Adversarial-Tradecraft-in-Cybersecurity
A repo to support the book
wsheffel/AMLSim
The AMLSim project is intended to provide a multi-agent based simulator that generates synthetic banking transaction data together with a set of known money laundering patterns - mainly for the purpose of testing machine learning models and graph algorithms. We welcome you to enhance this effort since the data set related to money laundering is critical to advance detection capabilities of money laundering activities.
wsheffel/api
services to access govinfo content and metadata
wsheffel/Architecting-Enterprise-React-Applications-with-Hooks
Architecting Enterprise React Applications with Hooks, published by Packt
wsheffel/bokeh
Interactive Data Visualization in the browser, from Python
wsheffel/cleanlab
The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels.
wsheffel/dash-sample-apps
Open-source demos hosted on Dash Gallery
wsheffel/Data-science
Collection of useful data science topics along with code and articles
wsheffel/Data-Science-for-Marketing-Analytics-Second-Edition
wsheffel/datasets
Datasets used in Plotly examples and documentation
wsheffel/ExData_Plotting1
Plotting Assignment 1 for Exploratory Data Analysis
wsheffel/fastai
The fastai deep learning library
wsheffel/fastbook
The fastai book, published as Jupyter Notebooks
wsheffel/fastcore
Python supercharged for the fastai library
wsheffel/flask
The Python micro framework for building web applications.
wsheffel/Hands-On-Data-Analysis-with-Pandas-2nd-edition
Materials for following along with Hands-On Data Analysis with Pandas – Second Edition
wsheffel/Healthcare-Fraud-Detection
An end-to-end web application developed using Streamlit to predict whether a healthcare provider is fraudulent based on inpatient claims, outpatient claims, and beneficiary details.
wsheffel/Jason-Brownlee-books
wsheffel/jekyll
Jekyll-based static site for The Programming Historian
wsheffel/jinja
A very fast and expressive template engine.
wsheffel/jupyter-dash
Develop Dash apps in the Jupyter Notebook and JupyterLab
wsheffel/knowledge-table
Knowledge Table is an open-source package designed to simplify extracting and exploring structured data from unstructured documents.
wsheffel/MedCAT
Medical Concept Annotation Tool
wsheffel/Microsoft-Azure-Security-Technologies-Certification-and-Beyond
Microsoft Azure Security Technologies Certification and Beyond, published by Packt
wsheffel/my-first-binderhub-
How to install BinderHub in repos to see and run code on GitHub
wsheffel/ovc
the Open Vision Computer
wsheffel/plotly.js
Open-source JavaScript charting library behind Plotly and Dash
wsheffel/spark-nlp-workshop
Public runnable examples of using John Snow Labs' NLP for Apache Spark.