Pinned Repositories
airflow-pdf2embeddings
NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text and finding semantically similar sentences to a given search query.
coffee-and-coding-public
MoJ coffee and coding sessions that can be made publicly available
etl-pipeline-example
An example of an ETL pipeline that lays out generic data engineering (DE) processes. It is now out of date but still provides useful information.
etl_manager
A Python package to create a database on the platform using our MoJ data warehousing framework
our-coding-standards
DASD's coding principles for analytical projects
shinyGovstyle
Apply GOV.UK-styled components and formats in Shiny
splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
splink_demos
Interactive notebooks containing demonstration code for the splink library
user-guidance
User guidance for the MoJ Analytical Platform
xltabr
xltabr: An R package for writing formatted cross tabulations (contingency tables) to Excel using openxlsx
MoJ Analytical Services's Repositories
moj-analytical-services/splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
moj-analytical-services/shinyGovstyle
Apply GOV.UK-styled components and formats in Shiny
moj-analytical-services/etl_manager
A Python package to create a database on the platform using our MoJ data warehousing framework
moj-analytical-services/mojchart
R package for formatting ggplot2 charts and applying MoJ corporate colours.
moj-analytical-services/dataengineeringutils3
Fully unit tested utility functions for data engineering. Python 3 only.
moj-analytical-services/user-guidance
User guidance for the MoJ Analytical Platform
moj-analytical-services/pydbtools
Python version of dbtools
moj-analytical-services/data-engineering-and-modelling-applicant-info
Information for potential applicants to MoJ Data Engineering, including links to our work and information about our teams.
moj-analytical-services/mojap-arrow-pd-parser
Conforms pandas data to "correct" data types (using Arrow) so that data written to and read from CSV, JSONL and Parquet is read back consistently.
moj-analytical-services/ggplotTraining
R charting training, mainly ggplot2
moj-analytical-services/dbtools
Basic wrapper functions to query data using boto3 and Athena
moj-analytical-services/iam_builder
Little helper to write IAM policies
moj-analytical-services/intro_r_training_extension
An extension to the IntroRTraining course
moj-analytical-services/data-engineering-exports
Infrastructure to allow data from the Analytical Platform to be accessed by other services
moj-analytical-services/intro_to_github_training
moj-analytical-services/data-and-analytics-engineering-tech-radar
Visualizing our technology choices
moj-analytical-services/ap-tools-training
moj-analytical-services/github-outside-collaborators
Manage outside collaborators on our GitHub repositories
moj-analytical-services/gluejobutils
Python 2.7 utility functions to include with AWS Glue jobs
moj-analytical-services/intro-to-python
moj-analytical-services/knife_possession_sankey
A Sankey diagram for knife possession statistics
moj-analytical-services/airflow-de-intro-project-dami
moj-analytical-services/airflow-matrix-scraper
Scraper for the Matrix Booking API
moj-analytical-services/dmet-cfe
moj-analytical-services/dmet-recruitment-202410
moj-analytical-services/moj-dlt-workshop
Repo for the workshop on 12/9/24
moj-analytical-services/splink3_legacy_docs
moj-analytical-services/splink_colab_links
Generate Colab notebooks for the Splink demos and examples
moj-analytical-services/stats-forward-look
moj-analytical-services/test_RSF