Pinned Repositories
airflow-pdf2embeddings
NLP tool for scraping text from a corpus of PDF files, embedding the sentences in the text, and finding sentences semantically similar to a given search query.
coffee-and-coding-public
MoJ coffee and coding sessions that can be made publicly available
etl-pipeline-example
An example of an ETL pipeline that lays out generic data engineering (DE) processes. This is now out of date but still provides useful information.
etl_manager
A Python package to create a database on the platform using our MoJ data warehousing framework
our-coding-standards
DASD's coding principles for analytical projects
shinyGovstyle
Apply GOV.UK styled components and formats in Shiny
splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
splink_demos
Interactive notebooks containing demonstration code of the splink library
user-guidance
User guidance for the MoJ Analytical Platform
xltabr
xltabr: An R package for writing formatted cross tabulations (contingency tables) to Excel using openxlsx
MoJ Analytical Services' Repositories
moj-analytical-services/splink
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
moj-analytical-services/shinyGovstyle
Apply GOV.UK styled components and formats in Shiny
moj-analytical-services/etl_manager
A Python package to create a database on the platform using our MoJ data warehousing framework
moj-analytical-services/dataengineeringutils3
Fully unit tested utility functions for data engineering. Python 3 only.
moj-analytical-services/user-guidance
User guidance for the MoJ Analytical Platform
moj-analytical-services/pydbtools
Python version of dbtools
moj-analytical-services/data-engineering-and-modelling-applicant-info
Information for potential applicants to MoJ Data Engineering, including links to our work and information about our teams.
moj-analytical-services/mojap-arrow-pd-parser
Conforms pandas to "correct" data types so that data read from or written to CSV, JSONL and Parquet is handled consistently (using Arrow).
moj-analytical-services/dbtools
Basic wrapper functions to query data using boto3 and Athena
moj-analytical-services/mojap-metadata
Schema definitions and management of our metadata used by the Data Engineering Team at MoJ
moj-analytical-services/iam_builder
Little helper to write IAM policies
moj-analytical-services/data-engineering-exports
Infrastructure to allow data from the Analytical Platform to be accessed by other services
moj-analytical-services/intro_to_github_training
moj-analytical-services/Rs3tools
moj-analytical-services/airflow-de-intro-project
moj-analytical-services/ap-tools-training
moj-analytical-services/data-and-analytics-engineering-tech-radar
Visualizing our technology choices
moj-analytical-services/github-outside-collaborators
Manage outside collaborators on our GitHub repositories
moj-analytical-services/gluejobutils
Python 2.7 utility functions to include with AWS Glue jobs
moj-analytical-services/knife_possession_sankey
A Sankey diagram for knife possession statistics
moj-analytical-services/tech-radar-demo
Visualizing our technology choices
moj-analytical-services/airflow-de-intro-project-dami
moj-analytical-services/airflow-matrix-scraper
Scraper for the matrixbooking API
moj-analytical-services/dmet-cfe
moj-analytical-services/splink3_legacy_docs
moj-analytical-services/splink_colab_links
Generate Colab notebooks for the Splink demos and examples
moj-analytical-services/stats-forward-look
moj-analytical-services/tech-radar
Visualizing our technology choices
moj-analytical-services/tech-radar-demo-test
Visualizing our technology choices
moj-analytical-services/test_RSF