Pinned Repositories
aws-workspace-login-rpa
This repo is meant for people using Amazon WorkSpaces. Though the code targets a specific environment, the bot can be tweaked to suit your organization.
chess-analytics
This repository contains code to programmatically retrieve PGNs from chess.com servers and parse them into a relational format with one move per row. The goal is to build a dashboard on top of the dataset.
cloud-dataproc
Cloud Dataproc: Samples and Utils
csql-copy-dataflow
csql-dataflow-pgcopy-connector
This Beam pipeline ingests CSV files from Google Cloud Storage (GCS) and efficiently loads them into a Cloud SQL PostgreSQL database using the `COPY` command. The template is designed for parallel processing, enabling you to load large datasets quickly.
custom-genai-search-engine
This tool creates a custom search engine using Vertex AI, LangChain, and Streamlit. Users input the URL of a website's sitemap XML file, which serves as the knowledge base. The app then crawls the entire website, refreshes the vector embeddings, and uses that content to answer user queries.
dataproc-autoscaler-metrics
This application extracts autoscaler metrics and writes them to a CSV file.
etl-alert-framework
A framework for creating a standardized feedback mechanism for ETL processes while keeping the developer free from the implementation details of the alert system.
helpme-readme
This tool automatically generates README.md files for GitHub repositories using Google AI.
professional-services
Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.
datasherlock's Repositories
datasherlock/chess-analytics
This repository contains code to programmatically retrieve PGNs from chess.com servers and parse them into a relational format with one move per row. The goal is to build a dashboard on top of the dataset.
datasherlock/custom-genai-search-engine
This tool creates a custom search engine using Vertex AI, LangChain, and Streamlit. Users input the URL of a website's sitemap XML file, which serves as the knowledge base. The app then crawls the entire website, refreshes the vector embeddings, and uses that content to answer user queries.
datasherlock/etl-alert-framework
A framework for creating a standardized feedback mechanism for ETL processes while keeping the developer free from the implementation details of the alert system.
datasherlock/spark-config-calculator
The Spark Configuration Tool is a Streamlit-based application designed to assist users in optimizing Apache Spark configurations. It allows users to input various parameters related to cluster, node, and executor configurations, providing recommended Spark configurations based on those inputs.
datasherlock/aws-workspace-login-rpa
This repo is meant for people using Amazon WorkSpaces. Though the code targets a specific environment, the bot can be tweaked to suit your organization.
datasherlock/cloud-dataproc
Cloud Dataproc: Samples and Utils
datasherlock/csql-copy-dataflow
datasherlock/csql-dataflow-pgcopy-connector
This Beam pipeline ingests CSV files from Google Cloud Storage (GCS) and efficiently loads them into a Cloud SQL PostgreSQL database using the `COPY` command. The template is designed for parallel processing, enabling you to load large datasets quickly.
datasherlock/dataproc-autoscaler-metrics
This application extracts autoscaler metrics and writes them to a CSV file.
datasherlock/dataproc-properties-propagator
datasherlock/datasherlock
Config files for my GitHub profile.
datasherlock/helpme-readme
This tool automatically generates README.md files for GitHub repositories using Google AI.
datasherlock/datastage-unix-wrapper-script
A DataStage wrapper script written in Bash.
datasherlock/ecstats2
datasherlock/gcp-poc-setup-scripts
This repository contains all the source code I've written to set up infrastructure, generate test data, run benchmarks, and perform other repetitive actions that come up while learning a technology or working with a customer.
datasherlock/initialization-actions
Run on all nodes of your cluster before the cluster starts; lets you customize your cluster.
datasherlock/memorystore-batch-delete
datasherlock/oracle-trigger-creator
datasherlock/professional-services
Common solutions and tools developed by Google Cloud's Professional Services team. This repository and its contents are not an officially supported Google product.
datasherlock/python-dataproc
datasherlock/python-docs-samples
Code samples used on cloud.google.com
datasherlock/quiz-quotient-webapp
datasherlock/real-time-demo-data-generator
datasherlock/redis-datasets
A Curated List of Sample Redis Datasets
datasherlock/Snowflake