bluishglc
Architect and author of the book Big Data Platform Architecture and Prototype Implementation. Sales page: https://item.jd.com/12677623.html
Shanghai, China
Pinned Repositories
apache-hudi-core-conceptions
A set of notebooks that explore and explain core concepts of Apache Hudi, such as file layouts, file sizing, compaction, and clustering.
aws-cli-plus
This command-line tool is a useful complement to aws-cli. It offers a suite of utilities for managing and operating EC2, EMR and other AWS services.
bash-ini-multisection
This is a bash library for reading INI-style config files that allows multiple [section] entries.
bdp
A prototype big data platform project, containing the source code for the book Big Data Platform Architecture and Prototype
emr-edgenode-maker
This tool makes it easy to build an EMR cluster edge node (also called a client node or gateway node).
flink-recommendsystem-demo
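:helicopter::rocket: A real-time product recommendation system built on Flink. Flink computes product popularity and caches it in Redis, analyzes log data, and stores profile tags and real-time records in HBase. When a user requests recommendations, the popularity ranking is re-sorted according to the user's profile, the collaborative-filtering and tag-based recommendation modules add related products to each product in the newly generated ranking, and the new list is returned.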
glue-hudi-integration-example
An example project that demonstrates how Glue reads and writes Hudi datasets and syncs metadata to the Glue Catalog.
ranger-emr-cfn-installer
This project is a series of AWS CloudFormation templates that install Ranger and integrate it with an AWS EMR cluster, using a Windows AD or OpenLDAP server as the authentication channel.
ranger-emr-cli-installer
This is a powerful CLI tool for the automated installation and integration of Apache Ranger and AWS EMR with OpenLDAP and Windows AD. It supports both open-source Ranger and EMR-native Ranger, supports both OpenLDAP and Windows AD, and works in all AWS regions (including the China regions).
serverless-datalake-example
A serverless data lake project and framework based on AWS S3, Glue, Athena, MWAA and QuickSight. Through a series of best practices, it shows you how to build a serverless data lake.
bluishglc's Repositories
bluishglc/serverless-datalake-example
A serverless data lake project and framework based on AWS S3, Glue, Athena, MWAA and QuickSight. Through a series of best practices, it shows you how to build a serverless data lake.
bluishglc/apache-hudi-core-conceptions
A set of notebooks that explore and explain core concepts of Apache Hudi, such as file layouts, file sizing, compaction, and clustering.
bluishglc/emr-edgenode-maker
This tool makes it easy to build an EMR cluster edge node (also called a client node or gateway node).
bluishglc/ranger-emr-cli-installer
This is a powerful CLI tool for the automated installation and integration of Apache Ranger and AWS EMR with OpenLDAP and Windows AD. It supports both open-source Ranger and EMR-native Ranger, supports both OpenLDAP and Windows AD, and works in all AWS regions (including the China regions).
bluishglc/glue-hudi-integration-example
An example project that demonstrates how Glue reads and writes Hudi datasets and syncs metadata to the Glue Catalog.
bluishglc/flink-recommendsystem-demo
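:helicopter::rocket: A real-time product recommendation system built on Flink. Flink computes product popularity and caches it in Redis, analyzes log data, and stores profile tags and real-time records in HBase. When a user requests recommendations, the popularity ranking is re-sorted according to the user's profile, the collaborative-filtering and tag-based recommendation modules add related products to each product in the newly generated ranking, and the new list is returned.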
bluishglc/ranger-emr-cfn-installer
This project is a series of AWS CloudFormation templates that install Ranger and integrate it with an AWS EMR cluster, using a Windows AD or OpenLDAP server as the authentication channel.
bluishglc/aws-cli-plus
This command-line tool is a useful complement to aws-cli. It offers a suite of utilities for managing and operating EC2, EMR and other AWS services.
bluishglc/bdp-platform
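A data platform for big data ecosystem solutions: a big data solution built on big data, data platforms, microservices, machine learning, e-commerce, automated operations, DevOps, a container deployment platform, and data platform collection, storage, computing, development, and application building.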
bluishglc/emr-serverless-utils
A utility library for Amazon EMR Serverless that includes, for example, a generic job class for executing SQL files.
bluishglc/flink-sql-CDC
Self-contained demo using Flink SQL and Debezium to build a CDC-based analytics pipeline. All you need is Docker! :whale:
bluishglc/aws-cli
Universal Command Line Interface for Amazon Web Services
bluishglc/aws-glue-kafka-python-exemplos
Examples of consuming and producing events in Kafka (+ Schema Registry) using AWS Glue.
bluishglc/bigdata-growth
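A big data knowledge repository covering data warehouse modeling, real-time computing, big data, the data middle platform, system design, Java, algorithms, and more.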
bluishglc/Bigdata_Code_Tutorial
Flink CDC whole-database synchronization & Flink code demos
bluishglc/blogimages
bluishglc/debezium-timestamp-converter
bluishglc/docker-hadoop-workbench
bluishglc/flying-diamond
This project was developed in 2011; I wrote it to learn the MVC pattern and the Java Swing library.
bluishglc/handson-ml2
A series of Jupyter notebooks that walk you through the fundamentals of Machine Learning and Deep Learning in Python using Scikit-Learn, Keras and TensorFlow 2.
bluishglc/jmeter-kafka-producer-example
An example project that demonstrates how to generate dummy messages and push them into Kafka with JMeter.
bluishglc/kafka-connect-msk-demo
For a series of posts on Amazon MSK, Amazon EKS, and Amazon EMR
bluishglc/kafka-oneclick-to-s3-datalake
bluishglc/lake-formation-data-location-test-cases
bluishglc/msk-config-providers
A fork of https://github.com/aws-samples/msk-config-providers that changes the AWS SDK HTTP client from ApacheHttpClient (the default) to UrlConnectionHttpClient and removes the SSM and S3 config providers, keeping only the SecretsManager provider.
bluishglc/nyc-tlc-data
bluishglc/pydata-book
Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media
bluishglc/ranger-repo
A repo for Apache Ranger artifacts
bluishglc/Real-time-Data-Warehouse
Real-time Data Warehouse with Apache Flink & Apache Kafka & Apache Hudi
bluishglc/shared-repo