Pinned Repositories
1brc
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
airflow
Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
alluxio
Alluxio, formerly Tachyon, Memory Speed Virtual Distributed Storage System
arthas
Alibaba Java Diagnostic Tool Arthas
atlas
Apache Atlas
aws-earth-examples
Example code showing how to freely use the Met Office's weather datasets through Earth on AWS.
cdh-package
spark-summit-east-2017
SparkInternals
Notes on the design and implementation of Apache Spark
michaelli916's Repositories
michaelli916/1brc
1️⃣🐝🏎️ The One Billion Row Challenge -- A fun exploration of how quickly 1B rows from a text file can be aggregated with Java
michaelli916/alluxio
Alluxio, formerly Tachyon, Memory Speed Virtual Distributed Storage System
michaelli916/ChatGPT-Next-Web
One-click deployment of a well-designed ChatGPT web UI on Vercel. Get your own ChatGPT web service in one click.
michaelli916/CLIP-Chinese
Chinese CLIP pre-trained model
michaelli916/cm_ext
Cloudera Manager Extensibility Tools and Documentation.
michaelli916/CyberHashira.github.io
TBDL....
michaelli916/datafaker
Generating fake data for the JVM (Java, Kotlin, Groovy) has never been easier!
michaelli916/DataX
michaelli916/elasticsearch
Open Source, Distributed, RESTful Search Engine
michaelli916/flink-api-examples
michaelli916/flink-faker
A data generator source connector for Flink SQL based on data-faker.
michaelli916/GCViewer
Fork of tagtraum industries' GCViewer. Tagtraum stopped development in 2008; I aim to improve support for Sun's/Oracle's Java 1.6+ garbage collector logs (including the G1 collector).
michaelli916/GenAI_Agents
This repository provides tutorials and implementations for various Generative AI Agent techniques, from basic to advanced. It serves as a comprehensive guide for building intelligent, interactive AI systems.
michaelli916/HanLP
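Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic dependency parsing, semantic role labeling, coreference resolution, style transfer, semantic similarity, new word discovery, keyword and keyphrase extraction, automatic summarization, text classification and clustering, pinyin and simplified/traditional Chinese conversion, natural language processing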
michaelli916/hello-world
michaelli916/HikariCP
光 HikariCP・A solid, high-performance, JDBC connection pool at last.
michaelli916/hudi
Upserts, Deletes And Incremental Processing on Big Data.
michaelli916/iceberg
Apache Iceberg
michaelli916/incubator-seatunnel
SeaTunnel is a distributed, high-performance data integration platform for the synchronization and transformation of massive data (offline & real-time).
michaelli916/influxdb
Scalable datastore for metrics, events, and real-time analytics
michaelli916/jacoco
🔬 Java Code Coverage Library
michaelli916/jieba
Jieba Chinese word segmentation
michaelli916/kafka
Mirror of Apache Kafka
michaelli916/kubernetes-the-hard-way
Bootstrap Kubernetes the hard way on Google Cloud Platform. No scripts.
michaelli916/ollama
Get up and running with Llama 3, Mistral, Gemma, and other large language models.
michaelli916/packetdrill
The official Google release of packetdrill
michaelli916/RAG_Techniques
This repository showcases various advanced techniques for Retrieval-Augmented Generation (RAG) systems. RAG systems combine information retrieval with generative models to provide accurate and contextually rich responses.
michaelli916/spring-security-architecture-workshop
Workshop on understanding Spring Security
michaelli916/v2fly-github-io
V2Fly Website & Documentation
michaelli916/v2ray-core
A platform for building proxies to bypass network restrictions.