gaopengchao's Stars
Stirling-Tools/Stirling-PDF
#1 Locally hosted web application that allows you to perform various operations on PDF files
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
QuivrHQ/quivr
Opinionated RAG for integrating GenAI in your apps 🧠 Focus on your product rather than the RAG. Easy integration in existing products with customisation! Any LLM: GPT4, Groq, Llama. Any Vectorstore: PGVector, Faiss. Any Files. Any way you want.
alibaba/easyexcel
A fast, concise Java tool for processing Excel that solves out-of-memory problems with large files
hashicorp/vault
A tool for secrets management, encryption as a service, and privileged access management
FlowiseAI/Flowise
Drag & drop UI to build your customized LLM flow
GorvGoyl/Clone-Wars
100+ open-source clones of popular sites like Airbnb, Amazon, Instagram, Netflix, TikTok, Spotify, WhatsApp, YouTube etc. See source code, demo links, tech stack, GitHub stars.
apache/flink
Apache Flink
apache/rocketmq
Apache RocketMQ is a cloud native messaging and streaming platform, making it simple to build event-driven applications.
microsoft/graphrag
A modular graph-based Retrieval-Augmented Generation (RAG) system
Infisical/infisical
♾ Infisical is the open-source secret management platform: Sync secrets across your team/infrastructure, prevent secret leaks, and manage internal PKI
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool that supports PDF, webpage, and multi-format e-book extraction.
antlr/grammars-v4
Grammars written for ANTLR v4; expectation that the grammars are free of actions.
great-expectations/great_expectations
Always know what to expect from your data.
opensearch-project/OpenSearch
🔎 Open source distributed and RESTful search engine.
open-policy-agent/opa
Open Policy Agent (OPA) is an open source, general-purpose policy engine.
pentaho/pentaho-kettle
Pentaho Data Integration (ETL), a.k.a. Kettle
adithya-s-k/omniparse
Ingest, parse, and optimize any data format ➡️ from documents to multimedia ➡️ for enhanced compatibility with GenAI frameworks
apache/incubator-streampark
Make stream processing easier! Easy-to-use streaming application development framework and operation platform.
erupts/erupt
🚀 General data management framework, objects are pages
alldatacenter/alldata
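🔥🔥 AllData is a definable data middle platform: with the data platform as its foundation, the data middle platform as the bridge, the machine learning platform as the factory, and large-model applications as the upstream product, it provides a full-chain digitalization solution. Purchase the commercial edition or join the technical community: https://docs.qq.com/doc/DVHlkSEtvVXVCdEFo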
apache/paimon
Apache Paimon is a lake format that enables building a Realtime Lakehouse Architecture with Flink and Spark for both streaming and batch operations.
sohutv/mqcloud
Enterprise-grade one-stop service platform for RocketMQ
apache/polaris
Apache Polaris, the interoperable, open source catalog for Apache Iceberg
apache/amoro
Apache Amoro (incubating) is a Lakehouse management system built on open data lake formats.
Mrkuhuo/data-warehouse-learning
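[2024 latest edition] Big data, data analysis, e-commerce system, real-time data warehouse, offline data warehouse, and data lake: construction plans and hands-on code, covering the components #flink #paimon #doris #seatunnel #dolphinscheduler #datart #dinky #hudi #iceberg.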
eclipse-edc/Connector
EDC core services including data plane and control plane
642933588/jiron-cloud
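This project integrates several excellent open-source products to build a full-featured data development platform. The platform provides powerful data integration, data development, data query, data service, data quality management, workflow scheduling, and metadata management capabilities. #dinky #dolphinscheduler #datavines #flinkcdc #openmetadata #flink #data-development #data-platform #data-development-platform #big-data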
apache/nifi-python-extensions
Apache NiFi Python Extensions
lifan0127/nifi-langchain
LangChain Expression Language (LCEL) As NiFi Processors