thisnew's Stars
996icu/996.ICU
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
hankcs/HanLP
Natural Language Processing for the next decade. Tokenization, Part-of-Speech Tagging, Named Entity Recognition, Syntactic & Semantic Dependency Parsing, Document Classification
alibaba/easyexcel
A fast, simple Java tool for processing Excel files that avoids out-of-memory errors on large files
apache/dolphinscheduler
Apache DolphinScheduler is the modern data orchestration platform. Agile to create high performance workflow with low-code
yahoo/CMAK
CMAK is a tool for managing Apache Kafka clusters
leanote/leanote
Not Just A Notepad! (golang + mongodb) http://leanote.org
datahub-project/datahub
The Metadata Platform for your Data and AI Stack
mobz/elasticsearch-head
A web front end for an elastic search cluster
pentaho/pentaho-kettle
Pentaho Data Integration ( ETL ) a.k.a Kettle
Ryochan7/DS4Windows
Like those other ds4tools, but sexier
NLPchina/elasticsearch-sql
Use SQL to query Elasticsearch
water8394/flink-recommandSystem-demo
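:helicopter::rocket: A real-time product recommendation system built on Flink. Flink computes product popularity and stores it in a Redis cache, analyzes log data, and writes profile tags and real-time records to HBase. When a user issues a recommendation request, the popularity ranking is re-sorted according to the user profile, and two recommendation modules (collaborative filtering and tag-based) attach related products to each product in the newly generated ranking. Finally, the new list is returned for the user.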
DTStack/chunjun
A data integration framework
linkedin/databus
Source-agnostic distributed change data capture system
DTStack/flinkStreamSQL
Extends the real-time SQL of open-source Flink; mainly implements joins between streams and dimension tables, and supports all native Flink SQL syntax
jly8866/archer
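An automated SQL operations platform based on Inception, supporting SQL execution, LDAP authentication, email notifications, OSC, SQL queries, SQL optimization suggestions, permission management, and more; Docker images are supported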
ucarGroup/DataLink
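DataLink is a distributed, scalable data exchange platform that supports real-time incremental synchronization and offline full synchronization between heterogeneous data sources.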
RedisLabs/spark-redis
A connector for Spark that allows reading and writing to/from Redis cluster
majinju/kettle-manager
A web-based management tool developed specifically for Kettle, the excellent ETL tool.
zhaxiaodong9860/kettle-scheduler
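A simple, easy-to-use scheduling and monitoring platform for Kettle, built to schedule and monitor jobs and transformations created with the Kettle client. The framework integrates Spring, Spring MVC, and BeetlSQL; it runs transformations and jobs by calling the Kettle API and uses the Quartz framework for scheduling.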
hortonworks-spark/shc
The Apache Spark - Apache HBase Connector is a library to support Spark accessing HBase table as external data source or sink.
Qihoo360/logkafka
Collect logs and send lines to Apache Kafka
japila-books/spark-structured-streaming-internals
The Internals of Spark Structured Streaming
polomarcus/Spark-Structured-Streaming-Examples
Spark Structured Streaming / Kafka / Cassandra / Elastic
ansrivas/spark-structured-streaming
Spark structured streaming with Kafka data source and writing to Cassandra
fooinha/nginx-json-log
Highly configurable JSON format logging per Location - nginx logging module - aka. kasha 🍲
cloudera/parquet-examples
Example programs and scripts for accessing parquet files
fansy1990/hanlp-test
HanLP tests
jiang3ye/hdfsreader
parquet for DataX - hdfsreader
zrk1000/drpcproxy
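DRPC-Proxy is a simple framework, built on RPC services that use Storm DRPC, for decoupling business code from Storm framework code. Some scenarios need DRPC without much interest in Storm's stream processing. Typically, DRPCServer acts as the service provider and receives requests, bolts handle the business logic, and ReturnResults returns the result; the bolts then interleave and couple business code with Storm code, which creates problems for later upgrades and extensions. DRPC-Proxy decouples the business logic from Storm: the service consumer uses a dynamic proxy to call DRPCClient, which communicates with DRPCServer; DRPCServer matches the request to the corresponding service provider and returns the final result to the consumer.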