data

There are 19,453 repositories under the data topic.

  • datasets

    TFDS is a collection of datasets ready to use with TensorFlow, Jax, ...

    Language: Python, 4.3k stars
  • glide-data-grid

    🚀 Glide Data Grid is a no-compromise, outrageously fast React data grid with rich rendering, first-class accessibility, and full TypeScript support.

    Language: TypeScript, 4.1k stars
  • bad-data-guide

    An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

  • gray-matter

    Smarter YAML front matter parser, used by metalsmith, Gatsby, Netlify, Assemble, mapbox-gl, phenomic, vuejs vitepress, TinaCMS, Shopify Polaris, Ant Design, Astro, hashicorp, garden, slidev, saber, sourcegraph, and many others. Simple to use, and battle tested. Parses YAML by default but can also parse JSON Front Matter, Coffee Front Matter, TOML Front Matter, and has support for custom parsers. Please follow gray-matter's author: https://github.com/jonschlinkert

    Language: JavaScript, 4k stars
  • tinybase

    The reactive data store for local‑first apps.

    Language: TypeScript, 3.9k stars
  • arroyo

    Distributed stream processing engine in Rust

    Language: Rust, 3.8k stars
  • data-transfer-project

    The Data Transfer Project makes it easy for platforms to build interoperable user data portability features. We are establishing a common framework, including data models and protocols, to enable direct transfer of data both into and out of participating online service providers.

    Language: Java, 3.6k stars
  • react-refetch

    A simple, declarative, and composable way to fetch data for React components

    Language: JavaScript, 3.4k stars
  • cognita

    RAG (Retrieval Augmented Generation) Framework for building modular, open source applications for production by TrueFoundry

    Language: Python, 3.3k stars
  • awesome-json-datasets

    A curated list of awesome JSON datasets that don't require authentication.

    Language: JavaScript, 3.3k stars
  • TextRecognitionDataGenerator

    A synthetic data generator for text recognition

    Language: Python, 3.3k stars
  • docta

    A Doctor for your data

    Language: Python, 3.3k stars
  • memphis

    Memphis.dev is a highly scalable and effortless data streaming platform

    Language: Go, 3.3k stars
  • falso

    All the Fake Data for All Your Real Needs 🙂

    Language: TypeScript, 3.2k stars
  • quadratic

    Quadratic | Spreadsheet with Python, SQL, and AI

    Language: Rust, 3k stars
  • aresdb

    A GPU-powered real-time analytics storage and query engine.

    Language: Go, 3k stars
  • weld

    High-performance runtime for data analytics applications

    Language: Rust, 3k stars
  • pandas-datareader

    Extract data from a wide range of Internet sources into a pandas DataFrame.

    Language: Python, 3k stars
  • data-diff

    Compare tables within or across databases

    Language: Python, 2.9k stars
  • stats

    A well-tested, comprehensive Golang statistics package with no dependencies.

    Language: Go, 2.9k stars
  • dlt

    data load tool (dlt) is an open source Python library that makes data loading easy 🛠️

    Language: Python, 2.7k stars
  • incubator-devlake

    Apache DevLake is an open-source dev data platform to ingest, analyze, and visualize the fragmented data from DevOps tools, extracting insights for engineering excellence, developer experience, and community growth.

    Language: Go, 2.6k stars
  • scio

    A Scala API for Apache Beam and Google Cloud Dataflow.

    Language: Scala, 2.6k stars
  • gopup

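    Data interfaces: Baidu, Google, Toutiao, and Weibo indices; macroeconomic data; interest rate data; currency exchange rates; Qianlima and unicorn companies; Xinwen Lianbo transcripts; film and TV box office data; university lists; epidemic data…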

    Language: Python, 2.5k stars
  • pypika

    PyPika is a Python SQL query builder that exposes the full richness of the SQL language using a syntax that reflects the resulting query. PyPika excels at all sorts of SQL queries but is especially useful for data analysis.

    Language: Python, 2.5k stars
  • graphic-walker

    An open source alternative to Tableau. Embeddable visual analytics.

    Language: TypeScript, 2.5k stars
  • datasets

    🎁 5,400,000+ Unsplash images made available for research and machine learning

    Language: Jupyter Notebook, 2.4k stars
  • PyFunctional

    Python library for creating data pipelines with chain functional programming

    Language: Python, 2.4k stars
  • DeepBI

    LLM-based data scientist, AI-native data application. AI-driven infinite thinking redefines BI.

    Language: Python, 2.4k stars
  • mito

    The mitosheet package, trymito.io, and other public Mito code.

    Language: Jupyter Notebook, 2.3k stars
  • fake2db

    Create custom test databases that are populated with fake data.

    Language: Python, 2.3k stars
  • sketch

    AI code-writing assistant that understands data content

    Language: Python, 2.2k stars
  • TigerBot

    TigerBot: A multi-language multi-task LLM

    Language: Python, 2.2k stars
  • gobblin

    A distributed data integration framework that simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

    Language: Java, 2.2k stars
  • generatedata

    A powerful, feature-rich, random test data generator.

    Language: TypeScript, 2.2k stars
  • ISO-3166-Countries-with-Regional-Codes

    ISO 3166-1 country lists merged with their UN Geoscheme regional codes in ready-to-use JSON, XML, CSV data sets

    Language: Ruby, 2.2k stars