data

There are 19,403 repositories under the data topic.

  • TanStack/query

    🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

    Language: TypeScript · 42.6k stars
  • metabase/metabase

    The simplest, fastest way to get business intelligence and analytics to everyone in your company 😋

    Language: Clojure · 38.9k stars
  • run-llama/llama_index

    LlamaIndex is a data framework for your LLM applications

    Language: Python · 36.9k stars
  • SheetJS/sheetjs

    📗 SheetJS Spreadsheet Data Toolkit. New home: https://git.sheetjs.com/SheetJS/sheetjs

  • vercel/swr

    React Hooks for Data Fetching

    Language: TypeScript · 30.6k stars
  • DataExpert-io/data-engineer-handbook

    This is a repo with links to everything you'd ever want to learn about data engineering

    Language: Makefile · 20.5k stars
  • mendableai/firecrawl

    🔥 Turn entire websites into LLM-ready markdown or structured data. Scrape, crawl and extract with a single API.

    Language: TypeScript · 19k stars
  • PrefectHQ/prefect

    Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

    Language: Python · 17.6k stars
  • fivethirtyeight/data

    Data and code behind the articles and graphics at FiveThirtyEight

    Language: Jupyter Notebook · 16.9k stars
  • airbytehq/airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Language: Python · 16.3k stars
  • prestodb/presto

    The official home of the Presto distributed SQL query engine for big data

    Language: Java · 16.1k stars
  • Sinaptik-AI/pandas-ai

    Chat with your database (SQL, CSV, pandas, Polars, MongoDB, NoSQL, etc.). PandasAI makes data analysis conversational using LLMs (GPT-3.5 / GPT-4, Anthropic, VertexAI) and RAG.

    Language: Python · 13.6k stars
  • oxnr/awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

  • faker-js/faker

    Generate massive amounts of fake data in the browser and Node.js

    Language: TypeScript · 13k stars
  • pwxcoo/chinese-xinhua

    📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), chengyu (idioms), words, and Chinese characters.

    Language: Python · 11k stars
  • apple/pkl

    A configuration as code language with rich validation and tooling.

    Language: Java · 10.4k stars
  • PRQL/prql

    PRQL is a modern language for transforming data: a simple, powerful, pipelined SQL replacement

    Language: Rust · 10k stars
  • akfamily/akshare

    AKShare is an elegant and simple financial data interface library for Python, built for human beings! (Open-source financial data interface library)

    Language: Python · 9.5k stars
  • bchavez/Bogus

    📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.

    Language: C# · 8.9k stars
  • rawgraphs/rawgraphs-app

    A web interface to create custom vector-based visualizations on top of RAWGraphs core

    Language: JavaScript · 8.7k stars
  • mage-ai/mage-ai

    🧙 Build, run, and manage data pipelines for integrating and transforming data.

    Language: Python · 8k stars
  • mrdbourke/machine-learning-roadmap

    A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.

  • snowplow/snowplow

    The leader in Next-Generation Customer Data Infrastructure

    Language: Scala · 6.8k stars
  • olifolkerd/tabulator

    Interactive Tables and Data Grids for JavaScript

    Language: JavaScript · 6.8k stars
  • dformoso/machine-learning-mindmap

    A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

  • cloudquery/cloudquery

    The open-source, high-performance ELT framework powered by Apache Arrow

    Language: Go · 5.9k stars
  • axa-group/Parsr

    Transforms PDFs, documents and images into enriched structured data

    Language: JavaScript · 5.9k stars
  • flyteorg/flyte

    Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

    Language: Go · 5.8k stars
  • Countly/countly-server

    Countly is a product analytics platform that helps teams track, analyze and act on their users' actions and behaviour on mobile, web and desktop applications.

    Language: JavaScript · 5.6k stars
  • airbnb/knowledge-repo

    A next-generation curated knowledge sharing platform for data scientists and other technical professionals.

    Language: Python · 5.5k stars
  • cue-lang/cue

    The home of the CUE language! Validate and define text-based and dynamic configuration

    Language: Go · 5.1k stars
  • mdn/browser-compat-data

    This repository contains compatibility data for Web technologies as displayed on MDN

    Language: JSON · 5k stars
  • superduper-io/superduper

    Superduper: Build end-to-end AI applications and agent workflows on your existing data infrastructure and preferred tools, without migrating your data.

    Language: Python · 4.8k stars
  • brianvoe/gofakeit

    Random fake data generator written in Go

    Language: Go · 4.6k stars
  • ckan/ckan

    CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

    Language: Python · 4.5k stars
  • lk-geimfari/mimesis

    Mimesis is a robust data generator for Python that can produce a wide range of fake data in multiple languages.

    Language: Python · 4.5k stars