data

There are 22,560 repositories under the data topic.

  • TanStack/query

    🤖 Powerful asynchronous state management, server-state utilities and data fetching for the web. TS/JS, React Query, Solid Query, Svelte Query and Vue Query.

    Language:TypeScript47.3k2092.2k3.5k
  • run-llama/llama_index

    LlamaIndex is the leading framework for building LLM-powered agents over your data.

    Language:Python45.1k2596.8k6.5k
  • metabase/metabase

    The easy-to-use open-source Business Intelligence and Embedded Analytics tool that lets everyone work with data 📊

    Language:Clojure44.5k64023.6k6k
  • DataExpert-io/data-engineer-handbook

    This is a repo with links to everything you'd ever want to learn about data engineering

    Language:Jupyter Notebook38.5k454787.4k
  • SheetJS/sheetjs

    📗 SheetJS Spreadsheet Data Toolkit. New home: https://git.sheetjs.com/SheetJS/sheetjs

  • vercel/swr

    React Hooks for Data Fetching

    Language:TypeScript32.1k2169631.3k
  • sinaptik-ai/pandas-ai

    Chat with your database or your data lake (SQL, CSV, Parquet). PandasAI makes data analysis conversational using LLMs and RAG.

    Language:Python22.5k1699022.2k
  • PrefectHQ/prefect

    Prefect is a workflow orchestration framework for building resilient data pipelines in Python.

    Language:Python20.8k1616.6k2k
  • airbytehq/airbyte

    The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

    Language:Python20k18615.5k4.9k
  • fivethirtyeight/data

    Data and code behind the articles and graphics at FiveThirtyEight

    Language:Jupyter Notebook17.2k1.3k16411.1k
  • prestodb/presto

    The official home of the Presto distributed SQL query engine for big data

    Language:Java16.6k8327.1k5.5k
  • faker-js/faker

    Generate massive amounts of fake data in the browser and Node.js

    Language:TypeScript14.6k357761k
  • akfamily/akshare

    AKShare is an elegant and simple open-source financial data interface library for Python, built for human beings!

    Language:Python14.3k2302.5k2.6k
  • oxnr/awesome-bigdata

    A curated list of awesome big data frameworks, resources and other awesomeness.

  • pwxcoo/chinese-xinhua

    📙 Chinese Xinhua Dictionary database, including xiehouyu (two-part allegorical sayings), chengyu (idioms), words, and Chinese characters.

    Language:Python11.4k306622.6k
  • apple/pkl

    A configuration as code language with rich validation and tooling.

    Language:Java10.9k59363339
  • PRQL/prql

    PRQL is a modern language for transforming data: a simple, powerful, pipelined SQL replacement

    Language:Rust10.5k431.1k241
  • bchavez/Bogus

    📇 A simple fake data generator for C#, F#, and VB.NET. Based on and ported from the famed faker.js.

    Language:C#9.5k120354537
  • rawgraphs/rawgraphs-app

    A web interface to create custom vector-based visualizations on top of RAWGraphs core

    Language:JavaScript8.9k3133221.9k
  • mage-ai/mage-ai

    🧙 Build, run, and manage data pipelines for integrating and transforming data.

    Language:Python8.5k631k888
  • D4Vinci/Scrapling

    🕷️ An undetectable, powerful, flexible, high-performance Python library to make web scraping easy and effortless, as it should be!

    Language:Python8.1k4535463
  • mrdbourke/machine-learning-roadmap

    A roadmap connecting many of the most important concepts in machine learning, how to learn them and what tools to use to perform them.

  • olifolkerd/tabulator

    Interactive Tables and Data Grids for JavaScript

    Language:JavaScript7.4k1454.4k870
  • snowplow/snowplow

    The leader in Customer Data Infrastructure

    Language:Scala7k2614k1.2k
  • flyteorg/flyte

    Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.

    Language:Go6.6k2533.5k757
  • cloudquery/cloudquery

    Data pipelines for cloud config and security data. Build cloud asset inventory, CSPM, FinOps, and vulnerability management solutions. Extract from AWS, Azure, GCP, and 70+ cloud and SaaS sources.

    Language:Go6.2k612.2k542
  • dformoso/machine-learning-mindmap

    A mindmap summarising Machine Learning concepts, from Data Analysis to Deep Learning.

  • axa-group/Parsr

    Transforms PDF, Documents and Images into Enriched Structured Data

    Language:JavaScript6k81163317
  • Countly/countly-server

    Countly is a product analytics platform that helps teams track, analyze and act on their user actions and behaviour on mobile, web and desktop applications.

    Language:JavaScript5.8k214452981
  • cue-lang/cue

    The home of the CUE language! Validate and define text-based and dynamic configuration

    Language:Go5.8k433.1k339
  • airbnb/knowledge-repo

    A next-generation curated knowledge sharing platform for data scientists and other technical professionals.

    Language:Python5.5k170291685
  • mdn/browser-compat-data

    Browser compatibility data for Web technologies as displayed on MDN

    Language:JSON5.5k2686k2.4k
  • modelscope/data-juicer

    Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷

    Language:Python5.3k20293275
  • brianvoe/gofakeit

    Random fake data generator written in Go

    Language:Go5.2k25181290
  • superduper-io/superduper

    Superduper: End-to-end framework for building custom AI applications and agents.

    Language:Python5.2k431.4k533
  • ckan/ckan

    CKAN is an open-source DMS (data management system) for powering data hubs and data portals. CKAN makes it easy to publish, share and use data. It powers catalog.data.gov, open.canada.ca/data, data.humdata.org among many other sites.

    Language:Python4.9k2023.6k2.1k