pyspark

There are 3404 repositories under the pyspark topic.

  • ai-deployment

    Focuses on taking AI models into production and on model deployment

    Language: Jupyter Notebook (265)
  • MorphL-Community-Edition

    MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization

    Language: Python (260)
  • hnswlib

    Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs

    Language: Java (243)
  • gimel

    Big Data Processing Framework - Unified Data API or SQL on Any Storage

    Language: Scala (242)
  • LearningApacheSpark

    LearningApacheSpark

    Language: Python (235)
  • spark-iforest

    Isolation Forest on Spark

    Language: Scala (226)
  • azure-cosmosdb-spark

    Apache Spark Connector for Azure Cosmos DB

    Language: Scala (197)
  • data-algorithms-with-spark

    O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian

    Language: Python (193)
  • automl-toolkit

    Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.

    Language: HTML (190)
  • incubator-graphar

    An open source, standard data file format for graph data storage and retrieval.

    Language: C++ (188)
  • handyspark

    HandySpark - bringing pandas-like capabilities to Spark dataframes

    Language: Jupyter Notebook (182)
  • spark-extension

    A library that provides useful extensions to Apache Spark and PySpark.

    Language: Scala (174)
  • WallStreetBets_BigDataAnalysis

    Research project that aims to classify the best stock research posts from r/WallStreetBets for you. 😏

    Language: Jupyter Notebook (168)
  • DataAnalysisWithPythonAndPySpark

    Code repository for the "PySpark in Action" book

    Language: Python (164)
  • pyspark-learning

    Updated repository

    Language: Jupyter Notebook (157)
  • OSCI

    Open Source Contributor Index

    Language: Python (154)
  • big-data-mapreduce-course

    Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University

    Language: HTML (146)
  • data_engineering_best_practices

    Sample project to demonstrate data engineering best practices

    Language: Python (141)
  • song-playlist-recommendation

    This project was a joint effort by Lucas De Oliveira, Chandrish Ambati, and Anish Mukherjee to create song and playlist embeddings for recommendations in a distributed fashion using a 1M playlist dataset by Spotify.

    Language: HTML (137)
  • Repo-2019

    BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics

    Language: Jupyter Notebook (136)
  • aut

    The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.

    Language: Scala (134)
  • RePlay

    A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models

    Language: Python (130)
  • phrase-at-scale

    Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used in languages other than English.

    Language: Python (125)
  • Movalytics-Data-Warehouse

    Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow

    Language: Python (120)
  • cuallee

    Possibly the fastest DataFrame-agnostic quality check library in town.

    Language: Python (118)
  • pyspark-stubs

    Apache (Py)Spark type annotations (stub files).

    Language: Python (114)
  • dataproc-templates

    Dataproc templates and pipelines for solving simple in-cloud data tasks

    Language: Python (113)
  • Spark-Streaming-In-Python

    Apache Spark 3 - Structured Streaming Course Material

    Language: Python (113)
  • BitCoin-Value-Predictor

    [NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin

    Language: Jupyter Notebook (112)
  • pyspark-tutorial

    PySpark Code for Hands-on Learners

    Language: Jupyter Notebook (111)
  • Azure-Databricks-NYC-Taxi-Workshop

    An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset

    Language: Scala (102)
  • Big-Data-Engineering-Coursera-Yandex

    Big Data for Data Engineers Coursera Specialization from Yandex

    Language: Jupyter Notebook (100)
  • Relation_Extraction

    Relation extraction using deep learning (CNN)

    Language: Python (100)
  • pyspark-tutorial

    Jupyter notebooks for PySpark tutorials given at university

    Language: Jupyter Notebook (98)
  • spark-select

    A library for Spark DataFrame using MinIO Select API

    Language: Scala (96)
  • spark_python_ml_examples

    Spark 2.0 Python Machine Learning examples

    Language: Python (95)