pyspark

There are 3,404 repositories under the pyspark topic.

ai-deployment
Focused on bringing AI models into production and on model deployment
Language: Jupyter Notebook | 265
MorphL-Community-Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
Language: Python | 260
hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
Language: Java | 243
gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
Language: Scala | 242
LearningApacheSpark
LearningApacheSpark
Language: Python | 235
spark-iforest
Isolation Forest on Spark
Language: Scala | 226
azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
Language: Scala | 197
data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
Language: Python | 193
automl-toolkit
Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
Language: HTML | 190
incubator-graphar
An open source, standard data file format for graph data storage and retrieval.
Language: C++ | 188
handyspark
HandySpark - bringing pandas-like capabilities to Spark dataframes
Language: Jupyter Notebook | 182
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
Language: Scala | 174
WallStreetBets_BigDataAnalysis
Research project aimed to classify the best stock research posts from r/WallStreetBets for you. 😏
Language: Jupyter Notebook | 168
DataAnalysisWithPythonAndPySpark
Code repository for the "PySpark in Action" book
Language: Python | 164
pyspark-learning
Updated repository
Language: Jupyter Notebook | 157
OSCI
Open Source Contributor Index
Language: Python | 154
big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Language: HTML | 146
data_engineering_best_practices
Sample project to demonstrate data engineering best practices
Language: Python | 141
song-playlist-recommendation
This project was a joint effort by Lucas De Oliveira, Chandrish Ambati, and Anish Mukherjee to create song and playlist embeddings for recommendations in a distributed fashion, using a 1M-playlist dataset from Spotify.
Language: HTML | 137
Repo-2019
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
Language: Jupyter Notebook | 136
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Language: Scala | 134
RePlay
A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models
Language: Python | 130
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used with languages other than English.
Language: Python | 125
Movalytics-Data-Warehouse
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
Language: Python | 120
cuallee
Possibly the fastest DataFrame-agnostic quality check library in town.
Language: Python | 118
pyspark-stubs
Apache (Py)Spark type annotations (stub files).
Language: Python | 114
dataproc-templates
Dataproc templates and pipelines for solving simple in-cloud data tasks
Language: Python | 113
Spark-Streaming-In-Python
Apache Spark 3 - Structured Streaming Course Material
Language: Python | 113
BitCoin-Value-Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
Language: Jupyter Notebook | 112
pyspark-tutorial
PySpark Code for Hands-on Learners
Language: Jupyter Notebook | 111
Azure-Databricks-NYC-Taxi-Workshop
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
Language: Scala | 102
Big-Data-Engineering-Coursera-Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Language: Jupyter Notebook | 100
Relation_Extraction
Relation extraction using deep learning (CNN)
Language: Python | 100
pyspark-tutorial
Jupyter notebooks for pyspark tutorials given at University
Language: Jupyter Notebook | 98
spark-select
A library for Spark DataFrame using MinIO Select API
Language: Scala | 96
spark_python_ml_examples
Spark 2.0 Python Machine Learning examples
Language: Python | 95