pyspark
There are 3404 repositories under the pyspark topic.
ai-deployment
Focused on taking AI models to production and model deployment
MorphL-Community-Edition
MorphL Community Edition uses big data and machine learning to predict user behaviors in digital products and services with the end goal of increasing KPIs (click-through rates, conversion rates, etc.) through personalization
hnswlib
Java library for approximate nearest neighbors search using Hierarchical Navigable Small World graphs
gimel
Big Data Processing Framework - Unified Data API or SQL on Any Storage
LearningApacheSpark
LearningApacheSpark
spark-iforest
Isolation Forest on Spark
azure-cosmosdb-spark
Apache Spark Connector for Azure Cosmos DB
data-algorithms-with-spark
O'Reilly Book: [Data Algorithms with Spark] by Mahmoud Parsian
automl-toolkit
Toolkit for Apache Spark ML for feature clean-up, feature importance calculation suite, information gain selection, distributed SMOTE, model selection and training, hyperparameter optimization and selection, and model interpretability.
incubator-graphar
An open source, standard data file format for graph data storage and retrieval.
handyspark
HandySpark - bringing pandas-like capabilities to Spark dataframes
spark-extension
A library that provides useful extensions to Apache Spark and PySpark.
WallStreetBets_BigDataAnalysis
Research project that aims to classify the best stock research posts from r/WallStreetBets for you. 😏
DataAnalysisWithPythonAndPySpark
Code repository for the "PySpark in Action" book
pyspark-learning
Updated repository
OSCI
Open Source Contributor Index
big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
data_engineering_best_practices
Sample project to demonstrate data engineering best practices
song-playlist-recommendation
This project was a joint effort by Lucas De Oliveira, Chandrish Ambati, and Anish Mukherjee to create song and playlist embeddings for recommendations in a distributed fashion using a 1M playlist dataset by Spotify.
Repo-2019
BERT, AWS RDS, AWS Forecast, EMR Spark Cluster, Hive, Serverless, Google Assistant + Raspberry Pi, Infrared, Google Cloud Platform Natural Language, Anomaly detection, Tensorflow, Mathematics
aut
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
RePlay
A Comprehensive Framework for Building End-to-End Recommendation Systems with State-of-the-Art Models
phrase-at-scale
Detect common phrases in large amounts of text using a data-driven approach. The size of discovered phrases can be arbitrary. Can be used in languages other than English.
Movalytics-Data-Warehouse
Data pipeline performing ETL to AWS Redshift using Spark, orchestrated with Apache Airflow
cuallee
Possibly the fastest DataFrame-agnostic quality check library in town.
pyspark-stubs
Apache (Py)Spark type annotations (stub files).
dataproc-templates
Dataproc templates and pipelines for solving simple in-cloud data tasks
Spark-Streaming-In-Python
Apache Spark 3 - Structured Streaming Course Material
BitCoin-Value-Predictor
[NOT MAINTAINED] Predicting Bitcoin price using time series analysis and sentiment analysis of tweets about Bitcoin
pyspark-tutorial
PySpark Code for Hands-on Learners
Azure-Databricks-NYC-Taxi-Workshop
An Azure Databricks workshop leveraging the New York Taxi and Limousine Commission Trip Records dataset
Big-Data-Engineering-Coursera-Yandex
Big Data for Data Engineers Coursera Specialization from Yandex
Relation_Extraction
Relation extraction using deep learning (CNN)
pyspark-tutorial
Jupyter notebooks for pyspark tutorials given at University
spark-select
A library for Spark DataFrame using MinIO Select API
spark_python_ml_examples
Spark 2.0 Python Machine Learning examples