pyspark-notebook
There are 205 repositories under the pyspark-notebook topic.
josephmachado/efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
hyunjoonbok/PySpark
PySpark functions and utilities with examples. Assists the ETL process of data modeling.
jplane/pyspark-devcontainer
A simple VS Code devcontainer setup for local PySpark development
josephmachado/docker_for_data_engineers
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
brennerh1/databricks-demos
Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and more.
archivesunleashed/notebooks
Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.
aakinlalu/Crime-Classification-using-PySpark
Classify crimes into different categories using PySpark.
arjones/bigdata-workshop-es
Big Data workshop in Spanish
jacobceles/intro-to-colab-pyspark-emr
A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics like EMR sizing, Google Colaboratory, fine-tuning PySpark jobs, and much more.
microsoft/Fabric-RTA-FlightStream
Microsoft Fabric Real-time Analytics flight streaming
johntelforduk/betfair-data-analysis
Explore, analyse and visualise Betfair Historical Data Feed using PySpark.
yennanliu/analysis
Repo of practical approaches to data science problems, including notebook demos and working scripts | #DS | #analysis
hyeonsangjeon/dataplatform
Hadoop 3.2 in single-node or cluster mode, with the GoTTY web terminal, Spark, Jupyter PySpark, Hive, and other ecosystem tools.
prabeesh/pyspark-notebook
PySpark Notebook with Docker
shsarv/Cardio-Monitor
Cardio Monitor is a web app that helps you find out whether you are at risk of developing heart disease. The model used for prediction has an accuracy of 92%. This is the course project for the subject Big Data Analytics (BCSE0158).
imsanjoykb/PySpark-Bootcamp
My practice and projects on PySpark
jitsejan/pyspark-101
A PySpark course to get started with the basics for data engineers
miquido/DataScience
Useful scripts and notebooks for Data Science. The project was made by Miquido. https://www.miquido.com/
AnandaRauf/CekatanBiz
CekatanBiz is a software toolkit for data analysis, business analysis, and business intelligence, developed in Python.
awkepler/PySpark_Spark_Adventure
Sample code for PySpark
lmriccardo/fraudolent-transaction-classification
Project for the Big Data Computing course in the Master's in Computer Science at Sapienza University of Rome, academic year 2021/2022
Big-Data-FC/project
Predict how many points a European football team will end the season with, based on the characteristics of its players. Project for the Big Data Computing course at Sapienza University of Rome (2021-22)
Prajwal10031999/Song-Genre-Classification-in-PySparks-MLlib
A PySpark MLlib classification model to classify songs based on a number of characteristics into a set of 23 electronic genres.
benjbaron/GeoNames
GeoNames cities search service powered by Algolia
gupta-aayushkr/F1-Racing
The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.
HaJunYoo/Pyspark-tutorial
A repository organizing Spark code from hands-on PySpark practice in Colab and Docker environments
jpacerqueira-zz/Akamai-log-Analysis-SparkML-H2o
Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using SparkML and H2O ML
digitalhemanth/Data-Science
Data science with machine learning algorithms using Python, PySpark, pandas, NumPy, TensorFlow, Keras, seaborn, and matplotlib
easonlai/Samples_for_Azure_Databricks_Orientation
Samples for Azure Databricks Orientation
vigneshSs-07/Pyspark-ACompleteGuide
This repo explains PySpark modules in Python, with practical hands-on examples for working with big data.
vikrant65-byte/IPL-dataset-Analysis
Now that this year's IPL is over, let's not curb our cricket love and start analyzing the whole of the IPL with this latest and complete Indian Premier League dataset. It contains match descriptions, results, winners, players of the match, ball-by-ball data, and much more. So stop thinking and start analyzing. Content: this dataset consists of two separate CSV files, matches and deliveries, which contain each match summary and ball-by-ball details, respectively.
HenryBao91/PySpark-Learning-Tutorial
Hadoop + PySpark big data mining, processing, and analysis
TrentBrunson/Big_Data
Apache Hadoop: HDFS, MapReduce, YARN, NLP, AWS, Spark, Google Colab, PySpark