pyspark-notebook
There are 225 repositories under the pyspark-notebook topic.
josephmachado/efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
hyunjoonbok/PySpark
PySpark functions and utilities with examples. Assists the ETL process of data modeling.
jplane/pyspark-devcontainer
A simple VS Code devcontainer setup for local PySpark development
josephmachado/docker_for_data_engineers
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
archivesunleashed/notebooks
Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.
brennerh1/databricks-demos
Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and more.
aakinlalu/Crime-Classification-using-PySpark
Classify crime into different categories using PySpark
arjones/bigdata-workshop-es
Big Data workshop in Spanish
microsoft/Fabric-RTA-FlightStream
Microsoft Fabric Real-time Analytics flight streaming
jacobceles/intro-to-colab-pyspark-emr
A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics like EMR sizing, Google Colaboratory, fine-tuning PySpark jobs, and much more.
johntelforduk/betfair-data-analysis
Explore, analyse and visualise Betfair Historical Data Feed using PySpark.
shsarv/Cardio-Monitor
Cardio Monitor is a web app that helps you find out whether you are at risk of developing heart disease. The model used for prediction has an accuracy of 92%. This is the course project for the subject Big Data Analytics (BCSE0158).
yennanliu/analysis
Repo of approaches to practical data science problems, including notebook demos and working scripts | #DS | #analysis
hyeonsangjeon/dataplatform
Hadoop 3.2 in single-node or cluster mode with the gotty web terminal, Spark, Jupyter PySpark, Hive, ecosystem tools, etc.
prabeesh/pyspark-notebook
PySpark notebook with Docker
jitsejan/pyspark-101
A PySpark course to get started with the basics for a Data Engineer
miquido/DataScience
Useful scripts and notebooks for Data Science. The project was made by Miquido. https://www.miquido.com/
AnandaRauf/CekatanBiz
CekatanBiz is a software tool for data analysts, business analysts, and business intelligence, developed using Python.
awkepler/PySpark_Spark_Adventure
Sample code for PySpark
imsanjoykb/PySpark-Bootcamp
My practice and projects on PySpark
gupta-aayushkr/F1-Racing
The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.
HenryBao91/PySpark-Learning-Tutorial
Big data mining, processing, and analysis with Hadoop + PySpark
lmriccardo/fraudolent-transaction-classification
Project for the Big Data Computing course of the Master's in Computer Science at Sapienza University of Rome, academic year 2021/2022
alisonpezzott/calendario_fabric_lakehouse
Calendar table for a Fabric lakehouse, built from a Spark notebook
benjbaron/GeoNames
GeoNames cities search service powered by Algolia
Big-Data-FC/project
Predict how many points a European football team will end the season with, according to the characteristics of its players. Project for the Big Data Computing course at Sapienza University of Rome (2021-22)
HaJunYoo/pyspark-tutorial
A repository collecting Spark code from hands-on PySpark practice in Colab and Docker environments
Prajwal10031999/Song-Genre-Classification-in-PySparks-MLlib
A PySpark MLlib classification model to classify songs based on a number of characteristics into a set of 23 electronic genres.
easonlai/Samples_for_Azure_Databricks_Orientation
Samples for Azure Databricks Orientation
jpacerqueira-zz/Akamai-log-Analysis-SparkML-H2o
Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using SparkML and H2O ML
vigneshSs-07/Pyspark-ACompleteGuide
This repo explains PySpark modules in Python, with practical hands-on examples for dealing with big data.
digitalhemanth/Data-Science
Data science with machine learning algorithms using Python, PySpark, pandas, NumPy, TensorFlow, Keras, seaborn, and matplotlib
shreyashji/Spark-PySpark-DataBricks
My Python, Spark, PySpark, and Scala notebook logic for problems I solve or see on a daily basis. It contains optimization techniques for big data processing and real-time scenarios.