pyspark-notebook

There are 228 repositories under the pyspark-notebook topic.

josephmachado/efficient_data_processing_spark
Code for "Efficient Data Processing in Spark" Course
Language: Python · Stars: 346
hyunjoonbok/PySpark
PySpark functions and utilities with examples. Assists the ETL process of data modeling.
Language: Jupyter Notebook · Stars: 104
jplane/pyspark-devcontainer
A simple VS Code devcontainer setup for local PySpark development
Language: Jupyter Notebook · Stars: 55
rlilojr/Detecting-Malicious-URL-Machine-Learning
Language: Jupyter Notebook · Stars: 55
josephmachado/docker_for_data_engineers
Code for blog at: https://www.startdataengineering.com/post/docker-for-de/
Language: C · Stars: 40
archivesunleashed/notebooks
Various examples of notebooks for working with web archives with the Archives Unleashed Toolkit, and derivatives generated by the Archives Unleashed Toolkit.
Language: Jupyter Notebook · Stars: 26
brennerh1/databricks-demos
Repository of notebooks and related collateral used in the Databricks Demo Hub, showing how to use Databricks, Delta Lake, MLflow, and more.
Language: Python · Stars: 26
arjones/bigdata-workshop-es
Big Data workshop in Spanish
Language: HTML · Stars: 21
microsoft/Fabric-RTA-FlightStream
Microsoft Fabric Real-time Analytics flight streaming
Language: Jupyter Notebook · Stars: 21
aakinlalu/Crime-Classification-using-PySpark
Classify crime into different categories using PySpark
Language: Jupyter Notebook · Stars: 20
jacobceles/intro-to-colab-pyspark-emr
A tutorial that helps Big Data Engineers ramp up faster by getting familiar with PySpark dataframes and functions. It also covers topics like EMR sizing, Google Colaboratory, fine-tuning PySpark jobs, and much more.
Language: Jupyter Notebook · Stars: 20
johntelforduk/betfair-data-analysis
Explore, analyse and visualise Betfair Historical Data Feed using PySpark.
Language: Jupyter Notebook · Stars: 20
mohanakrishnavh/pyspark-tutorial
Language: Jupyter Notebook · Stars: 18
shsarv/Cardio-Monitor
Cardio Monitor is a web app that helps you find out whether you are at risk of developing heart disease. The model used for prediction has an accuracy of 92%. This is the course project for the subject Big Data Analytics (BCSE0158).
Language: Jupyter Notebook · Stars: 16
yennanliu/analysis
Repo of approaches to practical data science problems, including notebook demos and working scripts | #DS | #analysis
Language: Jupyter Notebook · Stars: 12
hyeonsangjeon/dataplatform
Hadoop3.2 single/cluster mode with web terminal gotty, spark, jupyter pyspark, hive, eco etc.
Language: Shell · Stars: 11
prabeesh/pyspark-notebook
PySpark Notebook with Docker
Language: Python · Stars: 11
jitsejan/pyspark-101
A PySpark course to get started with the basics for a Data Engineer
Language: Jupyter Notebook · Stars: 9
miquido/DataScience
Useful scripts and notebooks for Data Science. The project was made by Miquido. https://www.miquido.com/
Language: Jupyter Notebook · Stars: 9
AnandaRauf/CekatanBiz
CekatanBiz is a software tool for data analysts, business analysts, and business intelligence. Developed using Python.
Language: Jupyter Notebook · Stars: 8
awkepler/PySpark_Spark_Adventure
Sample code for PySpark
Language: Jupyter Notebook · Stars: 8
imsanjoykb/PySpark-Bootcamp
My practice and projects on PySpark
Language: Jupyter Notebook · Stars: 8
gupta-aayushkr/F1-Racing
The project aims to process Formula 1 racing data, create an automated data pipeline, and make the data available for presentation and analysis purposes.
Language: Python · Stars: 7
HenryBao91/PySpark-Learning-Tutorial
Hadoop + PySpark big data mining, processing, and analysis
Language: Jupyter Notebook · Stars: 7
lmriccardo/fraudolent-transaction-classification
Project for the Big Data Computing course of the Master in Computer Science at "La Sapienza" University, academic year 2021/2022
Language: Jupyter Notebook · Stars: 7
alisonpezzott/calendario_fabric_lakehouse
Calendar table for a Fabric lakehouse, built from a Spark notebook
Language: Python · Stars: 6
benjbaron/GeoNames
GeoNames cities search service powered by Algolia
Language: Jupyter Notebook · Stars: 6
Big-Data-FC/project
Predict how many points a European football team will end the season with, according to the characteristics of its players. Project for the Big Data Computing course at Sapienza University of Rome (2021-22)
Language: Jupyter Notebook · Stars: 6
HaJunYoo/pyspark-tutorial
A repository collecting Spark code from hands-on PySpark practice in Colab and Docker environments
Language: Jupyter Notebook · Stars: 6
Prajwal10031999/Song-Genre-Classification-in-PySparks-MLlib
A PySpark MLlib classification model to classify songs based on a number of characteristics into a set of 23 electronic genres.
Language: Jupyter Notebook · Stars: 6
easonlai/Samples_for_Azure_Databricks_Orientation
Samples for Azure Databricks Orientation
Language: HTML · Stars: 5
jpacerqueira-zz/Akamai-log-Analysis-SparkML-H2o
Transformation of Akamai logs with Spark ETL, and discovery of values and similarities in the logs using SparkML and H2O ML
Language: HTML · Stars: 5
vigneshSs-07/Pyspark-ACompleteGuide
This repo explains PySpark modules in Python, used to deal with big data in a more practical, hands-on way.
Language: Jupyter Notebook · Stars: 5
conorheffron/ironoc-spark
Sample PySpark Notebook
Language: Jupyter Notebook · Stars: 4
Psingh12354/Pyspark
Language: Jupyter Notebook · Stars: 4
shreyashji/Spark-PySpark-DataBricks
My Python, Spark, PySpark, and Scala notebooks with logic I solve or see on a daily basis. It contains optimization techniques for big data processing and real-time scenarios.
Language: Jupyter Notebook · Stars: 4
