spark-dataframes
There are 44 repositories under the spark-dataframes topic.
mahmoudparsian/pyspark-tutorial
PySpark-Tutorial provides basic algorithms using PySpark
26hzhang/StockPrediction
Plain Stock Close-Price Prediction via Graves LSTM RNNs
mahmoudparsian/big-data-mapreduce-course
Big Data Modeling, MapReduce, Spark, PySpark @ Santa Clara University
Thomas-George-T/Movies-Analytics-in-Spark-and-Scala
Data cleaning, pre-processing, and analytics on a million movies using Spark and Scala.
spider-123-eng/Spark
Apache Spark is a fast, in-memory data processing engine with elegant and expressive development APIs that allow data workers to efficiently execute streaming, machine learning, or SQL workloads that require fast iterative access to datasets. This project contains sample Spark programs in Scala.
jubins/Spark-And-MLlib-Projects
This repository contains Spark, MLlib, PySpark, and DataFrames projects
yennanliu/spark-etl-pipeline
Various data stream/batch processing demos with Apache Spark in Scala 🚀
jkoth/Data-Lake-with-Spark-and-AWS-S3
Create a data lake on AWS S3 to store dimensional tables after processing data with Spark on an AWS EMR cluster
neerajkesav/SparkJavaExamples
Apache Spark Basics - Java Examples
NashTech-Labs/Sparkathon
A library of Java and Scala examples for Spark 2.x
afzals2000/spark-bigquery-parallel
Spark BigQuery Parallel
MaxineXiong/Item-based-collaborative-filtering
This project uses PySpark DataFrames and PySpark RDDs to implement item-based collaborative filtering. By calculating cosine similarity scores or identifying movies with the highest number of shared viewers, the system recommends 10 movies similar to a given target movie that align with users’ preferences.
maziyarpanahi/spark-quickie
Getting started with Apache Spark
mayankrawat/CSVJoin
Use this project to join data from multiple CSV files. The project currently supports one-to-one and one-to-many joins. It also shows how to use a Kafka producer efficiently with Spark.
ninjeanne/datastorm
Data Science and Engineering project - Programming for Big Data @ Simon Fraser University (SFU)
thenickben/SplitCSV-Spark
Big Data - Split a large CSV file into N smaller ones and save them to the local disk
Vivek-Murali/CarCrashAnalysis
BCG GAMMA CASE STUDY
AliElsaeid/Predicting-Kickstarter-Campaign-Success-Using-Machine-Learning
Predict the success of Kickstarter campaigns using machine learning. Analyze project data including financial goals, pledge amounts, categories, and outcomes. Perform data cleaning, queries, visualizations, and build models to forecast campaign success, helping entrepreneurs optimize their funding strategies.
anshul1004/MutualFriends
Implementation of Hadoop and Spark
chinmayms/propinvestment
Predict current property investment opportunities using data analysis (Big Data, Spark ML)
LucasDLee/CMPT-353-Final-Project
This is our final project for SFU's CMPT 353 taught by Greg Baker during Summer 2023
mohammad-safari/spark-hadoop-exercise
Spark and Hadoop exercise for a cloud computing course - AUT, fall 1402-1403
RahulGupta16/Pyspark-Theory-and-Code-Basics
PySpark serves as a Python interface to Apache Spark, enabling the execution of Python and SQL-like instructions for the manipulation and analysis of data within a distributed processing framework.
rajeshsantha/MonitoredStructuredStreaming
Repository for Spark structured streaming use case implementations.
SevakAvet/spark-session-enricher
Calculate user sessions and statistics on top of them for an imaginary e-commerce site using Spark SQL and aggregations
zaha2020/Big_Data
This repository contains implementations of a wide variety of big data projects covering applications of NoSQL databases, Spark, data pipelines, and MapReduce. They include university projects and projects undertaken out of interest in big data.
Bcromas/pyspark_projects
A collection of small projects exploring PySpark features and functionality including packages and modules, algorithms, and general data science techniques.
lalithvenkat/Analysis-of-M50-Highway-data-using-Spark
This repo contains an analysis of large data using Spark
on2e/ntua-atdb
Advanced Topics in Databases course project - NTUA ECE - 2022-23
prajakta-3-patil/e-commerce-analysis
This project explores and analyzes e-commerce data. It primarily uses the Apache Spark DataFrame API, joins, functions, and aggregations to generate summarized results.
psanghal/bosch_manufacturing_line
UMSI-Bosch Manufacturing Line Failure Analysis
WazirRohiman/Apache_Spark_Basics
This series explores the basics of Apache Spark with the application of some practical elements of Spark, PySpark & SparkSQL
aravind2060/spark-sql-on-flight-data
Work with a flight dataset and use Spark SQL to analyze flight delays, airport traffic, and other key metrics