mapreduce-java
There are 160 repositories under the mapreduce-java topic.
COMPSCI401-Projects
Personal repo for COMPSCI 401 project 1-3, 22SP@DKU
big-data-analysis
Big Data Analysis using Hadoop and Spark
Analyzing_Brooklyn
For this project we studied three data sets about neighborhoods in New York City. We hope to learn which neighborhoods in Brooklyn are good to live in.
PageRank-Spark
Implementation of the MapReduce PageRank algorithm using the Spark framework both in Python and in Java (developed for Cloud Computing course)
spark
Playing with Spark using Java
HADOOP-BIGDATA
These are the various programs that I used for my Hadoop projects.
PhraseExtract
Hadoop MapReduce Assignment: Distributed Phrase Extraction (Unregistered Word Discovery)
MapReduceVsSpark
This project compares the total runtime of MapReduce and Spark on two operations: single-record lookup and filter.
WordCount-MapReduce
Fully distributed Hadoop on AWS EC2 Cluster, executes WordCount MapReduce operations and analyzes performance as a function of Cluster Size.
MapReduce
A small Hadoop MapReduce program implemented for a Big Data project
BigData-HW-MapReduce
Solutions to some MapReduce problems
Parallel-Concurrent-and-Distributed-Programming-in-Java
Parallel, Concurrent, and Distributed Programming in Java | Coursera
MutualFriends
Implementation of Hadoop and Spark
MapReduce
MapReduce programs written in Java with minimal complexity!
Hadoop-Mapreduce
Data analysis on Big Data. Uses various datasets ranging from 1M to 100M, including the MovieLens dataset, to perform analysis. Covers basic and advanced MapReduce using Hadoop.
mapreduce-ep3-client
A system that lets a client program request, from a MapReduce architecture, the creation of an inverted index of links (similar to one of the activities in Google's PageRank)
secondary-sort-employees
Secondary Sort MapReduce
Nap
Nap: Network-Aware Data Partitions for Efficient Distributed Processing
Information-Retrieval
Information retrieval (IR) is concerned with finding material (e.g., documents) of an unstructured nature (usually text) in response to an information need (e.g., a query) from large collections. One approach to identifying relevant documents is to compute scores based on the matches between terms in the query and terms in the documents. For example, a document with words such as ball, team, score, championship is likely to be about sports. It is helpful to define a weight for each term in a document that can be meaningful for computing such a score. We describe below popular information retrieval metrics such as term frequency, inverse document frequency, and their product, term frequency-inverse document frequency (TF-IDF), that are used to define weights for terms.
Term Frequency: Term frequency is the number of times a particular word t occurs in a document d.
    TF(t, d) = No. of times t appears in document d
Since the importance of a word in a document does not necessarily scale linearly with the frequency of its appearance, a common modification is to use the logarithm of the raw term frequency instead.
    WF(t, d) = 1 + log10(TF(t, d)) if TF(t, d) > 0, and 0 otherwise
We will use this logarithmically scaled term frequency in what follows.
Inverse Document Frequency: The inverse document frequency (IDF) is a measure of how common or rare a term is across all documents in the collection. It is the logarithmically scaled inverse fraction of the documents that contain the word, obtained by taking the logarithm of the ratio of the total number of documents to the number of documents containing the term.
    IDF(t) = log10(Total # of documents / # of documents containing term t)
Under this IDF formula, terms appearing in all documents are assumed to be stopwords and are assigned IDF = 0. We will use the smoothed version of this formula:
    IDF(t) = log10(1 + Total # of documents / # of documents containing term t)
In practice, smoothed IDF helps alleviate the out-of-vocabulary (OOV) problem: it is better to return results to users than nothing, even if their query matches every single document in the collection.
TF-IDF: Term frequency-inverse document frequency (TF-IDF) is a numerical statistic intended to reflect how important a word is to a document in a collection or corpus of documents. It is often used as a weighting factor in information retrieval and text mining.
    TF-IDF(t, d) = WF(t, d) * IDF(t)
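The weighting formulas in this description can be sketched in a few lines of plain Java. This is only an illustration of the arithmetic, not code from the repository: the class name `TfIdfSketch` and its method names are hypothetical, and rejecting a term that appears in no document is an assumption made here because the formula is undefined in that case.

```java
// Minimal sketch of the WF, smoothed IDF, and TF-IDF formulas described above.
// Class and method names are hypothetical, chosen for illustration only.
class TfIdfSketch {

    // WF(t, d) = 1 + log10(TF(t, d)) if TF(t, d) > 0, and 0 otherwise.
    public static double wf(long termFrequency) {
        return termFrequency > 0 ? 1.0 + Math.log10(termFrequency) : 0.0;
    }

    // Smoothed IDF(t) = log10(1 + totalDocs / docsWithTerm).
    // Assumption: a term found in no document is rejected, since the
    // ratio is undefined when docsWithTerm is zero.
    public static double smoothedIdf(long totalDocs, long docsWithTerm) {
        if (docsWithTerm <= 0 || docsWithTerm > totalDocs) {
            throw new IllegalArgumentException("docsWithTerm must be in [1, totalDocs]");
        }
        return Math.log10(1.0 + (double) totalDocs / docsWithTerm);
    }

    // TF-IDF(t, d) = WF(t, d) * IDF(t).
    public static double tfIdf(long termFrequency, long totalDocs, long docsWithTerm) {
        return wf(termFrequency) * smoothedIdf(totalDocs, docsWithTerm);
    }

    public static void main(String[] args) {
        // A term occurring 10 times in a document, found in 1 of 9 documents.
        System.out.println(tfIdf(10, 9, 1));
        // A term present in every document still gets a non-zero weight
        // under the smoothed formula.
        System.out.println(tfIdf(1, 9, 9));
    }
}
```

In a MapReduce setting these per-term values would typically be computed across several jobs (term counts per document, then document counts per term); the sketch only covers the final weighting step.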
Page-Rank-Implementation
The goal of this programming assignment is to compute the PageRanks of an input set of hyperlinked Wikipedia documents using Hadoop MapReduce. The PageRank score of a web page serves as an indicator of the importance of the page. Many web search engines (e.g., Google) use PageRank scores in some form to rank results for user-submitted queries. The goals of this assignment are to:
1. Understand the PageRank algorithm and how it works in MapReduce.
2. Implement PageRank and execute it on a large corpus of data.
3. Examine the output from running PageRank on Simple English Wikipedia to measure the relative importance of pages in the corpus.
To run your program on the full Simple English Wikipedia archive, you will need to run it on the dsba-hadoop cluster to which you have access.
StockMeUp
This is a class project for 'CIS 610 : Data Science' where I try and validate Stock Market recommendations.
Recommend_Friends_through_MapReduce
A MapReduce program that suggests people you may know on the basis of mutual friends
Big-Data-Systems
This repo contains all the assignments and project work for the Engineering Big Data Systems coursework
MapReduce-Spark-Comparison
Assignment for Big Data Processing: Comparison between Spark and MapReduce programs for analysing large data sets.
Olympic-Tweets
Assignment for Big Data Processing: A collection of programs for analysing tweets related to the 2012 Olympics.
mosquitos-hpc
Java project for Hadoop MapReduce that runs trend-detection algorithms on time series, applied to sales data for pest-control products (repellents and insecticides).
MIT6.824-MapReduce
MapReduce Implementation - Distributed System
Apache-Hadoop-Map-Reduce--Basic-Sentiment-Analysis-on-Yelp-Dataset
In this project we will use Hadoop MapReduce to implement a very basic “Sentiment Analysis” using the review text in the Yelp Academic Dataset as training data.
bd_lab
big data lab nmit 6th sem 18isl
MapReduceBasicApplications
Basic MapReduce applications in Java.
COVID-19-MapReduce-project
Design and implementation of different MapReduce jobs used to analyze a dataset on Covid-19 disease created by Our World In Data
BigData-Training
Big data training material
MapReduce-multi-table-merge
MapReduce multi-table merge
SLR207
My Télécom Paris SLR207 repo - Implementing the MapReduce algorithm on a distributed network communicating via sockets.