mapreduce-java

There are 160 repositories under the mapreduce-java topic.

  • Hadoop_Task

    Language:Java
  • COMPSCI401-Projects

    Personal repo for COMPSCI 401 project 1-3, 22SP@DKU

    Language:Jupyter Notebook
  • big-data-analysis

    Big Data Analysis using Hadoop and Spark

    Language:Java
  • Analyzing_Brooklyn

    For this project we studied 3 data sets revolving around neighborhoods in New York City. We hope to learn which neighborhoods in Brooklyn are good to live in.

    Language:HiveQL
  • PageRank-Spark

    Implementation of the MapReduce PageRank algorithm using the Spark framework both in Python and in Java (developed for Cloud Computing course)

    Language:Java
  • spark

    Playing with Spark using Java

    Language:Java
  • HADOOP-BIGDATA

    These are the various programs that I used for my Hadoop projects.

    Language:Jupyter Notebook
  • PhraseExtract

    Hadoop MapReduce Assignment: Distributed Phrase Extraction (Unregistered Word Discovery)

    Language:Java
  • MapReduceVsSpark

    This project compares the total runtime of MapReduce and Spark on two operations: single-record lookup and filter.

    Language:Java
  • WordCount-MapReduce

    Fully distributed Hadoop on an AWS EC2 cluster; executes WordCount MapReduce operations and analyzes performance as a function of cluster size.

    Language:Java
  • MapReduce

    A small Hadoop MapReduce program implemented for a Big Data project

    Language:Java
  • BigData-HW-MapReduce

    Solutions to some MapReduce problems

    Language:Java
  • Parallel-Concurrent-and-Distributed-Programming-in-Java

    Parallel, Concurrent, and Distributed Programming in Java | Coursera

    Language:Java
  • MutualFriends

    Implementation of Hadoop and Spark

    Language:Java
  • MapReduce

    MapReduce programs written in Java with minimal complexity!

    Language:Java
  • Hadoop-Mapreduce

    Data analysis on Big Data. Uses various databases from 1M to 100M, including the MovieLens dataset, to perform analysis. Covers basic and advanced MapReduce using Hadoop.

  • mapreduce-ep3-client

    A system that lets a client program request, from a Map-Reduce architecture, the creation of an inverted index of links (similar to one of the activities in Google's PageRank)

    Language:Java
  • secondary-sort-employees

    Secondary Sort MapReduce

    Language:Java
  • Nap

    Nap: Network-Aware Data Partitions for Efficient Distributed Processing

    Language:Mathematica
  • Information-Retrieval

    Information retrieval (IR) is concerned with finding material (e.g., documents) of an unstructured nature (usually text) in response to an information need (e.g., a query) from large collections. One approach to identifying relevant documents is to compute scores based on the matches between terms in the query and terms in the documents. For example, a document with words such as ball, team, score, and championship is likely to be about sports. It is helpful to define a weight for each term in a document that can be meaningful for computing such a score. We describe below popular information retrieval metrics such as term frequency, inverse document frequency, and their product, term frequency-inverse document frequency (TF-IDF), that are used to define weights for terms.

    Term Frequency: Term frequency is the number of times a particular word t occurs in a document d.

    TF(t, d) = number of times t appears in document d

    Since the importance of a word in a document does not necessarily scale linearly with the frequency of its appearance, a common modification is to use the logarithm of the raw term frequency instead.

    WF(t, d) = 1 + log10(TF(t, d)) if TF(t, d) > 0, and 0 otherwise

    We will use this logarithmically scaled term frequency in what follows.

    Inverse Document Frequency: The inverse document frequency (IDF) is a measure of how common or rare a term is across all documents in the collection. It is the logarithmically scaled inverse fraction of the documents that contain the word, obtained by taking the logarithm of the ratio of the total number of documents to the number of documents containing the term.

    IDF(t) = log10(total # of documents / # of documents containing term t)

    Under this IDF formula, terms appearing in all documents are assumed to be stopwords and are assigned IDF = 0. We will use the smoothed version of this formula as follows:

    IDF(t) = log10(1 + total # of documents / # of documents containing term t)

    Practically, smoothed IDF helps alleviate the out-of-vocabulary (OOV) problem, where it is better to return results to the user rather than nothing, even if the query matches every single document in the collection.

    TF-IDF: Term frequency-inverse document frequency (TF-IDF) is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus of documents. It is often used as a weighting factor in information retrieval and text mining.

    TF-IDF(t, d) = WF(t, d) * IDF(t)

    Language:Java
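
The weighting formulas quoted in the Information-Retrieval entry above can be sketched in plain Java as follows. This is a minimal single-machine illustration, not the repository's MapReduce code: the class name `TfIdfSketch`, its method names, and the representation of documents as lists of already-tokenized terms are all assumptions made for this example.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class for illustration only; not taken from the repository.
final class TfIdfSketch {

    // WF(t, d) = 1 + log10(TF(t, d)) if TF(t, d) > 0, and 0 otherwise.
    static double wf(int tf) {
        return tf > 0 ? 1.0 + Math.log10(tf) : 0.0;
    }

    // Smoothed IDF(t) = log10(1 + N / df), where N is the total number of
    // documents and df is the number of documents containing the term.
    static double idf(int totalDocs, int docsWithTerm) {
        if (docsWithTerm <= 0) {
            throw new IllegalArgumentException("term must occur in at least one document");
        }
        return Math.log10(1.0 + (double) totalDocs / docsWithTerm);
    }

    // TF-IDF(t, d) = WF(t, d) * IDF(t), with documents given as token lists.
    static double tfIdf(String term, List<String> doc, List<List<String>> corpus) {
        int tf = 0;
        for (String token : doc) {
            if (token.equals(term)) {
                tf++;
            }
        }
        int df = 0;
        for (List<String> other : corpus) {
            if (other.contains(term)) {
                df++;
            }
        }
        // A term absent from the document (or the corpus) gets weight 0.
        if (tf == 0 || df == 0) {
            return 0.0;
        }
        return wf(tf) * idf(corpus.size(), df);
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("ball", "team", "ball");
        List<List<String>> corpus = Arrays.asList(
                doc,
                Arrays.asList("team"),
                Arrays.asList("score"));
        System.out.println("TF-IDF(ball, doc) = " + tfIdf("ball", doc, corpus));
    }
}
```

In a Hadoop job the term counts and document frequencies would typically come from separate map and reduce passes; the sketch collapses them into in-memory loops only to keep the arithmetic visible.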
  • Page-Rank-Implementation

    The goal of this programming assignment is to compute the PageRanks of an input set of hyperlinked Wikipedia documents using Hadoop MapReduce. The PageRank score of a web page serves as an indicator of the importance of the page. Many web search engines (e.g., Google) use PageRank scores in some form to rank results for user-submitted queries. The goals of this assignment are to: 1. Understand the PageRank algorithm and how it works in MapReduce. 2. Implement PageRank and execute it on a large corpus of data. 3. Examine the output from running PageRank on Simple English Wikipedia to measure the relative importance of pages in the corpus. To run your program on the full Simple English Wikipedia archive, you will need to run it on the dsba-hadoop cluster to which you have access.

    Language:Java
  • StockMeUp

    This is a class project for 'CIS 610 : Data Science' where I try to validate Stock Market recommendations.

    Language:Java
  • Recommend_Friends_through_MapReduce

    It's a MapReduce program that tells you about people you may know on the basis of mutual friends

    Language:Java
  • Big-Data-Systems

    This repo contains all the assignments and project work for the Engineering Big Data Systems coursework

    Language:C#
  • MapReduce-Spark-Comparison

    Assignment for Big Data Processing: Comparison between Spark and MapReduce programs for analysing large data sets.

    Language:Java
  • Olympic-Tweets

    Assignment for Big Data Processing: A collection of programs for analysing tweets related to the 2012 Olympics.

    Language:Java
  • mosquitos-hpc

    Java project for Hadoop MapReduce that runs trend-detection algorithms on time series, applied to sales data for pest-control products (repellents and insecticides).

    Language:Java
  • MIT6.824-MapReduce

    MapReduce Implementation - Distributed System

    Language:Go
  • Apache-Hadoop-Map-Reduce--Basic-Sentiment-Analysis-on-Yelp-Dataset

    In this project we will use Hadoop MapReduce to implement a very basic "Sentiment Analysis" using the review text in the Yelp Academic Dataset as training data.

    Language:Java
  • bd_lab

    big data lab nmit 6th sem 18isl

    Language:Java
  • MapReduceBasicApplications

    Basic MapReduce applications in Java.

    Language:Java
  • COVID-19-MapReduce-project

    Design and implementation of different MapReduce jobs used to analyze a dataset on Covid-19 disease created by Our World In Data

    Language:Java
  • BigData-Training

    Big data training material

    Language:Python
  • MapReduce-multi-table-merge

    MapReduce multi-table merge

    Language:Java
  • SLR207

    My Télécom Paris SLR207 repo - Implementing the MapReduce algorithm on a distributed network communicating via sockets.

    Language:Java