apache-pig

Apache Pig is a platform for analyzing large data sets using a high-level language called Pig Latin, whose programs lend themselves to substantial parallelization. At the moment, Pig's infrastructure layer consists of a compiler that produces sequences of Map-Reduce programs, for which large-scale parallel implementations already exist (the Hadoop subproject, for example).
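
To give an idea of what a Map-Reduce program does, here is a minimal sketch of a word count split into map, shuffle, and reduce phases. It is plain Python, not Pig Latin or Hadoop code, and it runs in a single process. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are made up for illustration and are not part of Pig or of this repository.

```python
from collections import defaultdict


def map_phase(lines):
    """Emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def shuffle_phase(pairs):
    """Collect all values that share the same key into one list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Aggregate each group; here the aggregate is a simple sum."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Chain the three phases, as a single Map-Reduce job would."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Hypothetical sample input, only for illustration.
    sample = ["pig latin runs on hadoop", "pig compiles to map reduce"]
    print(word_count(sample))
```

On a real cluster the map and reduce steps run in parallel on many machines, each handling a portion of the data; the sketch only shows how the work is divided into phases.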

Apache Pig can be run locally, but it is better to run it on a Hadoop cluster, since its goal is to process very large data sets, which implies long processing times.

This repository does not yet contain instructions for setting up and installing the Apache Pig platform or a Hadoop cluster. These will be added in future updates.
