Big_data_Informative

Major V's in Big Data

  • Characteristics of Big Data

Big data is commonly characterised using a number of V's.

The first three are volume, velocity, and variety.

1] Volume refers to the vast amount of data that is generated every second, minute, hour, and day in our world. Volume is the dimension of big data related to its size and its exponential growth. The challenges of working with large volumes of data include the cost, scalability, and performance of storing, accessing, and processing it. (volume == size)

2] Variety refers to the ever-increasing number of forms that data can come in, such as text, images, voice, and geospatial data. (variety == complexity)
→ Structural variety refers to differences in how the data is represented. Example: a satellite image of wildfires from NASA is very different from tweets sent out by people who are watching the fire spread.
→ Media variety refers to the medium in which the data is delivered. The audio of a speech and the transcript of that speech may represent the same information in two different media.
→ Semantic variety refers to the use of different units or terms for the quantities we measure. For example, age can be a number, or we can represent it with terms like infant, juvenile, or adult.
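The age example under semantic variety can be sketched in a few lines of Python. This is only an illustration: the function name `age_category` and the cut-off ages (1 and 18 years) are assumptions made for the sketch, not definitions taken from these notes.

```python
def age_category(age_years: float) -> str:
    """Map a numeric age to a descriptive term.

    The thresholds below are illustrative assumptions only.
    """
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years < 1:       # assumed cut-off for "infant"
        return "infant"
    if age_years < 18:      # assumed cut-off for "juvenile"
        return "juvenile"
    return "adult"


# The same quantity in two semantic forms: a number and a term.
records = [0.5, 10, 30]
labelled = [(age, age_category(age)) for age in records]
print(labelled)
```

Two data sets that describe the same people can therefore disagree in form, one storing numbers and the other storing terms, and have to be reconciled before they can be combined.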

3] Velocity refers to the speed at which data is generated and the pace at which it moves from one point to the next. (velocity == speed)

More V's have been introduced to the big data community as we discover new challenges and new ways to define big data. These include veracity and valence.

4] Veracity refers to the biases, noise, and abnormalities in data. It covers the often unmeasurable uncertainty, truthfulness, and trustworthiness of data. (veracity == quality)

5] Valence refers to the connectedness of big data, which can be represented as a graph, much like the bonds between atoms. The more connected the data is, the higher its valence. The word comes from chemistry: valence electrons sit in the outermost shell, have the highest energy level, and are responsible for bonding with other atoms. Higher valence results in greater bonding, that is, greater connectedness. For a data collection, valence measures the ratio of actually connected data items to the possible number of connections that could occur within the collection. (valence == connectedness)
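The ratio described above can be sketched in Python. This is a minimal illustration under stated assumptions: connections are treated as undirected pairs with no self-links, so a collection of n items has n(n-1)/2 possible connections. The function name `valence` and the sample data are hypothetical.

```python
def valence(num_items: int, connections) -> float:
    """Ratio of actual connections to possible pairwise connections.

    Assumes undirected connections and ignores self-links and duplicates.
    """
    if num_items < 2:
        return 0.0  # no pair of items exists, so nothing can be connected

    # Treat (a, b) and (b, a) as the same connection; drop self-links.
    unique = {frozenset(pair) for pair in connections if pair[0] != pair[1]}

    possible = num_items * (num_items - 1) // 2
    return len(unique) / possible


# Four data items, two distinct connections out of six possible.
edges = [("a", "b"), ("b", "c"), ("b", "a")]
print(valence(4, edges))
```

A value close to 1 means almost every item is linked to every other item, while a value close to 0 means the collection is sparsely connected.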

6] Value comes last. Without a clear strategy and a clear objective for the value they want from big data, organizations can easily be sidetracked by all of these challenges and fail to turn them into opportunities. The whole idea behind processing big data in the first place is to bring value to the problem at hand.

Any strategy needs four major parts: aim, policy, plan, and action.

Five P's of Data Science