Big data defined

Big data cannot be defined in terms of absolute volume, because the volume keeps increasing so quickly. It is easier to define it using the 5 V’s (see below).

Moore’s law (1965)

  • The enormous increase in processing and storage capabilities since the 1960s has obeyed Moore’s law.
  • This law states that the number of transistors that can be placed on a chip doubles every 24 months. This has implications for processing speed and data storage: a processor’s performance doubles every 18 months (a corollary introduced by David House).

Moore’s law is named after Gordon Moore, the co-founder of Intel.

Note that there is a limit to this self-fulfilling prophecy. As the number of transistors on a chip doubles, the distance between transistors decreases. It was 14 nanometers (nm) at the time of the book’s publication (2017). When it reaches 2-3 nm, quantum uncertainties might kick in, as predicted by Paolo Gargini.
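
To make the two doubling rates concrete, here is a minimal Python sketch of the arithmetic; the starting figures are hypothetical round numbers, not taken from the book.

```python
# Illustrative arithmetic for Moore's law: transistor count doubles
# every 24 months, while performance (per David House's corollary)
# doubles every 18 months. Starting values are hypothetical.

def project(initial: float, years: float, doubling_period_months: float) -> float:
    """Project a quantity forward assuming it doubles every given period."""
    doublings = years * 12 / doubling_period_months
    return initial * 2 ** doublings

if __name__ == "__main__":
    years = 10
    transistors = project(1_000_000, years, 24)  # 5 doublings in 10 years
    performance = project(1.0, years, 18)        # ~6.7 doublings in 10 years
    print(f"After {years} years: ~{transistors:,.0f} transistors, "
          f"~{performance:.0f}x the performance")
```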

The 5 V’s

Volume

  • There is no apparent limit. What counted as “big” a few years ago is no longer valid.
  • This kind of volume cannot be collected and stored with traditional computing methods.

Variety

  • Data is full of diversity, and the possibilities for combining different sources are endless.
  • Health data can consist of images, clinical information, metadata, genetic information (genomics)…
  • Data can be structured (electronic medical records), semi-structured (images with tags, for example) or unstructured (social media); see the toy sketch after this list.
  • Data of use in health can also include environmental data such as weather data and geolocation data (latitude, longitude, altitude).
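
To make those three shapes concrete, here is a toy Python sketch; all field names and contents are invented for illustration.

```python
# Toy illustration of the three data shapes; every value is invented.

# Structured: fixed schema, like a row in an electronic medical record
structured_row = ("P-001", "2017-03-02", 37.8, "influenza A")

# Semi-structured: self-describing tags, and fields may vary per record
semi_structured = {
    "patient": "P-001",
    "image": "chest_xray.png",
    "tags": ["fever", "cough"],
}

# Unstructured: free text with no schema, e.g. a social media post
unstructured = "Half my class is out sick this week, rash and fever everywhere."

print(structured_row, semi_structured, unstructured, sep="\n")
```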

Velocity

  • Data streams in continuously from phones, the web and sensors (the Internet of Things, IoT).
  • A lot of it is being produced: as technology advances, everything becomes faster.
  • It spreads quickly around the world: one popular social media post which is reshared spreads around the web like a virus.
  • This constraint mostly doesn’t concern healthcare yet, but some data has to be processed instantly: driverless cars, for example.

Veracity

  • In healthcare it is very important, in order to explain algorithmic solutions, that the data is reliable in its source and purpose… it is a matter of trust.

A lot of data comes from social media, and analysing it is a recognised discipline that can be used, for example, for real-time epidemiological monitoring. Take a picture of a viral rash, geolocated to an area adjacent to a cluster of affected individuals.

So, again, a lot of data is unstructured and collected with no specific question in mind. And the more data we have, the more statistical power we have, but the associations found are not always relevant: look at the strange association between “high school basketball” Google searches and the flu.
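
A minimal simulation of why that happens: scan enough candidate variables and some of them will correlate strongly with any target purely by chance. Everything below is simulated noise; no real search or flu data is involved.

```python
# Spurious association by chance alone: among many random "search term"
# series, the best one correlates strongly with a random "flu" series.
import numpy as np

rng = np.random.default_rng(0)
n_weeks, n_terms = 52, 10_000

flu = rng.normal(size=n_weeks)               # fake weekly flu signal
terms = rng.normal(size=(n_terms, n_weeks))  # fake search-term signals

# Pearson correlation of every term series with the flu series
flu_c = flu - flu.mean()
terms_c = terms - terms.mean(axis=1, keepdims=True)
corrs = terms_c @ flu_c / (
    np.linalg.norm(terms_c, axis=1) * np.linalg.norm(flu_c)
)

best = np.abs(corrs).argmax()
print(f"Best of {n_terms:,} random terms: r = {corrs[best]:.2f}")
# With 10,000 candidates, correlations around |r| = 0.5 show up by
# chance alone, even though every series here is pure noise.
```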

Visualisation

  • Data is a simple thing, apart from the fact that there is so much of it. However, when artificial intelligence brings solutions and interpretability to the data, visual tools are essential to make them understandable to the human mind (see the sketch below).
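
As a minimal illustration of that point, here is a matplotlib sketch with invented numbers: a trend that is easy to miss in a list of raw values becomes obvious the moment it is plotted.

```python
# A trend buried in noisy numbers becomes obvious once plotted.
# All numbers are simulated; nothing here is real health data.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
weeks = np.arange(52)
cases = 100 + 2 * weeks + rng.normal(scale=25, size=52)  # noisy upward trend

print(cases[:10].round())  # as raw numbers, the trend is easy to miss

plt.plot(weeks, cases, marker=".")
plt.xlabel("Week")
plt.ylabel("Reported cases (simulated)")
plt.title("A simple plot makes the trend visible")
plt.show()
```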

However, as a neuroscience fan (after all, until the end of medical school I wanted to become a neurosurgeon), I personally believe that besides the supratentorial brain concerned with pure intelligence without emotions, there is also the limbic brain, and signals from external stimuli (or data) reach both parts of the brain simultaneously.
Even with simple visualisation, the sheer number of information sources becomes somewhat overwhelming for a human being, and I am not sure that collective intelligence is an improvement rather than the herd mentality of a growing crowd made possible by technology.
One concrete example would be the pandemic. There have been various sources of data, the monitoring has been fantastic, and these tools have helped develop vaccines. However, some of the decisions seem irrational, especially when the dangers related to COVID were made excessively visible, and I am not sure visualisation is enough.
There is simply too much data thrown in everyone’s face, including the media’s, and I am not sure all the decisions made by those in power were rational.

A 6th V: Value

  • The quality of results derived from big data analysis.
  • Quality depends on who is looking; take the pandemic:
      • for politicians
      • for economists
      • for society
      • for the economy

Reference: Dawn E. Holmes, Big Data: A Very Short Introduction. Oxford University Press, 2017.
