Types of data

Three types of data exist:

  • structured
  • unstructured
  • semi-structured

Structured data:

  • Until recently written by hand and kept in notebooks, this data is now stored electronically in spreadsheets and databases. It consists of spreadsheet-style tables with rows and columns, each row being a record and each column a well-defined field (name, date of birth…).
  • We contribute to this structured data when, for example, we provide the information needed to order goods online.
  • Carefully structured and tabulated data is relatively easy to manage and is amenable to statistical analysis; indeed, until recently, statistical analysis methods could be applied only to structured data.
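
As a minimal sketch of why tabulated data is easy to analyse, the Python snippet below reads a small table in which each row is a record and each column a well-defined field, then computes a simple statistic. The sample records, the field names (name, date_of_birth, order_total) and the helper load_records are hypothetical and chosen only for illustration.

```python
import csv
import io
import statistics

# Hypothetical order records: each row is a record, each column a field.
ORDERS_CSV = """name,date_of_birth,order_total
Ada,1990-04-12,25.50
Ben,1985-11-03,40.00
Cy,2001-07-29,12.25
"""


def load_records(text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


records = load_records(ORDERS_CSV)

# Because every record has the same fields, a column can be pulled out
# and summarised directly.
totals = [float(record["order_total"]) for record in records]
mean_total = statistics.mean(totals)

print(len(records), "records; mean order total:", round(mean_total, 2))
```

Because every record shares the same fields, extracting a column and summarising it takes only a couple of lines; the same step has no direct equivalent for a photo or a video.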

Unstructured data:

  • By contrast, unstructured data is not so easily categorized. It includes:
      • videos
      • photos
      • tweets
      • word-processing documents
  • Once the use of the World Wide Web became widespread, it transpired that many such potential sources of information remained inaccessible because they lacked the structure needed for existing analytic techniques to be applied.

Semi-structured data:

  • However, data that initially appears to be unstructured may not be completely without structure once its key features are identified.
  • Emails, for example, contain structured metadata in the heading as well as the actual unstructured message in the text, so they may be classified as semi-structured data.
  • Metadata tags, which are essentially descriptive references, can be used to add some structure to unstructured data.
  • Adding a word to an image on a website makes it identifiable and subsequently easier to search for. Semi-structured data is also found on social networking websites, which use hashtags so that messages (which are unstructured data) on a particular topic can be identified.
  • Dealing with unstructured data is challenging: since it cannot be stored in traditional databases or spreadsheets, special tools have to be developed to extract useful information.
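
The email and hashtag examples above can be sketched in Python as follows. This is only an illustration under stated assumptions: the message text, the addresses and the variable names are made up, and the hashtag pattern is a deliberately simple one rather than the rule any particular social network uses.

```python
import re
from email import message_from_string

# Hypothetical raw email: structured header lines, a blank line,
# then a free-text (unstructured) body.
RAW_EMAIL = """From: alice@example.com
To: bob@example.com
Subject: Lunch on Friday
Date: Mon, 02 Jan 2017 10:00:00 +0000

Hi Bob, are you free for lunch on Friday? #lunch #friday
"""

msg = message_from_string(RAW_EMAIL)

# Structured part: header fields can be looked up by name.
metadata = {key: msg[key] for key in ("From", "To", "Subject", "Date")}

# Unstructured part: the body is just a string of text.
body = msg.get_payload()

# Hashtags add a little structure to the otherwise free text.
hashtags = re.findall(r"#(\w+)", body)

print(metadata["Subject"])
print(hashtags)
```

The header fields behave like columns in a table, while the body still needs text-processing techniques; the hashtags sit in between, which is what makes the message semi-structured.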

Reference: Dawn E. Holmes, Big Data: A Very Short Introduction. Oxford University Press, 2017.
