Infographics – Week 5

Monday 25th February 2018
Digital Journalism – Big Data Journalism
Introduction to Data Science
Big Data

Big data refers to information assets characterised by the four Vs:
• Volume: how much?
• Velocity: growth rate?
• Variety: types of data?
• Veracity: reliability/consistency?


While all four Vs are growing, Variety is becoming the single biggest driver of big-data investments.

Data Security and Governance
Big data environments currently need a complex security architecture. Security mechanisms (encryption, obfuscation, loggers, monitors) must protect data at rest and data in transit.
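
As one illustration of obfuscation, the sketch below replaces a personal identifier with a keyed hash before a dataset is stored or shared. The key, the field names and the record are all hypothetical, and this covers only one of the mechanisms listed above; it is not a complete security model.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be kept outside the dataset.
SECRET_KEY = b"replace-with-a-real-secret"

def pseudonymise(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a keyed SHA-256 digest that stands in for the raw value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Illustrative record, not real data.
record = {"email": "reader@example.com", "postcode_area": "M1"}

# Copy the record, swapping the raw email for its pseudonym.
safe_record = {**record, "email": pseudonymise(record["email"])}
print(safe_record)
```

The same input always gives the same pseudonym, so records can still be matched across tables without exposing the original value.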

The analytic process has five phases:
  1. Get data
  2. Clean, Prepare & Manipulate Data
  3. Train Model
  4. Test Data
  5. Improve

Phase 2 typically represents 80% of the whole analytic process.
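
The phases can be sketched end to end in a few lines of plain Python. The figures below are made up for illustration, and the straight-line model and the error measure are assumptions chosen to keep the example small, not a recommended method.

```python
# 1. Get data: hypothetical (advert spend, sales) pairs, some of them dirty.
raw_rows = [
    ("10", "25"), ("20", "45"), ("", "30"), ("30", "65"),
    ("40", "n/a"), ("50", "105"), ("60", "125"), ("70", "145"),
]

# 2. Clean, prepare and manipulate: keep only rows where both values are numeric.
def clean(rows):
    cleaned = []
    for x, y in rows:
        try:
            cleaned.append((float(x), float(y)))
        except ValueError:
            continue  # skip rows with missing or non-numeric values
    return cleaned

data = clean(raw_rows)

# Hold back the last two clean rows for testing.
train, test = data[:-2], data[-2:]

# 3. Train model: ordinary least squares for a straight line y = slope*x + intercept.
def fit_line(points):
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train)

# 4. Test: mean absolute error on the held-back rows.
def mean_abs_error(points, slope, intercept):
    return sum(abs(y - (slope * x + intercept)) for x, y in points) / len(points)

error = mean_abs_error(test, slope, intercept)

# 5. Improve: a large error here would send us back to the earlier phases.
print(f"slope={slope:.2f} intercept={intercept:.2f} test error={error:.2f}")
```

Even in this toy version, most of the care goes into the cleaning step, since the fit is only as good as the rows that survive it.
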
Know your data sources
• IoT: Device, Network and Sensor Data
• Provenance of data can be an issue… get consent!
• Use “reliable” open source data repositories (Kaggle, Data.gov.uk, etc.)

Qualitative and Quantitative

Data Integration
• Combining all that data and reconciling it so that it can be used to create reports can be incredibly difficult.
• Vendors offer a variety of ETL and data integration tools designed to make the process easier.
• Many enterprises have not solved the data integration problem yet.
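
A minimal sketch of why reconciliation is hard: two hypothetical sources describe the same areas but use different column names and inconsistent spelling of the key. The area names, fields and numbers are invented for illustration, and `normalise` and `integrate` are helper names made up for this example.

```python
# Two hypothetical sources with inconsistent key columns and formatting.
population = [
    {"Area": " Leeds ", "population": 100},
    {"Area": "YORK", "population": 200},
]
turnout = [
    {"area_name": "leeds", "turnout_pct": 60.0},
    {"area_name": "Hull", "turnout_pct": 55.0},
]

def normalise(name: str) -> str:
    """Reconcile key values by trimming whitespace and lower-casing."""
    return name.strip().lower()

def integrate(sources):
    """Combine several (rows, key_field) sources into one record per area."""
    merged = {}
    for rows, key_field in sources:
        for row in rows:
            key = normalise(row[key_field])
            record = merged.setdefault(key, {"area": key})
            record.update({k: v for k, v in row.items() if k != key_field})
    return [merged[k] for k in sorted(merged)]

combined = integrate([(population, "Area"), (turnout, "area_name")])
for row in combined:
    print(row)
```

Areas that appear in only one source are kept with the fields that exist, which makes the gaps visible instead of silently dropping them.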

Data Cleansing/Wrangling
Extract, Transform, Load (ETL) is a data pre-processing workflow: data is extracted from its sources, transformed (organized, cleaned and unified), then loaded into a data warehouse.
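
The three ETL steps can be sketched with Python's standard library alone. The CSV content, the table name and the column names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export, inlined so the sketch runs on its own.
RAW_CSV = """constituency,votes
 Northtown ,1200
southville,
Eastham,950
"""

def extract(text):
    """Extract: read the CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names, cast counts, drop rows without a valid count."""
    cleaned = []
    for row in rows:
        votes = row["votes"].strip()
        if not votes.isdigit():
            continue  # missing or non-numeric vote count
        cleaned.append((row["constituency"].strip().title(), int(votes)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute("CREATE TABLE results (constituency TEXT, votes INTEGER)")
    conn.executemany("INSERT INTO results VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), conn)

loaded = conn.execute(
    "SELECT constituency, votes FROM results ORDER BY constituency"
).fetchall()
print(loaded)
```

Keeping each step in its own function makes it easier to check where a bad row was dropped or changed.
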
Generating Useful Insights: Skills
R
• Easy to learn
• Statistics-based functions
Python
• Relatively easy to learn
• Requires knowledge of programming fundamentals

Finding Data Sets
Search for central government sources
Office for National Statistics – elections data

You can find data on almost any topic you wish to search for, including American government and European data sources.