1. What is MapReduce?

MapReduce is a programming paradigm for big data processing, where data is partitioned into distributed chunks and processed by a series of transformations.

The MapReduce programming paradigm processes data in two operations: map() and then reduce(). map() applies a user-defined function to each data record in the data collection. reduce() combines the output of map() using another user-defined function.

2. MapReduce Pipeline

MapReduce works on (key, value) pairs through the following steps:

INPUT: list of key-value pairs (k1, v1)

MAP: (k1, v1) >> [list of (k2, v2)]

SHUFFLE: combine (k2, v2) >> (k2, [list of…
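
As a rough illustration of the steps above, here is a minimal single-machine sketch in plain Python using word count. The function names (map_fn, shuffle, reduce_fn, map_reduce) and the sample records are hypothetical and chosen only for this example. The sketch assumes the shuffle step groups all values that share a key, and that reduce() sums each group; a real MapReduce framework would distribute this work across machines.

```python
from collections import defaultdict


def map_fn(key, value):
    """MAP: turn one (k1, v1) record into a list of (k2, v2) pairs.

    Here k1 is a line number (unused) and v1 is a line of text;
    each word becomes a (word, 1) pair.
    """
    return [(word.lower(), 1) for word in value.split()]


def shuffle(pairs):
    """SHUFFLE: collect every value emitted for the same key into one list."""
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return dict(groups)


def reduce_fn(key, values):
    """REDUCE: collapse the list of values for one key into a single result."""
    return key, sum(values)


def map_reduce(records):
    """Run map, shuffle, and reduce in sequence on a list of (k1, v1) records."""
    mapped = [pair for k, v in records for pair in map_fn(k, v)]
    grouped = shuffle(mapped)
    return dict(reduce_fn(k, vs) for k, vs in grouped.items())


# Hypothetical input: (line number, line of text) pairs.
records = [(0, "the quick brown fox"), (1, "the lazy dog"), (2, "The fox")]
counts = map_reduce(records)
print(counts)
```

Keeping map_fn and reduce_fn as separate user-defined functions mirrors the paradigm: swapping in different functions changes the computation without touching the pipeline itself.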


The article contains three visualizations of the share-of-individuals-using-the-internet dataset from ourworldindata.org. The dataset describes the percentage of the population in a country or region that used the Internet from 1990 to 2017. The Entity column contains both country-level and region-level data. The dataset is displayed in the table below.

  1. In the first visualization, I extracted the 2016 data for 10 aggregated regions including Arab World, Caribbean small states, Central Europe and the Baltics, East Asia & Pacific, Europe & Central Asia, Latin America & Caribbean, Middle East & North Africa, Pacific island small states, South…

This article contains three visualizations of the cellphone_per_100_people dataset from gapminder.org. Below is the data table; you can search for a specific value by typing it in the box under the variable name.

Here I randomly selected one country, Brazil, to observe its cellphone ownership per 100 people over time. Notice that in 2014, Brazil reached its peak cellphone prevalence, an average of 1.38 phones per person. In 2019, average ownership dropped to around 1 phone per person.

2. For this second visualization, I selected the four countries that have the highest average cellphone…


1. Using Data Visualization to Find Insights in Data

Data is invisible.

Data tables are also a type of data visualization, but tables alone don’t allow us to immediately identify patterns within the data.

2. How to visualize?

- types of visualization

  1. Tables are very powerful when you are dealing with a relatively small number of data points and data with a single variable. (Edward Tufte suggested including small chart pieces within table columns.)
  2. Bar charts are perfect for categorical comparison.
  3. Line charts are especially suited to showing change over time, such as time-series data (where one variable is a time, date, or number measured at equal intervals).
  4. Scatter plot. (Date/Time data of different intervals is not time-series data and should be visualized…

Vicky Bee Wu
