Data Visualization

What is Data Visualization? Why is it important?

Data visualization is the process of mapping data to visuals.

Data visualization helps people make decisions and explore data.

“Visualization gives you answers to questions you didn’t know you had” – Ben Shneiderman

More formally, it is the use of computer-supported, interactive, visual representations of data to amplify cognition.

$7 \pm 2$ is the number of items an average human holds in working memory. – George Miller, 1956.

Highlight the important parts of complex big data. Data -> Insights.

We are already in the era of Big Data, and data is only going to get bigger. That means distilling what is important and giving people a platform to explore will be key.

Human-Computer Interaction (HCI) + Data Mining = Data Visualization.

In other words: automatic summaries combined with interactive visualization.

You can highlight the most abnormal or most interesting things, maybe the first 5. Look at these 5 and trust me…but we as humans want to know why and see proof. Hence interactivity can come into play. Usability really matters!
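One simple way to pick those first few items is to rank values by how far they sit from the rest. The sketch below is a minimal illustration, not a recommended anomaly detector: it assumes a plain z-score as the measure of “abnormal”, and the function name `top_abnormal` and the sample readings are made up for the example.

```python
from statistics import mean, stdev


def top_abnormal(values, k=5):
    """Return up to k tuples (index, value, z_score), most extreme first."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # Every value is identical, so nothing stands out.
        return []
    scored = [(i, v, (v - mu) / sigma) for i, v in enumerate(values)]
    # Sort by absolute z-score so both very high and very low values surface.
    scored.sort(key=lambda item: abs(item[2]), reverse=True)
    return scored[:k]


# Hypothetical sensor readings with two obvious oddities.
readings = [10, 11, 9, 10, 50, 10, 12, 9, -5, 11, 10, 10]
result = top_abnormal(readings, k=3)
for index, value, z in result:
    print(f"index={index} value={value} z={z:+.2f}")
```

Returning the index and the score alongside each value gives an interactive view something to link back to, so the viewer can see why an item was flagged instead of taking it on trust.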

“Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.”

Human Perception

Data viz should leverage human perception. After all, you’re trying to turn data into something people can sense.

Out of all our senses, sight can process WAY more data/second than the rest of the senses combined. The human eye can process a LOT of information very quickly.

The question to keep in mind is: how do I make it as easy as possible for my viewers to see what is important to them?

A good tip is to not draw attention to anything that doesn’t add to the story.

People love pictures and movies. Use them early and often! When you have good stuff, show them right away. Have an image that shows the main idea.

Images should be self-contained so readers don’t have to jump back and forth between the image and the text. Eyes are drawn to images, and readers are lazy.

Stage 1: Pre-Attentive Processing

Very rapid, parallel, and automatic, but only lasts a short time. The eye moves every 200 ms and each time we process more data.

Things we can use to get our point across:

  • Color
  • Opacity
  • Shape
  • Length
  • Size

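But color and shape together are not pre-attentive. Each one is pre-attentive on its own, but searching for a combination of the two (for example, the one red circle among red squares and blue circles) gets difficult.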

Color is special. Color calls attention to information, increases the beauty of a graphic, and makes it more memorable. But more color does not mean better. Use it sparingly and wisely!
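As a rough sketch of using color sparingly, the Matplotlib snippet below draws every bar in gray except the one the story is about. The category names, values, and the choice of "C" as the highlighted bar are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up data: five categories, one of which we want the eye to land on.
categories = ["A", "B", "C", "D", "E"]
values = [23, 17, 35, 29, 12]
highlight = "C"

# One accent color for the key bar, a muted gray for everything else.
colors = ["tab:red" if c == highlight else "lightgray" for c in categories]

fig, ax = plt.subplots()
bars = ax.bar(categories, values, color=colors)
ax.set_title("Category C stands out")
# fig.savefig("highlight.png")  # uncomment to write the image to disk
```

Because only one hue differs from the rest, the highlighted bar can be found pre-attentively. Adding a second or third accent color would weaken that effect.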

Stage 2: Serial Processing

Relatively slow, incorporates memory, more manual. Look at things one by one, try to memorize what we see. We choose what we want to process.

Gestalt Psychology

Gestalt psychology set out to understand pattern perception. It tried to answer how we see things as a collection:

1) Proximity

  • How many groups are there? Usually you’d base this on spacing. Items that are close together are grouped together.

2) Similarity

  • We find items that share common features and see them as groups.

3) Closure

  • Even if a shape is broken into segments, we can fill in the missing lines and perceive the whole.

4) Symmetry

  • If there are multiple mirror images, we group them as one.

5) Common Fate

  • Continuity of animation or motion. Which way does stuff flow? Items that move together in the same direction are seen as a group.

6) Continuity

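  • Humans can understand depth and breaks in depth. We follow a line or edge through an interruption, for example when one shape passes behind another.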

7) Good Gestalt

  • Humans can understand negative space and interpret the missing shapes to make hidden shapes.

8) Past Experience

  • Humans use past experience to understand graphs

What graphs confuse people?

See the paper “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design” (Heer and Bostock, 2010).

Bar charts, scatter plots, and line charts are very effective for quantitative data. Circular and rectangular areas don’t do very well. Positions and lengths usually lead to good graphs. Areas and volumes, not so much. Color is interesting because we detect it quickly, but we can’t quantify the differences very well.
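Area encodings are also easy to get wrong when drawing them, on top of being hard to read. As a small worked example (the helper names here are hypothetical, not from any library), the sketch below compares sizing a circle by scaling its radius directly with sizing it so that its area is proportional to the value.

```python
import math


def radius_for_area(value, max_value, max_radius):
    """Radius that makes the circle's AREA proportional to value."""
    return max_radius * math.sqrt(value / max_value)


def radius_naive(value, max_value, max_radius):
    """Radius proportional to value; this exaggerates differences in area."""
    return max_radius * value / max_value


def circle_area(radius):
    return math.pi * radius ** 2


full = circle_area(10.0)
for value in (25, 50, 100):
    r_good = radius_for_area(value, 100, 10.0)
    r_bad = radius_naive(value, 100, 10.0)
    print(
        f"value={value:3d}  "
        f"area share (sqrt scaling)={circle_area(r_good) / full:.3f}  "
        f"area share (naive radius)={circle_area(r_bad) / full:.3f}"
    )
```

With the naive radius, a value that is one quarter of the maximum is drawn with one sixteenth of the area, so the chart overstates the gap. A bar's length has no such trap, which is one more reason to prefer position and length.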

Pie Charts

People hate pie charts because it’s hard to tell the difference between the angles. People hate 3D pie charts even more: the 3D effect makes the chart way harder to read and adds nothing. Pies remain very popular, however.
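A common alternative is a sorted bar chart, which swaps angle comparisons for length comparisons. The sketch below assumes Matplotlib and uses made-up regional shares that are deliberately close together, the case where pie slices are hardest to rank.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up shares (percent) that would look nearly identical as pie slices.
shares = {"North": 22, "South": 20, "East": 19, "West": 21, "Central": 18}

# Sort largest first so the ranking can be read straight off the chart.
ordered = sorted(shares.items(), key=lambda kv: kv[1], reverse=True)
labels = [name for name, _ in ordered]
values = [share for _, share in ordered]

fig, ax = plt.subplots()
bars = ax.bar(labels, values)
ax.set_ylabel("Share (%)")
ax.set_ylim(bottom=0)  # keep the bars anchored at zero so lengths are honest
```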

Tufte’s Principles

1) Do not lie

2) Maximize Data-Ink Ratio

3) Minimize Chart Junk
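One possible reading of these three principles in Matplotlib is sketched below. The data is invented, and the specific choices (hiding the top and right spines, dropping tick marks and the grid, anchoring the bars at zero) are just one interpretation of “data-ink” and “chart junk”, not a rule taken from Tufte.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up yearly figures.
years = ["2019", "2020", "2021", "2022", "2023"]
sales = [12, 15, 14, 18, 21]

fig, ax = plt.subplots()
ax.bar(years, sales, color="dimgray")

# Maximize data-ink ratio: drop the box edges that carry no data.
for side in ("top", "right"):
    ax.spines[side].set_visible(False)

# Minimize chart junk: no grid, no tick marks.
ax.grid(False)
ax.tick_params(length=0)

# Do not lie: bar lengths only compare fairly when the axis starts at zero.
ax.set_ylim(bottom=0)
ax.set_ylabel("Sales (illustrative units)")
```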

Data Visualization Software

Click and Drag Dashboards/Graphs

  • Tableau
  • PowerBI
  • Google Data Studio
  • Flourish
  • Excel
  • Qlik
  • Spotfire
  • Looker
  • Klipfolio

Python

  • Matplotlib
  • Plotly
  • Dash
  • Seaborn

R

  • ggplot
  • Plotly

JS

  • D3

Other Tips

Use a log axis when the data spans a very wide range, such as several orders of magnitude.
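A minimal Matplotlib sketch of the difference, using made-up values that grow roughly tenfold at each step: on the linear axis the small values are squashed against the bottom, while on the log axis each step is visible.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up values spanning several orders of magnitude.
values = [3, 40, 500, 6000, 70000]

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(8, 3))

ax_lin.plot(values, marker="o")
ax_lin.set_title("Linear axis")

ax_log.plot(values, marker="o")
ax_log.set_yscale("log")  # log scale needs strictly positive values
ax_log.set_title("Log axis")
```

A log scale cannot show zero or negative values, so check the data before switching.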

Axis labels should not have rotated text. If the labels don’t fit, try truncating them or swapping the axes (for example, using horizontal bars) so the text stays horizontal.
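A sketch of the axis-swapping option in Matplotlib: with horizontal bars the long category names sit on the y-axis and read left to right without rotation. The labels and counts are invented for the example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up categories with names too long to fit under vertical bars.
labels = [
    "Customer support tickets",
    "Feature requests from enterprise accounts",
    "Billing and invoicing questions",
    "Password resets",
]
counts = [120, 45, 80, 150]

fig, ax = plt.subplots(figsize=(7, 3))
bars = ax.barh(labels, counts)  # horizontal bars keep the label text level
ax.invert_yaxis()               # put the first category at the top
ax.set_xlabel("Count (illustrative)")
fig.tight_layout()              # make room for the long labels
```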

Great places to learn

The NY Times