Data Visualization
What is Data Visualization? Why is it important?
Data visualization is the process of mapping data to visuals.
Data visualization helps humans make decisions and explore data.
“Visualization gives you answers to questions you didn’t know you had.” – Ben Shneiderman
More formally, it is the use of computer-supported, interactive visual representations of data to amplify cognition.
$7 \pm 2$ is the number of items an average human holds in working memory. – George Miller, 1956.
Highlight the important parts of complex big data. Data -> Insights.
We are already in the era of Big Data, and data is only going to get bigger. That means distilling what is important and giving people a platform to explore will be key.
Human-Computer Interaction (HCI) + Data Mining = Data Visualization.
Automatic summaries and interactive visualization.
An automatic summary can highlight the most abnormal or most interesting things, maybe the first 5, and say: look at these 5 and trust me. But we as humans want to know why and see proof. That is where interactivity comes into play. Usability really matters!
“Computers are incredibly fast, accurate and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination.”
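One way to pick the "most abnormal" items automatically is to rank values by how far they sit from the mean. The sketch below uses a z-score for that, which is an assumption on my part (the notes don't say how "abnormal" is measured), and `top_k_unusual` and `readings` are hypothetical names and data for illustration.

```python
from statistics import mean, pstdev

def top_k_unusual(values, k=5):
    """Return the k (index, value, z_score) tuples farthest from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    scored = [(i, v, (v - mu) / sigma) for i, v in enumerate(values)]
    # Sort by absolute z-score, largest first, and keep the first k.
    return sorted(scored, key=lambda t: abs(t[2]), reverse=True)[:k]

# Hypothetical readings: mostly near 10, with two obvious oddballs.
readings = [10, 11, 9, 10, 50, 10, 12, -20, 11, 10]
for index, value, z in top_k_unusual(readings, k=2):
    print(f"index={index} value={value} z={z:+.2f}")
```

The ranked list is only the "trust me" part. An interactive view would let the viewer click each flagged item and see the surrounding data as proof.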
Human Perception
Data viz should leverage human perception. After all, you’re trying to turn data into something people can sense.
Out of all our senses, sight can process WAY more data/second than the rest of the senses combined. The human eye can process a LOT of information very quickly.
The question to keep in mind is: how do I make it as easy as possible for my viewer to see what is important to them?
A good tip is to not draw attention to anything that doesn’t add to the story.
People love pictures and movies. Use them early and often! When you have good stuff, show it right away. Have an image that shows the main idea.
Images should be self-contained so readers don’t have to jump back and forth between the image and the text. Eyes are drawn to images, and readers are lazy.
Stage 1: Pre-Attentive Processing
Very rapid, parallel, and automatic, but only lasts a short time. The eye moves every 200 ms and each time we process more data.
Things we can use to get our point across:
- Color
- Opacity
- Shape
- Length
- Size
But color and shape combined are not processed pre-attentively. Each one is pre-attentive on its own, but together they become difficult.
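The difference between a single attribute and a combination is easy to see side by side. This is a rough sketch using Matplotlib: the left panel hides one red circle among blue circles (color only), the right panel hides one red circle among blue circles and red squares (color plus shape). The point counts, colors, and helper names are my own choices for illustration.

```python
import random
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

random.seed(0)  # fixed seed so the layout is reproducible

def random_points(n):
    """Return n random (x, y) positions in the unit square."""
    return [(random.random(), random.random()) for _ in range(n)]

def scatter(ax, points, color, marker):
    """Draw the points on ax with one color and one marker shape."""
    xs, ys = zip(*points)
    ax.scatter(xs, ys, color=color, marker=marker, s=60)

fig, (ax_single, ax_conj) = plt.subplots(1, 2, figsize=(9, 4))

# Left: the target differs by color only, so it pops out immediately.
scatter(ax_single, random_points(40), "tab:blue", "o")
scatter(ax_single, random_points(1), "tab:red", "o")
ax_single.set_title("Color only: target pops out")

# Right: the target (red circle) shares its color with the red squares and
# its shape with the blue circles, so finding it takes a one-by-one search.
scatter(ax_conj, random_points(20), "tab:blue", "o")
scatter(ax_conj, random_points(20), "tab:red", "s")
scatter(ax_conj, random_points(1), "tab:red", "o")
ax_conj.set_title("Color + shape: target is hard to find")

# fig.savefig("preattentive.png")  # uncomment to write the image to disk
```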
Color is special: it calls attention to information, increases beauty, and makes a graphic more memorable. But more color does not mean better. Use it sparingly and wisely!
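A common way to use color sparingly is to keep everything muted and spend a single accent color on the one thing the viewer should notice. A minimal Matplotlib sketch, with made-up data and a hypothetical helper `highlight_colors`:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

def highlight_colors(n_bars, highlight_index, base="lightgray", accent="tab:red"):
    """One accent color for the bar that matters, a muted color for the rest."""
    return [accent if i == highlight_index else base for i in range(n_bars)]

# Hypothetical data, for illustration only.
regions = ["North", "South", "East", "West", "Central"]
sales = [23, 17, 35, 29, 12]

best = sales.index(max(sales))  # position of the largest value
colors = highlight_colors(len(sales), best)

fig, ax = plt.subplots()
bars = ax.bar(regions, sales, color=colors)
ax.set_ylabel("Sales (hypothetical units)")
ax.set_title("East leads on sales")

# fig.savefig("highlight.png")  # uncomment to write the image to disk
```

Giving every bar its own color would add ink without adding information, since the x-axis labels already identify each region.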
Stage 2: Serial Processing
Relatively slow, incorporates memory, more manual. Look at things one by one, try to memorize what we see. We choose what we want to process.
Gestalt Psychology
Gestalt psychology set out to understand pattern perception. It tried to answer how we see things as a collection:
1) Proximity
- How many groups are there? Usually you’d base this on spacing. Items that are close together are grouped together.
2) Similarity
- Find common things and see them as groups
3) Closure
- Even when a shape is broken into segments, we fill in the missing lines and perceive the whole.
4) Symmetry
- If there are multiples of mirror images, we group them as one.
5) Common Fate
- Elements that move in the same direction are seen as a group, as in animation or transportation. Which way does stuff flow?
6) Continuity
- Humans can understand depth and breaks in depth.
7) Good Gestalt
- Humans can understand negative space and interpret the missing shapes to make hidden shapes.
8) Past Experience
- Humans use past experience to understand graphs
What graphs confuse people?
See the paper Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design (Heer and Bostock, 2010).
Bar charts, scatter plots, and line charts are very effective for quantitative data. Circular and rectangular areas don’t do very well. Positions and lengths usually lead to good graphs. Areas and volumes, not so much. Color is interesting because we detect it quickly, but we can’t quantify the differences very well.
Pie Charts
People hate pie charts because it’s hard to tell the difference between the angles. People hate 3D pie charts even more: the third dimension makes the chart harder to read and adds nothing. Pie charts remain very popular, however.
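To see why angles are harder to compare than lengths, it helps to draw the same numbers both ways. A quick Matplotlib sketch with made-up shares that are deliberately close together:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical market shares (percent), chosen to be close to one another.
labels = ["A", "B", "C", "D", "E"]
shares = [21, 19, 22, 18, 20]

fig, (ax_pie, ax_bar) = plt.subplots(1, 2, figsize=(9, 4))

# Pie: values are encoded as angles and areas, so the ranking is hard to read.
ax_pie.pie(shares, labels=labels)
ax_pie.set_title("Pie: which slice is biggest?")

# Bar: values are encoded as lengths on a common baseline, so it is easy.
ax_bar.barh(labels, shares, color="gray")
ax_bar.invert_yaxis()  # keep A at the top, matching the label order
ax_bar.set_xlabel("Share (%)")
ax_bar.set_title("Bar: C is clearly the largest")

# fig.savefig("pie_vs_bar.png")  # uncomment to write the image to disk
```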
Tufte’s Principles
1) Do not lie
2) Maximize Data-Ink Ratio
3) Minimize Chart Junk
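In Matplotlib terms, these principles often come down to a few lines of cleanup. The sketch below is one possible reading, not Tufte's own recipe: `declutter` is a hypothetical helper, the data is made up, and starting the bar axis at zero is my interpretation of "do not lie" for bar charts.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

def declutter(ax):
    """Remove ink that carries no data: top/right frame, grid, tick marks."""
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    ax.grid(False)
    ax.tick_params(length=0)
    return ax

# Hypothetical data, for illustration only.
quarters = ["Q1", "Q2", "Q3", "Q4"]
revenue = [102, 108, 111, 115]

fig, ax = plt.subplots()
ax.bar(quarters, revenue, color="gray")
ax.set_ylim(bottom=0)  # bars start at zero so their lengths stay proportional
ax.set_ylabel("Revenue (hypothetical units)")
declutter(ax)

# fig.savefig("clean_bars.png")  # uncomment to write the image to disk
```

Cutting the y-axis off at 100 would make Q4 look several times larger than Q1, even though the real difference is about 13%.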
Data Visualization Software
Click and Drag Dashboards/Graphs
- Tableau
- PowerBI
- Google Data Studio
- Flourish
- Excel
- Qlik
- Spotfire
- Looker
- Klipfolio
Python
- Matplotlib
- Plotly
- Dash
- Seaborn
R
- ggplot
- Plotly
JS
- D3
Other Tips
Use a log axis when the data spans several orders of magnitude.
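A small Matplotlib sketch of the log-axis tip, using made-up values that run from single digits to hundreds of thousands. On the linear axis the small values are squashed against zero; on the log axis every point is readable.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical measurements covering about five orders of magnitude.
values = [3, 40, 500, 6_000, 70_000, 800_000]

fig, (ax_lin, ax_log) = plt.subplots(1, 2, figsize=(9, 4))

ax_lin.plot(values, marker="o")
ax_lin.set_title("Linear axis")

ax_log.plot(values, marker="o")
ax_log.set_yscale("log")  # each tick step is a factor of 10
ax_log.set_title("Log axis")

# fig.savefig("log_axis.png")  # uncomment to write the image to disk
```

A log axis only works for positive values, so zeros and negatives need separate handling.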
Axis labels should also not use rotated text. If the labels don’t fit, try truncating them or swapping the axes so the labels read horizontally.
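Both fixes for long labels can be sketched in a few lines: shorten the text, and put the categories on the y-axis with a horizontal bar chart so nothing needs rotating. `truncate_label`, the category names, and the numbers are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

def truncate_label(label, max_chars=12):
    """Shorten a long label, marking the cut with an ellipsis."""
    if len(label) <= max_chars:
        return label
    return label[: max_chars - 1].rstrip() + "…"

# Hypothetical categories with long names, for illustration only.
categories = ["Customer support and success", "Research",
              "Infrastructure and operations"]
headcount = [42, 17, 28]

short_labels = [truncate_label(c, max_chars=16) for c in categories]

fig, ax = plt.subplots()
bars = ax.barh(short_labels, headcount, color="gray")  # labels read left to right
ax.invert_yaxis()  # first category at the top
ax.set_xlabel("Headcount (hypothetical)")

# fig.savefig("horizontal_labels.png")  # uncomment to write the image to disk
```

Truncation loses information, so it is best kept for cases where the full name is available somewhere else, such as a tooltip or a caption.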