Data Visualizations: What to Ask Before Believing

The Brain and I
7 min read · Aug 22, 2023


When was the last time you gave a second thought to the fancy charts and graphs in those polished reports?

When confronted with nice, professional-looking visualizations, it’s easy to take them at face value. It’s easy to just start reading the numbers, evaluating the percentages, and judging the results.

Is it really a good idea to do that? What would be a better approach?

Photo by Firmbee.com on Unsplash

Belief at first sight is not a good idea

When we first see a data visualization, we see more than just numbers. What we see is intertwined with our individual and cultural experiences, which are influenced by the history of how visual media have been perceived — as a way to portray an objective reality.

In this interesting article (2019), Kosminsky et al. discuss how the trend of believing photographs, as a natural and accurate representation of reality, extends to data visualizations.

This eagerness to jump on results, faster than if we were on a trampoline, is probably the work of System I. In his bestselling book “Thinking, Fast and Slow”, Kahneman presents two different ways of making decisions: System I and System II. The first is fast and intuition-based; the second is analytical. Analytical decisions require more time and energy, so they are definitely not our brain’s first choice.

When we first lay eyes on a visualization, our brain is probably in System I mode. This mode relies on instinct and personal experience. If what you “see/know in real life” is represented on the chart, you are very likely to believe that chart. The problem is that our personal experience is not representative of the whole issue and can lead to biases. A bias is a distortion of a result that leads to unfair prejudice and misinterpretation. So you will believe some charts to be true not for analytical reasons, but because they resemble your experience so far.

If you are certain you have no biases or prejudices, you are in pretty good company. Most of us judge ourselves to be less biased than everyone else, or less biased than we really are (check the article here).

So, it’s pretty clear that hitting the brakes before buying into any random visualization is a good idea. But how can we become savvy detectives of data and charts?

Say hello to critical thinking.

Critical Thinking Wins

Critical thinking is a way of thinking that looks at facts to judge the reliability of what’s presented to us. If you are not familiar with the term or with the many ways of thinking, check this article for an overview of most ways of thinking. By letting critical thinking take the wheel, we can give ourselves a chance to pause, reflect, and make better-informed decisions. It’s like letting the wise owl in us kick in and steer the ship. In this case, the wise owl is also known as System II.

When applying critical thinking, we should think like a detective or a journalist. Why am I being shown this? What am I really looking at? How can I make sure this is an accurate representation of the overall situation? Is any relevant data left out? When was the data collected? These are just a few examples of questions we should ask ourselves before we even start reading the visualization in front of us. It might seem like a lot of questions to remember, so let’s start with just four things to check.

Ask Before Believing

The four S’es: Sample, Source, Size, Situation.

Upon stumbling on a visualization, check the following four things.

Sample. “Sample” is the statistical term for a smaller, more manageable representation of a larger group. This larger group is called the population, and it encompasses all the possibilities available.

Let’s use an example to explain.

Imagine I want to open a pet food shop. I want to know what kind of products I should sell, so I decide to ask pet owners what their pet’s favourite food is.

All the people who own pets in the world make up the population. Maybe I want to be a local shop, in which case all the pet owners in my city are the population. The population is the group of all the people or things in the category I decide to study. Since I cannot really go around and ask everyone in my city, let alone the world, I decide to ask a few people. These few people make up the sample. So my visualization is really based on a small fraction of all pet owners.

I run a survey and make a bar chart of the results. This is what I get:

Report on the survey about pets’ favourite food

Now, I figured out that all pets prefer fish, so I decide to open a fish shop. Some of my friends are feeling a little uneasy with this decision of mine, and start asking about the people I surveyed.

“How did you find these pet owners?” It turns out that I asked all the pet owners in my building, and they all have cats. So in the end I only asked people who own cats. Instead of asking people who own all kinds of pets, I focused on a specific pet, the cat, but I used the results of the survey to draw conclusions about all animals.

It is fine to use a sample for the visualization, as long as you check some other things we’ll talk about later on. The problem is that in this case the sample does not represent the population. If the data visualization relates to a general situation, the sample should have the same variety as the population, on a smaller scale.

Wrong and correct sampling (simplified example)
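
To make the difference concrete, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the population of 1,000 pet owners, the pet types, and the favourite foods are invented numbers, not survey data. It compares the mix of pets in the "my building" sample with the mix in a random sample drawn from the whole population.

```python
import random
from collections import Counter


def pet_shares(owners):
    """Return the share of each pet type in a list of (pet, favourite_food) records."""
    counts = Counter(pet for pet, _ in owners)
    total = sum(counts.values())
    return {pet: count / total for pet, count in counts.items()}


# Hypothetical population: invented numbers, only for illustration.
population = (
    [("cat", "fish")] * 400
    + [("dog", "meat")] * 450
    + [("rabbit", "vegetables")] * 150
)

# Convenience sample: the 20 pet owners in my building, who all own cats.
building_sample = [owner for owner in population if owner[0] == "cat"][:20]

# Simple random sample drawn from the whole population.
rng = random.Random(42)  # fixed seed so the sketch is repeatable
random_sample = rng.sample(population, 100)

print("population:", pet_shares(population))
print("building  :", pet_shares(building_sample))
print("random    :", pet_shares(random_sample))
```

The building sample is 100% cats, so any chart built on it can only speak about cats. The random sample will not match the population shares exactly, but it at least contains the same variety of pets.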

Source. Data can be found everywhere, but some sources are more reliable than others. When you are presenting a visualization to your boss, the CEO, or some investors, you want to be correct. The more reliable the source of the data, the better.

Let’s continue with our example to explain.

After my friends talk me out of asking more of my neighbours, I decide to go on the internet and find a better sample. I find two different files full of data. The first is on a government website that makes open data available to everyone. The second is on an obscure social media account that was opened one year ago and has only this file on it. The Twitter handle is “CatL0ver34XOXO”. The two files lead to different visualizations. Which one should I use?

Not every choice between data sources is this cut and dried; this is just an example, after all. Knowing where the data comes from and how it was collected and cleaned helps when judging reliability.

Size. The size of the data is another important piece of information. In statistics there are pretty strict rules on the minimum amount of data you should have. Beyond that minimum, the more data, the more reliable the study.

In this article, we are covering the basics of statistics without going deep into the theory, so I’ll give a rule-of-thumb number: for any visualization, you should have at least 50 cases to be comfortable. This number really applies to the cases in a sample, and there are situations in which fewer are acceptable. When the data covers the whole population, however many cases that population has is fine.

Let’s continue with our example to explain.

In my survey of pets’ favourite food, I should find 50 cases per type of pet I intend to provide food for. So I should talk with 50 dog owners, 50 cat owners, and so on. If for some reason that’s not possible, at the very least there should be 5 cases per type of pet. It matters because I might happen to talk to the owner of some odd cat that is allergic to fish. I want more than one case so that I get an idea of the general trend, not of one specific case.
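
As a rough illustration of this check, here is a small Python sketch. The thresholds of 50 and 5 are the rule-of-thumb numbers from this article, not a statistical standard, and the `size_check` helper and the survey counts are hypothetical, made up for the example.

```python
from collections import Counter

COMFORTABLE = 50   # rule-of-thumb number of cases per pet type
BARE_MINIMUM = 5   # fallback minimum number of cases per pet type


def size_check(pets):
    """Label each pet type by how many survey answers it has."""
    labels = {}
    for pet, n in Counter(pets).items():
        if n >= COMFORTABLE:
            labels[pet] = "comfortable"
        elif n >= BARE_MINIMUM:
            labels[pet] = "bare minimum"
        else:
            labels[pet] = "too few"
    return labels


# Hypothetical survey: one entry per owner interviewed, invented counts.
answers = ["cat"] * 60 + ["dog"] * 12 + ["rabbit"] * 3
print(size_check(answers))
```

With these invented counts, cats are covered comfortably, dogs only just reach the bare minimum, and three rabbit owners are too few to say anything about rabbits in general.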

Situation. Can the population be used to demonstrate what the visualization conveys in the first place?

Let’s continue with our example to explain.

Now I’m telling my friends and possible investors about my survey. I notice that none of the pet owners mentioned vegetables as their pet’s favourite food. So I start saying that every pet hates vegetables.

In this case my friends should make me realize that the survey was about favourite food. Since I didn’t ask “What food does your pet hate?”, I cannot draw any conclusion other than what each pet’s favourite food is. Maybe vegetables are every pet’s second choice. Or not. I wouldn’t know because I didn’t ask.

As in the example, results are sometimes used to promote additional points that were not the focus of the data gathering. Pay close attention to what data was actually measured, asked, or gathered. Does it match the point the visualization is trying to make?

To sum up, we should not believe every nice visualization we find, especially if it comes without any information on the data source and how the data was collected. In such cases turn on your critical thinking, kick your System II awake, and be as inquisitive as Sherlock Holmes!
