Debunking the Word Cloud

Posted by
Vladimir Bagdanov on 3/6/18 8:49 AM

Even the best tool applied to the wrong problem can be much worse than a mediocre one used under the right circumstances. This is exactly the case with the word cloud, a ubiquitous data visualization tool that has been around since the 1990s and has only gained in popularity and acceptance since.

Word clouds come in many shapes and styles:

Various Word Clouds

 

What they all have in common is an intuitive representation of textual data. The human brain isn’t very good at parsing long lists of words or numbers and putting them all together. By nature, most humans are visual learners.

For example, wouldn’t you rather look at a map to get your bearings than at a long list of distances between different locations? Maps offer a quick and accurate perception of the “lay of the land”, or the big picture. In the world of data visualization, maps rule, with many, if not the majority, of graphing techniques falling into this category: from simple time graphs to 2- and 3-dimensional clustering.

A word cloud looks like a map and “quacks” like a map, which naturally leads us to believe it is one. But is it?

A true map preserves the distances between locations as they are in real life, with all the complexity of their inter-relations squeezed into a compact visual.

 

Let’s do a little experiment.

Given a list of weighted words, we will build a word cloud. Then, marginally tweaking the weights of just 3 of them (not the top ones) while keeping all words in the same order, we’ll build another one. The results for both are shown below:

Word Cloud Experiment

It’s easy to see that most words (except for the top ones) have changed their location and are now surrounded by completely new neighbors. If a geographic map behaved this way after every adjustment to a few distances, modern navigation would not be possible.

What happened here should not be a surprise if we consider what went into creating these two word clouds. The data was a weighted list of words with no information on the relationships between these words, just their ranked enumeration. One cannot build a map from a list of cities with just their respective populations. A computer has the same basic limitations and at best comes up with random locations.
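The experiment above can be mimicked with a toy layout. The sketch below is only an illustration under stated assumptions: `shelf_layout` is a hypothetical, deliberately simple row-packing routine (not the algorithm of any real word cloud tool), and the words and weights are made up. Its only input is the weighted list, so placement is driven purely by how the boxes happen to fit.

```python
def shelf_layout(weighted_words, row_width=36):
    """Toy word-cloud layout: pack words into rows, largest weight first.

    Hypothetical stand-in for a real layout engine. Each word's box width
    is len(word) * weight, mimicking a font size proportional to weight.
    """
    rows = [[]]
    used = 0.0
    for word, weight in sorted(weighted_words, key=lambda ww: -ww[1]):
        width = len(word) * weight
        # Start a new row when the word no longer fits in the current one.
        if used + width > row_width and rows[-1]:
            rows.append([])
            used = 0.0
        rows[-1].append(word)
        used += width
    return rows


# Made-up weights. Three non-top words are nudged; the ranking is unchanged.
before = [("wine", 5.0), ("beer", 4.0), ("vodka", 3.0),
          ("gin", 2.5), ("rum", 2.0), ("sake", 1.5)]
after = [("wine", 5.0), ("beer", 4.0), ("vodka", 3.4),
         ("gin", 2.6), ("rum", 2.2), ("sake", 1.5)]

print(shelf_layout(before))
print(shelf_layout(after))
```

With these numbers, "sake" sits next to "rum" in the first layout and ends up alone on a new row in the second, even though its own weight never changed. Nothing in the input says which words belong together, so neighborhoods are a side effect of packing.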

In the end we have a diagram that looks like a map and tricks us into interpreting it as one, including reading meaning into the proximities between words, even though these proximities are random to begin with and subject to wild changes. This is especially apparent when comparing any two word clouds made for the same subject (e.g. brand-related conversation before and after a campaign).

No doubt word clouds are a great way to spice up boring lists. They can represent prominence (via font size) and categorization (via color). If properly designed, they can be great eye candy. In the end though, they are just another way to look at a weighted list of topics; not even more compact, just prettier. Additionally, one always has to remind oneself not to over-interpret a word cloud with regard to locations and relative distances (as much as its map-like looks tempt us to do so).

But what if a word cloud’s visual appeal could be combined with the accurate representation of relational data found in geographic maps?

Relations (distances) are what make maps so powerful and, since we are better equipped to deal with spatial visuals (vs. text and numbers), so much more effective. Relational data is more complex and voluminous than a simple enumeration: think of a square table vs. a single column in Excel. It is also much more difficult to come by: it has to be collected, processed, and standardized. Yet, absent this data, no map can be built.
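To make the “square table vs. one column” contrast concrete, here is a minimal Python sketch. The four posts are made up for illustration, and Jaccard distance over co-occurrence is just one assumed way to define proximity; the function names are hypothetical.

```python
from itertools import combinations

# Made-up posts, each reduced to the set of words it mentions.
posts = [
    {"wine", "cheese", "dinner"},
    {"wine", "dinner"},
    {"beer", "football"},
    {"beer", "football", "dinner"},
]


def word_weights(posts):
    """The 'one column': how many posts mention each word."""
    counts = {}
    for post in posts:
        for word in post:
            counts[word] = counts.get(word, 0) + 1
    return counts


def pairwise_distances(posts):
    """The 'square table': Jaccard distance between every pair of words.

    Distance is 0.0 when two words always appear in the same posts and
    1.0 when they never appear together.
    """
    vocab = sorted(set().union(*posts))
    seen_in = {w: {i for i, p in enumerate(posts) if w in p} for w in vocab}
    return {
        (a, b): 1 - len(seen_in[a] & seen_in[b]) / len(seen_in[a] | seen_in[b])
        for a, b in combinations(vocab, 2)
    }


weights = word_weights(posts)
distances = pairwise_distances(posts)
print(len(weights), "weights vs.", len(distances), "pairwise distances")
```

In this toy data, "wine", "beer" and "football" all carry the same weight, so a weighted list cannot tell them apart. Only the pairwise table records that "beer" and "football" always occur together while "beer" and "wine" never do.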

 

A better solution is coming

Traditional Social Listening platforms measure volumes of different Brands, Products, etc. via user posting activity. This results in lists of things, e.g. frequencies of words used in conjunction with a Brand mention. That data has been the basis for building word clouds in social analytics so far. Lacking any relational component, it cannot produce a true map of the semantic (topical) representation. Artistic and graphical techniques improve over time, but the basic limitations remain the same.

With the second wave of Social Insights analytics (e.g. in the Social Standards offerings) we are starting to see relational data taking the leading role in all interpretive aspects of market tracking and representation. This, in turn, creates the possibility for a new generation of relational maps that not only look pretty, but also accurately represent the data.

The Social Standards platform today offers a complete set of relational data points for all objects tracked within the system, resulting in hundreds of millions of fully normalized proximity values (distances between objects) for each month.

With such a large scope of data, one must apply sophisticated tools for extracting meaningful insights. Our platform does just that with real-time sorting, smart targeted filtering, clustering, exception-based highlighting, time-series analysis, and more, enabling the proactive detection of key actionable insights relevant to a client’s needs. While these methods work well for detecting specific points of interest, they do not provide a full picture of how different objects relate to each other.

The best-known way of visually capturing the relational complexity within a set of data is a Map. This new exciting functionality is one of the many graphing techniques that have been under development at Social Standards. We’ve recently released this feature into production.

Now users can see how the objects in any set relate to each other in terms of their demographic profiles, likes and interests of their audience, activity dynamics, correlation, co-occurrence, etc. Below is an example of one such visualization for BevAl Product Types that reduces all the numeric complexity to an intuitive Relational Map, with similar product types close together and dissimilar ones far apart.

BevAl Relational Map: This Social Standards Relational Map is grouped by product types in the Beverage Alcohol industry.
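For readers curious how pairwise distances can be turned into map coordinates at all, below is a minimal sketch using classical multidimensional scaling (MDS) in plain Python. This is a textbook technique chosen for illustration, not a description of how the Social Standards Relational Map is computed. The function names, the four product types, and their distances are all made up.

```python
import math
import random


def classical_mds(dist, dims=2, iters=200, seed=0):
    """Place n objects in `dims` dimensions from a symmetric distance table.

    Classical MDS: double-center the squared distances, then take the top
    eigenvectors (found here by power iteration with deflation).
    """
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]

    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.uniform(-1, 1) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
        scale = math.sqrt(max(eigval, 0.0))
        for i in range(n):
            coords[i][k] = scale * v[i]
        # Deflate so the next pass finds the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigval * v[i] * v[j]
    return coords


def map_distance(p, q):
    """Straight-line distance between two placed objects."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Made-up distances between four hypothetical product types.
names = ["beer", "cider", "whisky", "rum"]
far = math.sqrt(10)
dist = [
    [0.0, 1.0, 3.0, far],
    [1.0, 0.0, far, 3.0],
    [3.0, far, 0.0, 1.0],
    [far, 3.0, 1.0, 0.0],
]
positions = classical_mds(dist)
for name, (x, y) in zip(names, positions):
    print(f"{name:>7}: ({x:+.2f}, {y:+.2f})")
```

Because the placement is derived from the distances themselves, objects that are close in the data stay close on the map, up to rotation and mirroring. Real proximity data rarely fits two dimensions exactly, so a production map would have to accept some distortion.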

Other visualization features that are still under development should be gradually appearing in our products over the course of this year. Stay tuned for more updates.

 

Topics: social data, word cloud, data visualization, relational map
