Tableau is a very effective tool for basic geographical mapping, which can be powerful for insight work, exploratory analysis and even regular reporting. Yet while the process is dead simple with global or USA datasets, it’s slightly more involved for UK data. Only a few intermediate steps are required, but I’ve seen people thrown off and assume that Tableau doesn’t work simply because their data wasn’t in quite the right format.
I will explain the basics along with worked examples from publicly available datasets. I assume a rudimentary knowledge of Tableau, but no mapping experience is required. I’m splitting the UK mapping topic across three posts:
- Part 1 (this post) covers the basics, focusing on towns and cities, and also covers colour palette choice
- Part 2 will cover mapping UK postcodes – both postal sector and full postcode
- Part 3 will cover advanced area mapping such as county-level
Two principles for all UK mapping
- Tableau doesn’t always immediately understand when a field represents a geographical location. You may need to tell Tableau that the data is geographical, and that it relates to the UK.
- Tableau doesn’t work directly with full postcodes, e.g. SW1A 1AA, but it inherently understands the outward half of the postcode, e.g. SW1A (strictly the postcode district, which I refer to loosely as the postal sector). You can pre-transform the data, or create a calculated field to strip out only this part. This will be covered in the next post.
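If you go down the pre-transform route, the idea can be sketched in a few lines of Python before the data ever reaches Tableau. This is only a rough illustration: the function name `outward_code` is my own, and it assumes reasonably well-formed postcodes whose last three characters are a digit followed by two letters.

```python
import re

# The inward code (the second half of a UK postcode) is a digit plus two letters.
_INWARD = re.compile(r"\d[A-Z]{2}$")


def outward_code(postcode: str) -> str:
    """Return the part of the postcode before the space, e.g. 'SW1A' from 'SW1A 1AA'.

    Hypothetical helper for illustration only; it does not validate the postcode.
    """
    cleaned = re.sub(r"\s+", "", postcode).upper()
    # Too short, or no inward code on the end: assume it is already an
    # outward code (or malformed) and hand it back unchanged.
    if len(cleaned) < 5 or not _INWARD.search(cleaned):
        return cleaned
    return cleaned[:-3]


print(outward_code("SW1A 1AA"))
```

Applied to a whole column, this gives you a field that Tableau can be told is a postcode.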
Cities and towns
Tableau has inbuilt ‘understanding’ of around 750 towns and cities in the UK. It refers to these all as cities. While every self-respecting British citizen knows of course that city status is complex and historical and often requires a cathedral, the American-born Tableau doesn’t care about this distinction. As such, the city of St David’s / Tyddewi (population 1,841 at last census) is not recognised by Tableau.
Cities are represented as a single location, so can be mapped as a point but not as an area. England, Scotland, Wales and Northern Ireland are all covered.
Counties, boroughs and wards
Much publicly available UK data is at county, borough or ward level. If working with such datasets, avoid the temptation to call them ‘cities’ and have Tableau map them this way. While many of the points will be matched because they happen to share a name with a town, a substantial proportion will be unmatched. And no, you won’t end up with a random representative sample of towns. Rather, entire regions will be sparse or empty while others will be well represented, so you’ll get a very skewed view of reality. In part 3 of this mapping series I will cover the more advanced techniques to map such data correctly.
Worked Example – House Prices
I have combined a couple of ONS datasets relating to median house prices in Q2 2015 by city, and the 5-year relative change in median house price values. I’ve uploaded these as a csv to Gist here to save you time. If you want to source the data directly the files are available from the ONS here – specifically the csv files against figures 1 and 2 – not the tables but the charts.
The final workbook is available on Tableau Public here for reference. You can view the workbook or download it from the link at the bottom of the window.
Step 1 – Download the data from Gist here. If you download the zip, the filename is long, so you may want to rename before extracting the file.
Step 2 – Load the data into Tableau and go straight to worksheet 1
The ‘location’ field has not been recognised by Tableau as having a geographical role; it is being treated as a normal string datatype. If you highlight the field and click ‘show me’ you will note that both map types are greyed out. If the field had originally been named ‘City’ or ‘Town’ or similar, Tableau would have identified it as geographic automatically. (In fact, in the original source it was named ‘Town/City’, but I renamed it to ‘location’ to demonstrate this point.)
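If you would rather fix the header before loading the file, so that Tableau picks up the geographical role by itself, a small Python sketch along these lines would do it. The function name `rename_header` is hypothetical, and the column names in the usage line are just the ones from this example.

```python
import csv
import io


def rename_header(csv_text: str, old: str, new: str) -> str:
    """Rename one column in the header row of a CSV, leaving the data rows alone."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return csv_text  # nothing to rename in an empty file
    rows[0] = [new if col == old else col for col in rows[0]]
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Illustrative two-line file, not the real dataset.
sample = "location,price\nYork,1\n"
print(rename_header(sample, "location", "City"))
```

For a real file you would read its contents into a string first and write the result back out.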
Step 3 – Give the ‘location’ field a geographical context. Right click on the field, hover over ‘geographical role’ and click on ‘City’. The icon next to the field will change to a globe, representing the geographical datatype.
Next, we need to ensure that Tableau has understood that the data relates to the UK. If your system locale is set to UK and you’re working with a clean dataset like the example, then Tableau will usually understand this. However, with large real-life datasets and the associated data quality issues, this can often be a problem.
Step 4 – Click the Map menu up top and select ‘Edit Locations’. Click on the first dropdown (country/region), ensure that ‘fixed’ is selected, and that ‘United Kingdom’ is selected from the dropdown.
You will note that several locations are flagged up red in the location matcher. The location matcher can be particularly useful where you either have data-quality issues such as typos, or where your city names are slightly different, for example your data may say ‘Burton’ whereas Tableau wants ‘Burton upon Trent’. The red flags indicate either:
- Ambiguity – where a name matches multiple known cities. You can either choose the intended location from the dropdown, or specify latitude and longitude manually.
- Unknown areas – where a name is not recognised at all. You can either choose the matched location from the dropdown, or specify latitude and longitude manually.
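For a dataset you refresh regularly, fixing the same mismatches by hand each time gets tedious. One option is to tidy the names before loading. The sketch below is an assumption-laden illustration: the `ALIASES` table and `clean_location` function are my own names, and the only entry is the ‘Burton’ example from above. You would build up the table from whatever the location matcher flags in your own data.

```python
# Hypothetical alias table: the key is what the data says,
# the value is the name Tableau expects.
ALIASES = {
    "Burton": "Burton upon Trent",
}


def clean_location(name: str) -> str:
    """Collapse stray whitespace, then swap in the expected name if we know one."""
    tidy = " ".join(name.split())
    return ALIASES.get(tidy, tidy)


print(clean_location("  Burton "))
```

Names that are not in the table pass through unchanged, so the location matcher will still flag anything new.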
I am going to ignore the three unmatched locations for this example.
Step 5 – To appreciate this point more broadly, repeat Step 4, but this time instead of selecting ‘fixed’ in the country/region dropdown, select ‘none’.
You will note that many more cities become flagged red as ‘ambiguous’. This is because cities such as Birmingham, York etc are also major cities in the USA (or elsewhere in the world). Without the context of UK to work against, Tableau is unsure where exactly you mean.
Step 6 – Switch back to ‘Fixed’ and United Kingdom.
Your data is now set up correctly. You will have noticed that latitude and longitude fields have been automatically created. I will detail the use of these further in the next post, but for now you can ignore them.
Time to create the maps…
Step 7 – Click the ‘show me’ button in the top right. You will note that only symbol maps (points) are available, and area maps are greyed out. As mentioned earlier, Tableau only understands cities as points; area mapping for postcodes will be covered later. Click on the Symbol maps icon to generate a map.
Getting there… Tableau has produced the points, but has chosen the size to represent the price change, and the colour to represent the price. This works theoretically, but is not an intuitive view. One particular problem is that size cannot represent negative change effectively. We can’t have negative size, so the zero-point is unclear.
In cases like this we are better off using size to represent the absolute quantity (price), and colour to represent the change.
Step 8 – Drag the price measure (‘Median price 2015 Q2’) to the size shelf, then drag the % change measure to the colours shelf.
The cities with negative growth are now immediately obvious as the orange points. We can see that Cardiff and parts of the north west saw negative growth; the midlands and much of the north were moderately positive, and parts of the south-east and London saw the strongest growth.
A brief detour here. We can see an interesting pattern – in general the more expensive cities saw far greater price rises. Now to an extent this is a mathematical inevitability, since we’re looking at the final price which incorporates the change. But the trend appears too strong to be merely that. To confirm this, I created a measure of the original price (Q2 2010) and plotted this against the change – the correlation is extremely strong. This could be the result of a correction after the 2008 crash, a new housing bubble forming in wealthy areas, or a true sign of growing wealth inequality.
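For anyone wanting to repeat that check outside Tableau, the arithmetic can be sketched as follows. This is a sketch under assumptions rather than my actual workings: it assumes the change is a percentage relative to the starting price, the function names are my own, and the four rows are made-up numbers purely to show the mechanics.

```python
from math import sqrt


def original_price(final_price: float, pct_change: float) -> float:
    """Back out the starting price, assuming final = start * (1 + pct_change / 100).

    Undefined for a change of exactly -100%.
    """
    return final_price / (1 + pct_change / 100)


def pearson(xs, ys) -> float:
    """Plain Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up rows for illustration only: (final median price, 5-year % change).
rows = [(90_000, -5.0), (150_000, 10.0), (300_000, 25.0), (480_000, 50.0)]
start_prices = [original_price(price, change) for price, change in rows]
changes = [change for _, change in rows]
r = pearson(start_prices, changes)
print(round(r, 3))
```

Correlating the change against the starting price, rather than the final price, removes the built-in dependence mentioned above.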
Back to the map… The grey bubbles against a grey-ish background are still a bit of an issue. There is no consistent ideal solution to this, but it can be improved.
Step 9 – Click on the Map menu, then ‘Map Layers…’ and try playing around with style and washout until you minimise this. I went for ‘Normal’ background style with 30% washout.
Roads and other markers
I also find that adding roads as a map layer often helps people identify locations more easily, and this is even more true when we are zoomed in to a single city – the network of roads typically converging on the centre or circling around it acts as a reference base. Place names can be helpful, but can also be obscured by points in a busy dataset. Try to avoid crowding the background layer with too much information or too great a depth of colour, as the actual data points will lose clarity.
Colour palette selection
Colour palette selection is vital for effective mapping. There are several considerations:
- Do you need to show a neutral zero-point, or merely a continuous range?
- What is the background layer of your map like? Will it blend in too much?
- What is the distribution of the data? Are there major outliers or skew?
To demonstrate, one final example. I will focus on price only using a more expressive colour scale.
Step 10 – Duplicate the map sheet from above, to work on a copy.
Step 11 – Move median price from the size shelf to the colours shelf (we want to get rid of change entirely for this example).
Step 12 – Click on the colours shelf and press ‘edit colours’. Select ‘temperature diverging’ as the palette – this is right near the bottom of the list. Press OK.
This is a fairly effective view of the price distribution, certainly better than a single-colour palette. However, you will notice that most of the points are green. This is because the distribution is quite heavily right skewed, i.e. many of the locations are clustered at the lower price ranges, while a minority have far higher prices (that’s not a technical definition, but you get the idea). As can be seen below, roughly 87% of points lie below the mid-point of the distribution’s range.
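That ‘share below the mid-point’ figure is a quick skew check you can run on any measure before choosing a colour scale. A minimal sketch, with a function name of my own and toy numbers rather than the house price data:

```python
def share_below_midpoint(values) -> float:
    """Fraction of values that fall below the mid-point of the range (min + max) / 2."""
    lowest, highest = min(values), max(values)
    midpoint = (lowest + highest) / 2
    return sum(v < midpoint for v in values) / len(values)


# Toy numbers only: three low values and one high outlier.
print(share_below_midpoint([1, 2, 3, 10]))
```

A result close to 0.5 suggests a linear colour scale will use its full range; a result far above it means most points will share the colours at the low end.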
The problem for the map is that important differences at the lower end are obscured. We can barely tell a colour difference between Burnley (£78k) and Leicester (£140k). The downside of colour scales is that they are very sensitive to skew, and very sensitive to outliers. The point on outliers becomes even more important when you are dealing with large datasets.
There is no perfect solution here; it depends on what you want to show. Potential options include:
- Set a maximum point on the colour range. This can be done from the colour palette > advanced options. This is vital when you have major outliers, typically in a large dataset; otherwise 99% of points will lie right near the bottom and be the same colour.
- Use a logarithmic colour scale. This can be achieved by creating a calculated field which is the log of the price, and setting this as the colour, while the labels and/or popups can still show the actual price. Don’t worry about choice of log base: changing the base only multiplies every value by a constant, so the colours come out the same, and most people can’t interpret logs in their head anyway.
- An improvement on logs that many people find easier to interpret is to use a small number of discrete buckets spanning several colours. For example, in the above you could have light green representing £50k-£100k, yellow for £100k-£200k, orange for £200k-£400k and finally red for £400k+. This is best achieved with a calculated field hardcoding each range, rather than a fancy formula, or you’ll end up with odd-looking cutoff points such as £272,835.23.
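In Tableau each of these would be a calculated field or a palette setting, but the logic behind the three options is easy to see in a short Python sketch. The function names are my own, the £400k ceiling in the usage lines is an arbitrary choice, and the band edges are the example ones from the list above.

```python
import math


def capped(price: float, ceiling: float) -> float:
    """Option 1: clamp outliers so they all take the top colour."""
    return min(price, ceiling)


def log_price(price: float) -> float:
    """Option 2: colour by the log of the price (base 10 here; the base does not matter)."""
    return math.log10(price)


def price_band(price: float) -> str:
    """Option 3: hardcoded bands with round cutoffs, one colour per band."""
    if price < 100_000:
        return "Under £100k"  # in this sketch, anything below £50k also lands here
    if price < 200_000:
        return "£100k-£200k"
    if price < 400_000:
        return "£200k-£400k"
    return "£400k+"


# Burnley and Leicester prices quoted above; they now fall in different bands.
for price in (78_000, 140_000):
    print(price, capped(price, 400_000), round(log_price(price), 2), price_band(price))
```

Whichever transform you use for colour, keep the untransformed price in the label or tooltip so the reader sees real values.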
Play around with the colour scales and various transforms to see what suits. The solution is highly specific to your dataset, and the message you’re trying to convey in the chart.
This post has covered the basics of UK mapping and some principles around effective mapping and colour choice. For many commercial datasets, mapping can be done around postcode and I’ll cover this in the next post. Finally in part 3 I will cover mapping of counties, boroughs and wards – a lot of public sector data and publicly available datasets are at these levels.
If you found this useful and are interested in the next topics please subscribe below to be notified.