In this blog post I will show you how to create three types of heatmaps in R using three different functions. A heatmap is a very intuitive type of chart: if you use colors in the right way, almost no explanation is needed to understand the conclusion. I will start with the easiest function, which needs little explanation. The second function is slightly more advanced, and the third is the one I typically use in business settings.
For this heatmap I use a data set from the Dutch national weather service, KNMI. They collect all sorts of information on different locations in the Netherlands. You can find information about rainfall, hours of sun per day, humidity and a lot more variables. I’m going to download the data set with temperatures per day since 2000. I want to find out if the average temperature per month has changed since 2000.
The first few lines of code load a couple of libraries in R. Then the data is loaded, which contains three columns: the selected weather station, the date and the temperature.
library(reshape) # used for cast function
library(plyr) # used for ddply function
library(RColorBrewer) # used to customize heatmap colors
library(gplots) # used for heatmap.2 function
library(ggplot2) # used for ggplot function
# Read data
data <- read.table('.../KNMI_temperature.txt',
sep = ',',
col.names = c("Station","Date","Temperature"))
# Station Date Temperature
#1 260 20000101 81
#2 260 20000102 87
#3 260 20000103 96
#4 260 20000104 94
#5 260 20000105 74
#6 260 20000106 91
I want the heatmap to show the years and months on the axes, with the color based on the average temperature. To get there, we need to split the date field into a year, month and day column. With these columns, we can calculate the average temperature per month. The temperature is recorded in units of 0.1 degrees Celsius, so I divide it by 10 to get ordinary degrees Celsius.
# Create columns for year, month and day and calculate the temperature in Celsius
data$Year <- substr(data$Date, 1, 4)
data$Month <- substr(data$Date, 5, 6)
data$Day <- substr(data$Date, 7, 8)
data$Temperature <- data$Temperature/10
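As an alternative to slicing the date with substr(), the same columns can be derived by parsing the field as a real date. The sketch below is a minimal example in base R on a small made-up data frame; the name `toy` and its values are illustrative and are not KNMI data.

```r
# Toy data in the same layout as the KNMI file (values are made up)
toy <- data.frame(Station = 260L,
                  Date = c(20000101L, 20000102L, 20000215L),
                  Temperature = c(81, 87, 96))

# Parse the yyyymmdd number as a Date, then format the parts as strings
parsed <- as.Date(as.character(toy$Date), format = "%Y%m%d")
toy$Year  <- format(parsed, "%Y")
toy$Month <- format(parsed, "%m")
toy$Day   <- format(parsed, "%d")

# Convert from 0.1 degrees Celsius to degrees Celsius
toy$Temperature <- toy$Temperature / 10

print(toy)
```

A side benefit of this approach is that as.Date() returns NA for values that are not valid dates, which makes broken records easier to spot.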
After we have added the necessary columns, we can calculate the average temperature per month and cast it to a matrix format. I’m doing some additional steps: set the year numbers as row names, transpose the matrix and reorder the month numbers. These steps make sure that the y-axis of the heatmap shows the month numbers starting from January to December and the years on the x-axis.
data_heatmap <- ddply(data, .(Year, Month), summarize, Temp=mean(Temperature))
# Year Month Temp
#1 2000 01 6.561290
#2 2000 02 8.941379
#3 2000 03 10.112903
#4 2000 04 14.753333
#5 2000 05 20.022581
#6 2000 06 21.073333
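# Cast to wide format: one row per year, one column per month
data_heatmap_matrix <- cast(data_heatmap, Year ~ Month, value = "Temp")
rownames(data_heatmap_matrix) <- data_heatmap_matrix[,1]
data_heatmap_matrix[,1] <- NULL
# Transpose so that months are rows and years are columns
data_heatmap_matrix <- as.matrix(t(data_heatmap_matrix))
# Reverse the rows: heatmap() draws the first row at the bottom
data_heatmap_matrix <- data_heatmap_matrix[nrow(data_heatmap_matrix):1,]
# Part of the resulting matrix (shown in the order before the rows are reversed)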
# 2000 2001 2002 2003 2004
#01 6.561290 5.332258 6.706452 5.161290 6.025806
#02 8.941379 7.792857 10.164286 6.335714 7.924138
#03 10.112903 8.032258 11.787097 12.761290 10.261290
#04 14.753333 12.580000 14.303333 15.530000 15.733333
#05 20.022581 19.358065 18.248387 18.222581 17.580645
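If you would rather avoid the extra packages, the averaging and reshaping steps can also be sketched with base R alone. This is an alternative to the ddply() and cast() approach used in this post, not a replacement for it: the example uses tapply() on a small made-up data frame (the name `toy` and its values are illustrative) to build a month-by-year matrix of mean temperatures in one step.

```r
# Toy data with Year, Month and Temperature columns (values are made up)
toy <- data.frame(Year = c("2000", "2000", "2000", "2001", "2001"),
                  Month = c("01", "01", "02", "01", "02"),
                  Temperature = c(6, 8, 9, 5, 7))

# tapply() averages Temperature for every Month/Year combination and
# returns a matrix with months as rows and years as columns
toy_matrix <- tapply(toy$Temperature, list(toy$Month, toy$Year), mean)

# Reverse the rows so that heatmap() shows the first month at the top
toy_matrix <- toy_matrix[nrow(toy_matrix):1, ]

print(toy_matrix)
```

Month and year combinations without any observations come back as NA in this matrix, so it is worth checking for gaps before plotting.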
The basic heatmap function in R does everything you need without any fancy stuff. Perfect if you want to explore the data or if you are already happy with the visual itself. You can easily change the colors. In this article I use the RColorBrewer library for each heatmap. I’m not going into details about this library since there is so much to tell. I will explain the possibilities of the RColorBrewer library in another blog post.
The following code produces a simple heatmap:
heatmap(data_heatmap_matrix, Colv = NA, Rowv = NA,
scale = "none",
col = rev(brewer.pal(11, "RdBu")),
main = "Temperature in the Netherlands",
xlab = "Year",
ylab = "Month",
margins = c(4, 4))
Unfortunately, the heatmap() function cannot include a legend. Ideally I would also like to have the y-axis on the left side of the chart, but that is only possible if you adjust the code of the function itself.
If you want a slightly more sophisticated heatmap, you can use the heatmap.2() function from the gplots package. This function has more possibilities, such as adding a legend and distribution options. For this heatmap I add a legend which shows what the blue and red colors mean. dendrogram = "none" removes the tree structure for the variables, while Colv = FALSE and Rowv = FALSE present the categories in the order of the matrix. If these arguments are left out, the algorithm decides in which order to present the categories. Adding trace = "none" removes the distribution lines from the chart. In this case they would be distracting, but in many cases it is worth adding them.
# Heatmap two
heatmap.2(data_heatmap_matrix,
dendrogram = "none", Colv = FALSE, Rowv = FALSE,
scale = "none", col = rev(brewer.pal(11, "RdBu")),
key = TRUE, density.info = "none", key.title = NA, key.xlab = "Temperature",
trace = "none",
main = "Temperature in the Netherlands",
xlab = "Year",
ylab = "Month")
So much is possible with the ggplot() function that the code below is only a starting point for the many visualizations you can create. Compared with the two heatmaps described before, this one looks fairly similar. Adding the right colors is just as easy. The legend is included in the chart automatically, but it can be removed by adding guides(fill = "none") (older ggplot2 versions also accept guides(fill = FALSE)). The one thing that was not possible with the previous functions but is possible with ggplot() is to put the y-axis labels on the left side of the chart.
The code is largely self-explanatory, but it is worth going through it line by line:
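# Heatmap three
# Years on the x-axis, months on the y-axis
ggplot(data_heatmap, aes(Year, Month)) +
# One tile per year/month combination, filled by the average temperature
geom_tile(aes(fill = Temp)) +
# Reversed RdBu palette: blue for low and red for high temperatures
scale_fill_gradientn(colors = rev(brewer.pal(11, "RdBu"))) +
# Reverse the month order so that January is at the top
scale_y_discrete(limits = rev(unique(data_heatmap$Month))) +
ggtitle("Temperature in the Netherlands") +
# Remove grid lines, border and background, and rotate the year labels
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
panel.border = element_blank(),
panel.background = element_blank(),
axis.text.x = element_text(angle = 90))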
Depending on your goals and requirements, you can decide which function to use for your analysis. If you need to explore the data quickly to spot any relations and don’t want to spend time on making plots fancy, the heatmap() function is your best friend. The heatmap.2() function has slightly more options. If you want to be in full control of all the possibilities, then I would definitely use ggplot().
If you have any questions, feel free to ask in the comments!