Three ways to create a heatmap in R

In this blog post I will show you how to create three types of heatmaps in R by using three different functions. A heatmap is a very intuitive type of chart: if you use colors in the right way, your conclusion needs almost no explanation. I will first focus on the easiest function, which does not need much explanation. The second function is slightly more advanced, whereas the third variation is the one I typically use in business settings.

 

Sections

  1. Data set
  2. Data preparation
  3. Heatmap function
  4. Heatmap.2 function
  5. Ggplot function
  6. Conclusion

 

Data set

For this heatmap I use a data set from the Dutch national weather service, KNMI. They collect all sorts of information at different locations in the Netherlands, such as rainfall, hours of sun per day, humidity and many more variables. I’m going to download the data set with daily temperatures since 2000, because I want to find out if the average temperature per month has changed since then.

 

Data preparation

The first few lines of code load a couple of libraries in R. Then the data is loaded, which contains three columns: the selected weather station, the date and the temperature.

library(reshape) # used for cast function
library(plyr) # used for ddply function
library(RColorBrewer) # used to customize heatmap colors

# Read data
data <- read.table('.../KNMI_temperature.txt',
                   sep = ',',
                   col.names = c("Station","Date","Temperature"))
head(data)
#  Station     Date Temperature
#1     260 20000101          81
#2     260 20000102          87
#3     260 20000103          96
#4     260 20000104          94
#5     260 20000105          74
#6     260 20000106          91
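
If you do not have the KNMI file at hand, you can still follow along with a few inline rows. The snippet below is only a sketch: it retypes the six rows printed above as comma-separated text and, like the call above, assumes there is no header line. The names sample_text and sample_data are mine, not part of the original script.

```r
# The six rows shown in the head() output above, typed in as comma-separated text
sample_text <- "260,20000101,81
260,20000102,87
260,20000103,96
260,20000104,94
260,20000105,74
260,20000106,91"

# read.table() can parse a character string through its 'text' argument
sample_data <- read.table(text = sample_text,
                          sep = ",",
                          col.names = c("Station", "Date", "Temperature"))

# Inspect the column types: Date is read as a plain integer, not as a date
str(sample_data)
```

Note that the date arrives as a number such as 20000101, which is why it has to be split up in the next step.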

I want the heatmap to show the years and months on the axes, with the color based on the average temperature. We need to split the date field into year, month and day columns. With these columns, we can calculate the average temperature per month. The temperature is recorded in units of 0.1 degrees Celsius, so I divide it by 10 to get ordinary degrees Celsius.

# Create columns for year, month and day and calculate the temperature in Celsius
data$Year <- substr(data$Date, 1, 4) # substr() positions start at 1
data$Month <- substr(data$Date, 5, 6)
data$Day <- substr(data$Date, 7, 8)
data$Temperature <- data$Temperature/10
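
As an alternative to cutting the number into pieces with substr(), the date can also be parsed as a real Date first. This is a sketch of that variation, not the approach used in the rest of this post; the data frame toy and its values are made up for illustration.

```r
# Made-up rows in the same layout as the KNMI data (illustration only)
toy <- data.frame(Station = 260L,
                  Date = c(20000101L, 20000215L, 20011231L),
                  Temperature = c(81, 87, -12))

# Parse the yyyymmdd number as a Date, then extract the parts as text
toy$Date  <- as.Date(as.character(toy$Date), format = "%Y%m%d")
toy$Year  <- format(toy$Date, "%Y")
toy$Month <- format(toy$Date, "%m")
toy$Day   <- format(toy$Date, "%d")

# Convert from tenths of a degree to degrees Celsius
toy$Temperature <- toy$Temperature / 10

toy
```

The advantage of a real Date column is that invalid dates turn into NA instead of silently producing odd month or day values.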

After we have added the necessary columns, we can calculate the average temperature per month and cast it to a matrix format. I’m doing some additional steps: set the year numbers as row names, transpose the matrix and reorder the month numbers. These steps make sure that the y-axis of the heatmap shows the month numbers starting from January to December and the years on the x-axis.

data_heatmap <- ddply(data, .(Year, Month), summarize,  Temp=mean(Temperature))
head(data_heatmap)
#  Year Month      Temp
#1 2000    01  6.561290
#2 2000    02  8.941379
#3 2000    03 10.112903
#4 2000    04 14.753333
#5 2000    05 20.022581
#6 2000    06 21.073333

data_heatmap_matrix <- cast(data_heatmap, Year ~ Month, value = "Temp") # name the value column explicitly
rownames(data_heatmap_matrix) <- data_heatmap_matrix[,1]
data_heatmap_matrix[,1] <- NULL
data_heatmap_matrix <- as.matrix(t(data_heatmap_matrix))
data_heatmap_matrix <- data_heatmap_matrix[nrow(data_heatmap_matrix):1,]
data_heatmap_matrix[1:5, 1:5]
#        2000      2001      2002      2003      2004
#01  6.561290  5.332258  6.706452  5.161290  6.025806
#02  8.941379  7.792857 10.164286  6.335714  7.924138
#03 10.112903  8.032258 11.787097 12.761290 10.261290
#04 14.753333 12.580000 14.303333 15.530000 15.733333
#05 20.022581 19.358065 18.248387 18.222581 17.580645
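
If you prefer not to depend on the reshape and plyr packages, a similar month-by-year matrix can be built with base R only. The sketch below uses tapply() on a small made-up data frame (toy), so it is an alternative under my own assumptions rather than the code used for the charts in this post.

```r
# Made-up daily values: two months in two years (illustration only)
toy <- data.frame(
  Year        = c("2000", "2000", "2000", "2001", "2001", "2001"),
  Month       = c("01", "01", "02", "01", "02", "02"),
  Temperature = c(6, 8, 9, 5, 7, 9)
)

# Average per Month (rows) and Year (columns) in one step
monthly <- tapply(toy$Temperature,
                  list(Month = toy$Month, Year = toy$Year),
                  mean)

# Reverse the row order, as in the post, so the last month comes first
monthly_reversed <- monthly[nrow(monthly):1, ]

monthly
monthly_reversed
```

Because tapply() already returns a matrix with months as rows and years as columns, the separate transpose step is not needed in this variation.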

 

Heatmap function

The basic heatmap function in R does everything you need without any fancy stuff. Perfect if you want to explore the data or if you are already happy with the visual itself. You can easily change the colors. In this article I use the RColorBrewer library for each heatmap. I’m not going into details about this library since there is so much to tell. I will explain the possibilities of the RColorBrewer library in another blog post.

The following code produces a simple heatmap:

heatmap(data_heatmap_matrix, Colv = NA, Rowv = NA,
        scale = "none", 
        col = rev(brewer.pal(11, "RdBu")),
        main = "Temperature in the Netherlands",
        xlab = "Year",
        ylab = "Month",
        margins = c(4, 4))

Heatmap created in R with the function heatmap()

Unfortunately, the heatmap() function cannot add a legend. Ideally I would also have the y-axis labels on the left side of the chart, but that is only possible if you adjust the code of the function itself.
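
On the colors: brewer.pal() returns at most 11 shades for the RdBu palette. If you want a smoother gradient, one option is to interpolate more shades with colorRampPalette() from base R. The sketch below uses plain color names and a made-up matrix (toy_matrix) so that it runs without any extra package or data file; you could pass rev(brewer.pal(11, "RdBu")) to colorRampPalette() instead.

```r
# Interpolate 50 shades from blue (cold) through white to red (warm)
palette_50 <- colorRampPalette(c("blue", "white", "red"))(50)

# Made-up 12 x 5 matrix standing in for data_heatmap_matrix (illustration only)
set.seed(1)
toy_matrix <- matrix(rnorm(60, mean = 10, sd = 5), nrow = 12,
                     dimnames = list(sprintf("%02d", 12:1),
                                     as.character(2000:2004)))

# Same call as in the post, but with the interpolated palette
heatmap(toy_matrix, Colv = NA, Rowv = NA,
        scale = "none",
        col = palette_50,
        xlab = "Year",
        ylab = "Month")
```

More shades give smoother transitions, but they also make it harder to read exact values from the chart, so it is a matter of taste.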

 

Heatmap.2 function

If you want a slightly more sophisticated heatmap, you can use the heatmap.2() function from the gplots package. This function has more possibilities, such as adding a legend and options to show the distribution of the values. For this heatmap I add a legend which shows what the blue and red colors mean. dendrogram = "none" removes the tree structure for the variables, while Colv = FALSE and Rowv = FALSE present the categories in the order of the matrix. If you leave these out, the algorithm decides in which order to present the categories. Adding trace = "none" removes the trace lines from the chart. In this case they would be distracting, but in many cases they are worth adding.

# Heatmap two
library(gplots)
heatmap.2(data_heatmap_matrix, 
          dendrogram = "none", Colv = FALSE, Rowv = FALSE,
          scale = "none", col = rev(brewer.pal(11, "RdBu")),
          key = TRUE, density.info = "none", key.title = NA, key.xlab = "Temperature",
          trace = "none",
          main = "Temperature in the Netherlands",
          xlab = "Year",
          ylab = "Month")

Heatmap created in R with the function heatmap.2()

 

Ggplot function

So much is possible with the ggplot() function that the code below is only a starting point for the many visualizations you can create. Compared with the two heatmaps above, this one looks fairly similar, and adding the right colors is just as easy. The legend is included automatically, but can be removed by adding guides(fill = "none") (the older form guides(fill = FALSE) is deprecated in recent ggplot2 versions). One thing that was not possible with the previous functions is having the y-axis labels on the left side of the chart, which ggplot() gives you.

The code speaks for itself, but it might be worth going through it line by line:

  • ggplot(data_heatmap, aes(Year, Month)) plots an empty canvas, telling it to use the data frame data_heatmap and having Year on the horizontal axis and Month on the vertical axis.
  • geom_tile(aes(fill = Temp)) plots, not surprisingly, tiles on the empty canvas. It uses the Temp variable to determine the intensity of the color for each tile. By default, it uses blue colors.
  • scale_fill_gradientn(colors = rev(brewer.pal(11, "RdBu"))) changes the colors. I want the heatmap to use blue colors for cold months and red colors for warm months. The color palette RdBu is the other way around, therefore we need to use rev() to reverse the order.
  • scale_y_discrete(limits = rev(unique(data_heatmap$Month))) reorders the vertical categories. In the original chart, the month December was displayed at the top left corner. For readability purposes, I want the chart to start with January.
  • ggtitle("Temperature in the Netherlands") simply adds a title to the canvas.
  • The final bit within theme() removes the gray background area, borders and rotates the labels on the x-axis.

# Heatmap three
library(ggplot2)
ggplot(data_heatmap, aes(Year, Month)) +
  geom_tile(aes(fill = Temp)) +
  scale_fill_gradientn(colors = rev(brewer.pal(11, "RdBu"))) +
  scale_y_discrete(limits = rev(unique(data_heatmap$Month))) +
  ggtitle("Temperature in the Netherlands") +
  theme(panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank(),
        axis.text.x = element_text(angle = 90))

Heatmap created in R with the function ggplot()
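
One detail worth knowing: rev(unique(data_heatmap$Month)) depends on the order of the rows in the data frame. A more explicit variation is to write the month order out yourself and, if you like, show month names instead of numbers. The sketch below does this on made-up data; the names toy and month_order, the temperature formula and the simple three-color gradient are my own choices for illustration.

```r
library(ggplot2)

# Made-up temperatures for every month of three years (illustration only)
toy <- expand.grid(Month = sprintf("%02d", 1:12),
                   Year = c("2000", "2001", "2002"),
                   stringsAsFactors = FALSE)
toy$Temp <- 10 - 8 * cos((as.integer(toy$Month) - 1) * pi / 6)

# Explicit month order: the first limit is drawn at the bottom,
# so listing December first puts January at the top
month_order <- rev(sprintf("%02d", 1:12))

p <- ggplot(toy, aes(Year, Month)) +
  geom_tile(aes(fill = Temp)) +
  scale_fill_gradientn(colors = c("blue", "white", "red")) +
  scale_y_discrete(limits = month_order, labels = rev(month.abb)) +
  ggtitle("Made-up temperatures")

print(p)
```

Writing the limits out like this also keeps all twelve months on the axis in a fixed order, even if the data happens to be sorted differently.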

Conclusion

Depending on your goals and requirements, you can decide which function to use for your analysis. If you need to explore the data quickly to spot any relations and don’t want to spend time on making plots fancy, the heatmap() function is your best friend. The heatmap.2() function has slightly more options. If you want to be in full control of all the possibilities, then I would definitely use ggplot().

If you have any questions, feel free to ask in the comments!
