# Three ways to create a heatmap in R

In this blog post I will show you how to create three types of heatmaps in R using three different functions: heatmap(), heatmap.2() and ggplot(). A heatmap is a very intuitive type of chart. If you use colors well, your conclusion needs almost no explanation. I will start with the easiest function, which needs little commentary. The second function is slightly more advanced, and the third is the one I typically use for business work.

## Sections

1. Data set
2. Data preparation
3. Heatmap function
4. Heatmap.2 function
5. Ggplot function
6. Conclusion

## Data set

For this heatmap I use a data set from the Dutch national weather service, KNMI. They collect all sorts of information at different locations in the Netherlands: rainfall, hours of sunshine per day, humidity and many more variables. I download the data set with daily temperatures since 2000, because I want to find out whether the average temperature per month has changed since then.

## Data preparation

The first few lines of code load a couple of libraries in R. Then the data is loaded, which contains three columns: the selected weather station, the date and the temperature.

I want the heatmap to show the years and months on the axes, with the color based on the average temperature. To get there, we split the date field into year, month and day columns, which lets us calculate the average temperature per month. The temperature is recorded in units of 0.1 °C, so I divide it by 10 to get ordinary degrees Celsius.
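The splitting, unit conversion and monthly averaging can be sketched in base R as follows. Because the downloaded file is not reproduced here, the sketch builds a small hypothetical stand-in with the three columns described above, filled with synthetic values rather than real measurements. The column names STN, YYYYMMDD and TG are assumptions modeled on KNMI's daily data files (TG as the daily mean temperature in 0.1 °C), so check them against your own download.

```r
# Hypothetical stand-in for the KNMI download (assumed column names):
# STN = station, YYYYMMDD = date as an integer, TG = daily mean temp in 0.1 °C
dates <- seq(as.Date("2000-01-01"), as.Date("2001-12-31"), by = "day")
day_of_year <- as.integer(format(dates, "%j"))
weather <- data.frame(
  STN = 260,
  YYYYMMDD = as.integer(format(dates, "%Y%m%d")),
  # Synthetic seasonal curve, NOT real measurements
  TG = round(100 - 70 * cos(2 * pi * (day_of_year - 15) / 365))
)

# Split the date into year, month and day columns using integer arithmetic
weather$Year  <- weather$YYYYMMDD %/% 10000
weather$Month <- (weather$YYYYMMDD %/% 100) %% 100
weather$Day   <- weather$YYYYMMDD %% 100

# Convert from tenths of a degree to degrees Celsius
weather$Temp <- weather$TG / 10

# Average temperature per year and month (one row per combination)
data_heatmap <- aggregate(Temp ~ Year + Month, data = weather, FUN = mean)
head(data_heatmap)
```

Integer division keeps the date handling free of any date-parsing package. Converting YYYYMMDD with as.Date(as.character(...), "%Y%m%d") and format() would work just as well.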

After adding the necessary columns, we calculate the average temperature per month and cast the result to a matrix. I take a few additional steps: set the year numbers as row names, transpose the matrix and reorder the month numbers. These steps make sure that the heatmap shows the months on the y-axis, starting with January and ending with December, and the years on the x-axis.
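A base-R sketch of the casting steps follows. It starts from a small hypothetical data_heatmap in long format (Year, Month, Temp) with synthetic values. I use tapply() for the cast, which is an assumption: any reshaping function that produces a Year by Month matrix works the same way.

```r
# Hypothetical long-format monthly averages: one row per Year/Month pair
data_heatmap <- expand.grid(Month = 1:12, Year = 2000:2002)
# Synthetic temperatures (°C), NOT real KNMI values
data_heatmap$Temp <- round(10 - 7 * cos(2 * pi * (data_heatmap$Month - 1) / 12), 1)

# Cast to a Year x Month matrix; tapply() uses the years as row names
temp_matrix <- tapply(data_heatmap$Temp,
                      list(data_heatmap$Year, data_heatmap$Month),
                      mean)

# Transpose: months become rows (y-axis), years become columns (x-axis)
temp_matrix <- t(temp_matrix)

# Reverse the month order. Base heatmap() draws the first row at the bottom,
# so putting December first makes January appear at the top.
temp_matrix <- temp_matrix[nrow(temp_matrix):1, ]
temp_matrix[1:3, ]
```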

## Heatmap function

The basic heatmap() function in R does everything you need without any fancy extras. It is perfect if you want to explore the data or if you are already happy with the visual itself. You can easily change the colors. In this article I use the RColorBrewer library for each heatmap. I won't go into detail about this library, since there is so much to tell. I will explain its possibilities in another blog post.

The heatmap() function needs little more than the prepared matrix and a color palette to produce a simple heatmap.
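A minimal sketch of such a call follows. It is not necessarily the exact code behind the original chart, and it builds a small synthetic matrix in the shape described in the data preparation section. Rowv = NA and Colv = NA switch off the dendrograms and the reordering. scale = "none" makes the colors reflect the raw temperatures, whereas heatmap()'s default, scale = "row", would standardize each month separately.

```r
library(RColorBrewer)

# Hypothetical 12 x 3 matrix: months as rows (December first),
# years as columns. Synthetic values, NOT real KNMI data.
months <- 12:1
temp_matrix <- sapply(2000:2002, function(year)
  round(10 - 7 * cos(2 * pi * (months - 1) / 12) + 0.2 * (year - 2000), 1))
dimnames(temp_matrix) <- list(months, 2000:2002)

hm <- heatmap(temp_matrix,
              Rowv = NA, Colv = NA,              # keep matrix order, no dendrograms
              scale = "none",                    # color by the raw temperatures
              col = rev(brewer.pal(11, "RdBu")), # blue = cold, red = warm
              main = "Temperature in the Netherlands")
```

heatmap() invisibly returns the row and column order it used. With Rowv = NA and Colv = NA, that order is simply the matrix order.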

Unfortunately, the heatmap() function has no option to include a legend. Ideally I want the y-axis labels on the left side of the chart, but that is only possible if you adjust the code of the function itself.

## Heatmap.2 function

If you want a slightly more sophisticated heatmap, you can use the heatmap.2() function from the gplots package. It has more options, such as adding a legend (color key) and showing distribution information. For this heatmap I add a legend that shows what the blue and red colors mean. dendrogram = "none" removes the tree structure for the rows and columns, and Colv = FALSE and Rowv = FALSE present the categories in the order of the matrix. If you leave these out, the clustering algorithm decides in which order to present the categories. Adding trace = "none" removes the distribution (trace) lines from the chart. In this case they would be distracting, but in many cases they are worth adding.
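A sketch of a heatmap.2() call with the arguments described above follows. The matrix is again a small synthetic stand-in, and the exact argument set of the original chart is an assumption.

```r
library(gplots)        # provides heatmap.2()
library(RColorBrewer)

# Hypothetical 12 x 3 matrix: months as rows, years as columns.
# Synthetic values, NOT real KNMI data.
months <- 12:1
temp_matrix <- sapply(2000:2002, function(year)
  round(10 - 7 * cos(2 * pi * (months - 1) / 12) + 0.2 * (year - 2000), 1))
dimnames(temp_matrix) <- list(months, 2000:2002)

hm2 <- heatmap.2(temp_matrix,
                 dendrogram = "none",               # no tree structure
                 Rowv = FALSE, Colv = FALSE,        # keep the matrix order
                 trace = "none",                    # hide the trace lines
                 key = TRUE,                        # color key (legend)
                 col = rev(brewer.pal(11, "RdBu")), # blue = cold, red = warm
                 main = "Temperature in the Netherlands")
```

Check the vertical order of the months in the result. heatmap.2() does not necessarily draw rows in the same direction as heatmap(), so the month reordering used for heatmap() may need adjusting here.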

## Ggplot function

So much is possible with the ggplot() function that the code discussed here is only the start of the many visualizations you can create. Compared with the previous two, this heatmap looks fairly similar, and adding the right colors is just as easy. The legend is included automatically, but you can remove it by adding guides(fill = "none"). Older examples often use guides(fill = FALSE), which recent ggplot2 versions deprecate. One thing that was not possible with the previous functions, but is possible with ggplot(), is placing the y-axis labels on the left side of the chart.

The code largely speaks for itself, but it is worth going through it line by line:

• ggplot(data_heatmap, aes(Year, Month)) creates an empty canvas that uses the data frame data_heatmap, with Year on the horizontal axis and Month on the vertical axis.
• geom_tile(aes(fill = Temp)) plots, not surprisingly, tiles on the empty canvas. It uses the Temp variable to determine the intensity of the color for each tile. By default, it uses blue colors.
• scale_fill_gradientn(colors = rev(brewer.pal(11, "RdBu"))) changes the colors. I want the heatmap to use blue for cold months and red for warm months. The RdBu palette runs from red to blue, so rev() reverses its order.
• scale_y_discrete(limits = rev(unique(data_heatmap$Month))) reorders the vertical categories. Without this line, December was displayed in the top left corner of the chart. For readability, I want the chart to start with January.
• ggtitle("Temperature in the Netherlands") simply adds a title to the canvas.
• The final bit within theme() removes the gray background and borders, and rotates the labels on the x-axis.
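Putting these lines together gives something like the following sketch. It is reconstructed from the descriptions above, not copied from the original code. The synthetic data_heatmap, the choice to store Year and Month as character (so both axes are discrete) and the exact theme() settings, in particular angle = 90 for the x-axis labels, are assumptions.

```r
library(ggplot2)
library(RColorBrewer)

# Hypothetical long-format data (Year, Month, Temp); synthetic values.
# Year and Month are stored as character so both axes are discrete.
data_heatmap <- expand.grid(Month = 1:12, Year = 2000:2002)
data_heatmap$Temp <- round(10 - 7 * cos(2 * pi * (data_heatmap$Month - 1) / 12) +
                             0.2 * (data_heatmap$Year - 2000), 1)
data_heatmap$Year  <- as.character(data_heatmap$Year)
data_heatmap$Month <- as.character(data_heatmap$Month)

p <- ggplot(data_heatmap, aes(Year, Month)) +
  geom_tile(aes(fill = Temp)) +
  scale_fill_gradientn(colors = rev(brewer.pal(11, "RdBu"))) +
  # Discrete limits run bottom to top: "12" at the bottom, "1" at the top
  scale_y_discrete(limits = rev(unique(data_heatmap$Month))) +
  ggtitle("Temperature in the Netherlands") +
  theme(panel.background = element_blank(),  # remove the gray background
        panel.border = element_blank(),       # no border around the panel
        axis.text.x = element_text(angle = 90, vjust = 0.5))  # rotate year labels
print(p)
```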

## Conclusion

Depending on your goals and requirements, you can decide which function to use for your analysis. If you need to explore the data quickly to spot relationships and don't want to spend time making plots fancy, the heatmap() function is your best friend. The heatmap.2() function has slightly more options. If you want full control over all the possibilities, I would definitely use ggplot().

If you have any questions, feel free to ask in the comments!

### Who I am

Hi! My name is Claudia, and I'm a freelance data analyst/scientist. This is my space on the internet where I share knowledge and experience with everyone who wants to become a better analyst. Read more about my work as a freelancer here.