One of the first things you probably do with a new dataset is check the number of records, count the variables and work out what the variables mean. Soon after, you will probably check whether there are any correlations between the variables. This gives you a good understanding of the data, and unexpected correlations may appear.
Correlations are usually presented in a matrix. A matrix gives you all the information you need, but it takes time to go through every value, and it is easy to miss the one correlation that matters for further analysis. I keep the correlation matrix as background information and sometimes show it to clients and business people.
The best way to show correlations is to visualize them in a correlation plot. Below I’ve listed a few ways to quickly visualize a correlation matrix in R. Several packages are available for this; I will use corrplot, GGally, ggcorrplot and ggplot2.
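If you do want to read the matrix itself, it helps to flatten it into a sorted list of variable pairs first. The snippet below is a minimal sketch in base R on the built-in mtcars data; the names upper_only and cor_pairs are only illustrative.

```r
# Correlation matrix of the built-in mtcars data set (Pearson by default)
correlation_matrix <- cor(mtcars)

# Keep each pair once by blanking the diagonal and the lower triangle
upper_only <- correlation_matrix
upper_only[lower.tri(upper_only, diag = TRUE)] <- NA

# Reshape to a long table of variable pairs
cor_pairs <- as.data.frame(as.table(upper_only), stringsAsFactors = FALSE)
names(cor_pairs) <- c("var1", "var2", "correlation")

# Drop the blanked cells and sort by absolute correlation, strongest first
cor_pairs <- cor_pairs[!is.na(cor_pairs$correlation), ]
cor_pairs <- cor_pairs[order(-abs(cor_pairs$correlation)), ]

head(cor_pairs, 5)
```

Sorting on the absolute value keeps strong negative correlations near the top as well, which are the ones most easily overlooked in a wide matrix.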
The corrplot package is the easiest way to get a good-looking visual of the correlations. It takes only seconds to produce a plot, which you can then adapt through a handful of arguments. Some of the arguments available:
- method = "circle", "square", "ellipse", "number", "pie", "shade" or "color".
- type = "full", "upper" or "lower".
- order = "original", "AOE", "FPC", "hclust" or "alphabet".
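library(corrplot)
library(RColorBrewer)

# Pearson correlations between all columns of mtcars
correlation_matrix <- cor(mtcars)

# Upper triangle as squares, variables ordered by hierarchical clustering
corrplot(correlation_matrix, method = "square", type = "upper",
         tl.col = "black", order = "hclust",
         col = brewer.pal(n = 5, name = "RdYlBu"))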
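To see how the arguments listed above change the result, here is a small variation that could be tried: the lower triangle only, with the coefficients printed as numbers and the variables kept in their original order. It is a sketch, not a recommended setting; the value of number.cex is an arbitrary choice to make the numbers fit.

```r
library(corrplot)

correlation_matrix <- cor(mtcars)

# Lower triangle only, coefficients shown as numbers, original variable order
corrplot(correlation_matrix,
         method = "number",
         type = "lower",
         order = "original",
         tl.col = "black",
         number.cex = 0.7)  # shrink the numbers so they fit in the cells
```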
The function ggcorr from the GGally package is another way to plot a correlation matrix. It has fewer options but does everything you need. It also supports different shapes through geom = "tile", "circle", "text" or "blank". By default, the lower triangle is plotted.
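library(GGally)

# ggcorr() treats its first argument as raw data,
# so a ready-made correlation matrix goes in via cor_matrix
ggcorr(data = NULL, cor_matrix = correlation_matrix,
       nbreaks = 5, palette = "RdYlBu", geom = "tile")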
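ggcorr can also compute the correlations itself when given the raw data, and it can print the coefficients on top of the tiles. The following sketch assumes the built-in mtcars data; the label rounding and size are arbitrary choices.

```r
library(GGally)

# Pass the raw data and let ggcorr compute the correlations itself
p <- ggcorr(mtcars,
            geom = "tile",
            nbreaks = 5,
            palette = "RdYlBu",
            label = TRUE,      # print the coefficient in each tile
            label_round = 2,   # number of decimals in the labels
            label_size = 3)

print(p)
```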
The third function in this article comes from the ggcorrplot package. It is very similar to the one from the GGally package, but this time you can also add ggplot2 functions to the result, which makes it much more flexible. As with most of these functions, you can plot only the upper or lower triangle, order the variables and apply a different color scheme.
library(ggcorrplot)
library(RColorBrewer)  # for brewer.pal()

# colors takes three values: low, mid and high
ggcorrplot(correlation_matrix, type = "upper", hc.order = TRUE,
           colors = brewer.pal(n = 3, name = "RdYlBu"))
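Because ggcorrplot returns a ggplot object, regular ggplot2 layers can be added to it with +. The sketch below adds a title and moves the legend; the title text and theme settings are only examples.

```r
library(ggcorrplot)
library(ggplot2)

correlation_matrix <- cor(mtcars)

# Start from ggcorrplot, then extend the result with ordinary ggplot2 calls
p <- ggcorrplot(correlation_matrix,
                type = "upper",
                hc.order = TRUE,
                lab = TRUE,       # print the coefficients in the tiles
                lab_size = 3) +
  labs(title = "Correlations in mtcars") +
  theme(plot.title = element_text(face = "bold"),
        legend.position = "bottom")

print(p)
```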
The most flexible option is to use ggplot2 directly, which lets you modify the correlation plot as much as you want. The basic code to start with is shown below. The plot doesn’t look too fancy, but with some additional code you can achieve the same result as in the previous examples. If you want to show only the upper or lower triangle, you need to do this in the data preparation; the same holds for ordering the variables.
library(ggplot2)
library(reshape2)

# melt() reshapes the matrix to long format with columns Var1, Var2 and value
ggplot(melt(correlation_matrix), aes(Var1, Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red") +
  coord_equal()
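The data preparation mentioned above could look something like this. It is a sketch under two assumptions of mine: the variables are ordered by hierarchical clustering with 1 - correlation as the distance, and the triangle is selected by setting the unwanted cells to NA before melting. Other orderings or distance measures work just as well.

```r
library(ggplot2)
library(reshape2)

correlation_matrix <- cor(mtcars)

# Order variables by hierarchical clustering (assumption: 1 - correlation as distance)
variable_order <- hclust(as.dist(1 - correlation_matrix))$order
ordered_matrix <- correlation_matrix[variable_order, variable_order]

# Blank the lower triangle so only the upper triangle of the matrix remains
ordered_matrix[lower.tri(ordered_matrix)] <- NA

# melt() keeps the row and column order as factor levels; na.rm drops blanked cells
plot_data <- melt(ordered_matrix, na.rm = TRUE)

p <- ggplot(plot_data, aes(Var1, Var2, fill = value)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       midpoint = 0, limits = c(-1, 1)) +
  coord_equal() +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(x = NULL, y = NULL, fill = "Correlation")

print(p)
```

Fixing the fill limits at -1 and 1 keeps the color scale comparable between plots of different datasets.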
I hope this post helps you choose the right function for your case. Whether you want control over every detail or just quick results, one of these functions should fit your needs.