Also known as a “rain plot” or “train in the rain” (more on that later), raincloud plots use multiple visual elements to illustrate distributional information. Most typically, these figures show distribution densities alongside dot plots, though they are often accompanied by some graphical representation of summary information, such as an indicator of mean, median, or inter-quartile range (IQR).
Example
Have you ever wondered whether horror movie ratings affect review scores? Consider the distributions of review ratings for films that received MPAA ratings of “TV-MA,” “PG-13,” or “R”:
[Note: intervals show the median and quantiles for the middle 95% of the respective distributions.]
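If you're curious how an interval like this can be computed, here is a minimal sketch in base R. The `ratings` vector is simulated purely for illustration (it is not the horror movie data), and I'm assuming the interval is a plain quantile interval, which is how I read the note above.

```r
# Simulated review ratings on a 0-10 scale (illustrative only, not the real data)
set.seed(42)
ratings <- round(runif(200, min = 1, max = 10), 1)

# Lower bound, median, and upper bound of the middle 95% of the values
interval <- quantile(ratings, probs = c(0.025, 0.5, 0.975))
interval
```

The middle value is the median, and the outer two values mark where the middle 95% of observations begin and end.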
Data
The data used in the above plot come from the Horror movie metadata data set hosted through Tidy Tuesday. These data represent 3,328 horror movie titles released between 2012 and 2017 and listed on IMDB. Variables include release title, genre, date, country of origin, MPAA rating, review rating (0-10), run time (minutes), plot description, cast list, language, filming location, and budget (USD).
For ease of comparison, data used in the plot were filtered to include only observations with MPAA ratings of “TV-MA,” “PG-13,” and “R” (\(n =\) 602), which were the three most frequent categories of films that had received MPAA ratings (that is, excluding films designated “Not Rated” or “Unrated”).
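The filtering step might look something like the sketch below. To keep it self-contained, I'm using a tiny made-up data frame called `horror_movies` (a hypothetical name, with invented values) in place of the real Tidy Tuesday file; the column names `movie_rating` and `review_rating` match the ones used in the plotting code on this page.

```r
# Toy stand-in for the real data (titles and values invented for illustration)
horror_movies <- data.frame(
  title         = c("A", "B", "C", "D", "E", "F"),
  movie_rating  = c("R", "PG-13", "Not Rated", "TV-MA", NA, "Unrated"),
  review_rating = c(5.1, 6.0, 4.2, 5.5, 3.9, 4.8)
)

# The three rating categories kept for the plot
keep_ratings <- c("TV-MA", "PG-13", "R")

# Keep only rows whose rating is one of the three categories
# (rows with a missing rating are dropped, since NA %in% ... is FALSE)
horror_movies_filtered <- subset(horror_movies, movie_rating %in% keep_ratings)

# Fix the category order so the plot rows appear in a predictable sequence
horror_movies_filtered$movie_rating <- factor(
  horror_movies_filtered$movie_rating,
  levels = keep_ratings
)

table(horror_movies_filtered$movie_rating)
```

Storing the rating as a factor is optional, but it gives you control over the order in which the categories are drawn.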
Representation Description
These data and the accompanying raincloud plot are a fun example intended for a broad audience with only a basic understanding of graphs. The plot is intended to be relatively intuitive to read, but we can go into much greater depth exploring the intricacies of this visualization method.
Raincloud plots merge density (the “cloud”; also sometimes referred to as a “half-violin”) and dot plot (the “rain”; sometimes represented as a jittered scatter plot) illustrations together into a hybrid representation of distributional information for continuous (or integer) numeric data. The overarching idea behind these plots is that a viewer can get a sense of the shape of a distribution without losing information about sample size. On their own, density plots can be misleading when there are very few observations. Likewise, it can be hard to visualize a “smooth” distribution from a dot plot. Raincloud plots combine these figure types together to ameliorate their respective weaknesses. To further supplement the density and dot elements in a raincloud plot, visual indicators of numeric center and variance can be incorporated as well, either with an interval marker or a boxplot (this convention is sometimes referred to as the “train in the rain,” since a boxplot can be thought of as a “boxcar”). This addition helps to eliminate the guesswork involved in trying to “eyeball” the center and proportions of a given distribution.
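As a rough sketch of how these three layers stack together, here is a small example using the ggdist package. The `toy` data frame is simulated, and using `stat_pointinterval()` with `.width = 0.95` is my own choice for drawing a median with a 95% quantile interval; a narrow `geom_boxplot()` would be the "boxcar" alternative.

```r
library(ggplot2)
library(ggdist)

# Simulated data: two groups with deliberately different sample sizes
set.seed(1)
toy <- data.frame(
  group = rep(c("a", "b"), times = c(40, 15)),
  value = c(rnorm(40, mean = 5, sd = 1), rnorm(15, mean = 6, sd = 1))
)

p <- ggplot(toy, aes(x = value, y = group)) +
  # The "cloud": a half-density drawn on one side of each row
  stat_slab(side = "right", scale = 0.5) +
  # The "rain": stacked dots drawn on the opposite side
  stat_dots(side = "left", scale = 0.5) +
  # The interval marker: by default, a median point with a quantile interval
  stat_pointinterval(.width = 0.95)

p
```

Notice how the smaller group still gets a smooth-looking cloud; it is the sparse rain underneath that tells you to be cautious about its shape.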
Going back to the example plot above, we can use these rainclouds to answer questions both about the shape and center of the review distributions for different movie ratings, while also noting differences in sample sizes. We see that reviews are mostly normally distributed, regardless of MPAA rating, and that the variability in ratings is similar for the three groups. However, we can also see that “PG-13” and “R”-rated movies have somewhat higher median ratings compared to “TV-MA” films, and that “R” movies make up the majority of observations, by far.
How to Read it and What to Look For
Raincloud plots are a great way to quickly get a lot of information about distributions. As such, they’re a great tool for exploratory data analysis, and your approach might depend on your emergent questions or motivations:
Are you investigating a single distribution?
Start by looking at the density plot. Is the distribution uni-modal? Bi-modal? Does it look to be normally distributed/symmetric, or is it skewed? In any case, how wide or spread out is the distribution? After answering these questions, look at the dot or scatter plot. Are there enough points present to make you feel confident in the shape of the density plot, or are there so few observations that the shape might change with the addition of only a few new points? Lastly, is there an interval or boxplot present? Do the center and spread it shows confirm your observations from looking at the density plot, or do they seem mismatched (e.g., if the distribution is bimodal, then a mean value is probably going to fall between the peaks, where few observations actually lie)?
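To make the bimodal case concrete, here is a small simulation in base R. Everything in it is invented for illustration: two clusters of 100 points centered at 2 and 8, plus a hypothetical helper `dens_at()` that reads the estimated density at a given value.

```r
# Simulated bimodal data: two clusters centered at 2 and 8
set.seed(123)
x <- c(rnorm(100, mean = 2, sd = 0.5), rnorm(100, mean = 8, sd = 0.5))

# The mean lands in the gap between the clusters
center <- mean(x)

# Kernel density estimate (the same idea as the "cloud")
d <- density(x)

# Hypothetical helper: interpolate the estimated density at a value v
dens_at <- function(v) approx(d$x, d$y, xout = v)$y

# Compare the density at the mean with the density near each peak
c(at_mean = dens_at(center), near_2 = dens_at(2), near_8 = dens_at(8))
```

The density at the mean is far lower than near either peak, which is exactly the kind of mismatch between a summary marker and the cloud that is worth noticing.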
Are you looking at multiple distributions?
Alternatively, you might want to compare distributions across two or more categories (such as in our example). You can again start by looking at the density plots to compare shape, spread, or alignment (i.e., center), but it’s possible that you’re more concerned with comparing sample sizes. If that’s the case, you can start by looking at the dot plots. Are there roughly a similar number of observations between the categories? Is one group notably smaller/larger than the others?
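A quick tabulation can back up what the dots suggest about group sizes. This is a minimal base R sketch; the counts in `toy` are invented and are not the real numbers from the horror movie data.

```r
# Toy data with deliberately unbalanced groups (counts are invented)
toy <- data.frame(
  movie_rating = rep(c("TV-MA", "PG-13", "R"), times = c(5, 12, 60))
)

# Observations per category, and each category's share of the total
counts <- table(toy$movie_rating)
shares <- prop.table(counts)

# Optional: build axis labels that state the sample size outright
n_labels <- paste0(names(counts), " (n = ", as.vector(counts), ")")
n_labels
```

Labels like these could be handed to something like `scale_y_discrete(labels = ...)` so that viewers don't have to count dots themselves.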
Presentation Tips
The biggest pitfall of using raincloud plots, in my opinion, is their propensity to look overly busy or cluttered. There are so many components competing for the viewer’s attention that, if we’re not careful, the average person will get lost trying to sort out what is or is not important. With this cautionary note in mind, here are some suggestions to help guarantee your raincloud plots are legible:
Use clear and descriptive labels
This advice is applicable to a wide assortment of graphs and figures (all of them?), but make sure that plot elements are clearly labelled. Are there multiple categories for multiple distributions? If so, is it obvious which is which? Is there an interval marker, rather than a boxplot? If so, can you tell whether the center and spread illustrate a median and IQR, or do they show the mean and standard deviation?
Create a hierarchy of elements
A great way to ensure that your raincloud plots don’t look too busy or cluttered is to establish a ranking of which plot elements are going to be most important in advance of you making the plot. Are you most interested in sample sizes? Then make your dot plots the most prominent component and de-emphasize the density elements (e.g., make them smaller or fainter/more transparent). Likewise, you might know in advance that you mainly care about the general shape of a distribution, so you can make the dots smaller and increase the size of the half-violin element(s).
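Here is one way the two hierarchies described above could be sketched with ggdist. The data are simulated, and the particular `scale` and `alpha` values are arbitrary starting points that I picked for illustration, not recommendations.

```r
library(ggplot2)
library(ggdist)

# Simulated data for two groups
set.seed(2)
toy <- data.frame(
  group = rep(c("a", "b"), each = 30),
  value = rnorm(60)
)

# Version 1: sample size first (large dots, small and faint density)
p_dots_first <- ggplot(toy, aes(x = value, y = group)) +
  stat_slab(side = "right", scale = 0.3, alpha = 0.3) +
  stat_dots(side = "left", scale = 0.6)

# Version 2: shape first (large density, small and faint dots)
p_shape_first <- ggplot(toy, aes(x = value, y = group)) +
  stat_slab(side = "right", scale = 0.7) +
  stat_dots(side = "left", scale = 0.3, alpha = 0.4)

p_dots_first
p_shape_first
```

Both plots show the same data; only the visual weight given to each element changes.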
Use adequate spacing and appropriate plot orientations
Another way to reduce clutter or confusion is to check that different elements aren’t squished too close together, especially if you have multiple categories and elements run the risk of colliding with each other (e.g., the dots of one row overlapping with an adjacent density). In any case, make sure that things have room to breathe, and maybe consider using color to help differentiate the elements if necessary, since that’s a good way to help delineate the start or end of a category in a series of rows or columns. Lastly, whether your plots are oriented vertically or horizontally, think about how this orientation affects the viewer and, if applicable, their ability to compare distributions. For example, it might be hard to compare two densities if you have them oriented away from each other with two sets of dot plots facing each other in the middle.
Variations & Alternatives
As one might imagine, the very nature of a raincloud plot as a combination of discrete plot elements means that there are myriad ways to vary and adjust these figures. The angular orientation of these plots is subject to change, so it’s common to see raincloud plots at \(0^{\circ}\) or \(\pm 90^{\circ}\). Likewise, the inclusion/omission of some interval element is optional, as is the type of interval and its placement relative to the rest of the plot (e.g., is the interval in the density, such as in our example above, or maybe between the density and dot plots? How about below the dots?). Stacked dot plots are commonplace, but so are jittered scatterplots (see the geom_rain() output at the end of this page for an example). At the end of the day, raincloud plots are fairly adjustable, and the execution of any particular plot will depend on the motivations of the person making the plot.
As a more unusual example case, though, let’s consider a raincloud plot with paired observations. The plot below is a recreation of a figure I made for a previous class (BSTA 526: R for Data Science) that shows data from the stock R data set sleep. There are two raincloud plots to illustrate changes in sleep duration for the same individuals before and after taking two different soporific drugs. This example shows that, even beyond the standard triad of dot/density/interval elements, we can add other layers to raincloud plots to represent additional dimensions, such as pairs of observations. It also shows that there are times when it makes sense to do things like mirror the orientations of two rainclouds:
Code
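    # Load ggplot2 and ggrain (ggrain provides geom_rain())
    library(ggplot2)
    library(ggrain)

    # Load the built-in sleep data set
    data("sleep")

    sleep |>
      ggplot(aes(x = group, y = extra)) +
      # Mirror the two rainclouds and connect each individual's paired observations
      geom_rain(
        rain.side = "f1x1",
        id.long.var = "ID",
        fill = "grey85"
      ) +
      theme_minimal() +
      labs(
        title = "Paired Changes in Sleep Duration Following Use of Two Soporific Drugs",
        x = "Drug Type",
        y = "Change in sleep duration (hrs)"
      ) +
      theme(
        axis.title = element_text(size = 12),
        axis.text = element_text(size = 12)
      )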
Method: How to Make One
A basic raincloud plot is very simple to create using the functions stat_slab() and stat_dots() from the ggdist package. Let’s practice recreating a basic version of the example plot at the top of the page:
    # Load the appropriate libraries in R
    library(tidyverse)
    library(ggdist)

    # Create a ggplot
    ggplot(
      # Select your data
      horror_movies_filtered,
      # Assign your global aesthetics
      aes(
        x = review_rating,
        y = movie_rating
      )
    ) +
      # Add the "slab" (i.e., density plot)
      stat_slab(
        # You can adjust the side and scale of the "slab"
        side = "right",
        scale = 0.5
      ) +
      # Add the dots
      ## Make sure the "side" is opposite the density plot!
      stat_dots(
        side = "left"
      )
Alternatively, one can make these plots using the function geom_rain() from the package ggrain. In my personal opinion, though, this route is slightly more difficult and yields a somewhat inferior result, mainly due to the overlapping dot elements:
    # Load the appropriate libraries in R
    library(ggrain)

    # Create a ggplot
    ggplot(
      # Select your data
      horror_movies_filtered,
      # Assign your global aesthetics
      aes(
        # See the comment below; this package requires us
        # to put the continuous variable along the y-axis
        x = movie_rating,
        y = review_rating
      )
    ) +
      # Add the `geom_rain()` layer
      geom_rain() +
      # As a quirk of this package, we need to use `coord_flip()`
      # if we want to have categories along the y-axis
      coord_flip()