Cumulative MoMA Painting Acquisitions by Artist Gender

From Lab 2

Published

April 14, 2026

How many paintings did the Museum of Modern Art (MoMA) in New York City cumulatively acquire between 1930 and 2017? Did overall acquisitions differ between male and female artists? Did acquisition rates differ by artist gender?

Code
# Make our new plot
artworks_long |> 
  ggplot(aes(
    x = year_acquired,
    y = total_by_gender,
    fill = artist_gender
  )) +
  
  # Use `geom_area()` and adjust position so the groups don't
  # "stack" on top of each other
  geom_area(
    color = "black",
    position = "identity"
  ) +
  
  # Adjust the theme and legend properties
  theme_bw() +
  theme(
    legend.position = "inside",
    legend.position.inside = c(0.2,0.75),
    legend.background = element_rect(color = "black"),
    axis.text = element_text(size = 10),
    axis.title = element_text(size = 12)
  ) +
  
  # Fix our labels
  labs(
    title = "Cumulative Number of Paintings Acquired Over Time by Artist Gender",
    subtitle = "(MoMA, 1930-2017)",
    x = "Year",
    y = "Number of Acquired Works",
    fill = "Artist Gender"
  )

Data

These data come from a record set of inventoried MoMA artworks hosted on GitHub. The data have been cleaned for use by our BMI 525 cohort and filtered to include only paintings. Rows in the data set represent individual artworks, with associated variables for information like title, artist, year_acquired, height_cm, width_cm, etc.

Audience

This plot is intended for a broad audience of scientists and non-scientists alike (e.g., the general readership of a magazine or blog about museums).

Graph Type

This plot is an example of an area plot, which is a type of line plot where the area under a given line is filled in with a particular color. Of note, area plots with multiple categories (e.g., two genders) can show either absolute or “stacked” values. So-called “stacked” area plots get their name because they “stack” the areas of multiple groups on top of one another, such that the upper-most line shows the total of all categories together and the height of each color block shows that category's contribution to the total. The plot above, however, shows absolute values for each group: both areas are drawn from zero and overlaid, so the two color blocks can be read independently of one another.
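The difference between the two modes can be sketched with a small made-up data set. The `toy` frame and its column names below are hypothetical and are not part of the MoMA data; the only thing that changes between the two plots is the `position` argument of `geom_area()`.

```r
library(ggplot2)

# Hypothetical toy data: cumulative totals for two groups over three years
toy <- data.frame(
  year  = rep(c(2000, 2001, 2002), times = 2),
  group = rep(c("a", "b"), each = 3),
  total = c(1, 2, 3, 10, 20, 30)
)

# Default position ("stack"): the areas are piled on top of each other,
# so the upper-most edge traces the sum of both groups
p_stacked <- ggplot(toy, aes(x = year, y = total, fill = group)) +
  geom_area()

# position = "identity": each area is drawn from zero,
# so each top edge traces that group's own total
p_identity <- ggplot(toy, aes(x = year, y = total, fill = group)) +
  geom_area(position = "identity")
```

In the stacked version the top edge reflects the combined total of groups `a` and `b`, whereas in the identity version the tallest edge reflects only the larger group.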

Representation Description

The plot shows the cumulative number of paintings (y-axis) acquired by the MoMA from male and female artists (indicated by color/fill), from 1930 to 2017 (left to right on the x-axis). The shaded regions for male and female artists are overlaid and indicate independent group totals. We see that the total number of works by both male and female artists has increased over time. The curve for male artists, however, is steeper than the curve for female artists throughout the period, even up to 2017, suggesting that the museum has historically acquired paintings by male artists at a faster rate than paintings by female artists.
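One way to put a number on the "steepness" of each curve is the average number of works gained per year, i.e., the change in the cumulative total divided by the number of years elapsed. The sketch below uses invented values in a frame shaped like the long-format data used for the plot (columns `year_acquired`, `artist_gender`, `total_by_gender`); the numbers are hypothetical and are not results from the MoMA data.

```r
library(dplyr)

# Hypothetical cumulative totals (invented values, for illustration only)
toy_long <- tibble(
  year_acquired   = rep(c(1930, 1940, 1950), each = 2),
  artist_gender   = rep(c("male", "female"), times = 3),
  total_by_gender = c(10, 1, 60, 5, 110, 11)
)

# Average slope per group: change in cumulative total / years elapsed.
# Cumulative totals never decrease, so max - min is the overall gain.
avg_rates <- toy_long |>
  group_by(artist_gender) |>
  summarize(
    works_per_year =
      (max(total_by_gender) - min(total_by_gender)) /
      (max(year_acquired) - min(year_acquired))
  )
```

This gives a single average per group, so it will not capture the year-to-year fluctuations in slope that are visible in the plot.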

Tips for Interpretation

Start by looking at the x- and y-axes to note the time span and the range of acquired paintings, respectively. Look at the color blocks one at a time: how does the overall area of the female artist block compare to the area of the male artist block? Now, compare the slopes of the two shapes (i.e., the steepness of the top edge as we move left to right along the x-axis). Does one color block have a steeper slope than the other? Which one? Are there any fluctuations in these slopes over the years?

Presentation Considerations

The default color set has been used to fill in the areas for the male and female gender groups, since the basic two-category scenario in the present case is easily handled by the default palette. The areas were not “stacked,” since we want to know about total acquisitions within each gender group, not the proportion of the two groups, per se.

Method

Starting from the cleaned and filtered (paintings only) data, we first arrange the works by year of acquisition and then tabulate the cumulative sum of paintings by artist gender. Afterwards, the data frame is pivoted longer so that each row holds one gender's running total.

Code
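# Load the wrangling packages (skip if already attached earlier in the session)
library(dplyr)
library(tidyr)

# Create new variables `total_(fe)male_artists` and pivot longer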
artworks_long <- artworks_clnd2 |> 
  
  # Filter out `NA` values
  filter(
    !is.na(artist_gender),
    !is.na(year_acquired)
  ) |> 
  
  # Sort by year of acquisition
  arrange(year_acquired) |> 
  
  # Calculate cumulative sum of works by gender
  mutate(
    total_female_artists = cumsum(n_female_artists),
    total_male_artists = cumsum(n_male_artists)
  ) |> 
  
  # Reduce to relevant variables
  select(
    year_acquired,
    total_female_artists,
    total_male_artists
  ) |> 
  
  # Pivot longer and rename columns
  pivot_longer(
    cols = starts_with("total_"),
    names_to = "artist_gender",
    names_pattern = "total_(.*)_artists",
    values_to = "total_by_gender"
  ) |> 
  
  # Adjust the order of the factors to help with plotting in the next step
  mutate(
    artist_gender = factor(
      artist_gender,
      levels = c("male", "female")
    )
  )
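To see what the cumulative sum and pivot steps produce, here is a minimal sketch on three made-up artworks. It assumes, as the code above suggests but the text does not state, that `n_female_artists` and `n_male_artists` are per-artwork counts; the `toy_artworks` frame and its values are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy input: one row per artwork, deliberately out of order
toy_artworks <- tibble(
  year_acquired    = c(1931, 1930, 1932),
  n_female_artists = c(1, 0, 0),
  n_male_artists   = c(0, 1, 1)
)

toy_long <- toy_artworks |>
  # Sort first so the running totals accumulate in chronological order
  arrange(year_acquired) |>
  mutate(
    total_female_artists = cumsum(n_female_artists),
    total_male_artists   = cumsum(n_male_artists)
  ) |>
  select(year_acquired, starts_with("total_")) |>
  # One row per (artwork, gender), with the gender parsed from the column name
  pivot_longer(
    cols = starts_with("total_"),
    names_to = "artist_gender",
    names_pattern = "total_(.*)_artists",
    values_to = "total_by_gender"
  )
```

Each input row becomes two output rows, one per gender, so the three toy artworks yield six rows. By 1932 the running totals are 1 for `female` and 2 for `male`.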

After wrangling the data, the figure can be constructed with geom_area(), setting position = "identity" to prevent stacking.

Code
# Load ggplot2 (skip if already attached earlier in the session)
library(ggplot2)

# Initiate the plot
artworks_long |> 
  ggplot(aes(
    x = year_acquired,
    y = total_by_gender,
    fill = artist_gender
  )) +
  
  # Use `geom_area()` and adjust position so the groups don't
  # "stack" on top of each other
  geom_area(
    position = "identity"
  )