My Final Visualization for BMI 525

How has my exercise life changed over the last 1.4 years?

Published

May 31, 2026


Final Figure: My personal total weekly workout activity levels from Jan. 2025 through May 2026, sorted by exercise. Weight-based exercises (red) and running statistics (green) are represented as standard scores (\(z\)-scores) calculated from weekly totals, where weight-based exercise activity was defined as total weight moved. Darker and lighter hues illustrate lower and higher levels of activity, respectively. Body weight over time is shown at the bottom of the figure with annotations corresponding to noteworthy life events.


1 Data Source

In the fall of 2024 (Oct. 25th, to be exact), a group of my friends got together for dinner and drinks at John’s Marketplace on SE Powell Blvd in Portland, OR. We lamented not seeing each other more often, the chaotic nature of adult life, and feeling generally out-of-shape. Partly in jest, one of us suggested starting a regular workout group at a gym in NE Portland. Surprisingly, the group was receptive to this idea. Even more surprisingly, we actually started going! And so, on Nov. 7th, our collective gym journey began, with at least some subset of the crew showing up every Sunday and Thursday at 8:00pm to socialize and bulk up.

Knowing a good opportunity for data collection when I see one, I made the decision fairly early on to start tracking my own progress at the gym, both during our Sunday/Thursday meet-ups, and at any other exercise outing. I had already been recording my running stats for many years at this point, but this new scenario presented a chance to collect a much richer data set. Thus, on Jan. 1st, 2025, I started tracking my personal exercise activities, including quantitative data related to:

  • Repetitions and weight amounts of weight-based exercises;
  • Running statistics, including duration and distance;
  • My weekly body weight (lbs.)
Note

I also tracked qualitative information related to my diet, but the complexity of those data was beyond the scope of the present investigation. I look forward to revisiting that information at some point in the future.

Now, approximately 17 months later, I am in possession of a uniquely personalized data set of synchronized changes in many exercise variables. With such a considerable span of data, I want to ask:

  • How have my overall activity levels and fitness changed since I started recording my workouts?
  • How has my body weight fluctuated over time?
  • How have these changes coincided with various life events?
  • Are there any notable differences in activity between specific exercises?

Partly in satisfaction of our BMI 525 class requirements, but also partly as a challenge to myself, I have attempted a visualization that addresses most of these questions in a single consolidated figure. For the purposes of this illustration, “total activity” for weight lifting exercises has been defined as \(z\)-scored total weight moved per week. Likewise, body weight has been used as a proxy for one aspect of “fitness,” though I want to emphasize that body weight and fitness are not one and the same (see my note below under What are we trying to show with this visualization?).


2 Audience

This figure isn’t intended for a wider scientific audience, per se, but it is intended for our BMI 525 class. As a group, we constitute a scientifically literate and multidisciplinary range of perspectives. As such, concepts like “z-score normalization” (i.e., standard scores) should already be familiar. We are also collectively aware of many aspects of data visualization and, therefore, prepared to decipher things like basic heat maps and line plots.

The secondary audience for this figure will be my gym friends, who have heard me ramble about my spreadsheet for quite some time at this point, and who will want proof that it actually exists.


3 What kind(s) of graph(s) am I trying to create?

I have attempted a composite figure made up of two heatmaps (geom_tile()) and one line plot (geom_line()). These components are stacked vertically over a shared x-axis illustrating time (with 1 week as the minimum resolution). The plots start on Jan. 4th, 2025 (inclusive of the prior week) and end on May 30th, 2026.
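
The layout described above can be sketched with made-up data. This is only an illustration: it assumes the patchwork package for stacking the panels (one option among several, and not necessarily how the final figure was assembled), and every object name and value below is hypothetical.

```r
library(ggplot2)
library(patchwork)  # assumption: used here only to stack the panels

# Toy weekly data (made-up values, not the real exercise log)
weeks <- seq(as.Date("2025-01-04"), by = "week", length.out = 8)
set.seed(1)
toy_heat <- data.frame(
  date = rep(weeks, times = 3),
  exercise_type = rep(c("bench", "bicep", "back"), each = length(weeks)),
  z = rnorm(3 * length(weeks))
)
toy_weight <- data.frame(
  date = weeks,
  weight = 220 + cumsum(rnorm(length(weeks)))
)

# Identical x limits for both panels, padded by about half a week
# so that the first and last tiles are not dropped
x_limits <- range(weeks) + c(-4, 4)

# Heatmap panel: one row per exercise, one tile per week
p_heat <- ggplot(toy_heat, aes(x = date, y = exercise_type, fill = z)) +
  geom_tile() +
  scale_x_date(limits = x_limits) +
  scale_fill_viridis_c() +
  labs(x = NULL, y = NULL, fill = "z-score")

# Line panel: weekly body weight
p_line <- ggplot(toy_weight, aes(x = date, y = weight)) +
  geom_line() +
  scale_x_date(limits = x_limits) +
  labs(x = "Week ending", y = "Body weight (lbs.)")

# Stack the panels vertically; the heatmap gets twice the height
p_composite <- p_heat / p_line + plot_layout(heights = c(2, 1))
```

Giving both panels the same x limits is what keeps the weeks lined up from one panel to the next.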

For the heatmaps, hues (reds versus greens) are used to denote overall category of activity (weight lifting vs. running). In both cases, though, darker colors denote less total activity on a given week, while brighter colors represent higher amounts of activity. Values were all converted to standard scores (\(z\)-scores) so that rows with very different distributions can be compared: \(0\) represents the average total activity for that row, \(-1\) is \(1\) standard deviation below the mean, etc. These maps are useful for two reasons: they do a good job of illustrating overall trends across multiple variables, and they can also illustrate missing data by showing gaps. However, they do not do a good job of showing exact quantitative amounts (recall from class that it’s hard for people to discern exact changes in color values).
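
For reference, the standard score can be written out as follows. The subscripts are my own notation (an assumption, not taken from the figure): \(x_{e,w}\) is the weekly total for row \(e\) (an exercise or running statistic) in week \(w\), and \(\bar{x}_e\) and \(s_e\) are the mean and sample standard deviation of that row over the \(n_e\) weeks with recorded activity.

```latex
\[
z_{e,w} = \frac{x_{e,w} - \bar{x}_e}{s_e},
\qquad
\bar{x}_e = \frac{1}{n_e} \sum_{w} x_{e,w},
\qquad
s_e = \sqrt{\frac{1}{n_e - 1} \sum_{w} \left( x_{e,w} - \bar{x}_e \right)^2}
\]
```

Weeks with no recorded activity are left out of the sums, which is why they appear as gaps rather than as very dark tiles.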

Finally, with the line plot of body weight, it is much easier to see exact amounts and differences, week to week. However, it can be difficult to visualize many variables together in this sort of plot. Some examples of this limitation can be seen in the Methods section below.


4 What are we trying to show with this visualization?

In a broad sense, we are looking for concerted changes in total exercise activity that are coincident with annotated events, such as the start of grad school. Importantly, this goal necessitates that we use a metric that captures both intensity and frequency (hence, total weight moved per week). As well, we can look to see if there is a correlation between exercise activity and body weight. Remember though, there is not enough information here for causal inference (e.g., we are missing important dietary data).

Walking through the figure, left-to-right, I want to tell a story:

  • In 2025, I established a regular exercise routine with multiple activities.
  • With the exception of transient interruptions (e.g., traveling for work), my exercise activity was consistent and even increased a bit over the first three quarters of 2025.
  • After starting grad school in the fall of 2025, my exercise regimen started to become less consistent and my overall activity decreased.
  • Body weight seems to level out after the start of BMI 525, and this is likely related to the fact that I started visualizing my data and becoming more mindful of my activities.
  • Changes in my body weight generally followed exercise activity, but we don’t have enough information to say anything about causation.

It is also worth noting that body weight is not a completely appropriate proxy for “fitness” in this visualization. Importantly, this graphic does not show any measurements of strength, such as mean weight per repetition. Additional visualizations can be found in the Methods section below (spoiler: I did get stronger!).


5 How do I read this figure?

There are a lot of elements in this figure, so it’s important to break things down into some helpful steps:

  1. First and foremost, notice the x-axis and the time scale of the figure. Note the annotated events in the line plot, which are important events that will match various changes in the rest of the plot.
  2. The figure is, in some sense, meant to be read from bottom-to-top. Look at how body weight changes over time – this trend is easy to parse, and we can notice the line go down then back up.
  3. Moving up the figure, look at the green rows together as a single unit (remember gestalt?). Where are they both darker at the same time (less activity)? Where are they both light (more activity)? Are there times when they are mismatched? What does that mean?
  4. Look at the red rows together as a single unit. Do you notice trends in colors over time? Are there any abrupt changes? What sorts of gaps do you notice, and are they consistent?
  5. Finally, follow up by looking at the individual rows. Feel free to inspect individual activities and break up the color blocks to see how certain exercises changed over time.

6 Visual Presentation Considerations

This figure was perhaps the most complicated graphic I have made so far in my professional life. As a composite image that brings together multiple plots into a single visualization, I had to carefully consider several factors:

  1. Using a shared x-axis was hugely helpful in decreasing the complexity of this figure, since it both decreased visual clutter and also tied the elements together, conceptually. In early drafts, there were some alignment issues that I was adamant about fixing, since misalignment had a disproportionately negative impact on the overall figure.
  2. Knowing in advance that I was going to use color hue in heat maps to represent quantitative changes, I planned in advance to utilize the scale_fill_viridis_c() palettes, since they are designed to be safe for different varieties of color blindness, and because they are specifically engineered to show smooth and maximally discernible changes in hue.
  3. Annotations were added to the line plot section because it was the least “busy” subplot, and therefore the easiest to show events on the time axis. Dashed vertical lines made events obvious, but were distinctly separate from the solid data line.
  4. In general, I kept text size fairly small to reduce clutter, but also so attention was drawn more immediately to the heat maps and line plot.
  5. I used the font Century Gothic, which I have used before in other class projects. I personally am fond of this sans-serif font and think it looks both clean and inviting.
  6. I opted to list “z-score normalization” as a subtitle rather than in the color legends for two reasons: it saves horizontal space, and our BMI 525 class should have enough background knowledge to understand the connection between this specification and the ranges in the legends.
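
The annotation approach from item 3 can be sketched on a toy line plot. The body weight values, event dates, and labels below are all hypothetical, and the styling choices (grey dashed lines, small text) are only one way to keep the event markers visually distinct from the solid data line.

```r
library(ggplot2)

# Toy weekly body weight series (made-up values)
weeks <- seq(as.Date("2025-01-04"), by = "week", length.out = 12)
toy_weight <- data.frame(
  date = weeks,
  weight = c(223, 221, 220, 218, 217, 216, 216, 217, 218, 219, 219, 218)
)

# Hypothetical events to mark on the time axis
toy_events <- data.frame(
  date = as.Date(c("2025-02-01", "2025-03-08")),
  label = c("Event A", "Event B")
)

p_annotated <- ggplot(toy_weight, aes(x = date, y = weight)) +
  # Solid line for the data itself
  geom_line() +
  # Dashed vertical lines for events, so they read as markers, not data
  geom_vline(
    data = toy_events,
    aes(xintercept = date),
    linetype = "dashed",
    colour = "grey40"
  ) +
  # Small labels pinned to the top of the panel, just right of each line
  geom_text(
    data = toy_events,
    aes(x = date, y = Inf, label = label),
    hjust = -0.1,
    vjust = 1.5,
    size = 3,
    inherit.aes = FALSE
  ) +
  labs(x = NULL, y = "Body weight (lbs.)") +
  theme_minimal(base_size = 9)
```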

7 Methods

So, how did I actually create this figure? Let’s go through a detailed breakdown of the different parts of my process, including some of the interim figures I generated while exploring the data and troubleshooting the final product. It’s important to mention, I didn’t even know exactly what measures to use when I started looking through the data, so the outline below documents a fair amount of active decision-making.

7.1 Data Collection

Even though this was a fairly informal endeavor, I made efforts to adhere to a consistent data collection protocol. I recorded exercise activities in my phone’s notes app during workouts and would manually transfer them to an Excel spreadsheet when I returned home. The raw data looked like so:

7.1.1 General patterns of exercise

Engagement in recorded exercise occurred once per day (or not at all) and consisted of:

  1. a run/jog,
  2. weight lifting, or
  3. both a run and weight lifting.

Exercise usually occurred in the evenings after dinner, but time wasn’t fixed or tracked (except for weigh-in days; see below).

7.1.2 Running

Runs occurred either on a treadmill or outside. Run distance and duration were the primary measurements, while other factors like elevation gain or ambient temperature weren’t tracked. Distance was always rounded to the nearest hundredth of a mile, and duration only ever measured in full minute increments (i.e., I never stopped the timer except on the minute mark). I also made a habit of “rounding down” my distances/times in cases where there was some disruption (e.g., I would run an extra 30 seconds past time if I had spent 15-20 seconds accelerating from standstill at the start of a run).

7.1.3 Weight lifting

Duration of weight lifting sessions was not tracked, but most “full” outings lasted approximately 60-70 minutes. Similarly, order of weight exercises was not tracked. In a given session, bench press was generally first, followed by free-weights, then some combination of the remaining exercises, and finally stretching or floor exercises. A total of 13 exercise types were tracked at the gym; most of these exercises typically included three sets per day, with a fixed resistance weight within-set. Weights were frequently followed by a brief set of abdominal exercises and stretching on a floor mat. The exact types and amounts of those activities were not recorded, but participation in these activities was tracked with an indicator.

7.1.4 Weigh-in procedure

To eliminate certain confounds for the body weight measure, I was careful to follow a fairly rigid procedure for weekly weigh-ins. Weight was estimated as the average of two scale readings (one before showering; one immediately after) taken on an empty stomach after completing my weekly Friday run, which almost always occurred in the late afternoon/early evening.

7.1.5 What wasn’t recorded?

I did not track walking. I could have gone on a 10-mile uphill walk or hike, but it wouldn’t have been recorded if I wasn’t at least jogging. Relatedly, I didn’t track elevation gain/loss, though I did have indicators for whether runs were outside or on a treadmill, even if they weren’t incorporated into the analysis.

I did keep a food journal for the entirety of this process as well, though I have not yet figured out how to use that data. The purpose of that component was to encourage mindful eating, rather than track calories, etc., so it would take substantial effort to re-code those logs quantitatively, and my initial attempts to find “easy fixes” have not yielded promising results (e.g., basic methods like character counts per day were very, very noisy).

7.2 Data Wrangling

As with all projects, we start by importing the data:

Code
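# Load the packages used throughout
# (assumed here; the original setup chunk is not shown)
library(tidyverse)  # dplyr, tidyr, ggplot2
library(lubridate)  # ceiling_date()
library(readxl)     # read_excel()
library(here)       # here()

# Load the exercise data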
df_gym_wide <- read_excel(
  here(
    "data",
    "Exercise_2025_2026.xlsx"
  ),
  skip = 1,
  na = "."
)

glimpse(df_gym_wide)
Rows: 515
Columns: 89
$ date                   <dttm> 2025-01-01, 2025-01-02, 2025-01-03, 2025-01-04…
$ weight                 <dbl> NA, NA, 223.0, NA, NA, NA, NA, NA, NA, 218.3, N…
$ run                    <chr> "No", "Yes", "Yes", "No", "Yes", "Yes", "No", "…
$ run_type               <chr> NA, "Treadmill", "Treadmill", NA, "Treadmill", …
$ run_distance           <dbl> NA, 2.15, 4.27, NA, 3.25, 2.16, NA, NA, 2.25, 3…
$ run_duration           <dbl> NA, 20, 40, NA, 30, 20, NA, NA, 21, 30, NA, 30,…
$ note                   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "Forgot hea…
$ weight_lifting         <chr> "No", "Yes", "No", "No", "Yes", "Yes", "No", "N…
$ floor_mat              <chr> NA, "Yes", NA, NA, "Yes", "No", NA, NA, "Yes", …
$ bench_set1_weight      <dbl> NA, 115, NA, NA, 125, 115, NA, NA, 125, NA, NA,…
$ bench_set1_reps        <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 6, …
$ bench_set2_weight      <dbl> NA, 125, NA, NA, 115, 115, NA, NA, 135, NA, NA,…
$ bench_set2_reps        <dbl> NA, 9, NA, NA, 9, 9, NA, NA, 6, NA, NA, 8, 8, N…
$ bench_set3_weight      <dbl> NA, 125, NA, NA, 135, 95, NA, NA, 135, NA, NA, …
$ bench_set3_reps        <dbl> NA, 5, NA, NA, 3, 10, NA, NA, 6, NA, NA, 8, 6, …
$ bench_set4_weight      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 135, NA, NA, 13…
$ bench_set4_reps        <dbl> NA, NA, NA, NA, NA, NA, NA, NA, 4, NA, NA, 4, 4…
$ bicep_set1_weight      <dbl> NA, 25, NA, NA, 20, 20, NA, NA, 25, NA, NA, 25,…
$ bicep_set1_reps        <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ bicep_set2_weight      <dbl> NA, 25, NA, NA, 25, 20, NA, NA, 25, NA, NA, 25,…
$ bicep_set2_reps        <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ bicep_set3_weight      <dbl> NA, 25, NA, NA, 25, 20, NA, NA, 25, NA, NA, 25,…
$ bicep_set3_reps        <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ shldr_set1_weight      <dbl> NA, 12.5, NA, NA, 12.5, 12.5, NA, NA, 12.5, NA,…
$ shldr_set1_reps        <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ shldr_set2_weight      <dbl> NA, 12.5, NA, NA, 15.0, 12.5, NA, NA, 12.5, NA,…
$ shldr_set2_reps        <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ shldr_set3_weight      <dbl> NA, 12.5, NA, NA, 10.0, 10.0, NA, NA, 12.5, NA,…
$ shldr_set3_reps        <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ tricep1_set1_weight    <dbl> NA, 80, NA, NA, 80, 70, NA, NA, 80, NA, NA, 80,…
$ tricep1_set1_reps      <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ tricep1_set2_weight    <dbl> NA, 80, NA, NA, 80, 80, NA, NA, 90, NA, NA, 80,…
$ tricep1_set2_reps      <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ tricep1_set3_weight    <dbl> NA, 80, NA, NA, 80, 70, NA, NA, 80, NA, NA, 80,…
$ tricep1_set3_reps      <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ tricep2_set1_weight    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ tricep2_set1_reps      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ tricep2_set2_weight    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ tricep2_set2_reps      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ tricep2_set3_weight    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ tricep2_set3_reps      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ back_set1_weight       <dbl> NA, 120, NA, NA, 140, 120, NA, NA, 140, NA, NA,…
$ back_set1_reps         <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ back_set2_weight       <dbl> NA, 140, NA, NA, 140, 140, NA, NA, 140, NA, NA,…
$ back_set2_reps         <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ back_set3_weight       <dbl> NA, 140, NA, NA, 140, 140, NA, NA, 160, NA, NA,…
$ back_set3_reps         <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ quad_set1_weight       <dbl> NA, NA, NA, NA, NA, 100, NA, NA, 115, NA, NA, 1…
$ quad_set1_reps         <dbl> NA, NA, NA, NA, NA, 10, NA, NA, 10, NA, NA, 10,…
$ quad_set2_weight       <dbl> NA, NA, NA, NA, NA, 100, NA, NA, 115, NA, NA, 1…
$ quad_set2_reps         <dbl> NA, NA, NA, NA, NA, 10, NA, NA, 10, NA, NA, 10,…
$ quad_set3_weight       <dbl> NA, NA, NA, NA, NA, 100, NA, NA, 115, NA, NA, 1…
$ quad_set3_reps         <dbl> NA, NA, NA, NA, NA, 10, NA, NA, 10, NA, NA, 10,…
$ thighsqz_set1_weight   <dbl> NA, 160, NA, NA, 175, 175, NA, NA, 175, NA, NA,…
$ thighsqz_set1_reps     <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ thighsqz_set2_weight   <dbl> NA, 160, NA, NA, 175, 175, NA, NA, 175, NA, NA,…
$ thighsqz_set2_reps     <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ thighsqz_set3_weight   <dbl> NA, 160, NA, NA, 160, 175, NA, NA, 175, NA, NA,…
$ thighsqz_set3_reps     <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ thighpress_set1_weight <dbl> NA, 160, NA, NA, 175, 175, NA, NA, 175, NA, NA,…
$ thighpress_set1_reps   <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ thighpress_set2_weight <dbl> NA, 160, NA, NA, 175, 175, NA, NA, 175, NA, NA,…
$ thighpress_set2_reps   <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ thighpress_set3_weight <dbl> NA, 160, NA, NA, 175, 175, NA, NA, 175, NA, NA,…
$ thighpress_set3_reps   <dbl> NA, 10, NA, NA, 10, 10, NA, NA, 10, NA, NA, 10,…
$ invrtpress_set1_weight <dbl> NA, NA, NA, NA, 175, NA, NA, NA, 170, NA, NA, N…
$ invrtpress_set1_reps   <dbl> NA, NA, NA, NA, 10, NA, NA, NA, 10, NA, NA, NA,…
$ invrtpress_set2_weight <dbl> NA, NA, NA, NA, 175, NA, NA, NA, 170, NA, NA, N…
$ invrtpress_set2_reps   <dbl> NA, NA, NA, NA, 10, NA, NA, NA, 10, NA, NA, NA,…
$ invrtpress_set3_weight <dbl> NA, NA, NA, NA, 200, NA, NA, NA, 170, NA, NA, N…
$ invrtpress_set3_reps   <dbl> NA, NA, NA, NA, 10, NA, NA, NA, 10, NA, NA, NA,…
$ legpress_set1_weight   <dbl> NA, NA, NA, NA, NA, 130, NA, NA, NA, NA, NA, NA…
$ legpress_set1_reps     <dbl> NA, NA, NA, NA, NA, 10, NA, NA, NA, NA, NA, NA,…
$ legpress_set2_weight   <dbl> NA, NA, NA, NA, NA, 110, NA, NA, NA, NA, NA, NA…
$ legpress_set2_reps     <dbl> NA, NA, NA, NA, NA, 10, NA, NA, NA, NA, NA, NA,…
$ legpress_set3_weight   <dbl> NA, NA, NA, NA, NA, 130, NA, NA, NA, NA, NA, NA…
$ legpress_set3_reps     <dbl> NA, NA, NA, NA, NA, 10, NA, NA, NA, NA, NA, NA,…
$ squat_set1_weight      <dbl> NA, 95, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ squat_set1_reps        <dbl> NA, 10, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ squat_set2_weight      <dbl> NA, 45, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ squat_set2_reps        <dbl> NA, 10, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ squat_set3_weight      <dbl> NA, 45, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ squat_set3_reps        <dbl> NA, 10, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pushup_set1_weight     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pushup_set1_reps       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pushup_set2_weight     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pushup_set2_reps       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pushup_set3_weight     <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ pushup_set3_reps       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

We need to pivot everything to long format, but it’s going to be tricky pivoting two outcomes at the same time (weight amount and number of reps). One workaround is to separate our data into subsets, pivot, then join them back together:

Code
# Separate out the variables with only one observation per day
# and calculate some additional running metrics
df_gym <- df_gym_wide |> 
  select(
    !contains("_set"),
    -note
  ) |> 
  mutate(
    run_avg_speed = (run_distance/run_duration)*60,
    run_avg_pace = run_duration/run_distance
  )

# Pivot weight amount
df_gym_weight_long <- df_gym_wide |>
  select(
    date,
    contains("_weight")
  ) |> 
  pivot_longer(
    cols = contains("_weight"),
    names_to = c("exercise_type","set"),
    names_pattern = "(.*)_set(.)_weight",
    values_to = "weight_amt"
  )

# Pivot reps
df_gym_reps_long <- df_gym_wide |>
  select(
    date,
    contains("_reps")
  ) |> 
  pivot_longer(
    cols = contains("_reps"),
    names_to = c("exercise_type","set"),
    names_pattern = "(.*)_set(.)_reps",
    values_to = "reps"
  )

# Join weight amounts with reps
df_gym_wght_rep_long <- df_gym_weight_long |> 
  full_join(
    df_gym_reps_long,
    by = join_by(
      date,
      exercise_type,
      set
    )
  )

# Join remaining variables
df_gym_long <- df_gym |> 
  full_join(
    df_gym_wght_rep_long,
    by = join_by(date)
  )

# Check our work
glimpse(df_gym_long)
Rows: 20,600
Columns: 14
$ date           <dttm> 2025-01-01, 2025-01-01, 2025-01-01, 2025-01-01, 2025-0…
$ weight         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ run            <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "…
$ run_type       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ run_distance   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ run_duration   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ weight_lifting <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "…
$ floor_mat      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ run_avg_speed  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ run_avg_pace   <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ exercise_type  <chr> "bench", "bench", "bench", "bench", "bicep", "bicep", "…
$ set            <chr> "1", "2", "3", "4", "1", "2", "3", "1", "2", "3", "1", …
$ weight_amt     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ reps           <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

So far, everything looks good! \(\checkmark\)
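
As an aside, the split-pivot-join workaround above is not the only route. tidyr's `pivot_longer()` accepts a special `".value"` entry in `names_to`, which lets both outcomes be pivoted in a single call. The sketch below uses a small toy frame with the same column naming scheme; `toy_wide` and `toy_long` are hypothetical names, and I have not re-run the full pipeline this way.

```r
library(dplyr)
library(tidyr)

# Toy wide data with the same naming scheme as the real spreadsheet
toy_wide <- tibble(
  date = as.Date(c("2025-01-02", "2025-01-05")),
  bench_set1_weight = c(115, 125),
  bench_set1_reps   = c(10, 10),
  bench_set2_weight = c(125, 115),
  bench_set2_reps   = c(9, 9),
  bicep_set1_weight = c(25, 20),
  bicep_set1_reps   = c(10, 10)
)

# ".value" tells pivot_longer() that the third captured group
# ("weight" or "reps") names the output column, not a key column
toy_long <- toy_wide |>
  pivot_longer(
    cols = contains("_set"),
    names_to = c("exercise_type", "set", ".value"),
    names_pattern = "(.*)_set(.)_(weight|reps)"
  ) |>
  rename(weight_amt = weight)
```

The result has one row per date, exercise, and set, with `weight_amt` and `reps` side by side, which is the same shape as `df_gym_wght_rep_long`.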

Let’s make long-format frames of the running data, both by day and by week:

Code
# Pivot the running data from wide to long and get normalized values
## Norms: both z-scores and min-max normalization
df_run_long <- df_gym |> 
  select(
    date,
    contains("run")
  ) |> 
  select(
    -run,
    -run_type
  ) |> 
  pivot_longer(
    cols = contains("run"),
    values_to = "value",
    names_to = "stat",
    names_pattern = "run_(.*)",
  ) |> 
  group_by(stat) |> 
  mutate(
    value_minmax_norm =
      (value - min(value, na.rm = T))/(max(value, na.rm = T) - min(value, na.rm = T)),
    value_z_norm =
      (value - mean(value, na.rm = T))/sd(value, na.rm = T)
  ) |> 
  ungroup()

# Make a version that's grouped by week as well
df_run_long_week <- df_gym |> 
  select(
    date,
    contains("run")
  ) |> 
  select(
    -run,
    -run_type
  ) |> 
  mutate(
    date = ceiling_date(
      date,
      "week",
      week_start = 6
    )
  ) |>
  group_by(date) |> 
  summarize(
    run_distance_week = sum(run_distance, na.rm = T),
    run_duration_week = sum(run_duration, na.rm = T)
  ) |> 
  ungroup() |> 
  mutate(
    run_avg_speed_week = (run_distance_week/run_duration_week)*60,
    run_avg_pace_week = run_duration_week/run_distance_week
  ) |> 
  mutate(
    across(
      .cols = contains("run"),
      .fns = ~case_when(
        .x == 0 ~ NA,
        is.nan(.x) ~ NA,
        .default = .x
      )
    )
  ) |> 
  pivot_longer(
    cols = contains("run"),
    values_to = "value",
    names_to = "stat",
    names_pattern = "run_(.*)",
  ) |> 
  group_by(stat) |> 
  mutate(
    value_minmax_norm =
      (value - min(value, na.rm = T))/(max(value, na.rm = T) - min(value, na.rm = T)),
    value_z_norm =
      (value - mean(value, na.rm = T))/sd(value, na.rm = T)
  ) |> 
  ungroup()

# Check our work
glimpse(df_run_long)
Rows: 2,060
Columns: 5
$ date              <dttm> 2025-01-01, 2025-01-01, 2025-01-01, 2025-01-01, 202…
$ stat              <chr> "distance", "duration", "avg_speed", "avg_pace", "di…
$ value             <dbl> NA, NA, NA, NA, 2.150000, 20.000000, 6.450000, 9.302…
$ value_minmax_norm <dbl> NA, NA, NA, NA, 0.1754098, 0.2000000, 0.5167522, 0.4…
$ value_z_norm      <dbl> NA, NA, NA, NA, -1.6075825, -1.5093074, -1.2307526, …
Code
glimpse(df_run_long_week)
Rows: 296
Columns: 5
$ date              <dttm> 2025-01-04, 2025-01-04, 2025-01-04, 2025-01-04, 202…
$ stat              <chr> "distance_week", "duration_week", "avg_speed_week", …
$ value             <dbl> 6.420000, 60.000000, 6.420000, 9.345794, 10.730000, …
$ value_minmax_norm <dbl> 0.22076979, 0.23076923, 0.14807356, 0.83684960, 0.53…
$ value_z_norm      <dbl> -1.32792261, -1.24564447, -1.80463667, 1.82478640, -…
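
A quick note on the weekly binning: `ceiling_date()` with `week_start = 6` rounds each date-time up to the next Saturday, so every week is labeled by the Saturday that ends it. The sketch below checks this on two dates; `wed` and `sat` are hypothetical names, and the values are date-times to match the `<dttm>` column type shown above.

```r
library(lubridate)

# Date-times (POSIXct), matching the <dttm> column in the data
wed <- ymd("2025-01-01", tz = "UTC")  # a Wednesday
sat <- ymd("2025-01-04", tz = "UTC")  # a Saturday

# Round up to the week boundary, with weeks starting on Saturday
wed_bin <- ceiling_date(wed, "week", week_start = 6)
sat_bin <- ceiling_date(sat, "week", week_start = 6)

# Both land in the bin labeled 2025-01-04
as.Date(wed_bin)
as.Date(sat_bin)
```

This is why the first bin is labeled Jan. 4th even though recording began on Jan. 1st. One caveat, as I understand lubridate's documented `change_on_boundary` default: a plain `Date` that already sits on a boundary is rounded up to the following boundary, so the class of the date column matters here.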

Now, we can spend a little time calculating any weight lifting metrics that might be interesting (e.g., totals, averages). Let’s add totals to the primary data frame, then we can use summarize() to get daily averages:

Code
# Calculate totals
df_gym_long <- df_gym_long |>
  mutate(
    total_lifted = weight_amt*reps
  )

# Summary statistics by date and exercise type
gym_summary <- df_gym_long |> 
  group_by(
    date, exercise_type
  ) |> 
  summarize(
    total_weight_moved = sum(total_lifted, na.rm = T),
    total_reps = sum(reps, na.rm = T)
  ) |> 
  mutate(
    mean_rep_weight = total_weight_moved/total_reps,
    across(
      .cols = c(
        "total_weight_moved",
        "total_reps",
        "mean_rep_weight"
      ),
      .fns = ~case_when(
        .x == 0 ~ NA,
        is.nan(.x) ~ NA,
        .default = .x
      )
    )
  ) |> 
  ungroup()

# Get total amount of weight moved by day and also
# calculate normalized scores (z-scores and min-max norms)
gym_summary <- gym_summary |> 
  group_by(date) |> 
  mutate(
    total_weight_moved_day = sum(total_weight_moved, na.rm = T)
  ) |> 
  ungroup() |> 
  mutate(
    total_weight_moved_day = case_when(
      total_weight_moved_day == 0 ~ NA,
      .default = total_weight_moved_day
    )
  ) |> 
  group_by(exercise_type) |> 
  mutate(
    cumulative_lifted = cumsum(
      coalesce(total_weight_moved_day, 0)
    ),
    total_weight_moved_minmax_norm =
      (total_weight_moved - min(total_weight_moved, na.rm = T))/(max(total_weight_moved, na.rm = T) - min(total_weight_moved, na.rm = T)),
    total_weight_moved_z_norm =
      (total_weight_moved - mean(total_weight_moved, na.rm = T))/sd(total_weight_moved, na.rm = T),
    mean_rep_weight_minmax_norm =
      (mean_rep_weight - min(mean_rep_weight, na.rm = T))/(max(mean_rep_weight, na.rm = T) - min(mean_rep_weight, na.rm = T)),
    mean_rep_weight_z_norm =
      (mean_rep_weight - mean(mean_rep_weight, na.rm = T))/sd(mean_rep_weight, na.rm = T)
  )

# Check our work
glimpse(gym_summary)
Rows: 6,695
Columns: 11
Groups: exercise_type [13]
$ date                           <dttm> 2025-01-01, 2025-01-01, 2025-01-01, 20…
$ exercise_type                  <chr> "back", "bench", "bicep", "invrtpress",…
$ total_weight_moved             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ total_reps                     <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ mean_rep_weight                <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ total_weight_moved_day         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ cumulative_lifted              <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ total_weight_moved_minmax_norm <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ total_weight_moved_z_norm      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ mean_rep_weight_minmax_norm    <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ mean_rep_weight_z_norm         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…

Just like we did with the running data above, let’s also create a version of the data that’s grouped by week:

Code
# Summary statistics after "rounding" the date values
gym_summary_week <- df_gym_wght_rep_long |> 
  mutate(
    date = ceiling_date(
      date,
      "week",
      week_start = 6
    )
  ) |> 
  mutate(
    total_lifted = weight_amt*reps
  ) |> 
  group_by(
    date, exercise_type
  ) |>
  summarize(
    total_weight_moved = sum(total_lifted, na.rm = T),
    total_reps = sum(reps, na.rm = T)
  ) |> 
  mutate(
    mean_rep_weight = total_weight_moved/total_reps,
    across(
      .cols = c(
        "total_weight_moved",
        "total_reps",
        "mean_rep_weight"
      ),
      .fns = ~case_when(
        .x == 0 ~ NA,
        is.nan(.x) ~ NA,
        .default = .x
      )
    )
  ) |> 
  ungroup()

gym_summary_week <- gym_summary_week |> 
  group_by(date) |> 
  mutate(
    total_weight_moved_week = sum(total_weight_moved, na.rm = T)
  ) |> 
  ungroup() |> 
  mutate(
    total_weight_moved_week = case_when(
      total_weight_moved_week == 0 ~ NA,
      .default = total_weight_moved_week
    )
  ) |> 
  group_by(exercise_type) |> 
  mutate(
    cumulative_lifted = cumsum(
      coalesce(total_weight_moved_week, 0)
    ),
    total_weight_moved_week_minmax_norm =
      (total_weight_moved - min(total_weight_moved, na.rm = T))/(max(total_weight_moved, na.rm = T) - min(total_weight_moved, na.rm = T)),
    total_weight_moved_week_z_norm =
      (total_weight_moved - mean(total_weight_moved, na.rm = T))/sd(total_weight_moved, na.rm = T),
    mean_rep_weight_minmax_norm =
      (mean_rep_weight - min(mean_rep_weight, na.rm = T))/(max(mean_rep_weight, na.rm = T) - min(mean_rep_weight, na.rm = T)),
    mean_rep_weight_z_norm =
      (mean_rep_weight - mean(mean_rep_weight, na.rm = T))/sd(mean_rep_weight, na.rm = T)
  )

# Check our work
glimpse(gym_summary_week)
Rows: 962
Columns: 11
Groups: exercise_type [13]
$ date                                <dttm> 2025-01-04, 2025-01-04, 2025-01-0…
$ exercise_type                       <chr> "back", "bench", "bicep", "invrtpr…
$ total_weight_moved                  <dbl> 4000, 2900, 750, NA, NA, NA, NA, 3…
$ total_reps                          <dbl> 30, 24, 30, NA, NA, NA, NA, 30, 30…
$ mean_rep_weight                     <dbl> 133.33333, 120.83333, 25.00000, NA…
$ total_weight_moved_week             <dbl> 21875, 21875, 21875, 21875, 21875,…
$ cumulative_lifted                   <dbl> 21875, 21875, 21875, 21875, 21875,…
$ total_weight_moved_week_minmax_norm <dbl> 0.00000000, 0.00000000, 0.03703704…
$ total_weight_moved_week_z_norm      <dbl> -1.62539575, -1.98731104, -1.67457…
$ mean_rep_weight_minmax_norm         <dbl> 0.00000000, 0.17041801, 0.29411765…
$ mean_rep_weight_z_norm              <dbl> -2.5529101, -1.5300586, -0.8373731…

Phew! It looks like we finally have all the data prepped and ready to use. \(\checkmark\)

Code
# Optional: Save an image of the workspace as a shortcut for future use
save.image(
  file = here(
    "data",
    "gym_wrkspc.RData"
  )
)

7.3 Visualizations

Now that we’ve cleaned up the data, we can start to explore trends in exercise activity over time and audition ways of collapsing multiple dimensions (e.g., exercise frequency and intensity) into common plots. Importantly, we will also need to consider normalized scores to illustrate different exercises side-by-side, since the absolute weight amounts will vary widely between things like bench press and bicep curls. We have two normalization methods implemented at this point (min-max and standard \(z\)-scores), so we’ll see if one works better than the other for our purposes.
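
For reference, min-max normalization rescales each value as \((x - \min(x)) / (\max(x) - \min(x))\), and the standard score is \((x - \bar{x}) / s\), where \(s\) is the sample standard deviation. The prep code above writes these out inline for each column. A minimal sketch of the same arithmetic as reusable helpers might look like the following; the function names `minmax_norm()` and `z_norm()` are hypothetical and are not used elsewhere in this document.

```r
# Hypothetical helpers (names are assumptions, not part of the original code)

# Min-max normalization: rescales x to the [0, 1] interval
minmax_norm <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Standard score: centers x on its mean and scales by its sample SD
z_norm <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

# Toy input with one missing value; NA stays NA in both outputs
x <- c(10, 20, NA, 30)
minmax_norm(x)
z_norm(x)
```

Inside a grouped `mutate()`, helpers like these could be applied with `across()`, though the resulting column names would likely differ from the ones used in this document.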

7.3.1 Mean Weight Moved by Day and Week

Starting with some exploratory plots of mean weight moved per repetition, grouped by exercise type:

Code
# Mean weight per rep, by exercise and day
gym_summary |> 
  filter(
    exercise_type != "pushup"
  ) |> 
  ggplot(aes(
    x = date,
    y = mean_rep_weight
  )) +
  geom_point() +
  geom_smooth(
    se = FALSE
  ) +
  facet_wrap(
    ~exercise_type,
    nrow = 12,
    scales = "free"
  ) +
  labs(
    title = "Mean Rep Weight by Day and Exercise"
  )

Code
# Same plot of mean weight per rep, but use min-max normalization
# to show all exercises in a common plot
gym_summary |> 
  filter(
    exercise_type != "pushup"
  ) |> 
  ggplot(aes(
    x = date,
    y = mean_rep_weight_minmax_norm,
    color = exercise_type,
    lty = exercise_type
  )) +
  # geom_point() +
  geom_smooth()

Code
# Experiment with showing these data in a heat map instead
gym_summary |> 
  filter(
    exercise_type != "pushup"
  ) |>
  # na.omit() |> 
  ggplot(aes(
    x = date,
    y = exercise_type
  )) +
  geom_tile(
    aes(
      fill = mean_rep_weight_minmax_norm
    )
  )

Code
# What about standard scores instead?
gym_summary |> 
  filter(
    exercise_type != "pushup"
  ) |> 
  ggplot(aes(
    x = date,
    y = mean_rep_weight_z_norm,
    color = exercise_type,
    lty = exercise_type
  )) +
  # geom_point() +
  geom_smooth()

Having gotten our legs under us, we can start to save some of the more promising plot options.

Code
# Z-Normed Data
(plot_tile_z <- gym_summary |> 
  filter(
    exercise_type != "pushup"
  ) |>
  na.omit() |>
  ggplot(aes(
    x = date,
    y = exercise_type
  )) +
  geom_tile(
    aes(
      fill = mean_rep_weight_z_norm
    )
  ) +
  scale_y_discrete(
    limits = rev
  ) +
  scale_fill_viridis_c(
    option = "F"
  ) +
   labs(
     x = "Date"
   ) +
   theme(
     axis.text.x = element_text(
       angle = 45,
       hjust = 1
     )
   ))

Code
# It's hard to see the lighter tiles, so let's darken the background and
# make a few other adjustments
(plot_tile_z2 <- plot_tile_z +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    plot.margin = margin(b = 0),
    # legend.position = "none",
    panel.background = element_rect(
      fill = "grey80"
    )
  ) +
  scale_y_discrete(
    limits = rev,
    labels = c(
        "Tricep (2)",
        "Tricep (1)",
        "Hip Adductor",
        "Hip Abductor",
        "Squat",
        "Trapezius",
        "Quadriceps",
        "Leg Press",
        "Leg Press (Invrt)",
        "Biceps",
        "Bench Press",
        "Cable Row"
      )
  ) +
  labs(
    y = "Weight-Based Exercise",
    fill = ""
  )
)

These heat maps are starting to look promising. Each tile represents one day, so they show both mean weight lifted per rep (tile color) and how often each exercise occurred (the pattern of tiles and gaps). Still, the daily view is far too busy, so let’s collapse further and look at activity on a week-to-week basis (i.e., decrease the temporal resolution).
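
The weekly table used below, `gym_summary_week`, was prepared earlier in this document. As a reminder of what collapsing to week resolution involves, here is a minimal sketch on toy data. It assumes the dplyr and lubridate packages and a Monday week start (`week_start = 1`); the toy values are hypothetical, and the actual week boundaries in `gym_summary_week` may be defined differently.

```r
library(dplyr)
library(lubridate)

# Hypothetical daily totals for a single exercise
daily <- tibble(
  date = as.Date(c("2025-01-06", "2025-01-09", "2025-01-14")),
  exercise_type = "bench",
  total_weight_moved = c(2900, 3100, 3000)
)

# Assign each day to the Monday that starts its week, then sum within week
weekly <- daily |>
  mutate(week = floor_date(date, unit = "week", week_start = 1)) |>
  group_by(week, exercise_type) |>
  summarise(
    total_weight_moved = sum(total_weight_moved, na.rm = TRUE),
    .groups = "drop"
  )

weekly
```

The first two sessions fall in the same week and are summed together, which is how week-resolution totals end up reflecting exercise frequency as well as intensity.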

Code
# By week
(plot_tile_z_wk <- gym_summary_week |> 
  filter(
    exercise_type != "pushup"
  ) |>
  na.omit() |>
  ggplot(aes(
    x = date,
    y = exercise_type
  )) +
  geom_tile(
    aes(
      fill = mean_rep_weight_z_norm
    )
  ) +
  scale_y_discrete(
    limits = rev
  ) +
  scale_fill_viridis_c(
    option = "F"
  ) +
    labs(
     x = "Date"
   ) +
   theme(
     axis.text.x = element_text(
       angle = 45,
       hjust = 1
     )
   ))

Code
(plot_tile_z_wk2 <- plot_tile_z_wk +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    plot.margin = margin(b = 0),
    # legend.position = "none",
    panel.background = element_rect(
      fill = "grey80"
    )
  ) +
  scale_y_discrete(
    limits = rev,
    labels = c(
        "Tricep (2)",
        "Tricep (1)",
        "Hip Adductor",
        "Hip Abductor",
        "Squat",
        "Trapezius",
        "Quadriceps",
        "Leg Press",
        "Leg Press (Invrt)",
        "Biceps",
        "Bench Press",
        "Cable Row"
      )
  ) +
  labs(
    y = "Weight-Based Exercise",
    fill = ""
  )
)

Let’s go back to min-max normalization and see how things look.

Code
# Min/Max Normalization
(plot_tile_minmax <- gym_summary |> 
  filter(
    exercise_type != "pushup"
  ) |>
  na.omit() |>
  ggplot(aes(
    x = date,
    y = exercise_type
  )) +
  geom_tile(
    aes(
      fill = mean_rep_weight_minmax_norm
    )
  ) +
  scale_y_discrete(
    limits = rev
  ) +
  scale_fill_viridis_c(
    option = "F"
  ) +
   labs(
     x = "Date"
   ) +
   theme(
     axis.text.x = element_text(
       angle = 45,
       hjust = 1
     )
   ))

Code
(plot_tile_minmax2 <- plot_tile_minmax +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    plot.margin = margin(b = 0),
    # legend.position = "none",
    panel.background = element_rect(
      fill = "grey80"
    )
  ) +
  scale_y_discrete(
    limits = rev,
    labels = c(
        "Tricep (2)",
        "Tricep (1)",
        "Hip Adductor",
        "Hip Abductor",
        "Squat",
        "Trapezius",
        "Quadriceps",
        "Leg Press",
        "Leg Press (Invrt)",
        "Biceps",
        "Bench Press",
        "Cable Row"
      )
  ) +
  labs(
    y = "Weight-Based Exercise",
    fill = ""
  ))

Code
# By week
(plot_tile_minmax_wk <- gym_summary_week |> 
  filter(
    exercise_type != "pushup"
  ) |>
  na.omit() |>
  ggplot(aes(
    x = date,
    y = exercise_type
  )) +
  geom_tile(
    aes(
      fill = mean_rep_weight_minmax_norm
    )
  ) +
  scale_y_discrete(
    limits = rev
  ) +
  scale_fill_viridis_c(
    option = "F"
  ) +
  labs(
     x = "Date"
   ) +
   theme(
     axis.text.x = element_text(
       angle = 45,
       hjust = 1
     )
   ))

Code
(plot_tile_minmax_wk2 <- plot_tile_minmax_wk +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    plot.margin = margin(b = 0),
    # legend.position = "none",
    panel.background = element_rect(
      fill = "grey80"
    )
  ) +
  scale_y_discrete(
    limits = rev,
    labels = c(
        "Tricep (2)",
        "Tricep (1)",
        "Hip Adductor",
        "Hip Abductor",
        "Squat",
        "Trapezius",
        "Quadriceps",
        "Leg Press",
        "Leg Press (Invrt)",
        "Biceps",
        "Bench Press",
        "Cable Row"
      )
  ) +
  labs(
    y = "Weight-Based Exercise",
    fill = ""
  ))

We can see that min-max normalization yields more dramatic color changes in some of these heat maps, but keep in mind that this method is highly sensitive to outliers, since the minimum and maximum alone set the scale for every other value. As such, I prefer the more stable \(z\)-scored values.
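
To make the outlier concern concrete, here is a small sketch on hypothetical data (not taken from the workout log). Both methods are linear rescalings, so neither is immune to outliers. In this toy example, though, a single extreme value squeezes the ordinary values together more under min-max than under \(z\)-scoring, because the range grows faster than the standard deviation.

```r
# Hypothetical data: 50 ordinary values plus one extreme outlier
typical <- seq(100, 149)
with_outlier <- c(typical, 1000)

minmax_norm <- function(x) (x - min(x)) / (max(x) - min(x))
z_norm <- function(x) (x - mean(x)) / sd(x)

# Spread (max minus min) of a set of normalized values
spread <- function(v) diff(range(v))

# Spread of the 50 ordinary values without the outlier present
spread_minmax_before <- spread(minmax_norm(typical))
spread_z_before      <- spread(z_norm(typical))

# Spread of the same 50 values once the outlier is included
spread_minmax_after <- spread(minmax_norm(with_outlier)[1:50])
spread_z_after      <- spread(z_norm(with_outlier)[1:50])

# Factor by which the ordinary values get squeezed together
shrink_minmax <- spread_minmax_before / spread_minmax_after
shrink_z      <- spread_z_before / spread_z_after

c(minmax = shrink_minmax, z = shrink_z)
```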

Also, we can see that the weekly resolution seems to look more coherent. The day-by-day heat maps are too hard to read with all those gaps and narrow tiles.

7.3.2 Total Weight Moved by Day and Week

So far we’ve looked a lot at mean weight moved per repetition within discrete exercises. Now, let’s consider total activity, which we’ve defined as the total amount of weight moved (i.e., the sum of all weight across all reps). Like before, we’ll start by looking at the day-resolution data, then move to week-resolution versions of the plots.

Code
# Total weight moved, by exercise and day
gym_summary |> 
  filter(
    exercise_type != "pushup"
  ) |> 
  ggplot(aes(
    x = date,
    y = total_weight_moved
  )) +
  geom_point() +
  geom_smooth(
    se = FALSE
  ) +
  facet_wrap(
    ~exercise_type,
    nrow = 12,
    scales = "free"
  ) +
  labs(
    title = "Total Weight Moved Per Day by Exercise"
  )

Code
# Z-Normed Data
(plot_tile_total_z <- gym_summary |> 
  filter(
    exercise_type != "pushup"
  ) |>
  na.omit() |>
  ggplot(aes(
    x = date,
    y = exercise_type
  )) +
  geom_tile(
    aes(
      fill = total_weight_moved_z_norm
    )
  ) +
  scale_y_discrete(
    limits = rev
  ) +
  scale_fill_viridis_c(
    option = "F"
  ) +
   labs(
     x = "Date"
   ) +
   theme(
     axis.text.x = element_text(
       angle = 45,
       hjust = 1
     )
   ))

Code
(plot_tile_total_z2 <- plot_tile_total_z +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    plot.margin = margin(b = 0),
    # legend.position = "none",
    panel.background = element_rect(
      fill = "grey80"
    )
  ) +
  scale_y_discrete(
    limits = rev,
    labels = c(
        "Tricep (2)",
        "Tricep (1)",
        "Hip Adductor",
        "Hip Abductor",
        "Squat",
        "Trapezius",
        "Quadriceps",
        "Leg Press",
        "Leg Press (Invrt)",
        "Biceps",
        "Bench Press",
        "Cable Row"
      )
  ) +
  labs(
    y = "Weight-Based Exercise",
    fill = ""
  ))

Code
# By week
(plot_tile_total_z_wk <- gym_summary_week |> 
  filter(
    exercise_type != "pushup"
  ) |>
  na.omit() |>
  ggplot(aes(
    x = date,
    y = exercise_type
  )) +
  geom_tile(
    aes(
      fill = total_weight_moved_week_z_norm
    )
  ) +
  scale_y_discrete(
    limits = rev
  ) +
  scale_fill_viridis_c(
    option = "F"
  ) +
    labs(
     x = "Date"
   ) +
   theme(
     axis.text.x = element_text(
       angle = 45,
       hjust = 1
     )
   ))

Code
(plot_tile_total_z_wk2 <- plot_tile_total_z_wk +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    plot.margin = margin(b = 0),
    # legend.position = "none",
    panel.background = element_rect(
      fill = "grey80"
    )
  ) +
  scale_y_discrete(
    limits = rev,
    labels = c(
        "Tricep (2)",
        "Tricep (1)",
        "Hip Adductor",
        "Hip Abductor",
        "Squat",
        "Trapezius",
        "Quadriceps",
        "Leg Press",
        "Leg Press (Invrt)",
        "Biceps",
        "Bench Press",
        "Cable Row"
      )
  ) +
  labs(
    y = "Weight-Based Exercise",
    fill = ""
  ))

Code
# Min/Max Normalization
(plot_tile_total_minmax <- gym_summary |> 
  filter(
    exercise_type != "pushup"
  ) |>
  na.omit() |>
  ggplot(aes(
    x = date,
    y = exercise_type
  )) +
  geom_tile(
    aes(
      fill = total_weight_moved_minmax_norm
    )
  ) +
  scale_y_discrete(
    limits = rev
  ) +
  scale_fill_viridis_c(
    option = "F"
  ) +
   labs(
     x = "Date"
   ) +
   theme(
     axis.text.x = element_text(
       angle = 45,
       hjust = 1
     )
   ))

Code
(plot_tile_total_minmax2 <- plot_tile_total_minmax +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    plot.margin = margin(b = 0),
    # legend.position = "none",
    panel.background = element_rect(
      fill = "grey80"
    )
  ) +
  scale_y_discrete(
    limits = rev,
    labels = c(
        "Tricep (2)",
        "Tricep (1)",
        "Hip Adductor",
        "Hip Abductor",
        "Squat",
        "Trapezius",
        "Quadriceps",
        "Leg Press",
        "Leg Press (Invrt)",
        "Biceps",
        "Bench Press",
        "Cable Row"
      )
  ) +
  labs(
    y = "Weight-Based Exercise",
    fill = ""
  ))

Code
# Total weight moved, by exercise and week
gym_summary_week |> 
  filter(
    exercise_type != "pushup"
  ) |> 
  ggplot(aes(
    x = date,
    y = total_weight_moved
  )) +
  geom_point() +
  geom_smooth(
    se = FALSE
  ) +
  facet_wrap(
    ~exercise_type,
    nrow = 12,
    scales = "free"
  ) +
  labs(
    title = "Total Weight Moved Per Week by Exercise"
  )

Code
# By week
(plot_tile_total_minmax_wk <- gym_summary_week |> 
  filter(
    exercise_type != "pushup"
  ) |>
  na.omit() |>
  ggplot(aes(
    x = date,
    y = exercise_type
  )) +
  geom_tile(
    aes(
      fill = total_weight_moved_week_minmax_norm
    )
  ) +
  scale_y_discrete(
    limits = rev
  ) +
  scale_fill_viridis_c(
    option = "F"
  ) +
  labs(
     x = "Date"
   ) +
   theme(
     axis.text.x = element_text(
       angle = 45,
       hjust = 1
     )
   ))

Code
(plot_tile_total_minmax_wk2 <- plot_tile_total_minmax_wk +
  theme(
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.ticks = element_blank(),
    plot.margin = margin(b = 0),
    # legend.position = "none",
    panel.background = element_rect(
      fill = "grey80"
    )
  ) +
  scale_y_discrete(
    limits = rev,
    labels = c(
        "Tricep (2)",
        "Tricep (1)",
        "Hip Adductor",
        "Hip Abductor",
        "Squat",
        "Trapezius",
        "Quadriceps",
        "Leg Press",
        "Leg Press (Invrt)",
        "Biceps",
        "Bench Press",
        "Cable Row"
      )
  ) +
  labs(
    y = "Weight-Based Exercise",
    fill = ""
  )
)
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

We see many of the same aesthetic trends as we did with mean weight per rep (e.g., standard scores are more stable, and week resolution is generally easier to read). Looking at the actual data, though, the weekly totals invert the trend seen in mean weight per rep. That is, I seem to have gotten stronger per rep, which is reflected in both mean rep weight and totals per day, but when the time window is widened to one week, we start to capture frequency information and see a gradual decrease in the total amount of weight I’m moving every week. Oh no!
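
A quick worked example shows how both trends can hold at once. The numbers here are hypothetical and are not taken from my log: total weekly weight is roughly sessions times reps times mean weight per rep, so fewer sessions can outweigh heavier reps.

```r
# Hypothetical weeks, for illustration only
early <- list(sessions = 3, reps_per_session = 30, mean_rep_weight = 100)
late  <- list(sessions = 1, reps_per_session = 30, mean_rep_weight = 130)

# Weekly total = sessions x reps per session x mean weight per rep
weekly_total <- function(w) {
  w$sessions * w$reps_per_session * w$mean_rep_weight
}

weekly_total(early)  # 9000
weekly_total(late)   # 3900
```

Mean weight per rep rises from 100 to 130, yet the weekly total drops because the number of sessions falls from three to one.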

7.3.3 Cumulative Weight Moved

Out of curiosity, let’s take a detour and look at cumulative weight moved over the entire data set:

Code
# Cumulative weight moved
gym_summary |>
  filter(
    
    ## Take note, since this looks misleading: the cumulative total spans
    ## ALL exercises, but it is repeated on every exercise's rows. We filter
    ## to a single exercise only to avoid plotting redundant copies.
    exercise_type == "bench"
    
  ) |> 
  ggplot(aes(
    x = date,
    y = cumulative_lifted
  )) +
  geom_area()

Wowza! It seems that I’ve moved over 6,000,000 lbs. across all exercises since I started keeping track! Notice though that the slope of this area plot starts to become less steep in the latter months of 2025, which reflects the same decreases in total weight moved per week as described above.
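
The filter to a single exercise in the chunk above works because the cumulative total is repeated across every exercise’s rows. An alternative that avoids naming an arbitrary exercise is to reduce the data to one row per date before accumulating. This is a minimal sketch on hypothetical toy data, assuming dplyr; the column names mirror those in `gym_summary_week`.

```r
library(dplyr)

# Hypothetical weekly data: the all-exercise weekly total repeats per exercise
toy <- tibble(
  date = as.Date(rep(c("2025-01-04", "2025-01-11", "2025-01-18"), each = 2)),
  exercise_type = rep(c("back", "bench"), times = 3),
  total_weight_moved_week = rep(c(1000, NA, 500), each = 2)
)

# Keep one row per week, then accumulate (treating missing weeks as zero)
weekly <- toy |>
  distinct(date, total_weight_moved_week) |>
  arrange(date) |>
  mutate(
    cumulative_lifted = cumsum(coalesce(total_weight_moved_week, 0))
  )

weekly
```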

7.3.4 Change in Body Weight Over Time

Now let’s see how body weight has fluctuated over time. We can also annotate potentially relevant life events that correspond with shifts in the line plot:

Code
# Plot body weight
plot_weight <- df_gym |> 
  filter(!is.na(weight)) |> 
  ggplot(aes(
    x = date,
    y = weight
  )) +
  geom_line()

(plot_weight2 <- plot_weight +
    theme(
      plot.margin = margin(t = 0),
      panel.background = element_rect(
      fill = "grey80"
    )
    ) +
    labs(
      x = "Date",
      y = "Body Weight\n(lbs.)"
    ) +
   theme(
     axis.text.x = element_text(
       angle = 45,
       hjust = 1
     )
   ) +
  
  # Add annotations
    geom_vline(
    xintercept = as.Date("2025-06-05"),
    lty = "dashed"
  ) +
  annotate(
    geom = "text",
    x = as.Date("2025-06-05"),
    y = 207.5,
    label = "Travel",
    angle = 90,
    vjust = -0.5
  ) +
    geom_vline(
    xintercept = as.Date("2025-08-11"),
    lty = "dashed"
  ) +
  annotate(
    geom = "text",
    x = as.Date("2025-08-11"),
    y = 207.5,
    label = "Staycation",
    angle = 90,
    vjust = -0.5
  ) +
  geom_vline(
    xintercept = as.Date("2025-09-29"),
    lty = "dashed"
  ) +
  annotate(
    geom = "text",
    x = as.Date("2025-09-29"),
    y = 207.5,
    label = "Grad School",
    angle = 90,
    vjust = -0.5
  ) +
  geom_vline(
    xintercept = as.Date("2026-01-05"),
    lty = "dashed"
  ) +
  annotate(
    geom = "text",
    x = as.Date("2026-01-05"),
    y = 210,
    label = "Winter Term",
    angle = 90,
    vjust = -0.5
  ) +
  geom_vline(
    xintercept = as.Date("2026-03-30"),
    lty = "dashed"
  ) +
  annotate(
    geom = "text",
    x = as.Date("2026-03-30"),
    y = 212.5,
    label = "BMI 525",
    angle = 90,
    vjust = -0.5
  ))
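
The five annotation pairs above repeat the same pattern, so they could also be driven by a small data frame of events. The sketch below is an assumption-laden alternative rather than the code used for the final figure: the event dates, labels, and label heights are copied from the chunk above, while `toy_weight` holds hypothetical body-weight values for illustration only.

```r
library(ggplot2)

# Event dates, labels, and label heights copied from the annotations above
events <- data.frame(
  date = as.Date(c(
    "2025-06-05", "2025-08-11", "2025-09-29", "2026-01-05", "2026-03-30"
  )),
  label = c("Travel", "Staycation", "Grad School", "Winter Term", "BMI 525"),
  y = c(207.5, 207.5, 207.5, 210, 212.5)
)

# Hypothetical body-weight series (flat line), for illustration only
toy_weight <- data.frame(
  date = seq(as.Date("2025-01-04"), as.Date("2026-05-30"), by = "week"),
  weight = 210
)

plot_weight_sketch <- ggplot(toy_weight, aes(x = date, y = weight)) +
  geom_line() +
  # One dashed vertical line per event
  geom_vline(
    data = events,
    aes(xintercept = date),
    lty = "dashed"
  ) +
  # One rotated label per event
  geom_text(
    data = events,
    aes(x = date, y = y, label = label),
    angle = 90,
    vjust = -0.5,
    inherit.aes = FALSE
  )
```

With this layout, adding or moving an event means editing one row of `events` rather than a pair of layers.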

7.3.5 Running Stats

Moving away from weight, let’s take a look at the numbers related to running/jogging. Again, we can audition a by-day version of the heat map before trying out the by-week version.

Code
# z-score normalization
(plot_run_z <- df_run_long |> 
  na.omit() |> 
  filter(
    stat %in% c("distance","avg_speed")
  ) |> 
  ggplot(aes(
    x = date,
    y = stat
  )) +
  geom_tile(
    aes(
      fill = value_z_norm
    )
  ) +
  scale_fill_viridis_c(
    option = "D"
  ))

Code
(plot_run_z2 <- plot_run_z +
    theme(
      axis.title.x = element_blank(),
      axis.text.x = element_blank(),
      axis.ticks = element_blank(),
      plot.margin = margin(t = 0, b = 0),
      panel.background = element_rect(
        fill = "grey80"
      )
      # legend.position = "none"
    ) +
    labs(
      x = "Date",
      y = "Running",
      fill = ""
    ) +
    scale_y_discrete(
      labels = c(
        "Speed",
        "Distance"
      )
    )
   )

Code
# By week
(plot_run_z_wk <- df_run_long_week |> 
  na.omit() |>
  filter(
    stat %in% c("distance_week","avg_speed_week")
  ) |>
  ggplot(aes(
    x = date,
    y = stat
  )) +
  geom_tile(
    aes(
      fill = value_z_norm
    )
  ) +
  scale_fill_viridis_c(
    option = "D"
  ))

Code
(plot_run_z_wk2 <- plot_run_z_wk +
    theme(
      axis.title.x = element_blank(),
      axis.text.x = element_blank(),
      axis.ticks = element_blank(),
      plot.margin = margin(t = 0, b = 0),
      panel.background = element_rect(
        fill = "grey80"
      )
      # legend.position = "none"
    ) +
    labs(
      x = "Date",
      y = "Running",
      fill = ""
    ) +
    scale_y_discrete(
      labels = c(
        "Speed",
        "Distance"
      )
    )
   )

Code
# Min/Max Normalization
(plot_run_minmax <- df_run_long |> 
  na.omit() |>
  filter(
    stat %in% c("distance","avg_speed")
  ) |>
  ggplot(aes(
    x = date,
    y = stat
  )) +
  geom_tile(
    aes(
      fill = value_minmax_norm
    )
  ) +
  scale_fill_viridis_c(
    option = "D"
  ))

Code
(plot_run_minmax2 <- plot_run_minmax +
    theme(
      axis.title.x = element_blank(),
      axis.text.x = element_blank(),
      axis.ticks = element_blank(),
      plot.margin = margin(t = 0, b = 0),
      panel.background = element_rect(
        fill = "grey80"
      )
      # legend.position = "none"
    ) +
    labs(
      x = "Date",
      y = "Running",
      fill = ""
    ) +
    scale_y_discrete(
      labels = c(
        "Speed",
        "Distance"
      )
    )
   )

Code
# By week
(plot_run_minmax_wk <- df_run_long_week |> 
  na.omit() |>
  filter(
    stat %in% c("distance_week","avg_speed_week")
  ) |>
  ggplot(aes(
    x = date,
    y = stat
  )) +
  geom_tile(
    aes(
      fill = value_minmax_norm
    )
  ) +
  scale_fill_viridis_c(
    option = "D"
  ))

Code
(plot_run_minmax_wk2 <- plot_run_minmax_wk +
    theme(
      axis.title.x = element_blank(),
      axis.text.x = element_blank(),
      axis.ticks = element_blank(),
      plot.margin = margin(t = 0, b = 0),
      panel.background = element_rect(
        fill = "grey80"
      )
      # legend.position = "none"
    ) +
    labs(
      x = "Date",
      y = "Running",
      fill = ""
    ) +
    scale_y_discrete(
      labels = c(
        "Speed",
        "Distance"
      )
    )
   )

And like before, it seems that standard scoring and week-resolution will work best for our purposes. \(\checkmark\)
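
One small caveat about the running plots above: the y-axis labels are supplied positionally, so they depend on the alphabetical order of the `stat` codes. A slightly more defensive option, sketched here on hypothetical toy data, is to pass a named vector to `scale_y_discrete()` so that each code is matched to its label by name. The `stat` codes are the ones used above; the values are made up.

```r
library(ggplot2)

# Hypothetical long-format running data; stat codes match those used above
toy_run <- data.frame(
  date = as.Date(rep(c("2025-01-04", "2025-01-11"), times = 2)),
  stat = rep(c("distance_week", "avg_speed_week"), each = 2),
  value_z_norm = c(-0.5, 0.5, 1, -1)
)

# Names are the data values; elements are the labels to display
run_labels <- c(
  avg_speed_week = "Speed",
  distance_week  = "Distance"
)

plot_run_sketch <- ggplot(toy_run, aes(x = date, y = stat)) +
  geom_tile(aes(fill = value_z_norm)) +
  scale_y_discrete(labels = run_labels)
```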

7.3.6 Combining Plots

Okay, now that we’ve spent so much time creating and saving discrete plots, let’s try to combine the anaerobic and aerobic heat maps together with the body weight line plot. These all have the same time-scale, so they can share an x-axis if we use patchwork to plot them together.
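
For readers new to patchwork, here is a minimal sketch of the operators used in the chunks that follow, built on hypothetical toy data. The `/` operator stacks plots vertically, `+ plot_layout()` sets layout options, and `&` adds an element to every panel in the composition. R evaluates `/` before `+` and `+` before `&`, so no extra parentheses are needed.

```r
library(ggplot2)
library(patchwork)

# Hypothetical data sharing one date axis
toy <- data.frame(
  date = seq(as.Date("2025-01-04"), by = "week", length.out = 8),
  value = c(1, 3, 2, 5, 4, 6, 5, 7)
)

p_top    <- ggplot(toy, aes(x = date, y = value)) + geom_col()
p_bottom <- ggplot(toy, aes(x = date, y = value)) + geom_line()

# Stack the panels, collect duplicated axes, and give the top panel
# twice the height; the shared x scale is applied to both panels via `&`
combined <- p_top / p_bottom +
  plot_layout(axes = "collect", heights = c(2, 1)) &
  scale_x_date(date_labels = "%b '%y", date_breaks = "1 month")
```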

7.3.6.1 By Day

We know that by-day doesn’t work so well, but we’ll give it one last try anyway:

Code
# # z-scored versions
# plot_tile_z2/plot_run_z2/plot_weight2 +
#   plot_layout(
#     axes = "collect",
#     heights = c(4,2,2)
#   ) &
#   scale_x_date(
#     date_labels = "%y %b",
#     date_breaks = "1 month",
#     limits = as.Date(c("2024-12-31", "2026-06-04"))
#   ) &
#   plot_annotation(
#     title = "Mean Rep Weight By Exercise",
#     subtitle = "(Z-scored; by day)"
#   )

# Min/Max normalization versions
plot_tile_minmax2/plot_run_minmax2/plot_weight2 +
  plot_layout(
    axes = "collect",
    heights = c(4,2,2)
  ) &
  scale_x_date(
    date_labels = "%y %b",
    date_breaks = "1 month",
    limits = as.Date(c("2024-12-31", "2026-06-04"))
  ) &
  plot_annotation(
    title = "Mean Rep Weight By Exercise",
    subtitle = "(min-max normalization; by day)"
  )

7.3.6.2 By Week

The week-resolution data seem to look better in heat maps, so we’ll plot four potential candidates for our final figure: mean weight per rep and total activity, each with min-max and standard score normalization options.

Code
# Z-scored versions
plot_tile_z_wk2/plot_run_z_wk2/plot_weight2 +
  plot_layout(
    axes = "collect",
    heights = c(4,2,2)
  ) &
  scale_x_date(
    date_labels = "%y %b",
    date_breaks = "1 month",
    limits = as.Date(c("2024-12-31", "2026-06-04"))
  ) &
  plot_annotation(
    title = "Mean Rep Weight By Exercise",
    subtitle = "(Z-scored; by week)"
  )

Code
# Select our FINAL figure from the line-up
(final_fig <- plot_tile_total_z_wk2/plot_run_z_wk2/plot_weight2 +
  plot_layout(
    axes = "collect",
    heights = c(4,1,3)
  ) &
  scale_x_date(
    date_labels = "%b, '%y",
    date_breaks = "1 month",
    limits = as.Date(c("2024-12-31", "2026-06-04"))
  ) &
  plot_annotation(
    title = "Total Weekly Activity By Exercise",
    subtitle = "(z-score normalization)"
  ) &
  theme(
    text = element_text(
      family = "Century Gothic",
      size = 12
    ),
    plot.title = element_text(
      size = 18,
      hjust = 0.54
    ),
    plot.subtitle = element_text(
      size = 14,
      hjust = 0.54
    )
  ))

Code
# Min/Max normalization versions
plot_tile_minmax_wk2/plot_run_minmax_wk2/plot_weight2 +
  plot_layout(
    axes = "collect",
    heights = c(4,2,2)
  ) &
  scale_x_date(
    date_labels = "%y %b",
    date_breaks = "1 month",
    limits = as.Date(c("2024-12-31", "2026-06-04"))
  ) &
  plot_annotation(
    title = "Mean Rep Weight By Exercise",
    subtitle = "(min-max normalization; by week)"
  )

Code
plot_tile_total_minmax_wk2/plot_run_minmax_wk2/plot_weight2 +
  plot_layout(
    axes = "collect",
    heights = c(4,1,3)
  ) &
  scale_x_date(
    date_labels = "%y %b",
    date_breaks = "1 month",
    limits = as.Date(c("2024-12-31", "2026-06-04"))
  ) &
  plot_annotation(
    title = "Total Weight Moved Per Week",
    subtitle = "(min-max normalization)"
  ) &
  theme(
    text = element_text(
      family = "Century Gothic",
      size = 12
    ),
    plot.title = element_text(
      size = 16
    )
  )

Aha! I believe that the \(z\)-scored total weekly activity plot looks the cleanest, since it shows a “smoother” color gradient, but still illustrates activity changes that align with life events (mainly starting grad school). Min-max normalization looks a little too extreme to my eye, and the mean weight per rep plots don’t address my original questions about activity levels as clearly, since that particular metric isn’t sensitive to exercise frequency.

7.3.6.3 Save the final figure

Code
# Save a .PNG copy of the final figure
ggsave(
  "final_fig.png",
  final_fig,
  width = 12,
  height = 7,
  path = here(
    "figures"
  )
)

# Save the final figure as an R object
save(
  final_fig,
  file = here(
    "data",
    "final_fig.Rdata"
  )
)
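
A note on the second step: `save()` stores the object under its original name, so a later `load()` recreates `final_fig` in whatever session calls it. If a single object is all that needs to be stored, `saveRDS()` and `readRDS()` are a possible alternative that lets the reader choose the name on the way back in. The sketch below uses a stand-in object and a temporary file rather than the real figure and project paths.

```r
# Stand-in object; in practice this would be final_fig
obj <- list(title = "Total Weekly Activity By Exercise", n_panels = 3)

# Temporary file used only for this illustration
rds_path <- tempfile(fileext = ".rds")

# Write the object, then read it back under a new name
saveRDS(obj, file = rds_path)
restored <- readRDS(rds_path)

identical(obj, restored)
```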

8 Slideshow Presentation

For a more concise summary of my final figure and process, please see the slides from my in-class presentation to our BMI 525 cohort: