Soviet Space Dogs (Part 2)

“A data visualization is not a piece of art meant to be looked at only for its aesthetically pleasing features. Instead, its purpose is to convey information and make a point. To reliably achieve this goal when preparing visualizations, we have to place the data into context and provide accompanying titles, captions, and other annotations.”  Claus O. Wilke, Fundamentals of Data Visualization

In my last blog post I detailed the data cleaning and tidying steps I took to transform 2 CSV files, containing information on the flights of the Soviet space dog program, into tidy datasets. In this blog post I will take that data and turn it into a visualisation. The data was supplied by Duncan Geere and you can check out his amazing visualisation of the data here.

Rather than detail all aspects of the visualisation, I’m going to focus on 2 packages, ggforce and ggtext, that provide some excellent annotation features. If you’d like to see a more in-depth case study of customised plots using ggplot2 you can check out some of my other blog posts here and here. Alternatively, please check out the evolution of a ggplot by Cédric Scherer which provides a fantastic step-by-step guide.

Space Dog Data

Let’s first refresh our memory of what the data is:

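library(tidyverse)
library(lubridate)  # provides year(), used in the join below; not attached by tidyverse versions before 2.0.0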

glimpse(dogs_tidy)
## Observations: 81
## Variables: 9
## $ name_latin    <chr> "Dezik", "Dezik", "Tsygan", "Lisa", "Chizhik", "Ch…
## $ name_english  <chr> "Dezik", "Dezik", "Gypsy", "Fox", "Siskin", "Siski…
## $ name_cyrillic <chr> "Дезик", "Дезик", "Цыган", "Лиса", "Чижик", "Чижик…
## $ gender        <chr> "Male", "Male", "Male", "Female", "Male", "Male", …
## $ flights       <chr> "1951-07-22", "1951-07-29", "1951-07-22", "1951-07…
## $ date_flight   <date> 1951-07-22, 1951-07-29, 1951-07-22, 1951-07-29, 1…
## $ date_death    <date> 1951-07-29, 1951-07-29, NA, 1951-07-29, 1951-08-2…
## $ flight_fate   <chr> "Survived", "Died", "Survived", "Died", "Survived"…
## $ notes         <chr> NA, NA, "Adopted as a pet by Soviet physicist Anat…
glimpse(flights_tidy)
## Observations: 42
## Variables: 6
## $ date_flight  <date> 1951-07-22, 1951-07-29, 1951-08-15, 1951-08-19, 19…
## $ rocket       <chr> "R-1V", "R-1B", "R-1B", "R-1V", "R-1B", "R-1B", "R-…
## $ altitude_km  <chr> "100", "100", "100", "100", "100", "100", "100", "1…
## $ result       <chr> "recovered safely", "parachute failed, both dogs di…
## $ notes_flight <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "no rocket or a…
## $ altitude     <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, NA, 10…

So the 1st dataset contains a record of each unique dog and flight combination, and the 2nd dataset contains further information on each flight. The 2 datasets can be merged into one by joining on the date_flight variable:

all_dogs_flights <- dogs_tidy %>% 
  inner_join(flights_tidy, by = "date_flight") %>% 
  mutate(flight_year = year(date_flight)) %>% 
  arrange(date_flight, name_latin) %>% 
  group_by(flight_year) %>% 
  mutate(year_pos = row_number())

glimpse(all_dogs_flights)
## Observations: 81
## Variables: 16
## Groups: flight_year [10]
## $ name_latin    <chr> "Lisa-2", "Ryzhik-2", "Dezik", "Tsygan", "Dezik", …
## $ name_english  <chr> "Fox", "Ginger", "Dezik", "Gypsy", "Dezik", "Fox",…
## $ name_cyrillic <chr> "Лиса", "Рыжик", "Дезик", "Цыган", "Дезик", "Лиса"…
## $ gender        <chr> "Female", "Male", "Male", "Male", "Male", "Female"…
## $ flights       <chr> "1951-06-26", "1951-06-26", "1951-07-22", "1951-07…
## $ date_flight   <date> 1951-06-26, 1951-06-26, 1951-07-22, 1951-07-22, 1…
## $ date_death    <date> 1955-02-05, 1954-07-07, 1951-07-29, NA, 1951-07-2…
## $ flight_fate   <chr> "Survived", "Survived", "Survived", "Survived", "D…
## $ notes         <chr> NA, NA, NA, "Adopted as a pet by Soviet physicist …
## $ rocket        <chr> "R-1D", "R-1D", "R-1V", "R-1V", "R-1B", "R-1B", "R…
## $ altitude_km   <chr> "100", "100", "100", "100", "100", "100", "100", "…
## $ result        <chr> "recovered safely", "recovered safely", "recovered…
## $ notes_flight  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ altitude      <dbl> 100, 100, 100, 100, 100, 100, 100, 100, 100, 100, …
## $ flight_year   <dbl> 1951, 1951, 1951, 1951, 1951, 1951, 1951, 1951, 19…
## $ year_pos      <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 1, …

This is the final dataset that will be used in the visualisation.
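To make the join and the year_pos step easier to follow, here is a minimal sketch on made-up data. The toy_dogs and toy_flights tibbles, their dog names and their rocket values are hypothetical stand-ins, not the real datasets. The pipeline mirrors the one above, with an ungroup() added at the end for tidiness.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: two dogs share a flight in 1951, one dog flies in 1952
toy_dogs <- tibble(
  name = c("B", "A", "C"),
  date_flight = as.Date(c("1951-07-22", "1951-07-22", "1952-01-05"))
)
toy_flights <- tibble(
  date_flight = as.Date(c("1951-07-22", "1952-01-05")),
  rocket = c("R-1V", "R-1B")
)

toy_joined <- toy_dogs %>%
  inner_join(toy_flights, by = "date_flight") %>%  # attach flight details to each dog
  mutate(flight_year = year(date_flight)) %>%
  arrange(date_flight, name) %>%                   # order within the year before numbering
  group_by(flight_year) %>%
  mutate(year_pos = row_number()) %>%              # position of each dog-flight within its year
  ungroup()

toy_joined
```

Each row gets a running position within its year, which is what places the dots left to right along each year's row in the plot.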

Packages for annotation

As I’ve said, I’ll be using 2 packages to help me annotate the plot:

  • ggforce is a package by Thomas Lin Pedersen and is available on CRAN. Thomas describes it as “a collection of features with the only commonality being their tie to the ggplot2 API”, so it offers a lot more than annotation features, but I will focus only on the annotation functions here.

  • ggtext is a package by Claus Wilke which currently can be installed from GitHub. It provides rich text support, enabling the formatting of text via Markdown or basic HTML. If you’ve ever wanted to use different text colours within the same ggplot2 title then this is the package for you!

Once you have the 2 packages installed, they can be loaded into your R session:

library(ggforce)
library(ggtext)

Plot Un-annotated

Here is the code and resulting plot before any annotations are added. Each dog on a flight is represented by a dot, with a line connecting dogs on the same flight. The dots are coloured by the fate of the dog on that flight. There’s a fair bit to chew over in terms of the customising of theme elements, but I won’t be focussing on this here:

p <- ggplot(all_dogs_flights, aes(x = year_pos, y = flight_year)) +
  geom_line(aes(group = date_flight), colour = "white", size = 1.5) +
  geom_point(aes(fill = flight_fate), shape = 21, colour = "white", size = 4.5, stroke = 1.5) +
  scale_y_reverse(breaks = seq(1951, 1966, 1)) +
  scale_fill_manual(values = c("Survived" = "#E69F00", "Died" = "#CC79A7")) +
  labs(y = "", x = "", 
       fill = "Each dot represents a dog and its fate on a mission\nDogs on the same flight are connected by a line",
       title = "Soviet Space Dogs",
       subtitle = "Dogs sent on sub-orbital and orbital space flights by the\nSoviet Space Program in the 1950s and 1960s",
       caption = "Source: @DuncanGeere | Graphic: @committedtotape") +
  theme(plot.background = element_rect(fill = "#383854", colour = "#383854"),
        panel.background = element_rect(fill = "#383854", colour = "#383854"),
        panel.grid.minor = element_blank(),
        panel.grid.major = element_blank(),
        text = element_text(colour = "white", family = "Space Mono"),
        axis.text = element_text(colour = "white"),
        axis.text.x = element_blank(),
        plot.title = element_text(face = "bold", size = 20),
        plot.margin = margin(10,20,10,10),
        legend.background = element_rect(fill = "#383854", colour = "white"),
        legend.key = element_rect(fill = "#383854", colour = "#383854"),
        legend.direction = "horizontal",
        legend.position = c(0.5, 0.2))  

p

Lots of clear blue space to fill! Let’s delve into the stories behind some of the dogs.

ggforce

I’m going to add 3 annotations using the ggforce package. The annotation functions all begin with geom_mark_ and there are 4 of them, which vary by how the points to be annotated are enclosed:

  • geom_mark_circle
  • geom_mark_ellipse
  • geom_mark_hull
  • geom_mark_rect

Typically, when annotating, you want to highlight a point, or a group of points in your plot. These geom_mark_* functions come with a dedicated filter aesthetic which provides an easy way to select the point(s) in question. Within the aes(), you can then supply a label and description. The label adds a title to the annotation, with a description used to add further info below it. There are then multiple arguments to enable you to fine-tune your annotation if you so wish. I’ll touch on a few of them here:

  • label.colour - the text colour. Below I am specifying a vector of 2 colours. When doing this, the 1st colour is used for the label and the 2nd for the description. I decided to colour the label (title) of each annotation the same as the colour of the point it’s referencing (i.e. if the dog died or survived), whilst keeping the description that follows white.
  • label.fill - the fill colour of the annotation box. Below I am setting this to NA to keep a clean look.
  • con.colour - the colour of the line connecting the annotation to the point(s).
  • colour - the colour of the outline of the shape surrounding the point(s). So for example, in a geom_mark_circle this specifies the colour of the circle surrounding the filtered data points. Below, I am again setting this to NA to hide it as aesthetically I didn’t feel it was needed.

The documentation for the package is excellent, so for full details of all the arguments I’ve used head there.
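As a rough, self-contained illustration of the filter aesthetic and the arguments listed above, here is a minimal sketch using the built-in mtcars data rather than the space dog data. The object names (cars, heaviest, p_demo) and the colours are arbitrary choices made for this example.

```r
library(ggplot2)
library(ggforce)

# Turn the row names into a column so they can be used inside aes()
cars <- transform(mtcars, model = rownames(mtcars))
heaviest <- cars$model[which.max(cars$wt)]

p_demo <- ggplot(cars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_mark_circle(aes(filter = model == heaviest,     # select the single point to annotate
                       label = model,                  # title of the annotation
                       description = "The heaviest car in mtcars"),
                   label.colour = c("firebrick", "black"),  # 1st for label, 2nd for description
                   label.fill = NA,                         # no background box
                   con.colour = "grey30",                   # connector line colour
                   colour = NA)                             # hide the enclosing circle

p_demo
```

Only the row that passes the filter is annotated; every other point is drawn as normal by geom_point.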

Notice that when annotating just 1 data point (the dog Laika) I am using geom_mark_circle, whereas when annotating flights with 2 dogs I am using geom_mark_ellipse.

p1 <- p +
  geom_mark_circle(aes(filter = name_latin == 'Laika', label = 'Laika - 3 November 1957', 
                       description = "The 1st living creature in orbit, never expected to survive"),
                   label.family = "Space Mono",
                   label.fontsize = 10,
                   label.colour = c("#CC79A7", "white"),
                   label.fill = NA,
                   label.buffer = unit(1, 'mm'),
                   con.colour = "white",
                   colour = NA,
                   con.type = "straight",
                   con.cap = 0) +
  geom_mark_ellipse(aes(filter = date_flight == as.Date("1951-07-29"), 
                        label = 'Dezik and Lisa - 29 July 1951', 
                        description = "The 1st deaths, due to parachute failure"),
                    label.family = "Space Mono",
                    label.fontsize = 10,
                    label.colour = c("#CC79A7", "white"),
                    label.fill = NA,
                    label.buffer = unit(1, 'mm'),
                    con.colour = "white",
                    colour = NA,
                    con.type = "straight",
                    con.cap = 0) +
  geom_mark_ellipse(aes(filter = date_flight == as.Date("1960-08-19"), 
                        label = 'Belka and Strelka - 19 August 1960', 
                        description = "Spent a day in space and safely returned to earth"),
                    label.family = "Space Mono",
                    label.fontsize = 10,
                    label.colour = c("#E69F00", "white"),
                    label.fill = NA,
                    label.buffer = unit(1, 'mm'),
                    con.colour = "white",
                    colour = NA,
                    con.type = "straight",
                    con.cap = 0)

p1

This is much more informative now, and has helped fill some of that empty space! As mentioned in the last blog post, poor Laika was never expected to survive her space mission. However, some 3 years later, Belka and Strelka returned safely from their orbit.

Let’s make one last addition to the plot.

ggtext

Ever since learning ggplot2 last year, I’d always thought it would be great to have more control over the colour of the text within an element (such as a title or subtitle). It’s something I would frequently see in news graphics, often where words in the title are coloured the same as the point, bar or line they refer to in the plot. With ggtext this is now completely achievable within ggplot2, no post-processing steps required! Coloured titles are handled by the new theme element element_markdown. You can also add annotations, similar to geom_text, that accommodate more bespoke formatting using the new geom_richtext function, whose label can contain Markdown or HTML code to format the text.

In the below I am highlighting a dog, Otvazhnaya, who made 7 flights in total, the most of any dog. I’ve changed the colour of the circle outline with a geom_point call, and added an accompanying annotation using geom_richtext. The label contains some Markdown to bold the dog’s name, along with some HTML to change the colour of the dog’s name so it matches the circle outline. The NA supplied to the fill and label.color arguments removes the annotation’s background and outline.

p1 +
  geom_point(data = filter(all_dogs_flights, name_latin == 'Kusachka / Otvazhnaya'), 
             shape = 1, colour = "#64DCF4", size = 4.5, stroke = 2) +
  # one-row data, so the label is drawn once rather than once per row of all_dogs_flights
  geom_richtext(data = tibble(x = 4.5, y = 1959),
                aes(x = x, y = y, 
                    label = "<span style='color:#64DCF4'>**Otvazhnaya ('Brave One')**</span> made the most flights of any space dog"
                    ),
                fill = NA, label.color = NA,
                label.padding = grid::unit(rep(0, 4), "pt"),
                hjust = 0, family = "Space Mono", 
                color = "white", size = 3.3)

It may seem a subtle change to have part of an annotation in a different colour, but I think this added functionality when it comes to text formatting is a big win, so thank you Claus!
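The coloured-title idea mentioned earlier is not used in the final plot, but a minimal sketch of it might look like the following. The fates data frame and the title wording are made up for illustration; the colours are the same ones used for the fill scale above.

```r
library(ggplot2)
library(ggtext)

# Hypothetical toy data: three dots, two fates
fates <- data.frame(
  x = c(1, 2, 3),
  y = c(1, 1, 1),
  fate = c("Survived", "Died", "Survived")
)

p_title <- ggplot(fates, aes(x = x, y = y, fill = fate)) +
  geom_point(shape = 21, colour = "white", size = 4.5) +
  scale_fill_manual(values = c("Survived" = "#E69F00", "Died" = "#CC79A7"),
                    guide = "none") +   # the coloured title stands in for the legend
  labs(title = "Dogs that <span style='color:#E69F00'>**survived**</span> and dogs that <span style='color:#CC79A7'>**died**</span>") +
  theme(plot.title = element_markdown())  # render the title as Markdown/HTML

p_title
```

Without element_markdown in the theme, the span tags and asterisks would appear in the title as literal text.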

End

There we have it. We’ve gone from a rather sparse and uninformative plot to something that provides more details on these space-travelling canines. I hope this encourages you to join The Annotation Game!