Spending on Education

Sometimes you just need a little nudge before delving into something new, and sometimes that little nudge is a RStudio competition. Last year I finally gave Shiny ago after seeing RStudio announce the first ever Shiny contest - and things worked out pretty well! So here I am again to try and repeat the trick, but this time - it’s tables!

In this post I’ll be using the gt package to render a table in HTML. There are several packages in R for creating tables, but I was drawn to gt (which is short for the grammar of tables) because of my love for ggplot2 (the grammar of graphics) and their shared philosophy of building output layer by layer, with the flexibility to customise the output.

For the full code to reproduce the below, go to this repo on Github: https://github.com/committedtotape/education-spending.

For a great introduction to the basics of gt, look no further than Thomas Mock’s blogpost, and check out his other posts on building tables in R while you’re at it! I’ll skip over some of the core building blocks that Thomas has covered, as I want to mainly focus on ways to customise a table to give it that little bit extra!

The Data

The data comes from the wonderful TidyTuesday project, and it contains details on the public spending on children from 1997 to 2016 in the US. The data was compiled from Urban Institute by Joshua Rosenberg:

library(tidyverse)
library(gt)

kids <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-09-15/kids.csv')

I am just going to focus on Elementary and Secondary schools spending (see the codebook for descriptions of the types of spending included in data), and I’ll compare across states and years using the inflation adjusted per child figure:

school_spend <- kids %>% 
  filter(variable == "PK12ed") %>% 
  select(state, year, inf_adj_perchild)

school_spend
## # A tibble: 1,020 x 3
##    state                 year inf_adj_perchild
##    <chr>                <dbl>            <dbl>
##  1 Alabama               1997             3.93
##  2 Alaska                1997             7.55
##  3 Arizona               1997             3.71
##  4 Arkansas              1997             3.89
##  5 California            1997             4.28
##  6 Colorado              1997             4.38
##  7 Connecticut           1997             6.70
##  8 Delaware              1997             5.63
##  9 District of Columbia  1997             6.11
## 10 Florida               1997             4.45
## # … with 1,010 more rows

As we are dealing with data on education spending I thought it would be fun to make my final output look like a table written up on a school chalkboard. I love customising ggplot2 graphs in this sort of way, so why not try something similar with my first foray into gt.

Pre-Processing

There are some things I need to do before building the table. I just can’t resist using ggplot2, so I will be embedding some plots into the table. Firstly, I want some sparklines to illustrate how spending has changed over time. Here I create a function that will do just that:

# function for plotting spark lines
plot_spark <- function(name, df) {
  plot_object <-
    ggplot(data = df,
           aes(x = year, y = inf_adj_perchild)) +
    geom_line(colour = "#F3FCF0", size = 12) +
    theme_void() +
    theme(plot.background = element_rect(fill = "#47745A", colour = "#47745A"),
          plot.margin = margin(0,0,0,0))
  return(plot_object)
}

I want a nice, clean sparkline plot so I have removed all non-data ink with theme_void. I am also setting the plot background colour to match the colour of my final table (as you’ll see later).

My table will also display the spending figure at the start and end of the period (1997 and 2016). As an added visual reference I am going to map these amounts to the height of bars, but give them a sketchy, hand-drawn look to align with my chalkboard aesthetic. This took much fiddling about, but a combination of runif, jitter and geom_curve gave me the desired effect. I was really just trying to replicate the look of ggrough, which I tried out but couldn’t get to render within my gt table:

# function for sketchy barplot
plot_bar <- function(name, df, yr) {
  
 barheight <- df %>% 
    filter(year == yr) %>% 
    pull(inf_adj_perchild)
  
  plot_object <-
    tibble( x = runif(8*barheight, -0.01, 0.01), xend = runif(8*barheight, 0.99, 1.01),
            y = runif(8*barheight, 0.1, barheight), yend = y + jitter(0.05),
            y1 = yend, yend1 = y - jitter(1), x1 = xend, xend1 = x) %>% 
    ggplot() +
    geom_curve(aes(x = x, xend = xend, y = y, yend = yend), 
               colour = "white", size = 2, curvature = 0.01, alpha = 0.7) +
    geom_curve(aes(x = x1, xend = xend1, y = y1, yend = yend1), 
               colour = "white", size = 2, curvature = -0.01, alpha = 0.7) +
    scale_y_continuous(limits = c(-0.2,20)) +
    scale_x_continuous(limits = c(-0.2,1.2)) +
    theme_void() +
    theme(plot.background = element_rect(fill = "#47745A",
                                         colour = "#47745A"),
          plot.margin = margin(0,0,0,0))
  return(plot_object)
}

I’ve again used theme_void to keep the plots clutter-free.

I want the plots to appear for each state in the dataset. To do this I group my data by state and then nest to get nested tables of data for each state within my dataframe. I can then map this nested data for each state through my plotting functions. For the bar charts I am creating a 1997 and a 2016 version:

# df of plots for each state
plots <-
  school_spend %>%
  dplyr::group_by(state) %>%
  tidyr::nest() %>%
  dplyr::mutate(
    spark = map2(state, data, plot_spark),
    bar97 = map2(state, data, plot_bar, yr = 1997),
    bar16 = map2(state, data, plot_bar, yr = 2016)) %>% 
  select(-data)

plots
## # A tibble: 51 x 4
## # Groups:   state [51]
##    state                spark  bar97  bar16 
##    <chr>                <list> <list> <list>
##  1 Alabama              <gg>   <gg>   <gg>  
##  2 Alaska               <gg>   <gg>   <gg>  
##  3 Arizona              <gg>   <gg>   <gg>  
##  4 Arkansas             <gg>   <gg>   <gg>  
##  5 California           <gg>   <gg>   <gg>  
##  6 Colorado             <gg>   <gg>   <gg>  
##  7 Connecticut          <gg>   <gg>   <gg>  
##  8 Delaware             <gg>   <gg>   <gg>  
##  9 District of Columbia <gg>   <gg>   <gg>  
## 10 Florida              <gg>   <gg>   <gg>  
## # … with 41 more rows

Now, to get the data in the format I desire, I will filter to just the 2 years I want for each state, and then use pivot_wider to turn these 2 rows for each state into 2 columns. I then also inner_join the plotting information from previous code chunk. Lastly, I create 3 empty variables to serve as placeholders for the plots used in the gt build:

table_prepped <- school_spend %>% 
  filter(year %in% c(1997, 2016)) %>% 
  pivot_wider(names_from = year, values_from = inf_adj_perchild) %>% 
  mutate(percent_change = (`2016` - `1997`) / `1997`) %>% 
  inner_join(plots, by = "state") %>% 
  mutate(ggplot1 = NA,
         ggplot2 = NA,
         ggplot3 = NA) %>% 
  select(state, `1997`, ggplot1, `2016`, ggplot2, percent_change, ggplot3, bar97, bar16, spark)

table_prepped
## # A tibble: 51 x 10
##    state  `1997` ggplot1 `2016` ggplot2 percent_change ggplot3 bar97 bar16 spark
##    <chr>   <dbl> <lgl>    <dbl> <lgl>            <dbl> <lgl>   <lis> <lis> <lis>
##  1 Alaba…   3.93 NA        5.67 NA               0.443 NA      <gg>  <gg>  <gg> 
##  2 Alaska   7.55 NA       10.6  NA               0.408 NA      <gg>  <gg>  <gg> 
##  3 Arizo…   3.71 NA        4.15 NA               0.118 NA      <gg>  <gg>  <gg> 
##  4 Arkan…   3.89 NA        5.93 NA               0.523 NA      <gg>  <gg>  <gg> 
##  5 Calif…   4.28 NA        7.17 NA               0.673 NA      <gg>  <gg>  <gg> 
##  6 Color…   4.38 NA        6.06 NA               0.384 NA      <gg>  <gg>  <gg> 
##  7 Conne…   6.70 NA       10.8  NA               0.614 NA      <gg>  <gg>  <gg> 
##  8 Delaw…   5.63 NA        8.44 NA               0.500 NA      <gg>  <gg>  <gg> 
##  9 Distr…   6.11 NA       17.7  NA               1.89  NA      <gg>  <gg>  <gg> 
## 10 Flori…   4.45 NA        5.50 NA               0.236 NA      <gg>  <gg>  <gg> 
## # … with 41 more rows

A massive thanks to CerebralMastication and the commenters on this github issue for helping me get to a working solution on this - it had caused me quite the headache! Pretty much all the code for getting the inline plots was stolen from there.

Finally, I’ll truncate to just the top 10 spenders (based on 2016 figure). This is pretty arbitrary, but I wanted to focus more on the aesthetics of the table, so sacrificed some of the data:

top10 <- table_prepped %>% 
  arrange(-`2016`) %>% 
  head(10)

top10
## # A tibble: 10 x 10
##    state  `1997` ggplot1 `2016` ggplot2 percent_change ggplot3 bar97 bar16 spark
##    <chr>   <dbl> <lgl>    <dbl> <lgl>            <dbl> <lgl>   <lis> <lis> <lis>
##  1 Distr…   6.11 NA       17.7  NA               1.89  NA      <gg>  <gg>  <gg> 
##  2 New Y…   7.05 NA       12.8  NA               0.811 NA      <gg>  <gg>  <gg> 
##  3 Vermo…   5.63 NA       12.0  NA               1.14  NA      <gg>  <gg>  <gg> 
##  4 New J…   7.22 NA       11.6  NA               0.605 NA      <gg>  <gg>  <gg> 
##  5 Conne…   6.70 NA       10.8  NA               0.614 NA      <gg>  <gg>  <gg> 
##  6 Alaska   7.55 NA       10.6  NA               0.408 NA      <gg>  <gg>  <gg> 
##  7 Wyomi…   5.81 NA       10.6  NA               0.823 NA      <gg>  <gg>  <gg> 
##  8 Rhode…   6.06 NA       10.2  NA               0.683 NA      <gg>  <gg>  <gg> 
##  9 New H…   5.06 NA        9.82 NA               0.941 NA      <gg>  <gg>  <gg> 
## 10 Massa…   5.52 NA        9.49 NA               0.720 NA      <gg>  <gg>  <gg>

Building the table

Getting started

I’ll start with the basis of my table. Passing my data to the gt() function will create a table, but I’m just taking it a little bit further here by adding some formatting, aligning columns and labelling columns. I won’t dwell on these steps as it is covered excellently by Thomas Mock in the aforementioned blog post:

gt_table_1 <- top10 %>% 
  gt() %>% 
  fmt_currency(
    columns = vars(`1997`,`2016`),
    decimals = 1,
    pattern = "{x}k"
  ) %>%
  fmt_percent(
    columns = vars(percent_change),
    decimals = 0
  ) %>%
  cols_align(
    align = "right",
    columns = vars(`1997`,`2016`)
  ) %>% 
  cols_label(
    state = "State",
    percent_change = "Change",
    ggplot1 = " ",
    ggplot2 = " ",
    ggplot3 = "Trend"
  )

Let’s display what we have. I will need to perform some transformation on the plotting data (which I’ll do later), so for now I will hide these columns with the cols_hide function:

gt_table_1 %>%
  cols_hide(vars(ggplot1, ggplot2, ggplot3, bar97, bar16, spark))
State 1997 2016 Change
District of Columbia $6.1k $17.7k 189%
New York $7.0k $12.8k 81%
Vermont $5.6k $12.0k 114%
New Jersey $7.2k $11.6k 60%
Connecticut $6.7k $10.8k 61%
Alaska $7.5k $10.6k 41%
Wyoming $5.8k $10.6k 82%
Rhode Island $6.1k $10.2k 68%
New Hampshire $5.1k $9.8k 94%
Massachusetts $5.5k $9.5k 72%

Customise!

Before I add in the plots, let’s have some fun customising the table. I want to give the table a green background reminiscent of a chalkboard, whilst also adding a brown frame (too much? Probably). Getting the border turned out trickier than I thought. I’ve used a combination of the tab_style function to change cell_borders plus the tab_options function. The state variable is the left-most column so I am adding a brown border (#A36734) on the left of the body (data rows) and header (column labels) of the table, plus the title and subtitle cells (which I’ll add later). I’ll do something similar for the right-hand side of the table (which will be the 3rd ggplot). Within another tab_style I also fill the body of the table with the green chalkboard colour (#47745A):

gt_table_2 <- gt_table_1 %>%
  # left hand board frame
  tab_style(
    style = cell_borders(sides = "left",
                        color = "#A36734",
                        weight = px(20)
    ),
    locations = list(cells_body(columns = vars(state)), cells_column_labels(columns = vars(state)),
                     cells_title("title"), cells_title("subtitle"))
  ) %>%
  # right hand board frame
  tab_style(
    style = cell_borders(
        sides = "right",
        color = "#A36734",
        weight = px(20)
      ),
    locations = list(cells_body(columns = vars(ggplot3)), 
                     cells_column_labels(columns = vars(ggplot3)),
                     cells_title("title"), cells_title("subtitle"))
  ) %>%
  tab_style(
    style = cell_fill(color = "#47745A"),
    locations = cells_body(
      columns = gt::everything())
  ) %>%
  # table options
  tab_options(table.background.color = "#A36734",
              heading.background.color = "#47745A",
              column_labels.background.color = "#47745A",
              table.font.color = "#F3FCF0",
              data_row.padding = px(10),
              table.border.top.width = px(20),
              table.border.top.color = "#A36734",
              table.border.bottom.color = "#A36734",
              source_notes.background.color = "#A36734"
  ) 

gt_table_2 %>%
  cols_hide(vars(ggplot1, ggplot2, ggplot3, bar97, bar16, spark))

I’ve made further colour changes with the tab_options function. In order to get the brown border around the whole table I ended up setting the entire background of the table (table.background.color), along with the top and bottom borders, to the brown colour, but then applied the green colour to majority of the other elements.

State 1997 2016 Change
District of Columbia $6.1k $17.7k 189%
New York $7.0k $12.8k 81%
Vermont $5.6k $12.0k 114%
New Jersey $7.2k $11.6k 60%
Connecticut $6.7k $10.8k 61%
Alaska $7.5k $10.6k 41%
Wyoming $5.8k $10.6k 82%
Rhode Island $6.1k $10.2k 68%
New Hampshire $5.1k $9.8k 94%
Massachusetts $5.5k $9.5k 72%

The border doesn’t look right yet, as I am still hiding columns, including the right-most column.

Next let’s work on the text. I’m going to change the font styles throughout, and I am also going to add a title, subtitle and source note. I adjust the font, colour and alignment of the text in specific locations using the cell_text function within tab_style.

I then add the detail of the source note (tab_source_note), title and subtitle (tab_header). The great thing about gt is that as you’re rendering HTML you can style your text with HTML. Below I am making some further adjustments to the colour and size of some of the elements by writing HTML code within the html function. This is great because it means you can also change style within elements (as I do to 2nd line of the subtitle):

gt_table_3 <- gt_table_2 %>% 
  # Adjust title font
  tab_style(
    style = cell_text(
        font = "Chalkduster",
        color = "#8DDBE0",
        weight = "bold",
        align = "center"),
    locations = cells_title(groups = "title")
  ) %>%
  # Adjust sub-title font
  tab_style(
    style = cell_text(
      font = "Chalkduster",
      align = "center"),
    locations = cells_title(groups = "subtitle")
  ) %>% 
  # Style header font
  tab_style(
    style = cell_text(font = "Chalkduster", 
                      weight = "bold", 
                      color = "#FFD23F"),
    locations = cells_column_labels(gt::everything())
  ) %>%
  tab_style(
    style = cell_text(font = "Chalkduster"),
    locations = cells_body(columns = gt::everything())
  ) %>%
   tab_source_note(html("<span style='color:#4E3423;font-size:12pt'><b>TABLE: committedtotape | DATA: Urban Institute</b></span>")) %>%
  tab_header(
    title = html("<br><span style='color:#F2AA99;font-size:20pt'>TOP OF THE CLASS</span><br>Public Spending on Elementary & Secondary Education"),
    subtitle = html("Dollars spent per child (inflation adjusted) in the US By State, 1997-2016<br><br><span style='color:#F2AA99;'>Top 10 Spenders in 2016</span>")
  ) %>% 
  #borders
  tab_style(
    style = cell_borders(
      sides = c("bottom", "top"),
      color = "white",
      weight = px(5)
    ),
    locations = cells_column_labels(columns = gt::everything())
  ) %>%
  tab_style(
    style = cell_borders(
      sides = "bottom",
      color = "white",
      weight = px(3)),
    locations = cells_body()
  )
  
gt_table_3 %>%
  cols_hide(vars(ggplot1, ggplot2, ggplot3, bar97, bar16, spark))

Finally in the above I adjust the internal borders of the table.


TOP OF THE CLASS
Public Spending on Elementary & Secondary Education
Dollars spent per child (inflation adjusted) in the US By State, 1997-2016

Top 10 Spenders in 2016
State 1997 2016 Change
District of Columbia $6.1k $17.7k 189%
New York $7.0k $12.8k 81%
Vermont $5.6k $12.0k 114%
New Jersey $7.2k $11.6k 60%
Connecticut $6.7k $10.8k 61%
Alaska $7.5k $10.6k 41%
Wyoming $5.8k $10.6k 82%
Rhode Island $6.1k $10.2k 68%
New Hampshire $5.1k $9.8k 94%
Massachusetts $5.5k $9.5k 72%
TABLE: committedtotape | DATA: Urban Institute

Adding plots to your table

Now let’s include the plots. The plotting data needs to be transformed using the text_transform function. Each plotting column goes through its own text_transform where it is mapped to the ggplot_image function of gt. You can adjust the height and aspect_ratio arguments of this function to get the desired dimensions.

The 3 sets of plots are put into the placeholder columns created earlier, and at the end we remove just the 3 orginal plotting columns from the table:

gt_table_4 <- gt_table_3 %>% 
  # rendering bar 1997
  text_transform(
    locations = cells_body(columns = vars(ggplot1)), # use empty cell as location
    fn = function(x) {
      # Insert each image into each empty cell in `ggplot`
      map(top10$bar97, ggplot_image, height = px(50), aspect_ratio = 1.5)
    }
  ) %>%
  # rendering bar 2016
  text_transform(
    locations = cells_body(columns = vars(ggplot2)), 
    fn = function(x) {
      map(top10$bar16, ggplot_image, height = px(50), aspect_ratio = 1.5)
    }
  ) %>%
  # rendering the spark lines
  text_transform(
    locations = cells_body(columns = vars(ggplot3)), 
    fn = function(x) {
      map(top10$spark, ggplot_image, height = px(30), aspect_ratio = 4)
    }
  ) %>%
  cols_hide(vars(bar97, bar16, spark))

gt_table_4

This completes the table - let’s take a look:


TOP OF THE CLASS
Public Spending on Elementary & Secondary Education
Dollars spent per child (inflation adjusted) in the US By State, 1997-2016

Top 10 Spenders in 2016
State 1997 2016 Change Trend
District of Columbia $6.1k $17.7k 189%
New York $7.0k $12.8k 81%
Vermont $5.6k $12.0k 114%
New Jersey $7.2k $11.6k 60%
Connecticut $6.7k $10.8k 61%
Alaska $7.5k $10.6k 41%
Wyoming $5.8k $10.6k 82%
Rhode Island $6.1k $10.2k 68%
New Hampshire $5.1k $9.8k 94%
Massachusetts $5.5k $9.5k 72%
TABLE: committedtotape | DATA: Urban Institute

End

And there we have it! A potentially unnecessary exploration into how you can turn a default gt table into something more at home in the school classroom. Probably something you’ll never need to do yourself, but hopefully this post has demonstrated some of the functionality within gt that you can use to customise your tables and make them stand-out from the crowd.

As I said up top, I have entered this table into RStudio’s Table Competition for 2020. You can find my entry, along with all the others, on RStudio Community.

There’s still a lot about gt that I want to learn and master, but this was a fun place to start - thanks for reading, I hope there was no falling asleep at the back!