Global hydropower scenarios revisited

/ Sean Turner

I published a paper in 2017 that analyzes monthly resolution projections of global hydropower production under climate change. Recently some folks have asked me for this data. Unfortunately the formatting is horrible, so I’m providing this tutorial to absolve past sins.

For this tutorial I’m going to download these data and aggregate to country-scale, monthly time series of projected generation (shout out to Febin Kachirayil for this request).

First, we’re gonna need the IDs and associated countries for each dam of the Global Reservoir and Dams Database. You can download this from here.

We wanna grab the just the data table contained in the shape file GRanD_dams_v1_1.shp

# load libraries
library(dplyr)  # for data wrangling
library(rgdal)  # for reading shape file data

# get GRanD data table
readOGR(paste0(your_GRanD_dir,
               "/GRanD_dams_v1_1.shp")) %>% 
  .@data %>% 
  # ^^ extract data table from shape object
  as_tibble() %>% 
  select(GRAND_ID, DAM_NAME, COUNTRY) ->
  grand_data
## OGR data source with driver: ESRI Shapefile 
## Source: "/Users/turn652/Documents/IM3/WM development/GRanD/GRanD download/GRanD_dams_v1_1.shp", layer: "GRanD_dams_v1_1"
## with 6862 features
## It has 56 fields
## Integer64 fields read as strings:  DIS_AVG_LS
grand_data
## # A tibble: 6,862 x 3
##    GRAND_ID DAM_NAME       COUNTRY      
##       <int> <chr>          <chr>        
##  1        1 Terror Lake    United States
##  2        2 Mayo           Canada       
##  3        3 Blue Lake      United States
##  4        4 Green Lake     United States
##  5        5 Long Lake Dam  United States
##  6        6 W.A.C. Bennett Canada       
##  7        7 Peace Canyon   Canada       
##  8        8 Swan Lake      United States
##  9        9 Anyox          Canada       
## 10       10 Upper Silvis   United States
## # … with 6,852 more rows

Now let’s connect these IDs to the simulation data. These are currently stored in a dedicated GitHub repo here.

data_dir <- "https://raw.githubusercontent.com/swd-turner/HydroSimGlobal/master/"

The first file we need is GRanD_ID.csv. This is gonna help us to index through lists of data to extract data for relevant dams.

library(vroom)  # for data reading
index <- vroom(paste0(data_dir, "GRanD_ID.csv"), delim = ",")

# join up the index to the grand data
index %>% left_join(grand_data, by = c("GRanD_ID" = "GRAND_ID")) ->
  index_grand

Now we wanna get the actual simulation data. These are in R list format and are pickled in .RData files (I warned you this wasn’t pretty). There are 1593 elements to these lists, each representing a dam. Each of these elements is itself a list, containing various simulation data for the relevant dam (generation, water storage, reservoir levels, etc.). Compounding matters further, the data are stored as ts (time series) objects—a format that is rarely used today.

We can use base functions url and load to get this data straight into our environment:

# url for GCM forced simulations with CNRM under A2 emissions scenario
data_url <- url(paste0(data_dir, "/GCM-forced%20simulations/CNRM/cnrm_A2"))
load(data_url)

Now a list object named cnrm_A2 should appear in your environment. If we look at the first element we see a list of six time series 2001 to 2100:

str(cnrm_A2[[1]])
## List of 6
##  $ Storage_Mm^3    : Time-Series [1:1200] from 2001 to 2101: 74300 73271 72995 71227 67589 ...
##  $ Release_Mm^3    : Time-Series [1:1200] from 2001 to 2101: 2542 1525 2796 4575 1271 ...
##  $ Evaporation_Mm^3: Time-Series [1:1200] from 2001 to 2101: 2.10e-04 3.97e-01 4.71 1.95e+01 1.28e+02 ...
##  $ WaterLevel_m    : Time-Series [1:1200] from 2001 to 2101: 184 183 183 182 180 ...
##  $ Spill_Mm^3      : Time-Series [1:1200] from 2001 to 2101: 0 0 0 0 0 0 0 0 0 0 ...
##  $ Power_MW        : Time-Series [1:1200] from 2001 to 2101: 1363 816 1490 2413 666 ...

From index_grand, created earlier, we see that this first element (index = 1) of cnrm_A2 corresponds to GRanD ID 6 (or the dam W.A.C. Bennett, Canada). So now we’re gonna cycle through all the countries of that table, pick out all the relevant indices, extract the simulated generation (Power_MW) for those indices, and finally, aggregate to country-level generation (MWh).

This will require purrr::map.

library(purrr)

index_grand %>% split(.$COUNTRY) %>%
  map_dfr(function(x){
    x %>% pull(COUNTRY) %>% first() -> country
    x %>% pull(Index) %>% 
      map_dfr(
        function(dam){
          cnrm_A2[[dam]] %>% .$Power_MW %>%
            tibble(gen = ., time = time(.)) %>% 
            mutate(gen = as.double(gen) * 732) %>%
            # ^^ 732 is approx hrs per month to convert MW to MWh
            mutate(year = as.integer(substr(time, 1, 4)),
                   month = factor(rep(month.abb, 100), levels = month.abb),
                   index = dam) %>% 
            select(index, year, month, gen_MWh = gen)
        }
      ) %>% mutate(country = !!country) %>% 
      group_by(country, year, month) %>% 
      summarise(gen_MWh = sum(gen_MWh)) %>% ungroup()
  })
## # A tibble: 128,400 x 4
##    country      year month gen_MWh
##    <chr>       <int> <fct>   <dbl>
##  1 Afghanistan  2001 Jan    67015.
##  2 Afghanistan  2001 Feb    81655.
##  3 Afghanistan  2001 Mar    81655.
##  4 Afghanistan  2001 Apr    81655.
##  5 Afghanistan  2001 May    81655.
##  6 Afghanistan  2001 Jun    60081.
##  7 Afghanistan  2001 Jul    52977.
##  8 Afghanistan  2001 Aug    29872.
##  9 Afghanistan  2001 Sep    17176.
## 10 Afghanistan  2001 Oct    15815.
## # … with 128,390 more rows

Lesson: provide data as netcdf or plain old csv in future.