Analysing Climate data with R

1 Sep 2022 11-minute read

This post contains the scripts provided by Ralph Trancoso in the Analysing Climate Data in R workshop. The recording is also available, just email for access.

1 Installing and loading the data, and the raster, ncdf4, rgdal, and ggplot2 packages, setting directory, loading gridded data

To follow this tutorial, you will need to download some prepared climate data.

Save this link to somewhere on you computer, in our example the c drive, the unzip the folder.

If you don’t have the ‘raster’ and ncdf4 packages installed, install them:

install.packages("raster") # Installing the packages required for the workshop
install.packages("ncdf4")
install.packages("rgdal")
install.packages("ggplot2")

Now load the raster package, and set the directory to where you stored the Rclim folder.

Set your home directory, for example, if you put the rclim folder on you C drive:

home <- "C:/Rclim"

Now load the raster packages, and set your directory to where the climate data is.

library(raster)
setwd(home) #workshop dataset

setwd(paste0(home, "/worldclim")) # Set work directory to worldclim data
#getwd() # get work directory
#dir() # list files in the work directory
?stack # what does stack do?
aus_temp <- stack("tmean_australia_wc.nc")  # Loading gridded #data as RasterStack

2 Querying the RasterStack data and quick plot using raster::plot and raster::spplot

Below are a whole bunch of checks that you can run on the raster data set.

ncol(aus_temp) #check the number of columns
## [1] 554
nrow(aus_temp) #check the number of rows
## [1] 551
ncell(aus_temp) #check the number of cells
## [1] 305254
nlayers(aus_temp) #check the number of layers
## [1] 12
dim(aus_temp) #check the dimensions (rows, columns, layers)
## [1] 551 554  12
projection(aus_temp) #check the projection
## [1] "+proj=longlat +datum=WGS84 +no_defs"
res(aus_temp) #check the resolution
## [1] 0.08333333 0.08333333
inMemory(aus_temp) #check if the data is stored in memory
## [1] FALSE
fromDisk(aus_temp) #check if the data was read from disk
## [1] TRUE
names(aus_temp) #check the names of the layers
##  [1] "X1"  "X2"  "X3"  "X4"  "X5"  "X6"  "X7"  "X8"  "X9"  "X10" "X11" "X12"

Now plot the rasters using the plot function:

plot(aus_temp/10)

Or use spplot:

spplot(aus_temp/10) # lattice plot, returns a trellice 

Each layer represents a month of the year, from 1-12. So lets rename the layers and plot again.

months <- c("Jan", "Feb", "Mar", "Apr", "May", "Jun",
            "Jul", "Aug", "Sep", "Oct", "Nov", "Dec")

names(aus_temp) <- months #set the layer names to months

plot(aus_temp/10)

Or, using spplot to create a lattice plot

spplot(aus_temp/10) # lattice plot, returns a trellice  

3 Calculating anomalies as gridded time-series and global average

First, load the CMIP6 data set.

setwd(paste0(home, "/CMIP6")) # Set work directory to CMIP6 data
proj_temp <- stack("tas_Asea_ACCESS-ESM1-5_ssp370_r1i1p1f1_gr1.5_1950-2100.nc")
names(proj_temp)
##   [1] "X1"   "X2"   "X3"   "X4"   "X5"   "X6"   "X7"   "X8"   "X9"   "X10" 
##  [11] "X11"  "X12"  "X13"  "X14"  "X15"  "X16"  "X17"  "X18"  "X19"  "X20" 
##  [21] "X21"  "X22"  "X23"  "X24"  "X25"  "X26"  "X27"  "X28"  "X29"  "X30" 
##  [31] "X31"  "X32"  "X33"  "X34"  "X35"  "X36"  "X37"  "X38"  "X39"  "X40" 
##  [41] "X41"  "X42"  "X43"  "X44"  "X45"  "X46"  "X47"  "X48"  "X49"  "X50" 
##  [51] "X51"  "X52"  "X53"  "X54"  "X55"  "X56"  "X57"  "X58"  "X59"  "X60" 
##  [61] "X61"  "X62"  "X63"  "X64"  "X65"  "X66"  "X67"  "X68"  "X69"  "X70" 
##  [71] "X71"  "X72"  "X73"  "X74"  "X75"  "X76"  "X77"  "X78"  "X79"  "X80" 
##  [81] "X81"  "X82"  "X83"  "X84"  "X85"  "X86"  "X87"  "X88"  "X89"  "X90" 
##  [91] "X91"  "X92"  "X93"  "X94"  "X95"  "X96"  "X97"  "X98"  "X99"  "X100"
## [101] "X101" "X102" "X103" "X104" "X105" "X106" "X107" "X108" "X109" "X110"
## [111] "X111" "X112" "X113" "X114" "X115" "X116" "X117" "X118" "X119" "X120"
## [121] "X121" "X122" "X123" "X124" "X125" "X126" "X127" "X128" "X129" "X130"
## [131] "X131" "X132" "X133" "X134" "X135" "X136" "X137" "X138" "X139" "X140"
## [141] "X141" "X142" "X143" "X144" "X145" "X146" "X147" "X148" "X149" "X150"
## [151] "X151"

We can see that these names make no sense. So the names relate to years, we can re-name each layer:

years <- seq(1950, 2100, by=1)
names(proj_temp ) <- years
names(proj_temp)
##   [1] "X1950" "X1951" "X1952" "X1953" "X1954" "X1955" "X1956" "X1957" "X1958"
##  [10] "X1959" "X1960" "X1961" "X1962" "X1963" "X1964" "X1965" "X1966" "X1967"
##  [19] "X1968" "X1969" "X1970" "X1971" "X1972" "X1973" "X1974" "X1975" "X1976"
##  [28] "X1977" "X1978" "X1979" "X1980" "X1981" "X1982" "X1983" "X1984" "X1985"
##  [37] "X1986" "X1987" "X1988" "X1989" "X1990" "X1991" "X1992" "X1993" "X1994"
##  [46] "X1995" "X1996" "X1997" "X1998" "X1999" "X2000" "X2001" "X2002" "X2003"
##  [55] "X2004" "X2005" "X2006" "X2007" "X2008" "X2009" "X2010" "X2011" "X2012"
##  [64] "X2013" "X2014" "X2015" "X2016" "X2017" "X2018" "X2019" "X2020" "X2021"
##  [73] "X2022" "X2023" "X2024" "X2025" "X2026" "X2027" "X2028" "X2029" "X2030"
##  [82] "X2031" "X2032" "X2033" "X2034" "X2035" "X2036" "X2037" "X2038" "X2039"
##  [91] "X2040" "X2041" "X2042" "X2043" "X2044" "X2045" "X2046" "X2047" "X2048"
## [100] "X2049" "X2050" "X2051" "X2052" "X2053" "X2054" "X2055" "X2056" "X2057"
## [109] "X2058" "X2059" "X2060" "X2061" "X2062" "X2063" "X2064" "X2065" "X2066"
## [118] "X2067" "X2068" "X2069" "X2070" "X2071" "X2072" "X2073" "X2074" "X2075"
## [127] "X2076" "X2077" "X2078" "X2079" "X2080" "X2081" "X2082" "X2083" "X2084"
## [136] "X2085" "X2086" "X2087" "X2088" "X2089" "X2090" "X2091" "X2092" "X2093"
## [145] "X2094" "X2095" "X2096" "X2097" "X2098" "X2099" "X2100"

The X is at the start of each of each year to ensure they are of type character.

Now we can create a simple function to calculate the temperature anomaly.

anomaly <- function(x) {
    anom <-  x - mean(x[[32:61]]) # for reference period 1981-2100
    names(anom) <- seq(1950, 2100, by=1)    
    return(anom)
}
T_anom <- anomaly(proj_temp)

Now plot the temperature anomaly, from 1:16 (1950 - 1965)

spplot(T_anom[[1:16]])

And from 2084 - 2100

spplot(T_anom[[135:151]])

Then we can calculate the average temperature anomaly

T_anom_mean <- as.data.frame(cbind(years, cellStats(T_anom, mean)))
names(T_anom_mean) <- c("year", "T_anom_mean")

Finally, we can take a look at the average temperature anomaloy for the entire dataset.

setwd(paste0(home, "/output")) #set directory to output
setwd(home) # Set work directory to main folder
dir.create("output")
setwd(paste0(home, "/output"))
jpeg(file="anomaly_ts.jpeg", height = 600,  width = 1000, res=150)
plot(T_anom_mean$year, T_anom_mean$T_anom_mean, type = "p", pch = 19, 
     col = "red", xlab="year", ylab="Projected temperature anomaly", 
     main="ACCESS-ESM1-5 SSP370 - ref period:1981-2010")
dev.off()
## png 
##   2

Have a look in the output folder, you should see something like this

4 Handling regions as shapefiles

Here, we load and plot a shapefile of the worlds country boundaries.

Here, we load and plot a shapefile of the worlds country boundaries.

library(rgdal)

setwd(paste0(home, "/shp"))
countries = readOGR(dsn=".", layer="TM_WORLD_BORDERS_SIMPL-0.3")
## OGR data source with driver: ESRI Shapefile 
## Source: "C:\Users\uqmrudg1\Desktop\Rclim\shp", layer: "TM_WORLD_BORDERS_SIMPL-0.3"
## with 246 features
## It has 11 fields
## Integer64 fields read as strings:  POP2005
plot(countries)

Or use spplot to color by population:

spplot(countries, "POP2005")

5 Regionalizations of climate data - plotting time-series as line plot and bar chart

library(ggplot2)

setwd(paste0(home, "/CMIP6")) # Set work directory to CMIP6 data
proj_temp <- stack("tas_Asea_ACCESS-ESM1-5_ssp370_r1i1p1f1_gr1.5_1950-2100.nc")
proj_temp <- proj_temp -273.15
names(proj_temp) <- c(seq(1950,2100, by=1))

## From your countries vector we read in, select a country customise your study area for the workshop analysis:
my_country <- subset(countries, NAME == "Australia")


df<- as.data.frame(countries@data)
#fix(df) # to have a look at the dataframe of the countries

mydf <- structure(list(
longitude = c(153.0251, 145.7781, 149.1821, 146.8169, 139.4927, 144.2555), 
latitude = c(-27.4698, -16.9186, -21.1425, -19.2590, -20.7256, -23.4422)), 
.Names = c("longitude", "latitude"), class = "data.frame", row.names = c(NA, -6L))
xy <- mydf[,c(1,2)]
spdf <- SpatialPointsDataFrame(coords = xy, data = mydf, proj4string = CRS("+proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0"))

Finally, plot the points we chose on the map of Australia.

plot(my_country)
plot(spdf, add=T, col="red")

proj_temp_cities <- extract(proj_temp, spdf)
proj_temp_cities <- as.data.frame(proj_temp_cities)
proj_temp_cities <-as.data.frame(t(proj_temp_cities))
proj_temp_cities$year <- 1:nrow(proj_temp_cities)+1980
names(proj_temp_cities) <- c("Brisbane", "Cairns", "Mackay", "Townsville", "Mount_Isa", "Longreach", "year")

We can plot the projected temperature change of the cities we have selected.

We can plot the projected temperature change of the cities we have selected.

#install.packages("reshape2")
library(reshape2)
proj_temp_cities_melt <- melt(proj_temp_cities, id="year")

ts1 <- ggplot(proj_temp_cities_melt) +
    geom_line(aes(x=year, y=value, colour=variable)) +
    ggtitle("Projected average temperature ACCESS-ESM1.5 SSP3-7.0") +
    ylab("Temperature (°C)")  +
    scale_color_brewer(name= "cities", palette="Set1") +
    theme_bw() 
ts1

6 Convertig gridcells to data frame within a study area mask and plotting boxplots in ggplot2

setwd(paste0(home, "/IPCC")) # Set work directory to CMIP6 data
dir() 
##  [1] "CMIP6 - Consecutive Dry Days (CDD) Change days - Warming 1.5°C SSP5-8.5 (rel. to 1850-1900) - Annual (32 models).nc"
##  [2] "CMIP6 - Consecutive Dry Days (CDD) Change days - Warming 2°C SSP5-8.5 (rel. to 1850-1900) - Annual (32 models).nc"  
##  [3] "CMIP6 - Consecutive Dry Days (CDD) Change days - Warming 3°C SSP5-8.5 (rel. to 1850-1900) - Annual (32 models).nc"  
##  [4] "CMIP6 - Consecutive Dry Days (CDD) Change days - Warming 4°C SSP5-8.5 (rel. to 1850-1900) - Annual (19 models).nc"  
##  [5] "CMIP6 - Maximum temperature (TX) Change deg C - Warming 1.5°C SSP5-8.5 (rel. to 1850-1900) - Annual (27 models).nc" 
##  [6] "CMIP6 - Maximum temperature (TX) Change deg C - Warming 2°C SSP5-8.5 (rel. to 1850-1900) - Annual (27 models).nc"   
##  [7] "CMIP6 - Maximum temperature (TX) Change deg C - Warming 3°C SSP5-8.5 (rel. to 1850-1900) - Annual (27 models).nc"   
##  [8] "CMIP6 - Maximum temperature (TX) Change deg C - Warming 4°C SSP5-8.5 (rel. to 1850-1900) - Annual (16 models).nc"   
##  [9] "CMIP6 - Mean temperature (T) Change deg C - Warming 1.5°C SSP5-8.5 (rel. to 1850-1900) - Annual (34 models).nc"     
## [10] "CMIP6 - Mean temperature (T) Change deg C - Warming 2°C SSP5-8.5 (rel. to 1850-1900) - Annual (34 models).nc"       
## [11] "CMIP6 - Mean temperature (T) Change deg C - Warming 3°C SSP5-8.5 (rel. to 1850-1900) - Annual (34 models).nc"       
## [12] "CMIP6 - Mean temperature (T) Change deg C - Warming 4°C SSP5-8.5 (rel. to 1850-1900) - Annual (20 models).nc"       
## [13] "CMIP6 - Total precipitation (PR) Change % - Warming 1.5°C SSP5-8.5 (rel. to 1850-1900) - Annual (33 models).nc"     
## [14] "CMIP6 - Total precipitation (PR) Change % - Warming 2°C SSP5-8.5 (rel. to 1850-1900) - Annual (33 models).nc"       
## [15] "CMIP6 - Total precipitation (PR) Change % - Warming 3°C SSP5-8.5 (rel. to 1850-1900) - Annual (33 models).nc"       
## [16] "CMIP6 - Total precipitation (PR) Change % - Warming 4°C SSP5-8.5 (rel. to 1850-1900) - Annual (19 models).nc"
# Mean temperature
temp_list <- list.files(path= paste0(home, "/IPCC"), pattern = "Mean temperature", full.names = TRUE)
temp_GW <- stack(temp_list)
names(temp_GW) <- c("GW1.5", "GW2.0", "GW3.0", "GW4.0") 
plot(temp_GW)

#masking data outside country and converting grid to dataframe

temp_GW_country <- as.data.frame(mask(temp_GW, my_country))
dim(temp_GW_country)
## [1] 64800     4
head(temp_GW_country)   
##   GW1.5 GW2.0 GW3.0 GW4.0
## 1    NA    NA    NA    NA
## 2    NA    NA    NA    NA
## 3    NA    NA    NA    NA
## 4    NA    NA    NA    NA
## 5    NA    NA    NA    NA
## 6    NA    NA    NA    NA
temp_GW_country <- na.omit(temp_GW_country)
dim(temp_GW_country)
## [1] 699   4
#install.packages("reshape2")
library(reshape2)
temp_GW_country_m <- melt(temp_GW_country)
## No id variables; using all as measure variables
#creating a boxplot of avg temp per global warming level on ggplot2

bp1 <- ggplot(temp_GW_country_m, aes(x=variable, y=value, fill=variable)) +
geom_boxplot()+
xlab("Global warming level (°C)")+
ylab("Change in average surface temperature (°C)")+
ggtitle(paste("Change in average surface temperature per global warming level in ", my_country@data$NAME[1], sep="")) +
theme_bw()

bp1 <- bp1 + scale_color_brewer(name= "GW level", palette="YlOrRd") # why it does not work? - change from color to fill
bp1

7 Repeat the extraction and plot for precipitation and/or other metrics

# Total precipitation
prec_list <- list.files(path= paste0(home, "/IPCC"), pattern = "Total precipitation", full.names = TRUE)
prec_GW <- stack(prec_list)
names(prec_GW) <- c("GW1.5", "GW2.0", "GW3.0", "GW4.0") 
plot(prec_GW)

#masking data outside country and converting grid to dataframe

prec_GW_country <- as.data.frame(mask(prec_GW, my_country))
prec_GW_GW_country <- na.omit(prec_GW_country)
prec_GW_country_m <- melt(prec_GW_country)
## No id variables; using all as measure variables
#creating a boxplot of precipitation per global warming level on ggplot2

bp2 <- ggplot(prec_GW_country_m, aes(x=variable, y=value, fill=variable)) +
geom_boxplot()+
xlab("Global warming level (°C)")+
ylab("Change in total precipitation (%)")+
ggtitle(paste("Change in total precipitation per global warming level in ", my_country@data$NAME[1], sep="")) +
theme_bw() +
scale_fill_brewer(name= "GW level", palette="YlOrRd")
bp2
## Warning: Removed 256404 rows containing non-finite values (stat_boxplot).

Where to find climate data

IPCC interactive atlas The authority for climate data is the IPCC. And the interactive atlas has the latest data https://interactive-atlas.ipcc.ch/regional-information

Worldclim Global climate and weather data. WorldClim is a database of high spatial resolution global weather and climate data. https://www.worldclim.org/

SILO gridded data for Australia until yesterday: https://longpaddock.qld.gov.au/silo/gridded-data/

LongPaddock Provided by the Queensland Government, gridded data available for a range of variables in NetCDF and GeoTiff formats. The NetCDF datasets are arranged in annual blocks where each file contains all of the grids for the selected year and variable. https://longpaddock.qld.gov.au

CMIP5 downscaled climate projections over Queensland High-resolution climate change projections for Queensland using dynamical downscaling of CMIP5 global climate models are available for download in gridded format with spatial resolution of 10 km at Terrestrial Ecosystem Research Network (TERN) https://longpaddock.qld.gov.au/qld-future-climate/data-info/tern/

About the author


Twitter URL

Ralph is a research scientist with expertise in climate change, ecohydrology, and deforestation impacts. Ralph is particularly interested in how climate and landscape changes affect the environment and impact society. He is a Principal Climate Change Scientist at the Department of Environment and Science (Queensland Government) and a Research Fellow at the School of Biological Sciences (University of Queensland).