Introduction to Google Earth Engine: A Powerful Tool for Modern Forest Science

Introduction
Forest ecosystems face unprecedented challenges from climate change, land-use transformation, and increasing natural disturbances. Traditional forest monitoring methods, while valuable, are often limited by their spatial coverage and temporal frequency. In the current era of big data and rapid environmental change, forest researchers need to analyze vast amounts of information across different spatial and temporal scales spanning decades. This presents three major challenges: data acquisition, processing power, and analysis complexity.
What is Google Earth Engine?
Google Earth Engine (GEE) emerges as a transformative solution by addressing these challenges through its cloud computing architecture. GEE is a cloud-based platform developed by Google that enables analysis and visualization of vast amounts of geospatial data. This powerful platform combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities.

Figure 1: Google Earth Engine Website
What makes GEE particularly powerful for forest science is its combination of immediate access to regularly updated satellite imagery, built-in algorithms for environmental analysis, cloud-based processing that eliminates local hardware limitations, and integration with popular programming languages like R and Python.
This technological advancement means you can now monitor forest health in near-real-time, detect and track disturbances as they occur, analyze long-term trends in forest dynamics, study climate change impacts, combine multiple data sources for more comprehensive analyses, and much more.
With GEE, processing is automatically distributed across thousands of computers, turning what would be days or weeks of work into minutes or hours. Most importantly, you only download the final results to your computer, while all intermediate data processing happens entirely in the cloud. For perspective, analyzing 30 years of Landsat data for a large forest area would traditionally require roughly 500 GB of storage and weeks of processing time; with GEE, the same analysis runs in minutes with no local storage requirements.
The rgee package serves as a crucial bridge between the R programming language and the GEE platform. This integration is particularly valuable for forest researchers who are already familiar with R programming, as it eliminates the need to learn JavaScript, which is typically required for GEE operations. The package seamlessly integrates GEE functionalities into existing R workflows, providing access to all GEE capabilities through a familiar R interface.
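To illustrate how closely rgee mirrors the Earth Engine JavaScript API, here is a small sketch using the public SRTM elevation asset; rgee substitutes `$` for the JavaScript dot, so the two versions map onto each other almost line by line:

```r
library(rgee)
ee_Initialize()

# JavaScript: var dem = ee.Image('USGS/SRTMGL1_003');
dem <- ee$Image('USGS/SRTMGL1_003')

# JavaScript: var elev = dem.select('elevation');
elev <- dem$select('elevation')
```

The same pattern applies throughout this post: any method you find in the Earth Engine documentation can be called from R by replacing `.` with `$`.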
Getting Started with Google Earth Engine
Before you can start using GEE in R, you’ll need to complete several setup steps:
- Create a Google Account
If you don’t already have one, create a Google account. Visit https://earthengine.google.com, click ‘Sign Up’, and accept the terms of use.
- Install Required Software
- R (version 3.6 or higher)
- RStudio (recommended for better interface)
- Python (required for rgee functionality)
- Package Installation
The rgee package needs to be installed from GitHub to ensure you have the latest version with all features:
# Install the remotes package first if needed:
# install.packages("remotes")
# Install rgee from GitHub
remotes::install_github("r-spatial/rgee")
# Install additional required packages
install.packages(c("sf", "raster"))
- Loading and Initializing
# Load required packages
library(rgee)
library(sf) # handling spatial data
library(raster) # handling spatial data
library(dplyr) # data manipulation
library(readr) # reading rectangular data
library(leaflet) # creating interactive maps
library(ggplot2) # data visualization
# First-time setup
ee_install() # This installs required Python dependencies
# Initialize Earth Engine
ee_Initialize()
- Authentication Process
When you run ee_Initialize() for the first time, a web browser window will open automatically and you’ll be asked to log in to your Google account. Google will then generate an authentication token. Copy this token from your browser, return to RStudio, and paste it when prompted. rgee will store these credentials for future sessions.
If successful, you’ll see a confirmation message in the R console. This authentication process only needs to be completed once on each computer you use.
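A minimal sketch of common initialization options (the account name below is a placeholder; `user`, `drive`, and `ee_check()` are part of rgee):

```r
library(rgee)

# Authenticate with a specific Google account and enable Google Drive
# support for exports (credentials are cached after the first run)
ee_Initialize(user = "your.name@gmail.com", drive = TRUE)

# Verify that the Python dependencies and credentials are in place
ee_check()
```

Running `ee_check()` after initialization is a quick way to confirm the Python environment, third-party packages, and credentials are all correctly configured before starting an analysis.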
GEE Data Catalog
The GEE Data Catalog offers a rich collection of datasets particularly valuable for forest research:
- Landsat Series (1972-present)
- 30-meter resolution
- Global coverage every 16 days
- Multispectral and thermal data
- Sentinel-2 (Copernicus Program)
- High-resolution optical imagery
- Multiple spectral bands
- Frequent revisit time
- Climate and Weather
- Historical reanalysis (NCEP/NCAR)
- GridMET datasets
- NLDAS-2 meteorological data
- Precipitation data (TRMM)
- Temperature and humidity data
- Wind measurements
- Atmospheric Data
- Ozone measurements
- MODIS atmospheric products
- Elevation Data
- SRTM (30-meter resolution globally)
- Various regional DEMs
- HydroSHEDS hydrology database
- Land Cover Maps
- Dynamic World (near real-time)
- ESA World Cover
- Forest/non-forest classifications
Getting Started: Visualizing Forest Plots and Satellite Imagery
Combining ground-based plot data with satellite imagery provides valuable insights into forest conditions and spatial patterns. In this example, we’ll explore how to visualize forest plots using GEE and Sentinel-2 satellite imagery.
The first step involves preparing plot coordinates for use with satellite imagery. Our forest plots, located in the Dinaric region of Croatia, were originally surveyed in the Gauss-Krüger coordinate system (EPSG:31275). To work with GEE, we need to transform these coordinates to the global WGS84 system (EPSG:4326).
#-------------------------------------------
# Read forest plot coordinates from CSV
#-------------------------------------------
plots <- read_csv("data/extracted_coordinates.csv")
#---------------------------------------------
# Transform coordinates from Gauss Kruger to
# WGS84 (lat/lon) system
#---------------------------------------------
coords_df <- plots %>%
st_as_sf(coords = c("x_coordinate", "y_coordinate"),
crs = 31275) %>%
st_transform(4326)
# Display first few rows
coords_df %>%
head()
#> Simple feature collection with 6 features and 3 fields
#> Geometry type: POINT
#> Dimension: XY
#> Bounding box: xmin: 14.52124 ymin: 45.52129 xmax: 14.66221 ymax: 45.58453
#> Geodetic CRS: WGS 84
#> # A tibble: 6 × 4
#> file_name plot_number folder_name geometry
#> <chr> <chr> <chr> <POINT [°]>
#> 1 5-1053-10.xls 5-1053 Delnice (14.52124 45.56628)
#> 2 5-1054-3.xls 5-1054 Delnice (14.58516 45.58453)
#> 3 5-1055-12.xls 5-1055 Delnice (14.64935 45.56674)
#> 4 5-1056-13.xls 5-1056 Delnice (14.66221 45.55779)
#> 5 5-1135-14.xls 5-1135 Delnice (14.52162 45.52129)
#> 6 5-1136-10.xls 5-1136 Delnice (14.57275 45.53049)
Once we have our coordinates, we need to convert them into a format that GEE understands.
#------------------------------------------------
# Convert our spatial data to GEE format
#------------------------------------------------
plots_ee <- sf_as_ee(coords_df)
With plot locations properly formatted, we can now access Sentinel-2 satellite imagery. Sentinel-2 provides high-resolution optical imagery perfect for forest monitoring, with new images available every few days. We’ll focus on summer months and filter out cloudy images to ensure the best possible visualization.
#-------------------------------------------------
# Access and process Sentinel-2 satellite imagery
#-------------------------------------------------
s2_imagery <- ee$ImageCollection('COPERNICUS/S2_SR')$
#-------------------------
# Limit to study area
#-------------------------
filterBounds(plots_ee)$
#--------------------------
# Select summer months only
#--------------------------
filterDate('2023-06-01', '2023-09-30')$
#----------------------------------------------
# Remove images with more than 20% cloud cover
#----------------------------------------------
filter(ee$Filter$lt('CLOUDY_PIXEL_PERCENTAGE', 20))$
#----------------------------------------------
# Creates a composite image using the median
# values, helping to remove any remaining clouds
# or artifacts
#----------------------------------------------
median()
To display the satellite imagery we need to set appropriate visualization parameters. We’ll use a true-color combination that shows the forest as it would appear to the human eye, with some adjustments to enhance visibility and contrast.
#--------------------------------------------
# Set parameters for true-color visualization
#--------------------------------------------
vis_params <- list(
bands = c('B4', 'B3', 'B2'), # Red, Green, Blue bands for true color
min = 0, # Minimum reflectance value
max = 3000, # Maximum reflectance value
gamma = 1.2 # Slight contrast enhancement
)
These parameters control how the satellite imagery is displayed. Bands (‘B4’, ‘B3’, ‘B2’) create a true-color image. Min and max control the brightness range and gamma adjusts the contrast (values > 1 increase contrast).
Finally, we can create our visualization by combining the satellite imagery with plot locations. The resulting map allows users to zoom in and out, toggle between different base maps, and switch layers on and off using the layer control panel in the top-left corner. This interactivity allows us to understand the spatial distribution of our sampling plots within the broader forest landscape.
#-----------------------------
# Center view on study area
#-----------------------------
Map$centerObject(plots_ee, zoom = 9)
#-----------------------------
# Add satellite imagery
#-----------------------------
map <- Map$addLayer(s2_imagery, vis_params, "Sentinel-2") +
Map$addLayer(plots_ee, list(color = 'white', pointSize = 1), "Forest Plots")
map
Assessing Forest Health Through Vegetation Indices
Understanding vegetation health and vigor across large areas requires efficient and reliable methods of assessment. While traditional field measurements provide detailed ground data, satellite-based vegetation indices offer a powerful complement, enabling us to analyze entire forest landscapes systematically. Among these indices, the Normalized Difference Vegetation Index (NDVI) stands out as one of the most valuable tools for forest health assessment.
NDVI leverages a fundamental characteristic of vegetation: healthy plants strongly absorb visible red light for photosynthesis while reflecting most near-infrared (NIR) light. By calculating the difference between NIR and red reflectance values, normalized by their sum (NDVI = (NIR - RED) / (NIR + RED)), NDVI provides a standardized way to measure vegetation health and density. The resulting values range from -1 to +1, where higher positive values typically indicate healthier and denser vegetation, while negative values usually represent water, snow, or clouds.
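The formula can be illustrated with a toy R function; the reflectance values below are invented for the example:

```r
# NDVI = (NIR - RED) / (NIR + RED)
ndvi_from_bands <- function(nir, red) {
  (nir - red) / (nir + red)
}

# Healthy canopy: strong NIR reflectance, strong red absorption
ndvi_from_bands(nir = 0.45, red = 0.05) # 0.8

# Water: low NIR reflectance gives a negative value
ndvi_from_bands(nir = 0.02, red = 0.05) # approximately -0.43
```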
Let’s analyze the vegetation health across our study plots using Sentinel-2 satellite imagery. We’ll calculate NDVI values, visualize their spatial distribution, and examine the patterns of forest health in our study area.
We use Sentinel-2’s specific bands for this calculation: Band 8 (NIR, 842 nm) captures the high reflectance of vegetation in the near-infrared spectrum, and Band 4 (Red, 665 nm) captures chlorophyll absorption during photosynthesis. Both bands have 10-meter spatial resolution, providing detailed forest structure information.
#---------------------------------------------------
# Using Sentinel-2's NIR (B8) and RED (B4) bands for
# NDVI calculation
#---------------------------------------------------
ndvi <- s2_imagery$normalizedDifference(c('B8', 'B4'))
#-------------------------------------------------------
# Setting value range and color palette for NDVI display
#-------------------------------------------------------
ndvi_params <- list(
min = 0,
max = 0.8, # Typical range for vegetation
palette = c(
'#CE7E45', # Orange - low vegetation
'#DF923D', # Light orange
'#F1B555', # Yellow
'#99B718', # Light green
'#74A901' # Dark green - dense vegetation
)
)
The interactive map shows NDVI patterns across the entire landscape through a color gradient from orange (low NDVI) to dark green (high NDVI). White dots represent forest plot locations. You can zoom in and out to explore, and the layer control allows toggling between the NDVI layer and plot locations.
#--------------------------------------------------
# Creating interactive map with NDVI visualization
#--------------------------------------------------
Map$centerObject(plots_ee, zoom = 9)
map1 <-Map$addLayer(ndvi, ndvi_params, "NDVI") +
Map$addLayer(plots_ee, list(color = 'white', pointSize = 1), "Forest Plots")
map1
After calculating and visualizing NDVI across our study area, we can extract specific NDVI values for each forest plot location and analyze how these values vary across different forest management units.
#-------------------------------------------
# Extract NDVI values for each plot location
#-------------------------------------------
ndvi_values <- ee_extract(ndvi, coords_df)
ndvi_values %>%
head()
#-------------------------------------------
# Calculate summary statistics by management
# unit
#-------------------------------------------
unit_summary <- ndvi_values %>%
group_by(folder_name) %>%
summarise(
mean_ndvi = mean(nd, na.rm = TRUE),
min_ndvi = min(nd, na.rm = TRUE),
max_ndvi = max(nd, na.rm = TRUE),
sd_ndvi = sd(nd, na.rm = TRUE),
n_plots = n()
)
unit_summary %>%
head()
#> file_name plot_number folder_name nd
#> 1 5-1053-10.xls 5-1053 Delnice 0.5386036
#> 2 5-1054-3.xls 5-1054 Delnice 0.5090945
#> 3 5-1055-12.xls 5-1055 Delnice 0.5291917
#> 4 5-1056-13.xls 5-1056 Delnice 0.5818992
#> 5 5-1135-14.xls 5-1135 Delnice 0.4860941
#> 6 5-1136-10.xls 5-1136 Delnice 0.5070886
#> # A tibble: 4 × 6
#> folder_name mean_ndvi min_ndvi max_ndvi sd_ndvi n_plots
#> <chr> <dbl> <dbl> <dbl> <dbl> <int>
#> 1 Delnice 0.535 0.395 0.636 0.0366 106
#> 2 Gospić 0.546 0.275 0.629 0.0559 213
#> 3 Ogulin 0.552 0.486 0.604 0.0275 55
#> 4 Senj 0.473 0.00858 0.627 0.113 78
We’ll create an interactive map that categorizes NDVI values and allows users to explore individual plot characteristics. This visualization combines both categorical classification of forest health and precise NDVI values, making it easier to identify patterns and potential areas of interest across different management units.
The map’s interactive elements allow viewers to access plot information by hovering over locations, while clicking on specific plots reveals detailed information including management unit name, plot identification number, precise NDVI value, and NDVI category. For easier interpretation, a legend displays the NDVI value color scale, and users can freely navigate through different areas using zoom and pan functions.
#------------------------------------------------------
# Prepare data for interactive visualization
# Categorize NDVI values and create informative labels
#------------------------------------------------------
plots_with_ndvi <- ndvi_values %>%
mutate(
# Classify NDVI values into meaningful categories
ndvi_category = case_when(
nd < 0.2 ~ "Very Low", # Indicates non-vegetated or disturbed areas
nd < 0.4 ~ "Low", # Sparse or stressed vegetation
nd < 0.5 ~ "Moderate", # Moderate vegetation density
nd < 0.6 ~ "High", # Dense, healthy vegetation
TRUE ~ "Very High" # Very dense, vigorous vegetation
),
# Create informative labels for interactive display
label = paste(
"Management Unit:", folder_name,
"<br>Plot:", plot_number,
"<br>NDVI Value:", round(nd, 3),
"<br>Category:", ndvi_category
)
) %>%
# Add geographic coordinates for mapping
bind_cols(
st_coordinates(coords_df) %>%
as.data.frame() %>%
rename(lon = X, lat = Y)
)
#--------------------------------------------------
# Create interactive map using Leaflet
#--------------------------------------------------
leaflet() %>%
# Add clean, minimalist CartoDB basemap
addProviderTiles("CartoDB.Positron") %>%
# Center view on study area
setView(lng = mean(plots_with_ndvi$lon),
lat = mean(plots_with_ndvi$lat),
zoom = 9) %>%
# Add plot locations as colored circles
addCircleMarkers(
data = plots_with_ndvi,
lng = ~lon,
lat = ~lat,
# Color circles based on NDVI values
color = ~colorNumeric(
palette = c('#CE7E45', '#DF923D', '#F1B555', '#99B718', '#74A901'),
domain = range(nd)
)(nd),
radius = 5,
popup = ~label, # Show detailed info on click
label = ~label %>% lapply(htmltools::HTML), # Show info on hover
fillOpacity = 0.7
) %>%
# Add legend to explain NDVI value colors
addLegend(
position = "bottomright",
pal = colorNumeric(
palette = c('#CE7E45', '#DF923D', '#F1B555', '#99B718', '#74A901'),
domain = range(plots_with_ndvi$nd)
),
values = plots_with_ndvi$nd,
title = "NDVI Values",
opacity = 1
)
To complement our analysis, we’ll visualize the statistical distribution of NDVI values across forest management units using a combined violin and box plot. This visualization helps us understand both the central tendencies and the full distribution of forest health conditions within each management unit.
#--------------------------------------------------
# Create violin plot combined with box plot to show
# NDVI distribution across management units
#--------------------------------------------------
ggplot(plots_with_ndvi, aes(x = folder_name, y = nd)) +
# Add violin plot to show full distribution shape
geom_violin(
alpha = 0.7,
fill = "gray80"
) +
# Add box plot to show key statistics
geom_boxplot(
width = 0.2,
alpha = 0.7,
fill = "white"
) +
# Use minimal theme for clean appearance
theme_minimal() +
# Add descriptive labels
labs(
title = "",
x = "Management Unit",
y = "NDVI Value"
) +
# Customize theme elements
theme(
plot.background = element_rect(fill = "#e0dedc"),
panel.background = element_rect(fill = "#e0dedc"),
plot.title = element_text(hjust = 0.5), # Center title
axis.text.x = element_text(angle = 0, hjust = 1) # Horizontal labels
)

Figure 2: Distribution of NDVI values across forest management units
Analyzing Forest Cover Change
The assessment of long-term forest cover changes is crucial for understanding forest dynamics and evaluating forest management practices. For our second example, we’ll utilize the Hansen Global Forest Change dataset, a comprehensive resource that provides detailed information about global forest changes from 2000 onwards.
The Hansen Global Forest Change dataset, developed by researchers at the University of Maryland, represents a significant advancement in global forest monitoring. This dataset uses Landsat satellite imagery to track forest extent and changes globally at 30-meter resolution. It provides tree cover percentage for the baseline year 2000, annual forest loss from 2001 onwards, and forest gain over the study period.
Let’s analyze forest cover changes in our study area using this dataset:
#--------------------------------------------------
# Load Hansen Global Forest Change dataset
#--------------------------------------------------
hansen <- ee$Image('UMD/hansen/global_forest_change_2022_v1_10')
#--------------------------------------------------
# Extract different components from Hansen dataset
#--------------------------------------------------
# Extract forest cover for year 2000 (baseline)
forest2000 <- hansen$select('treecover2000')
# Extract forest loss year information
# Values 1-22 represent years 2001-2022 where:
# 0 = no loss
# 1 = loss in 2001
# 2 = loss in 2002
# ...
# 22 = loss in 2022
loss_year <- hansen$select('lossyear')
To understand the spatial patterns of forest cover and loss, we create an interactive visualization. The interactive map displays three key layers of forest information. The base layer shows the forest cover as it existed in the year 2000, represented by different shades of green where darker green indicates denser forest cover (>30% tree cover). Overlaid on this is a layer showing where forest loss has occurred between 2001 and 2022, visualized through a red color gradient. In this loss layer, lighter red shades indicate areas where forest loss occurred earlier in the time period (closer to 2001), while darker red shades represent more recent forest loss (closer to 2022). The topmost layer consists of white points marking our forest plot locations across the study area. This layered visualization allows us to simultaneously understand the initial forest conditions, track where and when forest loss occurred, and relate these patterns to our sampling locations.
#--------------------------------------------------
# Set visualization parameters
#--------------------------------------------------
# Parameters for forest cover
vis_params_forest <- list(
min = 30, # Only show areas with >30% tree cover
max = 100, # Maximum tree cover is 100%
palette = c(
'#FFFFFF', # No/sparse forest
'#A1CCA1', # Moderate forest
'#4E944E' # Dense forest
)
)
# Parameters for forest loss
vis_params_loss <- list(
min = 1, # First year of loss (2001)
max = 22, # Last year of loss (2022)
# Red color gradient from light to dark
# Represents progression of loss from 2001 (light) to 2022 (dark)
palette = c('#FFF5F0', '#FEE0D2', '#FCBBA1', '#FC9272',
'#FB6A4A', '#EF3B2C', '#CB181D', '#99000D')
)
#--------------------------------------------------
# Create interactive map
#--------------------------------------------------
Map$centerObject(plots_ee, zoom = 9)
map2 <- Map$addLayer(forest2000, vis_params_forest, "Forest Cover 2000") +
Map$addLayer(loss_year, vis_params_loss, "Year of Loss", opacity = 0.6) +
Map$addLayer(plots_ee, list(color = 'white', pointSize = 1), "Forest Plots")
map2
Now let’s see how forest disturbances vary across management units by analyzing patterns of forest loss.
#--------------------------------------------------
# Calculate statistics for each management unit
#--------------------------------------------------
# Extract loss year values for each plot
loss_year_values <- ee_extract(loss_year, coords_df) # This gives us the year of loss for each plot location
loss_year_values %>%
head()
loss_analysis <- loss_year_values %>%
# bind_cols() renames duplicated columns with suffixes, hence folder_name...3
bind_cols(ndvi_values) %>%
group_by(folder_name...3) %>%
summarise(
# Total number of plots in each unit
total_plots = n(),
# Number of plots that experienced loss
disturbed_plots = sum(lossyear > 0, na.rm = TRUE),
# Percentage of affected plots
percent_disturbed = (disturbed_plots/total_plots) * 100,
# Average year when loss occurred
mean_loss_year = mean(lossyear[lossyear > 0], na.rm = TRUE) + 2000
)
loss_analysis %>%
head()
#> file_name plot_number folder_name lossyear
#> 1 5-1053-10.xls 5-1053 Delnice NA
#> 2 5-1054-3.xls 5-1054 Delnice NA
#> 3 5-1055-12.xls 5-1055 Delnice 18
#> 4 5-1056-13.xls 5-1056 Delnice 14
#> 5 5-1135-14.xls 5-1135 Delnice 11
#> 6 5-1136-10.xls 5-1136 Delnice 18
#> # A tibble: 4 × 5
#> folder_name...3 total_plots disturbed_plots percent_disturbed mean_loss_year
#> <chr> <int> <int> <dbl> <dbl>
#> 1 Delnice 106 76 71.7 2016.
#> 2 Gospić 213 74 34.7 2011.
#> 3 Ogulin 55 24 43.6 2017.
#> 4 Senj 78 51 65.4 2013.
To understand the dynamics of forest disturbance over time, we analyze the annual occurrence of loss events. With an interactive map, we can visualize the distribution of forest plots, distinguishing between disturbed (red) and undisturbed (green) plots across the region.
#--------------------------------------------------
# Analyze temporal patterns of forest loss
#--------------------------------------------------
temporal_summary <- loss_year_values %>%
filter(lossyear > 0) %>%
mutate(year = lossyear + 2000) %>%
group_by(folder_name, year) %>%
summarise(
loss_events = n(),
.groups = 'drop'
)
temporal_summary %>%
head()
ggplot(temporal_summary,
aes(x = year, y = loss_events, color = folder_name)) +
geom_line(linewidth = 1) +
geom_point(size = 2) +
theme_minimal() +
labs(
title = "",
subtitle = "",
x = "",
y = "Number of Loss Events",
color = "Management Unit"
) +
scale_x_continuous(breaks = seq(2000, 2022, 2)) +
scale_color_manual(
values = c(
"Delnice" = "#783D19",
"Gospić" = "#a78780",
"Ogulin" = "#5F6F52",
"Senj" = "#284345c7"
)
) +
theme(
axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom"
)
#> # A tibble: 6 × 3
#> folder_name year loss_events
#> <chr> <dbl> <int>
#> 1 Delnice 2003 1
#> 2 Delnice 2004 1
#> 3 Delnice 2005 2
#> 4 Delnice 2007 5
#> 5 Delnice 2010 2
#> 6 Delnice 2011 1

Figure 3: Patterns of forest loss events across management units from 2000 to 2022
#--------------------------------------------------
# Prepare and visualize disturbance data
#--------------------------------------------------
plots_with_disturbance <- loss_year_values %>%
# Add coordinates
bind_cols(
st_coordinates(coords_df) %>%
as.data.frame() %>%
rename(lon = X, lat = Y)
) %>%
# Add plot information
bind_cols(plots) %>%
# Create disturbance status and year
mutate(
disturbance_status = case_when(
lossyear > 0 ~ "Disturbed",
TRUE ~ "Not Disturbed"
),
loss_year = ifelse(lossyear > 0,
lossyear + 2000,
NA)
)
# Create interactive map
leaflet() %>%
addProviderTiles("CartoDB.Positron") %>%
setView(lng = mean(plots_with_disturbance$lon),
lat = mean(plots_with_disturbance$lat),
zoom = 8) %>%
addCircleMarkers(
data = plots_with_disturbance,
lng = ~lon,
lat = ~lat,
color = ~ifelse(disturbance_status == "Disturbed",
"darkred", "darkgreen"),
radius = 2,
popup = ~paste(
"Management Unit:", folder_name...3,
"<br>Status:", disturbance_status,
"<br>Year of Loss:", loss_year
),
label = ~folder_name...3
) %>%
addLegend(
position = "bottomright",
colors = c("darkred", "darkgreen"),
labels = c("Disturbed", "Not Disturbed"),
title = "Plot Status"
)
Extracting and Analyzing Climate Data for Forest Plots
When studying forest ecosystems, understanding local climate conditions is crucial. The TerraClimate dataset, accessible through Google Earth Engine, provides high-resolution climate data that can help us understand the climate patterns affecting our forest plots.
TerraClimate is a dataset of monthly climate and climatic water balance for global terrestrial surfaces. It provides high-spatial resolution (4-km) data that includes maximum and minimum temperature, precipitation, wind speed, vapor pressure, solar radiation and other climate variables.
First, we access the TerraClimate dataset and select our variables of interest. This retrieves monthly climate data for our plot locations for the period 2014-2023, including maximum temperature (tmmx), minimum temperature (tmmn), and precipitation (pr).
#--------------------------------------------------
# Load TerraClimate dataset with key climate variables
#--------------------------------------------------
terraclimate_data <- ee$ImageCollection("IDAHO_EPSCOR/TERRACLIMATE")$
filterDate('2014-01-01', '2023-12-31')$
select(c('tmmx', 'tmmn', 'pr')) # max temp, min temp, precipitation
Next, we create a function that processes climate data for a single plot. The function selects the relevant climate columns, reshapes the data from wide to long format, creates proper date structures, converts temperature values from tenths of a degree to degrees Celsius, and returns a tidy data frame of climate variables.
#--------------------------------------------------
# Define function to process climate data for plots
#--------------------------------------------------
process_plot_climate <- function(plot_sf) {
# Get climate data for the plot
climate_data <- ee_extract(terraclimate_data, plot_sf)
# Process the data
processed_data <- data.frame(climate_data) %>%
dplyr::select(matches("^X\\d{6}_(pr|tmmn|tmmx)$")) %>%
tidyr::pivot_longer(
cols = everything(),
names_to = c("date", "variable"),
names_pattern = "X(\\d{6})_(.*)",
values_to = "value"
) %>%
dplyr::mutate(
year = as.numeric(substr(date, 1, 4)),
month = as.numeric(substr(date, 5, 6)),
date = as.Date(paste(year, month, "15", sep="-"))
) %>%
tidyr::pivot_wider(
id_cols = c(date, year, month),
names_from = variable,
values_from = value
) %>%
dplyr::mutate(
tmmx = tmmx/10, # Convert max temp to Celsius
tmmn = tmmn/10 # Convert min temp to Celsius
) %>%
dplyr::arrange(date) %>%
dplyr::select(
date, year, month,
max_temp = tmmx,
min_temp = tmmn,
precip = pr
) %>%
# Add plot information
dplyr::mutate(
plot_number = plot_sf$plot_number,
folder_name = plot_sf$folder_name
)
return(processed_data)
}
For a single plot, we can create visualizations to understand climate patterns. The temperature time series shows the pattern of maximum and minimum temperatures over time, revealing seasonal temperature variations.
#--------------------------------------------------
# Analyze climate patterns for first plot
#--------------------------------------------------
single_plot_climate <- process_plot_climate(coords_df[1, ])
# Temperature time series
temp_plot <- ggplot(single_plot_climate, aes(x = date)) +
# Add temperature lines
geom_line(aes(y = max_temp, color = "Maximum"), linewidth = 1) +
geom_line(aes(y = min_temp, color = "Minimum"), linewidth = 1) +
# Customize colors and labels
scale_color_manual(
values = c("Maximum" = "darkred", "Minimum" = "darkblue"),
name = "Temperature Type"
) +
# Add labels and theme
labs(
title = "",
x = "",
y = "Temperature (°C)"
) +
theme_minimal() +
theme(
legend.position = "bottom",
panel.grid.minor = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1)
)
temp_plot

Figure 4: Monthly maximum and minimum temperatures (°C) from 2014 to 2023 for a representative forest plot
The precipitation time series displays monthly precipitation totals, helping identify wet and dry periods.
#--------------------------------------------------
# Create precipitation visualization for first plot
#--------------------------------------------------
precip_plot <- ggplot(single_plot_climate, aes(x = date)) +
# Add precipitation bars
geom_bar(aes(y = precip),
stat = "identity",
fill = "steelblue",
alpha = 0.7) +
# Add labels and theme
labs(
title = "",
x = "",
y = "Total Precipitation (mm)"
) +
theme_minimal() +
theme(
panel.grid.minor = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1)
)
precip_plot

Figure 5: Monthly precipitation totals (mm) from 2014 to 2023 for a representative forest plot
The average seasonal pattern combines temperature and precipitation to show the typical annual cycle.
#--------------------------------------------------
# Calculate and visualize seasonal patterns
#--------------------------------------------------
# Calculate seasonal averages
seasonal_data <- single_plot_climate %>%
dplyr::group_by(month) %>%
dplyr::summarise(
mean_max_temp = mean(max_temp, na.rm = TRUE),
mean_min_temp = mean(min_temp, na.rm = TRUE),
mean_precip = mean(precip, na.rm = TRUE),
.groups = 'drop'
)
# Seasonal patterns
seasonal_plot <- ggplot(seasonal_data) +
# Map both temperature and precipitation to y aesthetic
geom_col(aes(x = month, y = mean_precip * (max(mean_max_temp) - min(mean_min_temp)) / max(mean_precip) + min(mean_min_temp)),
fill = "lightblue", alpha = 0.3) +
geom_line(aes(x = month, y = mean_max_temp, color = "Maximum"), linewidth = 1) +
geom_line(aes(x = month, y = mean_min_temp, color = "Minimum"), linewidth = 1) +
scale_x_continuous(breaks = 1:12,
labels = month.abb) +
scale_color_manual(
values = c("Maximum" = "darkred", "Minimum" = "darkblue"),
name = "Temperature Type"
) +
# Set up dual axes
scale_y_continuous(
name = "Temperature (°C)",
# Primary axis for temperature
sec.axis = sec_axis(
# Transform function to convert from temperature to precipitation scale
~ . * max(seasonal_data$mean_precip) /
(max(seasonal_data$mean_max_temp) - min(seasonal_data$mean_min_temp)) -
min(seasonal_data$mean_min_temp) * max(seasonal_data$mean_precip) /
(max(seasonal_data$mean_max_temp) - min(seasonal_data$mean_min_temp)),
name = "Precipitation (mm)"
)
) +
labs(
title = "",
x = ""
) +
theme_minimal() +
theme(
legend.position = "bottom",
panel.grid.minor = element_blank(),
axis.text.x = element_text(angle = 0),
# Add right axis text color
axis.text.y.right = element_text(color = "lightblue4"),
axis.title.y.right = element_text(color = "lightblue4")
)
seasonal_plot

Figure 6: Average monthly temperature and precipitation patterns from 2014 to 2023
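The dual-axis trick above works because ggplot2 draws everything on a single scale: precipitation is linearly mapped onto the temperature axis inside geom_col, and sec_axis applies the inverse map to label the right-hand axis in millimetres. A minimal sketch verifying that the two transforms are inverses of each other (the range values here are invented for illustration, not taken from the dataset):

```r
# Invented seasonal ranges, standing in for the values computed from seasonal_data
min_t <- -2   # minimum of mean_min_temp (°C)
max_t <- 25   # maximum of mean_max_temp (°C)
max_p <- 180  # maximum of mean_precip (mm)

# Forward map: precipitation -> temperature scale (the geom_col transformation)
to_temp_scale <- function(p) p * (max_t - min_t) / max_p + min_t

# Inverse map: temperature scale -> precipitation (the sec_axis transformation)
to_precip <- function(t) t * max_p / (max_t - min_t) -
  min_t * max_p / (max_t - min_t)

# Round-tripping a precipitation value should return it unchanged
stopifnot(isTRUE(all.equal(to_precip(to_temp_scale(90)), 90)))
```

Because both maps are linear, a precipitation of zero lands exactly at the bottom of the temperature axis and the maximum precipitation lands at the top, so the bars always span the full panel height.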
Extending to Multiple Plots
The process_plot_climate function can be applied to every plot in the dataset with lapply. This makes the code highly reusable: you can analyze climate patterns for a single plot or your entire forest inventory with minimal modification.
#--------------------------------------------------
# Process climate data for all plots simultaneously
# This approach is suitable for smaller datasets or
# when memory constraints are not a concern
#--------------------------------------------------
all_plots_climate <- lapply(1:nrow(coords_df), function(i) {
  # Add progress indicator for monitoring
  cat("Processing plot", i, "of", nrow(coords_df), "\n")
  process_plot_climate(coords_df[i, ])
}) %>%
  bind_rows() # Combine all results into a single data frame
For large datasets, it is better to process plots in batches. This keeps memory usage manageable, provides progress updates, handles errors without aborting the whole run, saves intermediate results, and still yields a final dataset combining all plots.
#--------------------------------------------------
# Process plots in batches for efficient memory usage
# This approach is preferred for large datasets as it:
# 1. Reduces memory pressure
# 2. Allows for error recovery
# 3. Provides intermediate results
#--------------------------------------------------
# Define batch parameters
batch_size <- 50                           # Number of plots to process in each batch
n_plots <- nrow(coords_df)                 # Total number of plots
n_batches <- ceiling(n_plots / batch_size) # Number of batches needed

all_plots_climate <- list()

# Process each batch separately
for (i in 1:n_batches) {
  # Calculate indices for current batch
  start_idx <- (i - 1) * batch_size + 1
  end_idx <- min(i * batch_size, n_plots)

  # Log progress for monitoring
  cat("Processing batch", i, "of", n_batches, "\n")

  # Process plots in current batch with error handling
  batch_results <- lapply(start_idx:end_idx, function(j) {
    tryCatch({
      process_plot_climate(coords_df[j, ])
    }, error = function(e) {
      # Log any errors but continue processing
      cat("Error in plot", j, ":", conditionMessage(e), "\n")
      return(NULL)
    })
  })

  # Remove any failed results (NULL values)
  batch_results <- batch_results[!sapply(batch_results, is.null)]
  all_plots_climate[[i]] <- batch_results

  # Save intermediate results for each batch
  # This ensures we don't lose everything if processing is interrupted
  saveRDS(batch_results,
          file = paste0("data/climate_batch_", i, ".rds"))
}

# Combine all batch results into final dataset
final_climate_data <- do.call(rbind,
                              lapply(all_plots_climate, bind_rows))

# Save final processed results
saveRDS(final_climate_data, "data/final_climate_data.rds")

final_climate_data %>%
  head()
#> Processing batch 1 of 10
#> Processing batch 2 of 10
#> Processing batch 3 of 10
#> Processing batch 4 of 10
#> Processing batch 5 of 10
#> Processing batch 6 of 10
#> Processing batch 7 of 10
#> Processing batch 8 of 10
#> Processing batch 9 of 10
#> Processing batch 10 of 10
#> # A tibble: 6 × 8
#>   date        year month max_temp min_temp precip plot_number folder_name
#>   <date>     <dbl> <dbl>    <dbl>    <dbl>  <int> <chr>       <chr>
#> 1 2014-01-15  2014     1      3.5     -1.4    210 5-1053      Delnice
#> 2 2014-02-15  2014     2      4.8     -2.7    299 5-1053      Delnice
#> 3 2014-03-15  2014     3      8.2      0       58 5-1053      Delnice
#> 4 2014-04-15  2014     4     11.1      2.5    181 5-1053      Delnice
#> 5 2014-05-15  2014     5     12.6      4.3    162 5-1053      Delnice
#> 6 2014-06-15  2014     6     18.5      8.9    165 5-1053      Delnice
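Because each batch is written to disk as it completes, an interrupted run does not need to start over: the saved files can simply be recombined. A minimal sketch of that recovery step, assuming the data/ directory and the climate_batch_<i>.rds naming used above:

```r
library(dplyr)

# Rebuild the combined dataset from previously saved batch files
# (each .rds file holds a list of per-plot data frames)
combine_batches <- function(dir = "data") {
  files <- list.files(dir, pattern = "^climate_batch_\\d+\\.rds$",
                      full.names = TRUE)
  files %>%
    lapply(readRDS) %>%   # read each batch back into memory
    lapply(bind_rows) %>% # one data frame per batch
    bind_rows()           # single combined dataset
}
```

After a crash you would call combine_batches() once, check which plot numbers are missing from the result, and re-run only those plots rather than the full inventory.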
Limitations
While TerraClimate provides valuable climate data for forest research, it’s important to acknowledge its limitations. The 4-kilometer spatial resolution means that local microclimatic conditions might not be captured, as each grid cell represents an averaged value for its entire area. The monthly temporal resolution smooths out daily extremes and may miss significant short-term weather events that could impact forest processes. The nearest neighbor sampling method used for data extraction means that no interpolation occurs between grid cells, potentially affecting the representativeness of data for plots located near grid cell boundaries.
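One practical response to the boundary issue is to flag plots whose coordinates fall close to a grid-cell edge, where nearest-neighbour extraction is most sensitive to small positional errors. A rough sketch, assuming TerraClimate's 1/24-degree grid spacing; the exact alignment of cell edges varies between datasets and the 10% threshold is an arbitrary choice, so treat this as illustrative:

```r
res_deg <- 1 / 24  # TerraClimate grid spacing in degrees (~4 km)

# Fractional distance from a coordinate to the nearest cell edge
# (0 = on an edge, 0.5 = at the cell centre)
edge_fraction <- function(coord, res = res_deg) {
  frac <- (coord / res) %% 1
  pmin(frac, 1 - frac)
}

# Flag plots within `threshold` of a cell edge in either dimension
near_edge <- function(lon, lat, threshold = 0.1) {
  edge_fraction(lon) < threshold | edge_fraction(lat) < threshold
}
```

Flagged plots could then be cross-checked against a neighbouring cell's values, or against a higher-resolution local climate source, before drawing conclusions from their extracted time series.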
Conclusion
This article demonstrates the transformative potential of GEE and the rgee package for modern forest research. Its practical applications in vegetation health assessment, forest cover change analysis, and climate data extraction enable forest scientists to overcome traditional limitations of data access, processing power, and analytical complexity.
The applications demonstrated here represent only a fraction of GEE’s potential for forest research. The platform’s extensive data catalog, combined with its processing capabilities, opens up possibilities for numerous other applications.
As forest ecosystems continue to face unprecedented challenges from climate change and human activities, tools like GEE and rgee become increasingly vital for understanding and managing these complex systems. Their ability to process and analyze vast amounts of environmental data at multiple scales positions them as essential tools in modern forest science, supporting both research and evidence-based forest management decisions.
References
- Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18-27.
- Aybar, C., Wu, Q., Bautista, L., Yali, R., & Barja, A. (2020). rgee: An R package for interacting with Google Earth Engine. Journal of Open Source Software, 5(51), 2272.
- Hansen, M. C., Potapov, P. V., Moore, R., Hancher, M., Turubanova, S. A., Tyukavina, A., … & Townshend, J. R. G. (2013). High-resolution global maps of 21st-century forest cover change. Science, 342(6160), 850-853.
- Abatzoglou, J. T., Dobrowski, S. Z., Parks, S. A., & Hegewisch, K. C. (2018). TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958–2015. Scientific Data, 5(1), 1-12.