Introduction to Google Earth Engine: A Powerful Tool for Modern Forest Science

February 6, 2025

Introduction

Forest ecosystems face unprecedented challenges from climate change, land-use transformation, and increasing natural disturbances. Traditional forest monitoring methods, while valuable, are often limited by their spatial coverage and temporal frequency. In the current era of big data and rapid environmental change, forest researchers need to analyze vast amounts of information across different spatial and temporal scales spanning decades. This presents three major challenges: data acquisition, processing power, and analysis complexity.

What is Google Earth Engine?

Google Earth Engine (GEE) emerges as a transformative solution by addressing these challenges through its cloud computing architecture. GEE is a cloud-based platform developed by Google that enables analysis and visualization of vast amounts of geospatial data. This powerful platform combines a multi-petabyte catalog of satellite imagery and geospatial datasets with planetary-scale analysis capabilities.


Figure 1: Google Earth Engine Website

What makes GEE particularly powerful for forest science is its combination of immediate access to regularly updated satellite imagery, built-in algorithms for environmental analysis, cloud-based processing that eliminates local hardware limitations, and integration with popular programming languages like R and Python.

This technological advancement means you can now monitor forest health in near-real-time, detect and track disturbances as they occur, analyze long-term trends in forest dynamics, study climate change impacts, combine multiple data sources for more comprehensive analyses, and much more.

With GEE, processing is automatically distributed across thousands of computers, turning what would be days or weeks of work into minutes or hours. Most importantly, you only download the final results to your computer, while all intermediate processing happens entirely in the cloud. For perspective, analyzing 30 years of Landsat data for a large forest area traditionally would require approximately 500 GB of storage and weeks of processing time. With GEE, the same analysis runs in minutes with zero local storage requirements.

The rgee package serves as a crucial bridge between the R programming language and the GEE platform. This integration is particularly valuable for forest researchers who are already familiar with R programming, as it eliminates the need to learn JavaScript, which is typically required for GEE operations. The package seamlessly integrates GEE functionalities into existing R workflows, providing access to all GEE capabilities through a familiar R interface.

Getting Started with Google Earth Engine

Before you can start using GEE in R, you’ll need to complete several setup steps:

  1. Create a Google Account

If you don’t already have one, create a Google account. Visit https://earthengine.google.com, click ‘Sign Up’, and accept the terms of use.

  2. Install Required Software
  • R (version 3.6 or higher)
  • RStudio (recommended for better interface)
  • Python (required for rgee functionality)
  3. Package Installation

The rgee package needs to be installed from GitHub to ensure you have the latest version with all features:

# Install rgee from GitHub (requires the remotes package)
# install.packages("remotes")  # if not already installed
remotes::install_github("r-spatial/rgee")

# Install additional required packages
install.packages(c("sf", "raster"))
  4. Loading and Initializing
# Load required packages
library(rgee)        
library(sf)      # handling spatial data
library(raster)  # handling spatial data
library(dplyr)   # data manipulation      
library(readr)   # reading rectangular data
library(leaflet) # creating interactive maps
library(ggplot2) # data visualization

# First-time setup
ee_install()  # This installs required Python dependencies

# Initialize Earth Engine
ee_Initialize()
  5. Authentication Process

When you run ee_Initialize() for the first time, a web browser window will automatically open and you’ll be asked to log in to your Google account. Google will generate an authentication token. Copy this token from your browser, return to RStudio, and paste the token when prompted. rgee will store these credentials for future sessions.

If successful, you’ll see a confirmation message in the R console. This authentication process only needs to be completed once on each computer you use.
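If you want to confirm that everything is wired up correctly before moving on, rgee ships a diagnostic helper you can run at any time:

```r
# Optional sanity check: verifies the Python environment,
# required Python packages, and stored credentials
library(rgee)
ee_check()
```

If ee_check() reports a problem (for example a missing Python module), re-running ee_install() usually resolves it.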

GEE Data Catalog

The GEE data catalog offers a rich collection of datasets particularly valuable for forest research:

  • Landsat Series (1972-present)
    • 30-meter resolution
    • Global coverage every 16 days
    • Multispectral and thermal data
  • Sentinel-2 (Copernicus Program)
    • High-resolution optical imagery
    • Multiple spectral bands
    • Frequent revisit time
  • Climate and Weather
    • Historical reanalysis (NCEP/NCAR)
    • GridMET datasets
    • NLDAS-2 meteorological data
    • Precipitation data (TRMM)
    • Temperature and humidity data
    • Wind measurements
  • Atmospheric Data
    • Ozone measurements
    • MODIS atmospheric products
  • Elevation Data
    • SRTM (30-meter resolution globally)
    • Various regional DEMs
    • HydroSHEDS hydrology database
  • Land Cover Maps
    • Dynamic World (near real-time)
    • ESA World Cover
    • Forest/non-forest classifications

Getting Started: Visualizing Forest Plots and Satellite Imagery

Combining ground-based plot data with satellite imagery provides valuable insights into forest conditions and spatial patterns. In this example, we’ll explore how to visualize forest plots using GEE and Sentinel-2 satellite imagery.

The first step involves preparing plot coordinates for use with satellite imagery. Our forest plots, located in the Dinaric region of Croatia, were originally surveyed in the Gauss-Krüger coordinate system (EPSG:31275). To work with GEE, however, we need to transform these coordinates to the global WGS84 system (EPSG:4326).

#-------------------------------------------
# Read forest plot coordinates from CSV
#-------------------------------------------
plots <- read_csv("data/extracted_coordinates.csv")

#---------------------------------------------
# Transform coordinates from Gauss-Krüger to 
# WGS84 (lat/lon) system
#---------------------------------------------
coords_df <- plots %>%
    st_as_sf(coords = c("x_coordinate", "y_coordinate"),
             crs = 31275) %>%  
    st_transform(4326)  

# Display first few rows
coords_df %>% 
    head()
#> Simple feature collection with 6 features and 3 fields
#> Geometry type: POINT
#> Dimension:     XY
#> Bounding box:  xmin: 14.52124 ymin: 45.52129 xmax: 14.66221 ymax: 45.58453
#> Geodetic CRS:  WGS 84
#> # A tibble: 6 × 4
#>   file_name     plot_number folder_name            geometry
#>   <chr>         <chr>       <chr>               <POINT [°]>
#> 1 5-1053-10.xls 5-1053      Delnice     (14.52124 45.56628)
#> 2 5-1054-3.xls  5-1054      Delnice     (14.58516 45.58453)
#> 3 5-1055-12.xls 5-1055      Delnice     (14.64935 45.56674)
#> 4 5-1056-13.xls 5-1056      Delnice     (14.66221 45.55779)
#> 5 5-1135-14.xls 5-1135      Delnice     (14.52162 45.52129)
#> 6 5-1136-10.xls 5-1136      Delnice     (14.57275 45.53049)

Once we have our coordinates, we need to convert them into a format that GEE understands.

#------------------------------------------------
# Convert our spatial data to GEE format
#------------------------------------------------
plots_ee <- sf_as_ee(coords_df)

With plot locations properly formatted, we can now access Sentinel-2 satellite imagery. Sentinel-2 provides high-resolution optical imagery perfect for forest monitoring, with new images available every few days. We’ll focus on summer months and filter out cloudy images to ensure the best possible visualization.

#-------------------------------------------------
# Access and process Sentinel-2 satellite imagery
#-------------------------------------------------
s2_imagery <- ee$ImageCollection('COPERNICUS/S2_SR')$
    #-------------------------
    # Limit to study area
    #-------------------------
    filterBounds(plots_ee)$ 
    #--------------------------
    # Select summer months only
    #--------------------------
    filterDate('2023-06-01', '2023-09-30')$ 
    #----------------------------------------------
    # Remove images with more than 20% cloud cover
    #----------------------------------------------
    filter(ee$Filter$lt('CLOUDY_PIXEL_PERCENTAGE', 20))$
    #----------------------------------------------
    # Creates a composite image using the median 
    # values, helping to remove any remaining clouds 
    # or artifacts
    #----------------------------------------------
    median() 

To display the satellite imagery we need to set appropriate visualization parameters. We’ll use a true-color combination that shows the forest as it would appear to the human eye, with some adjustments to enhance visibility and contrast.

#--------------------------------------------
# Set parameters for true-color visualization
#--------------------------------------------
vis_params <- list(
    bands = c('B4', 'B3', 'B2'),   # Red, Green, Blue bands for true color
    min = 0,                       # Minimum reflectance value
    max = 3000,                    # Maximum reflectance value
    gamma = 1.2                    # Slight contrast enhancement
) 

These parameters control how the satellite imagery is displayed. The bands ‘B4’, ‘B3’, and ‘B2’ (red, green, blue) create a true-color image. Min and max control the brightness range, while gamma fine-tunes the brightness of midtones.

Finally, we can create our visualization by combining the satellite imagery with plot locations. The resulting map allows users to zoom in and out, toggle between different base maps, and switch layers on and off using the layer control panel in the top-left corner. This interactivity allows us to understand the spatial distribution of our sampling plots within the broader forest landscape.

#-----------------------------
# Center view on study area 
#-----------------------------
Map$centerObject(plots_ee, zoom = 9)  
#-----------------------------
# Add satellite imagery
#-----------------------------
map <- Map$addLayer(s2_imagery, vis_params, "Sentinel-2") +
Map$addLayer(plots_ee, list(color = 'white', pointSize = 1), "Forest Plots")  

map

Assessing Forest Health Through Vegetation Indices

Understanding vegetation health and vigor across large areas requires efficient and reliable methods of assessment. While traditional field measurements provide detailed ground data, satellite-based vegetation indices offer a powerful complement, enabling us to analyze entire forest landscapes systematically. Among these indices, the Normalized Difference Vegetation Index (NDVI) stands out as one of the most valuable tools for forest health assessment.

NDVI leverages a fundamental characteristic of vegetation: healthy plants strongly absorb visible red light for photosynthesis while reflecting most near-infrared (NIR) light. By calculating the difference between NIR and red reflectance values, normalized by their sum, NDVI provides a standardized way to measure vegetation health and density. The resulting values range from -1 to +1, where higher positive values typically indicate healthier and denser vegetation. Negative values usually represent water, snow, or clouds.
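As a quick worked example with made-up reflectance values (purely illustrative, not from our imagery), the formula behaves exactly as described:

```r
# NDVI = (NIR - RED) / (NIR + RED)
# Hypothetical surface reflectance values for three surfaces:
nir <- c(0.45, 0.40, 0.10)  # dense forest, moderate vegetation, water
red <- c(0.05, 0.08, 0.15)

ndvi <- (nir - red) / (nir + red)
round(ndvi, 2)
#> [1]  0.80  0.67 -0.20
```

Dense forest (high NIR, low red) yields a high positive NDVI, while water (which absorbs NIR) falls below zero.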

Let’s analyze the vegetation health across our study plots using Sentinel-2 satellite imagery. We’ll calculate NDVI values, visualize their spatial distribution, and examine the patterns of forest health in our study area.

We use Sentinel-2’s specific bands for this calculation. Band 8 (NIR, 842nm) captures the high reflectance of vegetation in the near-infrared spectrum and Band 4 (RED, 665nm) captures chlorophyll absorption during photosynthesis. Both bands have 10-meter spatial resolution, providing detailed forest structure information.

#---------------------------------------------------
# Using Sentinel-2's NIR (B8) and RED (B4) bands for 
# NDVI calculation
#---------------------------------------------------
ndvi <- s2_imagery$normalizedDifference(c('B8', 'B4')) 
#-------------------------------------------------------
# Setting value range and color palette for NDVI display
#-------------------------------------------------------
ndvi_params <- list(
    min = 0,
    max = 0.8,  # Typical range for vegetation
    palette = c(
        '#CE7E45',  # Orange - low vegetation
        '#DF923D',  # Light orange
        '#F1B555',  # Yellow
        '#99B718',  # Light green
        '#74A901'   # Dark green - dense vegetation
    )
)

The interactive map shows NDVI patterns across the entire landscape through a color gradient from orange (low NDVI) to dark green (high NDVI). White dots represent forest plot locations. You can zoom in and out to explore, and the layer control allows toggling between the NDVI layer and plot locations.


#--------------------------------------------------
# Creating interactive map with NDVI visualization
#--------------------------------------------------
Map$centerObject(plots_ee, zoom = 9)

map1 <- Map$addLayer(ndvi, ndvi_params, "NDVI") +
Map$addLayer(plots_ee, list(color = 'white', pointSize = 1), "Forest Plots")

map1

After calculating and visualizing NDVI across our study area, we can extract specific NDVI values for each forest plot location and analyze how these values vary across different forest management units.

#-------------------------------------------
# Extract NDVI values for each plot location
#-------------------------------------------
ndvi_values <- ee_extract(ndvi, coords_df)

ndvi_values %>% 
    head()
#>       file_name plot_number folder_name        nd
#> 1 5-1053-10.xls      5-1053     Delnice 0.5386036
#> 2  5-1054-3.xls      5-1054     Delnice 0.5090945
#> 3 5-1055-12.xls      5-1055     Delnice 0.5291917
#> 4 5-1056-13.xls      5-1056     Delnice 0.5818992
#> 5 5-1135-14.xls      5-1135     Delnice 0.4860941
#> 6 5-1136-10.xls      5-1136     Delnice 0.5070886

#-------------------------------------------
# Calculate summary statistics by management
# unit
#-------------------------------------------
unit_summary <- ndvi_values %>%
    group_by(folder_name) %>%
    summarise(
        mean_ndvi = mean(nd, na.rm = TRUE),
        min_ndvi = min(nd, na.rm = TRUE),
        max_ndvi = max(nd, na.rm = TRUE),
        sd_ndvi = sd(nd, na.rm = TRUE),
        n_plots = n()
    )

unit_summary %>% 
    head()
#> # A tibble: 4 × 6
#>   folder_name mean_ndvi min_ndvi max_ndvi sd_ndvi n_plots
#>   <chr>           <dbl>    <dbl>    <dbl>   <dbl>   <int>
#> 1 Delnice         0.535  0.395      0.636  0.0366     106
#> 2 Gospić          0.546  0.275      0.629  0.0559     213
#> 3 Ogulin          0.552  0.486      0.604  0.0275      55
#> 4 Senj            0.473  0.00858    0.627  0.113       78

We’ll create an interactive map that categorizes NDVI values and allows users to explore individual plot characteristics. This visualization combines both categorical classification of forest health and precise NDVI values, making it easier to identify patterns and potential areas of interest across different management units.

The map’s interactive elements allow viewers to access plot information by hovering over locations, while clicking on specific plots reveals detailed information including management unit name, plot identification number, precise NDVI value, and NDVI category. For easier interpretation, a legend displays the NDVI value color scale, and users can freely navigate through different areas using zoom and pan functions.

#------------------------------------------------------
# Prepare data for interactive visualization
# Categorize NDVI values and create informative labels
#------------------------------------------------------
plots_with_ndvi <- ndvi_values %>%
    mutate(
        # Classify NDVI values into meaningful categories
        ndvi_category = case_when(
            nd < 0.2 ~ "Very Low",    # Indicates non-vegetated or disturbed areas
            nd < 0.4 ~ "Low",         # Sparse or stressed vegetation
            nd < 0.5 ~ "Moderate",    # Moderate vegetation density
            nd < 0.6 ~ "High",        # Dense, healthy vegetation
            TRUE ~ "Very High"        # Very dense, vigorous vegetation
        ),
        # Create informative labels for interactive display
        label = paste(
            "Management Unit:", folder_name,
            "<br>Plot:", plot_number,
            "<br>NDVI Value:", round(nd, 3),
            "<br>Category:", ndvi_category
        )
    ) %>%
    # Add geographic coordinates for mapping
    bind_cols(
        st_coordinates(coords_df) %>% 
            as.data.frame() %>%
            rename(lon = X, lat = Y)
    )

#--------------------------------------------------
# Create interactive map using Leaflet
#--------------------------------------------------
leaflet() %>%
    
    # Add clean, minimalist CartoDB basemap
    addProviderTiles("CartoDB.Positron") %>%
    
    # Center view on study area
    setView(lng = mean(plots_with_ndvi$lon), 
            lat = mean(plots_with_ndvi$lat), 
            zoom = 9) %>%
    
    # Add plot locations as colored circles
    addCircleMarkers(
        data = plots_with_ndvi,
        lng = ~lon, 
        lat = ~lat,
        
        # Color circles based on NDVI values
        color = ~colorNumeric(
            palette = c('#CE7E45', '#DF923D', '#F1B555', '#99B718', '#74A901'),
            domain = range(nd)
        )(nd),
        radius = 5,
        popup = ~label,        # Show detailed info on click
        label = ~label %>% lapply(htmltools::HTML),  # Show info on hover
        fillOpacity = 0.7
    ) %>%
    
    # Add legend to explain NDVI value colors
    addLegend(
        position = "bottomright",
        pal = colorNumeric(
            palette = c('#CE7E45', '#DF923D', '#F1B555', '#99B718', '#74A901'),
            domain = range(plots_with_ndvi$nd)
        ),
        values = plots_with_ndvi$nd,
        title = "NDVI Values",
        opacity = 1
    )

To complement our analysis, we’ll visualize the statistical distribution of NDVI values across forest management units using a combined violin and box plot. This visualization helps us understand both the central tendencies and the full distribution of forest health conditions within each management unit.

#--------------------------------------------------
# Create violin plot combined with box plot to show
# NDVI distribution across management units
#--------------------------------------------------
ggplot(plots_with_ndvi, aes(x = folder_name, y = nd)) +
    # Add violin plot to show full distribution shape
    geom_violin(
        alpha = 0.7,          
        fill = "gray80"      
    ) +
    # Add box plot to show key statistics
    geom_boxplot(
        width = 0.2,          
        alpha = 0.7,        
        fill = "white"       
    ) +
    # Use minimal theme for clean appearance
    theme_minimal() +
    # Add descriptive labels
    labs(
        title = "",
        x = "Management Unit",
        y = "NDVI Value"
    ) +
    # Customize theme elements
    theme(
        plot.background = element_rect(fill = "#e0dedc"),
        panel.background = element_rect(fill = "#e0dedc"),
        plot.title = element_text(hjust = 0.5),      # Center title
        axis.text.x = element_text(angle = 0, hjust = 1)  # Horizontal labels
    )

Figure 2: Distribution of NDVI values across forest management units

Analyzing Forest Cover Change

The assessment of long-term forest cover changes is crucial for understanding forest dynamics and evaluating forest management practices. For our second example, we’ll utilize the Hansen Global Forest Change dataset, a comprehensive resource that provides detailed information about global forest changes from 2000 onwards.

The Hansen Global Forest Change dataset, developed by researchers at the University of Maryland, represents a significant advancement in global forest monitoring. This dataset uses Landsat satellite imagery to track forest extent and change globally at 30-meter resolution. It provides tree cover percentage for the year 2000 (the baseline), the year of forest loss from 2001 onwards, and forest gain over the entire period.

Let’s analyze forest cover changes in our study area using this dataset:

#--------------------------------------------------
# Load Hansen Global Forest Change dataset
#--------------------------------------------------
hansen <- ee$Image('UMD/hansen/global_forest_change_2022_v1_10')

#--------------------------------------------------
# Extract different components from Hansen dataset
#--------------------------------------------------
# Extract forest cover for year 2000 (baseline)
forest2000 <- hansen$select('treecover2000')

# Extract forest loss year information
# Values 1-22 represent years 2001-2022 where:
# 0 = no loss
# 1 = loss in 2001
# 2 = loss in 2002
# ...
# 22 = loss in 2022
loss_year <- hansen$select('lossyear')

To understand the spatial patterns of forest cover and loss, we create an interactive visualization. The interactive map displays three key layers of forest information. The base layer shows the forest cover as it existed in the year 2000, represented by different shades of green where darker green indicates denser forest cover (>30% tree cover). Overlaid on this is a layer showing where forest loss has occurred between 2001 and 2022, visualized through a red color gradient. In this loss layer, lighter red shades indicate areas where forest loss occurred earlier in the time period (closer to 2001), while darker red shades represent more recent forest loss (closer to 2022). The topmost layer consists of white points marking our forest plot locations across the study area. This layered visualization allows us to simultaneously understand the initial forest conditions, track where and when forest loss occurred, and relate these patterns to our sampling locations.


#--------------------------------------------------
# Set visualization parameters
#--------------------------------------------------
# Parameters for forest cover
vis_params_forest <- list(
    min = 30,       # Only show areas with >30% tree cover
    max = 100,      # Maximum tree cover is 100%
    palette = c(
        '#FFFFFF',        # No/sparse forest
        '#A1CCA1',        # Moderate forest
        '#4E944E'         # Dense forest
    )
)

# Parameters for forest loss
vis_params_loss <- list(
    min = 1,     # First year of loss (2001)
    max = 22,    # Last year of loss (2022)
    # Red color gradient from light to dark
    # Represents progression of loss from 2001 (light) to 2022 (dark)
    palette = c('#FFF5F0', '#FEE0D2', '#FCBBA1', '#FC9272', 
                '#FB6A4A', '#EF3B2C', '#CB181D', '#99000D')
)

#--------------------------------------------------
# Create interactive map
#--------------------------------------------------
Map$centerObject(plots_ee, zoom = 9)
map2 <- Map$addLayer(forest2000, vis_params_forest, "Forest Cover 2000") +
Map$addLayer(loss_year, vis_params_loss, "Year of Loss", opacity = 0.6) +
Map$addLayer(plots_ee, list(color = 'white', pointSize = 1), "Forest Plots")

map2

Now let’s see how forest disturbances vary across management units by analyzing patterns of forest loss.

#--------------------------------------------------
# Calculate statistics for each management unit
#--------------------------------------------------

# Extract loss year values for each plot
loss_year_values <- ee_extract(loss_year, coords_df)  # This gives us the year of loss for each plot location

loss_year_values %>% 
    head()
#>       file_name plot_number folder_name lossyear
#> 1 5-1053-10.xls      5-1053     Delnice       NA
#> 2  5-1054-3.xls      5-1054     Delnice       NA
#> 3 5-1055-12.xls      5-1055     Delnice       18
#> 4 5-1056-13.xls      5-1056     Delnice       14
#> 5 5-1135-14.xls      5-1135     Delnice       11
#> 6 5-1136-10.xls      5-1136     Delnice       18

loss_analysis <- loss_year_values %>%
    bind_cols(ndvi_values) %>%
    # bind_cols() appends a numeric suffix to duplicated
    # column names, hence folder_name...3
    group_by(folder_name...3) %>%
    summarise(
        # Total number of plots in each unit
        total_plots = n(),
        # Number of plots that experienced loss
        disturbed_plots = sum(lossyear > 0, na.rm = TRUE),
        # Percentage of affected plots
        percent_disturbed = (disturbed_plots/total_plots) * 100,
        # Average year when loss occurred
        mean_loss_year = mean(lossyear[lossyear > 0], na.rm = TRUE) + 2000
    )

loss_analysis %>% 
    head()
#> # A tibble: 4 × 5
#>   folder_name...3 total_plots disturbed_plots percent_disturbed mean_loss_year
#>   <chr>                 <int>           <int>             <dbl>          <dbl>
#> 1 Delnice                 106              76              71.7          2016.
#> 2 Gospić                  213              74              34.7          2011.
#> 3 Ogulin                   55              24              43.6          2017.
#> 4 Senj                     78              51              65.4          2013.

To understand the dynamics of forest disturbance over time, we analyze the annual occurrence of loss events. With an interactive map we can then visualize the distribution of forest plots, distinguishing between disturbed (red) and undisturbed (green) plots across the region.

#--------------------------------------------------
# Analyze temporal patterns of forest loss
#--------------------------------------------------
temporal_summary <- loss_year_values %>%
    filter(lossyear > 0) %>%
    mutate(year = lossyear + 2000) %>%
    group_by(folder_name, year) %>%
    summarise(
        loss_events = n(),
        .groups = 'drop'
    )

temporal_summary %>% 
    head()
#> # A tibble: 6 × 3
#>   folder_name  year loss_events
#>   <chr>       <dbl>       <int>
#> 1 Delnice      2003           1
#> 2 Delnice      2004           1
#> 3 Delnice      2005           2
#> 4 Delnice      2007           5
#> 5 Delnice      2010           2
#> 6 Delnice      2011           1

ggplot(temporal_summary, 
       aes(x = year, y = loss_events, color = folder_name)) +
    geom_line(linewidth = 1) +
    geom_point(size = 2) +
    theme_minimal() +
    labs(
        title = "",
        subtitle = "",
        x = "",
        y = "Number of Loss Events",
        color = "Management Unit"
    ) +
    scale_x_continuous(breaks = seq(2000, 2022, 2)) +
    scale_color_manual(
        values = c(
            "Delnice" = "#783D19",  
            "Gospić" = "#a78780",
            "Ogulin" = "#5F6F52",
            "Senj" = "#284345c7"
        )
    ) +
    theme(
        axis.text.x = element_text(angle = 45, hjust = 1),
        legend.position = "bottom"
    )

Figure 3: Patterns of forest loss events across management units from 2000 to 2022


#--------------------------------------------------
# Prepare and visualize disturbance data
#--------------------------------------------------
plots_with_disturbance <- loss_year_values %>%
    # Add coordinates
    bind_cols(
        st_coordinates(coords_df) %>% 
            as.data.frame() %>%
            rename(lon = X, lat = Y)
    ) %>%
    # Add plot information
    bind_cols(plots) %>%
    # Create disturbance status and year
    mutate(
        disturbance_status = case_when(
            lossyear > 0 ~ "Disturbed",
            TRUE ~ "Not Disturbed"
        ),
        loss_year = ifelse(lossyear > 0, 
                           lossyear + 2000, 
                           NA)
    )

# Create interactive map
leaflet() %>%
    addProviderTiles("CartoDB.Positron") %>%
    setView(lng = mean(plots_with_disturbance$lon), 
            lat = mean(plots_with_disturbance$lat), 
            zoom = 8) %>%
    addCircleMarkers(
        data = plots_with_disturbance,
        lng = ~lon,
        lat = ~lat,
        color = ~ifelse(disturbance_status == "Disturbed", 
                        "darkred", "darkgreen"),
        radius = 2,
        popup = ~paste(
            "Management Unit:", folder_name...3,
            "<br>Status:", disturbance_status,
            "<br>Year of Loss:", loss_year
        ),
        label = ~folder_name...3
    ) %>%
    addLegend(
        position = "bottomright",
        colors = c("darkred", "darkgreen"),
        labels = c("Disturbed", "Not Disturbed"),
        title = "Plot Status"
    )

Extracting and Analyzing Climate Data for Forest Plots

When studying forest ecosystems, understanding local climate conditions is crucial. The TerraClimate dataset, accessible through Google Earth Engine, provides high-resolution climate data that can help us understand the climate patterns affecting our forest plots.

TerraClimate is a dataset of monthly climate and climatic water balance for global terrestrial surfaces. It provides high-spatial resolution (4-km) data that includes maximum and minimum temperature, precipitation, wind speed, vapor pressure, solar radiation and other climate variables.

First, we access the TerraClimate dataset and select our variables of interest: monthly climate data for the period 2014-2023, including maximum temperature (tmmx), minimum temperature (tmmn), and precipitation (pr).

#--------------------------------------------------
# Load TerraClimate dataset with key climate variables
#--------------------------------------------------
terraclimate_data <- ee$ImageCollection("IDAHO_EPSCOR/TERRACLIMATE")$
    filterDate('2014-01-01', '2023-12-31')$
    select(c('tmmx', 'tmmn', 'pr'))  # max temp, min temp, precipitation

Then we create a function that will process climate data for a single plot. This processing selects relevant climate columns, reshapes data from wide to long format, creates proper date structures, converts temperature values from tenths of degrees to Celsius and returns a tidy dataframe with climate variables.

#--------------------------------------------------
# Define function to process climate data for plots
#--------------------------------------------------
process_plot_climate <- function(plot_sf) {
    # Get climate data for the plot
    climate_data <- ee_extract(terraclimate_data, plot_sf)
    
    # Process the data
    processed_data <- data.frame(climate_data) %>%
        dplyr::select(matches("^X\\d{6}_(pr|tmmn|tmmx)$")) %>%
        tidyr::pivot_longer(
            cols = everything(),
            names_to = c("date", "variable"),
            names_pattern = "X(\\d{6})_(.*)",
            values_to = "value"
        ) %>%
        dplyr::mutate(
            year = as.numeric(substr(date, 1, 4)),
            month = as.numeric(substr(date, 5, 6)),
            date = as.Date(paste(year, month, "15", sep="-"))
        ) %>%
        tidyr::pivot_wider(
            id_cols = c(date, year, month),
            names_from = variable,
            values_from = value
        ) %>%
        dplyr::mutate(
            tmmx = tmmx/10,  # Convert max temp to Celsius
            tmmn = tmmn/10   # Convert min temp to Celsius
        ) %>%
        dplyr::arrange(date) %>%
        dplyr::select(
            date, year, month,
            max_temp = tmmx,
            min_temp = tmmn,
            precip = pr
        ) %>%
        # Add plot information
        dplyr::mutate(
            plot_number = plot_sf$plot_number,
            folder_name = plot_sf$folder_name
        )
    
    return(processed_data)
}

For a single plot, we can create visualizations to understand climate patterns. The temperature time series shows the pattern of maximum and minimum temperatures over time, revealing seasonal temperature variations.

#--------------------------------------------------
# Analyze climate patterns for first plot
#--------------------------------------------------
single_plot_climate <- process_plot_climate(coords_df[1, ])

# Temperature time series
temp_plot <- ggplot(single_plot_climate, aes(x = date)) +
    # Add temperature lines
    geom_line(aes(y = max_temp, color = "Maximum"), size = 1) +
    geom_line(aes(y = min_temp, color = "Minimum"), size = 1) +
    # Customize colors and labels
    scale_color_manual(
        values = c("Maximum" = "darkred", "Minimum" = "darkblue"),
        name = "Temperature Type"
    ) +
    # Add labels and theme
    labs(
        title = "",
        x = "",
        y = "Temperature (°C)"
    ) +
    theme_minimal() +
    theme(
        legend.position = "bottom",
        panel.grid.minor = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1)
    )

temp_plot

Figure 4: Monthly maximum and minimum temperatures (°C) from 2014 to 2023 for a representative forest plot

The precipitation time series displays monthly precipitation totals, helping identify wet and dry periods.


#--------------------------------------------------
# Create precipitation visualization for first plot
#--------------------------------------------------
precip_plot <- ggplot(single_plot_climate, aes(x = date)) +
    # Add precipitation bars
    geom_bar(aes(y = precip), 
             stat = "identity",
             fill = "steelblue", 
             alpha = 0.7) +
    # Add labels and theme
    labs(
        title = "",
        x = "",
        y = "Total Precipitation (mm)"
    ) +
    theme_minimal() +
    theme(
        panel.grid.minor = element_blank(),
        axis.text.x = element_text(angle = 45, hjust = 1)
    )

precip_plot

Figure 5: Monthly precipitation totals (mm) from 2014 to 2023 for a representative forest plot

The average seasonal pattern combines temperature and precipitation to show the typical annual cycle.


#--------------------------------------------------
# Calculate and visualize seasonal patterns
#--------------------------------------------------
# Calculate seasonal averages
seasonal_data <- single_plot_climate %>%
    dplyr::group_by(month) %>%
    dplyr::summarise(
        mean_max_temp = mean(max_temp, na.rm = TRUE),
        mean_min_temp = mean(min_temp, na.rm = TRUE),
        mean_precip = mean(precip, na.rm = TRUE),
        .groups = 'drop'
    )

# Seasonal patterns
seasonal_plot <- ggplot(seasonal_data) +
   # Map both temperature and precipitation to y aesthetic
   geom_col(aes(x = month, y = mean_precip * (max(mean_max_temp) - min(mean_min_temp)) / max(mean_precip) + min(mean_min_temp)), 
            fill = "lightblue", alpha = 0.3) +
   geom_line(aes(x = month, y = mean_max_temp, color = "Maximum"), size = 1) +
   geom_line(aes(x = month, y = mean_min_temp, color = "Minimum"), size = 1) +
   scale_x_continuous(breaks = 1:12, 
                     labels = month.abb) +
   scale_color_manual(
       values = c("Maximum" = "darkred", "Minimum" = "darkblue"),
       name = "Temperature Type"
   ) +
   # Set up dual axes
   scale_y_continuous(
       name = "Temperature (°C)",
       # Primary axis for temperature
       sec.axis = sec_axis(
           # Transform function to convert from temperature to precipitation scale
           ~ . * max(seasonal_data$mean_precip) / 
               (max(seasonal_data$mean_max_temp) - min(seasonal_data$mean_min_temp)) -
               min(seasonal_data$mean_min_temp) * max(seasonal_data$mean_precip) /
               (max(seasonal_data$mean_max_temp) - min(seasonal_data$mean_min_temp)),
           name = "Precipitation (mm)"
       )
   ) +
   labs(
       title = "",
       x = ""
   ) +
   theme_minimal() +
   theme(
       legend.position = "bottom",
       panel.grid.minor = element_blank(),
       axis.text.x = element_text(angle = 0),
       # Add right axis text color
       axis.text.y.right = element_text(color = "lightblue4"),
       axis.title.y.right = element_text(color = "lightblue4")
   )

seasonal_plot

Figure 6: Average monthly temperature and precipitation patterns from 2014 to 2023

Extending to Multiple Plots

The process_plot_climate function can be applied to all plots in the dataset with lapply. This makes the code highly reusable: you can analyze climate patterns for individual plots or your entire forest inventory with minimal modification.

#--------------------------------------------------
# Process climate data for all plots simultaneously
# This approach is suitable for smaller datasets or
# when memory constraints are not a concern
#--------------------------------------------------

all_plots_climate <- lapply(1:nrow(coords_df), function(i) {
    # Add progress indicator for monitoring
    cat("Processing plot", i, "of", nrow(coords_df), "\n")
    process_plot_climate(coords_df[i, ])
}) %>%
    bind_rows()  # Combine all results into a single data frame


For large datasets, it is recommended to process plots in batches. This approach keeps each batch at a manageable size, prints progress updates, handles potential errors, saves intermediate results, and produces a final dataset combining all plots.

#--------------------------------------------------
# Process plots in batches for efficient memory usage
# This approach is preferred for large datasets as it:
# 1. Reduces memory pressure
# 2. Allows for error recovery
# 3. Provides intermediate results
#--------------------------------------------------
# Define batch parameters
batch_size <- 50  # Number of plots to process in each batch
n_plots <- nrow(coords_df)  # Total number of plots
n_batches <- ceiling(n_plots/batch_size)  # Calculate number of batches needed

all_plots_climate <- list()

# Process each batch separately
for(i in 1:n_batches) {
    # Calculate indices for current batch
    start_idx <- (i-1)*batch_size + 1
    end_idx <- min(i*batch_size, n_plots)
    
    # Log progress for monitoring
    cat("Processing batch", i, "of", n_batches, "\n")
    
    # Process plots in current batch with error handling
    batch_results <- lapply(start_idx:end_idx, function(j) {
        tryCatch({
            process_plot_climate(coords_df[j, ])
        }, error = function(e) {
            # Log any errors but continue processing
            cat("Error in plot", j, ":", conditionMessage(e), "\n")
            return(NULL)
        })
    })
    
    # Remove any failed results (NULL values)
    batch_results <- batch_results[!sapply(batch_results, is.null)]
    all_plots_climate[[i]] <- batch_results
    
    # Save intermediate results for each batch
    # This ensures we don't lose everything if processing is interrupted
    saveRDS(batch_results, 
            file = paste0("data/climate_batch_", i, ".rds"))
}

# Combine all batch results into final dataset
final_climate_data <- do.call(rbind, 
                              lapply(all_plots_climate, bind_rows))

# Save final processed results
saveRDS(final_climate_data, "data/final_climate_data.rds")


final_climate_data %>% 
    head
#> Processing batch 1 of 10 
#> Processing batch 2 of 10 
#> Processing batch 3 of 10 
#> Processing batch 4 of 10 
#> Processing batch 5 of 10 
#> Processing batch 6 of 10 
#> Processing batch 7 of 10 
#> Processing batch 8 of 10 
#> Processing batch 9 of 10 
#> Processing batch 10 of 10 
#> # A tibble: 6 × 8
#>   date        year month max_temp min_temp precip plot_number folder_name
#>   <date>     <dbl> <dbl>    <dbl>    <dbl>  <int> <chr>       <chr>      
#> 1 2014-01-15  2014     1      3.5     -1.4    210 5-1053      Delnice    
#> 2 2014-02-15  2014     2      4.8     -2.7    299 5-1053      Delnice    
#> 3 2014-03-15  2014     3      8.2      0       58 5-1053      Delnice    
#> 4 2014-04-15  2014     4     11.1      2.5    181 5-1053      Delnice    
#> 5 2014-05-15  2014     5     12.6      4.3    162 5-1053      Delnice    
#> 6 2014-06-15  2014     6     18.5      8.9    165 5-1053      Delnice
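Because each batch is written to disk, an interrupted run does not have to start over. A minimal sketch of rebuilding the final dataset from the saved files, assuming the data/climate_batch_<i>.rds naming used above:

```r
#--------------------------------------------------
# Rebuild the final dataset from saved batch files
#--------------------------------------------------
batch_files <- list.files("data",
                          pattern = "^climate_batch_\\d+\\.rds$",
                          full.names = TRUE)

final_climate_data <- batch_files %>%
    lapply(readRDS) %>%    # each file holds a list of plot data frames
    lapply(bind_rows) %>%  # flatten each batch into one data frame
    bind_rows()            # combine all batches
```

This also makes it easy to reprocess only the plots from failed batches and merge them back in later.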

Limitations

While TerraClimate provides valuable climate data for forest research, it’s important to acknowledge its limitations. The 4-kilometer spatial resolution means that local microclimatic conditions might not be captured, as each grid cell represents an averaged value for its entire area. The monthly temporal resolution smooths out daily extremes and may miss significant short-term weather events that could impact forest processes. The nearest neighbor sampling method used for data extraction means that no interpolation occurs between grid cells, potentially affecting the representativeness of data for plots located near grid cell boundaries.
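One practical consequence of the nearest-neighbor extraction is that a plot near a grid-cell boundary draws all its values from a single cell, even when the neighboring cell is almost as close. A minimal sketch for flagging such plots, assuming a regular 1/24-degree (~4 km) grid; the coordinates and 10% threshold are hypothetical:

```r
#--------------------------------------------------
# Sketch: flag plots close to a grid-cell edge
# (assumes a regular 1/24-degree, ~4 km, grid)
#--------------------------------------------------
cell_size <- 1/24  # grid spacing in degrees

# Distance to the nearest cell edge, as a fraction of cell width (0 to 0.5)
edge_distance <- function(coord, cell = cell_size) {
    frac <- (coord / cell) %% 1
    pmin(frac, 1 - frac)
}

# Hypothetical plot coordinates
lon <- 14.512
lat <- 45.403

# Flag plots whose centre lies within 10% of a cell width of an edge
near_edge <- min(edge_distance(lon), edge_distance(lat)) < 0.1
```

For flagged plots, values from the adjacent cell could be extracted as well to gauge how sensitive the climate series is to the plot's position within the grid.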

Conclusion

This article demonstrates the transformative potential of GEE and the rgee package for modern forest research. Its practical applications in vegetation health assessment, forest cover change analysis, and climate data extraction enable forest scientists to overcome traditional limitations of data access, processing power, and analytical complexity.

The applications demonstrated here represent only a fraction of GEE’s potential for forest research. The platform’s extensive data catalog, combined with its processing capabilities, opens up possibilities for numerous other applications.

As forest ecosystems continue to face unprecedented challenges from climate change and human activities, tools like GEE and rgee become increasingly vital for understanding and managing these complex systems. Their ability to process and analyze vast amounts of environmental data at multiple scales positions them as essential tools in modern forest science, supporting both research and evidence-based forest management decisions.

References

  1. Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D., & Moore, R. (2017). Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sensing of Environment, 202, 18-27.
  2. Aybar, C., Wu, Q., Bautista, L., Yali, R., & Barja, A. (2020). rgee: An R package for interacting with Google Earth Engine. Journal of Open Source Software, 5(51), 2272.
  3. Hansen, M. C., Potapov, P. V., Moore, R., Hancher, M., Turubanova, S. A., Tyukavina, A., … & Townshend, J. R. G. (2013). High-resolution global maps of 21st-century forest cover change. Science, 342(6160), 850-853.
  4. Abatzoglou, J. T., Dobrowski, S. Z., Parks, S. A., & Hegewisch, K. C. (2018). TerraClimate, a high-resolution global dataset of monthly climate and climatic water balance from 1958–2015. Scientific Data, 5(1), 1-12.
