
A Practical Guide for Working with Weather Datasets

Author

Patrick Wiese, ASA

Description

This series of papers is intended to serve as a practical guide for actuaries and researchers who wish to analyze weather datasets. The first paper provides an overview of the main types of weather datasets. The second paper describes computer programming strategies for processing large weather datasets using a standard personal computer. Each subsequent paper describes a particular weather dataset and is accompanied by a free, open-source computer program for analyzing that dataset. The computer programs, developed by SOA staff and by volunteers, reduce the upfront time and effort required to begin working with weather datasets. Because each program is open source, researchers can modify and expand the code to suit their own purposes.

A wide range of datasets will be covered in this series of papers, including (1) data collected by weather stations; (2) data estimated using Doppler radar and/or sensors on satellites; (3) “reanalysis” datasets generated by weather models that assimilate historical data from many sources (land-based stations, ships, planes, weather balloons, buoys, satellites, and radar) and produce, as an output, spatially and temporally complete historical records; (4) short- and medium-term forecasts; (5) sub-seasonal and seasonal forecasts; and (6) long-range climate projections.

Released Papers

Topic #1: The Main Types of Weather Datasets

Report

This report categorizes weather datasets into 8 main types: 5 types of historical data and 3 types of forecast data. Each type is described and illustrated with an example.

Topic #2: Strategies for Processing Large Weather Datasets

Report
Demo Computer Programs

Many weather datasets exceed 100 gigabytes, and some are much larger. While most climate scientists have access to servers that can store and process massive weather datasets, other researchers may wish to perform weather analyses on a standard personal computer. A personal computer rarely offers more than 1,000 gigabytes of storage space and 16 to 32 gigabytes of RAM (the active memory used by running applications and programs). Given these constraints, a researcher needs a deliberate strategy for working with large weather datasets. This paper describes techniques for analyzing a large weather dataset despite the storage and memory limitations of a personal computer. The paper is accompanied by 6 small computer programs written in VBA, Python, R, and C++, zipped together into a single file to simplify the download process. These programs illustrate how to process large weather datasets using the limited memory (RAM) available on a personal computer.
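As a minimal sketch of one such technique (streaming a file row by row and keeping only running totals in memory), the Python functions below compute a per-station average without ever loading the whole file. The CSV layout and the column names `station` and `tmax` are hypothetical assumptions for illustration; this is not one of the accompanying demo programs.

```python
import csv
from collections import defaultdict
from typing import Dict, Iterable


def running_means(rows: Iterable[dict], key: str, value: str) -> Dict[str, float]:
    """Accumulate per-key sums and counts one row at a time.

    Memory use grows with the number of distinct keys (e.g. stations),
    not with the number of rows in the file.
    """
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for row in rows:
        raw = row[value].strip()
        if not raw:  # skip missing observations
            continue
        sums[row[key]] += float(raw)
        counts[row[key]] += 1
    return {k: sums[k] / counts[k] for k in sums}


def station_means_from_file(path: str) -> Dict[str, float]:
    # csv.DictReader pulls one line at a time from the open file,
    # so the full dataset is never held in RAM.
    # Hypothetical columns: station, date, tmax
    with open(path, newline="") as f:
        return running_means(csv.DictReader(f), key="station", value="tmax")
```

The same pattern carries over to pandas by passing `chunksize` to `read_csv` and combining the per-chunk sums and counts at the end.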

Upcoming Papers

Topic #3: Introduction to the GHCN Daily Dataset

The Global Historical Climatology Network Daily (GHCNd) dataset consists of daily temperature, precipitation, and wind speed observations collected from over 100,000 land-based weather stations. GHCNd provides geographic coverage of much of the world, but the availability of data varies significantly from one country to another, as well as within each country. In general, the density of weather stations is greatest in urban areas and lowest in rural areas.
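GHCNd station files are commonly distributed in a fixed-width “.dly” text format, with one line per station, month, and weather element. The sketch below parses one such line, assuming the column layout described in the GHCNd documentation (station ID, year, month, element code, then 31 daily groups of a 5-character value plus three 1-character flags), with -9999 marking missing days. Treating any day with a quality flag as unavailable is a design assumption, not a rule of the dataset.

```python
from typing import List, Optional, Tuple

MISSING = -9999           # sentinel for missing days (assumed per GHCNd docs)
RECORD_WIDTH = 21 + 31 * 8  # header (21 chars) + 31 daily groups of 8 chars


def parse_dly_line(line: str) -> Tuple[str, int, int, str, List[Optional[int]]]:
    """Split one fixed-width GHCNd .dly record.

    Assumed 1-based layout: ID 1-11, YEAR 12-15, MONTH 16-17, ELEMENT 18-21,
    then for each day: VALUE (5 chars), MFLAG, QFLAG, SFLAG (1 char each).
    Values are raw integers (e.g. TMAX/TMIN in tenths of degrees C,
    PRCP in tenths of mm); no unit conversion is done here.
    """
    line = line.rstrip("\n").ljust(RECORD_WIDTH)  # restore trimmed trailing blanks
    station_id = line[0:11]
    year = int(line[11:15])
    month = int(line[15:17])
    element = line[17:21]
    values: List[Optional[int]] = []
    for day in range(31):
        start = 21 + 8 * day
        raw = int(line[start:start + 5])
        qflag = line[start + 6]  # quality flag; blank means no problem reported
        # Treat missing days and quality-flagged days as unavailable
        values.append(None if raw == MISSING or qflag.strip() else raw)
    return station_id, year, month, element, values
```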

Topic #4: Introduction to the CPC Global Temperature and Precipitation Datasets

To address the irregular spacing of ground-based weather stations, researchers have developed algorithms to translate station data into data on a regularly spaced geographic grid. The Climate Prediction Center, a federal agency that is part of the National Oceanic and Atmospheric Administration, uses station data to produce the Global Unified Temperature Dataset and the Global Unified Precipitation Dataset. These datasets cover the period from 1979 to the present and are gridded at 0.5 degrees (i.e., data is available every 0.5 degrees of latitude and longitude). For each day and for each grid point, the data provides minimum temperature, maximum temperature, and total precipitation. Note that the data covers land areas only; areas over oceans are not included.
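To illustrate the general idea of gridding, the sketch below estimates a value at a single grid point by inverse-distance weighting (IDW) of nearby station values. IDW is a simple stand-in chosen for clarity; it is not necessarily the algorithm the Climate Prediction Center uses. The `power` and `radius_km` defaults and the 6,371 km Earth radius are arbitrary assumptions.

```python
import math
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def idw_estimate(
    grid_lat: float,
    grid_lon: float,
    stations: Iterable[Tuple[float, float, float]],  # (lat, lon, value)
    power: float = 2.0,
    radius_km: float = 500.0,
) -> Optional[float]:
    """Inverse-distance-weighted value at one grid point (hypothetical helper)."""
    num = 0.0
    den = 0.0
    for lat, lon, value in stations:
        d = haversine_km(grid_lat, grid_lon, lat, lon)
        if d > radius_km:
            continue  # ignore distant stations
        if d < 1e-6:
            return value  # a station sits exactly on the grid point
        w = 1.0 / d ** power  # closer stations get larger weights
        num += w * value
        den += w
    return num / den if den > 0 else None  # None: no station within radius
```

Looping this function over every grid point (for example, every 0.5 degrees of latitude and longitude) yields a complete gridded field; real products add quality control, terrain adjustments, and more sophisticated interpolation.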

Topic #5: Introduction to the ERA5 Dataset

“ERA5” is short for “Fifth generation European Centre for Medium-Range Weather Forecasts Atmospheric Reanalysis of the Global Climate”. ERA5 is a “reanalysis” dataset. This type of data is produced by weather models that reconstruct the past in a manner consistent both with available historical data and with the physical laws governing the earth’s atmosphere and oceans. The historical data fed into the model(s) can come from many sources, including weather stations, ships, planes, weather balloons, satellites, and radar. In contrast to station data, which is unevenly distributed across space and time, reanalysis data is spatially and temporally complete. Reanalysis data is particularly helpful when analyzing historical trends in a region that has a low density of weather stations. ERA5 data extends from 1950 to the present in hourly time steps and covers the entire earth on a 0.25-degree geographic grid (i.e., data points are available every 0.25 degrees of latitude and longitude). The ERA5 dataset is updated daily with a latency of only five days, and it provides a comprehensive set of weather metrics, including air temperature, air pressure, sea surface temperature, wind speed, total precipitation, and various measures of soil moisture.
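The grid resolution and hourly time step imply a large data volume. As a back-of-the-envelope sketch, assume a regular 0.25-degree latitude-longitude grid of 721 latitudes (pole to pole, inclusive) by 1,440 longitudes, one variable, a 365-day year, and 4-byte floating-point values. These are illustrative assumptions, not a description of how ERA5 files are actually packaged.

```python
def era5_volume(spacing_deg: float = 0.25, days: int = 365, bytes_per_value: int = 4):
    """Rough, uncompressed size of one hourly variable for one year on a global grid."""
    n_lat = round(180 / spacing_deg) + 1  # -90 to +90 inclusive: 721
    n_lon = round(360 / spacing_deg)      # 0 to 359.75: 1440
    points = n_lat * n_lon
    values = points * days * 24           # one value per grid point per hour
    return points, values, values * bytes_per_value


points, values, size_bytes = era5_volume()
print(f"{points:,} grid points, {values:,} values, {size_bytes / 1e9:.1f} GB")
# 1,038,240 grid points, 9,094,982,400 values, 36.4 GB
```

Roughly 36 GB per variable per year before compression helps explain why many weather datasets exceed 100 gigabytes. Distributed files are typically smaller because of packing and compression, so treat this as an order-of-magnitude estimate.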

Topic #6: Introduction to the IBTrACS Dataset

IBTrACS stands for “International Best Track Archive for Climate Stewardship”. This dataset contains worldwide data on the paths and intensities of tropical cyclones from 1842 to the present.
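Because a cyclone’s path is recorded as a sequence of positions, a natural first calculation is the distance the storm traveled. The sketch below sums great-circle (haversine) distances between consecutive positions. The plain list of (latitude, longitude) tuples and the 6,371 km mean Earth radius are assumptions for illustration, not IBTrACS field names or conventions.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant)


def haversine_km(p: Tuple[float, float], q: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Works across the +/-180 degree meridian because only sin(dlon/2)**2 is used.
    """
    phi1, phi2 = math.radians(p[0]), math.radians(q[0])
    dphi = phi2 - phi1
    dlmb = math.radians(q[1] - p[1])
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def track_length_km(track: Sequence[Tuple[float, float]]) -> float:
    """Total path length of one storm: sum of distances between consecutive fixes."""
    return sum(haversine_km(a, b) for a, b in zip(track, track[1:]))
```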

Topic #7: Introduction to NOAA’s Storm Events Database

The Storm Events Database, compiled by the National Oceanic and Atmospheric Administration (NOAA), contains data for severe weather events in the United States having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce. This database covers the period from 1950 to the present, and provides loss estimates in addition to data describing each severe weather event.
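In the downloadable Storm Events files, loss estimates are typically stored as text with a magnitude suffix (for example, `25.00K` or `1.50M`). Assuming that convention, which should be verified against the database documentation before use, a small parser that converts these strings to dollar amounts might look like this. `parse_damage` is a hypothetical helper, not part of any NOAA tool.

```python
from typing import Optional

# Assumed suffix convention: K = thousands, M = millions, B = billions
MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9}


def parse_damage(text: str) -> Optional[float]:
    """Convert a damage string such as '25.00K' into dollars.

    Returns None for blank or unparseable entries so that "unknown"
    is kept distinct from a reported loss of zero.
    """
    s = text.strip().upper()
    if not s:
        return None
    suffix = s[-1]
    try:
        if suffix in MULTIPLIERS:
            number = s[:-1]
            return float(number) * MULTIPLIERS[suffix] if number else None
        return float(s)  # plain number with no suffix
    except ValueError:
        return None
```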

Topic #8: CMIP6 Climate Projections

A climate projection provides a sense of how the distribution of weather events may gradually change in the decades ahead. Perhaps the best-known climate projections are those that feed into the climate assessment reports issued by the Intergovernmental Panel on Climate Change (IPCC). The IPCC issued its first climate assessment report in 1990. Since then, an updated report has been released every 5 to 8 years. The most recent report, the sixth in the series, was published in 2021. Because the modeling of climate change is complex, and because different models produce different results, the IPCC uses an “ensemble” of models to produce projections. For the latest IPCC report, the ensemble includes over 100 models from more than 50 modeling centers. This ensemble is referred to as “CMIP6”, which stands for “Coupled Model Intercomparison Project, Phase 6” (corresponding to the sixth report in the series). By running standardized greenhouse gas scenarios through each model in the ensemble, a range of projections is produced. The average across all models serves as a best estimate, while the distribution of results provides some sense of the level of uncertainty.
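To make the idea of a best estimate and a spread concrete, the sketch below summarizes one projected quantity across an ensemble: the multi-model mean as the best estimate, and the 10th and 90th percentiles plus the standard deviation as simple measures of uncertainty. The input values are hypothetical, and the choice of percentiles is an assumption; the IPCC’s published ranges are derived with additional methods not shown here.

```python
import statistics
from typing import Dict, Sequence


def ensemble_summary(results: Sequence[float]) -> Dict[str, float]:
    """Summarize one projected quantity (one scenario, one period) across models."""
    if len(results) < 2:
        raise ValueError("need results from at least two models")
    cuts = statistics.quantiles(results, n=10)  # nine decile cut points
    return {
        "mean": statistics.fmean(results),   # multi-model mean (best estimate)
        "p10": cuts[0],                      # 10th percentile
        "p90": cuts[-1],                     # 90th percentile
        "stdev": statistics.stdev(results),  # sample standard deviation
    }


# Hypothetical projected warming (degrees C) from ten models under one scenario
summary = ensemble_summary([2.1, 2.4, 2.6, 2.8, 2.9, 3.0, 3.2, 3.5, 3.9, 4.4])
print(summary)
```

With only a handful of models the percentile estimates are rough; with an ensemble the size of CMIP6 they become more stable, although models are not independent samples, so the spread is only an indication of uncertainty.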

Topics #9 and beyond: to be determined

Computer Programs for Tabulating Weather Datasets

We have not yet released any computer programs. Soon, we expect to release programs for tabulating both the GHCNd dataset and the ERA5 dataset.

Questions or Comments?

Give us your feedback! Take a short survey on this report.

If you have comments or questions, please send an email to research@soa.org.