Role of the discipline of Computer Science in Climate Modeling

Goals of this post

1. Understand why improving climate models matters for addressing climate change
2. Understand the role of computer science in the field of climate modeling

What are climate models?

Scientific models are representations of processes and systems; they can be mathematical, conceptual, or physical. They are useful for formulating and testing hypotheses about the world and for projecting its future state. Numerical climate models are a subset of scientific models that mathematically describe the Earth's climate systems and processes, such as large-scale precipitation, the carbon cycle, and atmospheric chemistry and aerosols. They consist of equations for climate variables (e.g. temperature, winds, ocean currents) that can be solved numerically on computers. Climate models range in complexity, from simple, idealized box models with mixing assumptions to complex ones such as general circulation models (GCMs) that simulate the Earth's atmosphere and oceans.

Why does advancing climate modeling matter for addressing climate change?

Climate models project how the climate will change over time. Currently, the uncertainties in these projections remain large. For example, the most recent Intergovernmental Panel on Climate Change (IPCC) assessment reports consider a range of RCP scenarios whose projected warming spans from below 1°C to above 6°C, which represent vastly different futures. Improving climate models so that we have higher confidence in their projections means that scientists can help policymakers develop more appropriate climate change adaptation and mitigation strategies. Mitigation, meaning reducing or preventing greenhouse gas emissions, is key to collectively minimizing global warming.

Why is the uncertainty high in climate projections?

A reliable source of information on the scientific and technical state of climate change is the aforementioned IPCC, an intergovernmental body of the United Nations that regularly assesses the science and impact of climate change to inform policymakers. IPCC reports are written by hundreds of scientists who assess thousands of scientific papers, so they are regarded as well-reviewed, comprehensive, and reliable.

In the latest IPCC cycle, completed in 2014, there were three Working Group (WG) reports. WG1 focuses on physical science, WG2 focuses on impacts and adaptation, and WG3 focuses on climate change mitigation. Chapter 9 of the WG1 report, "Evaluation of Climate Models", is where I went to understand the current state of climate models and how much confidence we can place in their projections. Overall, the chapter states that scientists have improved the models to include more climate variables and processes, coordinated models via CMIP (the Coupled Model Intercomparison Project), and developed standardized metrics. The large spread in climate projections remains, however, mainly due to the inability to adequately model small-scale processes and to use smaller time steps. More specifically, the notable points are:

1. "The simulations of large-scale patterns of precipitation have improved somewhat since the AR4, although models continue to perform less well for precipitation than for surface temperature. "

2. "The simulations of clouds in climate models remain challenging; there's very high confidence that uncertainties in cloud processes explain much of the spread in modeled climate sensitivity."

3. "The simulation of the tropical Atlantic remains deficient with many models unable to reproduce the basic east-west temperature gradient."

4. "There's a tendency of models to slightly overestimate sea ice extent in the Arctic (by about 10%) in winter and spring."

5. "In these models (Earth System Models), regional patterns of carbon uptake and release are less well reproduced (than global), especially for NH land where models systematically underestimate the sink implied by atmospheric inversion techniques. The ability to simulate carbon fluxes is important for estimating 'compatible emissions'."

6. "Majority of ESMs include aerosols, but uncertainties in sulfur cycle processes and natural sources and sinks remain."

What roles does the discipline of computer science play in improving climate modeling?

1. Advancing high-performance computing 

Consider the current state of the art, which are also the most complex models: Earth System Models (ESMs). ESMs simulate the atmosphere, oceans, land surfaces, ice, and snow by first dividing the Earth into a 3-dimensional grid. For each cell in the grid, there are physics equations and parameterizations (formulas based on empirical evidence) which govern how physical quantities for that cell are calculated. Every cell also depends on the values of neighboring cells, so the calculation goes as follows: for every time step, solve for the values of the physical quantities at every cell. Then, march forward one time step and calculate the values again. Do this iteratively over a given time horizon.
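The time-stepping loop described above can be sketched in a few lines of Python. This is only a toy under loud assumptions: a small 2-dimensional grid instead of a 3-dimensional one, periodic edges, and a made-up diffusion rule (with a hypothetical coefficient `alpha`) standing in for the real physics equations and parameterizations.

```python
def step(grid, alpha=0.1):
    """Advance a 2-D temperature grid by one time step.

    Each cell moves toward the average of its four neighbors. This toy
    diffusion rule is an assumption standing in for real model physics.
    """
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(rows):
        for j in range(cols):
            # Periodic boundaries in both directions (a simplification).
            neighbors = (
                grid[(i - 1) % rows][j] + grid[(i + 1) % rows][j]
                + grid[i][(j - 1) % cols] + grid[i][(j + 1) % cols]
            )
            new[i][j] = grid[i][j] + alpha * (neighbors - 4 * grid[i][j])
    return new


def simulate(grid, n_steps):
    """March forward n_steps time steps, recomputing every cell each step."""
    for _ in range(n_steps):
        grid = step(grid)
    return grid


# One warm cell in an otherwise cold 4x4 grid.
initial = [[0.0] * 4 for _ in range(4)]
initial[1][1] = 16.0
final = simulate(initial, n_steps=50)
```

Even in this toy, the work per run is (number of cells) x (number of time steps), which is the quantity that explodes in a real model.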

[Figure: General circulation model, via Wikipedia]

Now, imagine millions of cells over a long time horizon (millions of time steps), with multiple physical equations and parameterizations. This involves a lot of calculations and therefore becomes computationally very expensive.

When we make the temporal resolution (the length of a time step, e.g. 3 hours) and the spatial resolution (the size of a grid cell, e.g. 5 km) finer, the models generally become more mathematically accurate, but at a higher computational cost. At some point, that cost becomes prohibitive, forcing modelers to impose an upper bound on the model resolution, the number of processes included in the model, and the number of simulation runs for a given model (multiple runs are needed to account for sensitivity to initial conditions).
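To get a feel for how quickly the cost grows, here is a back-of-the-envelope sketch. The function name, the 50 vertical levels, and the 100-year horizon are assumptions chosen for illustration, and counting "cell updates" ignores how expensive each update is.

```python
EARTH_SURFACE_KM2 = 5.1e8  # approximate surface area of the Earth


def estimate_cell_updates(dx_km, dt_hours, n_levels=50, years=100):
    """Rough count of cell updates in one run (illustrative assumptions only)."""
    cells = EARTH_SURFACE_KM2 / (dx_km ** 2) * n_levels
    steps = years * 365 * 24 / dt_hours
    return cells * steps


coarse = estimate_cell_updates(dx_km=100, dt_hours=3)
fine = estimate_cell_updates(dx_km=5, dt_hours=3)
ratio_same_dt = fine / coarse  # (100 / 5) ** 2 = 400 times more work

# Numerical stability usually forces the time step to shrink along with the
# grid spacing, which adds another factor of 20 here.
fine_small_dt = estimate_cell_updates(dx_km=5, dt_hours=3 / 20)
ratio_scaled_dt = fine_small_dt / coarse  # 20 ** 3 = 8000 times more work
```

Going from 100 km to 5 km cells is a 20x refinement, but the work grows by a factor of hundreds to thousands, which is why resolution is so tightly coupled to available computing power.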

Advances in high-performance computing have raised this upper bound, enabling higher-resolution models and the simulation of more complex processes. Just about ten years ago, supercomputers were at the teraflop scale; now, they are at the petaflop scale (1000x). Although the rate at which supercomputers are becoming more powerful has been slowing down [1], we are projected to reach the exaflop scale in the next few years.

What exascale supercomputing means is that kilometer-scale resolution becomes tractable (currently, climate models run at resolutions coarser than 10 km, depending on the complexity of the model). Kilometer resolution would enable, most notably, the modeling of convective clouds and gravity wave drag, which are critical sources of uncertainty in climate change projections [2]. Other processes that become possible to model are individual thunderstorms, the evolution of fine-scale disturbances on the tropopause, and surface-atmosphere interactions [3]. I look forward to reading about the effects of kilometer-resolution modeling in the 6th or 7th IPCC reports.


2. Improving climate model implementations

As the hardware moves to exascale, the software must also evolve to take advantage of the massive parallelism available. Currently, a given climate model can be a million lines of code [4], reaching only about 5% efficiency on supercomputers due to coding inefficiencies [5]. This means that addressing these inefficiencies could result in up to a 20x speedup (1 / 0.05 = 20)!

Where do the coding inefficiencies come from? The principal inefficiency comes from incomplete parallelization of the code, which becomes a moving target as hardware architectures evolve. Some of the newer hardware architectures involve Vector Parallel Processors (VPPs) and architectures based on GPUs [6]. These new architectures arose from the need to cope with the demise of Dennard scaling. Dennard scaling had allowed chipmakers to increase clock frequency without increasing power consumption. Without the clock speed-ups that Dennard scaling guaranteed, and with the projected end of Moore's law in the next few years as well, chipmakers have had to turn to hardware innovations.
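One standard way to see why incomplete parallelization hurts so much is Amdahl's law. It is not named in the sources cited here, so treat this as my own illustration: if only a fraction p of the run time can be parallelized, the speedup on N processors is 1 / ((1 - p) + p / N), which can never exceed 1 / (1 - p).

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Speedup predicted by Amdahl's law for a partly parallel program."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)


# Hypothetical parallel fractions, evaluated on 10,000 processors.
speedups = {p: amdahl_speedup(p, 10_000) for p in (0.90, 0.99, 0.999)}
for p, s in speedups.items():
    print(f"parallel fraction {p}: speedup {s:.1f}, upper bound {1 / (1 - p):.0f}")
```

With 90% of the code parallelized, even ten thousand processors give less than a 10x speedup, so the remaining serial parts dominate on large machines.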

The new hardware architectures, however, mean that a developer or modeler needs detailed knowledge of how the software stack interacts with the hardware. For example, with GPU architectures, the modeler has to use either language extensions such as OpenACC compiler directives, or entirely new programming languages such as CUDA [2].

Computer scientists are already working on porting climate models to the new hardware architectures [7]. For example, there was an early effort to translate Fortran to CUDA in order to run NOAA's Non-hydrostatic Icosahedral Model (NIM) on GPUs [7]. Perhaps more scalable than brute-force source code translation is developing general portability solutions. One idea is to separate architecture-dependent details from the source code using domain-specific languages (DSLs) [2]. This way, after the one-time effort of developing a DSL, it can be reused across many models, and the DSL compiler can be responsible for generating optimized, parallel code for a specific hardware architecture. Another idea is for computer scientists to apply general software engineering principles and break up the monolithic climate model code into modular components, so that well-designed APIs can be adapted to different hardware architectures, alternative programming models, and alternative algorithms [7]. Lastly, computer scientists can advise research scientists on choosing the most computationally efficient numerical methods, as this can depend on the hardware architecture [2].
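The separation of concerns behind the DSL idea can be illustrated with a toy sketch. This is not a real climate DSL: the stencil `smooth` and the two "backends" are hypothetical names I made up, and the threaded backend only stands in for a real accelerator target (Python threads would not actually speed up this computation).

```python
from concurrent.futures import ThreadPoolExecutor


def smooth(left, center, right):
    """The 'science' code: a 1-D three-point stencil, written once."""
    return 0.25 * left + 0.5 * center + 0.25 * right


def run_serial(stencil, field):
    """Backend 1: a plain loop over all cells (periodic boundaries)."""
    n = len(field)
    return [stencil(field[i - 1], field[i], field[(i + 1) % n]) for i in range(n)]


def run_threaded(stencil, field, workers=4):
    """Backend 2: the same stencil, mapped over cells by a thread pool."""
    n = len(field)

    def update(i):
        return stencil(field[i - 1], field[i], field[(i + 1) % n])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(update, range(n)))


BACKENDS = {"serial": run_serial, "threaded": run_threaded}

field = [0.0, 0.0, 4.0, 0.0, 0.0, 0.0]
results = {name: backend(smooth, field) for name, backend in BACKENDS.items()}
```

The modeler only ever edits `smooth`; supporting a new architecture means adding a backend, not rewriting the science code.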

3. Building software infrastructure for Exascale data access

The data output volume is already overwhelming the climate modeling community. For example, performing all the simulations considered for phase 6 of CMIP (CMIP6) amounts to about 0.8 PB of output for each of the 100 participating models [2]. As we push model resolutions finer with exascale computing, these data volumes will multiply as well. Storing this data over a long time horizon would become very expensive or practically impossible. For example, at roughly $0.021 per GB-month, each petabyte stored in S3 costs about $252K every year, so 100 PB would cost on the order of $25M every year.
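The storage arithmetic can be reproduced with a short sketch. The flat price of $0.021 per GB-month is an assumption for illustration; real cloud pricing is tiered, varies by region and storage class, and changes over time.

```python
GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10**6 GB


def yearly_storage_cost_usd(petabytes, usd_per_gb_month=0.021):
    """Yearly cost of keeping data in object storage at a flat, assumed price."""
    return petabytes * GB_PER_PB * usd_per_gb_month * 12


cost_1pb = yearly_storage_cost_usd(1)      # about $252K per year
cost_100pb = yearly_storage_cost_usd(100)  # about $25.2M per year
```

The cost scales linearly with volume, so every increase in model resolution or ensemble size shows up directly in the yearly bill.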

Computer scientists can help the climate science community deal with this ever-increasing 'data avalanche'. One approach is developing virtualization frameworks that enable online analysis. For example, the SimFS framework exposes only a view of the data; it monitors the access patterns of the analysis applications to decide which data to keep and which to re-simulate on demand, trading computation for storage [9]. Another approach is to reduce the data size, for example through data compression and reduced numerical precision, without losing climate prediction accuracy. Yet another approach is to perform the analysis on the server, so that only a minimal amount of data is stored and expensive data transfers are avoided. These server-side analytics capabilities require computer science skills to build, as they draw on scalable data management, cloud computing, and domain-specific APIs [10].
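The idea of trading computation for storage can be sketched with a small cache. To be clear, this is not SimFS's API or its actual policy: the class name, the least-recently-used eviction rule, and the toy simulation function are all assumptions for illustration.

```python
from collections import OrderedDict


class RecomputeCache:
    """Keep at most `capacity` outputs stored; re-simulate the rest on demand."""

    def __init__(self, simulate, capacity):
        self.simulate = simulate
        self.capacity = capacity
        self.stored = OrderedDict()
        self.resimulations = 0

    def read(self, step):
        if step in self.stored:
            self.stored.move_to_end(step)    # mark as most recently used
            return self.stored[step]
        self.resimulations += 1              # not stored: pay with computation
        value = self.simulate(step)
        self.stored[step] = value
        if len(self.stored) > self.capacity:
            self.stored.popitem(last=False)  # evict the least recently used
        return value


def toy_simulation(step):
    """Hypothetical stand-in for re-running an expensive model segment."""
    return step * step


cache = RecomputeCache(toy_simulation, capacity=2)
values = [cache.read(s) for s in (1, 2, 1, 3, 2)]
```

The analysis code just calls `read` and never needs to know whether a result came from storage or from a fresh simulation; `capacity` is the knob that trades storage cost against recomputation cost.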

4. Applying machine learning for climate predictions 

Machine learning broadly describes a set of techniques that find patterns in large amounts of data and make predictions based on the found patterns. As we established previously, with climate modeling, we have a lot of data and we are interested in predicting long-term climate and short-term weather events. This makes for a perfect candidate for integrating machine learning techniques. 

A well-explored use of machine learning in climate modeling is in small-scale parameterizations, which are difficult for global climate models to represent accurately. Machine learning models can be trained to learn patterns from the simulation runs of high-resolution models. The trained models can then be incorporated into the global climate model to improve its representation of small-scale processes. An example of this is 'Cloud Brain', where deep learning models are trained on data from short-term runs of fine-scale cloud models and are then integrated into the global climate model to predict cloud behavior on the global scale [11]. The remaining challenges with this approach are making the machine learning models generalize across different climate models and reducing the cost of training them.
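The train-then-embed workflow described above can be sketched as follows. This is a deliberately tiny stand-in: a least-squares line replaces the deep neural network, and the "fine-scale model", its coefficients, and the humidity input are all hypothetical.

```python
def fine_scale_cloud_model(humidity):
    """Hypothetical stand-in for an expensive high-resolution simulation."""
    return 0.8 * humidity + 0.1


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return slope, mean_y - slope * mean_x


# 1. Generate training data from short runs of the expensive model.
train_x = [i / 10 for i in range(11)]
train_y = [fine_scale_cloud_model(x) for x in train_x]

# 2. Train the cheap emulator on that data.
slope, intercept = fit_line(train_x, train_y)


def emulated_cloud_fraction(humidity):
    """Cheap learned replacement for the fine-scale model."""
    return slope * humidity + intercept


# 3. Call the emulator inside the coarse global model, once per grid cell.
coarse_grid_humidity = [0.25, 0.5, 0.75]
cloud_fraction = [emulated_cloud_fraction(h) for h in coarse_grid_humidity]
```

The expensive model is only run to produce training data; afterwards, the global model pays only the cost of evaluating the emulator.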

For short-term weather prediction (1-2 days ahead), machine learning has the potential to replace the existing techniques. This is because, on these timescales, physical constraints such as conservation laws can be ignored, as the errors do not accumulate significantly over a short time. Machine learning algorithms could learn from Internet-of-Things (IoT) data, which is becoming increasingly available, or from radar and satellite data. In one example, a neural network was able to predict precipitation up to 8 hours into the future at a spatial resolution of 1 km² and a temporal resolution of 2 minutes, outperforming existing methods such as Numerical Weather Prediction (NWP).

Wrapping up

In this post, I sought to understand a few ways that computer science can help with climate modeling. I realized there were multiple layers to peel back before I could get there. First, I defined what climate models are. Second, I outlined why climate modeling matters for adapting to and mitigating climate change, which is what initially drew me to this topic. Then, I outlined the current state of climate modeling and the nature of the uncertainties in climate models. After those steps, I explored a few potential ways computer science can help with climate modeling.

Ultimately, climate models are a million lines of scientific code that take in numerical data and output predictions at the peta- to exascale. Advancing high-performance computing, improving implementation efficiency, adapting the code to different supercomputing hardware architectures, building tools for the scientific community to access exascale data, building server-side data analytics, and exploring machine learning approaches seem to be only the beginning of how computer science will continue to co-evolve with climate modeling and climate science.


