Posts

Showing posts from March, 2021

Role of the discipline of Computer Science in Climate Modeling

Image
Goals of this post 1. Understand why improving climate models matters for addressing climate change 2. Understand the role of computer science in the field of climate modeling What are climate models? Scientific models are representations of processes and systems; they can be mathematical, conceptual, or physical. They are useful in formulating and testing numerous hypotheses about the world and creating projections about the future state of the world. Numerical climate models are subsets of scientific models that mathematically describe the Earth's climate systems/processes, such as large-scale precipitation, the carbon cycle, atmospheric chemistry/aerosols, etc. They involve equations for climate variables (e.g. temperature, winds, ocean current..) that can be solved numerically on computers. Climate models range in complexity - from simple, idealized box models with mixing assumptions to complex ones such as global circulation models (GCMs) that simulate the Earth's atmosphe

Benchmarks: Dask Distributed vs. Ray for Dask Workloads

Image
  Dask Workloads  In this section, I detail the dask workloads that will be used to benchmark two cluster backends - Ray and Dask Distributed. In summary, my dask workload involves processing ~3.3 Petabytes of ~6.06 million input files with dask-based Processing Chain and producing ~14.88 Terabytes of ~1.86 million output files. Input Data ~6.06 million ~550MB files, totaling ~3.3 Petabytes  Each file represents a N-dimensional numerical tensor (np.ndarray).  Processing Chain: How we go from Input data to Output data (Dask Task Graphs) We use  Xarray  / Dask APIs for  lazy , numerical computations on out-of-core multidimensional tensors. Per 3 input files, we generate 1 output file using below  Dask Task Graph  (auto generated by using Dask APIs) The Task Graph is consisted of small, Dask-level 'tasks'; tasks can be grouped into 3 actions:   Load Download file from S3 and load each file into 2D  np.ndarray  of type np.float32 Convert  np.ndarray  into  dask.array  of chunksize