
Dask Examples documentation


Xarray with Dask Arrays


Xarray is an open source project and Python package that extends the labeled data functionality of Pandas to N-dimensional array-like datasets. It shares a similar API to NumPy and Pandas and supports both Dask and NumPy arrays under the hood.

Start Dask Client for Dashboard

Starting the Dask Client is optional. It provides a dashboard that is useful for gaining insight into the computation.

The link to the dashboard will become visible when you create the client below. We recommend having it open on one side of your screen while using your notebook on the other side. It can take some effort to arrange your windows, but seeing both at the same time is very useful when learning.
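As a sketch (the cluster arguments here are illustrative, not from the original notebook), starting a client looks like:

```python
from dask.distributed import Client

# A small local cluster; in a notebook, plain Client() also works and
# displays a link to the diagnostic dashboard.
client = Client(processes=False, n_workers=1, threads_per_worker=2)
print(client.status)
```

In a notebook, displaying `client` renders the dashboard link shown below.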


Open a sample dataset

We will use some of xarray's tutorial data for this example. By specifying the chunk shape, xarray will automatically create Dask arrays for each data variable in the Dataset. In xarray, Datasets are dict-like containers of labeled arrays, analogous to the pandas.DataFrame. Note that we're taking advantage of xarray's dimension labels when specifying chunk shapes.

[Dataset repr: dimensions time: 2920, lat: 25, lon: 53; coordinates lat (75.0 … 15.0, degrees_north), lon (200.0 … 330.0, degrees_east), time (2013-01-01 … 2014-12-31T18:00:00, 6-hourly); attributes include Conventions: COARDS and title: 4x daily NMC reanalysis (1948).]
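The notebook itself loads the tutorial file with `xr.tutorial.open_dataset("air_temperature", chunks=...)`; as an offline stand-in with made-up values, chunking by dimension name works the same way:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the air_temperature tutorial dataset
temps = 15 + 8 * np.random.randn(8, 4, 6)
ds = xr.Dataset(
    {"air": (("time", "lat", "lon"), temps)},
    coords={
        "time": pd.date_range("2013-01-01", periods=8, freq="6h"),
        "lat": np.linspace(75.0, 15.0, 4),
        "lon": np.linspace(200.0, 330.0, 6),
    },
)
# Chunking by dimension name turns each data variable into a Dask array
ds = ds.chunk({"time": 4})
print(ds.air.chunks)
```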

Quickly inspecting the Dataset above, we'll note that this Dataset has three dimensions akin to axes in NumPy (lat, lon, and time), three coordinate variables akin to pandas.Index objects (also named lat, lon, and time), and one data variable (air). Xarray also holds Dataset-specific metadata as attributes.

[air DataArray attributes: long_name: 4xDaily Air temperature at sigma level 995, units: degK, dataset: NMC Reanalysis, actual_range: [185.16 322.1].]

Each data variable in xarray is called a DataArray. These are the fundamental labeled array objects in xarray. Much like the Dataset, DataArrays also have dimensions and coordinates that support many of their label-based operations.

Accessing the underlying array of data is done via the data property. Here we can see that we have a Dask array. If this array were to be backed by a NumPy array, this property would point to the actual values in the array.
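For instance (a minimal sketch with a throwaway array):

```python
import dask.array
import numpy as np
import xarray as xr

da = xr.DataArray(np.zeros((4, 3)), dims=("time", "lat")).chunk({"time": 2})
print(isinstance(da.data, dask.array.Array))      # True: Dask-backed
print(isinstance(da.compute().data, np.ndarray))  # True: NumPy after computing
```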

Use Standard Xarray Operations

In almost all cases, operations on xarray objects are identical regardless of whether the underlying data is stored as a Dask array or a NumPy array.


Call .compute() or .load() when you want your result as an xarray.DataArray with data stored as NumPy arrays.

If you started Client() above, you may want to watch the status page during computation.
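A minimal sketch of this lazy-then-compute pattern (synthetic monthly data, not the tutorial dataset):

```python
import numpy as np
import pandas as pd
import xarray as xr

da = xr.DataArray(
    np.arange(12.0),
    dims="time",
    coords={"time": pd.date_range("2013-01-01", periods=12, freq="MS")},
).chunk({"time": 6})

# Lazy: only builds a Dask task graph
monthly = da.groupby("time.month").mean("time")
# .compute() returns a new object backed by NumPy arrays
result = monthly.compute()
print(type(result.data), dict(result.sizes))
```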


Persist data in memory

If your dataset fits in available RAM, you can persist the data in memory.

This makes future computations much faster.
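In sketch form (illustrative array, not the tutorial data):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"air": (("time",), np.random.rand(100))}).chunk({"time": 25})
# Evaluates the task graph and keeps the (still chunked) results in memory
ds = ds.persist()
print(float(ds.air.mean()))  # fast: chunks are already materialized
```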

Time Series Operations

Because we have a datetime index, time-series operations work efficiently. Here we demo the use of xarray's resample method:

[Figure: plot of the resampled time series]

and rolling window operations:

Since xarray stores each of its coordinate variables in memory, slicing by label is trivial and entirely lazy.
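The resampling, rolling-window, and label-based selection steps above can be sketched together on synthetic 6-hourly data (values and names are illustrative):

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2013-01-01", periods=40, freq="6h")  # ten days, 6-hourly
da = xr.DataArray(np.random.rand(40), dims="time",
                  coords={"time": time}).chunk({"time": 8})

daily = da.resample(time="1D").mean()            # resample to daily means
smooth = da.rolling(time=4, center=True).mean()  # rolling-window smoothing
first_day = da.sel(time="2013-01-01")            # label-based slicing, still lazy
print(dict(daily.sizes), dict(first_day.sizes))
```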


Custom workflows and automatic parallelization

Almost all of xarray’s built-in operations work on Dask arrays. If you want to use a function that isn’t wrapped by xarray, one option is to extract Dask arrays from xarray objects (.data) and use Dask directly.

Another option is to use xarray's apply_ufunc() function, which can automate embarrassingly parallel "map" type operations where a function written for processing NumPy arrays should be repeatedly applied to xarray objects containing Dask arrays. It works similarly to dask.array.map_blocks() and dask.array.blockwise(), but without requiring an intermediate layer of abstraction.

Here we show an example using NumPy operations and a fast function from bottleneck, which we use to calculate Spearman's rank-correlation coefficient:

In the examples above, we were working with some air temperature data. For this example, we'll calculate the Spearman correlation of the raw air temperature data with the smoothed version that we also created (da_smooth). For this, we'll also have to rechunk the data ahead of time.


[Figure: map of the computed Spearman correlation]


Computations and Masks with Xarray

In this tutorial, we will cover the following topics:

  • Performing basic arithmetic on DataArrays and Datasets
  • Performing aggregation (i.e., reduction) along single or multiple dimensions of a DataArray or Dataset
  • Computing climatologies and anomalies of data using Xarray's "split-apply-combine" approach, via the .groupby() method
  • Performing weighted-reduction operations along single or multiple dimensions of a DataArray or Dataset
  • Providing a broad overview of Xarray's data-masking capability
  • Using the .where() method to mask Xarray data
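A few of these topics (climatology via .groupby(), anomalies, and masking with .where()) can be sketched on synthetic data; all names and values here are illustrative, not the tutorial's SST dataset:

```python
import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2000-01-01", periods=36, freq="MS")  # three years, monthly
tos = xr.DataArray(20 + 5 * np.random.randn(36, 4), dims=("time", "lat"),
                   coords={"time": time, "lat": [-45.0, -15.0, 15.0, 45.0]})

# Split-apply-combine: a monthly climatology, then anomalies against it
climatology = tos.groupby("time.month").mean("time")
anomaly = tos.groupby("time.month") - climatology

# Masking: keep values above the overall mean, NaN elsewhere
masked = tos.where(tos > tos.mean())
print(dict(climatology.sizes), dict(anomaly.sizes))
```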

Prerequisites

Time to learn: 60 minutes

In order to work with data and plotting, we must import NumPy, Matplotlib, and Xarray. These packages are covered in greater detail in earlier tutorials. We also import a package that allows quick download of Pythia example datasets.
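A sketch of those imports (the exact download helper the tutorial uses for Pythia datasets isn't reproduced here):

```python
import numpy as np
import matplotlib.pyplot as plt
import xarray as xr
# The tutorial additionally imports a small helper package for downloading
# Pythia example datasets; that import is omitted in this sketch.
print(xr.__version__)
```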

The bulk of the examples in this tutorial make use of a single dataset. This dataset contains monthly sea surface temperature (SST, called 'tos' here) data, obtained from the Community Earth System Model v2 (CESM2). (For this tutorial, however, the dataset will be retrieved from the Pythia example data repository.) The following example illustrates the process of retrieving this Global Climate Model dataset:

[Dataset repr: dimensions time: 180 (monthly, 2000-01 … 2014-12, cftime noleap calendar), lat: 180 (-89.5 … 89.5, degrees_north), lon: 360 (0.5 … 359.5, degrees_east), d2: 2; data variables time_bnds, lat_bnds, lon_bnds, and tos (Sea Surface Temperature, degC, time × lat × lon); 45 CMIP6 attributes, including Conventions: CF-1.7 CMIP-6.2, source_id: CESM2, experiment_id: historical, and variant_label: r11i1p1f1.]

Arithmetic Operations

In a similar fashion to NumPy arrays, performing an arithmetic operation on a DataArray will automatically perform the operation on all array values; this is known as vectorization. To illustrate the process of vectorization, the following example converts the sea surface temperature data from units of degrees Celsius to units of Kelvin:
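Both the unit conversion and the squaring example that follows are elementwise operations; a sketch on a tiny made-up array (note that NaN values, like those over land in the SST data, propagate through):

```python
import numpy as np
import xarray as xr

tos = xr.DataArray(np.array([[10.0, 12.5], [np.nan, 15.0]]), dims=("lat", "lon"))

tos_kelvin = tos + 273.15  # vectorized: applied to every element
tos_squared = tos ** 2
print(float(tos_kelvin[0, 0]))   # ≈ 283.15
print(float(tos_squared[1, 1]))  # 225.0
```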


In addition, there are many other arithmetic operations that can be performed on DataArrays . In this example, we demonstrate squaring the original Celsius values of our air temperature data:
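A sketch of the same idea with exponentiation, again on a small synthetic stand-in array:

```python
import numpy as np
import xarray as xr

# Squaring is applied element-wise, like any other arithmetic operator.
tos = xr.DataArray(np.array([2.0, -1.5, 0.0]), dims="lon")
tos_squared = tos ** 2
```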

(Output: a float32 DataArray of squared Celsius values, e.g. 3.22, with land cells left as nan.)

Aggregation Methods

A common practice in the field of data analysis is aggregation. Aggregation is the process of reducing data through methods such as sum() , mean() , median() , min() , and max() , in order to gain greater insight into the nature of large datasets. In this set of examples, we demonstrate correct usage of a select group of aggregation methods:

Compute the mean:

(Output: a scalar DataArray, array(14.250171, dtype=float32), with no coordinates.)

Notice that we did not specify the dim keyword argument; this means that the function was applied over all of the dataset’s dimensions. In other words, the aggregation method computed the mean of every element of the temperature dataset across every temporal and spatial data point. However, if a dimension name is used with the dim keyword argument, the aggregation method computes an aggregation along the given dimension. In this next example, we use aggregation to calculate the temporal mean across all spatial data; this is performed by providing the dimension name 'time' to the dim keyword argument:
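The difference between the two calls can be sketched on a synthetic (time, lat, lon) array (the dimension names follow the tutorial, but the values are illustrative):

```python
import numpy as np
import xarray as xr

tos = xr.DataArray(np.arange(12.0).reshape(3, 2, 2),
                   dims=("time", "lat", "lon"))

overall_mean = tos.mean()             # no dim: reduce over every dimension
temporal_mean = tos.mean(dim="time")  # reduce only time; lat/lon survive
```

The first call returns a 0-dimensional DataArray; the second returns a (lat, lon) array of temporal means.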

../../_images/9c569cf95fd1b2c0f4524927c4f2918114f8cedb8839d52b44a11627c70fa044.png

There are many other combinations of aggregation methods and dimensions on which to perform these methods. In this example, we compute the temporal minimum:

(Output: a (lat, lon) DataArray of temporal minima, e.g. -1.798 near the poles, with land cells as nan.)

This example computes the spatial sum. Note that this dataset contains no altitude data; as such, the required spatial dimensions passed to the method consist only of latitude and longitude.

(Output: a (time,) DataArray of spatial sums, with values on the order of 6.0e+05.)

For the last example in this set of aggregation examples, we compute the temporal median:

(Output: a (lat, lon) DataArray of temporal medians, e.g. -1.754 near the poles, with land cells as nan.)
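The three reductions above (temporal minimum, spatial sum, temporal median) can be sketched together on one synthetic (time, lat, lon) array:

```python
import numpy as np
import xarray as xr

tos = xr.DataArray(np.arange(12.0).reshape(3, 2, 2),
                   dims=("time", "lat", "lon"))

temporal_min = tos.min(dim="time")         # -> (lat, lon)
spatial_sum = tos.sum(dim=["lat", "lon"])  # -> (time,)
temporal_median = tos.median(dim="time")   # -> (lat, lon)
```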

In addition, Xarray provides many other commonly used aggregation methods, including sum() , prod() , std() , var() , cumsum() , and cumprod() .

GroupBy: Split, Apply, Combine

While we can obtain useful summaries of datasets using simple aggregation methods, it is more often the case that aggregation must be performed over coordinate labels or groups. In order to perform this type of aggregation, it is helpful to use the split-apply-combine workflow. Fortunately, Xarray provides this functionality for DataArrays and Datasets by means of the groupby operation. The following figure illustrates the split-apply-combine workflow in detail:

../../_images/xarray-split-apply-combine.jpeg

Based on the above figure, you can understand the split-apply-combine process performed by groupby . In detail, the steps of this process are:

The split step involves breaking up and grouping an xarray Dataset or DataArray depending on the value of the specified group key.

The apply step involves computing some function, usually an aggregate, transformation, or filtering, within the individual groups.

The combine step merges the results of these operations into an output xarray Dataset or DataArray .

In this set of examples, we will remove the seasonal cycle (also known as a climatology) from our dataset using groupby . There are many types of input that can be provided to groupby ; a full list can be found in Xarray’s groupby user guide .

In this first example, we plot data to illustrate the annual cycle described above. We first select the grid point closest to a specific latitude-longitude point. Once we have this grid point, we can plot a temporal series of sea-surface temperature (SST) data at that location. Reviewing the generated plot, the annual cycle of the data becomes clear.

../../_images/27e9d523dc05a5d85c83188d03ecb3ee01ddbfa731ae7ab4bf2f9de557f3601d.png

The first step of the split-apply-combine process is splitting. As described above, this step involves splitting a dataset into groups, with each group matching a group key. In this example, we split the SST data using months as a group key. Therefore, there is one resulting group for January data, one for February data, etc. This code illustrates how to perform such a split:

In the above code example, we are extracting components of date/time data by way of the time coordinate’s .dt attribute. This attribute is a DatetimeAccessor object that contains additional attributes for units of time, such as hour, day, and year. Since we are splitting the data into monthly data, we use the month attribute of .dt in this example. (In addition, there exists similar functionality in Pandas; see the official documentation for details.)

In addition, there is a more concise syntax that can be used in specific instances. This syntax can be used if the variable on which the grouping is performed is already present in the dataset. The following example illustrates this syntax; it is functionally equivalent to the syntax used in the above example.
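Both grouping syntaxes can be sketched on a synthetic two-year monthly series (the time axis here is a hypothetical stand-in for the tutorial's):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=24, freq="MS")
tos = xr.DataArray(np.arange(24.0), dims="time", coords={"time": times})

gb_explicit = tos.groupby(tos.time.dt.month)  # via the .dt accessor
gb_concise = tos.groupby("time.month")        # equivalent shorthand
```

Either way, the split step yields twelve groups, one per calendar month.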

Apply & Combine

Now that we have split our data into groups, the next step is to apply a calculation to the groups. There are two types of calculation that can be applied:

aggregation: reduces the size of the group

transformation: preserves the group’s full size

After a calculation is applied to the groups, Xarray will automatically combine the groups back into a single object, completing the split-apply-combine workflow.

Compute climatology

In this example, we use the split-apply-combine workflow to calculate the monthly climatology at every point in the dataset. Notice that we are using the month DatetimeAccessor , as described above, as well as the .mean() aggregation function:
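A minimal sketch of the climatology computation, using a synthetic series whose second year is uniformly 2 degrees warmer than the first (so each month's climatology is the first-year value plus 1):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=24, freq="MS")
tos = xr.DataArray(
    np.concatenate([np.arange(12.0), np.arange(12.0) + 2.0]),
    dims="time", coords={"time": times},
)

# Split by month, average within each group, combine back along "month".
tos_clim = tos.groupby("time.month").mean()
```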

(Output: a (month, lat, lon) climatology DataArray with a month coordinate running 1-12, retaining the original tos attributes such as long_name : Sea Surface Temperature and units : degC.)

Now that we have a DataArray containing the climatology data, we can plot the data in different ways. In this example, we plot the climatology at a specific latitude-longitude point:

../../_images/20c8f8050db88eb062c5262b0b3501a3fc5da2ada2aa73d9485439166dce1328.png

In this example, we plot the zonal mean climatology:

../../_images/ff1471f7a5adbc98e51c1723f0a24b275ee423e653816f6ce2cc59be1cd3648a.png

Finally, this example calculates and plots the difference between the climatology for January and the climatology for December:

../../_images/76c8f49aaa86d6462848d0c67b2d8182b202145ff1af63dd09bf04b0083aa556.png

Compute anomaly

In this example, we compute the anomaly of the original data by removing the climatology from the data values. As shown in previous examples, the climatology is first calculated. The calculated climatology is then removed from the data using arithmetic and Xarray’s groupby method:
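A sketch of the anomaly calculation on the same kind of synthetic series: subtracting the grouped climatology removes the shared seasonal cycle, leaving only year-to-year departures.

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2000-01-01", periods=24, freq="MS")
tos = xr.DataArray(
    np.concatenate([np.arange(12.0), np.arange(12.0) + 2.0]),
    dims="time", coords={"time": times},
)

tos_clim = tos.groupby("time.month").mean()

# Grouped arithmetic: each time step has its month's climatology removed.
tos_anom = tos.groupby("time.month") - tos_clim
```

The cooler first year ends up with anomalies of -1, the warmer second year with +1, so the anomalies average to zero.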

(Output: a (time, lat, lon) anomaly DataArray with small values, e.g. ±0.017, and a month coordinate along the time dimension.)

../../_images/2a9d433308b8faf7355c231e034fd137bcf2f0fdfcb9033c73b6e590fa0dca96.png

In this example, we compute and plot our dataset’s mean global anomaly over time. In order to specify global data, we must provide both lat and lon to the mean() method’s dim keyword argument:

../../_images/7f77afa348588c0c00df9589b6847d29b5c077905c9cd7054183a829bda8dab2.png

Many geoscientific algorithms perform operations over data contained in many different grid cells. However, if the grid cells are not all the same size, such operations are not scientifically valid unless the data in each cell is weighted by the cell's size. Fortunately, weighting data in Xarray is simple, as Xarray provides a built-in weighting method, known as .weighted() .

In this example, we again make use of the Pythia example data library to load a new CESM2 dataset. Contained in this dataset are weights corresponding to the grid cells in our anomaly data:

(Output: the areacello DataArray — long_name : Grid-Cell Area for Ocean Variables, standard_name : cell_area, units : m2 — containing 64800 values of dtype float64.)

In a similar fashion to a previous example, this example calculates mean global anomaly. However, this example makes use of the .weighted() method and the newly loaded CESM2 dataset to weight the grid cell data as described above:
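The effect of weighting can be sketched with two hypothetical cells of unequal area ( areacello here is a made-up stand-in for the real cell-area field):

```python
import numpy as np
import xarray as xr

tos = xr.DataArray(np.array([0.0, 4.0]), dims="cell")
areacello = xr.DataArray(np.array([3.0, 1.0]), dims="cell")

unweighted = tos.mean()                              # (0 + 4) / 2
weighted = tos.weighted(areacello).mean(dim="cell")  # (0*3 + 4*1) / 4
```

The unweighted mean (2.0) over-counts the smaller cell; the area-weighted mean (1.0) is the physically meaningful average.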

This example plots both the unweighted and weighted mean data, illustrating how far the unweighted mean deviates from the scientifically correct, weighted result:

../../_images/82c3f0c4dbf283ffe69aa79af408e7306bee4fec01f8892e3032ca31234aec34.png

Other high level computation functionality

resample : This method behaves similarly to groupby, but is specialized for time dimensions, and can perform temporal upsampling and downsampling.

rolling : This method is used to compute aggregation functions, such as mean , on moving windows of data in a dataset.

coarsen : This method provides generic functionality for performing downsampling operations on various types of data.

This example illustrates the resampling of a dataset’s time dimension to annual frequency:
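A compact sketch of resample , downsampling synthetic daily data to monthly means (the tutorial's annual resampling follows the same pattern with an annual frequency alias such as "YS" ; the daily series below is illustrative):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Sixty days of synthetic daily values, reduced to monthly means.
times = pd.date_range("2000-01-01", periods=60, freq="D")
daily = xr.DataArray(np.arange(60.0), dims="time", coords={"time": times})

monthly = daily.resample(time="MS").mean()
```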

(Output: a (time, lat, lon) DataArray of annual means, with one cftime timestamp per year from 2000-01-01 through 2014-01-01 on a noleap calendar.)

This example illustrates using the rolling method to compute averages in a moving window of 5 months of data:
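A sketch of the 5-point centered rolling mean on a synthetic series:

```python
import numpy as np
import xarray as xr

tos = xr.DataArray(np.arange(10.0), dims="time")

# Each point becomes the mean of a 5-wide window centered on it; points
# without a full window (the first and last two) become nan.
smoothed = tos.rolling(time=5, center=True).mean()
```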

(Output: a DataArray with the same shape as the input; time steps near the start and end of the record are nan because the centered 5-month window is incomplete there.)

../../_images/5aab0578f43d335bc897b104e639deee90c737310350f54db5e6b267bd232084.png

Masking Data

Masking of data can be performed in Xarray by providing single or multiple conditions to either the top-level xr.where() function or the .where() method of a Dataset or DataArray . Data values failing the condition(s) are replaced with a fill value ( nan by default), effectively masking them out of the scientifically important data. In the following set of examples, we use the .where() method to mask various data values in the tos DataArray .

For reference, we will first print our entire sea-surface temperature (SST) dataset:

(Output: the full Dataset repr, showing the tos variable — long_name : Sea Surface Temperature, units : degC — with dimensions (time, lat, lon) and its CMIP6 attributes.)

Using where with one condition

In this set of examples, we are trying to analyze data at the last temporal value in the dataset. This first example illustrates the use of .isel() to perform this analysis:
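A sketch of the selection on a synthetic (time, lat, lon) array:

```python
import numpy as np
import xarray as xr

tos = xr.DataArray(np.arange(12.0).reshape(3, 2, 2),
                   dims=("time", "lat", "lon"))

# Integer indexing with -1 selects the last time step and drops that dim.
sample = tos.isel(time=-1)
```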

(Output: a (lat, lon) DataArray for the final time step, time = 2014-12-15 12:00:00.)

As shown in the previous example, methods like .isel() and .sel() return data of a different shape than the original data provided to them. By contrast, .where() preserves the shape of the original data by masking values with a Boolean condition: data values for which the condition is True are returned unchanged, while data values for which it is False are replaced by a preset fill value. (This fill value defaults to nan , but can be set to other values as well.)

Before testing .where() , it is helpful to look at the official documentation . As stated above, the .where() method takes a Boolean condition. (Boolean conditions use operators such as less-than, greater-than, and equal-to, and return a value of True or False .) Most uses of .where() check whether or not specific data values are less than or greater than a constant value. As stated in the documentation, the data values specified in the Boolean condition of .where() can be any of the following:

a DataArray

a Dataset

a function

In the following example, we make use of .where() to mask data with temperature values greater than 0 . Therefore, values greater than 0 are set to nan , as described above. (It is important to note that the Boolean condition matches values to keep, not values to mask out.)
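A sketch of the single-condition mask (the values below are synthetic degC samples):

```python
import numpy as np
import xarray as xr

sample = xr.DataArray(np.array([-1.8, 0.5, 2.0, -0.3]), dims="lon")

# The condition names the values to KEEP; everything else becomes nan,
# and the original shape is preserved.
masked = sample.where(sample < 0.0)
```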

(Output: a (lat, lon) DataArray in which only sub-zero temperatures remain; all other cells are set to nan.)

In this example, we use Matplotlib to plot the original, unmasked data, as well as the masked data created in the previous example.

../../_images/7971bde9fb8bb3ae6ab4d233e64c5d61cd508127d62e38418ca7fba9737a4ccf.png

Using where with multiple conditions

Those familiar with Boolean conditions know that such conditions can be combined using logical operators. In the case of .where() , the relevant operators are bitwise 'and' (represented by the & symbol) and bitwise 'or' (represented by the | symbol). These allow multiple masking conditions to be specified in a single use of .where() ; however, when conditions are combined this way, each simple Boolean condition must be enclosed in its own parentheses. (If you are not familiar with Boolean conditions, please review a detailed Boolean expression guide before continuing with the tutorial.) In this example, we provide a compound Boolean condition to .where() that keeps only locations with temperature values between 25 and 30 degrees; locations with values below 25 or above 30 are masked out.
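A sketch of such a compound condition (synthetic values; note the parentheses around each simple comparison):

```python
import numpy as np
import xarray as xr

sample = xr.DataArray(np.array([20.0, 26.0, 28.0, 31.0]), dims="lon")

# Keep only values strictly between 25 and 30; mask everything else.
band = sample.where((sample > 25) & (sample < 30))
```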

../../_images/e004fcb8adc5e43a4ca52dec4f755946854f34d8f408bdcdae22cf3cc8888361.png

In addition to using DataArrays and Datasets in Boolean conditions provided to .where() , we can also use coordinate variables. In the following example, we make use of Boolean conditions containing latitude and longitude coordinates. This greatly simplifies the masking of regions outside of the Niño 3.4 region :

../../_images/4d62746dd378c46fcccc74b8079dc7eb4d937875c6c592ae96285a4547f5d277.png

Using where with a custom fill value

In the previous examples that make use of .where() , the masked data values are set to nan . However, this behavior can be modified by providing a second value, in numeric form, to .where() ; if this numeric value is provided, it will be used instead of nan for masked data values. In this example, masked data values are set to 0 by providing a second value of 0 to the .where() method:
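A sketch of the custom fill value (synthetic values):

```python
import numpy as np
import xarray as xr

sample = xr.DataArray(np.array([-1.8, 0.5, 2.0]), dims="lon")

# The second argument replaces nan as the value written into masked cells.
masked_zero = sample.where(sample < 0.0, 0.0)
```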

../../_images/6b54d4345c65af1b790178de0cc76242bb1295447e202499f6dcfc3c21b8edd0.png

Summary

In a similar manner to NumPy arrays, performing arithmetic on a DataArray affects all values simultaneously.

Xarray allows for simple data aggregation, over single or multiple dimensions, by way of built-in methods such as sum() and mean() .

Xarray supports the useful split-apply-combine workflow through the groupby method.

Xarray allows replacing (masking) of data matching specific Boolean conditions by means of the .where() method.

What’s next?

The next tutorial illustrates the use of previously covered Xarray concepts in a geoscientifically relevant example: plotting the Niño 3.4 Index .

Resources and References

groupby : Useful for binning/grouping data and applying reductions and/or transformations on those groups

resample : Functionality similar to groupby, specialized for time dimensions. Can be used for temporal upsampling and downsampling

rolling : Useful for computing aggregations on moving windows of your dataset, e.g., computing moving averages

coarsen : Generic functionality for downsampling data

weighted : Useful for weighting data before applying reductions

More xarray tutorials and videos

Xarray Documentation - Masking with where()

Introduction to Xarray

Calculating ENSO with Xarray

IMAGES

  1. [Solved] Assign values to array during loop

    assign values to xarray

  2. Array : Assign values to array during loop

    assign values to xarray

  3. GIS: How to change certain values in an xarray depending on the

    assign values to xarray

  4. Array : How to assign values to 2-D array?

    assign values to xarray

  5. Assign Values To Array Vba? The 20 Top Answers

    assign values to xarray

  6. xarray.DataArray: Simple Guide to Labeled N-Dimensional Array

    assign values to xarray

VIDEO

  1. Xarray Friendly Interactive and Scalable Scientific Data Analysis

  2. Topic 6 Assign Multiple Values

  3. 113 Assign values to TextFormFields

  4. walrus operator in Python #programming #data #python

  5. JavaScript class-1 Hindi(data & variables) Question & Answer #frontend #javascript #variables #tech

  6. Different ways to assign values to variables in python. #shorts #python #code #developers

COMMENTS

  1. xarray.Dataset.assign

    Dataset.assign(variables=None, **variables_kwargs) [source] #. Assign new data variables to a Dataset, returning a new object with all the original variables in addition to the new ones. Parameters: variables ( mapping of hashable to Any) - Mapping from variables names to the new values. If the new values are callable, they are computed on ...

  2. python

    Set values using name index in xarray Ask Question Asked 6 years ago Modified 6 years ago Viewed 6k times 4 I'm trying an MA crossover I did in pandas panels using xarray. Data I'm using: <xarray.Dataset> Dimensions: (DATE: 3355, DN_NAME: 22670) Coordinates: * DATE (DATE) datetime64 [ns] 2004-05-18 2004-05-19 2004-05-21 ...

  3. pandas.DataFrame.to_xarray

    Data in the pandas structure converted to Dataset if the object is a DataFrame, or a DataArray if the object is a Series. See also DataFrame.to_hdf Write DataFrame to an HDF5 file. DataFrame.to_parquet Write a DataFrame to the binary parquet format. Notes See the xarray docs Examples

  4. Assigning values to a subset of a dataset #3015

Consider the following example: import numpy as np import xarray as xr shape = (3, 2) da1 = xr.DataArray(np.zeros(shape), dims=...
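The issue above concerns assigning to a subset of a whole Dataset at once. One workaround (a sketch, not necessarily the resolution adopted in the issue) is to assign into each data variable's .loc individually, since DataArrays do support indexed assignment; all names and shapes below are illustrative:

```python
import numpy as np
import xarray as xr

shape = (3, 2)
ds = xr.Dataset(
    {
        "da1": (("x", "y"), np.zeros(shape)),
        "da2": (("x", "y"), np.zeros(shape)),
    },
    coords={"x": [10, 20, 30], "y": ["a", "b"]},
)

# Assign into the subset variable-by-variable via each DataArray's .loc:
for name in ds.data_vars:
    ds[name].loc[dict(x=10)] = 1.0

print(ds["da1"].sel(x=10).values)  # → [1. 1.]
```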

  5. xarray.Dataset.assign_coords

xarray.Dataset.assign_coords: Dataset.assign_coords(coords=None, **coords_kwargs): Assign new coordinates to this object. Returns a new object with all the original data in addition to the new coordinates. Parameters: coords (mapping of dim to coord, optional) - A mapping whose keys are the names of the coordinates and values ...
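A minimal sketch of assign_coords on a dimension that starts without labels (the "band" name is just an example):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(4), dims="band")

# Attach coordinate labels after the fact; returns a new object.
da2 = da.assign_coords(band=[1, 2, 3, 4])
print(da2.sel(band=3).item())  # → 2
```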

  6. Introduction to Xarray

Overview: The examples in this tutorial focus on the fundamentals of working with gridded, labeled data using Xarray. Xarray works by introducing additional abstractions into otherwise ordinary data arrays. In this tutorial, we demonstrate the usefulness of these abstractions.

  7. How to change certain values in an xarray depending on the coordinates

If I understand correctly, you want to replace the values for bare soil, vegetation and water everywhere where yy is the value of y. If so, your mock_array might have the shape (5, 3), for each of the 5 timesteps and 3 bare soil/vegetation/water values.
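Replacing values conditional on a coordinate, as discussed above, is typically done with xr.where; this sketch uses invented dimension names and a made-up 5×3 array matching the shape mentioned in the answer:

```python
import numpy as np
import xarray as xr

# Mock array: 5 timesteps x 3 landcover classes; names are illustrative.
da = xr.DataArray(
    np.ones((5, 3)),
    dims=("time", "landcover"),
    coords={"time": np.arange(5), "landcover": ["bare", "veg", "water"]},
)

# Replace values only where the time coordinate equals 2:
da = xr.where(da["time"] == 2, 0.0, da)
print(da.sel(time=2).values)
```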

  8. Indexing and selecting data

    Xarray offers extremely flexible indexing routines that combine the best features of NumPy and pandas for data selection. The most basic way to access elements of a DataArray object is to use Python's [] syntax, such as array [i, j], where i and j are both integers.
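The array[i, j] syntax mentioned above sits alongside xarray's named positional (isel) and label-based (sel) indexing; a small sketch with made-up dimension names:

```python
import numpy as np
import xarray as xr

arr = xr.DataArray(
    np.arange(6).reshape(2, 3),
    dims=("x", "y"),
    coords={"x": [10, 20], "y": ["a", "b", "c"]},
)

print(arr[1, 2].item())             # positional, NumPy-style → 5
print(arr.isel(x=1, y=2).item())    # positional, by dimension name → 5
print(arr.sel(x=20, y="c").item())  # label-based, by coordinate → 5
```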

  9. python

This format is totally new for me and it's hard to figure out a solution. I've tried assigning a new data variable using the following syntax, but without the expected results. I guess it's because the whole array is taken into the calculation at once. nc = nc.assign(cdd_hdd=lambda x: x.tavg - 65 if tavg > 65 else 65 - x.tavg) My goal:
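The attempt quoted above fails because a Python `if` cannot branch elementwise over a whole array; xr.where is the usual fix. A sketch with a made-up three-value tavg variable:

```python
import numpy as np
import xarray as xr

nc = xr.Dataset({"tavg": ("time", np.array([50.0, 65.0, 80.0]))})

# Elementwise branching needs xr.where; a plain `if` sees the whole array.
nc = nc.assign(
    cdd_hdd=lambda ds: xr.where(ds.tavg > 65, ds.tavg - 65, 65 - ds.tavg)
)
print(nc["cdd_hdd"].values)  # → [15.  0. 15.]
```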

  11. Is it possible to change a coordinate in an netCDF loaded with xarray?

I have a netCDF which is loaded in xarray with a dimension named bands (it was originally an import via rioxarray of ENVI data), but actually, I want to be able to parse the data by time. ... Use .assign_coords to change the coordinate values of "band". Then .rename "band" to "time" ... periods=408, freq="M") ds.assign_coords ...
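A sketch of the assign_coords-then-rename pattern from the answer above, with an invented variable name and the 408-month range mentioned in the snippet:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Stand-in for the rioxarray import; "refl" is an illustrative name.
ds = xr.Dataset({"refl": ("band", np.zeros(408))})

# Relabel the "band" dimension with monthly timestamps, then rename it.
times = pd.date_range("1982-01-01", periods=408, freq="M")
ds = ds.assign_coords(band=times).rename({"band": "time"})
print(ds["time"].size)  # → 408
```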

  12. Xarray with Dask Arrays

Xarray is an open source project and Python package that extends the labeled data functionality of Pandas to N-dimensional array-like datasets. It shares a similar API to NumPy and Pandas and supports both Dask and NumPy arrays under the hood. %matplotlib inline from dask.distributed import Client import xarray as xr
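A minimal sketch of getting dask arrays under an xarray Dataset via .chunk() (this assumes dask is installed; the data and chunk size are made up):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"air": (("time", "x"), np.random.rand(100, 4))})

# .chunk() converts the backing NumPy arrays to dask arrays,
# splitting "time" into blocks of 25 elements each.
ds_chunked = ds.chunk({"time": 25})
print(ds_chunked.chunks)
```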

  13. Computations and Masks with Xarray

    Computing climatologies and anomalies of data using Xarray's "split-apply-combine" approach, via the .groupby() method. Performing weighted-reduction operations along single or multiple dimensions of a DataArray or Dataset. Providing a broad overview of Xarray's data-masking capability. Using the .where() method to mask Xarray data
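The .where() masking mentioned above keeps values where the condition holds and replaces the rest with NaN; a minimal sketch on a made-up 1-D array:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(6, dtype=float), dims="x")

# Keep values > 2; everything else becomes NaN (the default fill).
masked = da.where(da > 2)
print(masked.values)  # → [nan nan nan  3.  4.  5.]
```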

  14. How do I

How do I… add a DataArray to my dataset as a new variable? Use my_dataset[varname] = my_dataArray or Dataset.assign() (see also Dictionary like methods). How do I… add variables from other datase...

  15. Duke Stops Assigning Point Values to Essays, Test Scores

    In the past Duke has assigned point values of one to five to applicants' essays and standardized test scores, which in turn were factored into a holistic score on a 30-point scale. The university is still using the point system, but only for the remaining numerically weighted categories: curriculum strength, academics, recommendations and ...

  16. python

    <xarray.Dataset> Dimensions: (latitude: 106, longitude: 193, time: 3653) Coordinates: * latitude (latitude) float32 -39.2 -39.149525 ... -33.950478 -33.9 * longitude (longitude) float32 140.8 140.84792 140.89584 ... 149.95209 150.0

  17. How can I replace values in an xarray variable?

    <xarray.Dataset> Dimensions: (elevation_band: 4, latitude: 1, longitude: 1) Coordinates: * longitude (longitude) float64 -111.4 * latitude (latitude) float64 44.51 * elevation_band (elevation_band) int32 1 2 3 4 Data variables: area_frac (elevation_band, latitude, longitude) float64 0.005109 ... mean_elev (elevation_band, latitude, longitud...

  19. PHP

    Alternatively, you can use array_column() function which returns the values from a single column of the input, identified by the column_key. Optionally, an index_key may be provided to index the values in the returned array by the values from the index_key column of the input array. You can use the array_column as given,
