重新采样 xarray 对象以降低空间分辨率

Posted 2023-03-11

技术标签:

【中文标题】重新采样 xarray 对象以降低空间分辨率【英文标题】：Resample xarray object to lower resolution spatially 【发布时间】：2019-05-22 00:09:48 【问题描述】：

使用 xarray 重新采样以降低空间分辨率

我想将我的 xarray 对象重新采样为较低的空间分辨率 (LESS PIXELS)。

import pandas as pd
import numpy as np
import xarray as xr

time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)

latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)

latitude, longitude = np.meshgrid(latitude, longitude)

BHR_SW = np.ones((365, 1200, 1200))

output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])

output_da = output_da.rename('dim_0':'time','dim_1':'y','dim_2':'x')
latitude_da = latitude_da.rename('dim_0':'y','dim_1':'x')
longitude_da = longitude_da.rename('dim_0':'y','dim_1':'x')

output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign('latitude':latitude_da, 'longitude':longitude_da)

print(output_ds)

<xarray.Dataset>
Dimensions:    (time: 365, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
  * x          (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
    longitude  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56
```

我的问题是，如何通过 x,y 坐标将以下内容重新采样为 200x200 网格？

这是一个 REDUCING 变量的空间分辨率。

我尝试过的如下：

output_ds.resample(x=200).mean()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-10fbdf855a5d> in <module>()
----> 1 output_ds.resample(x=200).mean()

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    701         group = DataArray(dim_coord, coords=dim_coord.coords,
    702                           dims=dim_coord.dims, name=RESAMPLE_DIM)
--> 703         grouper = pd.Grouper(freq=freq, closed=closed, label=label, base=base)
    704         resampler = self._resample_cls(self, group=group, dim=dim_name,
    705                                        grouper=grouper,

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/core/resample.pyc in __init__(self, freq, closed, label, how, axis, fill_method, limit, loffset, kind, convention, base, **kwargs)
   1198                              .format(convention))
   1199
-> 1200         freq = to_offset(freq)
   1201
   1202         end_types = set(['M', 'A', 'Q', 'BM', 'BA', 'BQ', 'W'])

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/pandas/tseries/frequencies.pyc in to_offset(freq)
    174                     delta = delta + offset
    175         except Exception:
--> 176             raise ValueError(libfreqs._INVALID_FREQ_ERROR.format(freq))
    177
    178     if delta is None:

ValueError: Invalid frequency: 200

但我得到了显示的错误。

如何完成对 x 和 y 的空间重采样？

理想情况下我想这样做：

output_ds.resample(x=200, y=200).mean()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-55-e0bfce19e037> in <module>()
----> 1 output_ds.resample(x=200, y=200).mean()

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    679         if len(indexer) != 1:
    680             raise ValueError(
--> 681                 "Resampling only supported along single dimensions."
    682             )
    683         dim, freq = indexer.popitem()

ValueError: Resampling only supported along single dimensions.

注意：真实数据具有不同的行为

这是我在上面创建的测试数据。关于从 netcdf 文件中读取的真实数据

<xarray.Dataset>
Dimensions:    (time: 368, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-28
Dimensions without coordinates: x, y
Data variables:
    latitude   (y, x) float32 ...
    longitude  (y, x) float32 ...
    Data_Mask  (y, x) float32 ...
    BHR_SW     (time, y, x) float32 ...
Attributes:
    CDI:               Climate Data Interface version 1.9.5 (http://mpimet.mp...
    Conventions:       CF-1.4
    history:           Fri Dec 07 13:29:13 2018: cdo mergetime GLOBALBEDO/Glo...
    content:           extracted variabel BHR_SW of the original GlobAlbedo (...
    metadata_profile:  beam
    metadata_version:  0.5
    CDO:               Climate Data Operators version 1.9.5 (http://mpimet.mp...
```

我尝试过类似的事情：

ds.resample(x=200).mean()

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/common.pyc in resample(self, indexer, skipna, closed, label, base, keep_attrs, **indexer_kwargs)
    686         dim_coord = self[dim]
    687
--> 688         if isinstance(self.indexes[dim_name], CFTimeIndex):
    689             raise NotImplementedError(
    690                 'Resample is currently not supported along a dimension '

/home/mpim/m300690/miniconda3/envs/holaps/lib/python2.7/site-packages/xarray/core/coordinates.pyc in __getitem__(self, key)
    309         if key not in self._sizes:
    310             raise KeyError(key)
--> 311         return self._variables[key].to_index()
    312
    313     def __unicode__(self):

KeyError: 'x'

非常感谢任何帮助。

【问题讨论】：

相关：***.com/a/42463491/1456927 【参考方案1】：

更新

@clausmichele 的answer 使用coarsen 现在是最好的方法。请注意，粗化现在包括指定所需输出坐标的能力。

原帖

正如piman314 建议的那样，groupby 是在 xarray 中执行此操作的唯一方法。重采样只能用于日期时间坐标。

由于xarray目前不处理多维groupby，这必须分两个阶段完成：

# this results in bin centers on 100, 300, ...
reduced = (
    output_ds
    .groupby(((output_ds.x//200) + 0.5) * 200)
    .mean(dim='x')
    .groupby(((output_ds.y//200) + 0.5) * 200)
    .mean(dim='y'))

如果您只是想对数据进行下采样，可以使用位置切片：

output_ds[:, ::200, ::200]

或者，使用命名的暗淡：

output_ds['x': slice(None, None, 200), 'y': slice(None, None, 200)]

最后，还有其他一些专门为与 xarray 兼容的快速重新网格化而设计的软件包。 xESMF 不错。

【讨论】：

现在，使用 groupby_bins 甚至更粗略一点。【参考方案2】：

最近 coarsen 方法已添加到 xarray 中，我认为这是空间下采样的最佳方法，即使无法使用它设置所需的最终分辨率并自动计算它。 Coarsen 将对非重叠窗口执行操作（平均值、最大值、最小值等），根据您设置的窗口大小，您将获得所需的最终分辨率。

作者的原始输入数据：

import pandas as pd
import numpy as np
import xarray as xr



time = pd.date_range(np.datetime64('1998-01-02T00:00:00.000000000'), np.datetime64('2005-12-28T00:00:00.000000000'), freq='8D')
x = np.arange(1200)
y = np.arange(1200)


latitude = np.linspace(40,50,1200)
longitude = np.linspace(0,15.5572382,1200)
latitude, longitude = np.meshgrid(latitude, longitude)

BHR_SW = np.ones((365, 1200, 1200))

output_da = xr.DataArray(BHR_SW, coords=[time, y, x])
latitude_da = xr.DataArray(latitude, coords=[y, x])
longitude_da = xr.DataArray(longitude, coords=[y, x])
output_da = output_da.rename('dim_0':'time','dim_1':'y','dim_2':'x')
latitude_da = latitude_da.rename('dim_0':'y','dim_1':'x')
longitude_da = longitude_da.rename('dim_0':'y','dim_1':'x')

output_ds = output_da.to_dataset(name='BHR_SW')
output_ds = output_ds.assign('latitude':latitude_da, 'longitude':longitude_da)
print(output_ds)



<xarray.Dataset>
Dimensions:    (time: 365, x: 1200, y: 1200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
  * x          (x) int64 0 1 2 3 4 5 6 7 ... 1193 1194 1195 1196 1197 1198 1199
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.0 40.01 40.02 40.03 ... 49.97 49.98 49.99 50.0
    longitude  (y, x) float64 0.0 0.0 0.0 0.0 0.0 ... 15.56 15.56 15.56 15.56

将空间分辨率从 1200x1200 降低到 200x200 的粗略方法，我们需要 6x6 窗口。

output_ds.coarsen(x=6).mean().coarsen(y=6).mean()

<xarray.Dataset>
Dimensions:    (time: 365, x: 200, y: 200)
Coordinates:
  * time       (time) datetime64[ns] 1998-01-02 1998-01-10 ... 2005-12-23
  * y          (y) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
  * x          (x) float64 2.5 8.5 14.5 20.5 ... 1.184e+03 1.19e+03 1.196e+03
Data variables:
    BHR_SW     (time, y, x) float64 1.0 1.0 1.0 1.0 1.0 ... 1.0 1.0 1.0 1.0 1.0
    latitude   (y, x) float64 40.02 40.07 40.12 40.17 ... 49.88 49.93 49.98
    longitude  (y, x) float64 0.03244 0.03244 0.03244 ... 15.52 15.52 15.52

【讨论】：

【参考方案3】：

由于您使用的是已经使用 CDO 操作的 NetCDF 文件，您还可以使用 CDO SAMPLEGRID 函数或 NCO bilinear_interp 函数：

SAMPLEGRID (https://code.mpimet.mpg.de/projects/cdo/embedded/cdo.pdf) 不进行插值，它只是删除每个第 n 个网格点。

bilinear_interp (http://nco.sourceforge.net/nco.html#Bilinear-interpolation) 进行插值。

您可能想要平均值，最大值，无论您可能更喜欢 NCO bilinear_interp 的反照率值。但是 CDO SAMPLEGRID 可以为您提供 NOC bilinear_interp 所需的 grid_out。

【讨论】：

【参考方案4】：

要使用xarray，最明显的方法是使用groupby_bins，但事实证明这非常慢。进入numpy 并使用超快速索引（[:, :, frequency]）可能更有效

nsamples = 200
bins = np.linspace(output_ds.x.min(),
                   output_ds.x.max(), nsamples).astype(int)
output_ds = output_ds.groupby_bins('x', bins).first()

【讨论】：

以上是关于重新采样 xarray 对象以降低空间分辨率的主要内容，如果未能解决你的问题，请参考以下文章

如何重新采样到较粗的分辨率但在原始索引内采样？

为啥 xarray 重新采样均值计算会产生额外的时间？

在python中从PDF中提取图像而不重新采样？

MediaMuxer 视频文件大小减小（重新压缩，降低分辨率）

opencv 图像金字塔(python)

[Paper Weekly]CNN采样方法：空间变换网络(STN)与可变形卷积网络（DCN）