Java NetCDF : Aggregating Existing Files : time dimension not found issue

【Posted】: 2017-05-20 09:06:00

【Problem description】:

This one has me stumped: I've been searching the documentation and mailing-list archives for a while now, and I'm having a hard time piecing together the steps needed to handle this aggregation.

The CFSR 1-hourly data files come from here: http://rda.ucar.edu/datasets/ds094.0/

cdas_20161215_0000_f00000_G4.grib2
cdas_20161215_0000_f00100_G4
cdas_20161215_0000_f00200_G4
cdas_20161215_0000_f00300_G4
etc...

Each hourly file declares 2 time dimensions, one with bounds set and one without.

cdas_20161215_0000_f00300_G4.grib2
double time(time=1);
  :units = "Hour since 2016-12-15T00:00:00Z";
  :standard_name = "time";
  :long_name = "GRIB forecast or observation time";
  :calendar = "proleptic_gregorian";
  :bounds = "time_bounds";
double time_bounds(time=1, 2);
  :units = "Hour since 2016-12-15T00:00:00Z";
  :long_name = "bounds for time";
double time1(time1=1);
  :units = "Hour since 2016-12-15T00:00:00Z";
  :standard_name = "time";
  :long_name = "GRIB forecast or observation time";
  :calendar = "proleptic_gregorian";

The problem is that as I step through creating each dataset, different hourly files swap the names of the 2 time dimensions. AggregationExisting then can't find the dimension named "time" for some of the files, for example on the u-component_of_wind_isobaric variable in the 0300 file, where it is declared as time1.

The code I'm calling:

List<String> variableNames = Arrays.asList("u-component_of_wind_isobaric","u-component_of_wind_height_above_ground","v-component_of_wind_isobaric","v-component_of_wind_height_above_ground","Pressure_reduced_to_MSL_msl","Geopotential_height_isobaric");
NetcdfDataset netcdfDataset = new NetcdfDataset();
// here I'm trying to aggregate on a dimension called 'time'
AggregationExisting aggregationExisting = new AggregationExisting(netcdfDataset, "time", null);
aggregationExisting.addDatasetScan(null,
                   "/cfsr-gribs/201612/",
                    "G4.grib2",
                    null,
                    null,
                    NetcdfDataset.getDefaultEnhanceMode(),
                    "false",
                    null);
aggregationExisting.persistWrite();
aggregationExisting.finish(new CancelTaskImpl());
GridDataset gridDataset = new GridDataset(netcdfDataset);
// (writer is a NetcdfFileWriter created earlier; its setup is omitted here)
writer.setRedefineMode(true);
CFGridWriter2.writeFile(gridDataset, variableNames, gridDataset.getBoundingBox(), null, 1, null, null, 1, true, writer);

The time dimension naming issue, illustrated in 2 of the files:

//cdas_20161215_0000_f00300_G4.grib2

float u-component_of_wind_isobaric(time1=1, isobaric3=37, lat=361, lon=720);
  :long_name = "u-component of wind @ Isobaric surface";
  :units = "m/s";
  :abbreviation = "UGRD";
  :missing_value = NaNf; // float
  :grid_mapping = "LatLon_Projection";
  :coordinates = "reftime time1 isobaric3 lat lon ";
  :Grib_Variable_Id = "VAR_0-2-2_L100";
  :Grib2_Parameter = 0, 2, 2; // int
  :Grib2_Parameter_Discipline = "Meteorological products";
  :Grib2_Parameter_Category = "Momentum";
  :Grib2_Parameter_Name = "u-component of wind";
  :Grib2_Level_Type = "Isobaric surface";
  :Grib2_Generating_Process_Type = "Forecast";


//cdas_20161215_0000_f00200_G4.grib2

float u-component_of_wind_isobaric(time=1, isobaric3=37, lat=361, lon=720);
  :long_name = "u-component of wind @ Isobaric surface";
  :units = "m/s";
  :abbreviation = "UGRD";
  :missing_value = NaNf; // float
  :grid_mapping = "LatLon_Projection";
  :coordinates = "reftime time isobaric3 lat lon ";
  :Grib_Variable_Id = "VAR_0-2-2_L100";
  :Grib2_Parameter = 0, 2, 2; // int
  :Grib2_Parameter_Discipline = "Meteorological products";
  :Grib2_Parameter_Category = "Momentum";
  :Grib2_Parameter_Name = "u-component of wind";
  :Grib2_Level_Type = "Isobaric surface";
  :Grib2_Generating_Process_Type = "Forecast";

This is my first time using the NetCDF libraries, so I'm shopping around for some pre-processing tricks to merge datasets with this quirk. For example, could I move all the variables onto the same time dimension and rename it? Even a link to an example I've missed would help. Otherwise, I guess I'll look at removing the dimension manually and copying the data into a new merged file with readDataSlice().
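For what it's worth, what I was imagining is an NcML wrapper that renames the stray dimension per file before aggregating, something like the sketch below (hypothetical; the filenames are just two of mine, and I don't know whether NcML renaming actually works against the GRIB reader):

```xml
<netcdf xmlns="http://www.unidata.ucar.edu/namespaces/netcdf/ncml-2.2">
  <aggregation dimName="time" type="joinExisting">
    <!-- this file declares the dimension as time1; rename it to match the others -->
    <netcdf location="cdas_20161215_0000_f00300_G4.grib2">
      <dimension name="time" orgName="time1"/>
      <variable name="time" orgName="time1"/>
    </netcdf>
    <netcdf location="cdas_20161215_0000_f00200_G4.grib2"/>
  </aggregation>
</netcdf>
```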

【Question comments】:

【Answer 1】:

If you're open to using non-Java tools, I'd suggest looking at NCO.

First, you'd need to convert the grib files to netcdf, possibly using the wgrib2 utility (a conversion example is here) or ncl_convert2nc.

Second, you could write a simple script that loops over the relevant netcdf files, checks whether time1 exists as a dimension name, and if it does, changes the name to time. NCO's ncrename tool can do this:

ncrename -d time1,time file.nc file.nc 

Third, check that time (which should now exist in all the files) is the record dimension. If it isn't, NCO's ncks tool can make it so:

ncks -O --mk_rec_dmn time file.nc file.nc

Finally, use NCO's ncrcat to concatenate the files along the record (time) dimension:

ncrcat cdas*.nc all_files.nc 

Note: you don't have to use a wildcard in the line above; you can just list the files you want concatenated, e.g.

ncrcat cdas_20161215_0000_f00000_G4.nc cdas_20161215_0000_f00100_G4.nc all_files.nc 
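Putting the rename, record-dimension, and concatenation steps together, a minimal shell sketch (assuming NCO's ncrename/ncks/ncrcat and the netCDF ncdump utility are on PATH; filenames are illustrative, and the script no-ops gracefully if NCO isn't installed):

```shell
#!/bin/sh
# Normalize the time dimension across converted files, then concatenate.
if command -v ncrename >/dev/null 2>&1; then
  for f in cdas_*_G4.nc; do
    [ -e "$f" ] || continue               # skip if the glob matched nothing
    # If the header declares a dimension named time1, rename it to time
    if ncdump -h "$f" | grep -q 'time1 = '; then
      ncrename -d time1,time "$f"
    fi
    # Ensure time is the record (unlimited) dimension
    ncks -O --mk_rec_dmn time "$f" "$f"
  done
  # Concatenate all files along the record dimension
  ncrcat -O cdas_*_G4.nc all_files.nc
else
  echo "NCO tools not found; skipping"
fi
```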

【Discussion】:

Thanks for the reply! This will be part of a Java pipeline, so I have to stick with the netcdf-java approach.

【Answer 2】:

So I got a reply from Ucar saying that Grib2 is a different beast and can't currently be used with AggregationExisting. Their THREDDS server product has this functionality for Grib2 files, so it lives in some different classes, e.g. GribCollectionImmutable.

Here is the approach they recommended, which worked well for me:

        List<String> variableNames = Arrays.asList("u-component_of_wind_isobaric","u-component_of_wind_height_above_ground","v-component_of_wind_isobaric","v-component_of_wind_height_above_ground","Pressure_reduced_to_MSL_msl","Geopotential_height_isobaric");
        FeatureCollectionType fcType = FeatureCollectionType.GRIB2;
        Path outputPath = Paths.get("/cfsr/Netcdf4/201612/Cfsr_201612_Monthly.nc");
        String dataDir = "/cfsr-gribs/201612/";
        String spec = dataDir + ".*grib2$";
        String timePartition = "file";
        String dateFormatMark = null;
        String olderThan = null;
        Element innerNcml = null;
        String path = dataDir;
        String name = "cfsr";
        String collectionName = "cfsrCollection";

        //find and configure the folder as a grib collection
        FeatureCollectionConfig fcc = new FeatureCollectionConfig(name, path, fcType, spec,
                collectionName, dateFormatMark, olderThan, timePartition, innerNcml);

        try (GribCollectionImmutable gc = GribCdmIndex.openGribCollection(fcc, null, log)) {
            //had to breakpoint and see the dataset typenames to choose 'TP', could be different for each dataset
            GribCollectionImmutable.Dataset ds = gc.getDatasetByTypeName("TP");
            String fullCollectionIndexFilePath = dataDir + name + ".ncx3";
            // now we open the collection index file, which catalogs all of the grib
            //  records in your collection
            NetcdfDataset ncd = gc.getNetcdfDataset(ds, ds.getGroup(0), fullCollectionIndexFilePath,
                    fcc, null, log);
            try (NetcdfFileWriter writer = NetcdfFileWriter.createNew(NetcdfFileWriter.Version.netcdf4,
                    outputPath.toString(), new Nc4ChunkingDefault())) {
                GridDataset gridDataset = new GridDataset(ncd);
                for (String variableName : variableNames) {
                    GeoGrid grid = gridDataset.findGridByShortName(variableName);
                    //Check that the time dimension is the length you'd expect
                    log.info(String.format("Found grid for : %s = %s, with dimension length %s", variableName, grid != null, grid != null ? grid.getDimension(0).getLength() : 0));
                }
                writer.setRedefineMode(true);
                //write the aggregated variables to my output file
                CFGridWriter2.writeFile(gridDataset, variableNames, gridDataset.getBoundingBox(), null, 1, null, null, 1, true, writer);
            } catch (Exception exc) {
                exc.printStackTrace();
            }
        } catch (IOException e) {
            e.printStackTrace();
        }

【Discussion】:
