Creating and accessing datasets in an HDF5 file


I am trying to create an HDF5 file with two datasets, "data" and "label". However, when I try to access the file, I get the following error:

Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1.4\helpers\pydev\pydevd.py", line 1664, in <module>
    main()
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1.4\helpers\pydev\pydevd.py", line 1658, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1.4\helpers\pydev\pydevd.py", line 1068, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm Community Edition 2018.1.4\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "C:/pycharm/Input_Pipeline.py", line 140, in <module>
    data_h5 = f['data'][:]
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "C:\Users\u20x47\PycharmProjects\PCL\venv\lib\site-packages\h5py\_hl\group.py", line 177, in __getitem__
    oid = h5o.open(self.id, self._e(name), lapl=self._lapl)
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py\h5o.pyx", line 190, in h5py.h5o.open
ValueError: Not a location (invalid object ID)

Code used to create the datasets:

h5_file.create_dataset('data', data=data_x, compression='gzip', compression_opts=4, dtype='float32')
h5_file.create_dataset('label', data=label, compression='gzip', compression_opts=1, dtype='uint8')

data_x is an array of arrays; each element of data_x is a 3D array of 1024 elements.
label is also an array of arrays; each element is a 1D array containing a single value.
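For context, here is a minimal, self-contained sketch of the creation step. The file name `train.h5`, the sample count, and the exact array shapes are assumptions standing in for the question's real data; the point is that opening the file in a `with` block guarantees it is closed after writing, which matters for the error discussed below.

```python
import h5py
import numpy as np

# Hypothetical stand-ins for data_x and label, shaped per the description:
# each data_x element is a 3D array of 1024 points, each label holds one value.
data_x = np.random.rand(10, 1024, 3).astype('float32')  # assumed: 10 samples
label = np.random.randint(0, 40, size=(10, 1)).astype('uint8')

# The `with` block closes the file even if an exception is raised mid-write.
with h5py.File('train.h5', 'w') as h5_file:
    h5_file.create_dataset('data', data=data_x, compression='gzip',
                           compression_opts=4, dtype='float32')
    h5_file.create_dataset('label', data=label, compression='gzip',
                           compression_opts=1, dtype='uint8')
```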

Code used to access the file:

f = h5_file
data_h5 = f['data'][:]
label_h5 = f['label'][:]
print (data_h5, label_h5)

How can I fix this? Is it a syntax error or a logic error?

Answer

I cannot reproduce the error. Perhaps you forgot to close the file, or the contents of the h5 file changed during execution.
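That failure mode is easy to reproduce deliberately: a handle to a closed file is no longer a valid HDF5 location, so any lookup through it fails. In the h5py version from the question's traceback the message is exactly "Not a location (invalid object ID)"; newer h5py versions word it differently, so this sketch only records whatever exception is raised rather than asserting the exact text.

```python
import h5py
import numpy as np

f = h5py.File('demo.h5', 'w')
f.create_dataset('data', data=np.zeros((4, 4), dtype='float32'))
f.close()

# `f` now refers to a closed file: every dataset lookup through it fails.
err = None
try:
    _ = f['data'][:]
except Exception as exc:  # broad catch only because the type varies by version
    err = exc
print(type(err).__name__, err)
```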

You can also use print(list(h5_file.items())) to check the contents of the h5 file.
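A small inspection sketch along those lines (the file name and shapes here are placeholders, not the question's data): `keys()` lists the dataset names at the root, and iterating `items()` yields name/Dataset pairs whose shape and dtype you can print.

```python
import h5py
import numpy as np

# Write a small throwaway file to inspect.
with h5py.File('inspect_demo.h5', 'w') as f:
    f.create_dataset('data', data=np.zeros((2, 1024, 3), dtype='float32'))
    f.create_dataset('label', data=np.zeros((2, 1), dtype='uint8'))

with h5py.File('inspect_demo.h5', 'r') as f:
    print(list(f.keys()))                # dataset names at the root group
    for name, ds in f.items():           # (name, h5py.Dataset) pairs
        print(name, ds.shape, ds.dtype)
```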

Tested code:

import h5py
import numpy as np

h5_file = h5py.File('test.h5', 'w')
# bogus data with the correct size
data_x = np.random.rand(16,8,8)
label = np.random.randint(100, size=(1, 1), dtype='uint8')
#
h5_file.create_dataset('data', data=data_x, compression='gzip', compression_opts=4, dtype='float32')
h5_file.create_dataset('label', data=label, compression='gzip', compression_opts=1, dtype='uint8')
h5_file.close()

h5_file = h5py.File('test.h5', 'r')
f = h5_file
print(list(f.items()))
data_h5 = f['data'][...]
label_h5 = f['label'][...]
print(data_h5, label_h5)
h5_file.close()

This produces:

[('data', <HDF5 dataset "data": shape (16, 8, 8), type "<f4">), ('label', <HDF5 dataset "label": shape (1, 1), type "|u1">)]
(array([[[4.36837107e-01, 8.05664659e-01, 3.34415197e-01, ...,
     8.89135897e-01, 1.84097692e-01, 3.60782951e-01],
      [8.86442482e-01, 6.07181549e-01, 2.42844030e-01, ...,
      [4.24369454e-01, 6.04596496e-01, 5.56676507e-01, ...,
     7.22884715e-01, 2.45932683e-01, 9.18777227e-01]]], dtype=float32), array([[25]], dtype=uint8))
