Reading hdf5 datasets with pandas

Posted 2023-03-11


Asked: 2016-10-27 08:32:36

Question:

I am trying to open an hdf5 file that contains no groups with pandas:

import pandas as pd
foo = pd.read_hdf('foo.hdf5')

But I get an error:

TypeError: cannot create a storer if the object is not existing nor a value are passed

I tried to fix this by passing a key:

foo = pd.read_hdf('foo.hdf5','key')

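This works if the key is a group, but this file has no groups, only several datasets at the top level of the hdf structure. That is, the file that works has the structure Groups --> Datasets, while the file that does not work has the structure Datasets. Both open fine with h5py, where I would use: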

import h5py
f = h5py.File('foo.hdf5','r')

and

dset = f['dataset']

to look at a dataset. Any ideas on how to read this in pandas?

Comments:

What happens if you try: df = pd.read_hdf('foo.hdf5', 'dataset')?

Possibly related: Pandas can't read hdf5 file created with h5py

Answer 1:

I think you are confused by the different terminology: a pandas HDF store key is the full path, i.e. Group + DataSet_name...

Demo:

In [67]: store = pd.HDFStore(r'D:\temp\.data\hdf\test.h5')

In [68]: store.append('dataset1', df)

In [69]: store.append('/group1/sub_group1/dataset2', df)

In [70]: store.groups
Out[70]:
<bound method HDFStore.groups of <class 'pandas.io.pytables.HDFStore'>
File path: D:\temp\.data\hdf\test.h5
/dataset1                              frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])
/group1/sub_group1/dataset2            frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])>

In [71]: store.items
Out[71]:
<bound method HDFStore.items of <class 'pandas.io.pytables.HDFStore'>
File path: D:\temp\.data\hdf\test.h5
/dataset1                              frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])
/group1/sub_group1/dataset2            frame_table  (typ->appendable,nrows->9,ncols->2,indexers->[index])>

In [72]: store.close()

In [73]: x = pd.read_hdf(r'D:\temp\.data\hdf\test.h5', 'dataset1')

In [74]: x.shape
Out[74]: (9, 2)

In [75]: x = pd.read_hdf(r'D:\temp\.data\hdf\test.h5', '/group1/sub_group1/dataset2')

In [76]: x.shape
Out[76]: (9, 2)
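
The session above can be condensed into a self-contained sketch. The sample DataFrame and the temporary file path below are assumptions for illustration (the original post does not show how df was built), and the code needs PyTables installed. It calls store.keys() to list the full-path keys that read_hdf accepts:

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample frame; the original post does not show df.
df = pd.DataFrame({"a": range(9), "b": range(9)})

# Temporary file so the sketch does not touch any existing data.
path = os.path.join(tempfile.mkdtemp(), "test.h5")

with pd.HDFStore(path, mode="w") as store:
    store.append("dataset1", df)                      # stored at the root
    store.append("/group1/sub_group1/dataset2", df)   # stored under nested groups
    keys = store.keys()                               # full paths, one per stored object

print(sorted(keys))

# The key passed to read_hdf is the full path, not just the group name.
x = pd.read_hdf(path, "dataset1")
y = pd.read_hdf(path, "/group1/sub_group1/dataset2")
print(x.shape, y.shape)
```

Note that store.keys() is called with parentheses here; store.groups and store.items in the session above were typed without them, which is why the output shows a bound-method repr.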

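Comments:

The output is ``` File path: /path/foo.hdf5 Empty> ``` I don't think this is a case of forgetting to close the file. I just tried opening and closing it with h5py as usual, and it works fine. I also tried creating two new hdf5 files: one with the structure Group --> several datasets, the other with just several datasets. The first opens fine in pandas using the group name as the key; the second does not.

@hsnee, what do you mean by Group? Can you update your question with an example that does not work?

By group, I mean what it refers to in HDF5 terminology: hdfgroup.org/HDF5/doc1.6/UG/09_Groups.html

OK, I guess Group == key (in pandas terminology). Could you upload somewhere a small h5 file that you cannot open? It will be hard to help you without being able to reproduce the problem...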
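
For the case discussed in these comments (a file written with h5py that holds plain datasets at the root), one possible workaround is to read the dataset with h5py and wrap it in a DataFrame. This is only a sketch, not something confirmed in the thread: it assumes the dataset is a numeric 1-D or 2-D array, and the file it reads is a stand-in built on the spot, since the asker's file was never shared. The helper name read_h5py_dataset is hypothetical. As far as I know, pd.read_hdf expects the layout that pandas itself writes through PyTables, which would explain why a bare h5py dataset is not picked up.

```python
import os
import tempfile

import h5py
import numpy as np
import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "foo.hdf5")

# Stand-in file (assumed layout): one dataset at the root, no groups.
with h5py.File(path, "w") as f:
    f.create_dataset("dataset", data=np.arange(18).reshape(9, 2))


def read_h5py_dataset(path, name):
    """Hypothetical helper: load one numeric h5py dataset into a DataFrame."""
    with h5py.File(path, "r") as f:
        data = f[name][()]  # read the whole dataset into a NumPy array
    return pd.DataFrame(data)


foo = read_h5py_dataset(path, "dataset")
print(foo.shape)
```

Column names and the index are not stored in a plain dataset, so the resulting DataFrame gets default integer labels; pass columns=... to pd.DataFrame if you know what they should be.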
