HDF5 matrix append in Python
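【Posted】: 2013-10-19 10:49:01
【Question】: Suppose we have a matrix (for example, a numpy array) stored in an HDF5 file, and we want to extend it by appending rows to the end of the original matrix. The original matrix can be large (tens of GB) and cannot be loaded into RAM.
In addition, we want to be able to read a few rows starting from an arbitrary position in the matrix (a slice, possibly?) without loading the whole matrix into RAM.
Can anyone provide an example of how to do this in Python?
Update:
I think numpy.memmap is another option, but it does not seem to support appending.
This also looks like an option, but it operates on raw binary data, whereas I want to access a matrix. I also don't know how appending would work in that case.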
【Answer 1】: If you are going to use HDF5 files, I recommend using one of the available libraries, such as PyTables. Here is a simplified version of their tutorial: http://pytables.github.io/usersguide/tutorials.html
from tables import (IsDescription, StringCol, Int64Col, UInt16Col, UInt8Col,
                    Int32Col, Float32Col, FloatCol, open_file)

# Define a user record to characterize some kind of particles
class Particle(IsDescription):
    name = StringCol(16)     # 16-character string
    idnumber = Int64Col()    # signed 64-bit integer
    ADCcount = UInt16Col()   # unsigned short integer
    TDCcount = UInt8Col()    # unsigned byte
    grid_i = Int32Col()      # integer
    grid_j = Int32Col()      # integer
    pressure = Float32Col()  # float (single-precision)
    energy = FloatCol()      # double (double-precision)

filename = "test.h5"
# Open a file in "w"rite mode (PyTables 3.x names: open_file, create_group, create_table)
h5file = open_file(filename, mode="w", title="Test file")
# Create a new group under "/" (root)
group = h5file.create_group("/", 'detector', 'Detector information')
# Create one table in it
table = h5file.create_table(group, 'readout', Particle, "Readout example")
# Fill the table with 10 particles
particle = table.row
for i in range(10):
    particle['name'] = 'Particle: %6d' % i
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i * i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()
# Close (and flush) the file
h5file.close()
# Now take some slices, then append more data to the table.
# Mode "a" reopens the existing file for appending.
f = open_file(filename, mode="a")
print(f.root.detector)                       # the group
print(f.root.detector.readout)               # the table
print(f.root.detector.readout[1::3])         # a slice: only these rows are read
print(f.root.detector.readout.attrs.TITLE)   # the table title
ro = f.root.detector.readout
# where() returns an iterator, so it works in comprehensions and generators
energies = [row['energy'] for row in ro.where('pressure > 10')]
print(energies)
# Append 5 more rows to the existing table
table = f.root.detector.readout
particle = table.row
for i in range(10, 15):
    particle['name'] = 'Particle: %6d' % i
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i * i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    particle.append()
# Write the buffered rows to disk before closing
table.flush()
f.close()
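The question asks about a plain matrix rather than a table of records. The same append-and-slice pattern can be sketched for a 2-D numpy array with PyTables' extendable array type (EArray). This is a minimal sketch assuming PyTables 3.x and numpy are installed; the file name `matrix.h5`, the node name `matrix`, and the column count are hypothetical and chosen only for illustration.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file location and matrix width, used only in this sketch.
path = os.path.join(tempfile.mkdtemp(), "matrix.h5")
n_cols = 4

# Create an extendable array. The 0 in the shape marks the axis that can grow.
with tables.open_file(path, mode="w") as f:
    matrix = f.create_earray(f.root, "matrix",
                             atom=tables.Float64Atom(),
                             shape=(0, n_cols))
    # Write the first 10 rows (values 0..39).
    matrix.append(np.arange(40, dtype=np.float64).reshape(10, n_cols))

# Reopen in append mode and add 5 more rows.
# The rows already on disk are not loaded into memory.
with tables.open_file(path, mode="a") as f:
    matrix = f.root.matrix
    matrix.append(np.full((5, n_cols), -1.0))
    total_rows = matrix.nrows
    # Slicing reads only the requested rows and returns a numpy array.
    middle = matrix[8:12]

print(total_rows)
print(middle)
```

For a matrix of tens of GB, the rows would be appended in batches that fit in RAM, and the `expectedrows` argument of `create_earray` can be set so that PyTables picks a suitable chunk size.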