How to write a multidimensional array to a text file?

Posted 2023-02-23

【Title】: How to write a multidimensional array to a text file? 【Posted】: 2011-04-10 18:14:19 【Question】:

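In another question, other users offered to help if I could supply the array I was having trouble with. However, I am failing even at a basic I/O task such as writing an array to a file.

Can anyone explain what kind of loop I would need to write a 4x11x14 numpy array to a file?

This array consists of four 11 x 14 arrays, so it should be formatted with a newline between them to make the file easier for others to read.

Edit: So I tried the numpy.savetxt function. Strangely, it gives the following error:

TypeError: float argument required, not numpy.ndarray

I assume this is because the function doesn't work with multidimensional arrays? Are there any solutions that would let me keep everything in a single file?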

【Question comments】:

【Answer 1】:

You can also store a NumPy multidimensional array as a .npy file (a binary format).

save()

import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) #shape (3x3)
np.save('filename.npy', a)

load()

b = np.load('filename.npy')
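
As a minimal sketch of why this is convenient for arrays with more than two dimensions: the .npy format records the shape and dtype, so nothing has to be reshaped after loading. The array contents and the temporary file path below are assumptions made for illustration only.

```python
import os
import tempfile

import numpy as np

# Hypothetical example data: four 11 x 14 blocks, as in the question.
data = np.arange(4 * 11 * 14, dtype=float).reshape((4, 11, 14))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.npy")
    np.save(path, data)       # shape and dtype are stored alongside the values
    restored = np.load(path)  # comes back as a 3D array, no reshape needed

print(restored.shape, restored.dtype)
```

The trade-off is that the file is binary, so it cannot be inspected in a text editor.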

【Comments】:

【Answer 2】:

If your array is a numpy.array or a torch.tensor with no more than 4 dimensions, use this code.

# from util.npa2csv import Visualarr; Visualarr(x)
import numpy as np
import torch

def Visualarr(arr, out = 'array_out.txt'):
    dim = arr.ndim 
    if isinstance(arr, np.ndarray):
        # (#Images, #Channels, #Rows, #Columns)
        if dim == 4:
            arr = arr.transpose(3,2,0,1)
        if dim == 3:
            arr = arr.transpose(2,0,1)

    if isinstance(arr, torch.Tensor):
        arr = arr.numpy()
    
    
    with open(out, 'w') as outfile:    
        outfile.write('# Array shape: {0}\n'.format(arr.shape))

        if dim == 1 or dim == 2:
            np.savetxt(outfile, arr, fmt='%-7.3f')

        elif dim == 3:
            for i, arr2d in enumerate(arr):
                outfile.write('# {0}-th channel\n'.format(i))
                np.savetxt(outfile, arr2d, fmt='%-7.3f')

        elif dim == 4:
            for j, arr3d in enumerate(arr):
                outfile.write('\n# {0}-th Image\n'.format(j))
                for i, arr2d in enumerate(arr3d):
                    outfile.write('# {0}-th channel\n'.format(i))
                    np.savetxt(outfile, arr2d, fmt='%-7.3f')

        else:
            print("Out of dimension!")

    

def test_va():
    arr = np.random.rand(4,2)
    tens = torch.rand(2,5,6,3)
    Visualarr(arr)

test_va()

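【Comments】:

【Answer 3】:

File I/O is often a bottleneck in code. That is why it is important to know that ASCII I/O is generally slower than binary I/O, often by orders of magnitude. I compared some of the suggested solutions with perfplot:

Code to reproduce the plot: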

import json
import pickle

import numpy as np
import perfplot
import scipy.io


def numpy_save(data):
    np.save("test.dat", data)


def numpy_savetxt(data):
    np.savetxt("test.txt", data)


def numpy_savetxt_fmt(data):
    np.savetxt("test.txt", data, fmt="%-7.2f")


def pickle_dump(data):
    with open("data.pkl", "wb") as f:
        pickle.dump(data, f)


def scipy_savemat(data):
    scipy.io.savemat("test.dat", mdict={"out": data})


def numpy_tofile(data):
    data.tofile("test.txt", sep=" ", format="%s")


def json_dump(data):
    with open("test.json", "w") as f:
        json.dump(data.tolist(), f)


perfplot.save(
    "out.png",
    setup=np.random.rand,
    n_range=[2 ** k for k in range(20)],
    kernels=[
        numpy_save,
        numpy_savetxt,
        numpy_savetxt_fmt,
        pickle_dump,
        scipy_savemat,
        numpy_tofile,
        json_dump,
    ],
    equality_check=None,
)
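
If perfplot is not installed, a rough comparison can be sketched with the standard library alone. This is only an illustration under assumed conditions (a single 200 x 200 random array, one run per method, a helper named time_call that is not part of any library); the timings depend on the machine and are not a substitute for the benchmark above.

```python
import os
import tempfile
import time

import numpy as np

data = np.random.rand(200, 200)  # arbitrary test size, chosen for illustration


def time_call(func, *args):
    """Return the wall-clock time of a single call to func(*args)."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


with tempfile.TemporaryDirectory() as tmpdir:
    bin_path = os.path.join(tmpdir, "test.npy")
    txt_path = os.path.join(tmpdir, "test.txt")
    t_bin = time_call(np.save, bin_path, data)
    t_txt = time_call(np.savetxt, txt_path, data)
    size_bin = os.path.getsize(bin_path)
    size_txt = os.path.getsize(txt_path)

print("np.save:    {0:.6f} s, {1} bytes".format(t_bin, size_bin))
print("np.savetxt: {0:.6f} s, {1} bytes".format(t_txt, size_txt))
```

Besides being slower to write, the text file is also larger, because every value is spelled out as a decimal string instead of 8 raw bytes.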

【Comments】:

【Answer 4】:

Write to a file with Python's print():

import sys

import numpy as np

nparr = np.arange(24).reshape((2, 3, 4))  # example array; substitute your own

np.set_printoptions(precision=8)  # Sets number of digits of precision.
np.set_printoptions(suppress=True)  # Suppress scientific notation.
np.set_printoptions(threshold=sys.maxsize)  # Prints the whole array.
with open('myfile.txt', 'w') as f:
    # print() can write to any open file, so sys.stdout need not be swapped
    print(nparr, file=f)

Use set_printoptions() to customize how the object is displayed.
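
One caveat is that set_printoptions() changes the display settings for the whole session. A possible alternative, sketched here with a made-up example array and a temporary file, is numpy.array2string, which accepts the same kind of options as arguments and returns the formatted text without touching the global state.

```python
import os
import sys
import tempfile

import numpy as np

nparr = np.arange(24, dtype=float).reshape((2, 3, 4))  # hypothetical example array

# The formatting options apply to this call only.
text = np.array2string(
    nparr,
    precision=8,            # digits of precision
    suppress_small=True,    # avoid scientific notation for small values
    threshold=sys.maxsize,  # never abbreviate with "..."
)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "myfile.txt")
    with open(path, "w") as f:
        f.write(text + "\n")
    with open(path) as f:
        written = f.read()

print(written)
```

Like the print() approach, this produces text meant for people; it is not a format that numpy.loadtxt can read back directly.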

【Comments】:

【Answer 5】:

Use the JSON module for multidimensional arrays, e.g.

import json

# filename and myndarray are placeholders for your own path and array
with open(filename, 'w') as f:
    json.dump(myndarray.tolist(), f)
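
Reading the file back could look like the sketch below. The nested lists in the JSON file preserve the shape, so no reshape is needed; the dtype, however, is inferred by np.array on loading rather than stored. The example array and temporary path are assumptions for illustration.

```python
import json
import os
import tempfile

import numpy as np

myndarray = np.arange(24).reshape((2, 3, 4))  # hypothetical example array

with tempfile.TemporaryDirectory() as tmpdir:
    filename = os.path.join(tmpdir, "array.json")

    with open(filename, "w") as f:
        json.dump(myndarray.tolist(), f)

    # The nesting of the lists carries the shape of the array.
    with open(filename) as f:
        restored = np.array(json.load(f))

print(restored.shape)
```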

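【Comments】:

【Answer 6】:

If you want to write it to disk so that it is easy to read back in as a numpy array, look into numpy.save. Pickling it will work fine as well, but it is less efficient for large arrays (which yours isn't, so either is perfectly fine).

If you want it to be human readable, look into numpy.savetxt.

Edit: So, it seems that savetxt isn't quite as good an option for arrays with >2 dimensions... But just to draw everything out to its full conclusion:

I just realized that numpy.savetxt chokes on ndarrays with more than 2 dimensions... This is probably by design, as there is no inherently defined way to indicate additional dimensions in a text file.

E.g. this (a 2D array) works fine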

import numpy as np
x = np.arange(20).reshape((4,5))
np.savetxt('test.txt', x)

While the same thing fails for a 3D array (with a rather uninformative error: TypeError: float argument required, not numpy.ndarray):

import numpy as np
x = np.arange(200).reshape((4,5,10))
np.savetxt('test.txt', x)
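
To see the failure without aborting a script, the call can be wrapped in a try/except. This is a small sketch; the exact exception type appears to depend on the NumPy version (the TypeError quoted above comes from an older release), so both TypeError and ValueError are caught here as an assumption. An in-memory buffer is used so that no file is created.

```python
import io

import numpy as np

x = np.arange(200).reshape((4, 5, 10))

try:
    np.savetxt(io.StringIO(), x)  # 3D input is not supported by savetxt
    failed = False
except (TypeError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)

print("savetxt rejected the 3D array:", failed)
```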

One workaround is to break the 3D (or higher-dimensional) array into 2D slices. E.g.

x = np.arange(200).reshape((4,5,10))
with open('test.txt', 'w') as outfile:
    for slice_2d in x:
        np.savetxt(outfile, slice_2d)

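However, our goal is to be clearly human readable while still being easy to read back in with numpy.loadtxt. Therefore, we can be a bit more verbose and separate the slices with commented-out lines. By default, numpy.loadtxt ignores any line that starts with # (or whichever character is specified by the comments kwarg). (This looks more verbose than it actually is...)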

import numpy as np

# Generate some test data
data = np.arange(200).reshape((4,5,10))

# Write the array to disk
with open('test.txt', 'w') as outfile:
    # I'm writing a header here just for the sake of readability
    # Any line starting with "#" will be ignored by numpy.loadtxt
    outfile.write('# Array shape: {0}\n'.format(data.shape))
    
    # Iterating through an n-dimensional array produces slices along
    # the first axis. This is equivalent to data[i,:,:] in this case
    for data_slice in data:

        # The formatting string indicates that I'm writing out
        # the values in left-justified columns 7 characters in width
        # with 2 decimal places.  
        np.savetxt(outfile, data_slice, fmt='%-7.2f')

        # Writing out a break to indicate different slices...
        outfile.write('# New slice\n')

This yields:

# Array shape: (4, 5, 10)
0.00    1.00    2.00    3.00    4.00    5.00    6.00    7.00    8.00    9.00   
10.00   11.00   12.00   13.00   14.00   15.00   16.00   17.00   18.00   19.00  
20.00   21.00   22.00   23.00   24.00   25.00   26.00   27.00   28.00   29.00  
30.00   31.00   32.00   33.00   34.00   35.00   36.00   37.00   38.00   39.00  
40.00   41.00   42.00   43.00   44.00   45.00   46.00   47.00   48.00   49.00  
# New slice
50.00   51.00   52.00   53.00   54.00   55.00   56.00   57.00   58.00   59.00  
60.00   61.00   62.00   63.00   64.00   65.00   66.00   67.00   68.00   69.00  
70.00   71.00   72.00   73.00   74.00   75.00   76.00   77.00   78.00   79.00  
80.00   81.00   82.00   83.00   84.00   85.00   86.00   87.00   88.00   89.00  
90.00   91.00   92.00   93.00   94.00   95.00   96.00   97.00   98.00   99.00  
# New slice
100.00  101.00  102.00  103.00  104.00  105.00  106.00  107.00  108.00  109.00 
110.00  111.00  112.00  113.00  114.00  115.00  116.00  117.00  118.00  119.00 
120.00  121.00  122.00  123.00  124.00  125.00  126.00  127.00  128.00  129.00 
130.00  131.00  132.00  133.00  134.00  135.00  136.00  137.00  138.00  139.00 
140.00  141.00  142.00  143.00  144.00  145.00  146.00  147.00  148.00  149.00 
# New slice
150.00  151.00  152.00  153.00  154.00  155.00  156.00  157.00  158.00  159.00 
160.00  161.00  162.00  163.00  164.00  165.00  166.00  167.00  168.00  169.00 
170.00  171.00  172.00  173.00  174.00  175.00  176.00  177.00  178.00  179.00 
180.00  181.00  182.00  183.00  184.00  185.00  186.00  187.00  188.00  189.00 
190.00  191.00  192.00  193.00  194.00  195.00  196.00  197.00  198.00  199.00 
# New slice

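Reading it back in is easy, as long as we know the shape of the original array. We can just do numpy.loadtxt('test.txt').reshape((4,5,10)). As an example (you can do this in one line, I'm just being verbose to clarify things):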

# Read the array from disk
new_data = np.loadtxt('test.txt')

# Note that this returned a 2D array!
print(new_data.shape)

# However, going back to 3D is easy if we know the 
# original shape of the array
new_data = new_data.reshape((4,5,10))
    
# Just to check that they're the same...
assert np.all(new_data == data)
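
Because the header comment already records the shape, it can also be parsed back instead of being hard-coded. The following is a sketch under the assumption that the first line of the file has exactly the form "# Array shape: (4, 5, 10)" as written above; load_with_header is a hypothetical helper name, not a NumPy function.

```python
import ast
import os
import tempfile

import numpy as np


def load_with_header(path):
    """Load an array whose first line is '# Array shape: (...)'."""
    with open(path) as f:
        header = f.readline()
    # Everything after the first colon is the shape tuple written by the saver.
    shape = ast.literal_eval(header.split(":", 1)[1].strip())
    return np.loadtxt(path).reshape(shape)


data = np.arange(200).reshape((4, 5, 10))

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "test.txt")
    with open(path, "w") as outfile:
        outfile.write("# Array shape: {0}\n".format(data.shape))
        for data_slice in data:
            np.savetxt(outfile, data_slice, fmt="%-7.2f")
            outfile.write("# New slice\n")
    new_data = load_with_header(path)

print(new_data.shape)
```

The values come back as floats, since numpy.loadtxt defaults to a float dtype.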

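【Comments】:

+1 from me; see also numpy.loadtxt (docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html)

There is now a simpler solution to this problem: yourStrArray = np.array([str(val) for val in yourMulDArray],dtype='string'); np.savetxt('YourTextFile.txt',yourStrArray,fmt='%s')

@GregKramida How do you recover the array?

@Juanlu001: I know that numpy.loadtxt(...) also accepts a dtype argument, which can be set to np.string_. I would try that first. There is also numpy.fromstring(...) for parsing arrays from strings.

Hey, what if I need to store an array of images? If the image size is 512 x 512, how would we reshape it?

【Answer 7】:

I'm not sure whether this meets your requirements, since I think you are interested in making the file readable by people, but if that is not a primary concern, just pickle it.

To save it: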

import pickle

my_data = {'a': [1, 2.0, 3, 4+6j],
           'b': ('string', u'Unicode string'),
           'c': None}
with open('data.pkl', 'wb') as output:
    pickle.dump(my_data, output)

To read it back:

import pprint, pickle

pkl_file = open('data.pkl', 'rb')

data1 = pickle.load(pkl_file)
pprint.pprint(data1)

pkl_file.close()

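【Comments】:

You probably don't need pprint to print the dictionary.

【Answer 8】:

Pickle works best for these cases. Suppose you have an ndarray named x_train. You can dump it to a file and restore it with the following: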

import pickle

### Dump to file
with open("myfile.pkl","wb") as f:
    pickle.dump(x_train,f)

###Extract from file
with open("myfile.pkl","rb") as f:
    x_temp = pickle.load(f)

【Comments】:

【Answer 9】:

ndarray.tofile() should also work

e.g. if your array is called a:

a.tofile('yourfile.txt',sep=" ",format="%s")

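Not sure how to get newline formatting, though.

Edit (credit to Kevin J. Black's comment here):

Since version 1.5.0, np.savetxt() takes an optional parameter newline='\n' to allow multi-line output; ndarray.tofile() has no such parameter. https://docs.scipy.org/doc/numpy-1.13.0/reference/generated/numpy.savetxt.html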
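
tofile() writes a flat stream of values, so the shape is lost and has to be supplied again when reading. A sketch of a round trip, assuming the same sep=" " is passed to numpy.fromfile and that the original shape is known (the example array and temporary path are made up):

```python
import os
import tempfile

import numpy as np

a = np.arange(24, dtype=float).reshape((2, 3, 4))  # hypothetical example array

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "yourfile.txt")
    a.tofile(path, sep=" ", format="%s")

    # A non-empty sep tells fromfile to parse the file as text.
    flat = np.fromfile(path, dtype=float, sep=" ")

restored = flat.reshape(a.shape)  # the shape must be known in advance
print(restored.shape)
```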

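【Comments】:

But is there a way to recreate the original array from the text file?

@AhashanAlamSojib See ***.com/questions/3518778/…

tofile has no newline='\n'.

【Answer 10】:

I have a way to do it using simple filename.write() operations. It works fine for me, but I'm dealing with arrays of about 1500 data elements.

I basically just use for loops to iterate through the array and write it to the output destination line by line, in CSV style.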

import numpy as np

trial = np.genfromtxt("/extension/file.txt", dtype=str, delimiter=",")
num_of_columns = trial.shape[1]  # assumes the file loaded as a 2D array

with open("/extension/file.txt", "w") as f:
    for x in range(trial.shape[0]):
        for y in range(num_of_columns):
            if y < num_of_columns - 1:
                # every value except the last in a row is followed by a comma
                f.write(trial[x][y] + ",")
            else:
                f.write(trial[x][y])
        f.write("\n")

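The conditional inside the inner loop adds commas between the data elements. For whatever reason, these get stripped out when the file is read in as an nd array. My goal was to output the file as a CSV, so this method helps with that.

Hope this helps!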
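
For a 2D array of strings, the same comma-separated output can probably be produced without explicit loops by passing delimiter="," and fmt="%s" to numpy.savetxt. The sketch below uses a small made-up array and a temporary file rather than the paths from the answer.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the array returned by np.genfromtxt(..., dtype=str)
trial = np.array([["a", "1", "2.5"],
                  ["b", "3", "4.5"]])

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "file.csv")
    # fmt="%s" writes each element as-is; delimiter="," puts commas between columns
    np.savetxt(path, trial, fmt="%s", delimiter=",")
    with open(path) as f:
        lines = f.read().splitlines()

print(lines)
```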

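【Comments】:

【Answer 11】:

If you don't need human-readable output, another option you could try is to save the array as a MATLAB .mat file, which is a structured array. I despise MATLAB, but the fact that I can both read and write a .mat file in very few lines is convenient.

Unlike Joe Kington's answer, the benefit of this is that you don't need to know the original shape of the data in the .mat file, i.e. there is no need to reshape upon reading in. And, unlike with pickle, a .mat file can be read by MATLAB, and probably by some other programs and languages as well.

Here is an example: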

import numpy as np
import scipy.io

# Some test data
x = np.arange(200).reshape((4,5,10))

# Specify the filename of the .mat file
matfile = 'test_mat.mat'

# Write the array to the mat file. For this to work, the array must be the value
# corresponding to a key name of your choice in a dictionary
scipy.io.savemat(matfile, mdict={'out': x}, oned_as='row')

# For the above line, I specified the kwarg oned_as since python (2.7 with 
# numpy 1.6.1) throws a FutureWarning.  Here, this isn't really necessary 
# since oned_as is a kwarg for dealing with 1-D arrays.

# Now load in the data from the .mat that was just saved
matdata = scipy.io.loadmat(matfile)

# And just to check if the data is the same:
assert np.all(x == matdata['out'])

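If you forget the key under which the array is stored in the .mat file, you can always do:

print(matdata.keys())

And of course you can store many arrays by using more keys.

So yes, it won't be readable with your eyes, but it takes only 2 lines to write and read the data, which I think is a fair trade-off.

Take a look at the docs for scipy.io.savemat and scipy.io.loadmat, and also this tutorial page: scipy.io File IO Tutorial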

【Comments】:

【Answer 12】:

There are dedicated libraries that do exactly this (plus wrappers for Python).

netCDF4: http://www.unidata.ucar.edu/software/netcdf/

netCDF4 Python interface: http://www.unidata.ucar.edu/software/netcdf/software.html#Python

HDF5: http://www.hdfgroup.org/HDF5/

Hope this helps.

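【Comments】:

【Answer 13】:

You can simply traverse the array in three nested loops and write the values to your file. For reading, you use exactly the same loop structure. You will get the values in exactly the right order to fill the array correctly again.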
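
The nested-loop idea might be sketched as follows, with one value per line. The function names write_nested and read_nested, the example array, and the temporary file are assumptions for illustration; the shape still has to be known (or stored separately) when reading.

```python
import os
import tempfile

import numpy as np


def write_nested(arr, path):
    """Write a 3D array one value per line, first axis outermost."""
    with open(path, "w") as f:
        for i in range(arr.shape[0]):
            for j in range(arr.shape[1]):
                for k in range(arr.shape[2]):
                    # repr() of a Python float round-trips exactly
                    f.write(repr(float(arr[i, j, k])) + "\n")


def read_nested(path, shape):
    """Read the values back with the same loop structure."""
    arr = np.empty(shape)
    with open(path) as f:
        for i in range(shape[0]):
            for j in range(shape[1]):
                for k in range(shape[2]):
                    arr[i, j, k] = float(f.readline())
    return arr


data = np.random.rand(4, 11, 14)  # same shape as the array in the question

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "nested.txt")
    write_nested(data, path)
    restored = read_nested(path, data.shape)

print(np.array_equal(restored, data))
```

This is the slowest of the approaches discussed here, since every element goes through a Python-level loop, but it makes the write and read order fully explicit.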

【Comments】:
