How to plot data from multiple files in a loop

Posted 2023-02-23

【Title】: How to plot data from multiple files in a loop 【Posted】: 2017-01-10 07:44:39 【Question】:

I have more than 1000 .csv files (data_1.csv ... data_1000.csv), each containing X and Y values:

x1  y1   x2  y2
5.0 60  5.5 500
6.0 70  6.5 600
7.0 80  7.5 700
8.0 90  8.5 800
9.0 100 9.5 900

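I wrote a subplot program in Python that produces two plots at a time from a single file (plot 1: x1 vs. y1, plot 2: x2 vs. y2).

I need help looping over all the files: open a file, read it, plot it, pick the next file, open it, read it, plot it, and so on until every file in the folder has been plotted.

I have the following code: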

import pandas as pd
import matplotlib.pyplot as plt

df1=pd.read_csv("data_csv",header=1,sep=',')
fig = plt.figure()
plt.subplot(2, 1, 1)
plt.plot(df1.iloc[:,[1]],df1.iloc[:,[2]])

plt.subplot(2, 1, 2)
plt.plot(df1.iloc[:,[3]],df1.iloc[:,[4]])

plt.show()

How can I do this more efficiently?

【Comments】:

【Answer 1】:

You can use glob to generate a list of file names and then plot them in a for loop.

import glob
import pandas as pd
import matplotlib.pyplot as plt

files = glob.glob('*.csv')  # file pattern; adjust it to match your files

for file in files:
    df1=pd.read_csv(file,header=1,sep=',')
    fig = plt.figure()
    plt.subplot(2, 1, 1)
    plt.plot(df1.iloc[:,[1]],df1.iloc[:,[2]])

    plt.subplot(2, 1, 2)
    plt.plot(df1.iloc[:,[3]],df1.iloc[:,[4]])
    plt.show()  # this will block the loop until you close the plot window
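
One detail the loop above leaves open is the order of the files: glob.glob does not promise any particular order, and a plain alphabetical sort would put data_10.csv before data_2.csv. The following is a minimal sketch, not part of the original answer, of sorting by the number embedded in the file name. The helper name numeric_key and the file names in the list are assumptions for illustration.

```python
import re


def numeric_key(name):
    """Return the first integer found in a file name, or -1 if there is none."""
    match = re.search(r'(\d+)', name)
    return int(match.group(1)) if match else -1


# Hypothetical file names, standing in for the result of glob.glob('*.csv')
files = ['data_10.csv', 'data_2.csv', 'data_1.csv', 'data_1000.csv']

# Sort by the embedded number instead of alphabetically
ordered = sorted(files, key=numeric_key)
print(ordered)
```

In the answer's loop this would amount to iterating over sorted(files, key=numeric_key) instead of files.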

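【Comments】:

【Answer 2】:

This is the basic setup I use at work. The code goes through every file and plots the data from each one. As long as the column names stay the same, it works for any number of files. Just point it at the correct folder.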

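import os
import csv
import matplotlib.pyplot as plt
import numpy as np

def graphWriterIRIandRut():
    m = 0
    # note: these lists are never cleared, so rows from earlier files
    # carry over into the plots of later files
    List1 = []
    List2 = []
    List3 = []
    List4 = []
    fileList = []
    for file in os.listdir(os.getcwd()):
        if file.endswith('.csv'):  # skip anything that is not a CSV file
            fileList.append(file)
    while m < len(fileList):
        # the 'rU' mode of the original was removed in Python 3.11
        with open(fileList[m], 'r', newline='') as f:
            for col in csv.DictReader(f):
                # csv returns strings; convert them so the axes are numeric
                List1.append(float(col['Col 1 Name']))
                List2.append(float(col['Col 2 Name']))
                List3.append(float(col['Col 3 Name']))
                List4.append(float(col['Col 4 Name']))

        plt.subplot(2, 1, 1)
        plt.grid(True)
        colors = np.random.rand(3)  # random RGB color (the original used an undefined n)
        plt.plot(List1, List2, c=colors)
        plt.tick_params(axis='both', which='major', labelsize=8)

        plt.subplot(2, 1, 2)
        plt.grid(True)
        colors = np.random.rand(3)
        plt.plot(List1, List3, c=colors)
        plt.tick_params(axis='both', which='major', labelsize=8)

        m = m + 1

    plt.show()
    plt.gcf().clear()
    plt.close('all')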

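【Comments】:

I ran this code with slight modifications, but unfortunately nothing happened. I think I am missing something!

The code I posted as an answer is actually not very good. I posted the same code in a question, here is the link: ***.com/questions/39378487/…. The answer to that question is very good and I still use it. I should delete or edit this answer, my bad.

【Answer 3】: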

# plotting the data from all the files
import os
import csv
import matplotlib.pyplot as plt


def graphWriterIRIandRut():
    m = 0
    # note: these lists are filled across all files and never cleared
    List1 = []
    List2 = []
    List3 = []
    List4 = []
    fileList = []
    for file in os.listdir(os.getcwd()):
        if file.endswith('.csv'):  # skip anything that is not a CSV file
            fileList.append(file)
    while m < len(fileList):
        # the 'rU' mode of the original was removed in Python 3.11
        with open(fileList[m], 'r', newline='') as f:
            for col in csv.DictReader(f):
                List1.append(float(col['x1']))
                List2.append(float(col['y1']))
                List3.append(float(col['x2']))
                List4.append(float(col['y2']))

        # plot once per file, after all of its rows have been read
        plt.subplot(2, 1, 1)
        plt.grid(True)
        plt.plot(List1, List2)  # x1 vs. y1
        plt.tick_params(axis='both', which='major', labelsize=8)

        plt.subplot(2, 1, 2)
        plt.grid(True)
        plt.plot(List3, List4)  # x2 vs. y2, as described in the question
        plt.tick_params(axis='both', which='major', labelsize=8)

        m = m + 1  # advance to the next file
    plt.show()
    plt.gcf().clear()
    plt.close('all')

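【Comments】:

【Answer 4】:

What we want to do is create new empty lists for each iteration, that is, for each file. Each iteration then plots only its own data, and the next iteration starts again with fresh empty lists. Once the data from every file has been plotted, call plt.show() at the end to display all the plots together. Here is a link to a similar problem I ran into: Traceback lines on plot of multiple files. Good luck!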

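import os
import csv
import matplotlib.pyplot as plt
import numpy as np

def graphWriter():

    for filename in os.listdir(os.getcwd()):
        if not filename.endswith('.csv'):
            continue  # skip anything that is not a CSV file

        # fresh, empty lists for every file
        List1 = []
        List2 = []
        List3 = []
        List4 = []

        with open(filename, 'r', newline='') as file:
            for col in csv.DictReader(file):
                # csv returns strings; convert them so the axes are numeric
                List1.append(float(col['x1']))
                List2.append(float(col['y1']))
                List3.append(float(col['x2']))
                List4.append(float(col['y2']))

        plt.subplot(2, 1, 1)
        plt.grid(True)
        colors = np.random.rand(3)  # random RGB color
        plt.plot(List1, List2, c=colors)  # x1 vs. y1
        plt.tick_params(axis='both', which='major', labelsize=8)

        plt.subplot(2, 1, 2)
        plt.grid(True)
        colors = np.random.rand(3)
        plt.plot(List3, List4, c=colors)  # x2 vs. y2, as described in the question
        plt.tick_params(axis='both', which='major', labelsize=8)

    plt.show()
    plt.gcf().clear()
    plt.close('all')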
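
The point about starting every file with empty lists can be checked without any plotting. The sketch below is an illustration rather than the answer's own code: it reads two small in-memory CSV texts (made-up data in the question's layout) through a hypothetical helper, read_columns, that builds new lists on every call.

```python
import csv
import io

# Two small in-memory "files" (hypothetical data in the question's layout)
sources = [
    "x1,y1\n5.0,60\n6.0,70\n",
    "x1,y1\n7.0,80\n8.0,90\n",
]


def read_columns(text):
    """Read one CSV text into fresh x and y lists of floats."""
    xs, ys = [], []  # new empty lists on every call, i.e. for every file
    for row in csv.DictReader(io.StringIO(text)):
        xs.append(float(row['x1']))
        ys.append(float(row['y1']))
    return xs, ys


# One (xs, ys) pair per file; no rows leak from one file into the next
per_file = [read_columns(text) for text in sources]
print(per_file)
```

If the lists were created once outside the loop instead, the second entry would also contain the rows of the first file, which is what produces the stray connecting lines on the plot.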

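【Comments】:

【Answer 5】:

I use NetCDF (.nc), in case anyone is interested in working with NetCDF data. You can also replace it with .txt; the idea is the same. I use this for a loop of contour plots.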

import os
import matplotlib.pyplot as plt
import xarray as xr

path_to_folder = '#type the path to the files'

count = 0
fig = plt.figure(figsize=(10, 5))

files = []
for i in os.listdir(path_to_folder):
    if i.endswith('.nc'):
        count = count + 1
        # build the full path so this works from any working directory
        filename = os.path.join(path_to_folder, i)
        files.append(filename)
        data = xr.open_dataset(filename)
        prec = data['tp']
        plt.subplot(1, 2, count)  # change 1 and 2 to the shape you want
        # this draws a contour plot, but you can replace it with any plot command
        prec.groupby('time.month').mean(dim=('time', 'longitude')).T.plot.contourf(cmap='Purples')

print(files)
plt.savefig('try.png', dpi=500, orientation='landscape', format='png')
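
The comment "change 1 and 2 to the shape you want" can also be handled by computing the grid from the number of files. This is a small sketch under the assumption that the files are counted before plotting starts; grid_shape is a hypothetical helper, not part of matplotlib or xarray.

```python
import math


def grid_shape(n_plots, n_cols=2):
    """Return (rows, cols) such that rows * cols >= n_plots."""
    n_rows = math.ceil(n_plots / n_cols)
    return n_rows, n_cols


# Example: five files laid out in two columns need three rows
rows, cols = grid_shape(5)
print(rows, cols)
```

The result would then be passed on as plt.subplot(rows, cols, count) in place of the fixed 1 and 2.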

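【Comments】:

【Answer 6】:

In case @Neill Herbst's answer (which I think is the simplest approach) does not work as expected for some reason: I had problems reading the files, so I rewrote the code into a version that worked for me.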

import os
import glob
import pandas as pd
import matplotlib.pyplot as plt

os.chdir(r'path')  # replace 'path' with the folder that holds the CSV files
for file in glob.glob("*.csv"):
    df1=pd.read_csv(file,header=1,sep=',')
    fig = plt.figure()
    plt.subplot(2, 1, 1)
    plt.plot(df1.iloc[:,[1]],df1.iloc[:,[2]])

    plt.subplot(2, 1, 2)
    plt.plot(df1.iloc[:,[3]],df1.iloc[:,[4]])
    plt.show()  # shows the plot for one CSV; close it and the next one is plotted
# plt.show()  <------ call it here instead if you want to see all the plots in separate windows

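【Comments】:

【Answer 7】:

Use p = Path(...): p → WindowsPath('so_data/files')
files = p.rglob(...) yields all files matching the pattern
file[0] → WindowsPath('so_data/files/data_1.csv')
p.parent / 'plots' / f'{file.stem}.png' → WindowsPath('so_data/plots/data_1.png')
p.parent → WindowsPath('so_data')
file.stem → data_1

This assumes all directories exist. Directory creation/checking is not included.

This example uses pandas, as does the OP. Plotting is done with pandas.DataFrame.plot, which uses matplotlib as the default backend. Use .iloc to specify the columns; x=0 will then always be the x-axis data, based on the given sample data.

Tested in python 3.8.11, pandas 1.3.2, matplotlib 3.4.3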

import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path

p = Path('so_data/files')  # specify the path to the files
files = p.rglob('data_*.csv')  # generator for all files based on rglob pattern

for file in files:
    df = pd.read_csv(file, header=0, sep=',')  # specify header row and separator as needed
    fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(7, 5))
    df.iloc[:, [0, 1]].plot(x=0, ax=ax1)  # plot 1st x/y pair; assumes x data is at position 0
    df.iloc[:, [2, 3]].plot(x=0, ax=ax2)  # plot 2nd x/y pair; assumes x data is at position 0
    fig.savefig(p.parent / 'plots' / f'{file.stem}.png')
    plt.close(fig)  # close each figure, otherwise they stay in memory
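
Since the answer states that directory creation is not included, here is a minimal sketch of that missing step together with the output-path construction. It runs in a temporary folder so it does not touch real data; plot_path is a hypothetical helper and the temporary base directory only stands in for so_data.

```python
import tempfile
from pathlib import Path


def plot_path(csv_file, plots_dir):
    """Return the .png path inside plots_dir that matches a .csv file name."""
    return plots_dir / f'{csv_file.stem}.png'


base = Path(tempfile.mkdtemp())  # temporary stand-in for 'so_data'
plots_dir = base / 'plots'

# Create the output folder if it is missing; do nothing if it already exists
plots_dir.mkdir(parents=True, exist_ok=True)

out = plot_path(base / 'files' / 'data_1.csv', plots_dir)
print(out.name)
```

In the loop above, the mkdir call would run once before the loop, on p.parent / 'plots'.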

Sample data

This is for testing the plotting code. Create the so_data/files directory manually.

df = pd.DataFrame({'x1': [5.0, 6.0, 7.0, 8.0, 9.0], 'y1': [60, 70, 80, 90, 100], 'x2': [5.5, 6.5, 7.5, 8.5, 9.5], 'y2': [500, 600, 700, 800, 900]})

for x in range(1, 1001):
    df.to_csv(f'so_data/files/data_{x}.csv', index=False)

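Alternate answer

This answer addresses cases where there are many consecutive pairs of x/y columns.
df.columns creates an array of columns, which can be chunked into pairs.
For consecutive column pairs, this answer works: list(zip(*[iter(df.columns)]*2)) → [('x1', 'y1'), ('x2', 'y2')]
If necessary, use some other pattern to create the column pairs.
Use .loc, since there will be column names, instead of .iloc, which takes column indices.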

p = Path('so_data/files')
files = p.rglob('data_*.csv')

for file in files:
    df = pd.read_csv(file, header=0, sep=',')
    col_pair = list(zip(*[iter(df.columns)]*2))  # extract column pairs
    fig, axes = plt.subplots(len(col_pair), 1)  # a number of subplots based on number of col_pairs
    axes = axes.ravel()  # flatten the axes if necessary
    for cols, ax in zip(col_pair, axes):
        df.loc[:, cols].plot(x=0, ax=ax)  # assumes x data is at position 0
    fig.savefig(p.parent / 'plots' / f'{file.stem}.png')
    plt.close(fig)
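
The pairing idiom list(zip(*[iter(df.columns)]*2)) is compact but not obvious: it passes the same iterator to zip twice, so every tuple consumes two consecutive names. The sketch below spells that out on a plain list; the column names and the helper name pair_up are assumptions for illustration.

```python
def pair_up(names):
    """Group consecutive items into 2-tuples; an unpaired last item is dropped."""
    it = iter(names)          # one shared iterator
    return list(zip(it, it))  # zip pulls two items from it for every tuple


columns = ['x1', 'y1', 'x2', 'y2', 'x3', 'y3']  # hypothetical header row
pairs = pair_up(columns)
print(pairs)
```

Note that an odd number of columns silently loses the last one, so a file with an extra index column would need that column removed first.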

【Comments】: