Parsing a CSV file with python and convert to Pandas Dataframe to graph for a Django template

Posted 2023-03-11


Originally posted: 2015-02-11 15:24:11

Question:

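I'm developing a web application that lets customers upload CSV files (which, unfortunately, conform to almost no standard). The application parses the CSV to extract the relevant data and then displays that data, via a Pandas DataFrame, as a matplotlib graph in a Django HTML template.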

Here is an example of a CSV file that might be uploaded:

Shock Name, 
Shock ID, 
Vehicle, 
Location, 
Compression Valving, 
Rebound Valving, 
Piston Valving, 
Other Valving, 
Compression Setting, 
Rebound Setting, 
Preload Setting, 
Notes, 
, 
, 
Measured_Stroke, 2.00 in
Test_Temperature, 79.58 F
Peak_Velocity, 9.98 in/sec
, 
Amplitude, 1.00 in
Test_Period, 0.01 sec
Gas_Force, 9.25 lbs
Test_Speed, 1.00 in/sec


Compression Velocity, Compression Force, Rebound Velocity, Rebound Force
in/sec, lbs, in/sec, lbs
-8.373589E-03, 6.810879, -8.373589E-03, 6.810879
-0.9864202, 140.6932, 0.9310969, -170.4664
-1.97424, 158.4015, 1.915599, -388.0251
-2.984882, 171.0502, 2.903838, -410.7928
-3.976808, 178.6395, 3.910722, -425.9714
-4.987449, 186.2288, 4.898961, -441.15
-5.941944, 191.2883, 5.905845, -451.269
-6.952637, 198.8775, 6.894975, -463.9178
-7.963353, 203.937, 7.865953, -474.0368
-8.955353, 208.9965, 8.855605, -486.6855
-9.947352, 214.056, 9.882603, -494.2748

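The various variable names at the top of the file are just settings saved by the original program that generated the CSV. The relevant data starts at the row containing Compression Velocity, Compression Force, etc., and continues to the end of the file. In fact, the only data actually needed for the plot are the first and second columns, i.e. Compression Velocity and Compression Force. The units row below the header is needed because the graph must be in metric units: when building the plot, the values in each column have to be converted (e.g. "in/sec" to "meters/sec"), but the units row itself is not part of the graph.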
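The conversion described above can be sketched as a small lookup table keyed by the strings in the units row. This is an illustration under assumptions: it only covers the unit strings that appear in the sample (`in/sec`, `lbs`, and `in` from the settings block), it treats `lbs` as pound-force since the column is a force, and `METRIC_FACTORS` and `to_metric` are hypothetical names. The factors themselves are exact by definition: 1 in = 0.0254 m and 1 lbf = 4.4482216152605 N.

```python
# Hypothetical conversion table: source unit -> (metric unit, multiplicative factor).
# 1 inch = 0.0254 m exactly; 1 pound-force = 4.4482216152605 N exactly.
METRIC_FACTORS = {
    "in": ("m", 0.0254),
    "in/sec": ("m/s", 0.0254),
    "lbs": ("N", 4.4482216152605),  # assumes 'lbs' means pound-force
}


def to_metric(values, unit):
    """Convert a list of numbers expressed in `unit` to metric.

    Returns (metric_unit, converted_values). Surrounding whitespace in the
    unit string is ignored; a unit missing from the table raises KeyError.
    """
    metric_unit, factor = METRIC_FACTORS[unit.strip()]
    return metric_unit, [v * factor for v in values]


# Usage with the first two columns of the sample data and its units row.
units = ["in/sec", " lbs"]
velocity = [-8.373589e-03, -0.9864202, -1.97424]
force = [6.810879, 140.6932, 158.4015]

vel_unit, velocity_si = to_metric(velocity, units[0])
force_unit, force_si = to_metric(force, units[1])
print(vel_unit, velocity_si)
print(force_unit, force_si)
```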

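My idea was to scan the CSV until I find an instance of "Compression Velocity" and then use that row as the DataFrame's header row. I believe I'm doing that part correctly, but I can't build the graph correctly. Here is my attempt: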

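import re

import matplotlib.pyplot as plt
import pandas as pd
from django.http import HttpResponse
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure


def graph(request):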
    pd.set_option('display.mpl_style', 'default')
    plt.rcParams['figure.figsize'] = (15,10)

    # get filename from sessions id
    new_file = request.session.get('docFile')

    # Process csv file --> 
    raw_data = open(new_file, 'rb').read()  
    rows = re.split('\n', raw_data.decode())

    for index, row in enumerate(rows):
        cells = row.split(',')
        if 'Compression Velocity' in cells:     # scan for the string 'Compression Velocity' in csv file read   
            header_names = cells
            header_row = index
            break
        else:
            header_names = ''
            header_row = 0

    fig = Figure()
    ax = fig.add_subplot(111)
    ax.set_xlabel("Velocity")
    ax.set_ylabel("Force")
    data_df = pd.read_csv(new_file, header=header_row-2)
    data_df = pd.DataFrame(data_df)
    data_df.plot(ax=ax, title="Roehrig Shock Data", style="-o")

    canvas = FigureCanvas(fig)
    response = HttpResponse( content_type = 'image/png')
    canvas.print_png(response)
    return response
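One detail that may explain the `header_row-2` offset in the attempt above: `header_row` comes from `enumerate()` over the raw lines, so it counts the two blank lines above the header, whereas the pandas documentation states that `header=` ignores empty lines when `skip_blank_lines=True` (the default). The toy text below (made up for this demo, not one of the customer files) also shows a second effect: the units row becomes the first data row, so the columns come back as strings rather than numbers.

```python
import io

import pandas as pd

# Toy file: one metadata line, two blank lines, the header, the units row, data.
text = (
    "Test_Speed,1.00 in/sec\n"
    "\n"
    "\n"
    "Compression Velocity,Compression Force\n"
    "in/sec,lbs\n"
    "-0.9864202,140.6932\n"
    "-1.97424,158.4015\n"
)

# Index of the header among the *raw* lines (blank lines included).
raw_index = next(i for i, line in enumerate(text.splitlines())
                 if line.startswith("Compression Velocity"))

# header= does not count blank lines, so the two blank lines are subtracted
# to land on the same row.
df = pd.read_csv(io.StringIO(text), header=raw_index - 2)

print(raw_index)        # raw line number of the header
print(list(df.columns))
print(df.head(1))       # the units row is now the first data row
print(df.dtypes)        # non-numeric: the units row forced the columns to strings
```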

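After that, I tried parsing the CSV, storing the relevant rows in a NumPy array and then constructing the DataFrame from that array, but that didn't work either. Here is that attempt: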

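import csv
import re
from itertools import islice

import numpy as np
import pandas as pd
from django.http import HttpResponse
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
from matplotlib.figure import Figure


def graph(request):
    # get filename from sessions id
    new_file = request.session.get('docFile')

    raw_data = open(new_file, 'rb').read()
    rows = re.split('\n', raw_data.decode())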

    for index, row in enumerate(rows):
        cells = row.split(',')
        if 'Compression Velocity' in cells:     # scan for the string 'Compression Velocity' in csv file read   
            header_names = cells
            header_row = index
            break
        else:
            header_names = ''
            header_row = 0

    useable_data = []
    csv_reader = csv.reader(open(new_file, 'r'))

    for row in islice(csv_reader, header_row, None):
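        # `cells` still holds the header row found by the scan above,
        # so this test is true for every row from header_row onwards
        if 'Compression Velocity' in cells: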
            useable_data.append(row)
        else:
            continue

    useable_data = np.array(useable_data)

    fig = Figure()
    ax = fig.add_subplot(111)
    ax.set_xlabel("Velocity")
    ax.set_ylabel("Force")
    data_df = pd.read_csv(new_file, header=header_row)
    data_df = pd.DataFrame(useable_data, columns=['Compression Velocity', 'Force'])
    data_df = pd.DataFrame.from_csv(new_file, header=header_row, index_col=True)
    data_df.plot(ax=ax, title="Roehrig Shock Data", style="-o")

    canvas = FigureCanvas(fig)
    response = HttpResponse( content_type = 'image/png')
    canvas.print_png(response)
    return response
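For comparison, a sketch of the array-based route that collects only the rows below the units line and converts them to floats before the DataFrame is built. It assumes the header row starts with `Compression Velocity` and is followed by exactly one units row; `load_compression_data` is a hypothetical name, and `csv.reader(..., skipinitialspace=True)` is used to drop the space after each comma.

```python
import csv
import io
from itertools import islice

import numpy as np
import pandas as pd


def load_compression_data(f):
    """Return a DataFrame with the first two (numeric) columns of a shock file.

    `f` is any text file object. Assumes the header row starts with
    'Compression Velocity' and is followed by exactly one units row.
    """
    rows = list(csv.reader(f, skipinitialspace=True))
    header_row = next(i for i, row in enumerate(rows)
                      if row and row[0] == "Compression Velocity")
    header = rows[header_row]
    # Skip the header and the units row; ignore blank lines (empty lists).
    data = [row for row in islice(rows, header_row + 2, None) if row]
    values = np.array(data, dtype=float)  # strings -> floats
    return pd.DataFrame(values[:, :2], columns=header[:2])


sample = io.StringIO(
    "Gas_Force, 9.25 lbs\n"
    "\n"
    "Compression Velocity, Compression Force, Rebound Velocity, Rebound Force\n"
    "in/sec, lbs, in/sec, lbs\n"
    "-0.9864202, 140.6932, 0.9310969, -170.4664\n"
    "-1.97424, 158.4015, 1.915599, -388.0251\n"
)
df = load_compression_data(sample)
print(df)
```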

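I know it's hard to ask a question on SO without an error message, but my problem is that I currently have no error message to give. Everything that goes wrong comes from logic errors rather than syntax errors. If you can think of a better way to achieve this goal, I'd love to hear it; or if you can spot an obvious mistake in my code above that would lead to a solution, I'd appreciate that too. Thanks for your help.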

Edit:

My current DataFrame:

-8.373589E-03, 6.810879, -8.373589E-03, 6.810879.1
-0.9864202,140.6932,0.9310969000000001,-170.4664
-1.9742400000000002,158.4015,1.915599,-388.0251
-2.984882,171.0502,2.903838,-410.7928
-3.976808,178.6395,3.9107220000000003,-425.9714
-4.987449,186.2288,4.898961,-441.15
-5.941944,191.2883,5.905844999999999,-451.269
-6.952636999999999,198.8775,6.894975,-463.9178
-7.963353,203.937,7.865953,-474.0368
-8.955353,208.9965,8.855605,-486.6855
-9.947352,214.05599999999998,9.882603,-494.2748

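Following Paul H's suggestion, I added the skiprows argument, but because I'm skipping the row with the units, it also seems to ignore the column header row.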

Reading the data:

data_df = pd.read_csv(new_file, index_col=0, skiprows=header_row+2)  # skip the row with the units
data_df = pd.DataFrame(data_df)
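Regarding the skiprows problem just described: an integer `skiprows` drops that many raw lines from the top, so skipping past the units row necessarily drops the header too. `skiprows` also accepts a list of 0-indexed line numbers (blank lines included), which allows skipping everything above the header plus only the units line below it. A minimal sketch; `read_shock_table` is a hypothetical helper, it assumes the header line starts with "Compression Velocity", and `skipinitialspace=True` is added so the column names lose the space after each comma.

```python
import io

import pandas as pd


def read_shock_table(text):
    """Parse the data table of a shock CSV (hypothetical helper)."""
    lines = text.splitlines()
    # Raw line number of the header, blank lines included.
    header_row = next(i for i, line in enumerate(lines)
                      if line.startswith("Compression Velocity"))
    # Skip every line above the header, plus the units line right below it.
    skip = list(range(header_row)) + [header_row + 1]
    return pd.read_csv(io.StringIO(text), skiprows=skip, skipinitialspace=True)


text = (
    "Shock Name, \n"
    "Test_Speed, 1.00 in/sec\n"
    "\n"
    "\n"
    "Compression Velocity, Compression Force, Rebound Velocity, Rebound Force\n"
    "in/sec, lbs, in/sec, lbs\n"
    "-8.373589E-03, 6.810879, -8.373589E-03, 6.810879\n"
    "-0.9864202, 140.6932, 0.9310969, -170.4664\n"
)
df = read_shock_table(text)
print(df.dtypes)
```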

Comments:

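Why don't you just parse the data directly with pandas, making heavy use of the skiprows kwarg?

Can't believe I missed that in the docs, thanks! Do you have any idea how I could plot only the second column, i.e. "Compression Force"?

df['column name'].plot(...)

That's not a bad idea, but it gives a KeyError, which, if I understand KeyErrors correctly, means it can't find a column with that header name.

Yes, you need to replace 'column name' with the column you actually want to plot.

Answer 1: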

To add to Paul H's answer, you can find out how many rows need to be skipped like this:

with open('file.txt') as f:
    skip_rows = next(i for i, line in enumerate(f) 
                     if line.startswith('Compression Velocity'))
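The `next()` call above raises `StopIteration` when no line starts with the marker, which is possible with uploads that follow no fixed format. A hedged variant that also ignores leading whitespace and raises a clearer error; `find_header_line` and its `marker` parameter are hypothetical names. The returned number counts every raw line, blank ones included, which is what an integer `skiprows` expects.

```python
import io


def find_header_line(f, marker="Compression Velocity"):
    """Return the 0-based raw line number of the first line starting with `marker`.

    Leading whitespace on a line is ignored. Raises ValueError if no line matches.
    """
    index = next((i for i, line in enumerate(f)
                  if line.lstrip().startswith(marker)), None)
    if index is None:
        raise ValueError(f"no line starting with {marker!r} found")
    return index


sample = io.StringIO(
    "Shock Name, \n"
    "\n"
    "  Compression Velocity, Compression Force\n"
    "in/sec, lbs\n"
)
print(find_header_line(sample))
```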


Answer 2:

Here's how I would read the file:

from io import StringIO

import pandas

csv = StringIO("""\
Shock Name, 
Shock ID, 
Vehicle, 
Location, 
Compression Valving, 
Rebound Valving, 
Piston Valving, 
Other Valving, 
Compression Setting, 
Rebound Setting, 
Preload Setting, 
Notes, 
, 
, 
Measured_Stroke, 2.00 in
Test_Temperature, 79.58 F
Peak_Velocity, 9.98 in/sec
, 
Amplitude, 1.00 in
Test_Period, 0.01 sec
Gas_Force, 9.25 lbs
Test_Speed, 1.00 in/sec


Compression Velocity, Compression Force, Rebound Velocity, Rebound Force
in/sec, lbs, in/sec, lbs
-8.373589E-03, 6.810879, -8.373589E-03, 6.810879
-0.9864202, 140.6932, 0.9310969, -170.4664
-1.97424, 158.4015, 1.915599, -388.0251
-2.984882, 171.0502, 2.903838, -410.7928
-3.976808, 178.6395, 3.910722, -425.9714
-4.987449, 186.2288, 4.898961, -441.15
-5.941944, 191.2883, 5.905845, -451.269
-6.952637, 198.8775, 6.894975, -463.9178
-7.963353, 203.937, 7.865953, -474.0368
-8.955353, 208.9965, 8.855605, -486.6855
-9.947352, 214.056, 9.882603, -494.2748
""")

df = pandas.read_csv(csv, skiprows=24).drop(0, axis=0).astype(float)

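.drop(0, axis=0) removes the units row (row 0 of the DataFrame, i.e. the line right after the header). .astype(float) converts everything to numbers. How you determine skiprows=24 is up to you.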

I get:

   Compression Velocity  Compression Force  Rebound Velocity  Rebound Force
1         -8.373589E-03           6.810879     -8.373589E-03       6.810879
2            -0.9864202           140.6932         0.9310969      -170.4664
3              -1.97424           158.4015          1.915599      -388.0251
4             -2.984882           171.0502          2.903838      -410.7928
5             -3.976808           178.6395          3.910722      -425.9714
6             -4.987449           186.2288          4.898961        -441.15
7             -5.941944           191.2883          5.905845       -451.269
8             -6.952637           198.8775          6.894975      -463.9178
9             -7.963353            203.937          7.865953      -474.0368
10            -8.955353           208.9965          8.855605      -486.6855
11            -9.947352            214.056          9.882603      -494.2748

Now:

df['Compression Velocity'].plot(legend=False)

gives me:

[Figure: line plot of the Compression Velocity column against the row index]
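Tying this back to the Django view in the question, a sketch that plots only Compression Force against Compression Velocity and renders the figure to PNG bytes with matplotlib's Agg canvas, the same `Figure`/`FigureCanvas` pair the question uses. It assumes the DataFrame was read as in this answer, without `skipinitialspace=True`, so every column name after the first keeps a leading space (`' Compression Force'`), which is one likely cause of the KeyError mentioned in the question's comments. `render_png` is a hypothetical name.

```python
import io

import pandas as pd
from matplotlib.backends.backend_agg import FigureCanvasAgg
from matplotlib.figure import Figure


def render_png(df):
    """Plot Compression Force against Compression Velocity; return PNG bytes."""
    # Strip the stray space left after each comma in the header row.
    df = df.rename(columns=str.strip)

    fig = Figure(figsize=(15, 10))
    ax = fig.add_subplot(111)
    df.plot(x="Compression Velocity", y="Compression Force", ax=ax,
            style="-o", legend=False, title="Roehrig Shock Data")
    ax.set_xlabel("Velocity")
    ax.set_ylabel("Force")

    # Render off-screen with the Agg backend into an in-memory buffer.
    buf = io.BytesIO()
    FigureCanvasAgg(fig).print_png(buf)
    return buf.getvalue()


# Usage with a tiny frame whose second column name still has the stray space.
df = pd.DataFrame({
    "Compression Velocity": [-0.9864202, -1.97424],
    " Compression Force": [140.6932, 158.4015],
})
png_bytes = render_png(df)
```

In the Django view, those bytes could then be returned with `HttpResponse(png_bytes, content_type='image/png')`.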

Comments:

Thanks for your help. I really appreciate it.
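A variant of the `.drop(0, axis=0).astype(float)` step from Answer 2, sketched under the assumption that any row containing a non-numeric cell (such as the units row) should simply be discarded, so the code does not need to know where that row sits. `pd.to_numeric(..., errors='coerce')` turns unparseable cells into NaN and `dropna()` removes those rows. This is a swapped-in technique, not what that answer does, and it would also silently drop genuinely malformed data rows.

```python
import io

import pandas as pd

text = (
    "Compression Velocity, Compression Force\n"
    "in/sec, lbs\n"
    "-8.373589E-03, 6.810879\n"
    "-0.9864202, 140.6932\n"
)

df = pd.read_csv(io.StringIO(text), skipinitialspace=True)
# Cells that are not numbers (here the units row) become NaN ...
df = df.apply(pd.to_numeric, errors="coerce")
# ... and every row containing a NaN is dropped.
df = df.dropna().reset_index(drop=True)
print(df)
```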
