Print/Plot only a specific column for some specific stations from CSV file

Posted 2023-03-11


【Title】: Print/Plot only a specific column for some specific stations from CSV file 【Posted】: 2019-05-27 15:01:12 【Question】:

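I have written a charging simulation program that simulates different electric cars arriving at different charging stations to charge.

When the simulation finishes, the program creates CSV files for the charging stations, containing per-hour statistics and per-day statistics. For now, the per-hour statistics CSV is the one that matters to me.

I want to plot queue_length_per_hour (how many cars are waiting in the queue, for every hour from 0 to 24) for different stations.

The problem is that I do not want to include all stations, because there are too many of them, so I think 3 stations are enough.

Which 3 stations should I choose? I picked the 3 stations that were visited by the most cars during the day (which I can see at hour 24).

As you can see in the code, I used pandas filtering so that I can pick the top 3 stations based on which had the most visited cars at hour 24 in the CSV file.

Now that I have the top 3 stations, I want to plot the entire cars_in_queue_per_hour column for them, not only hour 24 but starting from hour 0.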

from time import sleep
import pandas as pd
import csv
import matplotlib.pyplot as plt


file_to_read = pd.read_csv('results_per_hour/hotspot_districts_results_from_simulation.csv', sep=";",encoding = "ISO-8859-1")


read_columns_of_file = file_to_read.columns

read_description = file_to_read.describe()


visited_cars_at_hour_24 = file_to_read["hour"] == 24

filtered = file_to_read.where(visited_cars_at_hour_24, inplace = True, axis=0)

top_three = (file_to_read.nlargest(3, 'visited_cars')) 
# This picks the top 3 stations based on how many visited cars they had during the day

#print("Top three stations based on number of visited cars:\n{}".format(top_three))

#print(type(top_three))
top_one_station = (top_three.iloc[0]) # HOW CAN I PLOT QUEUE_LENGTH_PER_HOUR COLUMN FROM THIS STATION TO A GRAPH?
top_two_station = (top_three.iloc[1]) # HOW CAN I ALSO PLOT QUEUE_LENGTH_PER_HOUR COLUMN FROM THIS STATION TO A GRAPH?
top_three_station = (top_three.iloc[2]) # AND ALSO THIS?
#print(top_one_station)

#print(file_to_read.where(file_to_read["name"] == "Vushtrri"))

#for row_index, row in top_three.iterrows():
#  print(row)
#  print(row_index)
#  print(file_to_read.where(file_to_read["name"] == row["name"]))
#  print(file_to_read.where(file_to_read["name"] == row["name"]).columns)


xlabel = []
for hour in range(0,25):
    xlabel.append(hour)
ylabel = [0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0] # how to append queue length per hour for the top 3 stations here?

plt.plot(xlabel,ylabel)
plt.show()

The code and the CSV file are also available at this repl.it link: https://repl.it/@raxor2k/almost-done

【Comments】:

【Answer 1】:

I really like the seaborn package for making this kind of plot, so I would use

import seaborn as sns
df_2 = file_to_read[file_to_read['name'].isin(top_three['name'])]
sns.catplot(x='hour', y='cars_in_queue_per_hour', data=df_2, hue='name', kind='point')  # factorplot was renamed catplot in seaborn 0.9; kind='point' keeps its old default

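You have already selected the top three names, so the only relevant part is to use Series.isin to select the rows of the data frame whose name matches one of the top three, and then let seaborn draw the plot.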
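To make the row selection concrete, here is a minimal sketch on a toy data frame. The station names and numbers are made up for illustration; only the column names follow the ones used in this thread.

```python
import pandas as pd

# Toy stand-in for the hourly CSV; station names and values are made up.
df = pd.DataFrame({
    "name": ["A", "A", "B", "B", "C", "C", "D", "D"],
    "hour": [23, 24, 23, 24, 23, 24, 23, 24],
    "visited_cars": [10, 12, 40, 45, 20, 25, 30, 33],
    "cars_in_queue_per_hour": [1, 0, 4, 3, 2, 2, 3, 1],
})

# Top three stations by visited_cars, looking only at the hour-24 rows.
top_three = df[df["hour"] == 24].nlargest(3, "visited_cars")

# Keep every hourly row whose station name is among those three.
df_2 = df[df["name"].isin(top_three["name"])]

print(sorted(df_2["name"].unique().tolist()))
```

The boolean mask returned by isin has one entry per row of df, so df_2 keeps all hours of the selected stations rather than just the hour-24 rows.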

For the seaborn snippet to work on your data, be sure to change one line of your code by removing inplace:

filtered = file_to_read.where(visited_cars_at_hour_24, axis=0)
top_three = (filtered.nlargest(3, 'visited_cars'))

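This leaves your original data frame intact, so all of its data remains available. If you use inplace, you cannot assign the result back: the operation is performed in place and returns None.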
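The difference can be checked on a small example. This is only a sketch with made-up values; the scratch copy exists purely so that both variants can be shown side by side.

```python
import pandas as pd

# Toy data (made up); floats so that NaN can be stored without a dtype change.
df = pd.DataFrame({"hour": [23.0, 24.0, 24.0], "visited_cars": [5.0, 7.0, 9.0]})
mask = df["hour"] == 24

# Without inplace: where returns a new frame and df keeps all of its values.
filtered = df.where(mask, axis=0)

# With inplace=True: the frame itself is modified and the call returns None.
scratch = df.copy()
returned = scratch.where(mask, axis=0, inplace=True)

print(returned)                               # the return value of the inplace call
print(df["visited_cars"].tolist())            # original values are still present
print(scratch["visited_cars"].isna().sum())   # rows blanked out in place
```

Note that where does not drop the non-matching rows; it replaces their values with NaN, which nlargest then ignores.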

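I removed the lines that are not needed for the plot, so the complete code to reproduce it would be

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
file_to_read = pd.read_csv('results_per_hour/hotspot_districts_results_from_simulation.csv', sep=";", encoding="ISO-8859-1")
top_three = file_to_read[file_to_read['hour'] == 24].nlargest(3, 'visited_cars')
df_2 = file_to_read[file_to_read['name'].isin(top_three['name'])]
sns.catplot(x='hour', y='cars_in_queue_per_hour', data=df_2, hue='name', kind='point')  # catplot replaced factorplot in seaborn 0.9
plt.show()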
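If you would rather stay with plain matplotlib, as in the xlabel/ylabel attempt from the question, one possible sketch is to draw one line per station with groupby. The data below is made up, and in practice the frame would be the filtered df_2.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy stand-in for the filtered hourly data (made-up stations and values).
df = pd.DataFrame({
    "name": ["A"] * 3 + ["B"] * 3,
    "hour": [0, 1, 2] * 2,
    "cars_in_queue_per_hour": [0, 2, 1, 3, 1, 4],
})

fig, ax = plt.subplots()
# One line per station: hours on the x-axis, queue length on the y-axis.
for station, rows in df.groupby("name"):
    rows = rows.sort_values("hour")
    ax.plot(rows["hour"], rows["cars_in_queue_per_hour"], marker="o", label=station)

ax.set_xlabel("hour")
ax.set_ylabel("cars_in_queue_per_hour")
ax.legend(title="name")
# plt.show()  # uncomment to display the figure
```

This avoids building the x and y lists by hand, since each group already carries its own hour and queue columns.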

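【Comments】:

Thanks, I will! Just one "extra question": say I now also want to plot the "percentage of chargers used per hour". Is it possible to add that to the same graph, or is the most logical way to make a separate plot just for the chargers used per hour?

It is possible, see for example ***.com/questions/33925494/…. Whether it is logical and visually clear is, I would say, up to you. Good luck! :)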
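One common way to put two quantities with different scales on the same graph is a second y-axis created with twinx. The sketch below is only an illustration and not necessarily what the linked answer describes. The data is made up, and chargers_used_percent is a hypothetical column name, since the real one does not appear in this thread.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data for one station; "chargers_used_percent" is a hypothetical column name.
one_station = pd.DataFrame({
    "hour": [0, 1, 2, 3],
    "cars_in_queue_per_hour": [0, 2, 5, 3],
    "chargers_used_percent": [10.0, 40.0, 100.0, 75.0],
})

fig, ax_queue = plt.subplots()
ax_queue.plot(one_station["hour"], one_station["cars_in_queue_per_hour"], color="tab:blue")
ax_queue.set_xlabel("hour")
ax_queue.set_ylabel("cars in queue", color="tab:blue")

# Second y-axis sharing the same x-axis, so the percentage keeps its own scale.
ax_pct = ax_queue.twinx()
ax_pct.plot(one_station["hour"], one_station["chargers_used_percent"], color="tab:orange")
ax_pct.set_ylabel("chargers used (%)", color="tab:orange")
ax_pct.set_ylim(0, 100)
# plt.show()  # uncomment to display the figure
```

With three stations and two metrics each, a single graph gets crowded quickly, so a separate plot per metric may be easier to read.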
