Using pandas/matplotlib/python, I cannot visualize my csv file as clusters

Posted 2023-03-11

【Asked】: 2015-09-24 00:45:39 【Question】:

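My csv file is here: https://github.com/camenergydatalab/EnergyDataSimulationChallenge/blob/master/challenge2/data/total_watt.csv

I want to visualize this csv file as clusters. My ideal result is something like the figure below. (Higher points (red areas) would be higher energy consumption, and lower points (blue areas) would be lower energy consumption.)

I want to put the date (e.g. 2011-04-18) on the x axis, the time (e.g. 13:22:00) on the y axis, and the energy consumption (e.g. 925.840613752523) on the z axis.

I successfully visualized the csv data file as values every 30 minutes using the following program.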

from matplotlib import style
from matplotlib import pylab as plt
import numpy as np

style.use('ggplot')

filename='total_watt.csv'
date=[]
number=[]

import csv
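with open(filename, newline='') as csvfile:  # text mode; csv.reader needs str in Python 3
    csvreader = csv.reader(csvfile, delimiter=',', quotechar='|')
    for row in csvreader:
        if len(row) ==2 :
            date.append(row[0])
            number.append(row[1])

# Convert the strings read from the file to floats so they plot as numbers
number=np.array(number, dtype=float)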

import datetime
for ii in range(len(date)):
    date[ii]=datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')

plt.plot(date,number)

plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')

plt.show()

I also successfully visualized the csv data file as values per day using the following program.

from matplotlib import style
from matplotlib import pylab as plt
import numpy as np
import pandas as pd

style.use('ggplot')

filename='total_watt.csv'
date=[]
number=[]

import csv
with open(filename, 'rb') as csvfile:

    df = pd.read_csv('total_watt.csv', parse_dates=[0], index_col=[0])
    # Newer pandas removed the how= argument; aggregate on the resampler instead
    df = df.resample('1D').sum()

import datetime
for ii in range(len(date)):
    date[ii]=datetime.datetime.strptime(date[ii], '%Y-%m-%d %H:%M:%S')

plt.plot(date,number)

plt.title('Example')
plt.ylabel('Y axis')
plt.xlabel('X axis')

df.plot()
plt.show()

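Although I can visualize the csv file as values every 30 minutes and per day, I have no idea how to visualize the csv data as clusters in 3D.

How can I program this?

【Comments】: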

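It's hard to tell exactly what you want from that picture; could you explain it a bit better? My current thinking is that you want to separate the date and the time, use the date as the x axis and the time as the y axis, and then use the data as the z axis. Also note that you don't have to open the csv file before calling pd.read_csv(). I'm not at my home computer right now, but I may be able to help once I get home.

Thanks for your comment, NightHallow! I want to visualize the energy data in a 3D graph, red where energy consumption is high and blue where it is low. Sorry, it is hard to explain what I want.. lol

【Answer 1】:

Your main issue is probably just reshaping your data so that you have the date along one dimension and the time along the other. Once you do that, you can use whatever plotting tool you like best (here I've used matplotlib's mplot3d, but it has some quirks).

What follows takes your data and reshapes it appropriately so you can plot a surface, which I believe is what you are looking for. The key is the pivot method, which reorganizes your data by date and time.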

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import axes3d

fname = 'total_watt.csv'

# Read in the data, but I skipped setting the index and made sure no data
# is lost to a nonexistent header
df = pd.read_csv(fname, parse_dates=[0], header=None, names=['datetime', 'watt'])

# We want to separate the date from the time, so create two new columns
df['date'] = [x.date() for x in df['datetime']]
df['time'] = [x.time() for x in df['datetime']]

# Now we want to reshape the data so we have dates and times making the result 2D
pv = df.pivot(index='time', columns='date', values='watt')

# Not every date has every time, so fill in the subsequent NaNs or there will be holes
# in the surface
pv = pv.fillna(0.0)

# Now, we need to construct some arrays that matplotlib will like for X and Y values
xx, yy = np.mgrid[0:len(pv),0:len(pv.columns)]

# We can now plot the values directly in matplotlib using mplot3d
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')

ax.plot_surface(xx, yy, pv.values, cmap='jet', rstride=1, cstride=1)
ax.grid(False)

# Now we have to adjust the ticks and ticklabels - so turn the values into strings
dates = [x.strftime('%Y-%m-%d') for x in pv.columns]
times = [str(x) for x in pv.index]

# Setting a tick every fifth element seemed about right
ax.set_xticks(xx[::5,0])
ax.set_xticklabels(times[::5])
ax.set_yticks(yy[0,::5])
ax.set_yticklabels(dates[::5])

plt.show()

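This gives me (using your data) the following graph:

Note that when plotting and making the ticks I have assumed that your dates and times are linear (which they are in this case). If you have data with uneven samples, you will have to do some interpolation before plotting.

【Comments】: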
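For unevenly sampled data, one possible approach (a sketch, not part of the original answer) is to put the readings onto a regular time grid with pandas before pivoting. The sample readings, the 30-minute grid, and the helper name `to_regular_grid` are assumptions made up for illustration; the column names follow the code above.

```python
import pandas as pd


def to_regular_grid(df, freq='30min'):
    """Resample irregular readings onto a regular grid and interpolate gaps.

    Hypothetical helper: assumes df has a 'datetime' column and a 'watt'
    column, as in the answer's code.
    """
    s = df.set_index('datetime')['watt'].sort_index()
    # Average the readings that fall in each bin; empty bins become NaN
    regular = s.resample(freq).mean()
    # Fill the NaNs by linear interpolation along the time index
    regular = regular.interpolate(method='time')
    return regular.reset_index()


# Made-up readings with a missing 14:00 sample
raw = pd.DataFrame({
    'datetime': pd.to_datetime(['2011-04-18 13:00:00',
                                '2011-04-18 13:30:00',
                                '2011-04-18 14:30:00']),
    'watt': [100.0, 200.0, 400.0],
})

regular = to_regular_grid(raw)

# Split date and time, then pivot exactly as in the answer
regular['date'] = regular['datetime'].dt.date
regular['time'] = regular['datetime'].dt.time
pv = regular.pivot(index='time', columns='date', values='watt')
print(pv)
```

Linear interpolation is only one choice here; whether it is appropriate depends on how large the gaps in the real data are.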

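I'd also suggest taking a look at Mayavi, specifically mlab, for 3D plotting. It can be easier, and you can usually do more with it (definitely not for every project, but it is good to know it exists).

Thanks. I'll check it out! :)))

How did you set the time as the x axis!?!?! Your code is "times = [str(x) for x in pv.index]" though..

I can understand your code except for ax.set_xticks(xx[::5,0]) ax.set_xticklabels(times[::5]) ax.set_yticks(yy[0,::5]) ax.set_yticklabels(dates[::5]). What does this "[::5]" mean!?!??!?

Egad, calm down. [::5] is pretty standard Python indexing (it is a step, meaning take every fifth element). If you don't know that, you should brush up on your basic Python. In this code the axes themselves are just integers; all I did was change the tick labels (strings) to match the appropriate times/dates. If you want to swap the time and the date (i.e. the date on x and the time on y), just swap them wherever they appear.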
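As a small illustration of the step slicing discussed in the comments, here is a minimal sketch. The `times` labels and the grid size are made up for the example; only the `[::5]` and `xx[::5, 0]` forms come from the answer's code.

```python
import numpy as np

# Made-up time labels, one per half hour
times = ['00:00', '00:30', '01:00', '01:30', '02:00', '02:30',
         '03:00', '03:30', '04:00', '04:30', '05:00', '05:30']

# A slice is start:stop:step. Leaving start and stop empty keeps the whole
# range, so [::5] takes every fifth element beginning with the first.
every_fifth = times[::5]
print(every_fifth)

# The same step works along one axis of a 2D array: every fifth row index,
# taken from the first column of the integer grid
xx, yy = np.mgrid[0:12, 0:3]
tick_positions = xx[::5, 0]
print(tick_positions)
```

Because the tick positions and the tick labels are sliced with the same step, each label lines up with the integer position it describes.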
