Column and Row manipulation Python Pandas
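[Title]: Column and Row manipulation Python Pandas
[Posted]: 2015-09-12 13:52:05
[Question]: This is my first program of my own in Pandas, and I am trying to perform some column-wise and row-wise operations on csv files. I have a transition repository that contains multiple files, and new files keep being added to it. I am trying to read the files dynamically, perform some operations, and write the result to a master csv file in another folder.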
Input
1. Folder_1: `Transition_Data`
Test_1.csv
Nos,Time,Count
-------------------
2341,12:00:00,9865
7352,12:00:00,8969
Test_2.csv
Nos,Time,Count
------------------
1234,12:30:00,7865
8435,12:30:00,7649
2. Folder_2: `Data_repository`: Master_2.csv
Nos,00:00:00
------------
1234,1000
8435,5243
2341,563
7352,345
3. Expected Output
Nos,00:00:00,12:00:00,12:30:00
----------------------------------
1234,1000,0,6865
8435,5243,0,2406
2341,563,9302,0
7352,345,8624,0
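Read the `Nos` column from the Transition_Data files and look up where each `Nos` appears in Master_2.csv. For each file, create a new column in Master_2.csv with the `Time` value as its header, subtract col[1] of Master_2.csv from col[2] (`Count`) of the Transition_Data file, and put the new value in the newly created column. Gaps in the data are filled with 0. For example, for Nos 1234 the 12:30:00 column holds 7865 - 1000 = 6865. I did try a few examples, but I messed up.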
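The "one new column per `Time`" step described above can be sketched on its own before any subtraction. This is a minimal illustration that builds the rows of Test_1.csv and Test_2.csv in memory instead of reading the folder; the variable names `transition` and `wide` are assumptions made for the example, not names from the question.

```python
import pandas as pd

# Rows copied from Test_1.csv and Test_2.csv in the question,
# stacked into one "long" table (one row per Nos/Time pair).
transition = pd.DataFrame({
    "Nos": [2341, 7352, 1234, 8435],
    "Time": ["12:00:00", "12:00:00", "12:30:00", "12:30:00"],
    "Count": [9865, 8969, 7865, 7649],
})

# Reshape to "wide": one row per Nos, one column per distinct Time.
# Nos/Time pairs that never occur are left as NaN (the "data gaps").
wide = transition.pivot(index="Nos", columns="Time", values="Count")
print(wide)
```

Each distinct `Time` becomes a column header, which is the shape the expected output needs; the gaps still have to be filled and the master values subtracted afterwards.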
Update: the program is shown below. I am now having trouble with the logic for routing the file reads and writes.
import pandas as pd
import os
import numpy as np
import glob
path_1 = '/Transition_Data/'
path_2 = 'Data_repository/Master_2.csv'
df_1 = pd.DataFrame(dict(Nos=Nos, Time=Time, Count=Count))
pivot = pd.pivot_table(path_1, '/.*CSV, index='Nos', columns='Time', values='Count')
df_master = pd.DataFrame('Master_2.csv', 'Nos':, '00:00:00':).set_index('Nos')
result = df_master.join(pivot, how='inner')
result[result.columns[1:]] = result[result.columns[1:]].sub(result[result.columns[0]], axis=0)
result.fillna(0)
I tried the program above and got the following error:
Traceback (most recent call last):
File "read_test.py", line 19, in <module>
df = pd.read_csv(filename, header='Count')
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 420, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 218, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 502, in __init__
self._make_engine(self.engine)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 610, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/lib/python2.7/dist-packages/pandas/io/parsers.py", line 972, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "parser.pyx", line 476, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4538)
TypeError: an integer is required
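The traceback points at `pd.read_csv(filename, header='Count')`. In pandas, the `header` argument of `read_csv` is the row number (or list of row numbers) that holds the column names, not the name of a column, which is most likely why this older version complains that an integer is required. A minimal sketch of the intended read, assuming the column names sit on the first line of each file; `csv_text` is a stand-in for one of the transition files:

```python
import io
import pandas as pd

# Stand-in for the contents of Test_1.csv (assumption: header on line 1).
csv_text = "Nos,Time,Count\n2341,12:00:00,9865\n7352,12:00:00,8969\n"

# header=0 says "the column names are on row 0"; it is a row index,
# not a column name. Omitting the argument has the same effect here.
df = pd.read_csv(io.StringIO(csv_text), header=0)

# A single column is selected after reading, by name.
counts = df["Count"]
print(counts.tolist())
```

If only some columns are wanted at read time, `usecols=["Nos", "Count"]` is the argument that takes column names.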
[Comments]:
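I'm voting to close this question because the user is no longer a member of Stack Overflow. Any responses or requests for clarification on this question will go unanswered.
[Answer 1]: The easiest way I can see is to concatenate them all into a single DataFrame, sort the columns by time, then shift and subtract to get the deltas: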
import pandas as pd
import os
path_1 = 'Transition_Data/'
path_2 = 'Data_repository/Master_2.csv'
# Read data, and combine "transition" data into
# single joined data frame
master = pd.read_csv(path_2)
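# Only read .csv files, so stray files in the folder do not break read_csv
other_data = pd.concat([
    pd.read_csv(os.path.join(path_1, f))
    for f in sorted(os.listdir(path_1)) if f.endswith('.csv')
])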
# Index master data frame by Nos
master.set_index('Nos', inplace=True)
# Index transition data by Nos and Time
other_data.set_index(['Nos', 'Time'], inplace=True)
# Convert to series (to remove Count column heading)
# and unstack time to convert to columns
other_data = other_data['Count'].unstack('Time')
# Join the data sets on the Time axis
joined = pd.concat([master, other_data], axis=1)
# Sort the data sets by Time
joined = joined.sort_index(axis=1)
# Fill na values with data in previous period
# (ffill replaces fillna(method='pad'), which newer pandas deprecates)
joined = joined.ffill(axis=1)
# Shift dataframe and subtract to get delta
delta = joined - joined.shift(axis=1).fillna(0)
print(delta)
This gives the output you want:
00:00:00 12:00:00 12:30:00
Nos
1234 1000 0 6865
2341 563 9302 0
7352 345 8624 0
8435 5243 0 2406
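The same pipeline can be checked without touching the file system. The sketch below is not the answer's exact code: it builds the question's sample data in memory, casts to float before the forward fill so every column has the same dtype, and casts back to integers at the end. The names `transition` and `wide` are assumptions made for the example.

```python
import pandas as pd

# Master_2.csv from the question, indexed by Nos.
master = pd.DataFrame(
    {"00:00:00": [1000, 5243, 563, 345]},
    index=pd.Index([1234, 8435, 2341, 7352], name="Nos"),
)

# Test_1.csv and Test_2.csv from the question, concatenated.
transition = pd.DataFrame({
    "Nos": [2341, 7352, 1234, 8435],
    "Time": ["12:00:00", "12:00:00", "12:30:00", "12:30:00"],
    "Count": [9865, 8969, 7865, 7649],
})

# One column per Time, aligned on Nos.
wide = transition.set_index(["Nos", "Time"])["Count"].unstack("Time")

# Join on Nos, order the columns by time, use one float dtype throughout.
joined = pd.concat([master, wide], axis=1).sort_index(axis=1).astype(float)

# Carry the last known value forward across gaps, then take the
# difference between each column and the one before it.
joined = joined.ffill(axis=1)
delta = (joined - joined.shift(axis=1).fillna(0)).astype(int)

print(delta)
```

Because gaps are forward-filled before differencing, a missing reading shows up as a delta of 0, which matches the 0 entries in the expected output. The row order of `delta` may differ from the listing above depending on the pandas version, so rows are best looked up by `Nos` with `delta.loc[...]`.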