Delimit a specific column and add them as columns in CSV (Python3, CSV)

Posted 2023-03-11

【Title】: Delimit a specific column and add them as columns in CSV (Python3, CSV) 【Posted】: 2016-02-08 05:01:14 【Question】:

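I have a CSV file with several columns, which I first split on semicolons (;). However, ONE column is delimited by a pipe |. I would like to split this column and create new columns from it.

Input: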

  Column 1    Column 2      Column 3
     1           2          3|4|5
     6           7          6|7|8
     10          11         12|13|14

Desired output:

  Column 1   Column 2      ID    Age  Height
     1          2          3      4    5 
     6          7          6      7    8
     10         11         12     13   14

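My code so far splits on ; first and then converts the result to a DataFrame (which is the end format I want):

import csv
import pandas as pd

delimit = list(csv.reader(open('test.csv', 'rt'), delimiter=';'))
df = pd.DataFrame(delimit)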

【Comments】:

You could parse the last column and split it.

【Answer 1】:

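import csv
import pandas as pd

with open('test.csv', newline='') as f:
    delimit = list(csv.reader(f, delimiter=';'))

for row in delimit:
    # Take the pipe-delimited last field off the row and append its parts.
    piped = row.pop()
    row.extend(piped.split('|'))

df = pd.DataFrame(delimit)

For the data rows, delimit ends up looking like: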

[
    ['1', '2', '3', '4', '5'],
    ['6', '7', '6', '7', '8'],
    ['10', '11', '12', '13', '14'],
]
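
The loop above treats every row the same way, including a header row if the file has one. As a minimal sketch, assuming the file starts with the header `Column 1;Column 2;Column 3` (the layout assumed in the next answer) and using an in-memory string as a hypothetical stand-in for test.csv, the header can be consumed first and the new column names supplied explicitly:

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for test.csv; the real file layout is an assumption.
raw = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;6|7|8\n10;11;12|13|14\n"

reader = csv.reader(io.StringIO(raw), delimiter=';')
header = next(reader)  # consume the header row so it is not treated as data

rows = []
for row in reader:
    piped = row.pop()                    # last field, e.g. '3|4|5'
    rows.append(row + piped.split('|'))  # append its parts as separate fields

# Keep the first two header names; ID, Age, Height come from the question.
df = pd.DataFrame(rows, columns=header[:-1] + ['ID', 'Age', 'Height'])
```

All values are still strings at this point, because the csv module does no type conversion.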

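【Comments】:

【Answer 2】:

You don't show exactly what your data looks like (you say it is delimited by semicolons, but your example has none), but if it looks like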

Column 1;Column 2;Column 3
1;2;3|4|5
6;7;6|7|8
10;11;12|13|14

you could do

>>> df = pd.read_csv("test.csv", sep="[;|]", engine='python', skiprows=1, 
                     names=["Column 1", "Column 2", "ID", "Age", "Height"])
>>> df
   Column 1  Column 2  ID  Age  Height
0         1         2   3    4       5
1         6         7   6    7       8
2        10        11  12   13      14

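This works by using a regular-expression separator meaning "either ; or |" and by forcing the column names manually.

Alternatively, you could do it in a few steps: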
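
To see what the pattern `[;|]` does on its own, here is a small sketch using the standard `re` module on one sample line. Inside a character class the pipe is a literal character, not alternation, so the pattern matches a single `;` or a single `|`. pandas interprets separators longer than one character as regular expressions, which is why the Python parsing engine is requested in the call above.

```python
import re

# One data line in the layout assumed by this answer.
line = "10;11;12|13|14"

# Split wherever a single ';' or '|' occurs.
fields = re.split(r"[;|]", line)
```

Here `fields` is a list of five strings, one per output column.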

>>> df = pd.read_csv("test.csv", sep=";")
>>> df
   Column 1  Column 2  Column 3
0         1         2     3|4|5
1         6         7     6|7|8
2        10        11  12|13|14
>>> c3 = df.pop("Column 3").str.split("|", expand=True)
>>> c3.columns = ["ID", "Age", "Height"]
>>> df.join(c3)
   Column 1  Column 2  ID Age Height
0         1         2   3   4      5
1         6         7   6   7      8
2        10        11  12  13     14
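
Note that `str.split` produces strings, so in the multi-step version ID, Age and Height hold text rather than numbers, and `df.join(c3)` returns a new frame instead of modifying `df`. A minimal sketch of the same steps with an integer conversion and the result assigned, using an in-memory string as a hypothetical stand-in for test.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv, in the layout assumed by this answer.
raw = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;6|7|8\n10;11;12|13|14\n"

df = pd.read_csv(io.StringIO(raw), sep=";")

# Remove the piped column and split it into one column per part.
parts = df.pop("Column 3").str.split("|", expand=True)
parts.columns = ["ID", "Age", "Height"]

# Convert the split strings to integers, then attach them to the other columns.
result = df.join(parts.astype(int))
```

The `astype(int)` call assumes every part is a valid integer; it raises an error if any piece is empty or non-numeric.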

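【Comments】:

I get the following error when trying to run the second half of the code: TypeError: split() got an unexpected keyword argument 'expand'
@user3682157: You are probably using an older version of pandas.

【Answer 3】:

Using the csv lib and str.replace is actually quite a bit faster: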

import csv
import pandas as pd

with open("test.txt") as f:
    next(f)  # skip the header line
    # On Python 2, use itertools.imap in place of map
    df = pd.DataFrame.from_records(csv.reader(map(lambda x: x.rstrip().replace("|", ";"), f), delimiter=";"),
                                   columns=["Column 1", "Column 2", "ID", "Age", "Height"]).astype(int)

Some timings:

In [35]: %%timeit
pd.read_csv("test.txt", sep="[;|]", engine='python', skiprows=1,
                     names=["Column 1", "Column 2", "ID", "Age", "Height"])
   ....: 
100 loops, best of 3: 14.7 ms per loop

In [36]: %%timeit                                                             
with open("test.txt") as f:
    next(f)
    df = pd.DataFrame.from_records(csv.reader(map(lambda x: x.rstrip().replace("|", ";"), f),delimiter=";"),
                               columns=["Column 1", "Column 2", "ID", "Age", "Height"]).astype(int)
   ....: 
100 loops, best of 3: 6.05 ms per loop
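
The timings only matter if both approaches build the same frame. The following sketch checks that on the sample data, with an in-memory string as a hypothetical stand-in for test.txt. It says nothing about speed, which will vary with the machine, the pandas version and the file size.

```python
import csv
import io

import pandas as pd

# Hypothetical stand-in for test.txt, in the layout assumed in this thread.
raw = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;6|7|8\n10;11;12|13|14\n"
names = ["Column 1", "Column 2", "ID", "Age", "Height"]

# Approach 1: regex separator, parsed by the Python engine.
df_regex = pd.read_csv(io.StringIO(raw), sep="[;|]", engine="python",
                       skiprows=1, names=names)

# Approach 2: normalise '|' to ';' and let the csv module split each line.
f = io.StringIO(raw)
next(f)  # skip the header line
lines = (line.rstrip().replace("|", ";") for line in f)
records = list(csv.reader(lines, delimiter=";"))
df_replace = pd.DataFrame.from_records(records, columns=names).astype(int)

# Compare the parsed values row by row.
same = df_regex.values.tolist() == df_replace.values.tolist()
```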

You could also just use str.split:

with open("test.txt") as f:
    next(f)
    df = pd.DataFrame.from_records(map(lambda x: x.rstrip().replace("|", ";").split(";"), f),
                                   columns=["Column 1", "Column 2", "ID", "Age", "Height"])

【Comments】:

【Answer 4】:

Came up with a solution myself:

df = pd.DataFrame(delimit)
s = df['Column 3'].apply(lambda x: pd.Series(x.split('|')))
frame = pd.DataFrame(s)
# Label the split parts; the order follows the desired output in the question.
frame.rename(columns={0: 'ID', 1: 'Age', 2: 'Height'}, inplace=True)
result = pd.concat([df, frame], axis=1)
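
As written, this snippet depends on `df` already having a column labelled 'Column 3', and the concatenated result still contains that original piped column. A self-contained sketch of the same idea, under the assumption that the file has the header row shown in Answer 2 (an in-memory string stands in for test.csv here), which also drops the original column:

```python
import io

import pandas as pd

# Hypothetical stand-in for test.csv, with an assumed header row.
raw = "Column 1;Column 2;Column 3\n1;2;3|4|5\n6;7;6|7|8\n10;11;12|13|14\n"

df = pd.read_csv(io.StringIO(raw), sep=";")

# Returning a Series from apply yields one column per split part (labelled 0, 1, 2).
frame = df["Column 3"].apply(lambda x: pd.Series(x.split("|")))
frame = frame.rename(columns={0: "ID", 1: "Age", 2: "Height"})

# Drop the piped column before placing the new columns side by side.
result = pd.concat([df.drop(columns="Column 3"), frame], axis=1)
```

The new columns hold strings, as in the other split-based approaches.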

【Comments】: