How to remove carriage return in a dataframe

Posted 2023-03-12


Posted: 2016-09-06 18:05:46

Question:

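I have a dataframe with columns named id, country_name, location and total_deaths. While cleaning the data, I found a value in one row that has '\r' appended to it. After cleaning, I store the resulting dataframe in a destination.csv file. Because that particular row has \r appended, it always starts a new line in the file.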

id                               29
location            Uttar Pradesh\r
country_name                  India
total_deaths                     20

I want to remove the \r. I tried df.replace({'\r': ''}, regex=True), but it doesn't work for me.

Is there any other solution? Can anyone help?
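Whether replace seems to "do nothing" often depends on what the cell really contains: an actual carriage-return character, or the two characters backslash and r. These need different patterns. The sketch below checks every column at once, so no column names are needed. The sample DataFrame is hypothetical.

```python
import pandas as pd

# Hypothetical sample: one real CR character, one literal backslash + "r"
df = pd.DataFrame({
    "id": [29, 30],
    "location": ["Uttar Pradesh\r", "Kerala"],
    "country_name": ["India", "India\\r"],
    "total_deaths": [20, 128],
})

as_text = df.astype(str)
# regex=False: look for the exact characters, with no pattern interpretation
has_real_cr = as_text.apply(lambda col: col.str.contains("\r", regex=False))
has_literal = as_text.apply(lambda col: col.str.contains("\\r", regex=False))

print(has_real_cr)
print(has_literal)
```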

Edit:

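In the process above, I iterate over df to check whether \r is present, and replace it if it is. Neither row.replace() nor row.str.strip() seems to work here, or maybe I'm using them the wrong way.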

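I don't want to specify column names or row numbers when using replace(), because I can't be sure that only the location column will contain \r. The code is below.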

import re

count = 0
for row_index, row in df.iterrows():
    if re.search(r"\\r", str(row)):
        print(type(row))  # Return type is pandas.Series
        # replace() returns a new Series; the result is discarded here
        row.replace({r'\\r': ''}, regex=True)
        print(row)
        count += 1
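The loop above leaves df unchanged for two reasons: Series.replace returns a new Series, and the row yielded by iterrows() is generally a copy. If you want to keep the loop, the minimal sketch below writes each cleaned cell back with df.at. It assumes the cells contain the literal two-character text \r, as the comments further down show, and it uses a small hypothetical DataFrame.

```python
import pandas as pd

# Hypothetical data; the location cell holds a literal backslash + "r"
df = pd.DataFrame({
    "id": [29, 30],
    "location": ["Uttar Pradesh\\r", "Kerala"],
    "country_name": ["India", "India"],
    "total_deaths": [20, 128],
})

count = 0  # number of rows that needed cleaning
for row_index, row in df.iterrows():
    changed = False
    for col, value in row.items():
        if isinstance(value, str) and "\\r" in value:
            # `row` is a copy, so write the cleaned value back into df itself
            df.at[row_index, col] = value.replace("\\r", "")
            changed = True
    if changed:
        count += 1

print(count)
print(df)
```

A vectorized df.replace call is usually much faster than this loop. The loop form is only worth keeping if you need per-row logic.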

Comments:

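And df.replace({r'\\r': ''}, regex=True) doesn't work either? Why use iterrows()? I don't think it's necessary, and iterating is very slow.

I have no other way to iterate over df. df.replace({r'\\r': ''}, regex=True) doesn't work.

Answer 1: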

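Another solution is to use str.strip. Note that strip() treats its argument as a set of characters rather than a substring, so r'\\r' removes any leading or trailing backslash and r characters, not only the \r sequence: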

df['29'] = df['29'].str.strip(r'\\r')
print(df)
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
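Because str.strip() works on a set of characters, stripping r'\\r' can also remove a legitimate trailing r. The short sketch below demonstrates this on a hypothetical Series. It also shows that strip() with no argument removes a real carriage return, because that counts as whitespace, but leaves the literal text alone.

```python
import pandas as pd

# Hypothetical values: literal backslash + "r", a name ending in "r", a real CR
s = pd.Series(["Uttar Pradesh\\r", "Jaipur", "Kerala\r"])

# The argument is a SET of characters {"\\", "r"}, not the substring "\\r"
set_strip = s.str.strip(r"\\r").tolist()

# With no argument, strip() removes surrounding whitespace, which includes "\r"
ws_strip = s.str.strip().tolist()

print(set_strip)  # ['Uttar Pradesh', 'Jaipu', 'Kerala\r']
print(ws_strip)   # ['Uttar Pradesh\\r', 'Jaipur', 'Kerala']
```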

If you want to use replace, add the r prefix and one more backslash:

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
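If you cannot tell in advance whether a cell holds the literal text \r or a real carriage-return character, one option is a single regex that matches either. Applied to the whole frame, it needs no column names. The sketch below assumes the hypothetical pattern r"\\r|\r" and uses a hypothetical DataFrame.

```python
import pandas as pd

# Hypothetical sample: one real CR character, one literal backslash + "r"
df = pd.DataFrame({
    "id": [29, 30],
    "location": ["Uttar Pradesh\r", "Kerala"],
    "country_name": ["India", "India\\r"],
    "total_deaths": [20, 128],
})

# One regex over every column (no iterrows, no column names):
#   \\r  matches a literal backslash followed by "r"
#   \r   matches an actual carriage-return character
cleaned = df.replace(r"\\r|\r", "", regex=True)
print(cleaned)
```

Numeric columns such as total_deaths are left unchanged, because the regex replacement only matches string values.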

With replace you can also specify which column the replacement applies to:

print(df)
               id               29
0        location  Uttar Pradesh\r
1    country_name            India
2  total_deaths\r               20

print(df.replace({'29': {r'\\r': ''}}, regex=True))
               id             29
0        location  Uttar Pradesh
1    country_name          India
2  total_deaths\r             20

print(df.replace({r'\\r': ''}, regex=True))
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20
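The braces in these calls matter. replace accepts a pattern plus a value, a {pattern: replacement} dict, or a nested {column: {pattern: replacement}} dict. The sketch below compares these forms on a hypothetical copy of the frame above.

```python
import pandas as pd

# Hypothetical frame like the one above (cells hold a literal backslash + "r")
df = pd.DataFrame({
    "id": ["location", "country_name", "total_deaths\\r"],
    "29": ["Uttar Pradesh\\r", "India", "20"],
})

pattern = r"\\r"  # regex for a literal backslash followed by "r"

# Positional and dict forms are equivalent: both touch every column
a = df.replace(pattern, "", regex=True)
b = df.replace({pattern: ""}, regex=True)

# Nested dict {column: {pattern: replacement}} restricts the change to "29"
only_29 = df.replace({"29": {pattern: ""}}, regex=True)

print(a.equals(b))  # True
print(only_29)
```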

Edit based on the comments:

import pandas as pd

df = pd.read_csv('data_source_test.csv')
print(df)
   id country_name           location  total_deaths
0   1        India          New Delhi           354
1   2        India         Tamil Nadu            48
2   3        India          Karnataka             0
3   4        India      Andra Pradesh            32
4   5        India              Assam           679
5   6        India             Kerala           128
6   7        India             Punjab             0
7   8        India      Mumbai, Thane             1
8   9        India  Uttar Pradesh\r\n            20
9  10        India             Orissa            69

print(df.replace({r'\r\n': ''}, regex=True))
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
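You can check that the cleaning also fixes the symptom from the question, where a value spills onto a new line in destination.csv. The sketch below writes to an in-memory CSV string instead of a file and counts the lines. It assumes the cells contain real CR/LF characters and uses hypothetical sample rows. The pattern r"\r\n|\r|\n" is an assumption chosen to also catch a lone \r or \n.

```python
import pandas as pd

# Hypothetical subset of the data; one location ends in a real CR+LF
df = pd.DataFrame({
    "id": [1, 9],
    "country_name": ["India", "India"],
    "location": ["New Delhi", "Uttar Pradesh\r\n"],
    "total_deaths": [354, 20],
})

# Remove CR+LF, and any stray lone CR or LF
cleaned = df.replace(r"\r\n|\r|\n", "", regex=True)

# to_csv() with no path returns the CSV text instead of writing a file
csv_text = cleaned.to_csv(index=False)
lines = csv_text.splitlines()
print(len(lines))  # 3: header + 2 records
print(csv_text)
```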

If you only need to replace in the location column:

# regex=True keeps the pre-2.0 behavior; pandas 2.0+ defaults to regex=False
df['location'] = df.location.str.replace(r'\r\n', '', regex=True)
print(df)
   id country_name       location  total_deaths
0   1        India      New Delhi           354
1   2        India     Tamil Nadu            48
2   3        India      Karnataka             0
3   4        India  Andra Pradesh            32
4   5        India          Assam           679
5   6        India         Kerala           128
6   7        India         Punjab             0
7   8        India  Mumbai, Thane             1
8   9        India  Uttar Pradesh            20
9  10        India         Orissa            69
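The result of str.replace(r'\r\n', '') depends on whether the pattern is read as a regex. pandas 2.0 changed the default from regex=True to regex=False, so passing regex explicitly avoids surprises. The sketch below shows both modes on a hypothetical Series.

```python
import pandas as pd

# Hypothetical values: a real CR+LF, and the literal text \r\n (4 characters)
s = pd.Series(["Uttar Pradesh\r\n", "Orissa\\r\\n"])

# regex=True: the pattern \r\n matches a real carriage return + line feed
as_regex = s.str.replace(r"\r\n", "", regex=True).tolist()

# regex=False: r"\r\n" is matched literally as backslash, r, backslash, n
as_literal = s.str.replace(r"\r\n", "", regex=False).tolist()

print(as_regex)    # ['Uttar Pradesh', 'Orissa\\r\\n']
print(as_literal)  # ['Uttar Pradesh\r\n', 'Orissa']
```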

Comments:

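Thanks! The solutions above don't seem to solve my problem, or maybe I'm doing something wrong. In the process above, I'm basically iterating over a dataframe and checking whether \r is present. If it is, I need to do the replacement. I have edited my question again. Thanks.

I added a comment under the question.

OK, if you use this DataFrame:

df = pd.DataFrame({'id': {0: 'location', 1: 'country_name', 2: 'total_deaths'}, '29': {0: 'Uttar Pradesh\\r', 1: 'India', 2: '20'}})

does it work? What does print(df['29'].tolist()) return?

The DataFrame above returns: ['Uttar Pradesh\\r', 'India', '20']. I have added my test file and data source at [link] (github.com/itsmesaranya/data-cleaning). Could you take a look?

Please check my solution.

Answer 2: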

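With str.replace, you need to escape the backslash. That way the pattern matches the literal two-character sequence \r (a backslash followed by r) in the data, rather than a carriage-return character: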

In [15]:
df['29'] = df['29'].str.replace(r'\\r', '', regex=True)
df

Out[15]:
             id             29
0      location  Uttar Pradesh
1  country_name          India
2  total_deaths             20


Answer 3:

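The following code removes \t tabs, \n newlines and \r carriage returns, both as literal escape text and as actual characters. It is handy for squashing data onto one line. Answer taken from https://gist.github.com/smram/d6ded3c9028272360eb65bcab564a18a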

# <INPLACE>: True modifies df in place, False returns a new DataFrame
df.replace(to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"], value=["", ""], regex=True, inplace=<INPLACE>)
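When to_replace and value are both lists, pandas pairs them element by element. In the call above, the first pattern handles the literal text \t, \n and \r, and the second handles the real control characters. The sketch below uses a hypothetical frame and omits inplace, so a new DataFrame is returned.

```python
import pandas as pd

# Hypothetical frame mixing literal escape text and real control characters
df = pd.DataFrame({
    "text": ["a\\tb", "c\nd", "e\r"],
    "n": [1, 2, 3],
})

cleaned = df.replace(
    to_replace=[r"\\t|\\n|\\r", "\t|\n|\r"],  # literal "\t" etc., then real chars
    value=["", ""],                           # paired element-wise with to_replace
    regex=True,
)
print(cleaned["text"].tolist())  # ['ab', 'cd', 'e']
```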


Answer 4:

For some reason the accepted answer didn't work for me. In the end I found the following solution:

df["29"] = df["29"].replace(r'\r', '', regex=True)

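The difference is that I use \r instead of \\r. As a regex, r'\r' matches an actual carriage-return character, whereas r'\\r' matches a backslash followed by the letter r. This version therefore works when the data contains real carriage returns.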
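The two spellings target different data. You can check this with Python's re module on two hypothetical strings, as in the sketch below.

```python
import re

literal = "Uttar Pradesh\\r"  # ends with a backslash and the letter r
real_cr = "Uttar Pradesh\r"   # ends with one carriage-return character

results = {}
for pattern in (r"\\r", r"\r"):
    # (matches literal text?, matches real carriage return?)
    results[pattern] = (
        bool(re.search(pattern, literal)),
        bool(re.search(pattern, real_cr)),
    )
    print(repr(pattern), results[pattern])
```

Whichever answer "works" for you depends on which of the two forms your data actually contains.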


Answer 5:

Just assign the result of df.replace back to df, then print df:

df = df.replace({'\r': ''}, regex=True)
print(df)
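replace() returns a new DataFrame by default, so calling it without assigning the result leaves df unchanged. The minimal sketch below uses a hypothetical one-column frame. The pattern '\r' (no r prefix) is a real carriage-return character, which the regex matches literally.

```python
import pandas as pd

df = pd.DataFrame({"location": ["Uttar Pradesh\r", "Kerala"]})

df.replace({"\r": ""}, regex=True)       # result discarded: df is unchanged
still_dirty = df["location"].iloc[0]

df = df.replace({"\r": ""}, regex=True)  # assign the returned copy back
print(repr(still_dirty), repr(df["location"].iloc[0]))
```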

Comments:

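This answer already exists, word for word. I suggest deleting it so the answer space isn't cluttered with duplicates, out of respect for future readers and for the user who already posted this answer.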
