How to split data from a merged cell into the other cells of the same row in a Python dataframe?
I have a sample of a dataframe that looks like this:
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| | Date | Professional | Description |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 0 | 2019-12-19 00:00:00 | Katie Cool | Travel to Space ... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 1 | 2019-12-20 00:00:00 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 2 | 2019-12-27 00:00:00 | Jenn Blossoms | Review lots of stuff/o... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 3 | 2019-12-27 00:00:00 | Jenn Blossoms | Draft email to world leader... |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 4 | 2019-12-30 00:00:00 | Jenn Blossoms | Review this thing. |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
| 5 | 12-30-2019 Jenn Blossoms Telephone Call to A. Bell return her multiple | NaN | NaN |
| | voicemails. | | |
+---+--------------------------------------------------------------------------------------+---------------+--------------------------------------------+
Most of the data for row 5 ended up in the Date cell.
I want the sample to look like this:
+---+---------------------+---------------+-------------------------------------------------------------+
| | Date | Professional | Description |
+---+---------------------+---------------+-------------------------------------------------------------+
| 0 | 2019-12-19 00:00:00 | Katie Cool | Travel to Space ... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 1 | 2019-12-20 00:00:00 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 2 | 2019-12-27 00:00:00 | Jenn Blossoms | Review lots of stuff/o... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 3 | 2019-12-27 00:00:00 | Jenn Blossoms | Draft email to world leader... |
+---+---------------------+---------------+-------------------------------------------------------------+
| 4 | 2019-12-30 00:00:00 | Jenn Blossoms | Review this thing. |
+---+---------------------+---------------+-------------------------------------------------------------+
| 5 | 12-30-2019 | Jenn Blossoms | Telephone Call to A. Bell return her multiple |
| | | | voicemails. |
+---+---------------------+---------------+-------------------------------------------------------------+
I have tried this code:
import pandas as pd

# Rows whose Date cell does not parse as a date hold the merged data
bad = pd.to_datetime(dftopdata['Date'], errors='coerce').isnull()
# Capture groups: 0 = date, 1 = name (with its leading space), 2 = description
parts = dftopdata['Date'].str.extract(r'(\d{2}-\d{2}-\d{4})(\s\w+\s\w+)\s(\w+.*)')
dftopdata.loc[bad, 'Professional'] = parts[1]
dftopdata.loc[bad, 'Description'] = parts[2]
dftopdata.loc[bad, 'Date'] = parts[0]
But when I run the code above, the sample dataframe looks like this:
+---+------------+---------------+--------------------------------------------+
| | Date | Professional | Description |
+---+------------+---------------+--------------------------------------------+
| 0 | 12/19/2019 | Katie Cool | Travel to space ... |
+---+------------+---------------+--------------------------------------------+
| 1 | 12/20/2019 | Jenn Blossoms | Review stuff; prepare cancellations of ... |
+---+------------+---------------+--------------------------------------------+
| 2 | 12/27/2019 | Jenn Blossoms | Review lots of stuff/o… |
+---+------------+---------------+--------------------------------------------+
| 3 | 12/27/2019 | Jenn Blossoms | Draft email to world leader... |
+---+------------+---------------+--------------------------------------------+
| 4 | 12/30/2019 | Jenn Blossoms | Review this thing. |
+---+------------+---------------+--------------------------------------------+
| 5 | NaN | NaN | NaN |
+---+------------+---------------+--------------------------------------------+
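For comparison, here is a minimal sketch of the same str.extract approach with the repetition counts written as \d{2} and \d{4}, the name captured without its leading space, and re.DOTALL so the description may run across a line break. It assumes every merged cell follows the pattern "mm-dd-yyyy, two-word name, description". The three-row sample is a trimmed copy of the tables above, and the line break inside the merged cell is an assumption based on how the table wraps.

```python
import re

import pandas as pd

# Trimmed copy of the question's sample: two clean rows and the merged row.
# The "\n" in the merged cell is an assumption based on how the table wraps.
df = pd.DataFrame({
    "Date": [
        pd.Timestamp("2019-12-19"),
        pd.Timestamp("2019-12-30"),
        "12-30-2019 Jenn Blossoms Telephone Call to A. Bell return her multiple\nvoicemails.",
    ],
    "Professional": ["Katie Cool", "Jenn Blossoms", None],
    "Description": ["Travel to Space ...", "Review this thing.", None],
})

# Cells that do not parse as a date are the merged ones.
merged = pd.to_datetime(df["Date"], errors="coerce").isna()

# Named groups: mm-dd-yyyy date, two-word name, then the rest of the cell.
# re.DOTALL lets "." in the last group match across the line break.
pattern = (
    r"^(?P<Date>\d{2}-\d{2}-\d{4})\s+"
    r"(?P<Professional>\w+\s+\w+)\s+"
    r"(?P<Description>.*)$"
)
parts = df.loc[merged, "Date"].str.extract(pattern, flags=re.DOTALL)

# Group names match the column names; each assignment aligns on the row index,
# so only the merged rows are overwritten.
for col in parts.columns:
    df.loc[merged, col] = parts[col]

print(df)
```

Because the mask and the extracted pieces are both computed before anything is written back, the order of the assignments does not matter here.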
Answer
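You can use the str.split method to split the string into "words":
df = dftopdata  # the question's dataframe
df['list_of_words'] = df['Date'].str.split()  # Timestamp cells give NaN, not a list
If there is a pattern that separates the professional and description parts within list_of_words, you can use it. For example, if the first word is the date and the next 2 words make up the professional's name, then you can do:
merged = df['list_of_words'].notna()  # only the rows whose Date cell was a string
df.loc[merged, 'Professional'] = df.loc[merged, 'list_of_words'].apply(lambda w: ' '.join(w[1:3]))
df.loc[merged, 'Description'] = df.loc[merged, 'list_of_words'].apply(lambda w: ' '.join(w[3:]))
df.loc[merged, 'Date'] = df.loc[merged, 'list_of_words'].str[0]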
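A variation on the same idea, sketched under the assumption that a merged cell always reads "date, two-word name, description": passing n=3 to str.split caps it at three splits, so everything after the name stays in one piece with its original spacing, and expand=True returns the pieces as columns instead of lists. The two-row sample is a trimmed copy of the question's data.

```python
import pandas as pd

# Trimmed sample: one clean row and one row whose whole record sits in Date.
df = pd.DataFrame({
    "Date": [
        pd.Timestamp("2019-12-30"),
        "12-30-2019 Jenn Blossoms Telephone Call to A. Bell return her multiple voicemails.",
    ],
    "Professional": ["Jenn Blossoms", None],
    "Description": ["Review this thing.", None],
})

# n=3 stops after three splits: date, first name, last name, and the remainder
# left as-is. Non-string cells (the Timestamps) come back as NaN in every column.
parts = df["Date"].str.split(n=3, expand=True)
merged = parts[3].notna()  # rows that held date + name + description

df.loc[merged, "Professional"] = parts.loc[merged, 1] + " " + parts.loc[merged, 2]
df.loc[merged, "Description"] = parts.loc[merged, 3]
df.loc[merged, "Date"] = parts.loc[merged, 0]

print(df)
```

Compared with joining list slices back together, the maxsplit form never touches the description's internal whitespace.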