Pandas multiple merge creates multidimensional duplicate columns

Posted: 2020-12-04 12:11:31

Question:

My goal is to merge 4 Excel worksheets into 1 based on matching host name, serial number, and category. I am using the pandas merge function below.

from functools import partial, reduce

import pandas as pd

InventoryDf = pd.read_excel("Inventory.xlsx", sheet_name='Inventory')
SoftwareDf = pd.read_excel("Inventory.xlsx", sheet_name='Software')
HardwareDf = pd.read_excel("Inventory.xlsx", sheet_name='Hardware')
CoverageDf = pd.read_excel("Inventory.xlsx", sheet_name='Coverage')
data_frames = [InventoryDf, SoftwareDf, HardwareDf, CoverageDf]
# outer-join every sheet on the shared key columns
merge = partial(pd.merge, on=['Priority','Category','Product Family','Host Name','Serial Number'], how='outer')
merge = reduce(merge, data_frames)

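The problem is that every worksheet has an "IP Address" column, and the IPs in them are mostly the same. For some reason the merged data frame contains 4 such columns under 2 duplicated names: "IP Address_x", "IP Address_x", "IP Address_y", "IP Address_y".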
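The repeated names follow from how pd.merge treats overlapping non-key columns: it adds the default suffixes _x and _y only when both sides carry the same column name. A minimal sketch with made-up one-row frames (the names a, b, c and the single join key are assumptions for illustration, not the asker's data):

```python
import pandas as pd

a = pd.DataFrame({"Host Name": ["SwitchA"], "IP Address": ["1.1.1.1"]})
b = pd.DataFrame({"Host Name": ["SwitchA"], "IP Address": ["1.1.1.1"]})
c = pd.DataFrame({"Host Name": ["SwitchA"], "IP Address": ["1.1.0.1"]})

# Both sides have "IP Address", so pandas suffixes them with _x and _y.
first = pd.merge(a, b, on="Host Name", how="outer")
# The left side no longer has a plain "IP Address", so nothing overlaps
# and the third frame's column keeps its original name.
second = pd.merge(first, c, on="Host Name", how="outer")

print(list(first.columns))   # ['Host Name', 'IP Address_x', 'IP Address_y']
print(list(second.columns))  # ['Host Name', 'IP Address_x', 'IP Address_y', 'IP Address']
```

Merging a fourth frame that also has "IP Address" overlaps with the plain column again, so _x and _y get reused and the names repeat. As far as I know, some newer pandas versions raise an error at that point instead of returning duplicate names.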

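I want to combine these 4 columns into 1, but I can't because their names are identical. I would rather not rename them manually, because the data frames have about 30 columns and it would be tedious.

Is there a way to merge them so that:

    If the IPs are the same, they are merged.
    If the IPs differ, the first (left-most) "IP Address_x" column is used.
    If a column is missing, only the first "IP Address_x" is shown, provided that IP is not empty.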
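These three rules amount to a left-to-right "coalesce": for each row, take the first non-empty value across the IP columns. A minimal sketch, assuming the columns have already been given distinct names (the column names and values below are hypothetical), using Series.combine_first:

```python
from functools import reduce

import pandas as pd

df = pd.DataFrame({
    "IP Address_Inventory": ["1.1.1.1", None, "1.1.1.2"],
    "IP Address_Hardware": ["1.1.0.1", "1.2.2.2", None],
    "IP Address_Software": ["1.1.1.1", None, "1.1.1.2"],
})

ip_cols = list(df.columns)  # order matters: left-most column wins

# combine_first keeps the left value and falls back to the right one
# only where the left value is missing.
coalesced = reduce(lambda left, right: left.combine_first(right),
                   [df[c] for c in ip_cols])

print(coalesced.tolist())  # ['1.1.1.1', '1.2.2.2', '1.1.1.2']
```

In the first row the IPs differ and the left-most one is kept; in the second row the only non-empty value is used.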

Here is a sample of the worksheets. I have more columns, such as: Name, URL, Site Name, City...

InventoryDf

+-----------+---------------+------------+----------+----------+
| Host Name | Serial Number | IP Address | Priority | Category |
+-----------+---------------+------------+----------+----------+
| SwitchA   | 1230          | 1.1.1.1    | 1        | Switch   |
+-----------+---------------+------------+----------+----------+
| SwitchA   | 1231          | 1.1.1.1    | 1        | Switch   |
+-----------+---------------+------------+----------+----------+
| SwitchB   | 1240          | 1.1.1.2    | 2        | Switch   |
+-----------+---------------+------------+----------+----------+

HardwareDf

+-----------+---------------+------------+----------+----------+
| Host Name | Serial Number | IP Address | Priority | Category |
+-----------+---------------+------------+----------+----------+
| SwitchA   | 1230          | 1.1.0.1    | 1        | Switch   |
+-----------+---------------+------------+----------+----------+
| SwitchD   | 1250          | 1.2.2.2    | 1        | Switch   |
+-----------+---------------+------------+----------+----------+
| SwitchE   | 1260          | 1.3.3.3    | 2        | Switch   |
+-----------+---------------+------------+----------+----------+

SoftwareDf

+-----------+---------------+------------+----------+----------+---------+
| Host Name | Serial Number | IP Address | Priority | Category | Version |
+-----------+---------------+------------+----------+----------+---------+
| SwitchA   | 1230          | 1.1.1.1    | 1        | Switch   | X       |
+-----------+---------------+------------+----------+----------+---------+
| SwitchA   | 1231          | 1.1.1.1    | 1        | Switch   | X       |
+-----------+---------------+------------+----------+----------+---------+
| SwitchB   | 1240          | 1.1.1.2    | 2        | Switch   | Y       |
+-----------+---------------+------------+----------+----------+---------+

CoverageDf

+-----------+---------------+------------+----------+----------+-------------+-------+
| Host Name | Serial Number | IP Address | Priority | Category | Coverage    | Price |
+-----------+---------------+------------+----------+----------+-------------+-------+
| SwitchA   | 1230          | 1.1.1.1    | 1        | Switch   | Not Covered | 100   |
+-----------+---------------+------------+----------+----------+-------------+-------+
| SwitchA   | 1231          | 1.1.1.1    | 1        | Switch   | Covered     | 300   |
+-----------+---------------+------------+----------+----------+-------------+-------+
| SwitchB   | 1240          | 1.1.1.2    | 2        | Switch   | Not Covered | 200   |
+-----------+---------------+------------+----------+----------+-------------+-------+

Expected result (the IP addresses are merged, even though some of SwitchA's differ)

+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| Host Name | Serial Number | IP Address | Priority | Category | Version | Coverage    | Price |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| SwitchA   | 1230          | 1.1.1.1    | 1        | Switch   | X       | Not Covered | 100   |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| SwitchA   | 1231          | 1.1.1.1    | 1        | Switch   | X       | Covered     | 300   |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| SwitchB   | 1240          | 1.1.1.2    | 2        | Switch   | Y       | Not Covered | 200   |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| SwitchD   | 1250          | 1.2.2.2    | 1        | Switch   |         |             |       |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| SwitchE   | 1260          | 1.3.3.3    | 2        | Switch   |         |             |       |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+

Raw extract of the result. Note the redundant columns, such as IP Address_x.

 Source.Name_x  Priority Item Type_x  Category                            Product Family       Product ID_x Software Type_x OS Version_x Suggested Version 1_x                       Host Name  IP Address_x Serial Number                                                  Source.Name_y       Product ID_y Software Type_y OS Version_y           Current Milestone_x Suggested Version 1_y Suggested Version 2 Suggested Version 3  IP Address_y SW End of Life SW End of Sale                                                                                                                    URL_x                                                  Source.Name_x IP Address_x Item Type_y      Product ID_x Current Milestone_y   Hardware Lifecycle Status    Replacement PID Replacement PID Info  Replacement PID Price  Replacement PID Price Discount Replacement PID Service Level  Replacement PID Service Price  Current PID Service Price  Replacement PID Service Price Discount HW End of Life HW End of Sale                                                                                                                    URL_y                                                  Source.Name_y Item Type       Product ID_y  IP Address_y      Coverage Status    Contract Status    Contract Number Coverage Start Date Coverage End Date SLA type
 inventory_30_Jun_15_19_35.xlsx         3     Chassis  Switches     Cisco Catalyst 2960-X Series Switches   WS-C2960X-24PS-L             ios    15.2(4)E6             15.2(7)E2                  SWITCH-IDF5-A1    10.1.1.8   XXXXX  software_02_Jul_07_54_15.xlsx   WS-C2960X-24PS-L             IOS    15.2(4)E6  End of Vulnerability Support             15.2(7)E2                <NA>                <NA>    10.1.1.8     2023-04-30     2018-05-01  https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html                                                           <NA>         <NA>        <NA>              <NA>                <NA>                        <NA>               <NA>                 None                    NaN                            <NA>                          <NA>                            NaN                        NaN                                    <NA>            NaT            NaT                                                                                                                     <NA>  coverage_24_Jul_10_06_49.xlsx   Chassis   WS-C2960X-24PS-L    10.1.1.8    Covered - Non-IBM  Covered - Non-IBM  Covered - Non-IBM                 NaT               NaT     None
 inventory_30_Jun_15_19_35.xlsx         3     Chassis  Switches     Cisco Catalyst 2960-X Series Switches  WS-C2960X-48LPS-L             IOS    15.2(4)E6             15.2(7)E2                  SWITCH-IDF6-A1    10.1.1.9   YYYYY  software_02_Jul_07_54_15.xlsx  WS-C2960X-48LPS-L             IOS    15.2(4)E6  End of Vulnerability Support             15.2(7)E2                <NA>                <NA>    10.1.1.9     2023-04-30     2018-05-01  https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html                                                           <NA>         <NA>        <NA>              <NA>                <NA>                        <NA>               <NA>                 None                    NaN                            <NA>                          <NA>                            NaN                        NaN                                    <NA>            NaT            NaT                                                                                                                     <NA>  coverage_24_Jul_10_06_49.xlsx   Chassis  WS-C2960X-48LPS-L    10.1.1.9    Covered - Non-IBM  Covered - Non-IBM  Covered - Non-IBM                 NaT               NaT     None
 inventory_30_Jun_15_19_35.xlsx         3     Chassis  Switches     Cisco Catalyst 2960-X Series Switches   WS-C2960X-24PS-L             IOS    15.2(4)E6             15.2(7)E2                  SWITCH-IDF7-A1   10.1.1.11   ZZZZZZ  software_02_Jul_07_54_15.xlsx   WS-C2960X-24PS-L             IOS    15.2(4)E6  End of Vulnerability Support             15.2(7)E2                <NA>                <NA>   10.1.1.11     2023-04-30     2018-05-01  https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html                                                           <NA>         <NA>        <NA>              <NA>                <NA>                        <NA>               <NA>                 None                    NaN                            <NA>                          <NA>                            NaN                        NaN                                    <NA>            NaT            NaT                                                                                                                     <NA>  coverage_24_Jul_10_06_49.xlsx   Chassis   WS-C2960X-24PS-L   10.1.1.11    Covered - Non-IBM  Covered - Non-IBM  Covered - Non-IBM                 NaT               NaT     None
 inventory_30_Jun_15_19_35.xlsx         3     Chassis  Switches     Cisco Catalyst 2960-X Series Switches   WS-C2960X-24PS-L             IOS    15.2(4)E6             15.2(7)E2                  SWITCH-IDF8-A1   10.1.1.12   QQQQQ  software_02_Jul_07_54_15.xlsx   WS-C2960X-24PS-L             IOS    15.2(4)E6  End of Vulnerability Support             15.2(7)E2                <NA>                <NA>   10.1.1.12     2023-04-30     2018-05-01  https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html                                                           <NA>         <NA>        <NA>              <NA>                <NA>                        <NA>               <NA>                 None                    NaN                            <NA>                          <NA>                            NaN                        NaN                                    <NA>            NaT            NaT                                                                                                                     <NA>  coverage_24_Jul_10_06_49.xlsx   Chassis   WS-C2960X-24PS-L   10.1.1.12    Covered - Non-IBM  Covered - Non-IBM  Covered - Non-IBM                 NaT               NaT     None
 inventory_30_Jun_15_19_35.xlsx         3     Chassis  Switches     Cisco Catalyst 2960-X Series Switches  WS-C2960X-48LPS-L             IOS    15.2(4)E6             15.2(7)E2                  SWITCH-IDF9-A1   10.1.1.13   WWWWW  software_02_Jul_07_54_15.xlsx  WS-C2960X-48LPS-L             IOS    15.2(4)E6  End of Vulnerability Support             15.2(7)E2                <NA>                <NA>   10.1.1.13     2023-04-30     2018-05-01  https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html                                                           <NA>         <NA>        <NA>              <NA>                <NA>                        <NA>               <NA>                 None                    NaN                            <NA>                          <NA>                            NaN                        NaN                                    <NA>            NaT            NaT                                                                                                                     <NA>  coverage_24_Jul_10_06_49.xlsx   Chassis  WS-C2960X-48LPS-L   10.1.1.13    Covered - Non-IBM  Covered - Non-IBM  Covered - Non-IBM                 NaT               NaT     None
 inventory_30_Jun_15_30_08.xlsx         3     Chassis  Switches     Cisco Catalyst 2960-C Series Switches   WS-C2960C-12PC-L             IOS    15.2(4)E6             15.2(7)E2                   SWITCH-MGK-A1   10.1.1.39   EEEEEE  software_02_Jul_08_14_40.xlsx   WS-C2960C-12PC-L             IOS    15.2(4)E6  End of Vulnerability Support             15.2(7)E2                <NA>                <NA>   10.1.1.39     2023-04-30     2018-05-01  https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html  hardware_02_Jul_07_25_04.xlsx  10.1.1.39     Chassis  WS-C2960C-12PC-L  EoL Date Announced  EOL in more than 24 months  WS-C2960L-16PS-LL                 None                  920.7                              50                          PSUT                         215.16                     122.76                                       0     2025-10-31     2020-10-30           https://www.cisco.com/c/en/us/products/switches/catalyst-2960-c-series-switches/eos-eol-notice-c51-743071.html  coverage_24_Jul_10_37_26.xlsx   Chassis   WS-C2960C-12PC-L   10.1.1.39  Uncovered with ELLW        No Contract        No Contract                 NaT               NaT     None
 inventory_30_Jun_15_19_35.xlsx         3     Chassis  Switches     Cisco Catalyst 2960-X Series Switches  WS-C2960X-48LPS-L             IOS    15.2(4)E6             15.2(7)E2               SWITCH-SRVROOM-A1    10.1.1.2   RRRRRR  software_02_Jul_07_54_15.xlsx  WS-C2960X-48LPS-L             IOS    15.2(4)E6  End of Vulnerability Support             15.2(7)E2                <NA>                <NA>    10.1.1.2     2023-04-30     2018-05-01  https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html                                                           <NA>         <NA>        <NA>              <NA>                <NA>                        <NA>               <NA>                 None                    NaN                            <NA>                          <NA>                            NaN                        NaN                                    <NA>            NaT            NaT                                                                                                                     <NA>  coverage_24_Jul_10_06_49.xlsx   Chassis  WS-C2960X-48LPS-L    10.1.1.2    Covered - Non-IBM  Covered - Non-IBM  Covered - Non-IBM                 NaT               NaT     None
 inventory_30_Jun_15_20_39.xlsx         3     Chassis  Switches       Cisco Catalyst 3850 Series Switches     WS-C3850-24P-S          IOS-XE       16.3.7                16.9.5               SWITCH-SRVROOM-C1    10.2.1.254   TTTTTT  software_02_Jul_07_54_33.xlsx     WS-C3850-24P-S          IOS-XE       16.3.7            End of Engineering                16.9.5                <NA>                <NA>    10.2.1.3     2023-07-31     2018-08-01  https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-3850-series-switches/eos-eol-notice-c51-740255.html  software_02_Jul_07_02_48.xlsx   10.2.1.254        <NA>              <NA>  End of Engineering                        <NA>               <NA>                 None                    NaN                            <NA>                          <NA>                            NaN                        NaN                                    <NA>     2023-07-31     2018-08-01  https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-3850-series-switches/eos-eol-notice-c51-740255.html  coverage_24_Jul_10_07_28.xlsx   Chassis     WS-C3850-24P-S    10.2.1.254    Covered - Non-IBM  Covered - Non-IBM  Covered - Non-IBM                 NaT               NaT     None
 software_30_Jun_15_21_13.xlsx         1        <NA>  Security  Cisco ASA 5500-X with FirePOWER Services               <NA>             ASA      9.7(1)4        9.12.3 Interim  SRVROOM-FW2.umbrellacorp.com  10.60.127.19   YYYYYY  software_02_Jul_07_55_54.xlsx         ASA5506-K9             ASA      9.7(1)4            End of Engineering        9.12.3 Interim       9.8.4 Interim                <NA>  10.1.122.9     2022-08-31     2017-08-25          http://www.cisco.com/c/en/us/products/collateral/security/asa-firepower-services/eos-eol-notice-c51-738646.html                                                           <NA>         <NA>        <NA>              <NA>                <NA>                        <NA>               <NA>                 None                    NaN                            <NA>                          <NA>                            NaN                        NaN                                    <NA>            NaT            NaT                                                                                                                     <NA>  coverage_24_Jul_10_07_48.xlsx   Chassis         ASA5506-K9  10.60.127.19    Covered - Non-IBM  Covered - Non-IBM  Covered - Non-IBM                 NaT               NaT     None
 software_30_Jun_15_21_13.xlsx         1        <NA>  Security  Cisco ASA 5500-X with FirePOWER Services               <NA>             ASA      9.7(1)4        9.12.3 Interim  FW2.umbrellacorp.com  10.60.127.18   GGGGGGG  software_02_Jul_07_55_54.xlsx         ASA5506-K9             ASA      9.7(1)4            End of Engineering        9.12.3 Interim       9.8.4 Interim                <NA>  10.1.122.8     2022-08-31     2017-08-25          http://www.cisco.com/c/en/us/products/collateral/security/asa-firepower-services/eos-eol-notice-c51-738646.html                                                           <NA>         <NA>        <NA>              <NA>                <NA>                        <NA>               <NA>                 None                    NaN                            <NA>                          <NA>                            NaN                        NaN                                    <NA>            NaT            NaT                                                                                                                     <NA>  coverage_24_Jul_10_07_48.xlsx   Chassis         ASA5506-K9  10.60.127.18    Covered - Non-IBM  Covered - Non-IBM  Covered - Non-IBM                 NaT               NaT     None

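Comments:

The formatting of your question means I can't tell whether your column name is IP_Address or 1.1.1.1. If the former, the approach I would use is to rename the columns before merging so they include the sheet name, i.e. IP_Address_Coverage.

I tried using an ASCII table generator, but it always gives me this output. Do you know how to format the tables properly, @RobRaymond? If I change the column names, I still get 4 different IP columns after the merge. I want one.

print(df.to_string(index=False)) is what I use, then I paste the output into the post between three backticks.

@RobRaymond Thanks.

Answer 1: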

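You are already using an advanced technique with functools. Add inspect to the mix to get the variable names.

    Iterate over your list of data frames. Capture each name and rename the IP Address column.
    Once the data frames are merged, rename the left-most IP Address column, fillna() it from the other IP Address columns, and drop them.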
import functools
import inspect

def retrieve_name(var):
    # names in the caller's scope that are bound to this exact object
    callers_local_vars = inspect.currentframe().f_back.f_locals.items()
    return [var_name for var_name, var_val in callers_local_vars if var_val is var]

data_frames = [InventoryDf, SoftwareDf, HardwareDf, CoverageDf]
names = []
for df in data_frames:
    # the loop variable "df" is bound to the same object, so keep only the "...Df" name
    n = [v for v in retrieve_name(df) if v.endswith("Df")][0].replace("Df", "")
    names.append(n)
    # prefix the IP Address column with the sheet name, e.g. "Inventory IP Address"
    df.columns = [f"{n} {c}" if c=="IP Address" else c for c in df.columns]
# merge = functools.partial(pd.merge, on=['Priority','Category','Product Family','Host Name','Serial Number'], how='outer')
merge = functools.partial(pd.merge, on=['Priority','Category','Host Name','Serial Number'], how='outer')

merge = functools.reduce(merge, data_frames)

# take column LHS IP Address and rename it to "IP Address", fillna() from all subsequent columns
# then drop them
merge.rename(columns={f"{names[0]} IP Address":"IP Address"}, inplace=True)
for n in names[1:]:
    # assign back rather than calling fillna(inplace=True) on a .loc slice, which may act on a copy
    merge["IP Address"] = merge["IP Address"].fillna(merge[f"{n} IP Address"])
    merge.drop(columns=f"{n} IP Address", inplace=True)
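If looking up variable names with inspect feels fragile, the same idea can be sketched with a plain dict keyed by sheet name. This is an alternative under stated assumptions, not the answer's method: the dict `sheets`, the `KEYS` list, and the `_<sheet>` naming scheme are all hypothetical, and the rows are trimmed from the question's sample tables. Every non-key column that appears in more than one sheet is tagged, merged, and then coalesced left to right.

```python
from collections import Counter
from functools import reduce

import pandas as pd

KEYS = ["Host Name", "Serial Number", "Priority", "Category"]

def frame(hosts, serials, ips, priorities, **extra):
    """Build one sample sheet from the question's tables."""
    data = {"Host Name": hosts, "Serial Number": serials, "IP Address": ips,
            "Priority": priorities, "Category": ["Switch"] * len(hosts)}
    data.update(extra)
    return pd.DataFrame(data)

abc = (["SwitchA", "SwitchA", "SwitchB"], [1230, 1231, 1240],
       ["1.1.1.1", "1.1.1.1", "1.1.1.2"], [1, 1, 2])
sheets = {  # insertion order decides which IP wins on conflict
    "Inventory": frame(*abc),
    "Software": frame(*abc, Version=["X", "X", "Y"]),
    "Hardware": frame(["SwitchA", "SwitchD", "SwitchE"], [1230, 1250, 1260],
                      ["1.1.0.1", "1.2.2.2", "1.3.3.3"], [1, 1, 2]),
    "Coverage": frame(*abc, Coverage=["Not Covered", "Covered", "Not Covered"],
                      Price=[100, 300, 200]),
}

# Non-key columns present in more than one sheet would collide on merge.
counts = Counter(c for df in sheets.values() for c in df.columns if c not in KEYS)
shared = [c for c, n in counts.items() if n > 1]

# Tag each shared column with its sheet name, e.g. "IP Address_Hardware".
tagged = [df.rename(columns={c: f"{c}_{name}" for c in shared if c in df.columns})
          for name, df in sheets.items()]

merged = reduce(lambda left, right: pd.merge(left, right, on=KEYS, how="outer"),
                tagged)

# Coalesce each shared column left to right, then drop the tagged copies.
for col in shared:
    parts = [f"{col}_{name}" for name in sheets if f"{col}_{name}" in merged.columns]
    merged[col] = reduce(lambda a, b: a.combine_first(b), [merged[p] for p in parts])
    merged = merged.drop(columns=parts)

merged = merged.sort_values(["Host Name", "Serial Number"]).reset_index(drop=True)
print(merged.to_string(index=False))
```

With real data the dict could be filled from the workbook instead of by hand. Because the sheet names are the dict keys, nothing depends on how the variables happen to be named.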
    
