熊猫多重合并创建多维重复列
Posted
技术标签:
【中文标题】熊猫多重合并创建多维重复列【英文标题】:Pandas multiple merge creates multidimensional duplicate columns 【发布时间】:2020-12-04 12:11:31 【问题描述】:我的目标是根据相似的主机名、序列号、类别将 4 个 excel 工作表合并为 1 个...我正在使用下面的 pandas 合并功能。
InventoryDf = pd.read_excel("Inventory.xlsx", sheet_name='Inventory')
SoftwareDf = pd.read_excel("Inventory.xlsx", sheet_name='Software')
HardwarewareDf = pd.read_excel("Inventory.xlsx", sheet_name='Hardware')
CoverageDf = pd.read_excel("Inventory.xlsx", sheet_name='Coverage')
data_frames = [InventoryDf, SoftwareDf, HardwarewareDf, CoverageDf]
merge = partial(pd.merge, on=['Priority','Category','Product Family','Host Name','Serial Number'], how='outer')
merge = reduce(merge, data_frames)
问题在于每个工作表都有一个“IP 地址”列,其中的 IP 大多相似。由于某种原因,合并数据框包含 4 列,有 2 个重复名称:“IP Address_x”、“IP Address_x”、“IP Address_y”、“IP Address_y”
我想将这 4 列合并为 1,但我不能,因为它们的名称相似。我没有手动重命名它们,因为有大约 30 个数据框列,而且很乏味。
有没有办法合并它们以便:
-
如果ip相同,合并一下
如果 IP 不同,请使用左侧第一个“IP 地址_x”列
如果缺少一列,如果 IP 不为空,则只显示第一个“IP Address_x”
这是工作表的示例,我有更多列,例如:名称、网址、站点名称、城市...
InventoryDf
+-----------+---------------+------------+----------+----------+
| Host Name | Serial Number | IP Address | Priority | Category |
+-----------+---------------+------------+----------+----------+
| SwitchA | 1230 | 1.1.1.1 | 1 | Switch |
+-----------+---------------+------------+----------+----------+
| SwitchA | 1231 | 1.1.1.1 | 1 | Switch |
+-----------+---------------+------------+----------+----------+
| SwitchB | 1240 | 1.1.1.2 | 2 | Switch |
+-----------+---------------+------------+----------+----------+
HardwareDf
+-----------+---------------+------------+----------+----------+
| Host Name | Serial Number | IP Address | Priority | Category |
+-----------+---------------+------------+----------+----------+
| SwitchA | 1230 | 1.1.0.1 | 1 | Switch |
+-----------+---------------+------------+----------+----------+
| SwitchD | 1250 | 1.2.2.2 | 1 | Switch |
+-----------+---------------+------------+----------+----------+
| SwitchE | 1260 | 1.3.3.3 | 2 | Switch |
+-----------+---------------+------------+----------+----------+
SoftwareDf
+-----------+---------------+------------+----------+----------+---------+
| Host Name | Serial Number | IP Address | Priority | Category | Version |
+-----------+---------------+------------+----------+----------+---------+
| SwitchA | 1230 | 1.1.1.1 | 1 | Switch | X |
+-----------+---------------+------------+----------+----------+---------+
| SwitchA | 1231 | 1.1.1.1 | 1 | Switch | X |
+-----------+---------------+------------+----------+----------+---------+
| SwitchB | 1240 | 1.1.1.2 | 2 | Switch | Y |
+-----------+---------------+------------+----------+----------+---------+
CoverageDf
+-----------+---------------+------------+----------+----------+-------------+-------+
| Host Name | Serial Number | IP Address | Priority | Category | Coverage | Price |
+-----------+---------------+------------+----------+----------+-------------+-------+
| SwitchA | 1230 | 1.1.1.1 | 1 | Switch | Not Covered | 100 |
+-----------+---------------+------------+----------+----------+-------------+-------+
| SwitchA | 1231 | 1.1.1.1 | 1 | Switch | Covered | 300 |
+-----------+---------------+------------+----------+----------+-------------+-------+
| SwitchB | 1240 | 1.1.1.2 | 2 | Switch | Not Covered | 200 |
+-----------+---------------+------------+----------+----------+-------------+-------+
预期结果(IP 地址被合并,即使 SwitchA 的一些不同)
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| Host Name | Serial Number | IP Address | Priority | Category | Version | Coverage | Price |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| SwitchA | 1230 | 1.1.1.1 | 1 | Switch | X | Not Covered | 100 |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| SwitchA | 1231 | 1.1.1.1 | 1 | Switch | X | Covered | 300 |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| SwitchB | 1240 | 1.1.1.2 | 2 | Switch | Y | Not Covered | 200 |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| SwitchD | 1250 | 1.2.2.2 | 1 | Switch | | | |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
| SwitchE | 1260 | 1.3.3.3 | 2 | Switch | | | |
+-----------+---------------+------------+----------+----------+---------+-------------+-------+
结果的原始提取。注意冗余列的丢失,IP 地址_x
Source.Name_x Priority Item Type_x Category Product Family Product ID_x Software Type_x OS Version_x Suggested Version 1_x Host Name IP Address_x Serial Number Source.Name_y Product ID_y Software Type_y OS Version_y Current Milestone_x Suggested Version 1_y Suggested Version 2 Suggested Version 3 IP Address_y SW End of Life SW End of Sale URL_x Source.Name_x IP Address_x Item Type_y Product ID_x Current Milestone_y Hardware Lifecycle Status Replacement PID Replacement PID Info Replacement PID Price Replacement PID Price Discount Replacement PID Service Level Replacement PID Service Price Current PID Service Price Replacement PID Service Price Discount HW End of Life HW End of Sale URL_y Source.Name_y Item Type Product ID_y IP Address_y Coverage Status Contract Status Contract Number Coverage Start Date Coverage End Date SLA type
inventory_30_Jun_15_19_35.xlsx 3 Chassis Switches Cisco Catalyst 2960-X Series Switches WS-C2960X-24PS-L ios 15.2(4)E6 15.2(7)E2 SWITCH-IDF5-A1 10.1.1.8 XXXXX software_02_Jul_07_54_15.xlsx WS-C2960X-24PS-L IOS 15.2(4)E6 End of Vulnerability Support 15.2(7)E2 <NA> <NA> 10.1.1.8 2023-04-30 2018-05-01 https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html <NA> <NA> <NA> <NA> <NA> <NA> <NA> None NaN <NA> <NA> NaN NaN <NA> NaT NaT <NA> coverage_24_Jul_10_06_49.xlsx Chassis WS-C2960X-24PS-L 10.1.1.8 Covered - Non-IBM Covered - Non-IBM Covered - Non-IBM NaT NaT None
inventory_30_Jun_15_19_35.xlsx 3 Chassis Switches Cisco Catalyst 2960-X Series Switches WS-C2960X-48LPS-L IOS 15.2(4)E6 15.2(7)E2 SWITCH-IDF6-A1 10.1.1.9 YYYYY software_02_Jul_07_54_15.xlsx WS-C2960X-48LPS-L IOS 15.2(4)E6 End of Vulnerability Support 15.2(7)E2 <NA> <NA> 10.1.1.9 2023-04-30 2018-05-01 https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html <NA> <NA> <NA> <NA> <NA> <NA> <NA> None NaN <NA> <NA> NaN NaN <NA> NaT NaT <NA> coverage_24_Jul_10_06_49.xlsx Chassis WS-C2960X-48LPS-L 10.1.1.9 Covered - Non-IBM Covered - Non-IBM Covered - Non-IBM NaT NaT None
inventory_30_Jun_15_19_35.xlsx 3 Chassis Switches Cisco Catalyst 2960-X Series Switches WS-C2960X-24PS-L IOS 15.2(4)E6 15.2(7)E2 SWITCH-IDF7-A1 10.1.1.11 ZZZZZZ software_02_Jul_07_54_15.xlsx WS-C2960X-24PS-L IOS 15.2(4)E6 End of Vulnerability Support 15.2(7)E2 <NA> <NA> 10.1.1.11 2023-04-30 2018-05-01 https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html <NA> <NA> <NA> <NA> <NA> <NA> <NA> None NaN <NA> <NA> NaN NaN <NA> NaT NaT <NA> coverage_24_Jul_10_06_49.xlsx Chassis WS-C2960X-24PS-L 10.1.1.11 Covered - Non-IBM Covered - Non-IBM Covered - Non-IBM NaT NaT None
inventory_30_Jun_15_19_35.xlsx 3 Chassis Switches Cisco Catalyst 2960-X Series Switches WS-C2960X-24PS-L IOS 15.2(4)E6 15.2(7)E2 SWITCH-IDF8-A1 10.1.1.12 QQQQQ software_02_Jul_07_54_15.xlsx WS-C2960X-24PS-L IOS 15.2(4)E6 End of Vulnerability Support 15.2(7)E2 <NA> <NA> 10.1.1.12 2023-04-30 2018-05-01 https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html <NA> <NA> <NA> <NA> <NA> <NA> <NA> None NaN <NA> <NA> NaN NaN <NA> NaT NaT <NA> coverage_24_Jul_10_06_49.xlsx Chassis WS-C2960X-24PS-L 10.1.1.12 Covered - Non-IBM Covered - Non-IBM Covered - Non-IBM NaT NaT None
inventory_30_Jun_15_19_35.xlsx 3 Chassis Switches Cisco Catalyst 2960-X Series Switches WS-C2960X-48LPS-L IOS 15.2(4)E6 15.2(7)E2 SWITCH-IDF9-A1 10.1.1.13 WWWWW software_02_Jul_07_54_15.xlsx WS-C2960X-48LPS-L IOS 15.2(4)E6 End of Vulnerability Support 15.2(7)E2 <NA> <NA> 10.1.1.13 2023-04-30 2018-05-01 https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html <NA> <NA> <NA> <NA> <NA> <NA> <NA> None NaN <NA> <NA> NaN NaN <NA> NaT NaT <NA> coverage_24_Jul_10_06_49.xlsx Chassis WS-C2960X-48LPS-L 10.1.1.13 Covered - Non-IBM Covered - Non-IBM Covered - Non-IBM NaT NaT None
inventory_30_Jun_15_30_08.xlsx 3 Chassis Switches Cisco Catalyst 2960-C Series Switches WS-C2960C-12PC-L IOS 15.2(4)E6 15.2(7)E2 SWITCH-MGK-A1 10.1.1.39 EEEEEE software_02_Jul_08_14_40.xlsx WS-C2960C-12PC-L IOS 15.2(4)E6 End of Vulnerability Support 15.2(7)E2 <NA> <NA> 10.1.1.39 2023-04-30 2018-05-01 https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html hardware_02_Jul_07_25_04.xlsx 10.1.1.39 Chassis WS-C2960C-12PC-L EoL Date Announced EOL in more than 24 months WS-C2960L-16PS-LL None 920.7 50 PSUT 215.16 122.76 0 2025-10-31 2020-10-30 https://www.cisco.com/c/en/us/products/switches/catalyst-2960-c-series-switches/eos-eol-notice-c51-743071.html coverage_24_Jul_10_37_26.xlsx Chassis WS-C2960C-12PC-L 10.1.1.39 Uncovered with ELLW No Contract No Contract NaT NaT None
inventory_30_Jun_15_19_35.xlsx 3 Chassis Switches Cisco Catalyst 2960-X Series Switches WS-C2960X-48LPS-L IOS 15.2(4)E6 15.2(7)E2 SWITCH-SRVROOM-A1 10.1.1.2 RRRRRR software_02_Jul_07_54_15.xlsx WS-C2960X-48LPS-L IOS 15.2(4)E6 End of Vulnerability Support 15.2(7)E2 <NA> <NA> 10.1.1.2 2023-04-30 2018-05-01 https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-4500-series-switches/eos-eol-notice-c51-739919.html <NA> <NA> <NA> <NA> <NA> <NA> <NA> None NaN <NA> <NA> NaN NaN <NA> NaT NaT <NA> coverage_24_Jul_10_06_49.xlsx Chassis WS-C2960X-48LPS-L 10.1.1.2 Covered - Non-IBM Covered - Non-IBM Covered - Non-IBM NaT NaT None
inventory_30_Jun_15_20_39.xlsx 3 Chassis Switches Cisco Catalyst 3850 Series Switches WS-C3850-24P-S IOS-XE 16.3.7 16.9.5 SWITCH-SRVROOM-C1 10.2.1.254 TTTTTT software_02_Jul_07_54_33.xlsx WS-C3850-24P-S IOS-XE 16.3.7 End of Engineering 16.9.5 <NA> <NA> 10.2.1.3 2023-07-31 2018-08-01 https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-3850-series-switches/eos-eol-notice-c51-740255.html software_02_Jul_07_02_48.xlsx 10.2.1.254 <NA> <NA> End of Engineering <NA> <NA> None NaN <NA> <NA> NaN NaN <NA> 2023-07-31 2018-08-01 https://www.cisco.com/c/en/us/products/collateral/switches/catalyst-3850-series-switches/eos-eol-notice-c51-740255.html coverage_24_Jul_10_07_28.xlsx Chassis WS-C3850-24P-S 10.2.1.254 Covered - Non-IBM Covered - Non-IBM Covered - Non-IBM NaT NaT None
software_30_Jun_15_21_13.xlsx 1 <NA> Security Cisco ASA 5500-X with FirePOWER Services <NA> ASA 9.7(1)4 9.12.3 Interim SRVROOM-FW2.umbrellacorp.com 10.60.127.19 YYYYYY software_02_Jul_07_55_54.xlsx ASA5506-K9 ASA 9.7(1)4 End of Engineering 9.12.3 Interim 9.8.4 Interim <NA> 10.1.122.9 2022-08-31 2017-08-25 http://www.cisco.com/c/en/us/products/collateral/security/asa-firepower-services/eos-eol-notice-c51-738646.html <NA> <NA> <NA> <NA> <NA> <NA> <NA> None NaN <NA> <NA> NaN NaN <NA> NaT NaT <NA> coverage_24_Jul_10_07_48.xlsx Chassis ASA5506-K9 10.60.127.19 Covered - Non-IBM Covered - Non-IBM Covered - Non-IBM NaT NaT None
software_30_Jun_15_21_13.xlsx 1 <NA> Security Cisco ASA 5500-X with FirePOWER Services <NA> ASA 9.7(1)4 9.12.3 Interim FW2.umbrellacorp.com 10.60.127.18 GGGGGGG software_02_Jul_07_55_54.xlsx ASA5506-K9 ASA 9.7(1)4 End of Engineering 9.12.3 Interim 9.8.4 Interim <NA> 10.1.122.8 2022-08-31 2017-08-25 http://www.cisco.com/c/en/us/products/collateral/security/asa-firepower-services/eos-eol-notice-c51-738646.html <NA> <NA> <NA> <NA> <NA> <NA> <NA> None NaN <NA> <NA> NaN NaN <NA> NaT NaT <NA> coverage_24_Jul_10_07_48.xlsx Chassis ASA5506-K9 10.60.127.18 Covered - Non-IBM Covered - Non-IBM Covered - Non-IBM NaT NaT None
【问题讨论】:
您问题的格式意味着我看不到您的列名是IP_Address
还是1.1.1.1
。如果前者在合并之前重命名列以包含工作表名称,我将使用的方法。即IP_Address_Coverage
我尝试使用 ASCII 表生成器,但它总是给我这个输出。你知道如何正确格式化表格@RobRaymond吗?如果我更改列名,我仍然会在合并时获得 4 个不同的 IP 列。我想要一个。
print(df.to_string(index=False))
是我使用的然后粘贴到三个反引号之间的帖子中
@RobRaymond 谢谢
【参考方案1】:
从使用functools
的高级技术开始。将inspect
添加到组合get variable name
-
遍历您的数据框列表。捕获名称并重命名 IP 地址 列
一旦合并数据框重命名最左边的IP地址列
fillna()
来自其他 IP 地址 列并删除它们
import inspect
import functools
def retrieve_name(var):
callers_local_vars = inspect.currentframe().f_back.f_locals.items()
return [var_name for var_name, var_val in callers_local_vars if var_val is var]
data_frames = [InventoryDf, SoftwareDf, HardwareDf, CoverageDf]
names = []
for df in data_frames:
n = retrieve_name(df)[1].replace("Df", "")
names.append(n)
df.columns = [f"n c" if c=="IP Address" else c for c in df.columns]
# merge = functools.partial(pd.merge, on=['Priority','Category','Product Family','Host Name','Serial Number'], how='outer')
merge = functools.partial(pd.merge, on=['Priority','Category','Host Name','Serial Number'], how='outer')
merge = functools.reduce(merge, data_frames)
# take column LHS IP Address and rename it to "IP Address", fillna() from all subsequent columns
# then drop them
merge.rename(columns=f"names[0] IP Address":"IP Address", inplace=True)
for n in names[1:]:
merge.loc[:,"IP Address"].fillna(merge.loc[:,f"n IP Address"], inplace=True)
merge.drop(columns=f"n IP Address", inplace=True)
【讨论】:
以上是关于熊猫多重合并创建多维重复列的主要内容,如果未能解决你的问题,请参考以下文章