Row count increases when I try to merge data frames that have the same number of rows

Posted 2023-03-16

【Original title】: The problem is increased the row when I try to merge data that have the same row size
【Posted】: 2021-11-28 05:06:19
【Question】:

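We want to combine two modified data frames into one using merge. Each data frame has the shape 16598 rows × 6 columns. The result was expected to be (16598 rows × 6 columns). However, the combined result was (16602 rows × 7 columns), and the number of rows increased by four. The code I used is below.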

total_data = pd.merge(data_01, data_02, on=['Name', 'Platform', 'Year', 'Genre', 'Publisher'])

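To be more specific:

The columns of "data_01" are "Name", "Platform", "Year", "Genre", "Publisher", and "NA_Sales". (16598 rows × 6 columns)

The columns of "data_02" are "Name", "Platform", "Year", "Genre", "Publisher", and "EU_Sales". (16598 rows × 6 columns)

The two data frames differ only in their index numbers and row order; the values of 'Name', 'Platform', 'Year', 'Genre', and 'Publisher' are the same.

Only "NA_Sales", "EU_Sales", and "Year" hold numeric values; the rest are of object dtype.

What I want to do: I want to make one DataFrame (16598 rows × 7 columns) that combines data_01 and data_02. However, the number of rows keeps increasing.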
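One common reason a key-based merge returns more rows than either input is that some key combination occurs more than once on both sides, so pandas pairs every matching left row with every matching right row. Whether that is what happens in the real data is an assumption, since the full frames are not shown. The sketch below uses small made-up frames (`left`, `right`, and their values are hypothetical) to show the effect and two ways to detect it:

```python
import pandas as pd

# Hypothetical toy data: the key ("A", "DS") appears twice in each frame.
left = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Platform": ["DS", "DS", "Wii"],
    "NA_Sales": [1.0, 2.0, 3.0],
})
right = pd.DataFrame({
    "Name": ["A", "A", "B"],
    "Platform": ["DS", "DS", "Wii"],
    "EU_Sales": [10.0, 20.0, 30.0],
})
keys = ["Name", "Platform"]

# The duplicated key yields 2 x 2 = 4 rows, plus 1 row for "B".
merged = pd.merge(left, right, on=keys)
print(len(left), len(right), len(merged))

# Detection 1: list every row whose key combination is not unique.
dupes = left[left.duplicated(subset=keys, keep=False)]
print(dupes)

# Detection 2: ask merge to fail loudly if the keys are not one-to-one.
caught = False
try:
    pd.merge(left, right, on=keys, validate="one_to_one")
except pd.errors.MergeError as exc:
    caught = True
    print("not a one-to-one merge:", exc)
```

Running the same `duplicated` check on data_01 and data_02 with the five key columns would show whether repeated games explain the four extra rows.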

data_01 (16598 rows × 6 columns)

       Name       Platform  Year    Genre      Publisher     NA_Sales
1      Candace..  DS        2008.0  Action     Destineer     40.0
2      The Mun..  Wii       2009.0  Action     Namco..       170.0
3      Otome ..   PS        2010.0  Adventure  Alchemist     0.0
4      Deal..     DS        2010.0  Misc       Zoo Games     40.0
5      Ben 10..   PS3       2010.0  Platform   D3Publisher   120.0
...    ...        ...       ...     ...        ...           ...
16331  Midway..   PS2       2003.0  Misc       Midway Games  720000.0
16409  NASCAR..   PS2       2005.0  Racing     Electronic..  530000.0
16483  Super..    SAT       1998.0  Strategy   Banpresto     0.0
16493  Morta..    PSV       2012.0  Fighting   Warner Bros.  470000.0
16579  Gex:..     PS        1998.0  Platform   BMG...        320000.0

data_02 (16598 rows × 6 columns)

       Name       Platform  Year    Genre      Publisher    EU_Sales
1      Candace..  DS        2008.0  Action     Destineer    0.0
2      The..      Wii       2009.0  Action     Namco ...    0.0
3      Otome..    PSP       2010.0  Adventure  Alchemi..    0.0
4      Deal or..  DS        2010.0  Misc       Zoo Games    0.0
5      Ben 10..   PS3       2010.0  Platform   D3Publisher  90.0
...    ...        ...       ...     ...        ...          ...
16348  Aladdin..  Wii       2011.0  Racing     Big..        0.0
16375  Kill...    XB        2003.0  Shooter    Namco..      50000.0
16385  Tomb..     PS2       2009.0  Action     Eidos..      40000.0
16526  Planet..   GBA       2001.0  Action     Titus        0.0
16572  Koihime..  PS4       2016.0  Fighting   Yeti         0.0

【Comments】:

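I'm a bit confused about what you want to achieve. Based on

The result was expected to be (16598 rows × 6 columns). However, the combined result was (16602 rows × 7 columns), and the number of rows increased by four.

it looks like you want something similar to a LEFT JOIN; on the other hand, your last sentence says

What I want to make.. Based on the five common columns, one data frame that combines "NA_Sales", "EU_Sales" columns accordingly.(16598 rows × 7 columns)

Please clarify and fix the df_02 name. It might be easier if you illustrated the desired result with a SQL JOIN example.

Hi, can you give us some examples of df_01 and df_02? For example, the output of df_01.head(5) and df_02.head(5). It would also be great to add the desired output in data frame format.

I'm still not good at expressing myself, sorry. I have revised my question with what was asked for :)

【Answer 1】: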

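If I understand correctly, the data in Name through Publisher is the same in both tables for each index label.

So just merge the whole of one data frame with the single extra column from the other, matching on the index.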

total_data = pd.merge(data_01, data_02.EU_Sales, left_index=True, right_index=True)
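As a rough illustration of the index-based merge above, here is a minimal sketch on made-up frames (the values and index labels are hypothetical). It assumes that the same index label refers to the same game in both frames; if that does not hold for the real data, the sales figures would be paired with the wrong rows.

```python
import pandas as pd

# Hypothetical toy frames sharing the index labels 1, 2, 3.
data_01 = pd.DataFrame(
    {"Name": ["A", "A", "B"], "NA_Sales": [1.0, 2.0, 3.0]},
    index=[1, 2, 3],
)
# Same labels, but stored in a different row order.
data_02 = pd.DataFrame(
    {"Name": ["B", "A", "A"], "EU_Sales": [30.0, 10.0, 20.0]},
    index=[3, 1, 2],
)

# Rows are matched by index label, not by position, so the differing
# row order does not matter and repeated names cannot multiply rows.
total_data = pd.merge(
    data_01, data_02["EU_Sales"], left_index=True, right_index=True
)
print(total_data.shape)
print(total_data.loc[3])
```

Because each label is unique here, the result keeps one row per label and gains exactly one column.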
