Trouble with df.join(): ValueError: You are trying to merge on object and int64 columns

Posted 2023-02-24

[Posted]: 2020-01-07 18:15:43 [Question]:

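None of these questions address this issue: Question 1 and Question 2. I also could not find an answer in the pandas documentation.

Hi, I am trying to find the root cause of this error: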

ValueError: You are trying to merge on object and int64 columns.

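I know I can work around this with pandas' concat or merge functions, but I am trying to understand the cause of the error. The question is: why do I get this ValueError?

Here is the output of head(5) and info() for the two dataframes involved.

Output of print(the_big_df.head(5)):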

  account  apt  apt_p  balance       date  day    flag  month  reps     reqid  year
0  AA0420    0    0.0  -578.30 2019-03-01    1       1      3    10  82f2d761  2019
1  AA0420    0    0.1  -578.30 2019-03-02    2       1      3    10  82f2d761  2019
2  AA0420    0    0.1  -578.30 2019-03-03    3       1      3    10  82f2d761  2019
3  AA0421    0    0.1  -607.30 2019-03-04    4       1      3    10  82f2d761  2019
4  AA0421    0    0.1  -610.21 2019-03-05    5       1      3    10  82f2d761  2019

Output of print(the_big_df.info()):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 36054 entries, 0 to 36053
Data columns (total 11 columns):
account        36054 non-null object
apt            36054 non-null int64
apt_p          36054 non-null float64
balance        36054 non-null float64
date           36054 non-null datetime64[ns]
day            36054 non-null int64
flag           36054 non-null int64
month          36054 non-null int64
reps           36054 non-null int32
reqid          36054 non-null object
year           36054 non-null int64
dtypes: datetime64[ns](1), float64(2), int32(1), int64(5), object(2)
memory usage: 3.2+ MB

This is the dataframe I pass to join(). Output of print(df_to_join.head(5)):

      reqid     id
0  54580f39  13301
1  3ba905c0  77114
2  5f2d80da  13302
3  a1478e98  77115
4  9b09854b  78598

Output of print(df_to_join.info()):

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 14332 entries, 0 to 14331
Data columns (total 2 columns):
reqid    14332 non-null object
dni      14332 non-null object

The very next line after the four prints above is:

the_max_df = the_big_df.join(df_to_join,on='reqid')

As mentioned, the output is:

ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

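Why does this happen, when the info() output above clearly shows that the reqid column is of type object in both dataframes? Thanks.

[Comments]:

[Answer 1]: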

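Use pandas.DataFrame.merge.

See the docs for merge.

[Comments]:

Thanks for your answer. As I wrote in the question, I know I can solve this with merge, but I want to understand the problem I am hitting with .join().

[Answer 2]: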
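As a rough illustration of what this answer suggests, here is a minimal sketch using merge. The two small frames are hypothetical stand-ins for the_big_df and df_to_join from the question (only a few made-up rows, not the real data), and how="left" is an assumption about the intended result.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames from the question.
the_big_df = pd.DataFrame({
    "account": ["AA0420", "AA0421", "AA0422"],
    "reqid": ["82f2d761", "54580f39", "ffffffff"],
})
df_to_join = pd.DataFrame({
    "reqid": ["54580f39", "3ba905c0", "82f2d761"],
    "id": ["13301", "77114", "13302"],
})

# merge compares the reqid column of one frame with the reqid column of the
# other; how="left" keeps every row of the_big_df, matched or not.
the_max_df = the_big_df.merge(df_to_join, on="reqid", how="left")
print(the_max_df)
```

Rows of the_big_df whose reqid has no counterpart in df_to_join are kept, with a missing value in the id column.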

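The problem here is a misunderstanding of how join works: the_big_df.join(df_to_join, on='reqid') does not mean "join on the_big_df.reqid == df_to_join.reqid", as it appears at first glance, but rather "join on the_big_df.reqid == df_to_join.index". Since reqid is of type object and the index is of type int64, you get the error.

See the docs for join:

Join columns with other DataFrame either on index or on a key column. ...
on : str, list of str, or array-like, optional. Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index.

Consider the following example: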

import pandas as pd

df1 = pd.DataFrame({'id1': [1, 2], 'val1': [11, 12]})
df2 = pd.DataFrame({'id2': [3, 4], 'val2': [21, 22]})
print(df1)
#   id1  val1
#0    1    11
#1    2    12
print(df2)
#   id2  val2
#0    3    21
#1    4    22

# join on df1.id1 (int64) == df2.index (int64) 
print(df1.join(df2, on='id1'))
#   id1  val1  id2  val2
#0    1    11  4.0  22.0
#1    2    12  NaN   NaN

# now df3 same as df1 but id3 as object:
df3 = pd.DataFrame({'id3': ['1', '2'], 'val1': [11, 12]})

# try to join on df3.id3 (object) == df2.index (int64) 
df3.join(df2, on='id3')
#ValueError: You are trying to merge on object and int64 columns. If you wish to proceed you should use pd.concat

Note: the above applies to modern versions of pandas. Version 0.20.3 gives the following result:

>>> df3.join(df2, on='id3')
  id3  val1  id2  val2
0   1    11  NaN   NaN
1   2    12  NaN   NaN
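Applied to the question, one way to keep using join is to move reqid into the index of the right-hand frame first, so that the key column is compared against matching values. The sketch below assumes small hypothetical stand-ins for the_big_df and df_to_join, and it assumes reqid is unique in df_to_join.

```python
import pandas as pd

# Hypothetical miniature versions of the two frames from the question.
the_big_df = pd.DataFrame({
    "account": ["AA0420", "AA0421"],
    "reqid": ["82f2d761", "54580f39"],
})
df_to_join = pd.DataFrame({
    "reqid": ["54580f39", "82f2d761"],
    "id": ["13301", "13302"],
})

# A freshly built frame has an integer RangeIndex; this is what
# join(on="reqid") would compare the reqid strings against.
print(df_to_join.index.dtype)

# With reqid as the index of the right frame, join(on="reqid") lines up
# the_big_df.reqid with df_to_join's reqid values.
the_max_df = the_big_df.join(df_to_join.set_index("reqid"), on="reqid")
print(the_max_df)
```

join defaults to a left join, so every row of the_big_df is kept in its original order.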

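[Comments]:

This is exactly the answer I was looking for; you will get the bounty in 18 hours. Thank you very much, and thanks for letting me know that this is not a SQL-like join operation.

[Answer 3]:

Why don't you call astype(str) on both reqid columns and see whether it is still a problem?

[Comments]: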
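For what it is worth, a quick check along these lines suggests why casting the columns alone would not change the outcome: the cast touches the reqid column, while join(on=...) matches against the index of the other frame. The frame below is a hypothetical two-row stand-in for df_to_join.

```python
import pandas as pd

# Hypothetical two-row stand-in for df_to_join from the question.
df_to_join = pd.DataFrame({
    "reqid": ["54580f39", "3ba905c0"],
    "id": ["13301", "77114"],
})

# Cast only the reqid column to str, as the answer proposes.
casted = df_to_join.astype({"reqid": str})

# The column values are Python strings before and after the cast ...
print(casted["reqid"].map(type).unique())

# ... but the index is a separate object and is still an integer RangeIndex.
print(casted.index.dtype)
```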

I'll try it tomorrow! Given their format, they should already be strings. The info method shows that they are, but join says otherwise!
