Adding col to pd.df with value looked up from second df [duplicate]
Posted: 2019-01-12 10:42:12
Question: I would like to add a new column to a df, with its values looked up from a second df (df2). df:
code date settlement strike type
0 CBT_21_G2015_S 2015-01-02 1.343750 126.0 C
1 CBT_21_G2015_S 2015-01-02 4.359375 131.5 P
2 CBT_21_G2015_S 2015-01-02 24.671875 102.5 C
3 CBT_21_G2015_S 2015-01-02 0.015625 110.5 P
4 CBT_21_G2015_S 2015-01-02 0.015625 101.0 P
5 CBT_21_G2015_S 2015-01-02 0.015625 140.5 C
6 CBT_21_G2015_S 2015-01-02 10.671875 116.5 C
7 CBT_21_G2015_S 2015-01-02 0.015625 123.5 P
8 CBT_21_F2015_S 2015-01-02 3.875000 131.0 P
9 CBT_21_F2015_S 2015-01-02 0.015625 145.0 C
The second df (df2):
code expiry_date
id
319 CBT_21_F2013_S 2012-12-21
320 CBT_21_F2014_S 2013-12-27
321 CBT_21_F2015_S 2014-12-26
324 CBT_21_G2012_S 2012-01-27
325 CBT_21_G2013_S 2013-01-25
326 CBT_21_G2014_S 2014-01-24
327 CBT_21_G2015_S 2015-01-23
330 CBT_21_H2012_S 2012-02-24
331 CBT_21_H2013_S 2013-02-22
332 CBT_21_H2014_S 2014-02-21
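The column to add to df is the 'expiry_date' that corresponds to each row's 'code'. To look up the expiry date: df2.loc[df2.code == df.code].expiry_date
So the desired output should look like this: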
code date settlement strike type expiry
0 CBT_21_G2015_S 2015-01-02 1.343750 126.0 C 2015-01-23
1 CBT_21_G2015_S 2015-01-02 4.359375 131.5 P 2015-01-23
2 CBT_21_G2015_S 2015-01-02 24.671875 102.5 C 2015-01-23
3 CBT_21_G2015_S 2015-01-02 0.015625 110.5 P 2015-01-23
4 CBT_21_G2015_S 2015-01-02 0.015625 101.0 P 2015-01-23
5 CBT_21_G2015_S 2015-01-02 0.015625 140.5 C 2015-01-23
6 CBT_21_G2015_S 2015-01-02 10.671875 116.5 C 2015-01-23
7 CBT_21_G2015_S 2015-01-02 0.015625 123.5 P 2015-01-23
8 CBT_21_F2015_S 2015-01-02 3.875000 131.0 P 2014-12-26
9 CBT_21_F2015_S 2015-01-02 0.015625 145.0 C 2014-12-26
What is the simplest way to do this?
Answer 1: IIUC, you can set 'code' as the index on both frames and let pandas align the values on it:
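# Index df by code so the assignment aligns on code (codes in df2 must be unique)
df = df.set_index('code')
df['expiry'] = df2.set_index('code')['expiry_date']
# Turn code back into a regular column and display the result
df = df.reset_index()
df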
code date settlement strike type expiry
0 CBT_21_G2015_S 2015-01-02 1.343750 126.0 C 2015-01-23
1 CBT_21_G2015_S 2015-01-02 4.359375 131.5 P 2015-01-23
2 CBT_21_G2015_S 2015-01-02 24.671875 102.5 C 2015-01-23
3 CBT_21_G2015_S 2015-01-02 0.015625 110.5 P 2015-01-23
4 CBT_21_G2015_S 2015-01-02 0.015625 101.0 P 2015-01-23
5 CBT_21_G2015_S 2015-01-02 0.015625 140.5 C 2015-01-23
6 CBT_21_G2015_S 2015-01-02 10.671875 116.5 C 2015-01-23
7 CBT_21_G2015_S 2015-01-02 0.015625 123.5 P 2015-01-23
8 CBT_21_F2015_S 2015-01-02 3.875000 131.0 P 2014-12-26
9 CBT_21_F2015_S 2015-01-02 0.015625 145.0 C 2014-12-26
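The same lookup can also be done without changing df's index. The sketch below is not part of the original answer: it rebuilds a three-row subset of the question's frames (the sample construction is an assumption made for illustration) and shows two common alternatives, Series.map with a code-to-expiry lookup and a left merge. Both assume each code appears only once in df2.

```python
import pandas as pd

# Small stand-ins for the question's frames (a subset of the rows shown above).
df = pd.DataFrame({
    "code": ["CBT_21_G2015_S", "CBT_21_G2015_S", "CBT_21_F2015_S"],
    "date": pd.to_datetime(["2015-01-02"] * 3),
    "settlement": [1.343750, 4.359375, 3.875000],
    "strike": [126.0, 131.5, 131.0],
    "type": ["C", "P", "P"],
})
df2 = pd.DataFrame(
    {
        "code": ["CBT_21_F2015_S", "CBT_21_G2015_S"],
        "expiry_date": pd.to_datetime(["2014-12-26", "2015-01-23"]),
    },
    index=pd.Index([321, 327], name="id"),
)

# Option 1: build a code -> expiry_date Series and map it onto df["code"].
# Codes missing from df2 would come out as NaT.
lookup = df2.set_index("code")["expiry_date"]
mapped = df.assign(expiry=df["code"].map(lookup))

# Option 2: left merge on code. validate="many_to_one" makes pandas raise
# if df2 ever contains the same code twice, instead of silently adding rows.
merged = df.merge(
    df2.rename(columns={"expiry_date": "expiry"})[["code", "expiry"]],
    on="code",
    how="left",
    validate="many_to_one",
)

print(mapped)
print(merged)
```

map is usually the lighter choice when only one column is being pulled across; merge tends to be more convenient once several columns from df2 are needed.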