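Combining two pandas dataframes together Python [duplicate]
[Posted]: 2021-11-08 05:14:42
[Question]: The code below computes the list_val statistics of the vals and vals_2 values associated with the dates month_changes and month_changes_2. It groups the data into yearly intervals and calculates either 'mean', 'median' or 'max', 'min' for each year. I would like to modify the code so that the two outputs graph and graph_2 are combined into the expected output shown below. How would I do that? The code below comes from the answer to this question: link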
Code:
import numpy as np
import pandas as pd

month_changes = np.array(["2018-04-01 00:00:00", "2018-05-01 00:00:00", "2019-03-01 00:00:00", "2019-04-01 00:00:00", "2019-08-01 00:00:00", "2019-11-01 00:00:00", "2019-12-01 00:00:00", "2021-01-01 00:00:00"])
vals = np.array([10, 23, 45, 4, 5, 12, 4, -6])
month_changes_2 = np.array(["2018-04-06 00:00:00", "2018-05-13 00:00:00", "2018-03-01 00:00:00", "2019-02-01 00:00:00", "2019-03-12 00:00:00", "2019-12-01 00:00:00", "2019-12-22 00:00:00", "2020-04-01 00:00:00", "2021-01-01 00:00:00"])
vals_2 = np.array([140, 213, 15, 4, 53, 1, 42, -63, 120])
list_val = ['mean', 'median', 'max', 'min']

def yearly_intervals(mc, vs, start_year, end_year, series_val):
    print(series_val)
    data = pd.DataFrame({
        "Date": pd.to_datetime(mc),  # Convert to_datetime immediately
        "Averages": vs
    })
    out = (
        data.groupby(data["Date"].dt.year)["Averages"]  # Access Series
        .agg(list_val[series_val[0]:series_val[-1]])
        .rename(columns=lambda x: 'Average' if x == 'mean' else x.title())
    )
    # If start_year is given, pad the result to the full range of years
    if start_year is not None:
        # Reindex to ensure index contains all years in range
        out = out.reindex(range(
            start_year,
            # Use last year (maximum value) from index or user defined arg
            (end_year if end_year is not None else out.index.max()) + 1
        ), fill_value=0)
    return out

graph = yearly_intervals(month_changes, vals, start_year=2016, end_year=2021, series_val=[0, 2])
graph_2 = yearly_intervals(month_changes_2, vals_2, start_year=2016, end_year=2021, series_val=[2, 4])
Output:
      Average  Median
Date
2016      0.0     0.0
2017      0.0     0.0
2018     16.5    16.5
2019     14.0     5.0
2020      0.0     0.0
2021     -6.0    -6.0

      Max  Min
Date
2016    0    0
2017    0    0
2018  213   15
2019   53    1
2020  -63  -63
2021  120  120
Expected output:
      Average  Median  Max  Min
Date
2016      0.0     0.0    0    0
2017      0.0     0.0    0    0
2018     16.5    16.5  213   15
2019     14.0     5.0   53    1
2020      0.0     0.0  -63  -63
2021     -6.0    -6.0  120  120
[Comments]:
df1.join(df2)?
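The suggestion in the comment can be sketched on a small scale. The frames below are hypothetical stand-ins for graph and graph_2 (only two years, values copied from the output above); the point is that DataFrame.join matches rows by index label and places the second frame's columns after the first's.

```python
import pandas as pd

# Hypothetical miniature versions of the two per-year tables.
years = pd.Index([2018, 2019], name="Date")
df1 = pd.DataFrame({"Average": [16.5, 14.0], "Median": [16.5, 5.0]}, index=years)
df2 = pd.DataFrame({"Max": [213, 53], "Min": [15, 1]}, index=years)

# join aligns rows on the index and appends df2's columns to df1's.
combined = df1.join(df2)
print(combined)
```

Because both frames share the same index here, no rows are dropped and no missing values appear.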
[Answer 1]:
Something like this?
import pandas as pd

df1 = pd.DataFrame({
    'Average': [0.0, 0.0, 16.5],
    'Median': [0.0, 0.0, 16.5]
}, index=[2016, 2017, 2018])
df2 = pd.DataFrame({
    'Max': [0, 0, 213],
    'Min': [0, 0, 15]
}, index=[2016, 2017, 2018])
print(df1)
print(df2)
# axis=1 places the frames side by side, aligned on the index
df = pd.concat([df1, df2], axis=1)
print(df)
[Answer 2]: I assume you have already created and processed the two dataframes graph and graph_2.
Try this:
combined_df = pd.concat([graph, graph_2], axis=1)
print(combined_df)
It will output:
      Average  Median  Max  Min
Date
2016      0.0     0.0    0    0
2017      0.0     0.0    0    0
2018     16.5    16.5  213   15
2019     14.0     5.0   53    1
2020      0.0     0.0  -63  -63
2021     -6.0    -6.0  120  120
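The concat call works cleanly here because both frames were reindexed to the same 2016 to 2021 range. As a rough illustration of what happens otherwise, the sketch below uses two hypothetical frames (left and right are made-up names) whose years only partly overlap: with the default outer join the missing cells become NaN, while join="inner" keeps only the shared years.

```python
import pandas as pd

# Hypothetical frames whose year indexes only partly overlap.
left = pd.DataFrame({"Average": [16.5, 14.0]}, index=[2018, 2019])
right = pd.DataFrame({"Max": [53, -63]}, index=[2019, 2020])

# Default (outer) alignment: every year from either frame, gaps become NaN.
outer = pd.concat([left, right], axis=1)

# Inner alignment: only the years present in both frames.
inner = pd.concat([left, right], axis=1, join="inner")

print(outer)
print(inner)
```

If the NaN values are unwanted, reindexing both frames to a common range first, as yearly_intervals does with fill_value=0, avoids them.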
[Answer 3]: Just use your existing work and run graph.join(graph_2):
import numpy as np
import pandas as pd

month_changes = np.array(["2018-04-01 00:00:00", "2018-05-01 00:00:00", "2019-03-01 00:00:00", "2019-04-01 00:00:00", "2019-08-01 00:00:00", "2019-11-01 00:00:00", "2019-12-01 00:00:00", "2021-01-01 00:00:00"])
vals = np.array([10, 23, 45, 4, 5, 12, 4, -6])
month_changes_2 = np.array(["2018-04-06 00:00:00", "2018-05-13 00:00:00", "2018-03-01 00:00:00", "2019-02-01 00:00:00", "2019-03-12 00:00:00", "2019-12-01 00:00:00", "2019-12-22 00:00:00", "2020-04-01 00:00:00", "2021-01-01 00:00:00"])
vals_2 = np.array([140, 213, 15, 4, 53, 1, 42, -63, 120])
list_val = ['mean', 'median', 'max', 'min']

def yearly_intervals(mc, vs, start_year, end_year, series_val):
    print(series_val)
    data = pd.DataFrame({
        "Date": pd.to_datetime(mc),  # Convert to_datetime immediately
        "Averages": vs
    })
    out = (
        data.groupby(data["Date"].dt.year)["Averages"]  # Access Series
        .agg(list_val[series_val[0]:series_val[-1]])
        .rename(columns=lambda x: 'Average' if x == 'mean' else x.title())
    )
    # If start_year is given, pad the result to the full range of years
    if start_year is not None:
        # Reindex to ensure index contains all years in range
        out = out.reindex(range(
            start_year,
            # Use last year (maximum value) from index or user defined arg
            (end_year if end_year is not None else out.index.max()) + 1
        ), fill_value=0)
    return out

graph = yearly_intervals(month_changes, vals, start_year=2016, end_year=2021, series_val=[0, 2])
graph_2 = yearly_intervals(month_changes_2, vals_2, start_year=2016, end_year=2021, series_val=[2, 4])
# Join on the shared year index so the four statistics sit side by side
print(graph.join(graph_2))
which prints:
[0, 2]
[2, 4]
      Average  Median  Max  Min
Date
2016      0.0     0.0    0    0
2017      0.0     0.0    0    0
2018     16.5    16.5  213   15
2019     14.0     5.0   53    1
2020      0.0     0.0  -63  -63
2021     -6.0    -6.0  120  120
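One caveat worth keeping in mind: join works without extra arguments here because graph and graph_2 have different column names. The sketch below uses two hypothetical frames, a and b, that both contain a Max column, to show that a plain join then raises a ValueError and that the lsuffix and rsuffix parameters resolve the clash.

```python
import pandas as pd

# Hypothetical frames that share the column name "Max".
a = pd.DataFrame({"Max": [45, -6]}, index=[2019, 2021])
b = pd.DataFrame({"Max": [53, 120]}, index=[2019, 2021])

# Without suffixes, join refuses to combine overlapping column names.
try:
    a.join(b)
    overlap_error = None
except ValueError as exc:
    overlap_error = exc

# Suffixes disambiguate the two columns.
suffixed = a.join(b, lsuffix="_1", rsuffix="_2")
print(suffixed)
```

This would matter if, for example, both calls to yearly_intervals were made with the same series_val, since the two results would then carry identical column names.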