循环列表字典并更新相应的列 - 熊猫
Posted
技术标签:
【中文标题】循环列表字典并更新相应的列 - 熊猫【英文标题】:Loop over a dictionary of list and update the corresponding columns - pandas 【发布时间】:2020-11-25 23:12:22 【问题描述】:我有一个df
和列表字典,如下所示。
Date Tea_Good Tea_bad coffee_good coffee_bad
2020-02-01 3 1 10 7
2020-02-02 3 1 10 7
2020-02-03 3 1 10 7
2020-02-04 3 1 10 7
2020-02-05 6 1 10 7
2020-02-06 6 2 10 11
2020-02-07 6 2 5 11
2020-02-08 6 2 5 11
2020-02-09 9 2 5 11
2020-02-10 9 2 4 11
2020-02-11 9 2 4 11
2020-02-12 9 2 4 11
2020-02-13 9 2 4 11
2020-02-14 9 2 4 11
dict
是
rf =
"tea":
[
"type": "linear",
"from": "2020-02-01T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1],
"case":"bad"
,
"type": "polynomial",
"from": "2020-02-08T20:00:00.000Z",
"to": "2020-02-10T20:00:00.000Z",
"days":3,
"coef":[0.1,0.1,0.1,0.1,0.1,0.1],
"case":"good"
],
"coffee": [
"type": "quadratic",
"from": "2020-02-01T20:00:00.000Z",
"to": "2020-02-10T20:00:00.000Z",
"days": 10,
"coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
"case":"good"
,
"type": "constant",
"from": "2020-02-11T20:00:00.000Z",
"to": "2020-02-13T20:00:00.000Z",
"days": 5,
"coef": [0.1, 0.1, 0.1, 0.1, 0.1, 0.1],
"case":"bad"
]
解释:
字典包含两个键
1. "tea"
2. "coffee"
根据键值我要更新df
的列。
1. Which column?
If key == "tea" and "case" == "bad" update the Tea_bad column
2. When?
"from": "2020-02-01T20:00:00.000Z",
"to": "2020-02-03T20:00:00.000Z"
3. How?
if "type": "linear",
when "from": "2020-02-01T20:00:00.000Z"
t = 0,
a0 = coef[0]
a1 = coef[1]
a2 = coef[2]
a3 = coef[3]
a4 = coef[4]
a5 = coef[5]
df.loc[(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_bad'] = a0 + a1 * t.
我尝试了下面的代码,但它不起作用。请不要查看代码。 如果你想以你自己的方式实施并帮助我。
def rf_user_input(df, REQUEST_OBJ):
'''
This functions returns the tea_coffee dataframe with the user input functions for tea, coffee
params: data : tea_coffee dataframe uploaded from user
request_object_api: The api should contain the below params
start_date: start date of the user function for rf
end_date : end date of the user function for the rf
label : 'constant', 'linear', 'quadratic', 'polynomial', 'exponential', 'df'
coef : list with 6 indexes [a0,a1,a2,a3,a4,a5]
return: rf computed with user inputs
'''
# df.days.iloc[(df[df.Date==start_date].index[0])]
df = df.sort_values(by='Date')
df['days'] = (df['Date'] - df.at[0, 'Date']).dt.days + 1
REQUIRED_KEYS = ["tea", "coffee"]
for teacoffee_category in REQUIRED_KEYS:
print(f" teacoffee_category - teacoffee_category")
if teacoffee_category in REQUEST_OBJ.keys():
param_obj_list = REQUEST_OBJ[teacoffee_category]
for params_obj in param_obj_list:
# Do the data processing
goodbad_catgeory = params_obj['case']
kind = teacoffee_category + '_' + goodbad_catgeory
start_date, end_date, label, coef, n_days = params_obj['from'], params_obj['to'], params_obj['type'], \
params_obj['coef'], params_obj['days']
start_date = DT.datetime.strptime(start_date, "%Y-%m-%dT%H:%M:%S.%fZ")
end_date = DT.datetime.strptime(end_date, "%Y-%m-%dT%H:%M:%S.%fZ")
print(f" start date - start_date")
print(f" end date - end_date")
# Additional n_days code - Start
first_date = df['Date'].min()
period_days = (start_date - first_date)
print(f" period day - period_days")
# Additional n_days code - End
# Checking 'start_date' , 'end_date' and 'n_days' conditions
# If the start_date and end_date is null return the calibration df as it is
if (start_date == 0) & (end_date == 0):
return df
if (start_date == 0) & (end_date != 0) & (n_days == 0):
return df
if (start_date != 0) & (end_date == 0) & (n_days == 0):
return df
# if start date, end date and n_days are non zero then consider start date and n_days
if (start_date != 0) & (end_date != 0) & (n_days != 0):
#n_days = (end_date - start_date).days
#n_days = (end_date - start_date).days
end_date = start_date + DT.timedelta(days=n_days)
if (start_date != 0) & (end_date != 0) & (n_days == 0) :
n_days = (end_date - start_date)
print(f" n day = n_days")
end_date = end_date
if (start_date != 0) & (end_date == 0) & (n_days != 0) :
#n_days = (end_date - start_date)
#print(f" n day = n_days")
end_date = start_date + DT.timedelta(days=n_days)
if (start_date == 0) & (end_date != 0) & (n_days != 0) :
start_date = end_date - DT.timedelta(days=n_days)
if (n_days != 0) & (start_date != 0):
end_date = start_date + DT.timedelta(days=n_days)
# If the start_date and end_date is null return the calibration df as it is
if len(coef) == 6:
# Coefficients Index Initializations
a0 = coef[0]
a1 = coef[1]
a2 = coef[2]
a3 = coef[3]
a4 = coef[4]
a5 = coef[5]
# Constant
if label == 'constant':
if kind == 'tea_good':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_Good'] = a0 + (df['days']) - period_days
elif kind == 'tea_bad':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_bad'] = a0 + df['days'] - period_days
elif kind == 'coffee_good':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'coffee_good'] = a0 + df['days'] - period_days
elif kind == 'coffee_bad':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'coffee_bad'] = a0 + df['days'] - period_days
# Linear
if label == 'linear':
if kind == 'tea_good':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_Good'] = a0 + (
a1 * ((df['days']) - period_days))
elif kind == 'tea_bad':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_bad'] = a0 + (
a1 * ((df['days']) - period_days))
elif kind == 'coffee_good':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'coffee_good'] = a0 + (
a1 * ((df['days']) - period_days))
elif kind == 'coffee_bad':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'coffee_bad'] = a0 + (
a1 * ((df['days']) - period_days))
# Quadratic
if label == 'quadratic':
if kind == 'tea_good':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_Good'] = a0 + (
a1 * ((df['days']) - period_days)) + (a2 * ((df['days']) - period_days) ** 2)
elif kind == 'tea_bad':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_bad'] = a0 + (
a1 * ((df['days']) - period_days)) + (a2 * ((df['days']) - period_days) ** 2)
elif kind == 'coffee_good':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'coffee_good'] = a0 + (
a1 * ((df['days']) - period_days)) + (a2 * ((df['days']) - period_days) ** 2)
elif kind == 'coffee_bad':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'coffee_bad'] = a0 + (
a1 * ((df['days']) - period_days)) + (a2 * ((df['days']) - period_days) ** 2)
# Polynomial
if label == 'polynomial':
if kind == 'tea_good':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_Good'] = a0 + (
a1 * ((df['days']) - period_days)) + (a2 * (
(df['days']) - period_days) ** 2) + (a3 * (
(df['days']) - period_days) ** 3) + (a4 * (
(df['days']) - period_days) ** 4) + (a5 * ((df['days']) - period_days) ** 5)
elif kind == 'tea_bad':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_bad'] = a0 + (
a1 * ((df['days']) - period_days)) + (a2 * (
(df['days']) - period_days) ** 2) + (a3 * (
(df['days']) - period_days) ** 3) + (a4 * (
(df['days']) - period_days) ** 4) + (a5 * ((df['days']) - period_days) ** 5)
elif kind == 'coffee_good':
df.loc[
(df['Date'] >= start_date) & (df['Date'] <= end_date), 'coffee_good'] = a0 + (
a1 * ((df['days']) - period_days)) + (a2 * (
(df['days']) - period_days) ** 2) + (a3 * (
(df['days']) - period_days) ** 3) + (a4 * (
(df['days']) - period_days) ** 4) + (a5 * ((df['days']) - period_days) ** 5)
elif kind == 'coffee_bad':
df.loc[(df['Date'] >= start_date) & (df['Date'] <= end_date), 'coffee_bad'] = a0 + (
a1 * ((df['days']) - period_days)) + (a2 * (
(df['days']) - period_days) ** 2) + (a3 * (
(df['days']) - period_days) ** 3) + (a4 * (
(df['days']) - period_days) ** 4) + (a5 * ((df['days']) - period_days) ** 5)
# Exponential
if label == 'exponential':
if kind == 'tea_good':
df.loc[(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_Good'] = np.exp(a0)
elif kind == 'tea_bad':
df.loc[(df['Date'] >= start_date) & (df['Date'] <= end_date), 'Tea_bad'] = np.exp(a0)
elif kind == 'coffee_good':
df.loc[(df['Date'] >= start_date) & (df['Date'] <= end_date), 'coffee_good'] = np.exp(a0)
elif kind == 'coffee_bad':
df.loc[(df['Date'] >= start_date) & (df['Date'] <= end_date), 'coffee_bad'] = np.exp(a0)
# Calibration File
if label == 'calibration_file':
pass
# return df
else:
raise Exception('Coefficients index do not match. All values of coefficients should be passed')
else:
return df
return df
我以不同的方式添加了相同的问题。我想在我没有解释好。这个问题的链接如下。
Replace the column values based on the list of dictionary and specific date condition - use if and for loop - Pandas
【问题讨论】:
【参考方案1】:用途:
def rf_user_input(df, req_obj):
df = df.sort_values('Date')
df['days'] = (df['Date'] - df.at[0, 'Date']).dt.days + 1
cols, df.columns = df.columns, df.columns.str.lower()
for category in ("tea", "coffee"):
if category not in req_obj.keys():
continue
for params_obj in req_obj[category]:
case = params_obj['case']
kind = '_'.format(category, case)
start_date = pd.to_datetime(params_obj['from'], format='%Y-%m-%dT%H:%M:%S.%fZ')
end_date = pd.to_datetime(params_obj['to'], format='%Y-%m-%dT%H:%M:%S.%fZ')
label, coef, n_days = params_obj['type'], params_obj['coef'], params_obj['days']
# Additional n_days code - Start
first_date = df['date'].min()
period_days = (start_date - first_date).days
# Additional n_days code - End
# Checking 'start_date' , 'end_date' and 'n_days' conditions
# If the start_date and end_date is null return the calibration df as it is
if (start_date == 0) and (end_date == 0):
return df.set_axis(cols, axis=1)
if (start_date == 0) and (end_date != 0) and (n_days == 0):
return df.set_axis(cols, axis=1)
if (start_date != 0) and (end_date == 0) and (n_days == 0):
return df.set_axis(cols, axis=1)
# if start date, end date and n_days are non zero then consider start date and n_days
if (start_date != 0) and (end_date != 0) and (n_days != 0):
end_date = start_date + pd.Timedelta(days=n_days)
if (start_date != 0) and (end_date != 0) and (n_days == 0):
n_days = (end_date - start_date)
if (start_date != 0) and (end_date == 0) and (n_days != 0):
end_date = start_date + pd.Timedelta(days=n_days)
if (start_date == 0) and (end_date != 0) and (n_days != 0):
start_date = end_date - pd.Timedelta(days=n_days)
if (n_days != 0) and (start_date != 0):
end_date = start_date + pd.Timedelta(days=n_days)
# If the start_date and end_date is null return the calibration df as it is
if len(coef) == 6:
a0, a1, a2, a3, a4, a5 = coef
mask = df['date'].between(start_date, end_date)
if label == 'constant':
if kind in ('tea_good', 'tea_bad', 'coffee_good', 'coffee_bad'):
df.loc[mask, kind] = a0 + df['days'] - period_days
elif label == 'linear':
if kind in ('tea_good', 'tea_bad', 'coffee_good', 'coffee_bad'):
df.loc[mask, kind] = a0 + \
(a1 * ((df['days']) - period_days))
# Quadratic
elif label == 'quadratic':
if kind in ('tea_good', 'tea_bad', 'coffee_good', 'coffee_bad'):
df.loc[mask, kind] = a0 + (a1 * ((df['days']) - period_days)) + (
a2 * ((df['days']) - period_days) ** 2)
# Polynomial
elif label == 'polynomial':
if kind in ('tea_good', 'tea_bad', 'coffee_good', 'coffee_bad'):
df.loc[mask, kind] = a0 + (
a1 * ((df['days']) - period_days)) + (a2 * (
(df['days']) - period_days) ** 2) + (a3 * (
(df['days']) - period_days) ** 3) + (a4 * (
(df['days']) - period_days) ** 4) + (a5 * ((df['days']) - period_days) ** 5)
# Exponential
elif label == 'exponential':
if kind in ('tea_good', 'tea_bad', 'coffee_good', 'coffee_bad'):
df.loc[mask, kind] = np.exp(a0)
# Calibration File
elif label == 'calibration_file':
pass
else:
raise Exception(
'Coefficients index do not match. All values of coefficients should be passed')
return df.set_axis(cols, axis=1)
结果:
# rf_unser_input(df, rf)
Date Tea_Good Tea_bad coffee_good coffee_bad days
0 2020-02-01 3.0 1.0 10.0 7.0 1
1 2020-02-02 3.0 0.3 0.3 7.0 2
2 2020-02-03 3.0 0.4 0.4 7.0 3
3 2020-02-04 3.0 0.5 0.5 0.3 4
4 2020-02-05 12.0 1.0 3.1 0.4 5
5 2020-02-06 13.0 2.0 4.3 0.5 6
6 2020-02-07 6.0 2.0 5.7 0.6 7
7 2020-02-08 6.0 2.0 7.3 11.0 8
8 2020-02-09 6.3 2.0 9.1 11.0 9
9 2020-02-10 36.4 2.0 11.1 11.0 10
10 2020-02-11 136.5 2.0 13.3 11.0 11
11 2020-02-12 9.0 2.0 4.0 11.0 12
12 2020-02-13 9.0 2.0 4.0 11.0 13
13 2020-02-14 9.0 2.0 4.0 11.0 14
【讨论】:
【参考方案2】:一种解决方案是遍历字典并使用 apply:
df.Date = pd.to_datetime(df.Date)
df = df.set_index('Date', drop=True)
df['Period'] = [(date - df.index[0]).days for date in df.index]
for key, val in rf.items():
for elem in val:
type_method = elem.get('type')
col_name = f'key.capitalize()_elem.get("case")'
date_from = pd.to_datetime(elem.get('from'))
date_to = pd.to_datetime(elem.get('to'))
a0, a1, a2, a3, a4, a5 = elem.get('coef')
mask_dates = (df.index >= date_from) & (df.index <= date_to)
func_dict =
'linear': lambda x: a0 + a1 * x['Period'],
'constant': lambda x: a0 + x['Period'],
'quadratic': lambda x: a0 + a1 * (x['Period']) + a2 * (x['Period'] ** 2),
'exponential': lambda x: np.exp(a0),
'polynomial': lambda x: a0 +
a1 * (x['Period']) +
a2 * (x['Period'] ** 2) +
a3 * (x['Period'] ** 3) +
a4 * (x['Period'] ** 4) +
a5 * (x['Period'] ** 5),
df.loc[mask_dates, col_name] = df[mask_dates].apply(func_dict[type_method], axis=1)
输出:
Tea_good Tea_bad Coffee_good Coffee_bad Period
Date
2020-02-01 3.0 1.0 10.0 7.0 0
2020-02-02 3.0 0.2 0.3 7.0 1
2020-02-03 3.0 0.3 0.7 7.0 2
2020-02-04 3.0 1.0 1.3 7.0 3
2020-02-05 6.0 1.0 2.1 7.0 4
2020-02-06 6.0 2.0 3.1 11.0 5
2020-02-07 6.0 2.0 4.3 11.0 6
2020-02-08 6.0 2.0 5.7 11.0 7
2020-02-09 3744.9 2.0 7.3 11.0 8
2020-02-10 6643.0 2.0 9.1 11.0 9
2020-02-11 9.0 2.0 4.0 11.0 10
2020-02-12 9.0 2.0 4.0 11.1 11
2020-02-13 9.0 2.0 4.0 12.1 12
2020-02-14 9.0 2.0 4.0 11.0 13
请注意,我必须更改列名,以便将 tea/coffee 大写。另外,像这样使用 lambda 函数是惰性的,应该重构为普通函数。
【讨论】:
以上是关于循环列表字典并更新相应的列 - 熊猫的主要内容,如果未能解决你的问题,请参考以下文章
我想将国家/地区列表与作为熊猫数据框 Python 中字典对象类型的列数据进行比较