Scipy or Bayesian optimize function with constraints, bounds and dataframe in Python


【Posted】: 2021-07-03 05:43:13

【Question】:

For the dataframe below, I want to optimize the total return while satisfying certain bounds.

import numpy as np
import pandas as pd

d = {'Win': [0, 0, 1, 0, 0, 1, 0], 'Men': [0, 1, 0, 1, 1, 0, 0], 'Women': [1, 0, 1, 0, 0, 1, 1], 'Matches': [0, 5, 4, 7, 4, 10, 13],
     'Odds': [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1], 'investment': [0, 0, 6, 10, 5, 25, 0]}

data = pd.DataFrame(d)

I want to maximize the following expression:

totalreturn = np.sum(data['Odds'] * data['investment'] * (data['Win'] == 1))

The function should be maximized subject to the following bounds:

for i in range(len(data)):

    investment = data['investment'][i]

    # Row-specific threshold C(i)
    C = alpha0 + alpha1 * data['Men'][i] + alpha2 * data['Women'][i] + alpha3 * data['Matches'][i]

    # Zero out the bet unless all three conditions hold
    if not ((lb < investment) and (investment < ub) and (investment > C)):
        data.loc[i, 'investment'] = 0

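lb and ub are constant for every row in the dataframe. The threshold C, however, is different for each row. So there are 6 parameters to optimize: lb, ub, alpha0, alpha1, alpha2, alpha3.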
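The row-by-row rule above can also be written as a single vectorized function of the six parameters. The following is a minimal sketch, not the asker's code: the helper name `total_return` is hypothetical, and it computes the payout expression from the question without modifying the dataframe.

```python
import pandas as pd

data = pd.DataFrame({
    'Win':        [0, 0, 1, 0, 0, 1, 0],
    'Men':        [0, 1, 0, 1, 1, 0, 0],
    'Women':      [1, 0, 1, 0, 0, 1, 1],
    'Matches':    [0, 5, 4, 7, 4, 10, 13],
    'Odds':       [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
    'investment': [0, 0, 6, 10, 5, 25, 0],
})


def total_return(params, data):
    """Payout of the winning bets that pass all three conditions.

    Hypothetical helper: params = (lb, ub, alpha0, alpha1, alpha2, alpha3).
    """
    lb, ub, alpha0, alpha1, alpha2, alpha3 = params
    inv = data['investment']
    # Row-dependent threshold C(i): one value per row
    C = alpha0 + alpha1 * data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']
    # A bet is kept only if lb < investment < ub and investment > C(i)
    keep = (inv > lb) & (inv < ub) & (inv > C)
    # Only kept bets that were won pay out; `data` itself is left untouched
    selected = keep & (data['Win'] == 1)
    return float((data['Odds'] * inv)[selected].sum())


print(total_return([0, 100, 0, 0, 0, 0], data))
print(total_return([0, 100, 10, 0, 0, 0], data))
```

Because the function never overwrites `data['investment']`, it can be evaluated repeatedly by any optimizer with the same result for the same parameters.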

Can anyone tell me how to do this in Python? So far I have tried both scipy (Approach 1) and Bayesian (Approach 2) optimization, and only tried to optimize lb and ub. Approach 1:

import pandas as pd
from scipy.optimize import minimize

def objective(val, data):
    
    # Approach 1
    # Lowerbound and upperbound
    lb, ub = val
    
    # investments
    # These matches/bets are selected to put wager on
    tf1 = (data['investment'] > lb) & (data['investment'] < ub) 
    data.loc[~tf1, 'investment'] = 0
    
        
    # Total investment
    totalinvestment = sum(data['investment'])
    
    # Good placed bets 
    data['reward'] = data['Odds'] * data['investment'] * (data['Win'] == 1)
    totalreward = sum(data['reward'])

    # Return and cumulative return
    data['return'] = data['reward'] - data['investment']
    totalreturn = sum(data['return'])
    data['Cum return'] = data['return'].cumsum()
    
    # Return on investment
    print('\n',)
    print('lb, ub:', lb, ub)
    print('TotalReturn: ',totalreturn)
    print('TotalInvestment: ', totalinvestment)
    print('TotalReward: ', totalreward)
    print('# of bets', (data['investment'] != 0).sum())
          
    return totalreturn
          

# Bounds and constraints
b = (0,100)
bnds = (b,b,)
x0 = [0,100]

sol = minimize(objective, x0, args = (data,), method = 'Nelder-Mead', bounds = bnds)
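Two details of Approach 1 are worth noting: the objective overwrites `data['investment']` in place, so every later evaluation sees the already-filtered data, and it returns `totalreturn` even though `minimize` searches for the smallest value. A minimal sketch of a side-effect-free variant follows; it assumes the net return (reward minus investment) is the quantity to maximize, as in the code above, and it is an illustration rather than a confirmed fix.

```python
import pandas as pd

data = pd.DataFrame({
    'Win':        [0, 0, 1, 0, 0, 1, 0],
    'Men':        [0, 1, 0, 1, 1, 0, 0],
    'Women':      [1, 0, 1, 0, 0, 1, 1],
    'Matches':    [0, 5, 4, 7, 4, 10, 13],
    'Odds':       [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
    'investment': [0, 0, 6, 10, 5, 25, 0],
})


def objective(val, data):
    """Negative net return for thresholds (lb, ub); does not modify `data`."""
    lb, ub = val
    inv = data['investment']
    # Masked copy: bets outside (lb, ub) are set to 0 without touching `data`
    placed = inv.where((inv > lb) & (inv < ub), 0)
    won = (data['Win'] == 1).astype(int)
    reward = data['Odds'] * placed * won
    total_return = float((reward - placed).sum())
    # minimize() looks for the smallest value, so return the negative
    return -total_return


# The same input gives the same output, whatever was evaluated before
print(objective([0, 100], data))
print(objective([5.5, 100], data))
print(objective([0, 100], data))

# The objective is piecewise constant: a tiny step in lb changes nothing
print(objective([0, 100], data) == objective([0.00025, 100], data))
```

The last line hints at why a local method may stall here: small steps around the starting point leave the objective unchanged, so the search has no direction to follow.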

And Approach 2:

import pandas as pd
import time
import pickle
from hyperopt import fmin, tpe, Trials
from hyperopt import STATUS_OK
from hyperopt import  hp

def objective(args):
    # Approach2

    # Lowerbound and upperbound
    lb, ub = args
    
    # investments
    # These matches/bets are selected to put wager on
    tf1 = (data['investment'] > lb) & (data['investment'] < ub) 
    data.loc[~tf1, 'investment'] = 0
    
        
    # Total investment
    totalinvestment = sum(data['investment'])
    
    # Good placed bets 
    data['reward'] = data['Odds'] * data['investment'] * (data['Win'] == 1)
    totalreward = sum(data['reward'])

    # Return and cumulative return
    data['return'] = data['reward'] - data['investment']
    totalreturn = sum(data['return'])
    data['Cum return'] = data['return'].cumsum()
    
    # store results
    d = {'loss': -totalreturn, 'status': STATUS_OK, 'eval time': time.time(),
         'other stuff': {'type': None, 'value': [0, 1, 2]},
         'attachments': {'time_module': pickle.dumps(time.time)}}
    
    return d

          

trials = Trials()

parameter_space  = [hp.uniform('lb', 0, 100), hp.uniform('ub', 0, 100)]

best = fmin(objective,
    space= parameter_space,
    algo=tpe.suggest,
    max_evals=500,
    trials = trials)


print('\n', trials.best_trial)

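Does anyone know how I should go about this? Scipy does not produce the expected result. The Hyperopt optimization does produce the expected result. In both approaches, I do not know how to incorporate a row-dependent bound (C(i)).

Anything would help! (Any related articles, exercises or helpful explanations about this type of optimization are also very welcome.)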

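【Comments】:

I believe that, the way this is formulated, things are non-differentiable. (A small change in lb, ub can cause a significant jump in the objective, because observations are suddenly dropped or added.) SLSQP is only for smooth problems. My initial thought would be to use binary variables to indicate whether an observation is used. But that would need very different solvers.

Thanks for your answer. But could you elaborate on which solvers you think would be better suited?

【Answer 1】: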

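I assume here that you cannot iterate over the whole dataset, or that it is incomplete, or that you want to extrapolate, so you cannot compute all the combinations.

If you have no prior, and you are unsure about the smoothness, or evaluations may be expensive, I would use Bayesian optimization. You can control the exploration/exploitation trade-off and avoid getting stuck in a local minimum.

I would use scikit-optimize, which has a better implementation of Bayesian optimization. They have better initialization techniques, such as the Sobol' method, which is implemented correctly here. This ensures that your search space will be sampled properly.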

from skopt import gp_minimize

res = gp_minimize(objective, bnds, initial_point_generator='sobol')

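【Comments】:

【Answer 2】:

I think your formulation needs one more variable, which would be binary and would define whether the investment should be set to 0 or keep its initial value. Assuming this variable is stored in another column named "new_binary", your objective function can be changed as follows: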

totalreturn = np.sum(data['Odds'] * data['investment'] * data['new_binary'] * data['Win'])

Then the only thing missing is introducing the variable itself.

for i in range(len(data)):
    investment = data['investment'][i]
    C = alpha0 + alpha1*data['Men'] + alpha2 * data['Women'] + alpha3 * data['Matches']
    data['new_binary'] = (lb < data['investment'] ) & ( data['investment'] < ub) & (data['investment'] > C)
    # This should be enough to make the values in the column binary; in Python, True/False are readily interchangeable with 1/0.

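The only problem I see now is that this becomes an integer problem, so I am not sure whether scipy.optimize.minimize can handle it. I am not sure what the alternatives are, but according to this, PuLP and Pyomo could work.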
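One option that stays within SciPy, offered here as an assumption rather than something the answer proposes, is a derivative-free global method such as `scipy.optimize.differential_evolution`. It does not need gradients, so the jumps caused by the binary mask are less of an obstacle. The sketch below optimizes all six parameters for the payout formula above; the helper name `neg_payout` and the search ranges for the alpha coefficients are hypothetical choices made for illustration.

```python
import numpy as np
import pandas as pd
from scipy.optimize import differential_evolution

data = pd.DataFrame({
    'Win':        [0, 0, 1, 0, 0, 1, 0],
    'Men':        [0, 1, 0, 1, 1, 0, 0],
    'Women':      [1, 0, 1, 0, 0, 1, 1],
    'Matches':    [0, 5, 4, 7, 4, 10, 13],
    'Odds':       [1.58, 3.8, 1.95, 1.95, 1.62, 1.8, 2.1],
    'investment': [0, 0, 6, 10, 5, 25, 0],
})


def neg_payout(params, data):
    """Negative of sum(Odds * investment * new_binary * Win) (hypothetical helper)."""
    lb, ub, alpha0, alpha1, alpha2, alpha3 = params
    inv = data['investment'].to_numpy(dtype=float)
    # Row-dependent threshold C(i)
    C = (alpha0
         + alpha1 * data['Men'].to_numpy()
         + alpha2 * data['Women'].to_numpy()
         + alpha3 * data['Matches'].to_numpy())
    # The binary variable from the answer, computed for all rows at once
    new_binary = (lb < inv) & (inv < ub) & (inv > C)
    payout = data['Odds'].to_numpy() * inv * new_binary * data['Win'].to_numpy()
    # differential_evolution minimizes, so negate the quantity to maximize
    return -float(payout.sum())


# Search ranges: (0, 100) for lb and ub as in the question;
# (-10, 10) for the alpha coefficients is an assumed range.
bounds = [(0, 100), (0, 100)] + [(-10, 10)] * 4

# polish=False skips the final gradient-based refinement, which has
# nothing to gain on a piecewise-constant objective.
result = differential_evolution(neg_payout, bounds, args=(data,),
                                maxiter=50, polish=False)

print('best parameters:', result.x)
print('best payout:', -result.fun)
```

The search is stochastic, so the parameters it reports vary from run to run. To optimize the net return instead, subtract the sum of the selected investments from the payout inside `neg_payout`.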

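【Comments】:

Thanks! But how would you suggest combining your for loop with the introduced variable in the objective function? Just paste it into the # investments section?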
