Poisson Regression in R for Insurance Pricing: Exposure as a Possible Explanatory Variable

Posted 2021-04-18 拓端数据部落


Original link: http://tecdat.cn/?p=13564

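In insurance pricing, exposure is usually used as an offset variable when modelling claim frequency. If two policyholders are identical, but one has been exposed for six months and the other for a full year, it is natural to assume that, on average, the second driver will have twice as many accidents. This is the motivation for using a standard (homogeneous) Poisson process to model claim frequency.

Of course, in ratemaking this may not be a relevant question, because actuaries need to predict the annual claim frequency (the insurance contract is supposed to provide one year of cover). But it can be interesting to better understand why people leave our portfolio, for example by cancelling the policy before its term or by not renewing it at some point.

To be more specific, consider the following model: a Poisson process is used to model claim arrivals, and we also model how long policyholders stay with their insurance company.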


> n=983
> D1=as.Date("01/01/1993",'%d/%m/%Y')
> D2=as.Date("31/12/2013",'%d/%m/%Y')



> for(i in 1:n){
+   expo=D2-arrival[i]
+   w=0
+   while(max(w)<expo) w=c(w,max(w)+1+trunc(rexp(1,1/1000)))
+   exposure[i]=departure[i]-arrival[i]
+   N[i]=max(0,length(w)-2)}
> df=data.frame(N=N,E=exposure/365)
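The loop above relies on `arrival`, `departure`, `exposure` and `N`, whose definitions are not shown. The following is a minimal runnable sketch of the same simulation. The lines marked ASSUMPTION (the seed, uniformly sampled arrival dates, and everybody staying until `D2`) are illustrative guesses, not the author's setup.

```r
set.seed(1)                                   # ASSUMPTION: arbitrary seed
n  <- 983
D1 <- as.Date("01/01/1993", "%d/%m/%Y")
D2 <- as.Date("31/12/2013", "%d/%m/%Y")

# ASSUMPTION: arrivals are uniform over the window and stop one day before D2,
# so that no policy has zero exposure; every policyholder stays until D2.
arrival   <- sample(seq(D1, D2 - 1, by = 1), size = n, replace = TRUE)
departure <- rep(D2, n)
exposure  <- numeric(n)
N         <- numeric(n)

for (i in 1:n) {
  expo <- as.numeric(D2 - arrival[i])         # days between arrival and D2
  w <- 0                                      # claim dates, in days since arrival
  # waiting times are 1 + trunc(Exp) days, with mean close to 1000 days
  while (max(w) < expo) w <- c(w, max(w) + 1 + trunc(rexp(1, 1/1000)))
  exposure[i] <- as.numeric(departure[i] - arrival[i])
  N[i] <- max(0, length(w) - 2)               # drop the start point and the overshoot
}
df <- data.frame(N = N, E = exposure / 365)   # exposure in years
print(sum(df$N) / sum(df$E))                  # empirical annual claim frequency
```

The printed ratio is the overall number of claims per policy-year, which should be in the neighbourhood of the annual intensity discussed next.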

Here, the expected time between two claims is 1000 days, so the (annual) intensity of the Poisson process is



> 365/1000
[1] 0.365

So if we run a Poisson regression with the logarithm of exposure as an offset, the intercept should be close to



> log(365/1000)
[1] -1.007858
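The reasoning behind this target value can be written out as follows. The notation ($N_i$ for the claim count, $E_i$ for the exposure in years, $\lambda$ for the annual intensity, $\beta_0$ for the intercept) is introduced here for illustration and does not come from the original text.

```latex
\begin{aligned}
N_i \mid E_i &\sim \mathcal{P}(\lambda E_i), \qquad \lambda = \tfrac{365}{1000} = 0.365,\\
\log \mathbb{E}[N_i \mid E_i] &= \underbrace{\log \lambda}_{\beta_0} + \underbrace{\log E_i}_{\text{offset, coefficient fixed at } 1},\\
\beta_0 &= \log(0.365) \approx -1.008 .
\end{aligned}
```

An offset is therefore a covariate whose coefficient is fixed at 1 instead of being estimated.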

Here, the regression on a constant, with the offset, gives



 > summary(reg)

Call:

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.4145  -0.4673   0.2367   0.8770   3.6828

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.04233    0.02532  -41.17   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1116.9 on 982 degrees of freedom
Residual deviance: 1116.9 on 982 degrees of freedom
AIC: 3282.9

Number of Fisher Scoring iterations: 5
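The call that produced `reg` is not shown with this output. One call consistent with the description (a constant plus an offset) is sketched below. To keep the example short, the data are drawn directly from a Poisson distribution with mean `0.365 * E` instead of the day-by-day loop. The seed and the range of exposures are assumptions.

```r
set.seed(42)                           # ASSUMPTION: arbitrary seed
n <- 983
E <- runif(n, min = 0.1, max = 21)     # ASSUMPTION: exposures in years
N <- rpois(n, lambda = 0.365 * E)      # homogeneous Poisson, annual intensity 0.365
df <- data.frame(N = N, E = E)

# Intercept-only Poisson regression with log-exposure as an offset
reg <- glm(N ~ 1 + offset(log(E)), data = df, family = poisson)
print(coef(reg))                       # compare with log(0.365)
print(exp(coef(reg)))                  # estimated annual claim frequency
```

Exponentiating the intercept gives the estimated annual frequency directly, which is why the offset form is convenient for ratemaking.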

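This is consistent with what we just said. If we instead use the logarithm of exposure as a possible explanatory variable, we expect its coefficient to be close to 1.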





Call:

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.0810  -0.8373  -0.1493   0.5676   3.9001

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.03350    0.08546  -12.09   <2e-16 ***
log(E)       1.00920    0.03292   30.66   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2553.6 on 982 degrees of freedom
Residual deviance: 1064.2 on 981 degrees of freedom
AIC: 3762.7

Number of Fisher Scoring iterations: 5

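If we keep the offset and also add log(E) as a covariate, we can see that the covariate becomes useless. Its coefficient is the previous one minus 1, so its z-test is a test of a unit coefficient.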





Call:



Deviance Residuals:
    Min       1Q   Median       3Q      Max
-3.0810  -0.8373  -0.1493   0.5676   3.9001

Coefficients:
             Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.033503   0.085460 -12.093   <2e-16 ***
log(E)       0.009201   0.032920   0.279     0.78
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1064.3 on 982 degrees of freedom
Residual deviance: 1064.2 on 981 degrees of freedom
AIC: 3762.7

Number of Fisher Scoring iterations: 5
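The two outputs above differ only in the `log(E)` row: 1.00920 without the offset and 0.009201 with it, with the same standard error. A small sketch of why is given below, again on data drawn directly from a Poisson distribution. The seed, the exposure range, and the object names `reg_cov` and `reg_off` are assumptions made for illustration.

```r
set.seed(42)                           # ASSUMPTION: arbitrary seed
n <- 983
E <- runif(n, min = 0.1, max = 21)     # ASSUMPTION: exposures in years
N <- rpois(n, lambda = 0.365 * E)
df <- data.frame(N = N, E = E)

# log(E) as a free covariate: its coefficient should be close to 1
reg_cov <- glm(N ~ log(E), data = df, family = poisson)
# log(E) as offset AND covariate: the covariate only captures the departure from 1
reg_off <- glm(N ~ log(E) + offset(log(E)), data = df, family = poisson)

b_cov <- unname(coef(reg_cov)["log(E)"])
b_off <- unname(coef(reg_off)["log(E)"])
print(c(covariate = b_cov, with_offset = b_off, difference = b_cov - b_off))

# The z-test on this row is a test of "coefficient equals 1" in reg_cov
print(summary(reg_off)$coefficients["log(E)", ])
```

Both fits describe the same model, so the fitted values and the residual deviance are identical. Only the parametrization of the `log(E)` coefficient changes.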

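Here we really do have a pure Poisson process, so exposure matters because the parameter of the Poisson distribution is proportional to exposure, but there is nothing else to learn from exposure. What do we get if we run a Poisson regression on the logarithm of exposure on another, larger dataset (50,000 observations, where the exposure variable is called exposition)?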



> summary(reg)

Call:

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-0.3988  -0.3388  -0.2786  -0.1981  12.9036

Coefficients:
                Estimate Std. Error z value Pr(>|z|)
(Intercept)     -2.83045    0.02822 -100.31   <2e-16 ***
log(exposition)  0.53950    0.02905   18.57   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 12931 on 49999 degrees of freedom
Residual deviance: 12475 on 49998 degrees of freedom
AIC: 16150

Number of Fisher Scoring iterations: 6

What happens if we keep the offset and also add exposure as a covariate? (We use a nonparametric transformation to visualize what happens.)

plot(reg,se=TRUE)
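The model behind this plot is not shown. One possible way to fit a nonparametric effect of exposure on top of the offset is a generalized additive model. The sketch below uses `gam()` and `s()` from the `mgcv` package, which is an assumption about the tooling, and fits it to made-up data in which the claim rate per unit of exposure decreases with exposure.

```r
library(mgcv)                          # ASSUMPTION: mgcv is used for the smooth term

set.seed(1)                            # ASSUMPTION: arbitrary seed
n <- 5000
E <- runif(n, min = 0.05, max = 1)     # ASSUMPTION: exposure in years, at most one year
# ASSUMPTION: made-up claim process, mean grows like sqrt(E) instead of E
N <- rpois(n, lambda = 0.5 * sqrt(E))
df <- data.frame(N = N, E = E)

# Smooth function of exposure, estimated on top of the log-exposure offset
reg <- gam(N ~ s(E) + offset(log(E)), data = df, family = poisson)
plot(reg, se = TRUE)                   # estimated smooth with a standard-error band
```

With a pure Poisson process the smooth would be flat around zero. A decreasing curve indicates that the claim rate per unit of exposure falls as exposure grows.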

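There is a clear and significant effect: the longer policyholders stay, the less likely they are to have a claim. In fact, this can be observed without running any regression.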


> plot(h1$mids,h1$density,type='s',lwd=2,col="red")
> lines(h0$mids,h0$density,type='s',col='blue',lwd=2)
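The objects `h0` and `h1` are not defined in the text. Given the use of `$mids` and `$density`, they are presumably histogram objects of exposure, split by whether a claim occurred. A sketch under that assumption follows, on made-up data. The simulated claim process and the bin width are illustrative choices.

```r
set.seed(1)                            # ASSUMPTION: arbitrary seed
n <- 5000
E <- runif(n, min = 0.05, max = 1)     # ASSUMPTION: exposure in years
N <- rpois(n, lambda = 0.5 * sqrt(E))  # ASSUMPTION: made-up claim process

brk <- seq(0, 1, by = 0.05)            # common bins for both groups
h0 <- hist(E[N == 0], breaks = brk, plot = FALSE)   # policies with no claim
h1 <- hist(E[N > 0],  breaks = brk, plot = FALSE)   # policies with one or more claims

plot(h1$mids, h1$density, type = "s", lwd = 2, col = "red",
     xlab = "exposure", ylab = "density",
     ylim = range(0, h0$density, h1$density))
lines(h0$mids, h0$density, type = "s", col = "blue", lwd = 2)
```

Using the same breaks for both groups makes the two step curves directly comparable.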


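In blue, the density of exposure for policyholders with no claim; in red, the density of exposure for policyholders with one or more claims.

So here we cannot assume that the coefficient equals 1. What does this mean? Can we reproduce this behaviour?

To better understand policyholders, consider two possible behaviours. The first one: a policyholder may leave if the company does not offer a substantial discount after several claim-free years. For example, if the insured has had no claim for 5 years, then after 5 years he leaves the company (for instance, to get a better price elsewhere). The code ends as before: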




> df=data.frame(N=N,E=exposure/365)

Here I use 1,500 days rather than 5 years.
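Only the last line of the simulation for this first behaviour is shown. The sketch below is one possible reading of the rule: a policyholder with no claim during the first 1,500 days leaves on day 1,500, and everybody else stays until `D2`. The seed, the uniform arrival dates and the names `limit` and `claims` are assumptions.

```r
set.seed(1)                                   # ASSUMPTION: arbitrary seed
n  <- 983
D1 <- as.Date("01/01/1993", "%d/%m/%Y")
D2 <- as.Date("31/12/2013", "%d/%m/%Y")
# ASSUMPTION: uniform arrivals, at least one day before D2
arrival  <- sample(seq(D1, D2 - 1, by = 1), size = n, replace = TRUE)
exposure <- numeric(n)
N        <- numeric(n)
limit    <- 1500                              # claim-free days before leaving

for (i in 1:n) {
  expo <- as.numeric(D2 - arrival[i])
  w <- 0
  while (max(w) < expo) w <- c(w, max(w) + 1 + trunc(rexp(1, 1/1000)))
  claims <- w[w > 0 & w < expo]               # claim days inside the window
  if (expo > limit && !any(claims <= limit)) {
    exposure[i] <- limit                      # leaves claim-free on day 1500
    N[i] <- 0
  } else {
    exposure[i] <- expo                       # stays until D2
    N[i] <- length(claims)
  }
}
df  <- data.frame(N = N, E = exposure / 365)
reg <- glm(N ~ log(E), data = df, family = poisson)
print(coef(reg))
```

Under this rule, every policy observed for more than 1,500 days has had a claim, so the coefficient of `log(E)` should come out above 1. The exact value depends on the assumptions above and need not match the output that follows.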



> summary(reg)

Call:

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5684  -0.9668  -0.2321   0.4244   3.6265

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.50844    0.10286  -24.39   <2e-16 ***
log(E)       1.65738    0.04494   36.88   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 2567.31 on 982 degrees of freedom
Residual deviance:  885.71 on 981 degrees of freedom

Here, the coefficient is (significantly) larger than 1.



> summary(reg)

Call:

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.5684  -0.9668  -0.2321   0.4244   3.6265

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.50844    0.10286  -24.39   <2e-16 ***
log(E)       0.65738    0.04494   14.63   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1114.24 on 982 degrees of freedom
Residual deviance:  885.71 on 981 degrees of freedom
AIC: 2897.9

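There is clearly a bias here: people who stay in the portfolio for a long time are more likely to have had an accident. This is consistent with our setup, in which the low-risk (claim-free) customers are the ones who leave.

The second behaviour: sometimes policyholders are unhappy with the way a claim was handled, and they may leave after their first claim. Consider a case where, after a claim, the insured is quite likely (say, with probability 50%) to leave the company. Instead of assuming that the insured dislikes the claims management, one can also imagine that the car was so badly damaged that he can no longer drive, so paying a premium would be pointless. The code here is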


> for(i in 1:n){
+ expo=D2-arrival[i]
+ w=0



+ exposure[i]=departure[i]-arrival[i]}
> df=data.frame(N=N,E=exposure/365)

Here, after each claim, the insured flips a coin to decide whether to cancel the contract.
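The middle of the loop above is missing. The sketch below is one way to implement the coin-flip rule as described: after each claim the policyholder leaves with probability 0.5, and the exposure ends on the day of that claim. The seed, the uniform arrival dates, and the variable names `t` and `k` are assumptions.

```r
set.seed(1)                                   # ASSUMPTION: arbitrary seed
n  <- 983
D1 <- as.Date("01/01/1993", "%d/%m/%Y")
D2 <- as.Date("31/12/2013", "%d/%m/%Y")
# ASSUMPTION: uniform arrivals, at least one day before D2
arrival  <- sample(seq(D1, D2 - 1, by = 1), size = n, replace = TRUE)
exposure <- numeric(n)
N        <- numeric(n)

for (i in 1:n) {
  expo <- as.numeric(D2 - arrival[i])
  t <- 0                                      # current day since arrival
  k <- 0                                      # claims so far
  repeat {
    t <- t + 1 + trunc(rexp(1, 1/1000))       # day of the next claim
    if (t >= expo) { t <- expo; break }       # window ends before that claim
    k <- k + 1
    if (runif(1) < 0.5) break                 # coin flip: cancel right after the claim
  }
  exposure[i] <- t
  N[i] <- k
}
df  <- data.frame(N = N, E = exposure / 365)
reg <- glm(N ~ log(E), data = df, family = poisson)
print(coef(reg))
```

Because most policyholders leave right after a claim, even short exposures carry one or more claims, which flattens the relationship between claim counts and exposure and pushes the coefficient of `log(E)` well below 1.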







Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.28402  -0.47763  -0.08215   0.33819   2.37628

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.09920    0.04251   2.334   0.0196 *
log(E)       0.30640    0.02511  12.203   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 666.92 on 982 degrees of freedom
Residual deviance: 498.29 on 981 degrees of freedom
AIC: 2666.3

This time, the parameter is (again significantly) smaller than 1.







Deviance Residuals:
     Min        1Q    Median        3Q       Max
-2.28402  -0.47763  -0.08215   0.33819   2.37628

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.09920    0.04251   2.334   0.0196 *
log(E)      -0.69360    0.02511 -27.625   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 1116.87 on 982 degrees of freedom
Residual deviance:  498.29 on 981 degrees of freedom
AIC: 2666.3