The 10th "Teddy Cup" Data Mining Challenge, Problem B: Power System Load Forecasting and Analysis (Baseline)

Posted 2022-03-04 Better Bench


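1 Problem statement

I. Background

Power system load forecasting (load meaning electricity demand, i.e. active power) is the prediction of system load over a future period, taking full account of factors such as historical system load, economic conditions, meteorological conditions and social events. Load forecasting is an important part of power system planning and dispatch. Short-term forecasts (up to two weeks) are the basis for starting and stopping generating units within the grid, for dispatch, and for operational planning; medium-term forecasts (the coming months) support decisions on grid operation and maintenance so that electricity for industrial production and daily life is secured; long-term forecasts (the coming years) inform plans for grid upgrades and expansion, improving the economic and social benefits of the power system.

Uncertain factors such as complex, changeable weather and social events all affect power system load, which limits the applicability of traditional load forecasting models. In addition, as the composition of the load becomes more diverse, the performance of these models degrades, so the load forecasting problem calls for further study.

II. Problems to solve

1. Short- and medium-term forecasting of regional load

Using the 15-minute-interval load data for a regional grid provided in the attachment, build short- and medium-term load forecasting models:

(1) Give load forecasts for this regional grid at 15-minute intervals for the next 10 days, and analyse the forecast accuracy;

(2) Give forecasts of the daily maximum and minimum load of this regional grid for the next 3 months, together with the times at which the maximum and minimum are reached, and analyse the forecast accuracy.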
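Sub-question (2) needs daily maxima and minima, plus the times at which they occur, as targets derived from the 15-minute series. The following is a minimal sketch of that aggregation on synthetic data; the function name `daily_extremes` and the sine-shaped load are illustrative assumptions, not part of the original baseline.

```python
import numpy as np
import pandas as pd

# Synthetic 15-minute load for two days (illustrative values, not competition data).
idx = pd.date_range("2021-01-01", periods=2 * 96, freq="15min")
load = pd.Series(100 + 10 * np.sin(np.arange(len(idx)) * 2 * np.pi / 96), index=idx)


def daily_extremes(series: pd.Series) -> pd.DataFrame:
    """Return the daily max/min load and the timestamps at which they occur."""
    grouped = series.groupby(series.index.normalize())  # group by calendar day
    return pd.DataFrame({
        "max_load": grouped.max(),
        "max_time": grouped.idxmax(),
        "min_load": grouped.min(),
        "min_time": grouped.idxmin(),
    })


extremes = daily_extremes(load)
print(extremes)
```

The resulting table has one row per day and could serve as the label set for a separate daily model, although whether to forecast the extremes directly or derive them from a 15-minute forecast is a modelling choice the problem leaves open.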

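2. Medium-term forecasting of industry load

Medium-term forecasting of the electricity load of different industries provides an important basis for grid operation and dispatch decisions. In particular, against the background of the COVID-19 pandemic and the national "dual-carbon" targets, forecasting the load of large industry, non-general industry, general industry and commerce helps to track each industry's production and business conditions, its resumption of work and production, and its subsequent development trend, and thereby guides and supports industry development decisions. Using the daily load data for each industry provided in the attachment, build mathematical models to study the following problems:

(1) Identify and analyse the timing, magnitude and possible causes of abrupt changes in each industry's electricity load.

(2) Give forecasts of the daily maximum and minimum load for each industry in the region for the next 3 months, and analyse the forecast accuracy.

(3) Taking each industry's actual situation into account, study the possible impact of the national "dual-carbon" targets on each industry's future electricity load, and offer targeted recommendations for the industries concerned.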
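For sub-question (1), one simple starting point is to compare each day-over-day change with the spread of the changes just before it. The sketch below does this on a synthetic series; the helper `flag_abrupt_changes`, the 7-day window and the threshold of 3 are assumptions chosen for illustration, and the method says nothing about causes.

```python
import numpy as np
import pandas as pd


def flag_abrupt_changes(daily_load, window=7, threshold=3.0):
    """Flag days whose day-over-day change is large relative to recent changes."""
    change = daily_load.diff()
    # Spread of the previous `window` changes; shift(1) keeps today's jump
    # out of its own reference scale.
    scale = change.rolling(window).std().shift(1)
    score = change / scale
    table = pd.DataFrame({"load": daily_load, "change": change, "score": score})
    return table[score.abs() > threshold]


# Synthetic daily load with a level drop on day 40 (illustrative only).
days = pd.date_range("2020-01-01", periods=60, freq="D")
level = np.where(np.arange(60) < 40, 1000.0, 700.0)
load = pd.Series(level + 5.0 * np.sin(np.arange(60)), index=days)

flags = flag_abrupt_changes(load)
print(flags)
```

The `change` column gives the magnitude and the index gives the date; matching flagged dates against holidays or pandemic-related events would still have to be done by hand.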

2 A baseline implemented in Python

2.1 Reading the data

import time
import warnings

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import lightgbm as lgb
import xgboost as xgb
from catboost import CatBoostRegressor
from sklearn.preprocessing import LabelEncoder, scale
from tqdm import tqdm

%matplotlib inline
# On matplotlib >= 3.6 this style is named 'seaborn-v0_8-darkgrid'
plt.style.use('seaborn-darkgrid')
sns.set(style='darkgrid')
warnings.filterwarnings("ignore")

y = pd.read_csv('./data/附件1-区域15分钟负荷数据.csv')
indu = pd.read_csv('./data/附件2-行业日负荷数据.csv')
tianqi = pd.read_csv('./data/附件3-气象数据.csv')

tianqi

日期	天气状况	最高温度	最低温度	白天风力风向	夜晚风力风向	Unnamed: 6
0	2018年1月1日	多云/多云	22℃	12℃	无持续风向<3级	无持续风向<3级	NaN
1	2018年1月1日	多云/多云	22℃	12℃	无持续风向<3级	无持续风向<3级	NaN
2	2018年1月2日	多云/多云	22℃	15℃	无持续风向<3级	无持续风向<3级	NaN
3	2018年1月3日	多云/阴	23℃	15℃	无持续风向<3级	无持续风向<3级	NaN
4	2018年1月4日	多云/小雨	21℃	16℃	无持续风向<3级	无持续风向<3级	NaN
5	2018年1月5日	阴/小雨	19℃	13℃	无持续风向<3级	无持续风向<3级	NaN
6	2018年1月6日	小雨-中雨/中雨-大雨	15℃	11℃	无持续风向<3级	无持续风向<3级	NaN
7	2018年1月7日	大雨/中雨	15℃	7℃	无持续风向<3级	北风4～5级	NaN
8	2018年1月8日	中雨/小雨-中雨	12℃	5℃	北风4～5级	北风3～4级	NaN
9	2018年1月9日	小雨/阴	9℃	6℃	无持续风向<3级	无持续风向<3级	NaN
10	2018年1月10日	多云/多云	14℃	7℃	无持续风向<3级	无持续风向<3级	NaN
11	2018年1月11日	多云/多云	15℃	6℃	无持续风向<3级	无持续风向<3级	NaN
12	2018年1月12日	晴/晴	16℃	6℃	无持续风向<3级	无持续风向<3级	NaN
13	2018年1月13日	晴/晴	17℃	7℃	无持续风向<3级	无持续风向<3级	NaN
14	2018年1月14日	多云/多云	20℃	10℃	无持续风向<3级	无持续风向<3级	NaN

del tianqi['Unnamed: 6']

2.2 Temperature feature processing

tianqi['最高温度'] = tianqi['最高温度'].map(lambda d: d.replace('℃','')).astype(int)
tianqi['最低温度'] = tianqi['最低温度'].map(lambda d: d.replace('℃','')).astype(int)
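The two lines above assume every temperature cell is a clean string such as `22℃`; a missing or malformed cell would make `astype(int)` raise. A more tolerant variant might look like the sketch below, where `parse_celsius` is a hypothetical helper and the sample values are made up.

```python
import pandas as pd


def parse_celsius(column: pd.Series) -> pd.Series:
    """Strip the degree sign and convert to numbers; unparseable cells become NaN."""
    cleaned = column.str.replace("℃", "", regex=False).str.strip()
    return pd.to_numeric(cleaned, errors="coerce")


raw = pd.Series(["22℃", " -3℃", None, "n/a"])  # made-up examples
temps = parse_celsius(raw)
print(temps)
```

The result is a float column because NaN cannot be stored in a plain integer column; gradient-boosted tree models such as LightGBM accept NaN inputs, so no imputation is strictly required at this stage.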

2.3 Weather-condition feature processing

series = tianqi.join(tianqi['天气状况'].str.split('/',expand=True))
tianqi['天气1'] = series[0]
tianqi['天气2'] = series[1]

tianqi.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 15 entries, 0 to 14
Data columns (total 8 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   日期      15 non-null     object
 1   天气状况    15 non-null     object
 2   最高温度    15 non-null     int32
 3   最低温度    15 non-null     int32
 4   白天风力风向  15 non-null     object
 5   夜晚风力风向  15 non-null     object
 6   天气1     15 non-null     object
 7   天气2     15 non-null     object
dtypes: int32(2), object(6)

2.4 Wind feature processing

tianqi['白天风力风向'].unique()

array(['无持续风向<3级', '北风4～5级'], dtype=object)

tianqi['夜晚风力风向'].unique()

array(['无持续风向<3级', '北风4～5级', '北风3～4级'], dtype=object)

# Map each wind label to an ordinal code, weakest to strongest
dic = {'无持续风向<3级': 0,
       '北风3～4级': 1,
       '北风4～5级': 2}
tianqi['白天风力风向'] = tianqi['白天风力风向'].map(dic)
tianqi['夜晚风力风向'] = tianqi['夜晚风力风向'].map(dic)
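The `unique()` outputs above come from a 15-row sample, so the full weather file may contain labels that the dictionary does not cover, and `Series.map` silently turns those into NaN. The sketch below reports such labels before they disappear; `encode_with_check` is a hypothetical helper and `东北风3～4级` is an invented label used only to trigger the check.

```python
import pandas as pd

wind_levels = {"无持续风向<3级": 0, "北风3～4级": 1, "北风4～5级": 2}


def encode_with_check(column: pd.Series, mapping: dict):
    """Map labels to codes and return the labels that the mapping does not cover."""
    encoded = column.map(mapping)
    unmapped = sorted(column[encoded.isna() & column.notna()].unique())
    return encoded, unmapped


# The third label is invented for illustration; it is not in the mapping.
sample = pd.Series(["无持续风向<3级", "北风4～5级", "东北风3～4级"])
encoded, unmapped = encode_with_check(sample, wind_levels)
print(unmapped)
```

The same helper would work for the weather-condition dictionary, since both encodings rely on `map` with a hand-written dictionary.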

2.5 Ordinal encoding of the weather conditions

tianqi['天气1'].unique()

array(['多云', '阴', '小雨-中雨', '大雨', '中雨', '小雨', '晴'], dtype=object)

tianqi['天气2'].unique()

array(['多云', '阴', '小雨', '中雨-大雨', '中雨', '小雨-中雨', '晴'], dtype=object)

# Ordinal code by severity, from sunny (1) to heavy rain (8)
dic1 = {'晴': 1,
        '多云': 2,
        '阴': 3,
        '小雨': 4,
        '小雨-中雨': 5,
        '中雨': 6,
        '中雨-大雨': 7,
        '大雨': 8}
tianqi['天气1'] = tianqi['天气1'].map(dic1)
tianqi['天气2'] = tianqi['天气2'].map(dic1)
del tianqi['天气状况']

2.6 Joining the two tables

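# Rename the timestamp column, then derive a date-only key for the join
y = y.rename(columns={'数据时间': '日期1'})
y['日期'] = y['日期1'].apply(lambda x: x.split(' ')[0])
# Parse both keys to datetime so that they compare equal in the merge
tianqi['日期'] = pd.to_datetime(tianqi['日期'], format='%Y年%m月%d日', errors='coerce')
y['日期'] = pd.to_datetime(y['日期'], format='%Y/%m/%d', errors='coerce')
# Left join: keep every load record and attach that day's weather
train = y.merge(tianqi, on='日期', how='left')
del train['日期']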
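The weather preview earlier lists 2018年1月1日 twice (rows 0 and 1). With a left merge, a duplicated key on the right-hand side duplicates every matching load record. The sketch below reproduces the effect on toy frames and shows one possible guard; keeping the first duplicate is an assumption, and the `load` column is invented for the example.

```python
import pandas as pd

# Toy weather table with a duplicated date, mirroring rows 0 and 1 of the preview.
weather = pd.DataFrame({
    "日期": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02"]),
    "最高温度": [22, 22, 22],
})
# Toy load table: two records on the first day, one on the second.
load = pd.DataFrame({
    "日期": pd.to_datetime(["2018-01-01", "2018-01-01", "2018-01-02"]),
    "load": [1.0, 2.0, 3.0],
})

naive = load.merge(weather, on="日期", how="left")  # rows are duplicated

weather_unique = weather.drop_duplicates(subset="日期", keep="first")
# validate raises MergeError if the right-hand keys are not unique.
safe = load.merge(weather_unique, on="日期", how="left", validate="many_to_one")

print(len(load), len(naive), len(safe))
```

With `validate="many_to_one"` the merge fails loudly if duplicates slip back in, which is easier to debug than a training set that has quietly grown.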

2.7 Time-series feature extraction (the test-set data is added directly at a later stage)

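# Calendar features derived from the timestamp
train['日期1'] = pd.to_datetime(train['日期1'])
train['月'] = train['日期1'].dt.month          # month
train['天'] = train['日期1'].dt.day            # day of month
train['小时'] = train['日期1'].dt.hour         # hour of day
train['一年第几天'] = train['日期1'].dt.dayofyear  # day of year
# Series.dt.week was removed in pandas 2.0; isocalendar().week is its replacement
train['一年第几周'] = train['日期1'].dt.isocalendar().week.astype(int)

# test['月'] = test['日期1'].dt.month 
# test['天'] = test['日期1'].dt.day
# test['小时'] = test['日期1'].dt.hour
# test['一年第几天'] = test['日期1'].dt.dayofyear
# test['一年第几周'] = test['日期1'].dt.isocalendar().week.astype(int)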
        
#####################
# Code omitted here; please download the full code
#####################

train['是否月末'] = [int(i) for i in train['是否月末']]
train['是否季节初'] = [int(i) for i in train['是否季节初']]
train['是否季节末'] = [int(i) for i in train['是否季节末']]
train['是否周末'] = [int(i) for i in train['是否周末']]
train['是否月初'] = [int(i) for i in train['是否月初']]

# test['是否月末'] = [int(i) for i in test['是否月末']]
# test['是否季节初'] = [int(i) for i in test['是否季节初']]
# test['是否季节末'] = [int(i) for i in test['是否季节末']]
# test['是否周末'] = [int(i) for i in test['是否周末']]
# test['是否月初'] = [int(i) for i in test['是否月初']]
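The code that creates the five flag columns is omitted in the post, so the following is only a guess at how such flags could be built with pandas datetime accessors, not the author's implementation. In particular, reading 季节初 and 季节末 as quarter start and quarter end is an assumption, and `add_calendar_flags` is a hypothetical helper.

```python
import pandas as pd


def add_calendar_flags(df: pd.DataFrame, time_col: str = "日期1") -> pd.DataFrame:
    """Add 0/1 calendar flags derived from a datetime column (illustrative sketch)."""
    ts = df[time_col].dt
    out = df.copy()
    out["是否月初"] = ts.is_month_start.astype(int)      # first day of the month
    out["是否月末"] = ts.is_month_end.astype(int)        # last day of the month
    out["是否季节初"] = ts.is_quarter_start.astype(int)  # assumed: quarter start
    out["是否季节末"] = ts.is_quarter_end.astype(int)    # assumed: quarter end
    out["是否周末"] = (ts.dayofweek >= 5).astype(int)    # Saturday or Sunday
    return out


demo = pd.DataFrame({"日期1": pd.to_datetime(
    ["2021-03-31 00:15", "2021-04-01 12:00", "2021-04-03 08:30"])})
flags = add_calendar_flags(demo)
print(flags)
```

Because the flags come out as integers already, the `int(i)` casts shown above would be unnecessary with this variant.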

3 Model training

y = train['总有功功率（kw）']
# x_train = (omitted in the original post)

3.1 Building a custom training set

(1) Label normalization

# y = (omitted in the original post)
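The normalization step is omitted in the post, so the method actually used is unknown. One common option is min-max scaling with an inverse transform, sketched below; `MinMaxLabelScaler` is a hypothetical helper, the load values are made up, and fitting on the training part only is a design choice made here to avoid leaking validation information.

```python
import numpy as np


class MinMaxLabelScaler:
    """Scale labels to [0, 1] on the fitted range and map predictions back."""

    def fit(self, y):
        y = np.asarray(y, dtype=float)
        self.lo_, self.hi_ = float(y.min()), float(y.max())
        if self.hi_ == self.lo_:
            raise ValueError("labels are constant; min-max scaling is undefined")
        return self

    def transform(self, y):
        return (np.asarray(y, dtype=float) - self.lo_) / (self.hi_ - self.lo_)

    def inverse_transform(self, y_scaled):
        return np.asarray(y_scaled, dtype=float) * (self.hi_ - self.lo_) + self.lo_


load_kw = np.array([200.0, 250.0, 300.0, 400.0])  # illustrative values
scaler = MinMaxLabelScaler().fit(load_kw[:3])      # fit on the training part only
scaled = scaler.transform(load_kw)                 # values outside the fit range exceed 1
restored = scaler.inverse_transform(scaled)
print(scaled, restored)
```

Predictions made on the scaled labels would need `inverse_transform` before they are reported in kW or compared with the raw series.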

(2) Splitting into training and validation sets

x = x_train[:900]
y_train = y[:900]
x_val = x_train[900:]
y_val = y[900:]

3.2 Training

model_lgb = lgb.LGBMRegressor(
    learning_rate=0.01,
    max_depth=-1,            # no depth limit
    n_estimators=1000,
    boosting_type='gbdt',
    random_state=2021,
    objective='regression',
    num_leaves=32,           # integer: maximum number of leaves per tree
    verbose=-1)
lgb_model = model_lgb.fit(x,y_train)
pred_val_y  = lgb_model.predict(x_val)

3.3 Model evaluation (MAE, RMSE)

# coding=utf-8
import numpy as np
from sklearn import metrics
 
# MAPE implemented by hand (scikit-learn >= 0.24 also provides metrics.mean_absolute_percentage_error)
def mape(y_true, y_pred):
    return np.mean(np.abs((y_pred - y_true) / y_true))
 
y_true = np.array(y_val)
y_pred = np.array(pred_val_y  )
 
print('MSE:',metrics.mean_squared_error(y_true, y_pred))
 
print('RMSE:',np.sqrt(metrics.mean_squared_error(y_true, y_pred)))
 
print('MAE:',metrics.mean_absolute_error(y_true, y_pred))
 
print('MAPE:',mape(y_true, y_pred))
 
## R2-score
from sklearn.metrics import r2_score
print('R2-score:',r2_score(y_true, y_pred))
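The hand-written `mape` above divides by `y_true`, so any true value of zero produces `inf` or `nan`. That situation is easy to hit if the labels were min-max normalized, because the minimum then maps to exactly 0. A guarded variant could look like the sketch below; `safe_mape` and the `eps` cut-off are assumptions, and skipping near-zero targets is only one of several possible conventions.

```python
import numpy as np


def safe_mape(y_true, y_pred, eps=1e-8):
    """MAPE computed only over targets whose magnitude exceeds eps."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = np.abs(y_true) > eps          # drop targets that would divide by ~0
    if not mask.any():
        return float("nan")
    return float(np.mean(np.abs((y_pred[mask] - y_true[mask]) / y_true[mask])))


score = safe_mape([100.0, 0.0, 200.0], [110.0, 5.0, 180.0])
print(score)
```

An alternative is to invert the label scaling first and compute all metrics on the original kW values, which also makes MAE and RMSE directly interpretable.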