Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'

[Posted]: 2020-07-31 06:12:35

[Question]:

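I developed three ML models in Spyder: linear regression, polynomial regression, and random forest regression. In Spyder, they all run fine. However, when I deploy them on Django to build a web application, the random forest model raises "ValueError: Buffer dtype mismatch, expected 'SIZE_t' but got 'long long'". (I tried removing the random forest model, and the other two models run fine.)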

Here is what I have:

Model developed in Spyder

"""****************** Import Lib ******************"""
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score

"""****************** Loading dataset ******************"""
boston = load_boston()
dataset = pd.DataFrame(boston.data, columns=boston.feature_names)
dataset['target'] = boston.target

"""****************** Data Preprocessing ******************"""
""" Data Analysis """
# Check Null
dataset.isnull().sum()
# Calculate X and y 
X = dataset.iloc[:,:-1].values
y = dataset.iloc[:,-1].values.reshape(-1,1)
# train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=25)
""" Visualizing Data """
corr = dataset.corr()
sns.heatmap(corr, annot=True, cmap='Blues')
sns.pairplot(dataset)

"""****************** Regression Models ******************"""
""" Linear Regression """
from sklearn.linear_model import LinearRegression
regressor_linear = LinearRegression()
regressor_linear.fit(X_train, y_train)
cv_linear = cross_val_score(estimator = regressor_linear, X=X_train, y=y_train, cv=10)

""" Polynomial Regression """
from sklearn.preprocessing import PolynomialFeatures
poly_reg = PolynomialFeatures(degree = 2)
X_poly = poly_reg.fit_transform(X_train)
regressor_poly2 = LinearRegression()
regressor_poly2.fit(X_poly, y_train)
cv_poly2 = cross_val_score(estimator=regressor_poly2, X=X_poly, y=y_train, cv=10)

""" Random Forest Regression """
from sklearn.ensemble import RandomForestRegressor
regressor_rf = RandomForestRegressor(n_estimators=500, random_state=0, n_jobs=-1)
regressor_rf.fit(X_train, y_train.ravel())
cv_rf = cross_val_score(estimator=regressor_rf, X=X_train, y=y_train.ravel(), cv=10)

"""****************** Measuring the Error ******************"""
models=[
        ('Linear Regression', cv_linear.mean()),
        ('Polynomial Regression (2)', cv_poly2.mean()),
        ('Random Forest Regression', cv_rf.mean())
        ]
cv_scores = pd.DataFrame(data=models, columns=['Model','CV Score'])

"""****************** Dump ******************"""
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
joblib.dump(regressor_linear,'regressor_linear_jb')
joblib.dump(regressor_poly2,'regressor_poly2_jb')
joblib.dump(regressor_rf,'regressor_rf_jb')

Django implementation code

from django.shortcuts import render
from django.http import HttpResponse
import json
from django.http import JsonResponse
import pandas as pd
import numpy as np
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23
from sklearn.preprocessing import PolynomialFeatures
# Create your views here.

# ML Code

regressor_linear = joblib.load('./models/regressor_linear_jb')
regressor_poly2 = joblib.load('./models/regressor_poly2_jb')
regressor_rf = joblib.load('./models/regressor_rf_jb')

# ML Code End

def predict(request):
    temp_data = [
                 0.16902,
                 0,
                 25.65,
                 0,
                 0.581,
                 5.986,
                 88.4,
                 1.9929,
                 2,
                 188,
                 19.1,
                 385.02,
                 14.81,
                 ]
    temp_df = pd.DataFrame(temp_data).transpose()
    predict = {}  # maps model name -> rounded prediction

    # Linear Regression
    predict['Linear Regressor'] = round(regressor_linear.predict(temp_df)[0, 0], 2)

    # Polynomial Regression.
    regressor_poly = PolynomialFeatures(degree=2)
    temp_df_poly = regressor_poly.fit_transform(temp_df)
    predict['Polynomial Regressor'] = round(regressor_poly2.predict(temp_df_poly)[0, 0], 2)
    
    # Random Forest Regression
    predict['Random Forest Regressor'] = round(regressor_rf.predict(temp_df)[0],2)
    
    return JsonResponse(predict)

[Comments]:

See ***.com/questions/21033038/…

[Answer 1]:

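Switch Django to the Anaconda environment and the error goes away.

Jupyter Notebook uses the Anaconda environment, whereas Django was using a different Python environment installed on the system. The main issue: one is 32-bit and the other is 64-bit.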
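To confirm this kind of mismatch, it can help to print the same few facts from both interpreters (the one that trained the model and the one Django runs under) and compare them. The following is a minimal sketch using only the standard library; the helper name `describe_environment` is hypothetical and not part of Django or scikit-learn. The underlying assumption is that scikit-learn's tree code uses a pointer-sized integer for `SIZE_t`, so a forest pickled under a 64-bit Python stores 64-bit indices that a 32-bit Python cannot accept.

```python
import struct
import sys
from importlib import metadata


def describe_environment():
    """Collect interpreter facts that matter when sharing pickled models.

    Hypothetical helper for illustration only.
    """
    info = {
        # Path of the running interpreter; shows which environment is active.
        "executable": sys.executable,
        # Pointer size in bits: 32 on a 32-bit Python, 64 on a 64-bit one.
        "python_bits": struct.calcsize("P") * 8,
        "python_version": sys.version.split()[0],
    }
    # Installed versions of the packages involved in dumping/loading the model.
    for package in ("numpy", "scikit-learn", "joblib"):
        try:
            info[package] = metadata.version(package)
        except metadata.PackageNotFoundError:
            info[package] = None  # not installed in this environment
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Running this once in Spyder (or a notebook) and once via `python manage.py shell` should show whether the two sides differ in `python_bits` or in package versions. If they do, loading the model in the same environment that produced it is the safer route.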
