Value error of incompatible row dimension occurred while training the model


Posted: 2020-09-02 04:59:45

Question:

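I am implementing a decision tree on a dataset. Before that, I want to transform one specific column with CountVectorizer. To do so, I am using a pipeline to keep things simple.

But I get an error about incompatible row dimensions.

Code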

# Imported the libraries....
from sklearn.feature_extraction.text import CountVectorizer as cv
from sklearn.preprocessing import OneHotEncoder as ohe
from sklearn.compose import ColumnTransformer as ct
from sklearn.pipeline import make_pipeline as mp
from sklearn.tree import DecisionTreeClassifier as dtc
from sklearn.model_selection import train_test_split as tts


transformer=ct(transformers=[('review_counts',cv(),['verified_reviews']),
                             ('variation_dummies', ohe(),['variation'])
                            ],remainder='passthrough')

pipe= mp(transformer,dtc(random_state=42))

x= data[['rating','variation','verified_reviews']].copy()
y= data.feedback

x_train,x_test,y_train,y_test= tts(x,y,test_size=0.3,random_state=42,stratify=y)
print(x_train.shape,y_train.shape)             # ((2205, 3), (2205,))

pipe.fit(x_train,y_train)                       # Error on this line

Error

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-79-a981c354b190> in <module>()
----> 1 pipe.fit(x_train,y_train)

7 frames
/usr/local/lib/python3.6/dist-packages/scipy/sparse/construct.py in bmat(blocks, format, dtype)
    584                                                     exp=brow_lengths[i],
    585                                                     got=A.shape[0]))
--> 586                     raise ValueError(msg)
    587 
    588                 if bcol_lengths[j] == 0:

ValueError: blocks[0,:] has incompatible row dimensions. Got blocks[0,1].shape[0] == 2205, expected 1.

Question

    How does this incompatible row dimensions error arise, and how can it be solved?


Answer 1:

Try passing the column for ohe as a list, and the column for cv as a plain string.

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer as cv
from sklearn.preprocessing import OneHotEncoder as ohe
from sklearn.compose import ColumnTransformer as ct
from sklearn.pipeline import make_pipeline as mp
from sklearn.tree import DecisionTreeClassifier as dtc

# Small synthetic dataset with the same column names as the original data
data = pd.DataFrame({'rating':np.random.randint(0,10,6),'variation':['a','b','c','a','b','c'],
                   'verified_reviews':['adnf asd','sdf dsa','das j s','asd jd s','sad jds a','sajd'],
                   'feedback':np.random.randint(0,2,6)})

transformer=ct(transformers=[('review_counts',cv(),'verified_reviews'),
                             ('variation_dummies', ohe(),['variation'])],
               remainder='passthrough')

pipe= mp(transformer, dtc(random_state=42))

x= data[['rating','variation','verified_reviews']].copy()
y= data.feedback

pipe.fit(x,y)

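According to the documentation, a column is specified as a string ("xxx") whenever the transformer expects a 1D array as input. For transformers that expect 2D data, the column must be specified as a list of strings (["xxx"]). Here CountVectorizer expects a 1D sequence of documents, while OneHotEncoder expects 2D input.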
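To see where the "expected 1" in the error message likely comes from, the two selection styles can be tried on CountVectorizer directly. This is a minimal sketch on a made-up three-row frame (the review texts are invented for illustration, not taken from the original dataset). It assumes the ColumnTransformer hands the transformer a DataFrame for a list selection and a Series for a string selection.

```python
import pandas as pd
from scipy import sparse
from sklearn.feature_extraction.text import CountVectorizer

# Toy frame standing in for the real dataset (values are made up).
df = pd.DataFrame({"verified_reviews": ["love it", "works great", "love the sound"]})

# 2D selection (list of columns): iterating over a DataFrame yields its
# column names, so CountVectorizer sees a single "document".
as_2d = CountVectorizer().fit_transform(df[["verified_reviews"]])

# 1D selection (plain string): iterating over a Series yields the review texts.
as_1d = CountVectorizer().fit_transform(df["verified_reviews"])

print(as_2d.shape)  # one row, no matter how many reviews there are
print(as_1d.shape)  # one row per review

# Horizontally stacking blocks with different row counts fails, which is
# what happens when the transformer outputs are combined.
try:
    sparse.hstack([as_2d, as_1d])
    stack_failed = False
except ValueError:
    stack_failed = True
print(stack_failed)
```

With the list selection the vectorizer output has a single row, so it cannot be stacked next to the one-hot block, which has one row per sample. That mismatch is consistent with the message in the traceback above.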
