Matplotlib error: 'height' must be length 5 or scalar

Posted: 2018-01-28 13:04:58

Question:

I am trying to plot the output of my script as a 2-series bar chart using matplotlib in Python 2.7.

My script prints 'msg', which produces the following output:

KNN: 90.000000 (0.322734)

LDA: 83.641395 (0.721210)

CART: 92.600996 (0.399870)

NB: 29.214167 (1.743959)

Random Forest: 92.617598 (0.323824)

After the code prints the 'msg' results, I try to plot them as a 2-series bar chart with matplotlib, which returns the following error:

Traceback (most recent call last):
  File "comparison.py", line 113, in <module>
    label='mean')
  File "C:\Users\Scot\Anaconda2\lib\site-packages\matplotlib\pyplot.py", line 2650, in bar
    **kwargs)
  File "C:\Users\Scot\Anaconda2\lib\site-packages\matplotlib\__init__.py", line 1818, in inner
    return func(ax, *args, **kwargs)
  File "C:\Users\Scot\Anaconda2\lib\site-packages\matplotlib\axes\_axes.py", line 2038, in bar
    "must be length %d or scalar" % nbars)
ValueError: incompatible sizes: argument 'height' must be length 5 or scalar

I'm not sure how to solve this. I think it might be because the result values are floats? Any help would be greatly appreciated. Here is my code:

# Modules
import pandas
import numpy
import os
from pandas.tools.plotting import scatter_matrix
import matplotlib.pyplot as plt
from matplotlib import style
plt.rcdefaults()
from sklearn import preprocessing
from sklearn import cross_validation
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score, precision_recall_curve, average_precision_score
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB
from scipy.stats import ttest_ind, ttest_ind_from_stats
from scipy.special import stdtr
from sklearn.svm import SVC
from collections import defaultdict
from sklearn.preprocessing import LabelEncoder
import warnings

# Load KDD dataset
data_set = "NSL-KDD/KDDTest+.arff"
import os
os.system("cls")

print "Loading: ", data_set

with warnings.catch_warnings():
    warnings.simplefilter("ignore")

    names = ['duration', 'protocol_type', 'service', 'flag', 'src_bytes', 'dst_bytes', 'land', 'wrong_fragment', 'urgent', 'hot', 'num_failed_logins', 'logged_in', 'num_compromised', 'su_attempted', 'num_root', 'num_file_creations',
             'num_shells', 'num_access_files', 'num_outbound_cmds', 'is_host_login', 'is_guest_login', 'count', 'srv_count', 'serror_rate', 'srv_serror_rate', 'rerror_rate', 'srv_rerror_rate', 'same_srv_rate', 'diff_srv_rate', 'srv_diff_host_rate',
             'dst_host_count', 'dst_host_srv_count', 'dst_host_same_srv_rate', 'dst_host_diff_srv_rate', 'dst_host_same_src_port_rate', 'dst_host_srv_diff_host_rate', 'dst_host_serror_rate', 'dst_host_srv_serror_rate', 'dst_host_rerror_rate', 'class',
             'dst_host_srv_rerror_rate']

    dataset = pandas.read_csv(data_set, names=names)

    for column in dataset.columns:
        if dataset[column].dtype == type(object):
            le = LabelEncoder()
            dataset[column] = le.fit_transform(dataset[column])

    array = dataset.values
    X = array[:, 0:40]
    Y = array[:, 40]

    # Split-out validation dataset
    validation_size = 0.20
    seed = 7
    X_train, X_validation, Y_train, Y_validation = cross_validation.train_test_split(
        X, Y, test_size=validation_size, random_state=seed)

    # Test options and evaluation metric
    num_folds = 10
    num_instances = len(X_train)
    seed = 10
    scoring = 'accuracy'

    #  Algorithms
    models = []
    models.append(('KNN', KNeighborsClassifier()))  
    models.append(('LDA', LinearDiscriminantAnalysis()))  
    models.append(('CART', DecisionTreeClassifier()))  
    models.append(('NB', GaussianNB()))  
    models.append(('Random Forest', RandomForestClassifier()))  
    # models.append(('LR', LogisticRegression())) 

    # evaluate each model in turn
    results = []
    names = []
    for name, model in models:
        kfold = cross_validation.KFold(n=num_instances, n_folds=num_folds, random_state=seed)
        cv_results = cross_validation.cross_val_score(
            model, X_train, Y_train, cv=kfold, scoring=scoring)
        results.append(cv_results)
        names.append(name)
        msg = "%s: %f (%f)" % (name, cv_results.mean() * 100, cv_results.std()
                               * 100)  # multiplying by 100 to show percentage
        print(msg)
        # print cv_results * 100 # plots all values that make the average

    print ("\n")

    # Perform T Test on each iteration of models.
    for i in range(len(results) - 1):
        for j in range(i, len(results)):
            t, p = ttest_ind(results[i], results[j], equal_var=False)
            print("T_Test between {} & {}: T Value = {}, P Value = {}".format(
                names[i], names[j], t, p))
            print("\n")

    plt.style.use('ggplot')
    n_groups = 5
    # create plot
    fig, ax = plt.subplots()
    index = numpy.arange(n_groups)
    bar_width = 0.35
    opacity = 0.8

    rects1 = plt.bar(index, cv_results, bar_width,
                     alpha=opacity,
                     #  color='b',
                     label='mean') # Line 113

    rects2 = plt.bar(index + bar_width, cv_results.std(), bar_width,
                     alpha=opacity,
                     color='g',
                     label='standard_d')

    plt.xlabel('Models')
    plt.ylabel('Percentage')
    plt.title('All Model Performance')
    plt.xticks(index + bar_width, (names))
    plt.legend()

    plt.tight_layout()
    plt.show()

Edit

Printing cv_results gives the following, with 7 or 8 decimal places:

[ 90.48146099  90.48146099  89.42999447  89.5960155   90.03873824
  89.9833979   89.9833979   89.76203652  90.09407858  90.14941893]

[ 83.34255672  84.94742667  82.2910902   83.78527947  84.3386829
  83.9513005   82.78915329  84.06198118  83.39789707  83.50857775]

[ 93.1931378   92.69507471  91.92030991  92.52905368  92.69507471
  92.41837299  92.58439402  92.25235196  92.19701162  92.14167128]

[ 29.05368013  26.89540675  31.54399557  28.22357499  29.27504151
  27.94687327  33.20420587  28.99833979  28.55561704  28.44493636]

[ 93.35915883  93.02711677  92.25235196  91.69894853  93.02711677
  92.63973437  92.58439402  92.14167128  92.47371334  92.69507471]

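Comments:

What is the length of cv_results?

@Goyo I have updated the question to show the output of cv_results.

I don't know what kind of object would have that string representation. But if you expect the first call to bar to draw 5 bars, that doesn't make any sense. You should trim your code down to a minimal reproducible example.

Answer 1: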

If you want to plot the mean of cv_results, you need to compute it with .mean(), just as you use .std() in the second plot.

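Also, you went to the trouble of appending each model's cv_results to results, but when you get to the plotting you still use cv_results, which by then holds only the cv_results of the last model visited in the loop.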
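A minimal sketch of that mismatch, assuming 5 models and 10 folds as in the question. The scores are random placeholders, not the real cross-validation output:

```python
import numpy as np

# Placeholder fold scores: random numbers standing in for cross_val_score output.
rng = np.random.RandomState(0)

results = []
for _ in range(5):                       # 5 models
    cv_results = rng.uniform(0.8, 0.95, size=10)   # 10 folds per model
    results.append(cv_results)

index = np.arange(5)                     # 5 bar positions, one per model

# After the loop, cv_results is only the LAST model's fold scores.
print(len(index))                        # 5
print(len(cv_results))                   # 10
print(cv_results is results[-1])         # True
```

Passing 10 heights for 5 bar positions is what makes bar complain about incompatible sizes. The second call works only because cv_results.std() is a scalar, which bar applies to all 5 bars.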

It looks like your results will be a list of 5 numpy arrays. So you can iterate over that list, compute the mean of each array, and use those means to draw your bars:

mean_results = [res.mean() for res in results]
rects1 = plt.bar(index, mean_results,  bar_width,
                 alpha=opacity,
                 #  color='b',
                 label='mean')
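For reference, here is a self-contained sketch of the whole two-series chart built this way. The scores are random placeholders rather than real results, the Agg backend is an assumption so that it runs without a display, and the tick placement at the centre of each pair is my choice, not something from the question:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

names = ['KNN', 'LDA', 'CART', 'NB', 'Random Forest']

# Placeholder data: one array of 10 fold scores (in percent) per model.
rng = np.random.RandomState(0)
results = [rng.uniform(80, 95, size=10) for _ in names]

# One value per model, so both lists have the same length as index.
mean_results = [res.mean() for res in results]
std_results = [res.std() for res in results]

index = np.arange(len(names))
bar_width = 0.35
opacity = 0.8

fig, ax = plt.subplots()
rects1 = ax.bar(index, mean_results, bar_width,
                alpha=opacity, label='mean')
rects2 = ax.bar(index + bar_width, std_results, bar_width,
                alpha=opacity, color='g', label='standard_d')

ax.set_xlabel('Models')
ax.set_ylabel('Percentage')
ax.set_title('All Model Performance')
ax.set_xticks(index + bar_width / 2)   # centre each label under its pair of bars
ax.set_xticklabels(names)
ax.legend()
fig.tight_layout()
```

Deriving index from len(names) instead of a hard-coded n_groups keeps the bar positions and the heights in step if a model is added or commented out.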

Alternatively, you could append cv_results.mean() to a list during the original loop and use that list to make the bar chart.
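A rough sketch of that alternative, with a random array standing in for the cross_val_score call. The list names means and stds are hypothetical, not from the question:

```python
import numpy as np

names = ['KNN', 'LDA', 'CART', 'NB', 'Random Forest']
rng = np.random.RandomState(1)

means = []   # hypothetical name: one mean per model
stds = []    # hypothetical name: one standard deviation per model
for name in names:
    # Stand-in for cross_validation.cross_val_score(...): 10 placeholder fold scores.
    cv_results = rng.uniform(0.8, 0.95, size=10)
    means.append(cv_results.mean() * 100)   # multiply by 100 to show a percentage
    stds.append(cv_results.std() * 100)
    print("%s: %f (%f)" % (name, means[-1], stds[-1]))

# means and stds now have one entry per model and can be passed to plt.bar as heights.
```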

Comments:

Adding cv_results.mean() to a new list worked great! Thanks.
