How to resolve StandardScaler ValueError
Posted: 2022-01-09 09:34:48

【Question】: I'm following the image to image search tutorial to perform a semantic search over an image dataset. I've searched through multiple SO questions and answers but could not pinpoint a solution. Here is my code:
import os
import numpy as np
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

path = "../../datasets/101_ObjectCategories"
dest = "content/all_images/"
all_images = [dest + elem for elem in os.listdir(dest)]  # 2
print("AMOUNT OF IMAGES :", len(all_images))

# Load as numpy array
dataset = np.empty(shape=(len(all_images), 128, 128, 3))  # 1
# print("DATASETS :", dataset)
for i, path in enumerate(all_images):  # 2
    img = load_img(path, target_size=(128, 128))
    img_arr = img_to_array(img)
    dataset[i] = img_arr
np.random.shuffle(dataset)  # 3

dshape = dataset.shape
print("dataset shape ", dshape)

pipeline = Pipeline([('scaling', StandardScaler()), ('pca', PCA(n_components=5))])
pipeline.fit_transform(dataset)
I'm currently getting an error. Here is my printed output and the traceback:
dataset shape (9171, 128, 128, 3)
Traceback (most recent call last):
File "embedder1_main.py", line 78, in <module>
pipeline.fit_transform(dataset)
File "/Users/james/miniforge3/envs/konstant-3/lib/python3.8/site-packages/sklearn/pipeline.py", line 426, in fit_transform
Xt = self._fit(X, y, **fit_params_steps)
File "/Users/james/miniforge3/envs/konstant-3/lib/python3.8/site-packages/sklearn/pipeline.py", line 348, in _fit
X, fitted_transformer = fit_transform_one_cached(
File "/Users/james/miniforge3/envs/konstant-3/lib/python3.8/site-packages/joblib/memory.py", line 349, in __call__
return self.func(*args, **kwargs)
File "/Users/james/miniforge3/envs/konstant-3/lib/python3.8/site-packages/sklearn/pipeline.py", line 893, in _fit_transform_one
res = transformer.fit_transform(X, y, **fit_params)
File "/Users/james/miniforge3/envs/konstant-3/lib/python3.8/site-packages/sklearn/base.py", line 847, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "/Users/james/miniforge3/envs/konstant-3/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 806, in fit
return self.partial_fit(X, y, sample_weight)
File "/Users/james/miniforge3/envs/konstant-3/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 841, in partial_fit
X = self._validate_data(
File "/Users/james/miniforge3/envs/konstant-3/lib/python3.8/site-packages/sklearn/base.py", line 561, in _validate_data
X = check_array(X, **check_params)
File "/Users/james/miniforge3/envs/konstant-3/lib/python3.8/site-packages/sklearn/utils/validation.py", line 786, in check_array
raise ValueError(
ValueError: Found array with dim 4. StandardScaler expected <= 2.
What is causing this error, and how can I fix it?
【Comments】:
【Answer 1】: As the printout states, your dataset has shape (9171, 128, 128, 3), so dim = 4.
The code in the link you provided uses a deep neural network to train the model, which lets the author handle multi-band (RGB) images with ease. In addition, to work with values between 0 and 1, the author divides the image array by 255 (the maximum possible value).
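That normalization step can be sketched as follows (a minimal example using a small synthetic array in place of the real (9171, 128, 128, 3) dataset):

```python
import numpy as np

# Synthetic stand-in for the real image array; values in [0, 255] like raw pixels
dataset = np.random.randint(0, 256, size=(4, 128, 128, 3)).astype(np.float64)

# Divide by the maximum possible pixel value to map everything into [0, 1]
dataset /= 255.0

print(dataset.min() >= 0.0 and dataset.max() <= 1.0)
```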
If you still want to use StandardScaler, I'm afraid you will need a custom function that processes each image and each band separately, so that you handle one (128, 128) array at a time, although I'm not sure that is the best approach.
【Discussion】: