TypeError: object of type 'numpy.int64' has no len()
【Posted】2019-05-23 19:17:17 【Question】: I am building a DataLoader from a Dataset in PyTorch.
I start by loading a DataFrame with every column's dtype set to np.float64:
result = pd.read_csv('dummy.csv', header=0, dtype=DTYPE_CLEANED_DF)
Here is my dataset class.
import torch
from torch.utils.data import Dataset, DataLoader

class MyDataset(Dataset):
    def __init__(self, result):
        headers = list(result)
        headers.remove('classes')
        self.x_data = result[headers]
        self.y_data = result['classes']
        self.len = self.x_data.shape[0]

    def __getitem__(self, index):
        x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
        y = torch.tensor(self.y_data.iloc[index], dtype=torch.float)
        return (x, y)

    def __len__(self):
        return self.len
Preparing train_loader and test_loader:
train_size = int(0.5 * len(full_dataset))
test_size = len(full_dataset) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(full_dataset, [train_size, test_size])
train_loader = DataLoader(dataset=train_dataset, batch_size=16, shuffle=True, num_workers=1)
test_loader = DataLoader(dataset=train_dataset)
Here is my csv file.
When I try to iterate over train_loader, it raises an error:
for i, (data, target) in enumerate(train_loader):
    print(i)
TypeError Traceback (most recent call last)
<ipython-input-32-0b4921c3fe8c> in <module>
----> 1 for i , (data, target) in enumerate(train_loader):
2 print(i)
/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in __next__(self)
635 self.reorder_dict[idx] = batch
636 continue
--> 637 return self._process_next_batch(batch)
638
639 next = __next__ # Python 2 compatibility
/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py in _process_next_batch(self, batch)
656 self._put_indices()
657 if isinstance(batch, ExceptionWrapper):
--> 658 raise batch.exc_type(batch.exc_msg)
659 return batch
660
TypeError: Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in _worker_loop
samples = collate_fn([dataset[i] for i in batch_indices])
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 138, in <listcomp>
samples = collate_fn([dataset[i] for i in batch_indices])
File "/opt/conda/lib/python3.6/site-packages/torch/utils/data/dataset.py", line 103, in __getitem__
return self.dataset[self.indices[idx]]
File "<ipython-input-27-107e03bc3c6a>", line 12, in __getitem__
x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 1478, in __getitem__
return self._getitem_axis(maybe_callable, axis=axis)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 2091, in _getitem_axis
return self._get_list_axis(key, axis=axis)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexing.py", line 2070, in _get_list_axis
return self.obj._take(key, axis=axis)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/generic.py", line 2789, in _take
verify=True)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/internals.py", line 4537, in take
new_labels = self.axes[axis].take(indexer)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2195, in take
return self._shallow_copy(taken)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/range.py", line 267, in _shallow_copy
return self._int64index._shallow_copy(values, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/numeric.py", line 68, in _shallow_copy
return self._shallow_copy_with_infer(values=values, **kwargs)
File "/opt/conda/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 538, in _shallow_copy_with_infer
if not len(values) and 'dtype' not in kwargs:
TypeError: object of type 'numpy.int64' has no len()
Related issues:
https://github.com/pytorch/pytorch/issues/10165
https://github.com/pytorch/pytorch/pull/9237
https://github.com/pandas-dev/pandas/issues/21946
Question: how can I work around the pandas problem here?
【Comments】:
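Try train_loader.shape to check the shape of train_loader. Most likely there is a problem with the number of entries.
@Bazingaa ['_DataLoader__initialized', 'batch_sampler', 'batch_size', 'collate_fn', 'dataset', 'drop_last', 'num_workers', 'pin_memory', 'sampler', 'timeout', 'worker_init_fn'] It has no shape.
Your problem is caused by this line: x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float), and more precisely, I guess, by the call to .values. But I am not a pandas expert. So this does not seem to be related to PyTorch itself. I added the pandas tag to your question; I think people there will be able to tell you exactly what the problem is.
@blue-phoenox Same error.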
【Answer 1】:
What I like to do is split the data into 2 data frames like this:
from sklearn.model_selection import train_test_split
train, test = train_test_split(full_dataset, test_size=0.2)
Then create the loaders from the 2 datasets like this:
train_loader = DataLoader(dataset=train, batch_size=16, shuffle=True, num_workers=1)
test_loader = DataLoader(dataset=test)
I think this is the cleanest way.
【Comments】:
【Answer 2】: In my script, I first create a TensorDataset with dataset = TensorDataset(data_x, data_y), and then call train_dataset, test_dataset = torch.utils.data.random_split(dataset, [train_size, test_size]). This does not cause any problems later when iterating during training.
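The approach in this answer can be sketched end to end as follows. This is only an illustration under assumptions: the small DataFrame and its column names f1 and f2 are hypothetical stand-ins for the CSV in the question, and the batch size is arbitrary. The point is that the features are converted to tensors once, so pandas indexing is never involved when a sample is fetched.

```python
import pandas as pd
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

# Hypothetical stand-in for the question's CSV: two feature columns plus 'classes'.
df = pd.DataFrame({
    "f1": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    "f2": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "classes": [0.0, 1.0, 0.0, 1.0, 0.0, 1.0],
})

# Convert to tensors once, up front, so fetching a sample never calls .iloc.
data_x = torch.tensor(df.drop(columns=["classes"]).values, dtype=torch.float)
data_y = torch.tensor(df["classes"].values, dtype=torch.float)
dataset = TensorDataset(data_x, data_y)

# Same split arithmetic as in the question: the second length is the remainder.
train_size = int(0.5 * len(dataset))
test_size = len(dataset) - train_size
train_dataset, test_dataset = random_split(dataset, [train_size, test_size])

train_loader = DataLoader(train_dataset, batch_size=2, shuffle=True)

# Collect the shape of every batch to confirm that iteration works.
batches = [(tuple(data.shape), tuple(target.shape)) for data, target in train_loader]
```

Because TensorDataset indexes plain tensors, it should not matter whether the sampler hands it a Python int or a tensor index, which is presumably why this route avoids the error.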
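【Comments】:
【Answer 3】: I have 2298 images in total. If I compute the split lengths like this
[int(len(data)*0.8), int(len(data)*0.2)]
it throws the error in question, because
int(len(data)*0.8) + int(len(data)*0.2) = 2297
So what I do instead is use the floor and ceil functions:
[int(np.floor(len(data)*0.8)), int(np.ceil(len(data)*0.2))]
The lengths now sum to 2298 and the error goes away.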
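The arithmetic in this answer can be checked with a few lines of plain Python. The helper name split_lengths is hypothetical. It uses the remainder for the second length, as the question's own code does, so the sum does not depend on how either product rounds.

```python
import math

def split_lengths(n_items, train_fraction):
    """Hypothetical helper: truncate the first length, give the remainder to the second."""
    train_size = int(n_items * train_fraction)
    return [train_size, n_items - train_size]

n = 2298

# Truncating both products loses one item: 1838 + 459.
truncated = [int(n * 0.8), int(n * 0.2)]

# floor/ceil as in the answer: 1838 + 460.
floor_ceil = [math.floor(n * 0.8), math.ceil(n * 0.2)]

# Remainder-based lengths always sum to n by construction.
remainder = split_lengths(n, 0.8)

print(sum(truncated), sum(floor_ceil), sum(remainder))
```

Only the first variant fails the `sum(lengths) != len(dataset)` check inside random_split.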
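【Comments】:
【Answer 4】: I solved this by upgrading my PyTorch version to 1.3.
https://pytorch.org/get-started/locally/
【Comments】:
【Answer 5】: I think the problem is that after using random_split, index is now a torch.Tensor rather than an int. I found that adding a quick type check to __getitem__ and then calling .item() on the tensor works for me: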
def __getitem__(self, index):
    # random_split may pass a 0-d tensor; convert it to a plain Python int
    if isinstance(index, torch.Tensor):
        index = index.item()
    x = torch.tensor(self.x_data.iloc[index].values, dtype=torch.float)
    y = torch.tensor(self.y_data.iloc[index], dtype=torch.float)
    return (x, y)
Source: https://discuss.pytorch.org/t/issues-with-torch-utils-data-random-split/22298/8
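The same idea can be written as a small duck-typed helper. This is a sketch rather than part of the answer: the name to_python_int is hypothetical, and it relies only on the fact that both torch.Tensor and NumPy scalars such as numpy.int64 expose an .item() method that returns a Python number.

```python
def to_python_int(index):
    """Hypothetical helper: turn a scalar-like index into a plain Python int.

    Objects with an .item() method (0-d torch tensors, NumPy scalars) are
    unwrapped first; anything else goes straight through int().
    """
    if hasattr(index, "item"):
        return int(index.item())
    return int(index)


class FakeScalar:
    """Stand-in for a 0-d tensor, used here so the sketch runs without torch."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


plain = to_python_int(5)
unwrapped = to_python_int(FakeScalar(3))
```

Inside __getitem__ this would be used as `index = to_python_int(index)` before the .iloc lookups.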
【Comments】:
【Answer 6】: Why not simply try:
self.len = len(self.x_data)
len works fine with a pandas DataFrame, with no need to convert to an array or tensor.
【Comments】:
【Answer 7】: Reference: https://github.com/pytorch/pytorch/issues/9211
Just add .tolist() to the indices line.
def random_split(dataset, lengths):
    """
    Randomly split a dataset into non-overlapping new datasets of given lengths.

    Arguments:
        dataset (Dataset): Dataset to be split
        lengths (sequence): lengths of splits to be produced
    """
    if sum(lengths) != len(dataset):
        raise ValueError("Sum of input lengths does not equal the length of the input dataset!")
    indices = randperm(sum(lengths)).tolist()
    return [Subset(dataset, indices[offset - length:offset]) for offset, length in zip(_accumulate(lengths), lengths)]
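To make the slicing in that function easier to follow, here is a standard-library analogue. It is not PyTorch's implementation: random.Random.sample stands in for randperm, itertools.accumulate stands in for _accumulate, and the name split_indices and the seed parameter are hypothetical. The indices it produces are plain Python ints, which is the effect .tolist() has in the patch above.

```python
import random
from itertools import accumulate

def split_indices(n_items, lengths, seed=None):
    """Hypothetical analogue of random_split that returns index lists only."""
    if sum(lengths) != n_items:
        raise ValueError("Sum of input lengths does not equal the number of items!")
    rng = random.Random(seed)
    # A random permutation of 0..n_items-1, already made of plain ints.
    indices = rng.sample(range(n_items), n_items)
    # accumulate(lengths) yields the end offset of each split; slice back by its length.
    return [indices[offset - length:offset]
            for offset, length in zip(accumulate(lengths), lengths)]

train_idx, test_idx = split_indices(10, [6, 4], seed=0)
```

Each split is a disjoint slice of one permutation, so together the splits cover every item exactly once.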
【Comments】: