SIMS, MOSI, MOSEI

Posted 2022-03-18 ArdenWang

SIMS: Chinese multimodal sentiment recognition dataset
MOSI: English multimodal sentiment recognition dataset
MOSEI

SIMS: Chinese multimodal sentiment recognition dataset

label

**Sentiment state**

emotion	label
negative	-1
neutral	0
positive	1

**Regression task:** the label for each sample is the average of its five annotations, so it takes one of the following values:
-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0.

These values are grouped into five classes:

emotion	label
negative	-1.0, -0.8
weakly negative	-0.6, -0.4, -0.2
neutral	0.0
weakly positive	0.2, 0.4, 0.6
positive	0.8, 1.0
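The averaging and the five-way grouping above can be sketched as follows. This is a minimal illustration, not the dataset authors' code: the function names are hypothetical, and the cut points (-0.7, -0.1, 0.1, 0.7) are midpoints chosen here so that floating-point noise cannot move a value across a class boundary.

```python
import numpy as np


def regression_label(annotations):
    """Average five per-annotator labels, each in {-1, 0, 1}."""
    annotations = np.asarray(annotations, dtype=np.float64)
    if annotations.shape[-1] != 5:
        raise ValueError("expected five annotations per sample")
    # The mean of five values in {-1, 0, 1} is always a multiple of 0.2.
    return float(np.round(annotations.mean(axis=-1), 1))


def five_class(score):
    """Map an averaged score to one of the five classes in the table."""
    if score < -0.7:
        return "negative"          # -1.0, -0.8
    if score < -0.1:
        return "weakly negative"   # -0.6, -0.4, -0.2
    if score <= 0.1:
        return "neutral"           # 0.0
    if score <= 0.7:
        return "weakly positive"   # 0.2, 0.4, 0.6
    return "positive"              # 0.8, 1.0


score = regression_label([1, 1, 0, 1, -1])
print(score, five_class(score))
```

Using midpoints rather than exact comparisons against 0.2, 0.8 and so on is a design choice of this sketch; any thresholds that fall strictly between the listed values give the same grouping.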

Feature

Text

Text is represented with BERT-base word embeddings (768-dimensional word vectors).

Audio

Acoustic features are extracted with the LibROSA speech toolkit, using default parameters, at 22050 Hz.
In total, 33-dimensional frame-level acoustic features are extracted: 1-dimensional logarithmic fundamental frequency (log F0), 20-dimensional Mel-frequency cepstral coefficients (MFCCs), and 12-dimensional constant-Q chromagram (CQT) features.

Vision

Frames are extracted from the video segments at 30 Hz.
The MTCNN face detection algorithm is used to extract aligned faces.
The MultiComp OpenFace2.0 toolkit then extracts the set of 68 facial landmarks, 17 facial action units, head pose, head orientation, and eye gaze. In total, 709-dimensional frame-level visual features are extracted.

Dataset structure

import pickle
import numpy as np

with open('data/SIMS/unaligned_39.pkl', 'rb') as f:
    data = pickle.load(f)

print(data.keys())
output:
dict_keys(['train', 'valid', 'test'])

print(data['train'].keys())
output:
dict_keys(['raw_text', 'text_bert', 'audio_lengths', 'vision_lengths', 'classification_labels', 'regression_labels', 'classification_labels_T', 'regression_labels_T', 'classification_labels_A', 'regression_labels_A', 'classification_labels_V', 'regression_labels_V', 'text', 'audio', 'vision', 'id'])

print(data['train']['raw_text'][0])
output:
闭嘴，不是来抓你的。
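Before working with the features it can help to see the shape of every entry in a split. The helper below is a small sketch of that idea; `describe_split` is a hypothetical name, and the toy dictionary only stands in for `data['train']`, so its shapes are illustrative rather than the real ones.

```python
import numpy as np


def describe_split(split):
    """Return {key: shape} for one split dictionary."""
    summary = {}
    for key, value in split.items():
        if isinstance(value, np.ndarray):
            summary[key] = value.shape
        else:
            # Lists (for example raw strings or ids) only have a length.
            summary[key] = (len(value),)
    return summary


# Synthetic stand-in for data['train']; the real arrays are much larger.
toy_split = {
    "raw_text": ["first sentence", "second sentence"],
    "text": np.zeros((2, 4, 8), dtype=np.float32),
    "audio": np.zeros((2, 6, 33), dtype=np.float32),
}

for key, shape in describe_split(toy_split).items():
    print(f"{key:10s} {shape}")
```

With the real pickle loaded, `describe_split(data['train'])` would list one line per key shown in the `dict_keys` output above.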

Storing the data

use_bert = True  # assumption: set to False to use the plain 'text' features

for mode in ['train', 'valid', 'test']:
    # Cast each modality's feature array to float32
    if use_bert:
        text = data[mode]['text_bert'].astype(np.float32)
    else:
        text = data[mode]['text'].astype(np.float32)

    vision = data[mode]['vision'].astype(np.float32)
    audio = data[mode]['audio'].astype(np.float32)
    rawText = data[mode]['raw_text']
    ids = data[mode]['id']
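The loop above pulls each field out of a split; in practice those fields are usually kept together so that one index returns all modalities for the same sample. The class below is a minimal sketch of such a wrapper. `MultimodalSplit` is a hypothetical name, the choice of `regression_labels` as the target is an assumption, and the toy split uses made-up shapes.

```python
import numpy as np


class MultimodalSplit:
    """Index-aligned view over one split (hypothetical helper)."""

    def __init__(self, split, use_bert=True):
        text_key = "text_bert" if use_bert else "text"
        self.text = np.asarray(split[text_key], dtype=np.float32)
        self.audio = np.asarray(split["audio"], dtype=np.float32)
        self.vision = np.asarray(split["vision"], dtype=np.float32)
        self.labels = np.asarray(split["regression_labels"], dtype=np.float32)
        self.raw_text = list(split["raw_text"])
        self.ids = list(split["id"])
        # Every field must describe the same samples in the same order.
        sizes = {len(self.text), len(self.audio), len(self.vision),
                 len(self.labels), len(self.raw_text), len(self.ids)}
        if len(sizes) != 1:
            raise ValueError(f"fields disagree on sample count: {sorted(sizes)}")

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        return {
            "id": self.ids[index],
            "raw_text": self.raw_text[index],
            "text": self.text[index],
            "audio": self.audio[index],
            "vision": self.vision[index],
            "label": self.labels[index],
        }


# Toy split with made-up shapes, standing in for data['train'].
toy = {
    "text_bert": np.zeros((2, 5, 8)),
    "text": np.zeros((2, 5, 8)),
    "audio": np.zeros((2, 7, 33)),
    "vision": np.zeros((2, 6, 709)),
    "regression_labels": np.array([0.4, -0.8]),
    "raw_text": ["sample one", "sample two"],
    "id": ["clip_0", "clip_1"],
}
dataset = MultimodalSplit(toy)
print(len(dataset), dataset[1]["id"], dataset[1]["audio"].shape)
```

The same `__len__` and `__getitem__` pair is what a framework-level dataset class would need, so the wrapper can be adapted to a training loop without changing how the pickle is read.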

Statistics

print(len(data['train']['id']))
print(len(data['valid']['id']))
print(len(data['test']['id']))

output:
1368
456
457

MOSI: English multimodal sentiment recognition dataset

label

emotion	label
strongly positive	+3
positive	+2
weakly positive	+1
neutral	0
weakly negative	-1
negative	-2
strongly negative	-3
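When the stored score is a real number in the range from -3 to +3, it has to be mapped onto the seven classes in the table before classification metrics can be computed. The sketch below shows one common way to do that, plus a binary split; both function names are hypothetical, and treating 0 as non-negative in the binary case is an assumption of this example, not something the source specifies.

```python
import numpy as np


def to_seven_class(scores):
    """Round scores to the nearest integer in [-3, 3], then shift to 0..6."""
    clipped = np.clip(np.asarray(scores, dtype=np.float64), -3.0, 3.0)
    # np.round rounds exact halves to the nearest even integer (0.5 -> 0.0).
    return (np.round(clipped) + 3).astype(np.int64)


def to_binary(scores):
    """1 for non-negative sentiment, 0 for negative sentiment."""
    return (np.asarray(scores, dtype=np.float64) >= 0).astype(np.int64)


scores = [-3.0, -1.2, 0.0, 2.6, 3.4]
print(to_seven_class(scores))
print(to_binary(scores))
```

Class index 0 then corresponds to strongly negative (-3) and class index 6 to strongly positive (+3).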

Feature

Audio and visual features were automatically extracted from the MPEG files, at frame rates of 1000 for audio and 30 for video.

Visual

16 Facial Action Units, 68 Facial Landmarks, Head Pose and Orientation, 6 Basic Emotions, and Eye Gaze

Audio

COVAREP: pitch, energy, NAQ (Normalized Amplitude Quotient), MFCCs (Mel-frequency Cepstral Coefficients), Peak Slope, Energy Slope

Dataset structure

import pickle
import numpy as np

with open('data/MOSI/aligned_50.pkl', 'rb') as f:
    data = pickle.load(f)

print(data.keys())
output:
dict_keys(['train', 'valid', 'test'])

print(data['train'].keys())
output:
dict_keys(['raw_text', 'audio', 'vision', 'id', 'text', 'text_bert', 'annotations', 'classification_labels', 'regression_labels'])

print(data['train']['raw_text'][0])
output:
A LOT OF SAD PARTS

Storing the data

use_bert = True  # assumption: set to False to use the plain 'text' features

for mode in ['train', 'valid', 'test']:
    # Cast each modality's feature array to float32
    if use_bert:
        text = data[mode]['text_bert'].astype(np.float32)
    else:
        text = data[mode]['text'].astype(np.float32)

    vision = data[mode]['vision'].astype(np.float32)
    audio = data[mode]['audio'].astype(np.float32)
    rawText = data[mode]['raw_text']
    ids = data[mode]['id']

Statistics

print(len(data['train']['id']))
print(len(data['valid']['id']))
print(len(data['test']['id']))

output:
1284
229
686

MOSEI

label

emotion	label
strongly positive	+3
positive	+2
weakly positive	+1
neutral	0
weakly negative	-1
negative	-2
strongly negative	-3

Feature Extraction

Text

All videos have manual transcriptions. Words are represented with GloVe word embeddings.

Visual

Frames are extracted from the full videos at 30 Hz.

The bounding box of the face is extracted using the MTCNN face detection algorithm.

Facial action units are extracted through the Facial Action Coding System (FACS).

A set of six basic emotions is extracted purely from static faces using Emotient FACET.

MultiComp OpenFace is used to extract the set of 68 facial landmarks, 20 facial shape parameters, facial HoG features, head pose, head orientation and eye gaze.

Face embeddings are extracted from commonly used facial recognition models such as DeepFace, FaceNet and SphereFace.

Acoustic

The COVAREP software is used to extract acoustic features, including 12 Mel-frequency cepstral coefficients, pitch, voiced/unvoiced segmenting features, glottal source parameters, peak slope parameters and maxima dispersion quotients.

Dataset structure

import pickle
import numpy as np

with open('data/MOSEI/aligned_50.pkl', 'rb') as f:
    data = pickle.load(f)

print(data.keys())
output:
dict_keys(['train', 'valid', 'test'])

print(data['train'].keys())
output:
dict_keys(['raw_text', 'audio', 'vision', 'id', 'text', 'text_bert', 'annotations', 'classification_labels', 'regression_labels'])

print(data['train']['raw_text'][0])
output:
Key is part of the people that we use to solve those issues, whether it's stretch or outdoor resistance or abrasions or different technical aspects that we really need to solve to get into new markets, they've been able to bring solutions.

Storing the data

use_bert = True  # assumption: set to False to use the plain 'text' features

for mode in ['train', 'valid', 'test']:
    # Cast each modality's feature array to float32
    if use_bert:
        text = data[mode]['text_bert'].astype(np.float32)
    else:
        text = data[mode]['text'].astype(np.float32)

    vision = data[mode]['vision'].astype(np.float32)
    audio = data[mode]['audio'].astype(np.float32)
    rawText = data[mode]['raw_text']
    ids = data[mode]['id']

Statistics

print(len(data['train']['id']))
print(len(data['valid']['id']))
print(len(data['test']['id']))

output:
16326
1871
4659
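The three Statistics outputs can be put side by side to compare dataset size and split proportions. The snippet below is a small sketch that uses only the counts printed in this post; `split_summary` is a hypothetical helper name.

```python
# Split sizes copied from the three Statistics outputs in this post.
split_sizes = {
    "SIMS": {"train": 1368, "valid": 456, "test": 457},
    "MOSI": {"train": 1284, "valid": 229, "test": 686},
    "MOSEI": {"train": 16326, "valid": 1871, "test": 4659},
}


def split_summary(sizes):
    """Return the total sample count and each split's share of it."""
    total = sum(sizes.values())
    shares = {name: count / total for name, count in sizes.items()}
    return total, shares


for dataset_name, sizes in split_sizes.items():
    total, shares = split_summary(sizes)
    share_text = ", ".join(f"{name} {share:.1%}" for name, share in shares.items())
    print(f"{dataset_name:6s} total={total:6d}  {share_text}")
```

The totals come to 2281 samples for SIMS, 2199 for MOSI and 22856 for MOSEI, so MOSEI is roughly ten times the size of either of the other two.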
