Reading a WAV file from the TIMIT database in Python

Posted 2023-02-25


Original title: reading a WAV file from TIMIT database in python
Asked: 2017-06-25 16:19:21

Question:

I'm trying to read a WAV file from the TIMIT database in Python, but I get errors:

With wave:

wave.Error: file does not start with RIFF id

With scipy:

ValueError: File format b'NIST'... not understood.

With librosa, the program hangs. I tried converting the file to WAV with sox:

cmd = "sox " + wav_file + " -t wav " + new_wav
subprocess.call(cmd, shell=True)

It didn't help. I found an old answer that mentions the scikits.audiolab package, but it no longer seems to be maintained.

How can I read these files to get the data as an ndarray?

Thanks

Comments on the question:

You could try reading the file with the soundfile module, or with any other libsndfile wrapper, which should support the NIST format.

Answer 1:

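Your file is not a WAV file. Apparently it is a NIST SPHERE file. From the LDC web page: "Many LDC corpora contain speech files in NIST SPHERE format." According to the description of the NIST File Format, the first four characters of the file are NIST. That is what the scipy error is telling you: it doesn't know how to read a file that begins with NIST.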
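For uncompressed corpora like TIMIT, the SPHERE header is plain ASCII, so the samples can also be read without converting anything. Below is a minimal stdlib-only sketch, not a full SPHERE implementation. It assumes the usual layout: a `NIST_1A` line, a header-size line, then `name -type value` fields ending in `end_head`. It also assumes 16-bit PCM samples, and it treats missing `sample_coding` / `sample_n_bytes` fields as `pcm` / 2 bytes. Shorten-compressed SPHERE files are rejected. The function names are hypothetical.

```python
import array
import sys


def parse_sphere_header(header: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file into {field: value}."""
    lines = header.decode("ascii", errors="replace").split("\n")
    if lines[0].strip() != "NIST_1A":
        raise ValueError("not a NIST SPHERE file (missing NIST_1A magic)")
    fields = {"header_bytes": int(lines[1])}  # second line: header size in bytes
    for line in lines[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(" ", 2)  # name, type, value (value may contain spaces)
        if len(parts) != 3:
            continue
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": a string of N characters
            fields[name] = value
    return fields


def read_sphere(path: str):
    """Return (sample_rate, samples) for an uncompressed 16-bit SPHERE file."""
    with open(path, "rb") as f:
        raw = f.read()
    if not raw.startswith(b"NIST_1A"):
        raise ValueError(f"{path}: not a NIST SPHERE file")
    header_bytes = int(raw.split(b"\n", 2)[1].decode("ascii"))
    header = parse_sphere_header(raw[:header_bytes])

    # Assumption: a missing sample_coding means plain PCM
    if header.get("sample_coding", "pcm") != "pcm":
        raise NotImplementedError(f"unsupported coding {header['sample_coding']!r}")
    if header.get("sample_n_bytes", 2) != 2:
        raise NotImplementedError("only 16-bit samples are handled here")

    channels = header.get("channel_count", 1)
    payload = raw[header_bytes:]
    if "sample_count" in header:  # sample_count is per channel
        payload = payload[: header["sample_count"] * channels * 2]
    samples = array.array("h")  # signed 16-bit ("h" is 2 bytes on common platforms)
    samples.frombytes(payload[: len(payload) // 2 * 2])

    # sample_byte_format "01" = least significant byte first, "10" = big-endian
    file_order = "big" if header.get("sample_byte_format") == "10" else "little"
    if file_order != sys.byteorder:
        samples.byteswap()
    return header["sample_rate"], samples
```

`rate, samples = read_sphere("SA1.WAV")` (hypothetical path) returns an `array('h')`. Multi-channel data comes back interleaved. `numpy.frombuffer(samples, dtype=numpy.int16)` turns it into the ndarray the question asks for.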

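If you want to read the file with any of the libraries you tried, I suspect you will have to convert the file to WAV. To force conversion to WAV with the sph2pipe program, use the option -f wav (or the equivalent -f rif), for example: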

sph2pipe -f wav input.sph output.wav
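The same conversion can be driven from Python. This is a minimal sketch that assumes sph2pipe is installed and on PATH. The helper names are my own, not part of sph2pipe. So is the choice to mirror the corpus into a separate output directory, which does three things: names become `SA1.wav` rather than getting a double `.WAV.wav` extension, the originals are never overwritten, and nothing collides on case-insensitive file systems. Passing an argument list instead of a shell string also keeps paths with spaces intact, which the question's `shell=True` sox call would not.

```python
import subprocess
from pathlib import Path


def sph2pipe_command(src, dst, exe="sph2pipe"):
    """Argument list for `sph2pipe -f wav SRC DST` (no shell involved)."""
    return [exe, "-f", "wav", str(src), str(dst)]


def mirrored_output(src: Path, in_root: Path, out_root: Path) -> Path:
    """Map in_root/.../SA1.WAV to out_root/.../SA1.wav (same relative path)."""
    return (out_root / src.relative_to(in_root)).with_suffix(".wav")


def convert_tree(in_root, out_root, exe="sph2pipe") -> int:
    """Convert every *.WAV under in_root into a RIFF WAV under out_root."""
    in_root, out_root = Path(in_root), Path(out_root)
    count = 0
    # sorted() collects the file list before any output is written
    for src in sorted(in_root.rglob("*.WAV")):
        dst = mirrored_output(src, in_root, out_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        # check=True raises CalledProcessError if sph2pipe reports a failure
        subprocess.run(sph2pipe_command(src, dst, exe), check=True)
        count += 1
    return count
```

For example, `convert_tree("TIMIT", "TIMIT_wav")` (hypothetical directory names) returns the number of files converted.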

Comments:

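I updated my answer with a note about using -f wav. A simple way to run it recursively on all files under the current directory is find . -name '*.WAV' -exec sph2pipe -f wav {} {}.wav \;. The only drawback is that you end up with files ending in .WAV.wav.

Answer 2: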

Run this from the command line to check whether it is a WAV file:

xxd -b myaudiofile.wav | head

If it is in WAV format, the output looks something like this:

00000000: 01010010 01001001 01000110 01000110 10111100 10101111  RIFF..
00000006: 00000001 00000000 01010111 01000001 01010110 01000101  ..WAVE
0000000c: 01100110 01101101 01110100 00100000 00010000 00000000  fmt ..
00000012: 00000000 00000000 00000001 00000000 00000001 00000000  ......
00000018: 01000000 00011111 00000000 00000000 01000000 00011111  @...@.
0000001e: 00000000 00000000 00000001 00000000 00001000 00000000  ......
00000024: 01100100 01100001 01110100 01100001 10011000 10101111  data..
0000002a: 00000001 00000000 10000001 10000000 10000001 10000000  ......
00000030: 10000001 10000000 10000001 10000000 10000001 10000000  ......
00000036: 10000001 10000000 10000001 10000000 10000001 10000000  ......

Here is another way to display the contents of a binary file such as a WAV:

od -A x -t x1z -v  audio_util_test_file_custom.wav   | head 
000000 52 49 46 46 24 80 00 00 57 41 56 45 66 6d 74 20  >RIFF$...WAVEfmt <
000010 10 00 00 00 01 00 01 00 44 ac 00 00 88 58 01 00  >........D....X..<
000020 02 00 10 00 64 61 74 61 00 80 00 00 00 00 78 05  >....data......x.<
000030 ed 0a 5e 10 c6 15 25 1b 77 20 ba 25 eb 2a 08 30  >..^...%.w .%.*.0<
000040 0e 35 fc 39 cf 3e 84 43 1a 48 8e 4c de 50 08 55  >.5.9.>.C.H.L.P.U<
000050 0b 59 e4 5c 91 60 12 64 63 67 85 6a 74 6d 30 70  >.Y.\.`.dcg.jtm0p<
000060 b8 72 0a 75 25 77 09 79 b4 7a 26 7c 5d 7d 5a 7e  >.r.u%w.y.z&|]Z~<
000070 1c 7f a3 7f ee 7f fd 7f d0 7f 67 7f c3 7e e3 7d  >..........g..~.<
000080 c9 7c 74 7b e6 79 1e 78 1f 76 e8 73 7b 71 d9 6e  >.|t.y.x.v.sq.n<
000090 03 6c fa 68 c1 65 57 62 c0 5e fd 5a 0f 57 f8 52  >.l.h.eWb.^.Z.W.R<

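Note that a WAV file starts with the characters RIFF, which is a mandatory marker of the RIFF/WAVE container format. If your system (I'm on Linux) doesn't have the command-line utility xxd, use any hex editor, such as wxHexEditor, to inspect your WAV file the same way and confirm that you see RIFF. If there is no RIFF, it is not a WAV file at all.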
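The same check is easy to do in Python. This sketch assumes the canonical 44-byte layout: the `fmt ` chunk immediately follows `WAVE`, and the `data` chunk immediately follows a 16-byte `fmt ` chunk. Both dumps above use that layout. Files with other chunks in between, such as `LIST`, are rejected rather than walked. `sniff_format` and `parse_canonical_wav_header` are hypothetical names.

```python
import struct

# Canonical 44-byte PCM WAV header, all fields little-endian:
# "RIFF", size, "WAVE", "fmt ", fmt size, format, channels, rate,
# byte rate, block align, bits per sample, "data", data size
CANONICAL_WAV = struct.Struct("<4sI4s4sIHHIIHH4sI")


def sniff_format(head: bytes) -> str:
    """Classify a file from its first few bytes."""
    if head[:4] == b"RIFF" and head[8:12] == b"WAVE":
        return "RIFF/WAVE"
    if head[:7] == b"NIST_1A":
        return "NIST SPHERE"
    return "unknown"


def parse_canonical_wav_header(head: bytes) -> dict:
    """Decode the first 44 bytes of a canonical PCM WAV file."""
    (riff, riff_size, wave_id, fmt_id, fmt_size, audio_format, channels,
     sample_rate, byte_rate, block_align, bits, data_id, data_size) = \
        CANONICAL_WAV.unpack(head[:CANONICAL_WAV.size])
    if (riff, wave_id, fmt_id, data_id) != (b"RIFF", b"WAVE", b"fmt ", b"data"):
        raise ValueError("not a canonical 44-byte PCM WAV header")
    return {
        "riff_size": riff_size,
        "fmt_size": fmt_size,
        "audio_format": audio_format,  # 1 = integer PCM
        "channels": channels,
        "sample_rate": sample_rate,
        "byte_rate": byte_rate,
        "block_align": block_align,
        "bits_per_sample": bits,
        "data_size": data_size,
    }
```

Use it after reading the first bytes with `open(path, "rb").read(44)`. Decoded this way, the `od` dump is mono, 44100 Hz, 16-bit PCM with a 32768-byte data chunk. The `xxd` dump is mono, 8000 Hz, 8-bit.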

Details of the WAV format specification:

http://soundfile.sapp.org/doc/WaveFormat/

http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/WAVE.html

http://unusedino.de/ec64/technical/formats/wav.html

http://www.drdobbs.com/database/inside-the-riff-specification/184409308

https://www.gamedev.net/articles/programming/general-and-gameplay-programming/loading-a-wave-file-r709

http://www.topherlee.com/software/pcm-tut-wavformat.html

http://www.labbookpages.co.uk/audio/javaWavFiles.html

http://www.johnloomis.org/cpe102/asgn/asgn1/riff.html

http://nagasm.org/ASL/sound05/


Answer 3:

If you want a generic command (Windows) that works on every WAV file in a folder, run:

forfiles /s /m *.wav /c "cmd /c sph2pipe -f wav @file @fnameRIFF.wav"

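It finds every .wav file it can and, for each one, creates a WAV file that both scipy and wave can read. The new file is named after the original with RIFF.wav appended (e.g. SA1.WAV becomes SA1RIFF.wav).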

Comments:

This complements Warren Weckesser's sph2pipe solution... I would have posted it as a comment, but I don't have the required reputation yet. If you'd rather not use forfiles: find . -name '*.WAV' -exec sph2pipe -f wav {} {}.wav \;

Answer 4:

I wrote a Python script that converts every NIST-format .WAV file, for all speakers in all dialect regions, into a .wav file that can be played on your system.

Note: all dialect folders are in ./TIMIT/TRAIN/. You may need to change dialects_path to match your project structure (or if you are on Windows).

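import glob
import os

from sphfile import SPHFile

dialects_path = "./TIMIT/TRAIN/"
SAMPLE_RATE = 16000  # TIMIT is sampled at 16 kHz

# Each dialect region (DR1 ... DR8) is a subdirectory of dialects_path
dialects = sorted(d for d in os.listdir(dialects_path)
                  if os.path.isdir(os.path.join(dialects_path, d)))

for dialect in dialects:
    dialect_path = os.path.join(dialects_path, dialect)
    speakers = os.listdir(dialect_path)
    for speaker in speakers:
        speaker_path = os.path.join(dialect_path, speaker)

        wav_files = glob.glob(os.path.join(speaker_path, "*.WAV"))

        for wav_file in wav_files:
            sph = SPHFile(wav_file)
            txt_file = wav_file[:-3] + "TXT"

            # The .TXT transcript line is "<start sample> <end sample> <sentence>";
            # convert the sample indices to seconds for write_wav
            with open(txt_file, "r") as f:
                for line in f:
                    words = line.split(" ")
                    start_time = int(words[0]) / SAMPLE_RATE
                    end_time = int(words[1]) / SAMPLE_RATE
            print("writing file", wav_file)
            # On case-insensitive file systems (Windows, default macOS) "X.wav" is
            # the same file as "X.WAV", so choose a different output name there
            sph.write_wav(wav_file.replace(".WAV", ".wav"), start_time, end_time)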
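If you would rather not depend on sphfile, the stdlib wave module can write the RIFF container once you have raw 16-bit samples, for example from a SPHERE reader. This is a minimal sketch and `write_pcm16_wav` is a hypothetical helper. It assumes the samples are an `array('h')` of interleaved signed 16-bit values. `start`/`end` are frame indices, like the two numbers at the start of a TIMIT .TXT line; the script above divides those by 16000 to get the seconds that write_wav expects.

```python
import sys
import wave
from array import array


def write_pcm16_wav(path, samples, sample_rate, channels=1, start=0, end=None):
    """Write interleaved signed 16-bit samples to a RIFF WAV file.

    start/end are frame indices; end=None means "to the end".
    """
    if samples.typecode != "h":
        raise TypeError("expected array('h') of 16-bit samples")
    stop = None if end is None else end * channels
    data = samples[start * channels:stop]  # slicing an array returns a copy
    if sys.byteorder == "big":
        data.byteswap()  # RIFF WAV stores PCM little-endian
    with wave.open(path, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # bytes per sample
        w.setframerate(sample_rate)
        w.writeframes(data.tobytes())
```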


Answer 5:

Use sounddevice and soundfile to get the data as a NumPy array (and to play it back) with the following code:

import matplotlib.pyplot as plt
import soundfile as sf
import sounddevice as sd
# https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav
data, fs = sf.read('LDC93S1.wav')
print(data.shape,fs)
sd.play(data, fs, blocking=True)
plt.plot(data)
plt.show()

Output:

(46797,) 16000

Example TIMIT WAV file: https://catalog.ldc.upenn.edu/desc/addenda/LDC93S1.wav
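Once a file really is a RIFF WAV (for example after conversion with sph2pipe), even the stdlib wave module from the question can read it. This is a minimal sketch and `read_pcm16_wav` is a hypothetical name. It assumes 16-bit PCM and returns interleaved samples when there is more than one channel.

```python
import sys
import wave
from array import array


def read_pcm16_wav(path):
    """Return (sample_rate, channels, samples) from a 16-bit PCM RIFF WAV."""
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        rate, channels = w.getframerate(), w.getnchannels()
        raw = w.readframes(w.getnframes())
    samples = array("h")
    samples.frombytes(raw)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rate, channels, samples
```

To get an ndarray, pass the result to `numpy.frombuffer(samples, dtype=numpy.int16)`. Note that soundfile's `sf.read` returns float64 data by default, whereas this sketch keeps the raw integers.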


Answer 6:

Sometimes this can be caused by extracting the 7-Zip archive incorrectly. I had a similar problem and solved it by extracting the dataset with 7z x <datasetname>.7z
