Computing spectrograms of wav files and recorded sound (normalizing for volume)

Posted 2023-02-25

Original title: computing spectrograms of wav files & recorded sound (normalizing for volume)
Asked: 2013-08-31 21:06:15
Question:

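I want to compare recorded audio with audio read from disk in a consistent way, but I am running into problems normalizing for volume (without it, the amplitudes of the spectrograms differ).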

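I have also never worked with signals, FFTs, or the WAV format before, so this is new, uncharted territory for me. I retrieve the channels as lists of signed 16-bit integers sampled at 44100 Hz, from both a .wav file on disk and recorded music playing from my laptop.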

I then step through the samples one window at a time (window size 2^k), with a certain amount of overlap. For each window I do the following:

# calculate window variables
window_step_size = int(self.window_size * (1.0 - self.window_overlap_ratio)) + 1
last_frame = nframes - window_step_size # nframes is total number of frames from audio source
num_windows, i = 0, 0 # calculate number of windows
while i <= last_frame: 
    num_windows += 1
    i += window_step_size

# allocate memory and initialize counter
wi = 0 # index
nfft = 2 ** self.nextpowof2(self.window_size) # size of FFT in 2^k
fft2D = np.zeros((nfft // 2 + 1, num_windows), dtype='c16') # 2d array for storing results (integer division so the shape is an int)

# for each window
count = 0
times = np.zeros((1, num_windows)) # num_windows was calculated

while wi <= last_frame:

    # channel_samples is simply list of signed ints
    window_samples = channel_samples[ wi : (wi + self.window_size)]
    window_samples = np.hamming(len(window_samples)) * window_samples 

    # calculate and reformat [[[[ THIS IS WHERE I'M UNSURE ]]]]
    fft = 2 * np.fft.rfft(window_samples, n=nfft) / nfft
    fft[0] = 0 # apparently these are completely real and should not be used
    fft[nfft // 2] = 0 # integer division so the index is an int
    fft = np.sqrt(np.square(fft) / np.mean(fft)) # use RMS of data
    fft2D[:, count] = 10 * np.log10(np.absolute(fft))

    # sec / frame * frames = secs
    # get midpt
    times[0, count] = self.dt * wi

    wi += window_step_size
    count += 1

# remove NaNs, infs
whereAreNaNs = np.isnan(fft2D)
fft2D[whereAreNaNs] = 0
whereAreInfs = np.isinf(fft2D)
fft2D[whereAreInfs] = 0

# find the spectrogram peaks
fft2D = fft2D.astype(np.float32)

# the get_2D_peaks() method discretizes the fft2D periodogram array and then
# finds peaks and filters out those peaks below the threshold supplied
# 
# the `amp_xxxx` variables are used for discretizing amplitude and the 
# times array above is used to discretize the time into buckets
local_maxima = self.get_2D_peaks(fft2D, self.amp_threshold, self.amp_max, self.amp_min, self.amp_step_size, times, self.dt)

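In particular, the strange part (to me at least) happens on the line marked with my comment [[[[ THIS IS WHERE I'M UNSURE ]]]].

Can anyone point me in the right direction, or help me generate this audio spectrogram while normalizing for volume correctly?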

Answer 1:

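A quick look tells me you forgot to apply a window function, which you need in order to compute your spectrogram.

You need to apply a window (Hamming, Hann) to "window_samples":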

np.hamming(len(window_samples)) * window_samples

Then you can compute the rfft.
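As a rough sketch of that order of operations (window first, then rfft, then magnitude), assuming the samples are available as a NumPy array. The helper name `frame_spectrum_db` and the 1 kHz test tone are made up for illustration and are not part of the question's code:

```python
import numpy as np

def frame_spectrum_db(frame, nfft):
    """Window one frame, take the real FFT, return magnitudes in dB.

    Hypothetical helper for illustration only.
    """
    frame = np.asarray(frame, dtype=np.float64)
    windowed = np.hamming(len(frame)) * frame   # taper the frame edges
    spectrum = np.fft.rfft(windowed, n=nfft)    # nfft // 2 + 1 complex bins
    magnitude = np.abs(spectrum)                # real and non-negative
    # small floor avoids log10(0); 20*log10 because these are amplitudes
    return 20.0 * np.log10(np.maximum(magnitude, 1e-12))

# Assumed test signal: a 1 kHz tone sampled at 44100 Hz
rate, nfft = 44100, 1024
t = np.arange(nfft) / rate
tone = 10000.0 * np.sin(2 * np.pi * 1000.0 * t)

db = frame_spectrum_db(tone, nfft)
peak_bin = int(np.argmax(db))
peak_hz = peak_bin * rate / nfft   # centre frequency of the strongest bin
```

Feeding the same tone in at half the amplitude lowers the peak by about 6 dB (20 * log10(2)), which is the volume dependence the question is trying to remove.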

Edit:

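import numpy as np
import matplotlib.pyplot as plt

# compute the magnitude from the FFT
# (windowed is assumed to be one windowed frame, chunk its length)
fft_data = np.fft.fft(windowed)
# magnitude (linear scale) of the first half of the values
mag = np.abs(fft_data[:chunk // 2])
# if you want a log scale: R = 20 * np.log10(mag)
plt.plot(mag)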

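# compute the RMS from the FFT
# (one-sided spectrum via rfft; this counts the DC and Nyquist bins twice, so it is a close approximation)
RMS = np.sqrt((np.sum(np.abs(np.fft.rfft(data)) ** 2) / len(data)) / (len(data) / 2))

RMStoDb = 20 * np.log10(RMS)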
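The link between the FFT and the time-domain RMS is Parseval's theorem. A small check, assuming `data` is one un-windowed frame (random noise stands in for real audio here), shows the identity holding exactly for the full two-sided FFT, and for the one-sided rfft once the interior bins are doubled:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal(1024)        # assumed stand-in for one un-windowed frame
n = len(data)                           # even length is assumed below

rms_time = np.sqrt(np.mean(data ** 2))  # RMS straight from the samples

# Two-sided FFT: sum(|X|^2) == n * sum(|x|^2)
spectrum = np.fft.fft(data)
rms_full = np.sqrt(np.sum(np.abs(spectrum) ** 2)) / n

# One-sided FFT: every bin except DC and Nyquist occurs twice in the full spectrum
power = np.abs(np.fft.rfft(data)) ** 2
power[1:-1] *= 2
rms_half = np.sqrt(np.sum(power)) / n

rms_db = 20 * np.log10(rms_full)
```

All three RMS values agree to floating-point precision, so the RMS can be taken from either domain as long as no window has been applied first.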

PS: if you want to compute the RMS from the FFT, you must not apply a window (Hann, Hamming) first. Also, this line makes no sense:

fft = np.sqrt(np.square(fft) / np.mean(fft)) # use RMS of data
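To see why, here is a small experiment on a random frame (an assumed stand-in for audio). `np.mean` of a complex spectrum is itself complex, so the expression yields complex values rather than an RMS, and its magnitude still depends on the input volume:

```python
import numpy as np

rng = np.random.default_rng(2)
frame = rng.standard_normal(1024)       # assumed test frame

def questioned(samples):
    """The expression from the question, applied to the raw rfft output."""
    spectrum = np.fft.rfft(samples)
    return np.sqrt(np.square(spectrum) / np.mean(spectrum))

q1 = questioned(frame)
q4 = questioned(4.0 * frame)            # same frame, four times louder

result_is_complex = np.iscomplexobj(q1)
# |q| = |X| / sqrt(|mean(X)|), so 4x the volume gives 2x the magnitude
gain = np.abs(q4) / np.abs(q1)
```

The result grows with the square root of the input level, so the line neither preserves the true amplitudes nor removes the volume.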

You can do a simple normalization of the data for each window:

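window_samples = channel_samples[wi : (wi + self.window_size)]

# frame_max = np.max(np.abs(window_samples))
# mean of the absolute values; the plain mean of signed audio is close to zero
frame_mean = np.mean(np.abs(window_samples))

normalized = window_samples / frame_mean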

Comments:

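I mean, scaling the values with a window is good practice, but I don't think leaving it out is fundamentally wrong. I did add your suggestion, though. Follow-up: should I be getting negative amplitudes? For example, taking all the amplitudes from the 2D spectrogram array (fft2D in my code above), I get: i.imgur.com/gLyi3OW.png To clarify, that is a histogram of the amplitude values.

For linear magnitude, take the absolute value of the first half of the FFT values. See the update!

But I want to normalize for volume. After taking the FFT, do I have to divide the spectrogram by the sum of all the magnitudes to be truly independent of the song's volume?

The simple way is to normalize the volume in the time domain and then apply the FFT. The way you showed us is not the best approach!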
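A minimal sketch of that last suggestion, normalizing in the time domain before the FFT. The helpers `normalize_rms` and `spectrogram_db`, the window and step sizes, and the noise signal are all assumptions made for illustration; scaling to a fixed RMS is one possible choice, and peak normalization would work the same way:

```python
import numpy as np

def normalize_rms(samples, target_rms=1.0):
    """Scale a signal to a fixed RMS level (hypothetical helper)."""
    samples = np.asarray(samples, dtype=np.float64)
    rms = np.sqrt(np.mean(samples ** 2))
    if rms == 0.0:
        return samples                  # silence: nothing to scale
    return samples * (target_rms / rms)

def spectrogram_db(samples, window_size=1024, step=512):
    """Hamming-windowed magnitude spectrogram in dB, shape (bins, frames)."""
    window = np.hamming(window_size)
    starts = range(0, len(samples) - window_size + 1, step)
    frames = np.array([samples[s:s + window_size] * window for s in starts])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))
    return 20.0 * np.log10(np.maximum(magnitude, 1e-12)).T

# Assumed test data: the same "recording" at two different volumes
rng = np.random.default_rng(1)
loud = rng.standard_normal(44100) * 8000.0
quiet = loud * 0.1

spec_loud = spectrogram_db(normalize_rms(loud))
spec_quiet = spectrogram_db(normalize_rms(quiet))
```

After normalization the two spectrograms match, so peak thresholds can be shared between the file and the recording. This only removes an overall gain difference; it does nothing about other differences between the two sources.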
