Java - Convert 16-bit signed PCM audio data array to double array

Posted 2023-02-24


Question (posted 2016-06-07 07:49:17):

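I'm working on a project that involves audio processing.

I'm extracting a chunk of audio from a file and then want to do some processing on it. The problem is that I get the audio data as a byte array, while my processing works on a double array (and later on a complex array as well...).

My question is: how do I correctly convert the received byte array to a double array so I can continue?

Here is my input code: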

AudioFormat format = new AudioFormat(8000, 16, 1, true, true);
AudioInputStream in = AudioSystem.getAudioInputStream(WAVfile);
AudioInputStream din = null;
AudioFormat decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                        8000,   // sample rate
                        16,     // bits per sample
                        1,      // channels
                        2,      // frame size in bytes
                        8000,   // frame rate
                        true);  // big-endian
din = AudioSystem.getAudioInputStream(decodedFormat, in);
// Note: a TargetDataLine captures audio from an input device (e.g. a microphone);
// it is not needed for reading samples from the file through din.
TargetDataLine fileLine = AudioSystem.getTargetDataLine(decodedFormat);
fileLine.open(format);
fileLine.start();

int numBytesRead;
byte[] targetData = new byte[256]; // (samplingRate / 1000) * 32ms

while (true) {
    numBytesRead = din.read(targetData, 0, targetData.length);

    if (numBytesRead == -1) {
        break;
    }

    double[] convertedData;
    // Conversion code goes here...

    processAudio(convertedData);
}

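So far I have looked at various answers to different questions on this site and others. I tried using ByteBuffer and bit conversion, but none of them gave me results that look right (another member of my team did the same thing on the same file in Python, so I have a rough reference for what the result should be...).

What am I missing? How do I correctly convert the bytes to double values? If I want to capture only 32 ms of the file in targetData, what should the length of targetData be? And what will the length of convertedData be then?

Thanks in advance.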


Answer 1:

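Converting with NIO buffers shouldn't be that hard. All you have to do is apply a factor that normalizes from the 16-bit range to the [-1.0…1.0] range.

Well, it isn't so easy, but for most practical purposes deciding on a single factor is sufficient: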

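import java.io.File;
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;

AudioFormat decodedFormat = new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                                            8000, 16, 1, 2, 8000, true);
try (AudioInputStream in  = AudioSystem.getAudioInputStream(WAVfile);
     AudioInputStream din = AudioSystem.getAudioInputStream(decodedFormat, in);
     ReadableByteChannel inCh = Channels.newChannel(din)) {

    ByteBuffer inBuf = ByteBuffer.allocate(256);   // big-endian by default, matching decodedFormat
    final double factor = 2.0 / (1 << 16);         // == 1.0 / 32768
    while (inCh.read(inBuf) != -1) {
        inBuf.flip();                              // switch to reading the bytes just filled in
        double[] convertedData = new double[inBuf.remaining() / 2];
        DoubleBuffer outBuf = DoubleBuffer.wrap(convertedData);
        while (inBuf.remaining() >= 2) {
            outBuf.put(inBuf.getShort() * factor); // one 16-bit sample -> one double
        }
        assert !outBuf.hasRemaining();
        inBuf.compact();                           // keep a leftover odd byte for the next read
        processAudio(convertedData);
    }
}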

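The solution above effectively uses the …/(double)0x8000 variant. Since I don't know what processAudio does with the supplied buffer (e.g. whether it keeps a reference to it), the loop allocates a new array in every iteration, but it should be easy to change this to a reusable buffer. When using a preallocated buffer, you only have to keep track of the actual number of doubles read/converted.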
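A minimal sketch of the reusable-buffer variant described above, assuming big-endian 16-bit mono samples as in decodedFormat. The class PcmConverter and its convert method are hypothetical names, not part of any library. The method writes into a preallocated array and returns how many doubles are valid; how you pass that count on to processAudio is up to your own code.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class PcmConverter {
    // 1/32768: maps -32768 to exactly -1.0 and 32767 to just under +1.0
    private static final double FACTOR = 1.0 / 0x8000;

    /**
     * Converts as many complete 16-bit samples as are available in {@code in}
     * (which must already be flipped into read mode) into {@code out}, starting
     * at index 0. Returns the number of doubles written; an odd trailing byte
     * stays in {@code in} for the next call.
     */
    static int convert(ByteBuffer in, double[] out) {
        int n = Math.min(in.remaining() / 2, out.length);
        for (int i = 0; i < n; i++) {
            out[i] = in.getShort() * FACTOR;
        }
        return n;
    }

    public static void main(String[] args) {
        // Hypothetical input: big-endian samples 0x7FFF, 0x8000, 0x4000
        ByteBuffer buf = ByteBuffer.allocate(256).order(ByteOrder.BIG_ENDIAN);
        buf.put(new byte[] {0x7F, (byte) 0xFF, (byte) 0x80, 0x00, 0x40, 0x00});
        buf.flip();
        double[] reusable = new double[128]; // allocated once, reused on every read
        int count = convert(buf, reusable);
        buf.compact(); // keep any leftover byte for the next channel read
        for (int i = 0; i < count; i++) {
            System.out.println(reusable[i]);
        }
    }
}
```

Only the first count entries of the reused array are meaningful after each call; the rest still hold values from earlier iterations.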


Answer 2:

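First, understand the format you are using (in your example AudioFormat.Encoding.PCM_SIGNED and big-endian), and then the Java int (the format of the resulting number). Then use the binary shift operators >> and << to move the bytes into place: shift one of the two bytes left by 8 bits so that it becomes the high byte of the integer. Which byte needs shifting depends on whether the data is little-endian or big-endian. Big-endian means the more significant byte comes first in each pair; little-endian means it comes second, so for little-endian data you shift the second byte of each pair left by 8 bits. Combine the shifted byte with the low byte using + or |, masking the low byte with & 0xFF first so that Java's sign extension of byte values does not corrupt the upper bits. Finally, divide the int to get the double range you want: for the range -1...+1, divide the integer by the double 32768.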

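I would post code here, but I don't have it right now. These are the instructions I followed.

For example, I have successfully obtained stereo audio data using: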
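The steps above can be sketched for mono 16-bit signed PCM as follows. This is a minimal illustration, not the answerer's original code: the class ManualPcm and method toDoubles are hypothetical names. The length parameter is meant to receive numBytesRead from the question's loop, because din.read may return fewer bytes than the array holds.

```java
public class ManualPcm {
    /**
     * Converts {@code length} bytes of 16-bit signed mono PCM to doubles in [-1.0, 1.0).
     * bigEndian == true: high byte first (as in the question's AudioFormat);
     * bigEndian == false: low byte first.
     */
    static double[] toDoubles(byte[] data, int length, boolean bigEndian) {
        double[] out = new double[length / 2];
        for (int i = 0; i < out.length; i++) {
            byte first = data[2 * i];
            byte second = data[2 * i + 1];
            int hi = bigEndian ? first : second;          // sign-extended: carries the sample's sign
            int lo = (bigEndian ? second : first) & 0xFF; // masked: low byte must be treated as unsigned
            int sample = (hi << 8) | lo;                  // already in -32768..32767
            out[i] = sample / 32768.0;
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] bigEndianBytes = {0x40, 0x00, (byte) 0xFF, (byte) 0xC8}; // samples 16384 and -56
        for (double d : toDoubles(bigEndianBytes, bigEndianBytes.length, true)) {
            System.out.println(d);
        }
    }
}
```

Because hi keeps its sign when the byte is widened to int, no (short) cast is needed here; the mask on lo is the part that matters.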

AudioFormat format = new AudioFormat(8000, 16, 2, true, false);

and then converted it with:

   int l = (short) ((readedData[i*4+1] << 8) | (readedData[i*4+0] & 0xFF)); // little-endian: high byte second
   int r = (short) ((readedData[i*4+3] << 8) | (readedData[i*4+2] & 0xFF)); // & 0xFF stops sign extension

So your scaling should be:

   double scaledL = l/32768d;
   double scaledR = r/32768d;
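To see why the low byte needs & 0xFF, here is a small sketch that decodes one little-endian stereo frame (4 bytes: left low, left high, right low, right high) with and without the mask. The class StereoFrameDemo and its methods are hypothetical names, and the frame contents are made-up sample values chosen to show the effect.

```java
public class StereoFrameDemo {
    // Frame layout for AudioFormat(8000, 16, 2, true, false):
    // [L low, L high, R low, R high], 4 bytes per frame
    static short leftUnmasked(byte[] d, int i) {
        return (short) ((d[i * 4 + 1] << 8) | d[i * 4]); // low byte is sign-extended to int
    }

    static short leftMasked(byte[] d, int i) {
        return (short) ((d[i * 4 + 1] << 8) | (d[i * 4] & 0xFF));
    }

    static short rightMasked(byte[] d, int i) {
        return (short) ((d[i * 4 + 3] << 8) | (d[i * 4 + 2] & 0xFF));
    }

    public static void main(String[] args) {
        // One hypothetical frame: left = 200 (0x00C8), right = -2 (0xFFFE)
        byte[] frame = {(byte) 0xC8, 0x00, (byte) 0xFE, (byte) 0xFF};
        System.out.println(leftUnmasked(frame, 0)); // prints -56, not 200
        System.out.println(leftMasked(frame, 0));   // prints 200
        System.out.println(rightMasked(frame, 0) / 32768d);
    }
}
```

Without the mask, any low byte of 0x80 or above turns into a negative int and its set upper bits overwrite the high byte in the | operation.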

Comments:

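Based on your information and the answers I have seen before, if I iterate over the byte array "data", I would fill the output array "realData" like this: realData[i] = (((data[2*i] & 0xFF) …

More like: realData[i] = (((data[2*i+1]) …

OK. Although my results are still not between -1 and 1, they are > 100.

Same result. If I want 32 ms of data at a time, with the format I mentioned in the question, what should the correct length of my input byte array be?

8000 (sample rate) / 1000 (milliseconds in a second) * 32 (the number of milliseconds you need) * 2 (bytes per sample: 16 bit is 2 bytes)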
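The formula in the last comment can be checked with a small calculation. In this sketch the class BufferSizing and method bytesFor are hypothetical helpers; integer division truncates, so it assumes sampleRate * millis is a multiple of 1000. For the question's 8000 Hz, 16-bit mono format it gives 512 bytes for 32 ms, i.e. 256 samples and therefore 256 doubles in convertedData. By the same arithmetic, the 256-byte targetData buffer in the question holds 128 samples, which is 16 ms.

```java
public class BufferSizing {
    /** Bytes needed to hold {@code millis} ms of PCM audio. */
    static int bytesFor(int sampleRate, int millis, int bitsPerSample, int channels) {
        int frames = sampleRate * millis / 1000;        // samples per channel
        int frameSize = (bitsPerSample / 8) * channels; // bytes per frame
        return frames * frameSize;
    }

    public static void main(String[] args) {
        int bytes = bytesFor(8000, 32, 16, 1);
        int doubles = bytes / 2; // one double per 16-bit mono sample
        System.out.println(bytes + " bytes -> " + doubles + " doubles"); // prints "512 bytes -> 256 doubles"
    }
}
```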
