详解如何使用代码进行音频合成

Posted 2020-07-31 ztysir

tags:

篇首语：本文由小常识网(cha138.com)小编为大家整理，主要介绍了详解如何使用代码进行音频合成相关的知识，希望对你有一定的参考价值。

作者：郑童宇
GitHub：https://github.com/CrazyZty

1.前言

　　音频合成在现实生活中应用广泛，在网上可以搜索到不少相关的讲解和代码实现，但个人感觉在网上搜索到的音频合成相关文章的讲解都并非十分透彻，故而写下本篇博文，计划通过讲解如何使用代码实现音频合成功能从而将本人对音频合成的理解阐述给各位，力图读完的各位可以对音频合成整体过程有一个清晰的了解。
　　本篇博文以Java为示例语言，以android为示例平台。
　　本篇博文着力于讲解音频合成实现原理与过程中的细节和潜在问题，目的是让各位不被编码语言所限制，在本质上理解如何实现音频合成的功能。

2.音频合成

2.1.功能简介

　　本次实现的音频合成功能参考"唱吧"的音频合成，功能流程是：录音生成PCM文件，接着根据录音时长对背景音乐文件进行解码加裁剪，同时将解码后的音频调制到与录音文件相同的采样率，采样点字节数，声道数，接着根据指定系数对两个音频文件进行音量调节并合成为PCM文件，最后进行压缩编码生成MP3文件。

2.2.功能实现

2.2.1.录音

　　录音功能生成的目标音频格式是PCM格式，对于PCM的定义，维基百科上是这么写到的："Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals. It is the standard form of digital audio in computers, Compact Discs, digital telephony and other digital audio applications. In a PCM stream, the amplitude of the analog signal is sampled regularly at uniform intervals, and each sample is quantized to the nearest value within a range of digital steps."，大致意思是PCM是用来采样模拟信号的一种方法，是现在数字音频应用中数字音频的标准格式，而PCM采样的原理，是均匀间隔的将模拟信号的振幅量化成指定数据范围内最贴近的数值。
　　PCM文件存储的数据是不经压缩的纯音频数据，当然只是这么说可能有些抽象，我们拉上大家熟知的MP3文件进行对比，MP3文件存储的是压缩后的音频，PCM与MP3两者之间的关系简单说就是：PCM文件经过MP3压缩算法处理后生成的文件就是MP3文件。我们简单比较一下双方存储所消耗的空间，1分钟的每采样点16位的双声道的44.1kHz采样率PCM文件大小为：1*60*16/8*2*44.1*1000/1024=10335.9375KB，约为10MB，而对应的128kps的MP3文件大小仅为1MB左右，既然PCM文件占用存储空间这么大，我们是不是应该放弃使用PCM格式存储录音，恰恰相反，注意第一句话:"PCM文件存储的数据是不经压缩的纯音频数据"，这意味只有PCM格式的音频数据是可以用来直接进行声音处理，例如进行音量调节，声音滤镜等操作，相对的其他的音频编码格式都是必须解码后才能进行处理（PCM编码的WAV文件也得先读取文件头），当然这不代表PCM文件就好用，因为没有文件头，所以进行处理或者播放之前我们必须事先知道PCM文件的声道数，采样点字节数，采样率，编码大小端，这在大多数情况下都是不可能的，事实上就我所知没有播放器是直接支持PCM文件的播放。不过现在录音的各项系数都是我们定义的，所以我们就不用担心这个问题。
　　背景知识了解这些就足够了，下面我给出实现代码，综合代码讲解实现过程。

if (recordVoice) {
        audioRecord = new AudioRecord(MediaRecorder.Audiosource.MIC,
                Constant.RecordSampleRate, AudioFormat.CHANNEL_IN_MONO,
                pcmFormat.getAudioFormat(), audioRecordBufferSize);

        try {
            audioRecord.startRecording();
        } catch (Exception e) {
            NoRecordPermission();
            continue;
        }

        BufferedOutputStream bufferedOutputStream = FileFunction
                .GetBufferedOutputStreamFromFile(recordFileUrl);

        while (recordVoice) {
            int audioRecordReadDataSize =
                    audioRecord.read(audioRecordBuffer, 0, audioRecordBufferSize);

            if (audioRecordReadDataSize > 0) {
                calculateRealVolume(audioRecordBuffer, audioRecordReadDataSize);
                if (bufferedOutputStream != null) {
                    try {
                        byte[] outputByteArray = CommonFunction
                                .GetByteBuffer(audioRecordBuffer,
                                        audioRecordReadDataSize, Variable.isBigEnding);
                        bufferedOutputStream.write(outputByteArray);
                    } catch (IOException e) {
                        e.printStackTrace();
                    }
                }
            } else {
                NoRecordPermission();
                continue;
            }
        }

        if (bufferedOutputStream != null) {
            try {
                bufferedOutputStream.close();
            } catch (Exception e) {
                LogFunction.error("关闭录音输出数据流异常", e);
            }
        }

        audioRecord.stop();
        audioRecord.release();
        audioRecord = null;
    }

　　录音的实际实现和控制代码较多，在此仅抽出核心的录音代码进行讲解。在此为获取录音的原始数据，我使用了Android原生的AudioRecord，其他的平台基本也会提供类似的工具类。这段代码实现的功能是当录音开始后，应用会根据设定的采样率和声道数以及采样字节数来不断从MIC中获取原始的音频数据，然后将获取的音频数据写入到指定文件中，直至录音结束。这段代码逻辑比较清晰的，我就不过多讲解了。
　　潜在问题的话，手机平台上是需要申请录音权限的，如果没有录音权限就无法生成正确的录音文件。

2.2.2.解码与裁剪背景音乐

　　如前文所说，除了PCM格式以外的所有音频编码格式的音频都必须解码后才可以处理，因此要让背景音乐参与合成必须事先对背景音乐进行解码，同时为减少合成的MP3文件的大小，需要根据录音时长对解码的音频文件进行裁剪。本节不会详细解释解码算法，因为每个平台都会有对应封装的工具类，直接使用即可。
　　背景知识先讲这些，本次功能实现过程中的潜在问题较多，下面我给出实现代码，综合代码讲解实现过程。

private boolean decodeMusicFile(String musicFileUrl, String decodeFileUrl, int startSecond, int endSecond,
                                    Handler handler,
                                    DecodeOperateInterface decodeOperateInterface) {
        int sampleRate = 0;
        int channelCount = 0;

        long duration = 0;

        String mime = null;

        MediaExtractor mediaExtractor = new MediaExtractor();
        MediaFormat mediaFormat = null;
        MediaCodec mediaCodec = null;

        try {
            mediaExtractor.setDataSource(musicFileUrl);
        } catch (Exception e) {
            LogFunction.error("设置解码音频文件路径错误", e);
            return false;
        }

        mediaFormat = mediaExtractor.getTrackFormat(0);
        sampleRate = mediaFormat.containsKey(MediaFormat.KEY_SAMPLE_RATE) ?
                mediaFormat.getInteger(MediaFormat.KEY_SAMPLE_RATE) : 44100;
        channelCount = mediaFormat.containsKey(MediaFormat.KEY_CHANNEL_COUNT) ?
                mediaFormat.getInteger(MediaFormat.KEY_CHANNEL_COUNT) : 1;
        duration = mediaFormat.containsKey(MediaFormat.KEY_DURATION) ? mediaFormat.getLong(MediaFormat.KEY_DURATION) : 0;
        mime = mediaFormat.containsKey(MediaFormat.KEY_MIME) ? mediaFormat.getString(MediaFormat.KEY_MIME) : "";

        LogFunction.log("歌曲信息",
                "Track info: mime:" + mime + " 采样率sampleRate:" + sampleRate + " channels:" +
                        channelCount + " duration:" + duration);

        if (CommonFunction.isEmpty(mime) || !mime.startsWith("audio/")) {
            LogFunction.error("解码文件不是音频文件", "mime:" + mime);
            return false;
        }

        if (mime.equals("audio/ffmpeg")) {
            mime = "audio/mpeg";
            mediaFormat.setString(MediaFormat.KEY_MIME, mime);
        }

        try {
            mediaCodec = MediaCodec.createDecoderByType(mime);

            mediaCodec.configure(mediaFormat, null, null, 0);
        } catch (Exception e) {
            LogFunction.error("解码器configure出错", e);
            return false;
        }

        getDecodeData(mediaExtractor, mediaCodec, decodeFileUrl, sampleRate, channelCount, startSecond,
                endSecond, handler, decodeOperateInterface);
        return true;
    }

　　decodeMusicFile方法的代码主要功能是获取背景音乐信息，初始化解码器，最后调用getDecodeData方法正式开始对背景音乐进行处理。
　　代码中使用了Android原生工具类作为解码器，事实上作为原生的解码器，我也遇到过兼容性问题不得不做了一些相应的处理，不得不抱怨一句不同的Android定制系统实在是导致了太多的兼容性问题。

private void getDecodeData(MediaExtractor mediaExtractor, MediaCodec mediaCodec, String decodeFileUrl, int sampleRate,
                               int channelCount, int startSecond, int endSecond,
                               Handler handler,
                               final DecodeOperateInterface decodeOperateInterface) {
        boolean decodeInputEnd = false;
        boolean decodeOutputEnd = false;

        int sampleDataSize;
        int inputBufferIndex;
        int outputBufferIndex;
        int byteNumber;

        long decodeNoticeTime = System.currentTimeMillis();
        long decodeTime;
        long presentationTimeUs = 0;

        final long timeOutUs = 100;
        final long startMicroseconds = startSecond * 1000 * 1000;
        final long endMicroseconds = endSecond * 1000 * 1000;

        ByteBuffer[] inputBuffers;
        ByteBuffer[] outputBuffers;

        ByteBuffer sourceBuffer;
        ByteBuffer targetBuffer;

        MediaFormat outputFormat = mediaCodec.getOutputFormat();

        MediaCodec.BufferInfo bufferInfo;

        byteNumber =
                (outputFormat.containsKey("bit-width") ? outputFormat.getInteger("bit-width") : 0) / 8;

        mediaCodec.start();

        inputBuffers = mediaCodec.getInputBuffers();
        outputBuffers = mediaCodec.getOutputBuffers();

        mediaExtractor.selectTrack(0);

        bufferInfo = new MediaCodec.BufferInfo();

        BufferedOutputStream bufferedOutputStream = FileFunction
                .GetBufferedOutputStreamFromFile(decodeFileUrl);

        while (!decodeOutputEnd) {
            if (decodeInputEnd) {
                return;
            }

            decodeTime = System.currentTimeMillis();

            if (decodeTime - decodeNoticeTime > Constant.OneSecond) {
                final int decodeProgress =
                        (int) ((presentationTimeUs - startMicroseconds) * Constant.NormalMaxProgress /
                                endMicroseconds);

                if (decodeProgress > 0) {
                    handler.post(new Runnable() {
                        @Override
                        public void run() {
                            decodeOperateInterface.updateDecodeProgress(decodeProgress);
                        }
                    });
                }

                decodeNoticeTime = decodeTime;
            }

            try {
                inputBufferIndex = mediaCodec.dequeueInputBuffer(timeOutUs);

                if (inputBufferIndex >= 0) {
                    sourceBuffer = inputBuffers[inputBufferIndex];

                    sampleDataSize = mediaExtractor.readSampleData(sourceBuffer, 0);

                    if (sampleDataSize < 0) {
                        decodeInputEnd = true;
                        sampleDataSize = 0;
                    } else {
                        presentationTimeUs = mediaExtractor.getSampleTime();
                    }

                    mediaCodec.queueInputBuffer(inputBufferIndex, 0, sampleDataSize,
                            presentationTimeUs,
                            decodeInputEnd ? MediaCodec.BUFFER_FLAG_END_OF_STREAM : 0);

                    if (!decodeInputEnd) {
                        mediaExtractor.advance();
                    }
                } else {
                    LogFunction.error("inputBufferIndex", "" + inputBufferIndex);
                }

                // decode to PCM and push it to the AudioTrack player
                outputBufferIndex = mediaCodec.dequeueOutputBuffer(bufferInfo, timeOutUs);

                if (outputBufferIndex < 0) {
                    switch (outputBufferIndex) {
                        case MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED:
                            outputBuffers = mediaCodec.getOutputBuffers();
                            LogFunction.error("MediaCodec.INFO_OUTPUT_BUFFERS_CHANGED",
                                    "[AudioDecoder]output buffers have changed.");
                            break;
                        case MediaCodec.INFO_OUTPUT_FORMAT_CHANGED:
                            outputFormat = mediaCodec.getOutputFormat();

                            sampleRate = outputFormat.containsKey(MediaFormat.KEY_SAMPLE_RATE) ?
                                    outputFormat.getInteger(MediaFormat.KEY_SAMPLE_RATE) : sampleRate;
                            channelCount = outputFormat.containsKey(MediaFormat.KEY_CHANNEL_COUNT) ?
                                    outputFormat.getInteger(MediaFormat.KEY_CHANNEL_COUNT) : channelCount;
                            byteNumber = (outputFormat.containsKey("bit-width") ? outputFormat.getInteger("bit-width") : 0) / 8;

                            LogFunction.error("MediaCodec.INFO_OUTPUT_FORMAT_CHANGED",
                                    "[AudioDecoder]output format has changed to " +
                                            mediaCodec.getOutputFormat());
                            break;
                        default:
                            LogFunction.error("error",
                                    "[AudioDecoder] dequeueOutputBuffer returned " +
                                            outputBufferIndex);
                            break;
                    }
                    continue;
                }

                targetBuffer = outputBuffers[outputBufferIndex];

                byte[] sourceByteArray = new byte[bufferInfo.size];

                targetBuffer.get(sourceByteArray);
                targetBuffer.clear();

                mediaCodec.releaseOutputBuffer(outputBufferIndex, false);

                if ((bufferInfo.flags & MediaCodec.BUFFER_FLAG_END_OF_STREAM) != 0) {
                    decodeOutputEnd = true;
                }

                if (sourceByteArray.length > 0 && bufferedOutputStream != null) {
                    if (presentationTimeUs < startMicroseconds) {
                        continue;
                    }

                    byte[] convertByteNumberByteArray = ConvertByteNumber(byteNumber, Constant.RecordByteNumber, sourceByteArray);

                    byte[] resultByteArray =
                            ConvertChannelNumber(channelCount, Constant.RecordChannelNumber, Constant.RecordByteNumber,
                                    convertByteNumberByteArray);

                    try {
                        bufferedOutputStream.write(resultByteArray);
                    } catch (Exception e) {
                        LogFunction.error("输出解压音频数据异常", e);
                    }
                }

                if (presentationTimeUs > endMicroseconds) {
                    break;
                }
            } catch (Exception e) {
                LogFunction.error("getDecodeData异常", e);
            }
        }

        if (bufferedOutputStream != null) {
            try {
                bufferedOutputStream.close();
            } catch (IOException e) {
                LogFunction.error("关闭bufferedOutputStream异常", e);
            }
        }

        if (sampleRate != Constant.RecordSampleRate) {
            Resample(sampleRate, decodeFileUrl);
        }

        if (mediaCodec != null) {
            mediaCodec.stop();
            mediaCodec.release();
        }

        if (mediaExtractor != null) {
            mediaExtractor.release();
        }
    }

　　getDecodeData方法是此次的进行解码和裁剪的核心，方法的传入参数中mediaExtractor，mediaCodec用以实际控制处理背景音乐的音频数据，decodeFileUrl用以指明解码和裁剪后的PCM文件的存储地址，sampleRate，channelCount分别用以指明背景音乐的采样率，声道数，startSecond用以指明裁剪背景音乐的开始时间，目前功能中默认为0，endSecond用以指明裁剪背景音乐的结束时间，数值大小由录音时长直接决定。
　　getDecodeData方法中通过不断通过mediaCodec读入背景音乐原始数据进行处理，然后解码输出到buffer从而获取解码后的数据，因为mediaCodec的读取解码方法和平台相关就不过多描述，在解码过程中通过startSecond与endSecond来控制解码后音频数据输出的开始与结束。
　　解码和裁剪根据上文的描述是比较简单的，通过平台提供的工具类解码背景音乐数据，然后通过变量裁剪出指定长度的解码后音频数据输出到外文件，这一个流程结束功能就实现了，但在过程中存在几个潜在问题点。
　　首先，要进行合成处理的话，我们必须要保证录音文件和解码后文件的采样率，采样点字节数，以及声道数相同，因为录音文件的这三项系数已经固定，所以我们必须对解码的音频数据进行处理以保证最终生成的解码文件三项系数和录音文件一致。在http://blog.csdn.net/ownwell/article/details/8114121/，我们可以了解PCM文件常见的四种存储格式。
　　格式字节1 字节2 字节3 字节4
　　8位单声道 0声道 0声道 0声道 0声道
　　8位双声道 0声道(左) 1声道(右) 0声道(左) 1声道(右)
　　16位单声道 0声道(低) 0声道(高) 0声道(低) 0声道(高)
　　16位双声道 0声道(左，低字节) 0声道(左，高字节) 1声道(右，低字节) 1声道(右，高字节)
　　了解这些知识后，我们就可以知道如何编码以将已知格式的音频数据转化到另一采样点字节数和声道数。
　　getDecodeData方法中146行调用的ConvertByteNumber方法是通过处理音频数据以保证解码后音频文件和录音文件采样点字节数相同。

private static byte[] ConvertByteNumber(int sourceByteNumber, int outputByteNumber, byte[] sourceByteArray) {
        if (sourceByteNumber == outputByteNumber) {
            return sourceByteArray;
        }

        int sourceByteArrayLength = sourceByteArray.length;

        byte[] byteArray;

        switch (sourceByteNumber) {
            case 1:
                switch (outputByteNumber) {
                    case 2:
                        byteArray = new byte[sourceByteArrayLength * 2];

                        byte resultByte[];

                        for (int index = 0; index < sourceByteArrayLength; index += 1) {
                            resultByte = CommonFunction.GetBytes((short) (sourceByteArray[index] * 256), Variable.isBigEnding);

                            byteArray[2 * index] = resultByte[0];
                            byteArray[2 * index + 1] = resultByte[1];
                        }

                        return byteArray;
                }
                break;
            case 2:
                switch (outputByteNumber) {
                    case 1:
                        int outputByteArrayLength = sourceByteArrayLength / 2;

                        byteArray = new byte[outputByteArrayLength];

                        for (int index = 0; index < outputByteArrayLength; index += 1) {
                            byteArray[index] = (byte) (CommonFunction.GetShort(sourceByteArray[2 * index],
                                    sourceByteArray[2 * index + 1], Variable.isBigEnding) / 256);
                        }

                        return byteArray;
                }
                break;
        }

        return sourceByteArray;
    }

　　ConvertByteNumber方法的参数中sourceByteNumber代表背景音乐文件采样点字节数，outputByteNumber代表录音文件采样点字节数，两者如果相同就不处理，不相同则根据背景音乐文件采样点字节数进行不同的处理，本方法只对单字节存储和双字节存储进行了处理，欢迎在各位Github上填充其他采样点字节数的处理方法，
　　getDecodeData方法中149行调用的ConvertChannelNumber方法是通过处理音频数据以保证解码后音频文件和录音文件声道数相同。

private static byte[] ConvertChannelNumber(int sourceChannelCount, int outputChannelCount, int byteNumber,
                                               byte[] sourceByteArray) {
        if (sourceChannelCount == outputChannelCount) {
            return sourceByteArray;
        }

        switch (byteNumber) {
            case 1:
            case 2:
                break;
            default:
                return sourceByteArray;
        }

        int sourceByteArrayLength = sourceByteArray.length;

        byte[] byteArray;

        switch (sourceChannelCount) {
            case 1:
                switch (outputChannelCount) {
                    case 2:
                        byteArray = new byte[sourceByteArrayLength * 2];

                        byte firstByte;
                        byte secondByte;

                        switch (byteNumber) {
                            case 1:
                                for (int index = 0; index < sourceByteArrayLength; index += 1) {
                                    firstByte = sourceByteArray[index];

                                    byteArray[2 * index] = firstByte;
                                    byteArray[2 * index + 1] = firstByte;
                                }
                                break;
                            case 2:
                                for (int index = 0; index < sourceByteArrayLength; index += 2) {
                                    firstByte = sourceByteArray[index];
                                    secondByte = sourceByteArray[index + 1];

                                    byteArray[2 * index] = firstByte;
                                    byteArray[2 * index + 1] = secondByte;
                                    byteArray[2 * index + 2] = firstByte;
                                    byteArray[2 * index + 3] = secondByte;
                                }
                                break;
                        }

                        return byteArray;
                }
                break;
            case 2:
                switch (outputChannelCount) {
                    case 1:
                        int outputByteArrayLength = sourceByteArrayLength / 2;

                        byteArray = new byte[outputByteArrayLength];

                        switch (byteNumber) {
                            case 1:
                                for (int index = 0; index < outputByteArrayLength; index += 2) {
                                    short averageNumber =
                                            (short) ((short) sourceByteArray[2 * index] + (short) sourceByteArray[2 * index + 1]);
                                    byteArray[index] = (byte) (averageNumber >> 1);
                                }
                                break;
                            case 2:
                                for (int index = 0; index < outputByteArrayLength; index += 2) {
                                    byte resultByte[] = CommonFunction.AverageShortByteArray(sourceByteArray[2 * index],
                                            sourceByteArray[2 * index + 1], sourceByteArray[2 * index + 2],
                                            sourceByteArray[2 * index + 3], Variable.isBigEnding);

                                    byteArray[index] = resultByte[0];
                                    byteArray[index + 1] = resultByte[1];
                                }
                                break;
                        }

                        return byteArray;
                }
                break;
        }

        return sourceByteArray;
    }

　　ConvertChannelNumber方法的参数中sourceChannelNumber代表背景音乐文件声道数，outputChannelNumber代表录音文件声道数，两者如果相同就不处理，不相同则根据声道数和采样点字节数进行不同的处理，本方法只对单双通道进行了处理，欢迎在Github上填充立体声等声道的处理方法。
　　getDecodeData方法中176行调用的Resample方法是用以处理音频数据以保证解码后音频文件和录音文件采样率相同。

private static void Resample(int sampleRate, String decodeFileUrl) {
        String newDecodeFileUrl = decodeFileUrl + "new";

        try {
            FileInputStream fileInputStream =
                    new FileInputStream(new File(decodeFileUrl));
            FileOutputStream fileOutputStream =
                    new FileOutputStream(new File(newDecodeFileUrl));

            new SSRC(fileInputStream, fileOutputStream, sampleRate, Constant.RecordSampleRate,
                    Constant.RecordByteNumber, Constant.RecordByteNumber, 1, Integer.MAX_VALUE, 0, 0, true);

            fileInputStream.close();
            fileOutputStream.close();

            FileFunction.RenameFile(newDecodeFileUrl, decodeFileUrl);
        } catch (IOException e) {
            LogFunction.error("关闭bufferedOutputStream异常", e);
        }
    }

　　为了修改采样率，在此使用了SSRC在Java端的实现，在网上可以搜到一份关于SSRC的介绍："SSRC = Synchronous Sample Rate Converter，同步采样率转换，直白地说就是只能做整数倍频，不支持任意频率之间的转换，比如44.1KHz<->48KHz。"，但不同的SSRC实现原理有所不同，我是用的是来自https://github.com/shibatch/SSRC在Java端的实现，简单读了此SSRC在Java端实现的源码，其代码实现中通过判别重采样前后采样率的最大公约数是否满足设定条件作为是否可重采样的依据，可以支持常见的非整数倍频率的采样率转化，如44.1khz<->48khz，但如果目标采样率是比较特殊的采样率如某一较大的质数，那就无法支持重采样。
　　至此，Resample，ConvertByteNumber，ConvertChannelNumber三个方法的处理保证了解码后文件和录音文件的采样率，采样点字节数，以及声道数相同。
　　接着，此处潜在的第二个问题就是大小端存储。对计算机体系结构有所了解的同学肯定了解"大小端"这个概念，大小端分别代表了多字节数据在内存中组织的两种不同顺序，如果对于"大小端"不是太了解，可以浏览http://blog.jobbole.com/102432/的阐述，在处理音频数据的方法中，我们可以看到"Variable.isBigEnding"这个参数，这个参数的含义就是当前平台是否使用大端编码，这里大家肯定会有疑问，内存中多字节数据的组织顺序为什么会影响我们对音频数据的处理，举个例子，如果我们在将采样点8位的音频数据转化为采样点16位，目前的做法是将原始数据乘以256，相当于每一个byte转化为short，同时short的高字节为原byte的内容，低字节为0，那现在问题来了，那就是高字节放到高地址还是低地址，这就和平台采用的大小端存储格式息息相关了，当然如果我们输出的数据类型是short那就不用关心，Java会帮我们处理掉，但我们输出的是byte数组，这就需要我们自己对数据进行处理了。
　　这是一个很容易忽视的问题，因为正常情况下的软件开发过程中我们基本是不用关心大小端的问题的，但在这里必须对大小端的情况进行处理，不然会出现在某些平台合成的音频无法播放的情况。

2.2.3.合成与输出

　　录音和对背景音乐的处理结束了，接下来就是最后的合成了，对于合成我们脑海中浮现最多的会是什么？相加，对没错，音频合成并不神秘，音频合成的本质就是相同系数的音频文件之间数据的加和，当然现实中的合成往往并非如此简单，在网上搜索"混音算法"，我们可以看到大量高深的音频合成算法，但就目前而言，我们没必要实现复杂的混音算法，只要让两个音频文件的原始音频数据相加即可，不过为了让我们的合成看上去稍微有一些技术含量，此次提供的音频合成方法中允许任意音频文件相对于另一音频文件进行时间上的偏移，并可以通过两个权重数据进行音量调节。下面我就给出具体代码吧，讲解如何实现。

public static void ComposeAudio(String firstAudioFilePath, String secondAudioFilePath,
                                    String composeAudioFilePath, boolean deleteSource,
                                    float firstAudioWeight, float secondAudioWeight,
                                    int audioOffset,
                                    final ComposeAudioInterface composeAudioInterface) {
        boolean firstAudioFinish = false;
        boolean secondAudioFinish = false;

        byte[] firstAudioByteBuffer;
        byte[] secondAudioByteBuffer;
        byte[] mp3Buffer;

        short resultShort;
        short[] outputShortArray;

        int index;
        int firstAudioReadNumber;
        int secondAudioReadNumber;
        int outputShortArrayLength;
        final int byteBufferSize = 1024;

        firstAudioByteBuffer = new byte[byteBufferSize];
        secondAudioByteBuffer = new byte[byteBufferSize];
        mp3Buffer = new byte[(int) (7200 + (byteBufferSize * 1.25))];

        outputShortArray = new short[byteBufferSize / 2];

        Handler handler = new Handler(Looper.getMainLooper());

        FileInputStream firstAudioInputStream = FileFunction.GetFileInputStreamFromFile(firstAudioFilePath);
        FileInputStream secondAudioInputStream = FileFunction.GetFileInputStreamFromFile(secondAudioFilePath);
        FileOutputStream composeAudioOutputStream = FileFunction.GetFileOutputStreamFromFile(composeAudioFilePath);

        LameUtil.init(Constant.RecordSampleRate, Constant.LameBehaviorChannelNumber,
                Constant.BehaviorSampleRate, Constant.LameBehaviorBitRate, Constant.LameMp3Quality);

        try {
            while (!firstAudioFinish && !secondAudioFinish) {
                index = 0;

                if (audioOffset < 0) {
                    secondAudioReadNumber = secondAudioInputStream.read(secondAudioByteBuffer);

                    outputShortArrayLength = secondAudioReadNumber / 2;

                    for (; index < outputShortArrayLength; index++) {
                        resultShort = CommonFunction.GetShort(secondAudioByteBuffer[index * 2],
                                secondAudioByteBuffer[index * 2 + 1], Variable.isBigEnding);

                        outputShortArray[index] = (short) (resultShort * secondAudioWeight);
                    }

                    audioOffset += secondAudioReadNumber;

                    if (secondAudioReadNumber < 0) {
                        secondAudioFinish = true;
                        break;
                    }

                    if (audioOffset >= 0) {
                        break;
                    }
                } else {
                    firstAudioReadNumber = firstAudioInputStream.read(firstAudioByteBuffer);

                    outputShortArrayLength = firstAudioReadNumber / 2;

                    for (; index < outputShortArrayLength; index++) {
                        resultShort = CommonFunction.GetShort(firstAudioByteBuffer[index * 2],
                                firstAudioByteBuffer[index * 2 + 1], Variable.isBigEnding);

                        outputShortArray[index] = (short) (resultShort * firstAudioWeight);
                    }

                    audioOffset -= firstAudioReadNumber;

                    if (firstAudioReadNumber < 0) {
                        firstAudioFinish = true;
                        break;
                    }

                    if (audioOffset <= 0) {
                        break;
                    }
                }

                if (outputShortArrayLength > 0) {
                    int encodedSize = LameUtil.encode(outputShortArray, outputShortArray,
                            outputShortArrayLength, mp3Buffer);

                    if (encodedSize > 0) {
                        composeAudioOutputStream.write(mp3Buffer, 0, encodedSize);
                    }
                }
            }

            handler.post(new Runnable() {
                @Override
                public void run() {
                    if (composeAudioInterface != null) {
                        composeAudioInterface.updateComposeProgress(20);
                    }
                }
            });

            while (!firstAudioFinish || !secondAudioFinish) {
                index = 0;

                firstAudioReadNumber = firstAudioInputStream.read(firstAudioByteBuffer);
                secondAudioReadNumber = secondAudioInputStream.read(secondAudioByteBuffer);

                int minAudioReadNumber = Math.min(firstAudioReadNumber, secondAudioReadNumber);
                int maxAudioReadNumber = Math.max(firstAudioReadNumber, secondAudioReadNumber);

                if (firstAudioReadNumber < 0) {
                    firstAudioFinish = true;
                }

                if (secondAudioReadNumber < 0) {
                    secondAudioFinish = true;
                }

                int halfMinAudioReadNumber = minAudioReadNumber / 2;

                outputShortArrayLength = maxAudioReadNumber / 2;

                for (; index < halfMinAudioReadNumber; index++) {
                    resultShort = CommonFunction.WeightShort(firstAudioByteBuffer[index * 2],
                            firstAudioByteBuffer[index * 2 + 1], secondAudioByteBuffer[index * 2],
                            secondAudioByteBuffer[index * 2 + 1], firstAudioWeight,
                            secondAudioWeight, Variable.isBigEnding);

                    outputShortArray[index] = resultShort;
                }

                if (firstAudioReadNumber != secondAudioReadNumber) {
                    if (firstAudioReadNumber > secondAudioReadNumber) {
                        for (; index < outputShortArrayLength; index++) {
                            resultShort = CommonFunction.GetShort(firstAudioByteBuffer[index * 2],
                                    firstAudioByteBuffer[index * 2 + 1], Variable.isBigEnding);

                            outputShortArray[index] = (short) (resultShort * firstAudioWeight);
                        }
                    } else {
                        for (; index < outputShortArrayLength; index++) {
                            resultShort = CommonFunction.GetShort(secondAudioByteBuffer[index * 2],
                                    secondAudioByteBuffer[index * 2 + 1], Variable.isBigEnding);

                            outputShortArray[index] = (short) (resultShort * secondAudioWeight);
                        }
                    }
                }

                if (outputShortArrayLength > 0) {
                    int encodedSize = LameUtil.encode(outputShortArray, outputShortArray,
                            outputShortArrayLength, mp3Buffer);

                    if (encodedSize > 0) {
                        composeAudioOutputStream.write(mp3Buffer, 0, encodedSize);
                    }
                }
            }
        } catch (Exception e) {
            LogFunction.error("ComposeAudio异常", e);

            handler.post(new Runnable() {
                @Override
                public void run() {
                    if (composeAudioInterface != null) {
                        composeAudioInterface.composeFail();
                    }
                }
            });

            return;
        }

        handler.post(new Runnable() {
            @Override
            public void run() {
                if (composeAudioInterface != null) {
                    composeAudioInterface.updateComposeProgress(50);
                }
            }
        });

        try {
            final int flushResult = LameUtil.flush(mp3Buffer);

            if (flushResult > 0) {
                composeAudioOutputStream.write(mp3Buffer, 0, flushResult);
            }
        } catch (Exception e) {
            LogFunction.error("释放ComposeAudio LameUtil异常", e);
        } finally {
            try {
                composeAudioOutputStream.close();
            } catch (Exception e) {
                LogFunction.error("关闭合成输出音频流异常", e);
            }

            LameUtil.close();
        }

        if (deleteSource) {
            FileFunction.DeleteFile(firstAudioFilePath);
            FileFunction.DeleteFile(secondAudioFilePath);
        }

        try {
            firstAudioInputStream.close();
            secondAudioInputStream.close();
        } catch (IOException e) {
            LogFunction.error("关闭合成输入音频流异常", e);
        }

        handler.post(new Runnable() {
            @Override
            public void run() {
                if (composeAudioInterface != null) {
                    composeAudioInterface.composeSuccess();
                }
            }
        });
    }

　　ComposeAudio方法是此次的进行合成的具体代码实现，方法的传入参数中firstAudioFilePath, secondAudioFilePath是用以合成的音频文件地址，composeAudioFilePath用以指明合成后输出的MP3文件的存储地址，firstAudioWeight，secondAudioWeight分别用以指明合成的两个音频文件在合成过程中的音量权重，audioOffset用以指明第一个音频文件相对于第二个音频文件合成过程中的数据偏移，如为负数，则合成过程中先输出audioOffset个字节长度的第二个音频文件数据，如为正数，则合成过程中先输出audioOffset个字节长度的第一个音频文件数据，audioOffset在另一程度上也代表着时间的偏移，目前我们合成的两个音频文件参数为16位单通道44.1khz采样率，那么audioOffset如果为1*16/8*1*44100=88200字节，那么最终合成出的MP3文件中会先播放1s的第一个音频文件的音频接着再播放两个音频文件加和的音频。
　　整体合成代码是很清晰的，因为加入了时间偏移，所以合成过程中是有可能有一个文件先输出完的，在代码中针对性的进行处理即可，当然即使没有时间偏移也是可能出现类似情况的，比如音乐时长2分钟，录音3分钟，音乐输出结束后那就只应该输出录音音频了，另外在代码中将PCM数据编码为MP3文件使用了LAME的MP3编码库，除此以外代码中就没有比较复杂的模块了。

3.总结

　　至此，音频合成的流程我们算是走完了，希望读到此处的各位对音频合成的实现有了清晰的了解。
　　这篇博文就到这里结束了，本文所有代码已经托管到https://github.com/CrazyZty/ComposeAudio，大家可以自由下载。

以上是关于详解如何使用代码进行音频合成的主要内容，如果未能解决你的问题，请参考以下文章

详解如何使用代码进行音频合成

使用 MoMu STK 进行音频合成

如何在 iPad 上使用 HTML5/Javascript 合成音频

一次用ffmpeg实现图片+音频合成视频的开发

如何在iPad上使用HTML5 / Javascript合成音频

iOS - 如何将混合音频合成导出到文件大小最小的文件？