AI Machine Learning Algorithms: Speech Separation with Time Series and Probabilistic Statistical Models

Posted 2021-05-02 胖鱼CS


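“The Department of Biology at the University of Florida and Florida state agencies detect and identify manatee calls at nearby river confluences in order to count and analyze the species.

The approach is to place sonar underwater to record sound and then pick out the manatee calls. Because the underwater environment is complex, and many animals, plants, and water currents all produce noise, the raw audio contains a great deal of noise. Having experts identify manatee calls in the raw audio by hand is fairly accurate overall, but it is still subject to misjudgment, and the cost in time and labor is extremely high. So, as artificial intelligence (AI) technology advanced, the experts came up with the idea of using intelligent algorithms to automatically predict manatee calls in the raw, noisy audio files.

Research and experiments showed that machine learning and probabilistic statistical models are feasible and performed extremely well in tests on a limited set of randomly sampled data. The algorithm is general and can be applied to many audio denoising and blurred-audio restoration tasks.”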

Manatee call detection and prediction may not be that interesting or exciting to most people who are not biologists.

Another Interesting Scenario: A Contest Between Agents and a Spy

Assume a group of agents from an intelligence or national security agency is monitoring an experienced spy through hidden microphones in his bathroom. The spy is clever: whenever he reports a significant finding to his organization, he turns the water on to generate noise and avoid being overheard. For the agency, the sound picked up by the microphones is too noisy and blurred to make out what secret the spy is reporting. Any ideas? Use a machine learning algorithm to denoise the recording so that the signal contains only the spy's voice and the agents can clearly hear what he is reporting.

Since voice and sound data are stored as arrays or vectors in most electronic devices, they are time series data. A 10-second audio file might contain 1000 data points. In that case, the 100th data point and the points near it approximately represent the audio at the 1-second mark, and so on.
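The mapping between sample index and time can be made concrete with a small sketch. The helper names below (`sampling_rate`, `index_to_time`, `time_to_index`) are illustrative assumptions, not part of the original project; the numbers are the ones from the example above.

```python
def sampling_rate(num_samples: int, duration_s: float) -> float:
    """Samples per second for a recording of known length."""
    return num_samples / duration_s


def index_to_time(index: int, rate_hz: float) -> float:
    """Time in seconds at which a given sample was taken."""
    return index / rate_hz


def time_to_index(time_s: float, rate_hz: float) -> int:
    """Index of the sample closest to a given time."""
    return round(time_s * rate_hz)


# 1000 data points spread over 10 seconds
rate = sampling_rate(1000, 10.0)
print(rate)                       # 100.0 samples per second
print(index_to_time(100, rate))   # 1.0 second
print(time_to_index(2.5, rate))   # sample 250
```

Real audio is sampled far more densely than this, but the index-to-time relationship is the same.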

The method for detecting and predicting a signal within a noisy signal is to train two models, one for the noise and one for the signal. Features of the noise and of the signal are extracted, and both models are then applied in parallel to the noisy signal. A smoothing method together with a statistical probability model predicts which model each data point belongs to. Then, for the data points assigned to the signal model, removing the estimated noise from the noisy signal leaves the pure signal.
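The decision step can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes each model has already produced a per-sample prediction error on the noisy recording, that the errors are zero-mean Gaussian with the same variance under both models, and that both models are equally likely a priori. The function names, the window width, and the error values are all hypothetical.

```python
import math


def moving_average(values, width):
    """Centered moving average; near the edges it uses the samples available."""
    half = width // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def signal_probability(err_signal, err_noise, width=5, variance=1.0):
    """Probability, per sample, that the signal model explains the data better.

    Assumption: Gaussian errors with equal variance and equal priors, so the
    posterior is a logistic function of the smoothed squared-error difference.
    """
    e_s = moving_average([e * e for e in err_signal], width)
    e_n = moving_average([e * e for e in err_noise], width)
    probs = []
    for s, n in zip(e_s, e_n):
        z = (n - s) / (2.0 * variance)
        z = max(-50.0, min(50.0, z))  # clamp to keep exp() from overflowing
        probs.append(1.0 / (1.0 + math.exp(-z)))
    return probs


# Hypothetical errors: the noise model fits samples 0-9, the signal model 10-19.
err_signal = [1.0] * 10 + [0.1] * 10
err_noise = [0.1] * 10 + [1.0] * 10
# A single-sample glitch at index 4 that would fool an unsmoothed decision.
err_signal[4], err_noise[4] = 0.1, 1.0

probs = signal_probability(err_signal, err_noise, width=5)
labels = [p > 0.5 for p in probs]  # True means "belongs to the signal model"
print(labels)
```

With smoothing, the glitch at index 4 is outvoted by its neighbors and stays labeled as noise, while samples 10 to 19 are assigned to the signal model.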

Training the models and applying the two models in parallel to the test signal uses a machine learning linear regression algorithm. Several techniques can be used to train the models, including the Least Mean Square (LMS) method, the Wiener filter solution, the Recursive Least Squares (RLS) method, Kernel LMS (KLMS), and Quantized Kernel LMS (QKLMS). The core concept is still gradient descent. However, the gradient descent parameters are affected by the length of the time series. Because the audio changes over time, a window size must be chosen so that each input carries the relevant time series information. A shorter or longer history can be used to tune the gradient descent procedure for better performance.
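As a rough illustration of how the window size and gradient descent interact, here is a minimal LMS sketch that predicts each sample from a window of past samples. It is not the paper's code: the function name `lms_predict`, the test tone, the window of 4 samples, and the step size of 0.05 are all assumptions chosen for the example.

```python
import math


def lms_predict(signal, window=4, step=0.05):
    """One-step-ahead LMS predictor.

    Each sample is predicted as a weighted sum of the previous `window`
    samples; the weights take a stochastic gradient descent step on the
    squared prediction error after every sample.
    Returns the final weights and the list of prediction errors.
    """
    weights = [0.0] * window
    errors = []
    for n in range(window, len(signal)):
        past = signal[n - window:n][::-1]  # most recent sample first
        prediction = sum(w * x for w, x in zip(weights, past))
        error = signal[n] - prediction
        weights = [w + step * error * x for w, x in zip(weights, past)]
        errors.append(error)
    return weights, errors


# Hypothetical test input: a pure tone at 0.05 cycles per sample.
tone = [math.sin(2 * math.pi * 0.05 * n) for n in range(2000)]
weights, errors = lms_predict(tone, window=4, step=0.05)

early_mse = sum(e * e for e in errors[:100]) / 100
late_mse = sum(e * e for e in errors[-100:]) / 100
print(early_mse, late_mse)  # the error shrinks as the weights adapt
```

The `window` argument is the length of history the model sees, and `step` is the gradient descent step size. A larger window captures more context but needs a smaller step to stay stable, which is one way the length of the history affects the gradient descent parameters.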

Since a good machine learning project is always complicated and needs a great deal of mathematical theory as well as experiments and professional performance analysis, an original 7-page IEEE-format paper is provided here to explain how the model works, why the algorithm works, what experiments were designed and carried out, and their results:

https://github.com/zhengyul9/Manatee-Calls-Detection-Machine-Learning

