Here is an exercise on data mining techniques. It is in English, and I am hoping an expert can help translate part of it. I will add bonus points! Many thanks!

1 Data Pre-processing (only show a small portion to illustrate)
1.1 k-mers extraction
1.2 generation of position frequency matrices
1.3 background probability feature
1.4 relative mismatch score feature
2 Generation of Training and Testing Datasets (only show a small portion to illustrate)
2.1 pattern pairs for building the learner models, i.e., (k-mers features, class label)
2.2 pattern pairs for testing the learner models on the training dataset
2.3 pattern pairs for testing the learner models on the test dataset
2.4 Discuss the difference between the datasets given in 2.1 and 2.2
3 Neural Networks Approach
3.1 neural networks architecture, and learning related parameters setting
3.2 display the learning curve, i.e., the Figure produced by Matlab tool
3.3 confusion matrix, recall, precision, F-measure and misclassification rates for both the training dataset and the testing dataset
3.4 results visualization (i.e., clearly highlight the predicted binding sites and true binding sites in the given DNA sequences)
4 Discussion and Conclusion
4.1 effect of the size of negative training examples on system performance
4.2 merits and shortcomings of the adopted approach
4.3 conclusions from the results and experience
4.4 suggestions and further research

This is the report format required at the end of the exercise. The report has to follow this structure, but I do not understand what each item actually means. Any help from an expert is appreciated.
Translation of the report outline (English item, then its Chinese rendering):

1 Data Pre-processing (only show a small portion to illustrate): 数据预处理(仅展示一小部分作为示例)
1.1 k-mers extraction: k-mer 提取
1.2 generation of position frequency matrices: 位置频率矩阵的生成
1.3 background probability feature: 背景概率特征
1.4 relative mismatch score feature: 相对错配得分特征
2 Generation of Training and Testing Datasets (only show a small portion to illustrate): 训练数据集和测试数据集的生成(仅展示一小部分作为示例)
2.1 pattern pairs for building the learner models, i.e., (k-mers features, class label): 用于构建学习器模型的模式对,即(k-mer 特征,类别标签)
2.2 pattern pairs for testing the learner models on the training dataset: 用于在训练数据集上测试学习器模型的模式对
2.3 pattern pairs for testing the learner models on the test dataset: 用于在测试数据集上测试学习器模型的模式对
2.4 Discuss the difference between the datasets given in 2.1 and 2.2: 讨论 2.1 与 2.2 所给数据集之间的差异
3 Neural Networks Approach: 神经网络方法
3.1 neural networks architecture, and learning related parameters setting: 神经网络结构及学习相关参数的设置
3.2 display the learning curve, i.e., the Figure produced by Matlab tool: 显示学习曲线,即由 Matlab 工具生成的图
3.3 confusion matrix, recall, precision, F-measure and misclassification rates for both the training dataset and the testing dataset: 训练数据集和测试数据集各自的混淆矩阵、召回率、精确率、F 值和误分类率
3.4 results visualization (i.e., clearly highlight the predicted binding sites and true binding sites in the given DNA sequences): 结果可视化(即在给定的 DNA 序列中清楚地标出预测的结合位点和真实的结合位点)
4 Discussion and Conclusion: 讨论与结论
4.1 effect of the size of negative training examples on system performance: 负训练样本的数量对系统性能的影响
4.2 merits and shortcomings of the adopted approach: 所采用方法的优点与不足
4.3 conclusions from the results and experience: 由结果和经验得出的结论
4.4 suggestions and further research: 建议与进一步的研究
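Since the question also asks what the outline items mean in practice, here is a rough Python sketch of three of them: k-mers extraction (1.1), a position frequency matrix (1.2), and the evaluation measures named in 3.3. The exercise itself is not shown in the thread, so everything here is an assumption: the function names, the sample sequences, the 0/1 labels (1 = binding site), and the standard textbook definitions of the measures. The exercise may define its features differently, and it expects Matlab for the learning curve.

```python
def extract_kmers(sequence, k):
    """Return every overlapping length-k substring (k-mer) of a DNA sequence."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def position_frequency_matrix(sites, alphabet="ACGT"):
    """Count how often each base occurs at each position of equal-length sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    pfm = {base: [0] * length for base in alphabet}
    for site in sites:
        for pos, base in enumerate(site):
            pfm[base][pos] += 1
    return pfm


def binary_metrics(true_labels, predicted_labels):
    """Confusion-matrix counts plus recall, precision, F-measure, error rate.

    Labels are assumed to be 1 (binding site) or 0 (non-site).
    """
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    error_rate = (fp + fn) / len(pairs) if pairs else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "recall": recall, "precision": precision,
            "f_measure": f_measure, "error_rate": error_rate}


if __name__ == "__main__":
    # Hypothetical toy data, only to show the shapes of the results.
    print(extract_kmers("ACGTAC", 3))
    print(position_frequency_matrix(["ACG", "ACT", "AGG"]))
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

The confusion matrix in 3.3 is just the four counts tp, fp, fn, tn arranged in a 2x2 table. Computing them once for the training set and once for the test set gives the two sets of figures the report asks for.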