
Posted 颜妮儿




$$P(F(\boldsymbol{x})\ne f(\boldsymbol{x}))=\sum_{k=0}^{\lfloor T/2\rfloor}\binom{T}{k}(1-\epsilon)^k\epsilon^{T-k}\le e^{-\frac{1}{2}T(1-2\epsilon)^2}$$
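where $T$ is the number of individual classifiers in the ensemble, $\epsilon$ is the error rate of each individual classifier, and $F(\boldsymbol{x})$ and $f(\boldsymbol{x})$ are the predicted label and the true label, respectively.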
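
To get a feel for this bound, the sketch below evaluates both sides numerically for a few ensemble sizes. It assumes the individual classifiers err independently with the same rate ϵ, which is the setting in which the formula holds; the class and method names are illustrative only.

```java
final class EnsembleErrorBound {
	/** Binomial coefficient C(n, k), computed as a double to avoid integer overflow. */
	static double binomial(int n, int k) {
		double result = 1.0;
		for (int i = 1; i <= k; i++) {
			result *= (double) (n - k + i) / i;
		}
		return result;
	}

	/** Probability that at most floor(T/2) of T independent classifiers are correct. */
	static double ensembleError(int numClassifiers, double epsilon) {
		double sum = 0.0;
		for (int k = 0; k <= numClassifiers / 2; k++) {
			sum += binomial(numClassifiers, k) * Math.pow(1 - epsilon, k)
					* Math.pow(epsilon, numClassifiers - k);
		}
		return sum;
	}

	/** Upper bound exp(-T (1 - 2 epsilon)^2 / 2) from the right-hand side. */
	static double upperBound(int numClassifiers, double epsilon) {
		return Math.exp(-0.5 * numClassifiers * Math.pow(1 - 2 * epsilon, 2));
	}

	public static void main(String[] args) {
		double epsilon = 0.3; // Assumed error rate of every individual classifier.
		for (int t : new int[] { 1, 5, 11, 51 }) {
			System.out.printf("T=%d exact=%.6f bound=%.6f%n", t, ensembleError(t, epsilon),
					upperBound(t, epsilon));
		}
	}
}
```

Both columns shrink as T grows, provided ϵ stays below 0.5.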

Weight of the current individual classifier: $\alpha=\frac{1}{2}\ln\left(\frac{1-\epsilon}{\epsilon}\right)$. A classifier whose error rate $\epsilon$ is 0.5 or higher is of no use, so $\epsilon<0.5$ and therefore $\alpha>0$.
Updating the sample weights: $D_{t+1}(\boldsymbol{x})=\frac{D_t(\boldsymbol{x})}{Z_t}\times e^{-\alpha f(\boldsymbol{x})h(\boldsymbol{x})}$. With labels in $\{-1,+1\}$, $f(\boldsymbol{x})h(\boldsymbol{x})=1$ when $f(\boldsymbol{x})=h(\boldsymbol{x})$ and $-1$ otherwise, so a sample's weight increases when the prediction differs from the true label and decreases when they agree. $Z_t$ is the normalization factor: it rescales the weights into the range 0 to 1 so that they sum to 1.
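
The two rules can be traced on a toy example. This is a minimal sketch assuming labels in {-1, +1} and a weighted error strictly between 0 and 0.5; the class `AdaBoostStep` and its methods are hypothetical names, not part of the code later in this post.

```java
import java.util.Arrays;

final class AdaBoostStep {
	/** alpha = 0.5 * ln((1 - epsilon) / epsilon). */
	static double classifierWeight(double epsilon) {
		return 0.5 * Math.log((1 - epsilon) / epsilon);
	}

	/** Returns D_{t+1} given D_t, the true labels f(x), the predictions h(x), and alpha. */
	static double[] updateWeights(double[] weights, int[] labels, int[] predictions, double alpha) {
		double[] updated = new double[weights.length];
		double normalizer = 0.0; // Plays the role of Z_t.
		for (int i = 0; i < weights.length; i++) {
			// labels[i] * predictions[i] is +1 when correct and -1 when wrong.
			updated[i] = weights[i] * Math.exp(-alpha * labels[i] * predictions[i]);
			normalizer += updated[i];
		}
		for (int i = 0; i < updated.length; i++) {
			updated[i] /= normalizer;
		}
		return updated;
	}

	public static void main(String[] args) {
		double[] weights = { 0.25, 0.25, 0.25, 0.25 };
		int[] labels = { 1, 1, -1, -1 };
		int[] predictions = { 1, 1, -1, 1 }; // Only the last sample is misclassified.
		double epsilon = 0.25; // Weighted error of this classifier on the toy data.
		double alpha = classifierWeight(epsilon);
		System.out.println("alpha = " + alpha);
		System.out.println(Arrays.toString(updateWeights(weights, labels, predictions, alpha)));
	}
}
```

With these numbers the misclassified sample ends up with a weight of about 0.5 and each correctly classified sample with about 1/6, so the next classifier sees the hard sample as half of the distribution.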


Why increase the weights of misclassified samples? Consider the formula for the average error rate: $E(f;D)=\frac{1}{m}\sum_{i=1}^{m}\mathbb{I}(f(\boldsymbol{x}_i)\ne y_i)$. If the samples are not uniformly distributed, for example if the sample with attribute value $\boldsymbol{x}_k$ occurs $k$ times, its probability of occurrence is $p(\boldsymbol{x}_k)=\frac{k}{m}$, and the average error rate becomes $E(f;D)=\int_{\boldsymbol{x}\sim D}\mathbb{I}(f(\boldsymbol{x})\ne y)\,p(\boldsymbol{x})\,d\boldsymbol{x}$. Increasing a sample's weight is equivalent to increasing how often it occurs. The more frequent a sample is, the more an error on it contributes to $E(f;D)$, so the next classifier is pushed to classify that sample correctly.
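
In the discrete case the weighted error is simply the total weight of the misclassified samples. The sketch below, with illustrative names and made-up toy data, shows how raising the weight of one misclassified sample raises the error that the next classifier has to reduce.

```java
final class WeightedError {
	/** Sum of the weights of the samples whose prediction differs from the label. */
	static double weightedError(double[] weights, int[] labels, int[] predictions) {
		double error = 0.0;
		for (int i = 0; i < weights.length; i++) {
			if (predictions[i] != labels[i]) {
				error += weights[i];
			}
		}
		return error;
	}

	public static void main(String[] args) {
		int[] labels = { 1, 1, -1, -1 };
		int[] predictions = { 1, 1, -1, 1 }; // Only the last sample is misclassified.
		double[] uniform = { 0.25, 0.25, 0.25, 0.25 };
		// Same data, but the misclassified sample now carries half of the total weight.
		double[] reweighted = { 1.0 / 6, 1.0 / 6, 1.0 / 6, 0.5 };
		System.out.println("uniform: " + weightedError(uniform, labels, predictions));
		System.out.println("reweighted: " + weightedError(reweighted, labels, predictions));
	}
}
```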

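Why give each individual classifier a weight? After the classifiers are combined, the final result is obtained by a vote over their individual predictions. In the expression for the classifier weight, $\frac{1-\epsilon}{\epsilon}$ is the odds that the classifier is correct: the larger this value, the better the classifier performs, and the more say it should have in the vote.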
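
A weighted vote can be sketched as follows, assuming labels in {-1, +1} and breaking ties in favor of +1 (an arbitrary choice made here). The class name and the error rates are made up for illustration.

```java
final class WeightedVote {
	/** Returns the sign of sum_t alpha_t * h_t(x); a zero score counts as +1. */
	static int vote(double[] alphas, int[] predictions) {
		double score = 0.0;
		for (int t = 0; t < alphas.length; t++) {
			score += alphas[t] * predictions[t];
		}
		return score >= 0 ? 1 : -1;
	}

	public static void main(String[] args) {
		// One accurate classifier and two weak ones (assumed error rates).
		double[] errors = { 0.1, 0.4, 0.4 };
		double[] alphas = new double[errors.length];
		for (int t = 0; t < errors.length; t++) {
			alphas[t] = 0.5 * Math.log((1 - errors[t]) / errors[t]);
		}
		int[] predictions = { 1, -1, -1 }; // The weak classifiers disagree with the accurate one.
		System.out.println("Weighted vote: " + vote(alphas, predictions));
	}
}
```

Here the accurate classifier's weight (about 1.10) exceeds the combined weight of the two weak ones (about 0.41), so it wins the vote even though it is outnumbered; an unweighted majority vote would go the other way.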


package adaboosting;

import java.io.FileReader;
import java.util.Arrays;

import weka.core.Instances;

public class WeightedInstances extends Instances {

	/**
	 * Just the requirement of some classes, any number is OK.
	 */
	private static final long serialVersionUID = 11087456L;

	/**
	 * Weights.
	 */
	private double[] weights;

	/**
	 * The first constructor.
	 * 
	 * @param paraFileReader The given reader to read data from file.
	 */
	public WeightedInstances(FileReader paraFileReader) throws Exception {
		super(paraFileReader);
		setClassIndex(numAttributes() - 1);

		// Initialize weights uniformly so that they sum to 1.
		weights = new double[numInstances()];
		double tempAverage = 1.0 / numInstances();
		for (int i = 0; i < weights.length; i++) {
			weights[i] = tempAverage;
		} // Of for i
		System.out.println("Instances weights are: " + Arrays.toString(weights));
	} // Of the first constructor

	/**
	 * The second constructor.
	 * 
	 * @param paraInstances The given instances.
	 */
	public WeightedInstances(Instances paraInstances) {
		super(paraInstances);
		setClassIndex(numAttributes() - 1);

		// Initialize weights uniformly so that they sum to 1.
		weights = new double[numInstances()];
		double tempAverage = 1.0 / numInstances();
		for (int i = 0; i < weights.length; i++) {
			weights[i] = tempAverage;
		} // Of for i
		System.out.println("Instances weights are: " + Arrays.toString(weights));
	} // Of the second constructor

	/**
	 * Getter.
	 * 
	 * @param paraIndex The given index.
	 * @return The weight of the given index.
	 */
	public double getWeight(int paraIndex) {
		return weights[paraIndex];
	} // Of getWeight

	/**
	 * Adjust the weights.
	 * 
	 * @param paraCorrectArray Indicate which instances have been correctly
	 *                         classified.
	 * @param paraAlpha        The weight of the last classifier.
	 */
	public void adjustWeights(boolean[] paraCorrectArray, double paraAlpha) {
		// Step 1. Compute the scaling factor e^alpha.
		double tempIncrease = Math.exp(paraAlpha);

		// Step 2. Adjust: shrink correctly classified instances, enlarge the others.
		double tempWeightsSum = 0; // For normalization
		for (int i = 0; i < weights.length; i++) {
			if (paraCorrectArray[i]) {
				weights[i] /= tempIncrease;
			} else {
				weights[i] *= tempIncrease;
			} // Of if
			tempWeightsSum += weights[i];
		} // Of for i

		// Step 3. Normalize so that the weights sum to 1 again.
		for (int i = 0; i < weights.length; i++) {
			weights[i] /= tempWeightsSum;
		} // Of for i

		System.out.println("After adjusting, instances weights are: " + Arrays.toString(weights));
	} // Of adjustWeights

	/**
	 * Test the method.
	 */
	public void adjustWeightsTest() {
		boolean[] tempCorrectArray = new boolean[numInstances()];
		for (int i = 0; i < tempCorrectArray.length; i++) {
			tempCorrectArray[i] = true;
		} // Of for i

		// Note: this value is passed to adjustWeights as the classifier weight alpha.
		double tempWeightedError = 0.3;

		adjustWeights(tempCorrectArray, tempWeightedError);

		System.out.println("After adjusting");
	} // Of adjustWeightsTest

	/**
	 * For display.
	 */
	@Override
	public String toString() {
		String resultString = "I am a weighted Instances object.\r\n" + "I have " + numInstances() + " instances and "
				+ (numAttributes() - 1) + " conditional attributes.\r\n" + "My weights are: " + Arrays.toString(weights)
				+ "\r\n" + "My data are: " + super.toString();
		return resultString;
	} // Of toString

	/**
	 * For unit test.
	 * 
	 * @param args Not provided.
	 */
} // Of class WeightedInstances





