Different classification results in Weka: GUI vs Java library

Posted: 2015-06-15 19:51:36

[Question]:

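I am running into problems when comparing the Weka GUI classification results with those of my Java program, using a tree classifier (J48) on the iris dataset. I would appreciate any help.

I am using the iris dataset and trying to develop a Java program that classifies new instances. To do so, I used the Weka GUI to obtain a model ("iris_tree(CV).model"), trained and validated with 10-fold cross-validation. The Weka GUI results are good and as expected: 4 misclassified instances. I then saved the model for later use by my Java program.

When I load the model "iris_tree(CV).model" in my Java program and try to classify new instances (the test dataset), the results differ: the Java program classifies "setosa" and "virginica" correctly, but not "versicolour". The results are: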

Classification: setosa
Classification: setosa
Classification: virginica
Classification: virginica
Classification: virginica
Classification: virginica

whereas I expected to get:

Classification: setosa
Classification: setosa
Classification: versicolour
Classification: versicolour
Classification: virginica
Classification: virginica

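I have read some related posts, but I could not find a clear explanation for this strange behaviour when using Java instead of the Weka GUI.

I attach my Java code, in 2 classes, followed by the training and test sets. Thanks in advance.

Main class: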

import java.io.File;
import java.util.Hashtable;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.swing.JFileChooser;

import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils;

public class PilotPatternClassifier {

    public static void main(String[] args) {

        try {

            Hashtable<String, String> values = new Hashtable<String, String>();

            //Loading the model
            String pathModel="";
            String pathTestSet="";
            JFileChooser chooserModel = new JFileChooser();
            chooserModel.setCurrentDirectory(new java.io.File("."));
            chooserModel.setDialogTitle("HoliDes: choose the model");
            chooserModel.setFileSelectionMode(JFileChooser.FILES_AND_DIRECTORIES);
            chooserModel.setAcceptAllFileFilterUsed(true);

            if (chooserModel.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
                File filePathModel=chooserModel.getSelectedFile();
                pathModel=filePathModel.getPath();

                State irisModel = new State(pathModel);

                //Loading the model
                JFileChooser chooserTestSet = new JFileChooser();
                chooserTestSet.setDialogTitle("HoliDes: choose TEST SET");
                chooserTestSet.setFileSelectionMode(JFileChooser.FILES_AND_DIRECTORIES);
                chooserTestSet.setAcceptAllFileFilterUsed(true);

                //Loading the testing dataset
                if (chooserTestSet.showOpenDialog(null) == JFileChooser.APPROVE_OPTION) {
                    File filePathTestSet=chooserTestSet.getSelectedFile();
                    pathTestSet=filePathTestSet.getPath();

                    //Transforming the data set into pairs attribute-value
                    ConverterUtils.DataSource unlabeledSource = new ConverterUtils.DataSource(pathTestSet);
                    Instances unlabeledData = unlabeledSource.getDataSet();
                    if (unlabeledData.classIndex() == -1) {
                        unlabeledData.setClassIndex(unlabeledData.numAttributes() - 1);
                    }

                    for (int i = 0; i < unlabeledData.numInstances(); i++) {
                        Instance ins=unlabeledData.instance(i);

                        for (int j = 0; j < ins.numAttributes(); j++) {

                            String attrib=ins.attribute(j).name();
                            double val=ins.value(ins.attribute(j));

                            values.put(attrib,String.valueOf(val));

                        }

                        System.out.println("Classification: " + irisModel.classifySpecies(values,pathModel));

                    }

                }

            }

        } catch (Exception ex) {
            Logger.getLogger(PilotPatternClassifier.class.getName()).log(Level.SEVERE, null, ex);
        }

    }

}

And the State class:

import java.util.Dictionary;
import java.util.Enumeration;

import weka.classifiers.Classifier;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;

public class State {

    //private String classModelFile = "/iris_tree.model";
    private Classifier classModel;
    private Instances dataModel;

    /**
     *  Class constructor.
     */
    public State(String pathModel) throws Exception {
        //InputStream classModelStream;
        //  Create a stream object for the model file embedded within the JAR file.
        //classModelStream = getClass().getResourceAsStream(classModelFile);
        classModel=(Classifier) weka.core.SerializationHelper.read(pathModel);
    }

    /**
     *  Close the instance by setting both the model file string and
     *  the model object itself to null.  When the garbage collector
     *  runs, this should make clean up simpler.  However, the garbage
     *  collector is not called synchronously since that should be
     *  managed by the larger execution environment.
     */
    public void close() {
        classModel = null;
        //classModelFile=null;
    }

    /**
     * Evaluate the model on the data provided by @param measures.
     * This returns a string with the species name.
     *
     * @param measures object with petal and sepal measurements
     * @return string with the species name
     * @throws Exception
     */
    public String classifySpecies(Dictionary<String, String> measures, String pathTestSet) throws Exception {
        FastVector dataClasses = new FastVector();
        FastVector dataAttribs = new FastVector();
        Attribute species;
        double values[] = new double[measures.size() + 1];
        int i = 0, maxIndex = 0;

        //  Assemble the potential species options.
        dataClasses.addElement("setosa");
        dataClasses.addElement("versicolour");
        dataClasses.addElement("virginica");
        species = new Attribute("species", dataClasses);

        //  Create the object to classify on.
        for (Enumeration<String> keys = measures.keys(); keys.hasMoreElements(); ) {

            String key = keys.nextElement();
            double val = Double.parseDouble(measures.get(key));
            dataAttribs.addElement(new Attribute(key));

            values[i++] = val;

        }

        dataAttribs.addElement(species);
        dataModel = new Instances("iris-test", dataAttribs, 0);// "iris-test" is the relation name of the test data. It is arbitrary
        dataModel.setClass(species);

        Instance ins=new DenseInstance(1, values);
        //dataModel.add(new Instance(1, values) );
        dataModel.add(ins);
        dataModel.instance(0).setClassMissing();

        //  Find the class with the highest estimated likelihood
        double cl[] = classModel.distributionForInstance(dataModel.instance(0));
        for(i = 0; i < cl.length; i++) {
            if(cl[i] > cl[maxIndex]) {
                maxIndex = i;
            }
        }
        return dataModel.classAttribute().value(maxIndex);

    }

}


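One thing that may be worth ruling out in code like the above (the thread does not confirm it as the cause): the main class copies each row into a Hashtable keyed by attribute name, and classifySpecies rebuilds the attribute list by enumerating that Hashtable. java.util.Hashtable makes no promise about iteration order, so the rebuilt attributes need not follow the order of the training header. The sketch below uses only the standard library; the class and method names are hypothetical and it does not call Weka. It compares the key order returned by a Hashtable with that of a LinkedHashMap, which keeps insertion order.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class, not part of the code in the question.
class AttributeOrderDemo {

    // Attribute names in the order the training header declares them.
    static final String[] HEADER = {"sepallength", "sepalwidth", "petallength", "petalwidth"};

    // Fills the given map in header order and returns the keys in the
    // order the map hands them back when iterated.
    static List<String> iterationOrder(Map<String, String> map, double[] row) {
        for (int j = 0; j < HEADER.length; j++) {
            map.put(HEADER[j], String.valueOf(row[j]));
        }
        return new ArrayList<>(map.keySet());
    }

    public static void main(String[] args) {
        double[] row = {6.6, 3.0, 4.4, 1.4}; // third row of the test set

        // Hashtable: the order follows the hash table layout, not insertion order.
        System.out.println("Hashtable:     " + iterationOrder(new Hashtable<>(), row));

        // LinkedHashMap: keys come back in insertion (header) order.
        System.out.println("LinkedHashMap: " + iterationOrder(new LinkedHashMap<>(), row));
    }
}
```

If the two printed orders differ, the values array handed to the model is in a different column order than the one used for training, and that alone could change the predictions. Whether that is what happened here remains an assumption.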

Here are the training and test sets:

@RELATION iris-train

@ATTRIBUTE sepallength  REAL
@ATTRIBUTE sepalwidth   REAL
@ATTRIBUTE petallength  REAL
@ATTRIBUTE petalwidth   REAL
@ATTRIBUTE species  {setosa,versicolour,virginica}

@DATA
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa
5.4,3.7,1.5,0.2,setosa
4.8,3.4,1.6,0.2,setosa
4.8,3.0,1.4,0.1,setosa
4.3,3.0,1.1,0.1,setosa
5.8,4.0,1.2,0.2,setosa
5.7,4.4,1.5,0.4,setosa
5.4,3.9,1.3,0.4,setosa
5.1,3.5,1.4,0.3,setosa
5.7,3.8,1.7,0.3,setosa
5.1,3.8,1.5,0.3,setosa
5.4,3.4,1.7,0.2,setosa
5.1,3.7,1.5,0.4,setosa
4.6,3.6,1.0,0.2,setosa
5.1,3.3,1.7,0.5,setosa
4.8,3.4,1.9,0.2,setosa
5.0,3.0,1.6,0.2,setosa
5.0,3.4,1.6,0.4,setosa
5.2,3.5,1.5,0.2,setosa
5.2,3.4,1.4,0.2,setosa
4.7,3.2,1.6,0.2,setosa
4.8,3.1,1.6,0.2,setosa
5.4,3.4,1.5,0.4,setosa
5.2,4.1,1.5,0.1,setosa
5.5,4.2,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa
5.0,3.2,1.2,0.2,setosa
5.5,3.5,1.3,0.2,setosa
4.9,3.1,1.5,0.1,setosa
4.4,3.0,1.3,0.2,setosa
5.1,3.4,1.5,0.2,setosa
5.0,3.5,1.3,0.3,setosa
4.5,2.3,1.3,0.3,setosa
4.4,3.2,1.3,0.2,setosa
5.0,3.5,1.6,0.6,setosa
5.1,3.8,1.9,0.4,setosa
4.8,3.0,1.4,0.3,setosa
5.1,3.8,1.6,0.2,setosa
4.6,3.2,1.4,0.2,setosa
5.3,3.7,1.5,0.2,setosa
5.0,3.3,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolour
6.4,3.2,4.5,1.5,versicolour
6.9,3.1,4.9,1.5,versicolour
5.5,2.3,4.0,1.3,versicolour
6.5,2.8,4.6,1.5,versicolour
5.7,2.8,4.5,1.3,versicolour
6.3,3.3,4.7,1.6,versicolour
4.9,2.4,3.3,1.0,versicolour
6.6,2.9,4.6,1.3,versicolour
5.2,2.7,3.9,1.4,versicolour
5.0,2.0,3.5,1.0,versicolour
5.9,3.0,4.2,1.5,versicolour
6.0,2.2,4.0,1.0,versicolour
6.1,2.9,4.7,1.4,versicolour
5.6,2.9,3.6,1.3,versicolour
6.7,3.1,4.4,1.4,versicolour
5.6,3.0,4.5,1.5,versicolour
5.8,2.7,4.1,1.0,versicolour
6.2,2.2,4.5,1.5,versicolour
5.6,2.5,3.9,1.1,versicolour
5.9,3.2,4.8,1.8,versicolour
6.1,2.8,4.0,1.3,versicolour
6.3,2.5,4.9,1.5,versicolour
6.1,2.8,4.7,1.2,versicolour
6.4,2.9,4.3,1.3,versicolour
6.6,3.0,4.4,1.4,versicolour
6.8,2.8,4.8,1.4,versicolour
6.7,3.0,5.0,1.7,versicolour
6.0,2.9,4.5,1.5,versicolour
5.7,2.6,3.5,1.0,versicolour
5.5,2.4,3.8,1.1,versicolour
5.5,2.4,3.7,1.0,versicolour
5.8,2.7,3.9,1.2,versicolour
6.0,2.7,5.1,1.6,versicolour
5.4,3.0,4.5,1.5,versicolour
6.0,3.4,4.5,1.6,versicolour
6.7,3.1,4.7,1.5,versicolour
6.3,2.3,4.4,1.3,versicolour
5.6,3.0,4.1,1.3,versicolour
5.5,2.5,4.0,1.3,versicolour
5.5,2.6,4.4,1.2,versicolour
6.1,3.0,4.6,1.4,versicolour
5.8,2.6,4.0,1.2,versicolour
5.0,2.3,3.3,1.0,versicolour
5.6,2.7,4.2,1.3,versicolour
5.7,3.0,4.2,1.2,versicolour
5.7,2.9,4.2,1.3,versicolour
6.2,2.9,4.3,1.3,versicolour
5.1,2.5,3.0,1.1,versicolour
5.7,2.8,4.1,1.3,versicolour
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
7.1,3.0,5.9,2.1,virginica
6.3,2.9,5.6,1.8,virginica
6.5,3.0,5.8,2.2,virginica
7.6,3.0,6.6,2.1,virginica
4.9,2.5,4.5,1.7,virginica
7.3,2.9,6.3,1.8,virginica
6.7,2.5,5.8,1.8,virginica
7.2,3.6,6.1,2.5,virginica
6.5,3.2,5.1,2.0,virginica
6.4,2.7,5.3,1.9,virginica
6.8,3.0,5.5,2.1,virginica
5.7,2.5,5.0,2.0,virginica
5.8,2.8,5.1,2.4,virginica
6.4,3.2,5.3,2.3,virginica
6.5,3.0,5.5,1.8,virginica
7.7,3.8,6.7,2.2,virginica
7.7,2.6,6.9,2.3,virginica
6.0,2.2,5.0,1.5,virginica
6.9,3.2,5.7,2.3,virginica
5.6,2.8,4.9,2.0,virginica
7.7,2.8,6.7,2.0,virginica
6.3,2.7,4.9,1.8,virginica
6.7,3.3,5.7,2.1,virginica
7.2,3.2,6.0,1.8,virginica
6.2,2.8,4.8,1.8,virginica
6.1,3.0,4.9,1.8,virginica
6.4,2.8,5.6,2.1,virginica
7.2,3.0,5.8,1.6,virginica
7.4,2.8,6.1,1.9,virginica
7.9,3.8,6.4,2.0,virginica
6.4,2.8,5.6,2.2,virginica
6.3,2.8,5.1,1.5,virginica
6.1,2.6,5.6,1.4,virginica
7.7,3.0,6.1,2.3,virginica
6.3,3.4,5.6,2.4,virginica
6.4,3.1,5.5,1.8,virginica
6.0,3.0,4.8,1.8,virginica
6.9,3.1,5.4,2.1,virginica
6.7,3.1,5.6,2.4,virginica
6.9,3.1,5.1,2.3,virginica
5.8,2.7,5.1,1.9,virginica
6.8,3.2,5.9,2.3,virginica
6.7,3.3,5.7,2.5,virginica
6.7,3.0,5.2,2.3,virginica
6.3,2.5,5.0,1.9,virginica
6.5,3.0,5.2,2.0,virginica
6.2,3.4,5.4,2.3,virginica
5.9,3.0,5.1,1.8,virginica

@RELATION iris-test

@ATTRIBUTE sepallength  REAL
@ATTRIBUTE sepalwidth   REAL
@ATTRIBUTE petallength  REAL
@ATTRIBUTE petalwidth   REAL

@DATA
5.1,3.5,1.4,0.2
4.9,3.0,1.4,0.2
6.6,3.0,4.4,1.4
6.8,2.8,4.8,1.4
6.4,3.1,5.5,1.8
6.0,3.0,4.8,1.8

Thank you very much for your help.

[Comments]:

Could it be the way you are reading the model? Have you tried ***.com/questions/22201949/…

Yes, that is how I do it: classModel=(Classifier) weka.core.SerializationHelper.read(pathModel);

Classifier classModel=(Classifier) weka.core.SerializationHelper.read(pathModel); and pathModel is "D:\Users\106811\Desktop\iris_tree(CV).model"

[Answer 1]:

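I think it is normal for a classifier model to show lower accuracy on a test set than when it is checked against the training set's feature file. Try this test set in the Weka GUI; you may well get the same results. This is not a GUI vs Java problem.

I would have posted this as a comment, but I lack the reputation to comment.

[Discussion]:

Ignored Class Unknown Instances: 6. Note that I have modified my test file to include the class in the header and added "?" in the data. For example: 6.0,3.0,4.8,1.8,?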
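The change described in that comment (declaring the class attribute in the test header and marking each row's class as "?") can be sketched as a small text transformation over the ARFF lines. This is a minimal sketch under assumptions: the helper name addMissingClass is hypothetical, it works on plain strings rather than through Weka, and it treats every non-empty, non-comment line after @DATA as a data row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not part of the code in the question.
class ArffClassPlaceholder {

    // Returns a copy of the ARFF lines with the class attribute declared
    // just before @DATA and ",?" appended to every data row.
    static List<String> addMissingClass(List<String> lines, String classDeclaration) {
        List<String> out = new ArrayList<>();
        boolean inData = false;
        for (String line : lines) {
            String trimmed = line.trim();
            if (!inData && trimmed.equalsIgnoreCase("@DATA")) {
                out.add(classDeclaration); // class becomes the last attribute
                out.add(line);
                inData = true;
            } else if (inData && !trimmed.isEmpty() && !trimmed.startsWith("%")) {
                out.add(line + ",?");      // "?" marks a missing value in ARFF
            } else {
                out.add(line);             // header lines, blanks, comments
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> test = Arrays.asList(
            "@RELATION iris-test",
            "",
            "@ATTRIBUTE sepallength  REAL",
            "@ATTRIBUTE sepalwidth   REAL",
            "@ATTRIBUTE petallength  REAL",
            "@ATTRIBUTE petalwidth   REAL",
            "",
            "@DATA",
            "5.1,3.5,1.4,0.2",
            "6.0,3.0,4.8,1.8");
        String species = "@ATTRIBUTE species {setosa,versicolour,virginica}";
        for (String line : addMissingClass(test, species)) {
            System.out.println(line);
        }
    }
}
```

With the class declared this way, the test header has the same five attributes as the training header, and the last data row becomes 6.0,3.0,4.8,1.8,? as in the comment. Because every class value is missing, an evaluation of such a file has nothing to score against, which is consistent with the "Ignored Class Unknown Instances: 6" line quoted above.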
