
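1 Introduction: Embedded machine learning. This is a very small data mining program that calls Weka from its own code to perform text classification. Its practical value is limited, but it helps in understanding and using Weka. The example comes from the 2nd edition of "Data Mining: Practical Machine Learning Tools and Techniques" (Chapter 3). You can download the book from http://blogger.org.cn/blog/message.asp?name=DMman#23691 to read a detailed description of the algorithm. The code itself is commented in detail; the comments are in English but quite simple. The usage of the example is briefly described below, and interested readers are encouraged to study it further.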

2 Function: A small text classification program built on Weka's J48 classifier. Text files are preprocessed with the Weka filter StringToWordVector.
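To make the preprocessing step more concrete, here is a minimal sketch of the idea behind turning a text into a word vector. It uses only the Java standard library and is not Weka's implementation: the class name WordVectorSketch and the method wordCounts are hypothetical, and the real StringToWordVector filter offers many more options.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Weka: illustrates the word-vector idea only. */
final class WordVectorSketch {

    /** Splits the text on non-letter characters and counts each lower-cased word. */
    static Map<String, Integer> wordCounts(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!token.isEmpty()) {
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Each distinct word becomes one attribute; its count is the attribute value.
        System.out.println(wordCounts("Buy now, buy cheap")); // prints {buy=2, cheap=1, now=1}
    }
}
```

A classifier such as J48 then works on these numeric word attributes instead of on the raw string.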

3 Note: The program compiles only if weka.jar is on your classpath.

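4 Usage:

Command-line arguments:

-t path of the text file

-m path of your model file

-c optional; the class of the text (hit or miss).

If -c is given, the text is used for training; otherwise the model classifies the text and prints its class (hit or miss).

The model is created dynamically. The first time you run the command, you must specify -c so that the model can be created.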
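The rule above (train when -c is present, classify otherwise) can be sketched without Weka as follows. This is only an illustration under assumptions: OptionSketch, getOption and mode are hypothetical names, and unlike Weka's Utils.getOption this simplified lookup does not remove consumed options from the array.

```java
/** Hypothetical sketch of the option handling; not the Weka Utils class. */
final class OptionSketch {

    /** Returns the value that follows "-flag", or an empty string if the flag is absent. */
    static String getOption(char flag, String[] args) {
        String wanted = "-" + flag;
        for (int i = 0; i + 1 < args.length; i++) {
            if (args[i].equals(wanted)) {
                return args[i + 1];
            }
        }
        return "";
    }

    /** Decides which mode the arguments select: "train" if -c is given, else "classify". */
    static String mode(String[] args) {
        if (getOption('t', args).isEmpty()) {
            throw new IllegalArgumentException("Must provide name of message file.");
        }
        if (getOption('m', args).isEmpty()) {
            throw new IllegalArgumentException("Must provide name of model file.");
        }
        return getOption('c', args).isEmpty() ? "classify" : "train";
    }

    public static void main(String[] args) {
        String[] train = {"-t", "data/1.gif", "-m", "myModel", "-c", "miss"};
        String[] classify = {"-t", "data/2.gif", "-m", "myModel"};
        System.out.println(mode(train));    // prints "train"
        System.out.println(mode(classify)); // prints "classify"
    }
}
```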

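1) Build the model

java MessageClassifier -t data/1.bmp -m myModel -c hit

You will see that myModel has been created. Then keep training the model: the more text instances it sees, the better its classification performance.

java MessageClassifier -t data/2.bmp -m myModel -c hit

java MessageClassifier -t data/1.gif -m myModel -c miss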

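2) Classify with the model

Once a model exists, you can use it to classify a text file. For example:

java MessageClassifier -t data/2.gif -m myModel

3) You can keep improving the model by running commands that supply the -c argument.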
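All three steps rely on the same cycle: load the model file if it exists, otherwise start a new model, then write the object back to the same file. The sketch below shows that cycle with plain Java serialization. TinyModel is a hypothetical stand-in for the classifier object, and the sketch checks File.exists() where the listing below catches FileNotFoundException instead.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the load-or-create persistence cycle. */
final class ModelStoreSketch {

    /** Stand-in for the classifier: it only remembers the labels it was trained with. */
    static final class TinyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<String> examples = new ArrayList<>();
    }

    /** Reads the model from the file, or creates a fresh one if the file is missing. */
    static TinyModel loadOrCreate(File file) throws IOException, ClassNotFoundException {
        if (!file.exists()) {
            return new TinyModel();
        }
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (TinyModel) in.readObject();
        }
    }

    /** Writes the model back so the next run can continue from it. */
    static void save(TinyModel model, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(model);
        }
    }

    /** Simulates one program run per label and returns the number of stored examples. */
    static int trainRepeatedly(String... labels) {
        try {
            File file = File.createTempFile("myModel", ".ser");
            file.delete(); // start without a model file, as on the very first run
            for (String label : labels) {
                TinyModel model = loadOrCreate(file);
                model.examples.add(label);
                save(model, file);
            }
            int size = loadOrCreate(file).examples.size();
            file.delete();
            return size;
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(trainRepeatedly("hit", "hit", "miss")); // prints 3
    }
}
```

Because the whole object is serialized, the training data travels with the model, which is what allows later -c runs to keep improving it.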

Source file MessageClassifier.java:

/**

* Java program for classifying text messages into two classes.

*/

import weka.core.Attribute;

import weka.core.Instance;

import weka.core.Instances;

import weka.core.FastVector;

import weka.core.Utils;

import weka.classifiers.Classifier;

import weka.classifiers.trees.J48;

import weka.filters.Filter;

import weka.filters.unsupervised.attribute.StringToWordVector;

import java.io.*;

public class MessageClassifier implements Serializable {

/* The training data gathered so far. */

private Instances m_Data = null;

/* The filter used to generate the word counts. */

private StringToWordVector m_Filter = new StringToWordVector();

/* The actual classifier. */

private Classifier m_Classifier = new J48();

/* Whether the model is up to date. */

private boolean m_UpToDate;

/**

* Constructs the empty template dataset.

*/

public MessageClassifier() throws Exception {

String nameOfDataset = "MessageClassificationProblem";

// Create vector of attributes.

FastVector attributes = new FastVector(2);

// Add attribute for holding messages.

attributes.addElement(new Attribute("Message", (FastVector)null));

// Add class attribute.

FastVector classValues = new FastVector(2);

classValues.addElement("miss");

classValues.addElement("hit");

attributes.addElement(new Attribute("Class", classValues));

// Create dataset with initial capacity of 100, and set index of class.

m_Data = new Instances(nameOfDataset, attributes, 100);

m_Data.setClassIndex(m_Data.numAttributes() - 1);

}

/**

* Updates data using the given training message.

*/

public void updateData(String message, String classValue) throws Exception {

// Make message into instance.

Instance instance = makeInstance(message, m_Data);

// Set class value for instance.

instance.setClassValue(classValue);

// Add instance to training data.

m_Data.add(instance);

m_UpToDate = false;

}

/**

* Classifies a given message.

*/

public void classifyMessage(String message) throws Exception {

// Check whether classifier has been built.

if (m_Data.numInstances() == 0) {

throw new Exception("No classifier available.");

}

// Check whether classifier and filter are up to date.

if (!m_UpToDate) {

// Initialize filter and tell it about the input format.

m_Filter.setInputFormat(m_Data);

// Generate word counts from the training data.

Instances filteredData = Filter.useFilter(m_Data, m_Filter);

// Rebuild classifier.

m_Classifier.buildClassifier(filteredData);

m_UpToDate = true;

}

// Make separate little test set so that message

// does not get added to string attribute in m_Data.

Instances testset = m_Data.stringFreeStructure();

// Make message into test instance.

Instance instance = makeInstance(message, testset);

// Filter instance.

m_Filter.input(instance);

Instance filteredInstance = m_Filter.output();

// Get index of predicted class value.

double predicted = m_Classifier.classifyInstance(filteredInstance);

// Output class value.

System.err.println("Message classified as : " +

m_Data.classAttribute().value((int)predicted));

}

/**

* Method that converts a text message into an instance.

*/

private Instance makeInstance(String text, Instances data) {

// Create instance of length two.

Instance instance = new Instance(2);

// Set value for message attribute

Attribute messageAtt = data.attribute("Message");

instance.setValue(messageAtt, messageAtt.addStringValue(text));

// Give instance access to attribute information from the dataset.

instance.setDataset(data);

return instance;

}

/**

* Main method.

*/

public static void main(String[] options) {

try {

// Read message file into string.

String messageName = Utils.getOption('t', options);

if (messageName.length() == 0) {

throw new Exception("Must provide name of message file.");

}

FileReader m = new FileReader(messageName);

StringBuffer message = new StringBuffer(); int l;

while ((l = m.read()) != -1) {

message.append((char)l);

}

m.close();

// Check if class value is given.

String classValue = Utils.getOption('c', options);

// If model file exists, read it, otherwise create new one.

String modelName = Utils.getOption('m', options);

if (modelName.length() == 0) {

throw new Exception("Must provide name of model file.");

}

MessageClassifier messageCl;

try {

ObjectInputStream modelInObjectFile =

new ObjectInputStream(new FileInputStream(modelName));

messageCl = (MessageClassifier) modelInObjectFile.readObject();

modelInObjectFile.close();

} catch (FileNotFoundException e) {

messageCl = new MessageClassifier();

}

// Check if there are any options left

Utils.checkForRemainingOptions(options);

// Process message.

if (classValue.length() != 0) {

messageCl.updateData(message.toString(), classValue);

} else {

messageCl.classifyMessage(message.toString());

}

// Save message classifier object.

ObjectOutputStream modelOutObjectFile =

new ObjectOutputStream(new FileOutputStream(modelName));

modelOutObjectFile.writeObject(messageCl);

modelOutObjectFile.close();

} catch (Exception e) {

e.printStackTrace();

}

}

}