The StringToWordVector filter assumes that the document text is stored in an attribute
of type String—a nominal attribute without a prespecified set of values. In the filtered
data, the string attribute is replaced by a fixed set of numeric attributes, and the class attribute
is moved to the beginning, as the first attribute.
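
The effect of the filter can be pictured with a small sketch. This is not Weka's implementation, only a toy illustration in Python under stated assumptions: words are obtained by splitting on whitespace, each numeric attribute records whether a word is present (1) or absent (0), and the function name to_word_vectors and the sample documents are hypothetical.

```python
def to_word_vectors(documents):
    """Turn (text, label) pairs into rows of the form [label, w1, w2, ...].

    Toy illustration only: whitespace tokenization and 0/1 word presence
    are assumptions made for this sketch, not a description of Weka's code.
    """
    # One numeric attribute per distinct word, in a fixed (sorted) order.
    vocabulary = sorted({word for text, _ in documents for word in text.split()})
    # The class attribute comes first, followed by the word attributes.
    header = ["class"] + vocabulary
    rows = []
    for text, label in documents:
        words = set(text.split())
        rows.append([label] + [1 if word in words else 0 for word in vocabulary])
    return header, rows


# Hypothetical two-document collection.
docs = [("oil prices rise", "yes"), ("wheat prices fall", "no")]
header, rows = to_word_vectors(docs)
```

Every document ends up with the same attributes regardless of its length, and the class label occupies the first position in each row.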
To perform document classification, first create an ARFF file with a string attribute
that holds the document’s text—declared in the header of the ARFF file using
@attribute document string, where document is the name of the attribute. A nominal
attribute is also needed to hold the document’s classification.
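
As a rough sketch of how such a file might be produced, the Python below assembles the ARFF text for a handful of documents. The helper names quote_arff and build_arff, the relation name, the attribute name class, and the sample documents are all hypothetical; the sketch also assumes that class labels are simple tokens that need no quoting, and it escapes only backslashes, single quotes, and newlines in the document text.

```python
def quote_arff(text):
    """Wrap a string value in single quotes, escaping characters that would break it."""
    escaped = (
        text.replace("\\", "\\\\")  # backslashes first, so later escapes survive
        .replace("'", "\\'")
        .replace("\n", "\\n")
    )
    return "'" + escaped + "'"


def build_arff(relation, documents, classes):
    """Return ARFF text with a string attribute and a nominal class attribute.

    documents is a list of (text, label) pairs; classes lists the allowed labels.
    """
    lines = [
        "@relation " + relation,
        "",
        "@attribute document string",  # holds the document's text
        "@attribute class {" + ",".join(classes) + "}",  # nominal classification
        "",
        "@data",
    ]
    for text, label in documents:
        if label not in classes:
            raise ValueError("unknown class label: " + label)
        lines.append(quote_arff(text) + "," + label)
    return "\n".join(lines) + "\n"


# Hypothetical sample documents.
sample_docs = [
    ("The price of crude oil has increased", "yes"),
    ("It's a wheat story", "no"),
]
arff_text = build_arff("documents", sample_docs, ["yes", "no"])
```

Writing arff_text to a file with the .arff extension gives a dataset whose first attribute is the string attribute and whose second is the nominal class.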