Friday, February 12, 2010

Alignment of Text Mining Results


First of all, this kind of alignment has nothing to do with sequence alignment. It is a comparison between several copies of the same corpus, where each copy has been annotated by a different tagger. The alignment result gives an objective view of how each tagger performs relative to the others.

IeXMLAlignment is one such tool, and it handles the job decently. It is currently available as a Java library; it can load multiple files and align them all in a single run.

For instance, the CALBC corpus could be annotated with Abner, Swissprot and Biolexicon, and the three annotated copies then loaded into IeXMLAlignment. The result comes out as plain text containing the boundary information of each annotation, the term frequencies, and the agreement/disagreement between the corpora.
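
To make the idea concrete, here is a minimal Python sketch of what such an alignment computes. It is not IeXMLAlignment's API or its algorithm. The annotation data, the function names and the exact-boundary matching rule are all assumptions made for illustration: each tagger's output is reduced to a set of (start, end, term) spans over the same text, and two taggers "agree" only when they produce an identical span.

```python
from collections import Counter

# Hypothetical annotations: (start, end, term) character spans over the same text.
# The offsets and terms are invented for illustration, not real tagger output.
annotations = {
    "Abner": {(0, 5, "BRCA1"), (20, 23, "p53")},
    "Swissprot": {(0, 5, "BRCA1"), (30, 34, "EGFR")},
    "Biolexicon": {(0, 5, "BRCA1"), (20, 23, "p53"), (30, 34, "EGFR")},
}


def align(annotations):
    """Group identical spans and record which taggers produced each one."""
    taggers_by_span = {}
    for tagger, spans in annotations.items():
        for span in spans:
            taggers_by_span.setdefault(span, set()).add(tagger)
    return taggers_by_span


def summarize(annotations):
    """Return one row per span: boundaries, term, tagger count, agreement status."""
    all_taggers = set(annotations)
    rows = []
    for (start, end, term), taggers in sorted(align(annotations).items()):
        # Assumption: "agreement" means every tagger marked exactly this span.
        status = "agreement" if taggers == all_taggers else "disagreement"
        rows.append((start, end, term, len(taggers), status))
    return rows


def term_frequency(annotations):
    """Count how often each term is annotated across all copies of the corpus."""
    return Counter(term for spans in annotations.values() for (_, _, term) in spans)


# Plain-text report, one tab-separated line per aligned span.
for row in summarize(annotations):
    print("\t".join(str(field) for field in row))
```

Exact matching is the strictest possible rule. A real comparison would probably also want to treat overlapping or nested boundaries as partial agreement.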

Encoding this result in a standard format such as IeXML, instead of plain text, is under consideration.
