BANNER Named Entity Recognition System

BANNER is a named entity recognition system, primarily intended for biomedical text. It is a machine-learning system based on conditional random fields and contains a wide survey of the best features in recent literature on biomedical named entity recognition (NER). BANNER is portable and is designed to maximize domain independence by not employing semantic features or rule-based processing steps. It is therefore useful to developers as an extensible NER implementation, to researchers as a standard for comparing innovative techniques, and to biologists requiring the ability to find novel entities in large amounts of text.
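CRF-based NER systems of this kind typically label each token with a B/I/O tag (Beginning, Inside, or Outside of a mention) and then decode those tags into entity spans. The following is a minimal illustrative sketch of that decoding step in Python; it is not BANNER's actual Java API, and all names here are hypothetical.

```python
# Illustrative sketch of decoding B/I/O token tags into entity spans,
# the output format produced by most CRF-based NER taggers.
# This is NOT BANNER's API; function and variable names are made up.

def bio_to_spans(tokens, tags):
    """Convert parallel token/tag lists into (start, end, text) entity spans."""
    spans = []
    start = None  # index where the current entity began, or None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            # A new entity starts here; close any entity still open.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
            start = i
        elif tag == "O":
            # Outside any entity; close the open one, if any.
            if start is not None:
                spans.append((start, i, " ".join(tokens[start:i])))
                start = None
        # tag == "I" with an open entity: the span simply continues.
    if start is not None:
        spans.append((start, len(tags), " ".join(tokens[start:])))
    return spans


tokens = ["the", "BRCA1", "gene", "and", "p53"]
tags = ["O", "B", "I", "O", "B"]
print(bio_to_spans(tokens, tags))
# → [(1, 3, 'BRCA1 gene'), (4, 5, 'p53')]
```

Exact token boundaries matter here: the gene mention evaluations below score a prediction as correct only if its decoded span matches an annotated mention.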

BANNER is released under the Common Public License 1.0 and can be tested online or downloaded from the releases page, though the CVS repository contains more recent updates. BANNER is constantly being improved, so you may wish to subscribe to the updates mailing list to be notified of new releases and bug fixes.

BANNER was written and is maintained by Bob Leaman and is advised by Dr. Graciela Gonzalez, both members of the BioAI lab at Arizona State University.

We have compared BANNER to the official results of the Second BioCreative Challenge Evaluation. Following the protocol for the BioCreative 2 gene mention task, BANNER was trained on the entire gene mention training set and tested on the test set, both of which can be downloaded from the BioCreative 2 datasets page. A difference in F-measure of 1.23 or more is statistically significant, while a difference of 0.35 or less is not (p < 0.05). Results are listed both for the version of BANNER as published and for the current version, which adds a dictionary feature and is available in the CVS repository.

System or author        Rank at BioCreative 2   Precision (%)   Recall (%)   F-Measure
Ando                    1                       88.48           85.97        87.21
BANNER (current)        -                       88.66           84.32        86.43
BANNER (as published)   -                       87.18           82.78        84.92
Vlachos                 9                       86.28           79.66        82.84
Baumgartner et al.      11 (median)             85.54           76.83        80.95
NERBio                  13                      92.67           68.91        79.05

We have also compared BANNER to ABNER and LingPipe, two existing freely available NER systems, on two different corpora, using 5x2 cross-validation.

Corpus:        BioCreative 2 gene mention (training)        BioText disease/treatment (diseases only)
System         Precision (%)   Recall (%)   F-Measure      Precision (%)   Recall (%)   F-Measure
BANNER         85.09           79.06        81.96          68.89           45.55        54.84
ABNER          83.21           73.94        78.30          66.08           44.86        53.44
LingPipe       60.34           70.32        64.95          55.41           47.50        51.15

BANNER is the subject of the following paper:
Leaman, R. & Gonzalez, G. (2008). BANNER: An executable survey of advances in biomedical named entity recognition. Pacific Symposium on Biocomputing 13:652-663.

Thanks to SourceForge for providing hosting.