MBT is a memory-based tagger-generator and tagger in one. The tagger-generator part can generate a sequence tagger on the basis of a training set of tagged sequences; the tagger part can tag new sequences. MBT can, for instance, be used to generate part-of-speech taggers or chunkers for natural language processing. It has also been used for named-entity recognition, information extraction in domain-specific texts, and disfluency chunking in transcribed speech.
Mbt is used by Frog for Dutch tagging.
Features
- Tagger generation: tagged text in, tagger out
- Optional feedback loop: feed previous tag decision back to input of next decision
- Easily customizable feature representation
- Allows user-provided features
- Automatic generation of separate sub-taggers for known words and unknown words
- Can make use of full algorithmic parameters of TiMBL
- Server mode is now available through a separate package: MbtServer
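The feedback loop and the split into known-word and unknown-word sub-taggers can be illustrated with a small toy sketch. This is not Mbt's implementation: Mbt relies on TiMBL's memory-based classifiers, whereas the sketch below uses plain frequency lookups with a simple back-off. All names (train, tag, SUFFIX_LEN, the tag labels and the training sentences) are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

START = "<s>"      # hypothetical marker for "no previous tag"
SUFFIX_LEN = 3     # assumption: unknown words are described by their last 3 characters


def train(tagged_sentences):
    """Tagged text in, toy tagger out: store tag counts per feature pattern."""
    model = {
        "known": defaultdict(Counter),    # (previous tag, word)   -> tag counts
        "unknown": defaultdict(Counter),  # (previous tag, suffix) -> tag counts
        "word": defaultdict(Counter),     # word   -> tag counts (back-off)
        "suffix": defaultdict(Counter),   # suffix -> tag counts (back-off)
        "all": Counter(),                 # overall tag counts (last resort)
    }
    for sentence in tagged_sentences:
        prev_tag = START
        for word, tag_label in sentence:
            suffix = word[-SUFFIX_LEN:]
            model["known"][(prev_tag, word)][tag_label] += 1
            model["unknown"][(prev_tag, suffix)][tag_label] += 1
            model["word"][word][tag_label] += 1
            model["suffix"][suffix][tag_label] += 1
            model["all"][tag_label] += 1
            prev_tag = tag_label
    return model


def tag(model, words):
    """Tag a sequence left to right, feeding each decision into the next one."""
    tagged = []
    prev_tag = START
    for word in words:
        if word in model["word"]:
            # Known-word sub-tagger: the word itself is a feature.
            candidates = (model["known"].get((prev_tag, word)),
                          model["word"][word])
        else:
            # Unknown-word sub-tagger: fall back on the word's suffix.
            suffix = word[-SUFFIX_LEN:]
            candidates = (model["unknown"].get((prev_tag, suffix)),
                          model["suffix"].get(suffix))
        # Use the most specific pattern that was seen in training.
        counts = next((c for c in candidates if c), model["all"])
        best = counts.most_common(1)[0][0]
        tagged.append((word, best))
        prev_tag = best  # feedback loop: this decision becomes the next input
    return tagged


TRAIN = [
    [("the", "DET"), ("bark", "NOUN"), ("peels", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
]
model = train(TRAIN)

print(tag(model, ["the", "bark"]))           # [('the', 'DET'), ('bark', 'NOUN')]
print(tag(model, ["dogs", "bark"]))          # [('dogs', 'NOUN'), ('bark', 'VERB')]
print(tag(model, ["the", "cat", "creeps"]))  # 'creeps' is unseen; its suffix 'eps' yields VERB
```

In the first two calls the ambiguous word "bark" receives a different tag depending on the tag just assigned to the preceding word, which is the effect of the feedback loop. In the third call the unseen word "creeps" is handled by the separate unknown-word path.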
Download & Installation
Mbt is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation.
To download and install Mbt:
- First check if there are up-to-date packages included in your distribution's package manager. There are packages for Alpine Linux, Arch Linux (AUR), macOS (homebrew), Debian and derivatives like Ubuntu.
- If not, we recommend you use our docker container via docker pull proycon/mbt. It includes mbt and all necessary dependencies.
- Alternatively, you can always download, compile and install mbt manually, as shown next.
Manual installation
To compile Mbt manually, consult the included INSTALL document. You will need current versions of the following dependencies of our software:
As well as the following 3rd party dependencies:
- A sane build environment with a C++ compiler (e.g. gcc or clang), autotools, libtool, pkg-config
Documentation
- Reference Guide (15 pages, 110 kB PDF); Daelemans, W., Zavrel, J., Van den Bosch, A., and Van der Sloot, K. (2010). MBT: Memory-Based Tagger, version 3.2, Reference Guide. ILK Technical Report Series 10-04.
- Recent Advances in Memory-Based Part-of-Speech Tagging. Jakub Zavrel and Walter Daelemans. in: Actas del VI Simposio Internacional de Comunicacion Social, Santiago de Cuba, pp. 590-597, 1999. ILK pub: ILK-9903.
- MBT: A Memory-Based Part of Speech Tagger-Generator. Walter Daelemans, Jakub Zavrel, Peter Berck and Steven Gillis. in: E. Ejerhed and I. Dagan (eds.) Proceedings of the Fourth Workshop on Very Large Corpora, Copenhagen, Denmark, 14-27, 1996.
- Part-of-Speech Tagging for Dutch with MBT, a Memory-based Tagger Generator. Walter Daelemans, Jakub Zavrel, Peter Berck, in: Congresboek van de Interdisciplinaire Onderzoeksconferentie Informatiewetenschap 1996, TU Delft.
- Book: Memory-Based Language Processing - Daelemans, W., and Van den Bosch, A. (2005). Cambridge, UK: Cambridge University Press.
Links
Mbt is used in:
- Frog - A Natural Language Processing suite for Dutch
- MBSP Demo - Demo of memory-based English shallow parsing, including Mbt
- Kiswahili PoS tagger - Demo of African Language Technology using Mbt
The development and improvement of Mbt also rely on your bug reports, suggestions, and comments. Use the GitHub issue tracker or mail lamasoftware (at) science.ru.nl