Frog
#include <cgn_tagger_mod.h>
Public Member Functions | |
CGNTagger (TiCC::LogStream *l, TiCC::LogStream *d=0) | |
bool | init (const TiCC::Configuration &) |
void | add_declaration (folia::Document &, folia::processor *) const |
void | post_process (frog_data &) |
void | add_tags (const std::vector< folia::Word * > &, const frog_data &) const |
std::string | getSubSet (const std::string &, const std::string &, const std::string &) const |
Public Member Functions inherited from BaseTagger | |
BaseTagger (TiCC::LogStream *, TiCC::LogStream *, const std::string &) | |
virtual | ~BaseTagger () |
virtual void | Classify (frog_data &) |
void | add_provenance (folia::Document &, folia::processor *) const |
std::string | getTagset () const |
std::string | set_eos_mark (const std::string &) |
bool | fill_map (const std::string &, std::map< std::string, std::string > &) |
std::vector< Tagger::TagResult > | tagLine (const std::string &) |
std::vector< Tagger::TagResult > | tag_entries (const std::vector< tag_entry > &) |
std::string | version () const |
Additional Inherited Members | |
Protected Member Functions inherited from BaseTagger | |
void | extract_words_tags (const std::vector< folia::Word * > &, const std::string &, std::vector< std::string > &, std::vector< std::string > &) |
std::vector< Tagger::TagResult > | call_server (const std::vector< tag_entry > &) const |
BaseTagger (const BaseTagger &) | |
Protected Attributes inherited from BaseTagger | |
int | debug |
std::string | _label |
std::string | tagset |
std::string | _version |
std::string | textclass |
TiCC::LogStream * | err_log |
TiCC::LogStream * | dbg_log |
std::string | base |
std::string | _host |
std::string | _port |
MbtAPI * | tagger |
TiCC::UniFilter * | filter |
std::vector< std::string > | _words |
std::vector< Tagger::TagResult > | _tag_result |
std::map< std::string, std::string > | token_tag_map |
CGNTagger::CGNTagger (TiCC::LogStream *l, TiCC::LogStream *d=0) [inline, explicit]
void CGNTagger::add_declaration (folia::Document &doc, folia::processor *proc) const [virtual]
Add the POS annotation as an AnnotationType to the document.
doc | the Document to add to |
proc | the processor to add |
Implements BaseTagger.
void CGNTagger::add_tags (const std::vector< folia::Word * > &wv, const frog_data &fd) const
Add the tagger results to the folia::Word list.
wv | The folia::Word vector to add to |
fd | the frog_data structure with the tagger results |
std::string CGNTagger::getSubSet (const std::string &val, const std::string &head, const std::string &fullclass) const
Get a specific subset value (FoLiA output only).
val | the value to look up |
head | the head of the CGN POS-tag |
fullclass | the original full CGN tag, used for error messages only |
A full class may be N(soort,ev,basis,zijd,stan), so the head is N.
For every value in 'soort,ev,basis,zijd,stan' we look up the subset in cgn_subsets, and when the constraints on the head are satisfied we return that subset.
For instance, 'soort' is found to belong to the subset 'ntype', and there are no 'head' constraints on 'ntype', so the lookup for 'soort' yields 'ntype'.
If the full class had been VNW(betr,pron,stan,vol,persoon,getal), then the subset for 'getal' is 'getal' and the constraints for 'getal' are 'VNW, N'; the head VNW satisfies these, so the result is 'getal'.
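The lookup described above can be sketched as follows. This is a minimal illustration and not the Frog implementation: the names `CgnClass`, `split_class`, `SubsetMap`, `ConstraintMap` and `lookup_subset` are hypothetical, the table layouts are assumed, and throwing `std::runtime_error` when nothing matches is an assumption about error handling.

```cpp
#include <map>
#include <set>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical holder for a split CGN tag such as N(soort,ev,basis,zijd,stan).
struct CgnClass {
  std::string head;                 // e.g. "N"
  std::vector<std::string> values;  // e.g. "soort", "ev", ...
};

// Split a full class into its head and its comma-separated values.
// Assumes a well-formed tag; a tag without parentheses has no values.
CgnClass split_class(const std::string& fullclass) {
  CgnClass result;
  const auto open = fullclass.find('(');
  result.head = fullclass.substr(0, open);
  if (open == std::string::npos) {
    return result;
  }
  const auto close = fullclass.rfind(')');
  const auto count =
      (close == std::string::npos) ? std::string::npos : close - open - 1;
  std::istringstream in(fullclass.substr(open + 1, count));
  std::string item;
  while (std::getline(in, item, ',')) {
    if (!item.empty()) {
      result.values.push_back(item);
    }
  }
  return result;
}

// Assumed table layouts (not taken from Frog):
// value -> candidate subsets, and subset -> heads it is allowed for.
using SubsetMap = std::multimap<std::string, std::string>;
using ConstraintMap = std::map<std::string, std::set<std::string>>;

// Return the first subset for 'val' whose head constraints are satisfied.
// A subset without an entry in 'constraints' is treated as unconstrained.
std::string lookup_subset(const SubsetMap& subsets,
                          const ConstraintMap& constraints,
                          const std::string& val,
                          const std::string& head,
                          const std::string& fullclass) {
  const auto range = subsets.equal_range(val);
  for (auto it = range.first; it != range.second; ++it) {
    const std::string& subset = it->second;
    const auto c = constraints.find(subset);
    if (c == constraints.end() || c->second.count(head) > 0) {
      return subset;
    }
  }
  // 'fullclass' is used for the error message only.
  throw std::runtime_error("no subset for '" + val + "' in " + fullclass);
}
```

With a table that maps 'soort' to 'ntype' and 'getal' to 'getal', and a constraint that restricts 'getal' to the heads VNW and N, this sketch reproduces the two lookups described above and rejects 'getal' for any other head.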
bool CGNTagger::init (const TiCC::Configuration &config) [virtual]
Initialize a CGN tagger from 'config'.
config | the TiCC::Configuration |
First BaseTagger::init() is called to set the generic values. Then the CGN-specific file names for the subsets and constraints are added and those files are read, unless a file name has the value 'ignore'.
Reimplemented from BaseTagger.
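The 'ignore' convention can be illustrated with a small sketch. The helper `load_unless_ignored` is hypothetical, and the file format (one whitespace-separated key and value per line) is an assumption made only for this example; the actual subsets and constraints files may be laid out differently.

```cpp
#include <fstream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: fill 'out' from the file 'name', unless 'name' is the
// literal "ignore", in which case nothing is read and the call succeeds.
// Assumed file format: one "key value" pair per line.
bool load_unless_ignored(const std::string& name,
                         std::map<std::string, std::string>& out) {
  if (name == "ignore") {
    return true;  // explicitly skipped, not an error
  }
  std::ifstream in(name);
  if (!in) {
    return false;  // the file could not be opened
  }
  std::string line;
  while (std::getline(in, line)) {
    std::istringstream fields(line);
    std::string key;
    std::string value;
    if (fields >> key >> value) {
      out[key] = value;
    }
  }
  return true;
}
```

Treating 'ignore' as a successful no-op lets an initialization routine handle both files through the same path and fail only when a named file is actually missing.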
void CGNTagger::post_process (frog_data &words) [virtual]
Add the found tagging results to the frog_data structure.
words | The frog_data structure to extend |
Implements BaseTagger.