Guile Mecab
===========

Guile bindings for [MeCab](https://taku910.github.io/mecab/), yet another
part-of-speech and morphological analyzer. The bindings link to libmecab and
offer a small API for using MeCab.

Installation
------------

Using [Guix](https://guix.gnu.org), it is easy to use and install mecab, as we
provide a recipe for mecab, mecab-ipadic (one possible dictionary for Japanese)
and guile-mecab in the `guix` directory. For instance, to enter an environment
with all the tools needed to run guile-mecab, you can run, from this repository:

```bash
guix environment -L guix --ad-hoc guile guile-mecab mecab-ipadic
```

Usage
-----

Once installed, you can use the bindings by loading `(mecab mecab)` in a guile
program.

* *Scheme Procedure*: **mecab-version**

  Returns MeCab's version number.

### The tagger

The tagger is a global object used to load a dictionary. You can create and
remove a tagger by using the following procedures:

* *Scheme Procedure*: **mecab-new-tagger** [`args` `'()`]

  This procedure creates a new tagger that can be passed to other functions.
  `args` is a list of strings that represent the arguments passed to mecab.
  See mecab's help for a list of accepted arguments.

* *Scheme Procedure*: **mecab-destroy** `tagger`

  This procedure destroys a tagger. Reusing it afterwards will not work, and
  may lead to a segmentation fault.

* *Scheme Procedure*: **mecab-error** `tagger`

  Returns a string representing the last error message returned by the tagger.

### The analysis

To analyse a sentence, you can use one of the following procedures.

* *Scheme Procedure*: **mecab-parse** `tagger` `str`

  Parses a sentence `str` with `tagger` and returns a node.

* *Scheme Procedure*: **mecab-parse-to-str** `tagger` `str`

  Parses a sentence `str` with `tagger` and returns a string containing the
  result. The resulting string can be affected by options such as `-O` or
  `--node-format`.

* *Scheme Procedure*: **mecab-split** `tagger` `str`

  Returns the list of words in `str`.
* *Scheme Procedure*: **mecab-features** `tagger` `str`

  Returns the list of features associated with each word in `str`, in the same
  order as `mecab-split`. Note that the strings are always in CSV format, and
  this cannot be affected by the `-O`, `--node-format` etc. options.

* *Scheme Procedure*: **mecab-words** `tagger` `str`

  Returns the list of words in dictionary form that are present in `str`,
  leaving out punctuation and declension.

### Nodes

Nodes can be manipulated with the following procedures.

* *Scheme Procedure*: **node-feature** `node`

  Return the feature string of the given `node`.

* *Scheme Procedure*: **node-surface** `node`

  Return the surface value of the given `node`, that is the word as it is
  written in the sentence (not necessarily its dictionary form). Note that
  this function is not very reliable.

* *Scheme Procedure*: **node-stat** `node`

  Return the type of the given `node`. The value is a number which is one of
  `MECAB_NOR_NODE`, `MECAB_UNK_NODE`, `MECAB_BOS_NODE`, `MECAB_EOS_NODE` or
  `MECAB_EON_NODE`.

* *Scheme Procedure*: **node-next** `node`

  Return the node following the given `node`.

* *Scheme Procedure*: **node-prev** `node`

  Return the node preceding the given `node`.
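
### Example

The procedures documented above can be combined roughly as follows. This is a minimal sketch, not a reference implementation. It assumes guile-mecab and a dictionary such as mecab-ipadic are available, that `node-stat` returns the raw numeric values from MeCab's C header (2 for `MECAB_BOS_NODE`, 3 for `MECAB_EOS_NODE`), and that the node chain returned by `mecab-parse` ends with an EOS node. The helper names `words+features`, `node-features` and `demo` are hypothetical and not part of the bindings.

```scheme
;; Sketch only: requires guile-mecab and a MeCab dictionary to be installed.
(use-modules (mecab mecab))

;; Numeric node types as defined in MeCab's C header (mecab.h).
;; Assumption: node-stat returns these raw numbers.
(define bos-node 2)   ; MECAB_BOS_NODE
(define eos-node 3)   ; MECAB_EOS_NODE

;; Hypothetical helper: pair each word with its CSV feature string.
;; Relies on mecab-split and mecab-features returning lists in the same order.
(define (words+features tagger str)
  (map cons (mecab-split tagger str) (mecab-features tagger str)))

;; Hypothetical helper: walk the node chain returned by mecab-parse and
;; collect the feature string of every node that is neither BOS nor EOS.
;; Assumption: the chain always ends with an EOS node.
(define (node-features tagger str)
  (let loop ((node (mecab-parse tagger str))
             (acc '()))
    (let ((stat (node-stat node)))
      (cond ((= stat eos-node) (reverse acc))
            ((= stat bos-node) (loop (node-next node) acc))
            (else (loop (node-next node)
                        (cons (node-feature node) acc)))))))

;; Hypothetical driver: create a tagger, analyse one sentence, clean up.
(define (demo)
  (let ((tagger (mecab-new-tagger '()))
        (sentence "すもももももももものうち"))
    (display (mecab-version))
    (newline)
    ;; One "word<TAB>features" line per word.
    (for-each (lambda (pair)
                (format #t "~a\t~a~%" (car pair) (cdr pair)))
              (words+features tagger sentence))
    ;; Dictionary forms, as returned by mecab-words.
    (write (mecab-words tagger sentence))
    (newline)
    ;; Same features, obtained by walking the nodes by hand.
    (write (node-features tagger sentence))
    (newline)
    ;; The tagger must not be used after this call.
    (mecab-destroy tagger)))
```

Evaluating `(demo)` inside the Guix environment from the Installation section should print the analysis of the sample sentence. The exact output depends on the installed dictionary, so none is shown here. The node walk uses `node-feature` rather than `node-surface`, since the latter is described above as not very reliable.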