Guile Mecab
Guile bindings for MeCab, yet another part-of-speech and morphological analyzer. The bindings link to libmecab and offer a small API for using MeCab.
Installation
With Guix, installing and using MeCab is straightforward: the guix directory of this repository provides recipes for mecab, mecab-ipadic (one possible dictionary for Japanese) and guile-mecab. For instance, to enter an environment with all the tools needed to run guile-mecab, run the following from this repository:
guix environment -L guix --ad-hoc guile guile-mecab mecab-ipadic
Usage
Once installed, you can use the bindings by loading (mecab mecab) in a guile program.
- Scheme Procedure: mecab-version Returns MeCab's version number.
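As a quick check that the module loads, the following minimal sketch prints the version. It assumes guile-mecab is on Guile's load path, for example inside the Guix environment described above.

```scheme
(use-modules (mecab mecab))

;; Print the version of the MeCab library the bindings link to.
(display (mecab-version))
(newline)
```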
The tagger
The tagger is a global object used to load a dictionary. You can create and remove a tagger by using the following procedures:
- Scheme Procedure: mecab-new-tagger [args '()]
  Creates a new tagger that can be passed to other procedures. args is a list of strings that represent the arguments passed to mecab. See mecab's help for a list of accepted arguments.
- Scheme Procedure: mecab-destroy tagger
  Destroys a tagger. Reusing it afterwards will not work, and may lead to a segmentation fault.
- Scheme Procedure: mecab-error tagger
  Returns a string representing the last error message returned by the tagger.
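Because a destroyed tagger must not be reused, it can help to tie creation and destruction together. The sketch below does this with a helper named call-with-tagger, which is hypothetical and not part of guile-mecab. It creates the tagger with no arguments, which assumes MeCab can locate a default dictionary on its own.

```scheme
(use-modules (mecab mecab))

;; call-with-tagger is a hypothetical helper, not part of guile-mecab.
;; It creates a tagger, calls PROC with it, and destroys the tagger when
;; PROC returns or exits non-locally (for example through an exception).
;; Re-entering PROC through a saved continuation is not supported, since
;; the tagger would already have been destroyed.
(define (call-with-tagger proc)
  (let ((tagger (mecab-new-tagger)))
    (dynamic-wind
      (lambda () #t)
      (lambda () (proc tagger))
      (lambda () (mecab-destroy tagger)))))

;; Usage: the tagger never escapes the lambda, so it cannot be reused
;; after destruction by accident.
(display
 (call-with-tagger
  (lambda (tagger)
    (mecab-parse-to-str tagger "すもももももももものうち"))))
```

Values returned from the lambda should not keep a reference to the tagger, since it is destroyed as soon as the call returns.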
The analysis
To analyse a sentence, you can use one of the following procedures.
- Scheme Procedure: mecab-parse tagger str
  Parses the sentence str with tagger and returns a node.
- Scheme Procedure: mecab-parse-to-str tagger str
  Parses the sentence str with tagger and returns a string containing the result. The resulting string can be affected by options such as -O or --node-format.
- Scheme Procedure: mecab-split tagger str
  Returns the list of words in str.
- Scheme Procedure: mecab-features tagger str
  Returns the list of features associated with each word in str, in the same order as mecab-split. Note that the strings are always in CSV format, and this cannot be affected by options such as -O or --node-format.
- Scheme Procedure: mecab-words tagger str
  Returns the list of words in dictionary form that are present in str, leaving out punctuation and declension.
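The list-returning procedures can be combined as in the sketch below, which prints each word next to its feature string. The sample sentence is only an illustration, and the tagger is created without arguments on the assumption that a default dictionary such as mecab-ipadic is available.

```scheme
(use-modules (mecab mecab))

(define tagger (mecab-new-tagger))
(define sentence "私は猫が好きです。")

;; Print each word with its CSV feature string. This relies on
;; mecab-features returning its results in the same order as mecab-split.
(for-each
 (lambda (word features)
   (format #t "~a\t~a~%" word features))
 (mecab-split tagger sentence)
 (mecab-features tagger sentence))

;; Print the dictionary forms only, without punctuation.
(format #t "~s~%" (mecab-words tagger sentence))

(mecab-destroy tagger)
```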
Nodes
Nodes can be manipulated with the following procedures.
- Scheme Procedure: node-feature node
  Returns the feature string of the given node.
- Scheme Procedure: node-surface node
  Returns the surface value of the given node, that is, the word as it is written in the sentence (not necessarily its dictionary form). Note that this function is not very reliable.
- Scheme Procedure: node-stat node
  Returns the type of the given node. The value is a number which is one of MECAB_NOR_NODE, MECAB_UNK_NODE, MECAB_BOS_NODE, MECAB_EOS_NODE or MECAB_EON_NODE.
- Scheme Procedure: node-next node
  Returns the node following the given node.
- Scheme Procedure: node-prev node
  Returns the node preceding the given node.
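A node returned by mecab-parse can be followed with node-next until the end-of-sentence node is reached. The sketch below rests on several assumptions: that node-stat returns the raw value from mecab.h, where MECAB_EOS_NODE is 3; that every parse ends with such a node; and that nodes are only used while their tagger is alive. The names eos-stat and node-features are hypothetical and not part of guile-mecab.

```scheme
(use-modules (mecab mecab))

;; Numeric value of MECAB_EOS_NODE in mecab.h. Whether the module also
;; exports a Scheme variable of that name is not assumed here.
(define eos-stat 3)

;; node-features is a hypothetical helper. It collects the feature
;; strings from FIRST-NODE up to, but not including, the
;; end-of-sentence node. If FIRST-NODE is the beginning-of-sentence
;; node, its feature string is included in the result.
(define (node-features first-node)
  (let loop ((node first-node) (acc '()))
    (if (= (node-stat node) eos-stat)
        (reverse acc)
        (loop (node-next node)
              (cons (node-feature node) acc)))))

(define tagger (mecab-new-tagger))

;; Walk the nodes before destroying the tagger that produced them.
(for-each
 (lambda (features)
   (display features)
   (newline))
 (node-features (mecab-parse tagger "猫が好きです。")))

(mecab-destroy tagger)
```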