Install ChaSen
http://chasen-legacy.sourceforge.jp/
http://sourceforge.jp/projects/chasen-legacy/
http://sourceforge.jp/projects/chasen-legacy/releases/
darts 0.32
http://chasen.org/~taku/software/darts/
Install chasen-2.4.4 and darts 0.32.
darts-0.32 # ./configure
darts-0.32 # make
darts-0.32 # make install
chasen-2.4.4 # ./configure
chasen-2.4.4 # make
chasen-2.4.4 # make install
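The same three steps are run in each source tree, darts first and then chasen. A minimal sketch of wrapping them in a helper follows; build_from_source is a hypothetical name, and the directory names in the example assume both tarballs were unpacked in the current directory.

```shell
#!/bin/sh
# build_from_source DIR: run ./configure, make and make install inside DIR.
# The subshell keeps the caller's working directory unchanged, and the
# && chain stops at the first step that fails.
build_from_source() {
    ( cd "$1" && ./configure && make && make install )
}

# Example (assumed directory names; make install usually needs root):
#   build_from_source darts-0.32 && build_from_source chasen-2.4.4
```

Chaining the two calls with && means chasen is only built once darts has installed cleanly.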
Install ipadic 2.7.0
Convert the .dic and .cha files to UTF-8
<?php
// Back up every .dic and .cha file in the current directory,
// then re-encode it to UTF-8 with nkf.
$files = scandir('.');
foreach ($files as $file) {
    if (preg_match('/(\.dic|\.cha)$/', $file)) {
        echo 'cp ' . $file . ' ' . $file . '.bk' . "\n";
        echo 'nkf -w ' . $file . ' > tmpfile' . "\n";
        system('cp ' . $file . ' ' . $file . '.bk'); // save backup
        system('nkf -w ' . $file . ' > tmpfile');    // encode to UTF-8
        system('mv tmpfile ' . $file);               // replace the original
    }
}
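The PHP script above only shells out to cp, nkf and mv, so the same conversion can be done directly in the shell. A rough sketch, not the method used in these notes: convert_dict_files is a hypothetical helper name, and it assumes the converter can act as a stdin-to-stdout filter (nkf reads standard input when no file is given).

```shell
#!/bin/sh
# convert_dict_files DIR FILTER [FILTER_ARGS...]
# Back up every .dic and .cha file in DIR as FILE.bk, then replace it with
# the output of FILTER, which must read stdin and write stdout.
convert_dict_files() {
    dir=$1
    shift
    for f in "$dir"/*.dic "$dir"/*.cha; do
        [ -f "$f" ] || continue        # skip a glob pattern that matched nothing
        cp "$f" "$f.bk" || return 1    # save backup
        # Re-encode into a temp file; on failure keep the original untouched.
        "$@" < "$f" > "$f.tmp" || { rm -f "$f.tmp"; return 1; }
        mv "$f.tmp" "$f"
    done
}

# Example, run inside the ipadic-2.7.0 directory:
#   convert_dict_files . nkf -w
```

Unlike the PHP version, this stops at the first file that fails to convert instead of overwriting it with an empty tmpfile.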
[ipadic-2.7.0]# ./configure
[ipadic-2.7.0]# `chasen-config --mkchadic`/makemat -i w
parsing grammar.cha
parsing cforms.cha
parsing ctypes.cha
parsing connect.cha
table size: 2229
lines: .................................................. 24576
number of states: 2446
bi-gram: ........................................ 20000
........ 24440
tri-gram: . 24576
matrix size: 2446x2229 -> 362x304
[ipadic-2.7.0]# `chasen-config --mkchadic`/makeda -i w chadic *.dic
parsing grammar.cha
parsing cforms.cha
parsing table.cha
parsing dictionaries...
Adj.dic Adnominal.dic Adverb.dic Auxil.dic Conjunction.dic Filler.dic Interjection.dic Noun.adjv.dic Noun.adverbal.dic Noun.demonst.dic Noun.dic Noun.nai.dic Noun.name.dic Noun.number.dic Noun.org.dic Noun.others.dic Noun.place.dic Noun.proper.dic Noun.verbal.dic Onebyte.dic Others.dic Postp-col.dic Postp.dic Prefix.dic Suffix.dic Symbol.dic Verb.dic
379012 entries
325968 keys
# make install
# cd /usr/local/etc
# nkf -w chasenrc > tmp
# mv tmp chasenrc
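After converting chasenrc and the dictionary sources, it can be worth confirming that they really are UTF-8. A minimal sketch, assuming iconv is installed (it is not part of the steps above); is_utf8 is a hypothetical helper name.

```shell
#!/bin/sh
# is_utf8 FILE: succeed if FILE decodes cleanly as UTF-8.
# iconv exits non-zero when it meets a byte sequence that is not valid
# in the source encoding, so a UTF-8 to UTF-8 pass works as a validator.
is_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" > /dev/null 2>&1
}

# Example:
#   is_utf8 /usr/local/etc/chasenrc || echo "chasenrc is not UTF-8"
```

A pure-ASCII file also passes, since ASCII is a subset of UTF-8, so this only catches files that still contain bytes from another encoding.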
# echo "茶筅でお茶をマゼマゼ" | chasen -i w
茶筅 チャセン 茶筅 名詞-一般
で デ で 助詞-格助詞-一般
お茶 オチャ お茶 名詞-一般
を ヲ を 助詞-格助詞-一般
マゼマゼ 未知語
EOS
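In the output above, マゼマゼ is tagged 未知語 (unknown word), meaning it is not in the dictionary. A small sketch for listing such tokens, assuming the whitespace-separated column layout shown above; unknown_words is a hypothetical helper name.

```shell
#!/bin/sh
# unknown_words: read ChaSen output on stdin and print the surface form
# (first column) of every token that has a column equal to 未知語.
unknown_words() {
    awk '{
        for (i = 2; i <= NF; i++) {
            if ($i == "未知語") { print $1; break }
        }
    }'
}

# Example:
#   echo "茶筅でお茶をマゼマゼ" | chasen -i w | unknown_words
```

Fed the six output lines shown above, it prints only マゼマゼ.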