
  1. NLTK
    • Start Python in interactive shell mode
      	python
    • In the interactive shell, enter the following commands
      	import nltk
      	nltk.download()
      • NLTK Downloader
      • d) Download l) List c) Config h) Help q) Quit
        	Downloader> d
      • Download which package (l=list; x=cancel)?
        	Identifier> book
  2. NLTK Japanese Corpora - Japanese corpora usable with NLTK
    • http://lilyx.net/pages/nltkjapanesecorpus.html
      	wget http://nlp.kuee.kyoto-u.ac.jp/~hasimoto/KNBC_v1.0_090925.tar.bz2
      	tar xjvf KNBC_v1.0_090925.tar.bz2
      	mv KNBC_v1.0_090925 knbc
      	mv knbc nltk_data/corpora
      	cd nltk_data/corpora/   # change into the corpora directory
      	ls                      # knbc should be listed here
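The tar/mv steps above can also be scripted with Python's standard library. This is a minimal sketch and is not part of NLTK. `install_knbc` is a hypothetical helper. It assumes the archive has already been downloaded (for example with the wget command above) and that the target is `~/nltk_data`, one of the directories NLTK searches by default.

```python
import shutil
import tarfile
from pathlib import Path


def install_knbc(archive, nltk_data=None, extracted_name="KNBC_v1.0_090925"):
    """Extract the KNBC archive and move it to <nltk_data>/corpora/knbc.

    Hypothetical helper mirroring the shell steps:
      tar xjvf ...; mv KNBC_v1.0_090925 knbc; mv knbc nltk_data/corpora
    """
    archive = Path(archive)
    root = Path(nltk_data) if nltk_data is not None else Path.home() / "nltk_data"
    corpora = root / "corpora"
    corpora.mkdir(parents=True, exist_ok=True)      # like mkdir -p
    target = corpora / "knbc"
    if target.exists():
        raise FileExistsError(f"{target} already exists")

    staging = archive.parent
    # Use the safe "data" extraction filter when this Python version provides it.
    extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(archive, "r:bz2") as tar:     # same as tar xjf
        tar.extractall(staging, **extra)

    extracted = staging / extracted_name
    if not extracted.is_dir():
        raise FileNotFoundError(f"expected {extracted} after extraction")
    shutil.move(str(extracted), str(target))       # rename and move in one step
    return target
```

For example, running `install_knbc("KNBC_v1.0_090925.tar.bz2")` in the download directory should leave the corpus at `~/nltk_data/corpora/knbc`. The helper refuses to overwrite an existing `knbc` directory, so a second run cannot silently replace a working install.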

Possibly relevant parts of the NLTK documentation (written by Ashihara)

  • SyntaxCorpusReader: http://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.api.SyntaxCorpusReader-class.html
    • KNBCorpusReader is built as a subclass of SyntaxCorpusReader.
  • CorpusReader: http://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.api.CorpusReader-class.html
    • The parent class (superclass) of SyntaxCorpusReader. Some of its other subclasses might also be usable.
    • Subclasses of CorpusReader: a separate subclass appears to be derived for each target corpus.
             * SyntaxCorpusReader
             * xmldocs.XMLCorpusReader
             * cmudict.CMUDictCorpusReader
             * plaintext.PlaintextCorpusReader
             * tagged.TaggedCorpusReader
             * chunked.ChunkedCorpusReader
             * conll.ConllCorpusReader
             * ieer.IEERCorpusReader
             * ipipan.IPIPANCorpusReader
             * indian.IndianCorpusReader
             * nombank.NombankCorpusReader
             * ppattach.PPAttachmentCorpusReader
             * propbank.PropbankCorpusReader
             * senseval.SensevalCorpusReader
             * string_category.StringCategoryCorpusReader
             * wordlist.WordListCorpusReader
             * switchboard.SwitchboardCorpusReader
             * timit.TimitCorpusReader
             * toolbox.ToolboxCorpusReader
             * wordnet.WordNetCorpusReader
             * wordnet.WordNetICCorpusReader
             * ycoe.YCOECorpusReader
  • Getting NLTK Japanese to work
    • Rename KNBC_v1.0_090925 to knbc (command: mv KNBC_v1.0_090925 knbc).
    • Place it under nltk_data/corpora, not directly under nltk_data (command: mv knbc nltk_data/corpora).
  • Using nltk.Text.generate() with NLTK Japanese
    • Try adding the following code inside the demo function of knbcorpus.py.
    • Put it right below the line knbc = LazyCorpusLoader('knbc/corpus1', KNBCorpusReader, sorted(fileids, key=_knbc_fileids_sort), encoding='euc-jp').
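         import nltk
         sents = knbc.sents()  # one list of words per sentence
         sentList = []
         for sent in sents:
             for word in sent:
                 sentList.append(word.encode('utf-8'))  # word is a unicode object, so convert it to a UTF-8 byte string (Python 2)
         text = nltk.Text(sentList)
         text.generate()  # prints random text generated from the corpus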
         >>>>>> Sample output
         ［ 携帯 電話 会社 識別 番号 を 知ら ない が 、 年
         ごと に 自動 で フォルダ 分け さ れて る みんな も
         お 勧め です 。 このような 居酒屋 は 不可欠です
         ね 。 ［ スポーツ ］ 運動 は 好きな 方 な のです
         が 、 携帯 電話 機種 の キー が 隣接 して おり 、
         それ は ひるがえせば 携帯 が 艇 して いて 充電
         の 持ち が 悪く なって きた 。 彼女 は ぼく が 彼
         ら の サービス を 実施 中 です 。 よく ドンくさい
         って 言わ れる 始末 。 ■ の 「 お 遊び 」 的な
         ツール から 緊急
    • About the unicode type: http://lab.hde.co.jp/2008/08/pythonunicodeencodeerror.html
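The unicode link above and the encoding='euc-jp' argument in the loader address the same point: bytes become readable text only when decoded with the codec they were written in. The small Python 3 sketch below shows a correct EUC-JP round trip. It also shows the garbled result of decoding the same bytes as Latin-1. The helper names `roundtrip` and `mojibake` are illustrative and do not come from NLTK.

```python
def roundtrip(text, codec="euc-jp"):
    """Encode str -> bytes with `codec`, then decode the bytes back to str."""
    data = text.encode(codec)
    return data, data.decode(codec)


def mojibake(text, codec="euc-jp", wrong="latin-1"):
    """Encode with `codec` but decode with the `wrong` codec."""
    return text.encode(codec).decode(wrong)


if __name__ == "__main__":
    word = "携帯"
    data, back = roundtrip(word)
    print(data)           # the raw EUC-JP bytes
    print(back == word)   # True: same codec in both directions
    garbled = mojibake(word)
    print(garbled)        # unreadable Latin-1 characters
    # Latin-1 maps every byte value, so this damage can be undone:
    print(garbled.encode("latin-1").decode("euc-jp") == word)  # True
```

In Python 3, `str` is already Unicode. The `.encode('utf-8')` step in the Python 2 snippet above is therefore not needed when building `nltk.Text`.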

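NLTK 2's `Text.generate()` prints random text produced by a trigram language model. The sketch below is a deliberately simplified, stdlib-only illustration of that idea, not NLTK's implementation. It has no smoothing or backoff. When it reaches a word pair with no recorded successor, it simply restarts from a random known pair. The function names are hypothetical, and `flatten` plays the role of the nested loop over `knbc.sents()` in the snippet above.

```python
import random
from collections import defaultdict


def flatten(sents):
    """Flatten a list of sentences (each a list of words) into one word list."""
    return [word for sent in sents for word in sent]


def build_trigrams(words):
    """Map each pair of adjacent words to every word observed right after it."""
    table = defaultdict(list)
    for w1, w2, w3 in zip(words, words[1:], words[2:]):
        table[(w1, w2)].append(w3)
    return table


def generate(words, length=100, seed=None):
    """Return `length` words sampled by walking the trigram table.

    Successors are drawn in proportion to how often they were seen.
    No smoothing or backoff: at a dead end, restart from a random known pair.
    """
    table = build_trigrams(words)
    if not table:                     # fewer than three words: nothing to learn
        return list(words[:length])
    rng = random.Random(seed)
    pairs = list(table)
    w1, w2 = rng.choice(pairs)
    out = [w1, w2]
    while len(out) < length:
        successors = table.get((w1, w2))
        if successors:
            w3 = rng.choice(successors)   # duplicates in the list weight by frequency
            out.append(w3)
            w1, w2 = w2, w3
        else:
            w1, w2 = rng.choice(pairs)    # dead end: jump to a fresh pair
            out.extend([w1, w2])
    return out[:length]
```

This assumes `knbc` is loaded as in knbcorpus.py. Under that assumption, `print(" ".join(generate(flatten(knbc.sents()), length=100)))` should produce output of the same kind as the sample above. Passing `seed` makes runs repeatable.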