{"id":73,"date":"2007-04-21T13:53:01","date_gmt":"2007-04-21T13:53:01","guid":{"rendered":"http:\/\/newblog.mix1009.net\/?p=73"},"modified":"2020-10-27T21:28:45","modified_gmt":"2020-10-27T12:28:45","slug":"clucene-cjk-%eb%b6%84%ec%84%9d%ea%b8%b0","status":"publish","type":"post","link":"https:\/\/mix1009.net\/?p=73","title":{"rendered":"CLucene CJK Analyzer"},"content":{"rendered":"<p>While building a search engine with <a href=\"http:\/\/clucene.sourceforge.net\/\">CLucene<\/a>, I found almost no information on handling Korean text, so I implemented only very basic Korean support. I am releasing the source in the hope that it helps anyone looking into Korean processing in CLucene.<\/p>\n<p>It runs on both Linux and Windows, but for now I am releasing only the Linux source. On Windows I have not yet managed to compile it without removing the _MBCS define, so that needs more investigation. Apart from the character encoding conversion, the two versions of the source are identical.<\/p>\n<p>I used <span style=\"font-weight: bold\">clucene-core-0.9.16a<\/span>; set CLUCENEPATH in the Makefile and run make.
The Korean text in the source is encoded as <span style=\"font-weight: bold\">UTF-8<\/span>, and I tested on <span style=\"font-weight: bold\">CentOS 4.4 AMD64 Linux (LANG=ko_KR.UTF-8)<\/span>.<\/p>\n<p>CLucene's StandardTokenizer does contain CJK handling, but in next() the character falls into another branch (_istalpha) before it can be recognized as _CJK, so it is never classified as a CJK token. I therefore copied the tokenizer into <span style=\"font-weight: bold\">CJKTokenizer.cpp<\/span> and changed only the order of the comparisons. I am not sure why Hangul code points are matched by _istalpha in the first place.<\/p>\n<p><span style=\"font-weight: bold\">KoreanStemFilter.cpp<\/span> splits each CJK token into overlapping two-character units (bigrams). This technique is described in Lucene in Action, but it does not seem to be implemented in CLucene. For example, the token &#8220;\uac80\uc0c9\uc5d4\uc9c4&#8221; becomes the tokens &#8220;\uac80\uc0c9&#8221;, &#8220;\uc0c9\uc5d4&#8221;, and &#8220;\uc5d4\uc9c4&#8221;.
I named it KoreanStemFilter because I intend to add features such as stripping Korean postpositional particles, but for now it implements only the CJK bigram filter.<\/p>\n<p><span style=\"font-weight: bold\">ConvertUtil.cpp<\/span> uses iconv to convert UTF-8 to UTF-32LE. The Windows version uses MultiByteToWideChar() and WideCharToMultiByte() instead.<\/p>\n<p><span style=\"font-weight: bold\">CLuceneTest.cpp<\/span> is a test program that indexes three simple documents and lets you search them from the terminal. It is a lightly modified version of the CLucene demo source.<\/p>\n<p>Here is a sample run:<\/p>\n<div style=\"padding: 10px; background-color: rgb(228, 228, 228)\">$ <span style=\"font-weight: bold; color: rgb(0, 0, 255)\">.\/CLuceneTest<\/span><br \/>\nadding doc: doc1 &#8211; hahaha \ud55c\uae00\ub2e8\uc5b4 hohoho \ube44 bye \uac80\uc0c9\uc5d4\uc9c4<br \/>\nadding doc: doc2 &#8211; hello zaza \ud55c\uae00 \uae40\ud604\uc815 \uae40\uac74 \uac74\ubaa8 \uac80\uc0c9<br \/>\nadding doc: doc3 &#8211; goodbye \uae40\uac74\ubaa8 \uc11c\uc601\uc740 \uac80\uc0c9 \uc5d4\uc9c4 SG\uc6cc\ub108\ube44<br \/>\nIndexing took: 5 ms.<\/p>\n<p>Enter query string: <span style=\"color: rgb(0, 0, 255); font-weight: bold\">\uac80\uc0c9\uc5d4\uc9c4<\/span><br \/>\nSearching for: &#8220;\uac80\uc0c9 \uc0c9\uc5d4 \uc5d4\uc9c4&#8221;<\/p>\n<p>0. 
doc1 &#8211; hahaha \ud55c\uae00\ub2e8\uc5b4 hohoho \ube44 bye \uac80\uc0c9\uc5d4\uc9c4 (0.974307)<\/p>\n<p>Search took: 1 ms.<br \/>\nScreen dump took: 0 ms.<\/p>\n<p>Enter query string: <span style=\"color: rgb(0, 0, 255); font-weight: bold\">+\uac80\uc0c9 +\uc5d4\uc9c4<\/span><br \/>\nSearching for: +\uac80\uc0c9 +\uc5d4\uc9c4<\/p>\n<p>0. doc1 &#8211; hahaha \ud55c\uae00\ub2e8\uc5b4 hohoho \ube44 bye \uac80\uc0c9\uc5d4\uc9c4 (0.383675)<br \/>\n1. doc3 &#8211; goodbye \uae40\uac74\ubaa8 \uc11c\uc601\uc740 \uac80\uc0c9 \uc5d4\uc9c4 SG\uc6cc\ub108\ube44 (0.383675)<\/p>\n<p>Search took: 0 ms.<br \/>\nScreen dump took: 0 ms.<\/div>\n<p>\nThe source is available here:<br \/>\n<a href=\"https:\/\/mix1009.net\/wp-content\/uploads\/1\/1204400252.tgz\" class=\"aligncenter\"  \/>1204400252.tgz<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>While building a search engine with CLucene, I found almost no information on handling Korean text, so I implemented only very basic Korean support. I am releasing the source in the hope that it helps anyone looking into Korean processing in CLucene. It runs on both Linux and Windows, but for now I am releasing only the Linux source. On Windows I have not yet managed to compile it without removing the _MBCS define, so that needs more investigation. Apart from the character encoding conversion, the two versions of the source are identical.
I used clucene-core-0.9.16a; set CLUCENEPATH in the Makefile and run make. The Korean text in the source [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":[],"categories":[19],"tags":[68,66,67,37],"_links":{"self":[{"href":"https:\/\/mix1009.net\/index.php?rest_route=\/wp\/v2\/posts\/73"}],"collection":[{"href":"https:\/\/mix1009.net\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mix1009.net\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mix1009.net\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mix1009.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=73"}],"version-history":[{"count":1,"href":"https:\/\/mix1009.net\/index.php?rest_route=\/wp\/v2\/posts\/73\/revisions"}],"predecessor-version":[{"id":315,"href":"https:\/\/mix1009.net\/index.php?rest_route=\/wp\/v2\/posts\/73\/revisions\/315"}],"wp:attachment":[{"href":"https:\/\/mix1009.net\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=73"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mix1009.net\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=73"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mix1009.net\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=73"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}