

Patiphon Ongartittichai. Benchmarking of Thai-spelling correction algorithms. Master's Degree (Data Science). Chiang Mai University. Library: Chiang Mai University, 2568 [2025].

Title

Benchmarking of Thai-spelling correction algorithms

Title Alternative

การเปรียบเทียบสมรรถนะของอัลกอริทึมการแก้ไขการสะกดคำภาษาไทย (English: A performance comparison of Thai spelling-correction algorithms)

Creator

Name: Patiphon Ongartittichai

Subject

LCSH: Spelling errors

LCSH: Thai language -- Orthography and spelling

LCSH: Thai language -- Data processing

LCSH: Thai language -- Machine translating

LCSH: Language and languages -- Orthography and spelling

LCSH: Algorithms

Description

Abstract: This research compares the performance of algorithms for detecting and correcting spelling errors (typos) in Thai in terms of accuracy and processing time, focusing on how word segmentation methods combine with spelling-correction algorithms, in order to identify the most suitable approach for building Thai natural language processing (Thai NLP) tools. The experiments used three Thai datasets of human-written text drawn from social media and news articles: Thai Toxicity Tweet, Wisesight Sentiment, and ThaiSum. The data were preprocessed and segmented into words with newmm, deepcut, and attacut, and spelling errors were then detected and corrected with four algorithms: Levenshtein distance, Hunspell, Peter Norvig's spelling corrector, and Word2Vec. The results show that pairing attacut with Peter Norvig's algorithm gave the highest accuracy, while pairing newmm with Hunspell was the fastest. Because each method has its own strengths and weaknesses, the choice should depend on whether accuracy or speed is the priority. The research also presents a reusable experimental framework for developers and researchers who want to evaluate or further develop Thai spelling-correction systems.
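The abstract names the correction algorithms without showing how they work. The Python sketch below illustrates two of them, nearest-word lookup by Levenshtein distance and a Norvig-style candidate-generation corrector, plus a tiny accuracy-and-timing harness in the spirit of the benchmark. It is not the thesis's implementation: the word-frequency table `WORD_COUNTS`, the alphabet, the test pairs in `TEST_PAIRS`, and every function name are hypothetical; the input is assumed to be already word-segmented (in practice by a segmenter such as newmm, deepcut, or attacut); and Hunspell and Word2Vec are not shown.

```python
import time
from collections import Counter

# Hypothetical toy word-frequency table; a real benchmark would count words
# in a large Thai corpus instead of using these hand-picked numbers.
WORD_COUNTS = Counter({"ภาษา": 50, "ไทย": 40, "คำ": 30, "ผิด": 20, "สวัสดี": 10})

# Assumed alphabet for candidate generation: every character that occurs in
# the vocabulary (a fuller version would use the whole Thai Unicode block).
ALPHABET = sorted({ch for word in WORD_COUNTS for ch in word})

# Hypothetical (misspelled token, expected correction) pairs; tokens are
# assumed to come out of a word segmenter already.
TEST_PAIRS = [("ภาษ", "ภาษา"), ("ไทน", "ไทย"), ("ผืด", "ผิด"), ("ำค", "คำ")]


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute, free when equal
            ))
        prev = curr
    return prev[-1]


def correct_levenshtein(token: str) -> str:
    """Return the vocabulary word closest to token; ties go to the more frequent word."""
    if token in WORD_COUNTS:
        return token
    return min(WORD_COUNTS, key=lambda w: (levenshtein(token, w), -WORD_COUNTS[w]))


def known(words) -> set[str]:
    """Keep only the strings that are in the vocabulary."""
    return {w for w in words if w in WORD_COUNTS}


def edits1(word: str) -> set[str]:
    """All strings one edit away: delete, transpose, replace, or insert a character."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [left + right[1] + right[0] + right[2:] for left, right in splits if len(right) > 1]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct_norvig(token: str) -> str:
    """Norvig-style corrector: prefer known words at edit distance 0, then 1, then 2,
    picking the most frequent candidate; keep the token if nothing is found."""
    candidates = (known([token])
                  or known(edits1(token))
                  or known(e2 for e1 in edits1(token) for e2 in edits1(e1))
                  or {token})
    return max(candidates, key=lambda w: WORD_COUNTS[w])


def benchmark(corrector, pairs):
    """Return (accuracy, elapsed seconds) of a corrector on (typo, expected) pairs."""
    start = time.perf_counter()
    hits = sum(corrector(typo) == expected for typo, expected in pairs)
    return hits / len(pairs), time.perf_counter() - start


if __name__ == "__main__":
    for name, fn in [("Levenshtein", correct_levenshtein), ("Norvig", correct_norvig)]:
        accuracy, seconds = benchmark(fn, TEST_PAIRS)
        print(f"{name}: accuracy={accuracy:.2f}, time={seconds:.6f}s")
```

The two correctors trade work differently: the Levenshtein lookup compares the token against every vocabulary word, so its cost grows with vocabulary size, while the Norvig-style corrector generates candidates, so its cost grows with word length and alphabet size. Plain Levenshtein distance also counts a transposition such as `ำค` to `คำ` as two edits, whereas `edits1` treats it as one.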


Publisher

Chiang Mai University. Library

Address: Chiang Mai

Email: cmulibref@cmu.ac.th

Contributor

Name: Phasit Charoenkwan

Role: Advisor

Date

Created: 2568 (Buddhist Era; 2025 CE)

Modified: 2569-02-22 (Buddhist Era; 2026-02-22 CE)

Issued: 2025-12-25

Type

Thesis

Format

application/pdf

Language

eng

Thesis

DegreeName: Master of Science

Level: Master's Degree

Discipline: Data Science

Grantor: Chiang Mai University

Rights

RightsAccess:

No.	File name	File size
1	630632072.pdf	1.59 MB


Creator: Patiphon Ongartittichai

Title	Institution	Creator	Contributor	Type
Benchmarking of Thai-spelling correction algorithms	Chiang Mai University	Patiphon Ongartittichai	Phasit Charoenkwan	Thesis

Contributor: Phasit Charoenkwan

Title	Institution	Contributors	Creator	Type
The Application of SECI model to develop a Chinese guidebook for Thai hotel front office staff	Chiang Mai University	Taksina Kunarucks; Danaitun Pongpatcharatorntep; Phasit Charoenkwan	Xiaoxia Wang	Thesis
Interpretable model for Thai sentiment analysis using Zero-Shot learning for feature extraction	Chiang Mai University	Phasit Charoenkwan; Pree Thiengburanathum	Thanakorn Chaisen	Thesis
Collaboration analysis of Orthopaedic publications between year 2010-2019	Chiang Mai University	Phasit Charoenkwan	Thiraphat Tanphiriyakun	Thesis
High-throughput computational prediction of broadly neutralizing antibody against dengue envelope protein using machine learning algorithm	Khon Kaen University	Chonlatip Pipattanaboon; Sorujsiri Charoensudjai; Phasit Charoenkwanesai	Piyatida Natsrita (ปิยะธิดา นาทศรีทา)	Thesis
Analyzing customer behavior in walking street markets using deep learning techniques	Chiang Mai University	Phasit Charoenkwan	Manaschai Aonon	Thesis
Benchmarking of Thai-spelling correction algorithms	Chiang Mai University	Phasit Charoenkwan	Patiphon Ongartittichai	Thesis

The TDC database can be accessed at http://www.thailis.or.th/tdc/, http://dcms.thailis.or.th/tdc/, or http://tdc.thailis.or.th/tdc/

ThaiLIS is the Thailand Library Integrated System.
Supported by the Office of Information Technology Administration for Educational Development
Ministry of Higher Education, Science, Research and Innovation
328 Si Ayutthaya Road, Thung Phaya Thai, Ratchathewi, Bangkok 10400. Tel. 02-232-4000
