A study of influencing factors affecting Thai text summary and a technique development on applying natural language processing with ontology-based knowledge for Thai text summary
Abstract:
A text summarization system distills the most important information from original documents
into short passages that help readers grasp the main idea quickly. However, Thai text
summarization is still at an early stage in developing mechanisms for automatically summarizing
documents. Building an efficient summarization system is difficult because of the complexity of
the Thai language, which must be analyzed and interpreted. Therefore, the objectives of this
thesis are to study the methodology, design a summarization algorithm, and develop and assess
the effectiveness of a Thai text summarization system. The researcher applies natural language
processing with ontology-based knowledge. In addition, we apply both extraction-based
summarization, which shortens the content by filtering out some statements and words, and
abstraction-based summarization; this combination is appropriate for summarizing academic
journal articles. Users select the original document they want to summarize, and the system
then summarizes its content automatically, which is easy and saves time.
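The extraction step described above can be pictured as sentence scoring. The following is a minimal sketch, not the thesis's algorithm: it assumes the Thai text has already been word-segmented into token lists (Thai does not mark word boundaries with spaces), and it models the ontology as a hypothetical dictionary mapping concept terms to bonus weights. Stop words are filtered out, each sentence is scored by document term frequency plus the ontology bonus, and the top k sentences are returned in their original order. The abstraction-based step is not covered.

```python
from collections import Counter


def score_sentences(sentences, ontology, stopwords=frozenset()):
    """Score each pre-tokenized sentence.

    sentences: list of token lists (assumed already word-segmented).
    ontology: hypothetical mapping from concept term to bonus weight.
    stopwords: words filtered out before scoring.
    """
    # Document-level term frequencies, ignoring filtered words.
    tf = Counter(t for s in sentences for t in s if t not in stopwords)
    scores = []
    for tokens in sentences:
        content = [t for t in tokens if t not in stopwords]
        if not content:
            scores.append(0.0)
            continue
        # Frequency plus ontology bonus, normalized by sentence length
        # so that long sentences are not favored automatically.
        total = sum(tf[t] + ontology.get(t, 0.0) for t in content)
        scores.append(total / len(content))
    return scores


def extract_summary(sentences, ontology, k=2, stopwords=frozenset()):
    """Return the k highest-scoring sentences in their original order."""
    scores = score_sentences(sentences, ontology, stopwords)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]
```

In practice the token lists could come from a Thai word segmenter such as PyThaiNLP's `word_tokenize`; the ontology weights here are placeholders, whereas a real ontology would also relate synonyms and broader or narrower concepts.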
In this research, we study the key factors that affect summarization. The Kaiser-Meyer-Olkin
(KMO) statistic and Bartlett's Test of Sphericity are used to evaluate the adequacy of the sample,
and Confirmatory Factor Analysis (CFA) is used to verify the factor structure. The sample group
consists of 96 experts in Thai précis writing. For all of the studied aspects (input factors,
process factors and output factors), the P-value is greater than .05, meaning the difference
between the model and the empirical data is not significant at the .05 level; all analyzed factors
therefore fit the empirical data well. The efficiency of the system was assessed as good
(X̄ = 4.17, S.D. = 0.78). Comparing the output of the Thai text summarization system with that
of the Thai précis experts, the average accuracy of the highlighted sentences is 50.01% and the
average accuracy of the summary output is 70.08%. We conclude that the algorithm and the
developed system can be used effectively for news summarization.
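The sample-adequacy checks named above follow standard formulas. The sketch below uses NumPy to compute Bartlett's statistic, chi-square = -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom for a p x p correlation matrix R, and the overall KMO measure, which compares the squared correlations with the squared partial correlations derived from the inverse of R. The 3 x 3 correlation matrix used in the test is made up for illustration; only n = 96 comes from the study.

```python
import numpy as np


def bartlett_sphericity(R, n):
    """Bartlett's test of sphericity for a p x p correlation matrix R
    estimated from n observations. H0: R is the identity matrix.
    Returns (chi-square statistic, degrees of freedom)."""
    R = np.asarray(R, dtype=float)
    p = R.shape[0]
    # slogdet is numerically safer than log(det(R)) for near-singular R.
    _, logdet = np.linalg.slogdet(R)
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    df = p * (p - 1) // 2
    return stat, df


def kmo(R):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.asarray(R, dtype=float)
    P = np.linalg.inv(R)
    d = np.sqrt(np.diag(P))
    # Partial correlations: a_ij = -P_ij / sqrt(P_ii * P_jj).
    A = -P / np.outer(d, d)
    off = ~np.eye(R.shape[0], dtype=bool)  # mask selecting i != j
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(A[off] ** 2)
    return r2 / (r2 + a2)
```

If SciPy is available, the p-value of Bartlett's test is `scipy.stats.chi2.sf(stat, df)`; KMO values closer to 1 indicate that the data are better suited to factor analysis.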
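The abstract does not state how the highlighted-sentence accuracy is computed. One plausible reading, assumed here purely for illustration, is the percentage of expert-selected sentences that the system also selects, averaged over the evaluated documents. The function names and the sentence-index representation are hypothetical.

```python
def sentence_overlap_accuracy(system_ids, expert_ids):
    """Percentage of expert-selected sentences also selected by the system.
    Hypothetical metric; sentences are identified by their index."""
    expert = set(expert_ids)
    if not expert:
        raise ValueError("expert selection is empty")
    return 100.0 * len(set(system_ids) & expert) / len(expert)


def average_accuracy(pairs):
    """Mean accuracy over (system_ids, expert_ids) pairs, one per document."""
    if not pairs:
        raise ValueError("no documents to evaluate")
    scores = [sentence_overlap_accuracy(s, e) for s, e in pairs]
    return sum(scores) / len(scores)
```

For example, a document where the system picks sentences 0, 2 and 5 while the expert picks 0, 2, 3 and 5 scores 75%; averaging that with a second document scoring 50% gives 62.5%.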