Abstract:
Natural Language Processing (NLP) is the field that enables computers to understand, interpret, and use human language for communication. One of its tasks is text generation: automatically producing coherent narratives, which is particularly useful for complex, narrative-rich textual content. The main mechanism assembles words, phrases, or groups of words into coherent sentences, which are then combined into meaningful content that humans can understand. This research designs and develops a software machine that generates Thai sentences and stores them in a Thai sentence repository for use in subsequent research on summarization. The design covers both the architecture of the machine and its methods. It is divided into two main parts: (1) generating a community dictionary of Thai words classified by word type, and (2) generating sentences by applying the cross-product (Cartesian product) operation of relational algebra on a database, which serves as the control rule so that generated sentences follow Thai syntax patterns. In the experiment, words were imported from a small city dictionary of 30,000 words, and sentences were generated for 21 Thai sentence patterns. The results show that the machine can generate a very large number of sentences, up to 7.63926 × 10^16. Quality was assessed by checking whether the generated sentences are readable and semantically correct. On average, 36.70% of the sentences are readable and semantically correct, with a minimum of 13.33% and a maximum of 64.00%. Broken down by sentence length, two-word sentences have a readability and correctness rate between 44.00% and 64.00%, averaging 53.05%; three-word sentences range from 22.33% to 57.67%, averaging 34.57%; and four-word sentences range from 13.33% to 21.00%, averaging 18.00%.
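As a rough illustration of the cross-product step described in the abstract, the following minimal sketch assumes a single database table of words tagged with a word type and treats each sentence pattern as a sequence of word types. The table layout, the names `LEXICON`, `PATTERNS`, `build_db`, and `generate`, and the five toy words are hypothetical and chosen only for illustration; they are not the authors' schema, dictionary, or patterns. It shows why the number of candidate sentences grows so quickly: each pattern yields the product of the sizes of its word classes.

```python
import sqlite3

# Hypothetical toy lexicon of (word, word_type) pairs.
# The dictionary described in the abstract holds 30,000 words.
LEXICON = [
    ("แมว", "NOUN"),   # cat
    ("นก", "NOUN"),    # bird
    ("ปลา", "NOUN"),   # fish
    ("กิน", "VERB"),   # eat
    ("เห็น", "VERB"),  # see
]

# Hypothetical sentence patterns, each a sequence of word types.
PATTERNS = [("NOUN", "VERB"), ("NOUN", "VERB", "NOUN")]


def build_db(lexicon):
    """Load the lexicon into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE words (word TEXT, word_type TEXT)")
    conn.executemany("INSERT INTO words VALUES (?, ?)", lexicon)
    return conn


def generate(conn, pattern):
    """Return every sentence matching the pattern.

    One aliased copy of the words table is used per slot; the copies are
    combined with CROSS JOIN (the relational cross product), and each
    slot is restricted to the word type the pattern requires.
    """
    aliases = [f"w{i}" for i in range(len(pattern))]
    select = " || ".join(f"{a}.word" for a in aliases)  # Thai is written without spaces
    joins = " CROSS JOIN ".join(f"words AS {a}" for a in aliases)
    where = " AND ".join(f"{a}.word_type = ?" for a in aliases)
    sql = f"SELECT {select} FROM {joins} WHERE {where}"
    return [row[0] for row in conn.execute(sql, pattern)]


if __name__ == "__main__":
    conn = build_db(LEXICON)
    for pattern in PATTERNS:
        sentences = generate(conn, pattern)
        # 3 nouns x 2 verbs = 6; 3 nouns x 2 verbs x 3 nouns = 18
        print(pattern, len(sentences))
```

Because the cross product enumerates every combination regardless of meaning, many outputs are grammatical in form but not semantically sensible, which is consistent with the readability rates falling as sentence length increases.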