Title: Data Augmentation with GPT-3.5 for Vietnamese Natural Language Inference
Student: Mai Hieu Hien – 20521305 – KHCL2020.2 – Main Author
Supervisor: Dr. Luong Ngoc Hoang
Summary:
In this paper, we introduce a novel data augmentation method that uses the GPT-3.5 model to enhance Vietnamese datasets for the natural language inference task. Previous augmentation methods for Vietnamese natural language processing often modified only a few words (tokens), which limited the diversity of the generated sentences. In contrast, our approach uses GPT-3.5 to generate new sentences, yielding a more diverse augmented dataset. Additionally, we employ pointwise V-information (PVI) to filter mislabeled examples out of the generated corpus, improving the quality of the dataset. Experimental results show that the proposed method outperforms the baseline on both multilingual models (such as Multilingual BERT and XLM-RoBERTa) and monolingual models (such as PhoBERT), highlighting the effectiveness and potential of large language models for Vietnamese natural language processing.
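To make the filtering step in the summary more concrete, here is a minimal Python sketch of pointwise V-information filtering. It uses the standard definition, PVI(x → y) = log2 g[x](y) − log2 g'[∅](y), where g is a model fine-tuned on the real input and g' is a model fine-tuned on an empty input. The `Example` class, its field names, the sample probabilities, and the threshold of 0.0 are all assumptions made for illustration; the summary does not state which threshold or models the authors used for this step.

```python
import math
from dataclasses import dataclass


@dataclass
class Example:
    """One generated NLI pair (hypothetical container, not from the paper)."""
    premise: str
    hypothesis: str
    label: str
    # Probability of `label` from a model fine-tuned on (premise, hypothesis).
    p_with_input: float
    # Probability of `label` from a model fine-tuned on an empty input.
    p_null_input: float


def pvi(p_with_input: float, p_null_input: float) -> float:
    """Pointwise V-information in bits: log2 g[x](y) - log2 g'[null](y)."""
    return math.log2(p_with_input) - math.log2(p_null_input)


def filter_by_pvi(examples, threshold=0.0):
    """Keep examples whose PVI exceeds the (assumed) threshold.

    A low or negative PVI means the input makes the assigned label less
    likely than the label prior alone, which suggests a mislabeled example.
    """
    return [
        ex for ex in examples
        if pvi(ex.p_with_input, ex.p_null_input) > threshold
    ]


# Illustrative probabilities only; real values would come from trained models.
generated = [
    Example("premise A", "hypothesis A", "entailment", 0.90, 0.33),
    Example("premise B", "hypothesis B", "contradiction", 0.05, 0.33),
]
kept = filter_by_pvi(generated)
```

In this sketch the second example is dropped because the model assigns its label a lower probability with the input than without it. In practice the threshold would be tuned on held-out data rather than fixed at zero.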
"I would like to express my gratitude to Dr. Luong Ngoc Hoang for his dedicated guidance and for pointing out my limitations during the research and publication of this international scientific paper."
The RIVF conference is an international event in the fields of Communication Technology and Computer Science that gathers scientists and researchers from Vietnam and around the world to promote "Research, Innovation, and Vision for the Future" (RIVF). RIVF appears in the prestigious conference listings of SCOPUS and ISI Web of Science. The 2023 edition marks the 16th year of this conference.
RIVF 2023 focuses on various topics including Image, Language, and Speech Processing; Communication & Computer Networks, Network Security; Distributed Systems, Internet of Things, Cloud Computing; Artificial Intelligence, Data Science, Big Data Analysis, Intelligent Computing; Software Engineering, Information Systems, and Computational Models.
The RIVF conference series was initiated in 2003 through the efforts of professors including Patrick Bellot, Marc Bui, Duong Nguyen Vu, and colleagues from various countries, together with Professor Nguyen Dinh Tri and professors from the French-speaking Institute of Computer Science (Institut de la Francophonie pour l’Informatique) in Hanoi. By 2007, RIVF had evolved from a computer science conference of the French-speaking community held in Vietnam into an international conference under the IEEE (Institute of Electrical and Electronics Engineers) umbrella, covering both ICT and Computer Science, with a greater emphasis on quality.
For more details, visit: https://www.facebook.com/UIT.Fanpage/posts/pfbid0TMeB6Q44hC4v1ii9i...
Ha Bang - Media Collaborator, University of Information Technology
Nhat Hien - Translation Collaborator, University of Information Technology