From the idea of a coursework project to an internationally recognized A*-ranked conference paper in the field of computer science.

Recently, a UIT student excelled in natural language processing research, and their work was published at the "2023 Conference on Empirical Methods in Natural Language Processing (EMNLP 2023)," which is classified as A* according to CORE2023.

The paper, titled "ViSoBERT: A Pre-Trained Language Model for Vietnamese Social Media Text Processing," was written by two students, Nguyen Quoc Nam (the main author) and Phan Chau Thang (co-author), both currently studying in the Data Science program at the Faculty of Information Technology.

Let's meet and chat with student Quoc Nam to learn more about the research journey:

  1. What motivated you to choose the topic of natural language processing for your research?

I have a strong passion for research, especially in natural language processing. This area currently receives significant attention both inside and outside the research community. In Vietnam, however, specialized language models for social media have not been well developed.

Recognizing this gap, our team decided to develop the ViSoBERT language model. ViSoBERT is trained on data from major social media platforms in Vietnam, such as Facebook, YouTube, and TikTok.

During the research, the model achieved outstanding results across a range of natural language processing tasks on social media data. These results can contribute to enhancing performance and promoting the development of various applications, such as information filtering and social media monitoring in Vietnam.

  2. Please share your feelings when your research paper was accepted at EMNLP 2023, classified as A* by CORE2023.

I'm thrilled and proud. This achievement is the result of the continuous effort, time, and dedication we have invested. We are proud because it makes a small contribution to the natural language processing research community in Vietnam and worldwide.

However, success comes with responsibility. We will continue to make efforts, develop, and contribute to the fields of natural language processing and artificial intelligence. This achievement also serves as motivation for my future career in scientific research.

With relentless effort, Nguyen Quoc Nam has produced high-quality scientific research papers.

  3. What challenges and advantages did your team encounter during the research process?

The most significant challenge we faced was getting access to high-performance computers capable of running a large number of experiments, more than 300 in our case. We often needed to rent servers to run them. Our supervising lecturers provided great support in this regard, enabling us to complete our research successfully.

Additionally, analyzing and experimenting with the distinctive features of social media language, such as emojis, Teencode, and punctuation, was a challenging task that required a great deal of effort to achieve the best results.

Thang and I have worked together for a long time, so we understand each other's ideas, communicate effectively, and collaborate efficiently. We complement each other's strengths and weaknesses. For example, I am good at generating ideas, writing, arguing a case, and carrying out work rigorously according to plan. Thang is strong in designing and experimenting with language models, which gave us a diverse range of experiments. This made our research process smoother and led to evaluations from various perspectives, as required by our supervising lecturers.

  4. How did UIT support you in your studies and research?

This research paper originated from a small idea in our Data Science Capstone Project. We were fortunate to receive enthusiastic and dedicated support from our lecturers, MSc. Nguyen Van Kiet and MSc. Nguyen Duc Vu. From there, our small idea developed into a high-quality research paper published at an A* international conference in the field of natural language processing.

UIT is an excellent environment for students to develop themselves, whether it's in terms of soft skills, deep knowledge, or their future careers, including research. I'd like to express my gratitude to the Faculty of Information Technology and the lecturers at the University of Information Technology in particular and Ho Chi Minh City University of Technology in general, as they are always dedicated to teaching and supporting students.

To UIT students: Give your best effort, whether in research or work. Success will come to those who truly strive.

Some of Nguyen Quoc Nam's achievements as a student in the Data Science program, Faculty of Information Technology:

  • Main author of a paper at the "2023 Conference on Empirical Methods in Natural Language Processing" (A* ranking according to CORE2023)
  • Co-author of a paper at "The 10th IEEE International Conference on Data Science and Advanced Analytics (DSAA 2023)" (B ranking according to CORE2023)
  • Co-author of a paper at "The 12th International Symposium on Information and Communication Technology (SOICT 2023)"

Như Ý - Contributing Communications Partner at the University of Information Technology

Translated by: Ngoc Diem