Extracting Relationships between Entities in Vietnamese Texts

Pham Minh Man - CH1802054

Extracting relationships between entities is a crucial task in Natural Language Processing (NLP). It is a subtask of information extraction and is widely applied in areas such as knowledge graph construction, automatic question answering, and text summarization. With the continuous growth of data, especially textual data, the task has attracted significant attention from researchers both in Vietnam and internationally. However, research on relationship extraction for Vietnamese texts remains limited compared with languages such as English and Chinese. This thesis investigates the task for Vietnamese.

Practical Applications:

  • Web Mining: Mining data from the web to analyze competitors, extract the names of well-known people and popular products, compare prices, and analyze customer sentiment.
  • Business Intelligence: Assessing market information, such as new regulations affecting the business market and news on political relations between countries.

Contributions to Science:

  • Provides an overview of Vietnamese and international research on relationship extraction, its development trends, and related tasks such as coreference resolution.
  • Discusses and evaluates various relationship extraction and coreference resolution methods, and proposes directions for future research.

Thesis Achievements:

  • Offers a comprehensive picture of the relationship extraction task, the current state of research in Vietnam and internationally, and trends in relationship extraction and related tasks such as coreference resolution.
  • Compares BERT-based combined models for Vietnamese relationship extraction: models combining PhoBERT and XLM-RoBERTa are compared with single models such as PhoBERT, and the combined models achieve better results than the individual ones.
  • Proposes and evaluates new coreference resolution methods for Vietnamese texts to support the relationship extraction task, achieving F1 scores of 66.50%, 82.70%, and 76.26% on the MUC, B3, and CEAFe metrics, respectively.
  • Develops a demo system for relationship extraction in Vietnamese texts.
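The coreference F1 scores above are reported with standard cluster-comparison metrics. As a rough illustration of what two of them measure, the sketch below computes MUC (which counts the links needed to join mentions into clusters) and B3 (which averages per-mention cluster overlap) for clusters given as Python sets, following the commonly used textbook definitions. It is not the thesis's evaluation code: the function names and the toy mentions are hypothetical, and CEAFe is omitted because it additionally requires an optimal one-to-one alignment between gold and predicted clusters.

```python
from typing import Hashable, List, Set, Tuple

# Hypothetical helper types for this sketch: a cluster is a set of mention IDs.
Cluster = Set[Hashable]
Scores = Tuple[float, float, float]  # (precision, recall, F1)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def muc_recall(key: List[Cluster], response: List[Cluster]) -> float:
    """MUC recall of `response` against `key` (swap arguments for precision)."""
    # Map every mention to the index of the response cluster containing it.
    owner = {m: i for i, cluster in enumerate(response) for m in cluster}
    num, den = 0, 0
    for cluster in key:
        # Partitions of this key cluster induced by the response clusters;
        # a mention missing from the response forms its own partition.
        parts = {owner.get(m, ("missing", m)) for m in cluster}
        num += len(cluster) - len(parts)
        den += len(cluster) - 1
    return num / den if den else 0.0


def muc(key: List[Cluster], response: List[Cluster]) -> Scores:
    r = muc_recall(key, response)
    p = muc_recall(response, key)
    return p, r, f1(p, r)


def b_cubed_side(gold: List[Cluster], pred: List[Cluster]) -> float:
    """B3 recall of `pred` against `gold` (swap arguments for precision).

    Mentions absent from `pred` simply contribute zero overlap.
    """
    total = sum(len(g) for g in gold)
    score = 0.0
    for g in gold:
        for p in pred:
            overlap = len(g & p)
            score += overlap * overlap / len(g)
    return score / total if total else 0.0


def b_cubed(key: List[Cluster], response: List[Cluster]) -> Scores:
    r = b_cubed_side(key, response)
    p = b_cubed_side(response, key)
    return p, r, f1(p, r)


if __name__ == "__main__":
    # Toy example: gold clusters vs. a system output that misplaces mention "c".
    gold = [{"a", "b", "c"}, {"d", "e"}]
    system = [{"a", "b"}, {"c", "d", "e"}]
    print(f"MUC F1 = {muc(gold, system)[2]:.4f}")     # prints MUC F1 = 0.6667
    print(f"B3 F1 = {b_cubed(gold, system)[2]:.4f}")  # prints B3 F1 = 0.7333
```

On this toy input MUC penalizes the single wrong link, while B3 scores each mention by how much of its gold cluster it shares with its predicted cluster, which is why the two metrics can diverge noticeably, as they do in the results reported above.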

Limitations:

  • The thesis does not address data imbalance when training Vietnamese relationship extraction models.
  • The coreference resolution results have not yet been integrated into the relationship extraction demo system.

For more details, visit: https://fit.uit.edu.vn/index.php/tin-tuc/goc-hoc-tap/6485-rut-trich-quan...

Ha Bang - Media Collaborator, University of Information Technology

Translation: Nhat Hien