Student of the Faculty of Information Technology Publishes Scientific Paper in ISI Q1 Journal

Paper: "XLMR4MD: New Vietnamese Dataset and Framework for Detecting the Consistency of Description and Permission in Android Applications Using Large Language Models"

Student involved:

  • Nguyen Ngoc Qui - IT2018

Supervisors:

  • Dr. Nguyen Tan Cam
  • Mr. Nguyen Van Kiet

Abstract:

Google Play and other app platforms host a wide variety of Android applications along with their metadata. Within this metadata, descriptions and privacy policies explain what an application does and which permissions it uses, especially permissions related to sensitive information. Detecting inconsistencies between an application's description, its privacy policy, and the permissions extracted from its source code helps users decide whether to install and use it. In this study, we propose a new method based on a pre-trained language model to detect inconsistencies between the permissions inferred from the application description and privacy policy and the permissions extracted from the application's source code (APK file). Related work focuses on models trained on large-scale datasets, especially for high-resource languages such as English. Low-resource languages such as Vietnamese, however, still need more datasets for this task. To address this issue, we propose ViDPApp (Description and Privacy Policies of Applications in Vietnamese domains), a manually labeled dataset of over 12,000 sentences with an inter-annotator agreement (IAA) of over 85%. Additionally, we propose XLMR4MD, a framework using large language models that outperforms other machine learning models (LSTM, Bi-GRU-LSTM-CNN, WikiBERT, DistilBERT, mBERT, and PhoBERT). The framework achieves a best F1 score of 84.04% in detecting inconsistencies between Android application permissions and descriptions. It can be fine-tuned for 100 different languages, making it applicable to low-resource languages such as Vietnamese. The dataset is available for research purposes.
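To make the consistency check in the abstract concrete, the following is a minimal sketch of the comparison step only. It is not the authors' XLMR4MD framework: the `KEYWORD_CUES` table and the functions `permissions_from_text` and `find_inconsistencies` are hypothetical, and simple keyword matching stands in for the fine-tuned language model that would classify each sentence. English sentences are used for readability, whereas the dataset described above is Vietnamese.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical keyword cues standing in for a fine-tuned sentence classifier.
# A real system would predict permissions with a language model instead.
KEYWORD_CUES: Dict[str, List[str]] = {
    "CAMERA": ["camera", "photo", "scan"],
    "ACCESS_FINE_LOCATION": ["location", "gps", "nearby"],
    "READ_CONTACTS": ["contact", "address book"],
}


def permissions_from_text(sentences: Iterable[str]) -> Set[str]:
    """Return the permissions that the description text appears to disclose."""
    found: Set[str] = set()
    for sentence in sentences:
        lowered = sentence.lower()
        for permission, cues in KEYWORD_CUES.items():
            if any(cue in lowered for cue in cues):
                found.add(permission)
    return found


def find_inconsistencies(
    sentences: Iterable[str], declared: Iterable[str]
) -> Dict[str, Set[str]]:
    """Compare permissions inferred from text with those declared in the APK."""
    described = permissions_from_text(sentences)
    declared_set = set(declared)
    return {
        # Requested by the app but never mentioned in its description.
        "undisclosed": declared_set - described,
        # Mentioned in the description but not requested by the app.
        "described_only": described - declared_set,
    }


if __name__ == "__main__":
    description = [
        "Scan QR codes with your camera.",
        "Find nearby stores on the map.",
    ]
    apk_permissions = {"CAMERA", "ACCESS_FINE_LOCATION", "READ_CONTACTS"}
    report = find_inconsistencies(description, apk_permissions)
    print(sorted(report["undisclosed"]))     # ['READ_CONTACTS']
    print(sorted(report["described_only"]))  # []
```

In this sketch, a non-empty `undisclosed` set is the kind of inconsistency a user would care about: the app requests a sensitive permission that its description never explains.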

"I would like to extend my sincerest thanks to Dr. Nguyen Tan Cam and Mr. Nguyen Van Kiet. With willingness and dedication, you both guided us throughout the research process and pointed out significant limitations along the way. Your support has been a great source of encouragement, helping me overcome challenges and complete the research to the best of my ability. I sincerely appreciate the guidance and valuable knowledge you have shared with me. Once again, thank you both very much!"

Detailed Information: https://www.facebook.com/UIT.Fanpage/posts/pfbid033awLFtwtmYu8h7nzLMoiWU...

Hạ Băng - Media Collaborator, University of Information Technology

English version: Phan Huy Hoang