Friday, January 19, 2018

Paraphrase Identification in Vietnamese Documents

Title: Paraphrase Identification in Vietnamese Documents
Authors: Bach, N.X.; Oanh, T.T.; Hai, N.T.; Phuong, T.M.
Keywords: K-Nearest Neighbor;Maximum Entropy Model;Naive Bayes Classification;Paraphrase Identification;Semantic Similarity;Support Vector Machines
Issue Date: 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Citation: Scopus
Abstract: In this paper, we investigate the task of paraphrase identification in Vietnamese documents, that is, identifying whether two sentences have the same meaning. This task has been shown to be an important research direction with practical applications in natural language processing and data mining. We model the task as a classification problem and explore different types of features to represent sentences. We also introduce a paraphrase corpus for Vietnamese, vnPara, which consists of 3000 Vietnamese sentence pairs. We describe a series of experiments using various linguistic features and different machine learning algorithms, including Support Vector Machines, Maximum Entropy Model, Naive Bayes, and k-Nearest Neighbors. The results are promising, with the best model achieving up to 90% accuracy. To the best of our knowledge, this is the first attempt to solve the task of paraphrase identification for Vietnamese.
Description: Proceedings - 2015 IEEE International Conference on Knowledge and Systems Engineering, KSE 2015, 4 January 2016, Article number 7371778, Pages 174-179
URI: http://ieeexplore.ieee.org/document/7371778/
http://repository.vnu.edu.vn/handle/VNU_123/33594
ISBN: 978-146738013-3
Appears in Collections: VNU Hanoi (ĐHQGHN) articles in Scopus
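
The abstract frames paraphrase identification as classification over features computed from a sentence pair. A minimal sketch of that idea follows, using k-Nearest Neighbors (one of the four algorithms the abstract names). Everything else here is an assumption for illustration: the three surface features (Jaccard overlap, cosine similarity, length ratio), the whitespace tokenization (which splits Vietnamese into syllables rather than words), and the toy sentence pairs, which are invented and not taken from vnPara. The paper's actual feature set is not given in the abstract.

```python
import math
from collections import Counter


def tokens(sentence):
    # Assumption: lowercase + whitespace split; real Vietnamese word
    # segmentation would group syllables into words.
    return sentence.lower().split()


def jaccard(a, b):
    # Set overlap between the two token lists.
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def cosine(a, b):
    # Cosine similarity between bag-of-token count vectors.
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[w] * cb[w] for w in ca)
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def features(s1, s2):
    # Hypothetical feature vector for one sentence pair.
    t1, t2 = tokens(s1), tokens(s2)
    length_ratio = min(len(t1), len(t2)) / max(len(t1), len(t2), 1)
    return (jaccard(t1, t2), cosine(t1, t2), length_ratio)


def knn_predict(train, query, k=3):
    # train: list of (feature_vector, label); majority vote among the
    # k training points closest to `query` in Euclidean distance.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented toy pairs (NOT from vnPara): label 1 = paraphrase, 0 = not.
toy_pairs = [
    ("tôi thích ăn phở", "tôi rất thích ăn phở", 1),
    ("hôm nay trời mưa to", "hôm nay trời mưa rất to", 1),
    ("anh ấy đang đọc sách", "anh ấy đang đọc một cuốn sách", 1),
    ("tôi thích ăn phở", "hôm nay trời mưa to", 0),
    ("anh ấy đang đọc sách", "cô ấy đi chợ", 0),
    ("trời hôm nay đẹp", "tôi đi học bằng xe buýt", 0),
]
train = [(features(s1, s2), label) for s1, s2, label in toy_pairs]

if __name__ == "__main__":
    query = features("cô ấy đi chợ mỗi sáng", "cô ấy đi chợ vào mỗi sáng")
    print(knn_predict(train, query))
```

Purely lexical features like these cannot recognize paraphrases that share few words, which is presumably why the abstract mentions exploring several types of linguistic features.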
