
Proceedings of the Institute for System Programming of the RAS


Plagiarism Detection in Armenian Texts Using Intrinsic Stylometric Analysis

https://doi.org/10.15514/ISPRAS-2021-33(1)-14

Abstract

This work applies intrinsic stylometric methods to the task of detecting borrowed text (plagiarism) in Armenian. We study two formulations of the task: detecting whether the style changes within a document, and detecting the boundaries of style breaches. For both formulations we create synthetic examples containing borrowed passages for the academic, fiction, and news genres, and use these examples to evaluate the effectiveness of hierarchical clustering algorithms and other style breach detection models from the PAN conference series.
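To make the first formulation concrete, the following is a minimal sketch of style change detection by hierarchical clustering. It is an illustration under stated assumptions, not the authors' implementation: the three features, the function names, the average-linkage rule, and the distance threshold are all hypothetical choices, and a real system would normalize features and use a much richer feature set.

```python
import re
from itertools import combinations


def style_features(segment):
    """Three toy stylometric features (an assumption, not the paper's feature set)."""
    words = re.findall(r"\w+", segment)
    # Split on Latin sentence marks and the Armenian full stop (U+0589).
    sentences = [s for s in re.split(r"[.!?։]+", segment) if s.strip()]
    n_words = max(len(words), 1)
    return [
        sum(len(w) for w in words) / n_words,            # mean word length
        len(words) / max(len(sentences), 1),             # mean sentence length in words
        sum(segment.count(c) for c in ",;:") / n_words,  # punctuation marks per word
    ]


def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Repeatedly merges the two closest clusters and stops once the closest
    pair is farther apart than `threshold` (a hypothetical tuning parameter).
    Returns a list of clusters, each a list of segment indices.
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        (a, b), dist = min(
            (((a, b), linkage(clusters[a], clusters[b]))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda pair: pair[1],
        )
        if dist > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]  # a < b, so deleting b is safe
        del clusters[b]
    return clusters


def has_style_change(segments, threshold):
    """Flag a document when its segments do not collapse into one cluster."""
    vectors = [style_features(s) for s in segments]
    return len(agglomerative(vectors, threshold)) > 1


if __name__ == "__main__":
    segments = [
        "The cat sat. The dog ran. The sun set.",
        "A big red fox. The old man sat. An owl flew by.",
        "Notwithstanding considerable methodological difficulties, contemporary "
        "investigators, following established conventions, systematically "
        "documented extraordinarily complicated phenomena.",
    ]
    print(has_style_change(segments, threshold=3.0))
```

In this sketch a document is flagged when more than one cluster survives the threshold. The second formulation, locating breach boundaries, could then read boundaries off the positions where consecutive segments fall into different clusters.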

About the Authors

Eva Maksimovna YESHILBASHIAN
Russian-Armenian University
Armenia

Master's student in machine learning at the Faculty of Applied Mathematics and Informatics



Ariana Armenovna ASATRYAN
Russian-Armenian University
Armenia

Master's student at the Department of Mathematical Cybernetics



Tsolak Ghukasovich GHUKASYAN
Russian-Armenian University
Armenia

PhD student at the Department of System Programming






For citation:

YESHILBASHIAN Ye.M., ASATRYAN A.A., GHUKASYAN Ts.G. Plagiarism Detection in Armenian Texts Using Intrinsic Stylometric Analysis. Proceedings of the Institute for System Programming of the RAS (Proceedings of ISP RAS). 2021;33(1):209-224. (In Russ.) https://doi.org/10.15514/ISPRAS-2021-33(1)-14



Content is available under the Creative Commons Attribution 4.0 License.


ISSN 2079-8156 (Print)
ISSN 2220-6426 (Online)