Review of BERT and Semantic Textual Similarity (STS) Methods for Natural Language Processing Applications
Abstract
The significant progress in Natural Language Processing (NLP) is reflected in the growing prominence of Transformer-based models. Among these, BERT (Bidirectional Encoder Representations from Transformers) has revolutionised NLP through an effective pre-training and fine-tuning approach, offering deep, contextual text representations (Devlin et al., 2019), and it can be combined with Semantic Textual Similarity (STS) measurement. This study reviews the BERT method and its application to STS measurement. BERT is typically optimised through fine-tuning (Sun et al., 2019), whereas STS often relies on models such as Sentence-BERT to produce sentence embeddings efficiently (Reimers & Gurevych, 2019). This approach maps sentences into vectors in a semantic space, so that the similarity between texts can be measured with metrics such as cosine similarity. The results of the review show that applying BERT to STS improves accuracy in understanding and measuring the degree of semantic similarity between sentences or documents. The combination of BERT and STS offers a strong solution for a wide range of NLP applications, including text classification, text matching, content relevance measurement, information retrieval, document filtering, and question answering systems. Implementing BERT for STS opens opportunities for developing more intelligent and responsive NLP systems.
Keywords: BERT, Semantic Textual Similarity, NLP, Natural Language Processing, Machine Learning
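The similarity measurement described in the abstract — mapping sentences to vectors and comparing them with cosine similarity — can be illustrated with a minimal sketch. The toy 4-dimensional vectors below are assumptions standing in for real sentence embeddings (Sentence-BERT embeddings are typically 384- or 768-dimensional); only the cosine similarity computation itself reflects the method discussed.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two embedding vectors:
    1.0 = identical direction, 0 = unrelated, negative = opposed."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical low-dimensional "sentence embeddings" for illustration.
emb_a = np.array([0.2, 0.8, 0.1, 0.4])    # e.g. sentence A
emb_b = np.array([0.25, 0.75, 0.05, 0.5]) # paraphrase of A
emb_c = np.array([-0.6, 0.1, 0.9, -0.3])  # unrelated sentence

print(cosine_similarity(emb_a, emb_b))  # high: similar meaning
print(cosine_similarity(emb_a, emb_c))  # low: dissimilar meaning
```

In practice, a library such as Sentence-BERT would supply the embeddings, and the same cosine comparison would rank candidate texts for tasks like text matching or information retrieval.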