
Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get? (teaser)

9 views · 29/09/2022

Because of globalization, it is becoming more and more common to use multiple languages in a single utterance, a practice called code-switching. This results in special linguistic structures and therefore poses many challenges for Natural Language Processing. Existing models for language identification in code-switched data are all supervised, requiring annotated training data, which is only available for a limited number of language pairs. In this paper, we explore semi-supervised approaches that exploit out-of-domain monolingual training data. We experiment with word unigrams, word n-grams, character n-grams, Viterbi decoding, Latent Dirichlet Allocation, Support Vector Machines and Logistic Regression. The Viterbi model was the best semi-supervised model, with a weighted F1 score of 92.23%, whereas a fully supervised state-of-the-art BERT-based model scored 98.43%.
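To make the Viterbi idea concrete, here is a minimal sketch of word-level language identification over a two-state model (English and Spanish). It is not the paper's implementation: the toy word counts, the add-one smoothing, the `stay` probability of 0.8, and the names `COUNTS`, `emission_logprob` and `viterbi` are all assumptions chosen for illustration. The counts stand in for statistics that would be estimated from monolingual corpora.

```python
import math

LANGS = ("en", "es")

# Toy unigram counts standing in for monolingual corpora (invented for illustration).
COUNTS = {
    "en": {"i": 5, "want": 4, "to": 6, "eat": 3, "thanks": 2},
    "es": {"yo": 5, "quiero": 4, "comer": 3, "muchas": 3, "gracias": 4},
}
VOCAB = set().union(*(c.keys() for c in COUNTS.values()))


def emission_logprob(word, lang, alpha=1.0):
    """Add-alpha smoothed log P(word | lang); one extra slot covers unknown words."""
    counts = COUNTS[lang]
    total = sum(counts.values())
    return math.log((counts.get(word, 0) + alpha) / (total + alpha * (len(VOCAB) + 1)))


def viterbi(words, stay=0.8):
    """Return the most likely language tag per word (assumes exactly two languages)."""
    log_stay, log_switch = math.log(stay), math.log(1.0 - stay)
    # best[lang] holds (log score, tag path) of the best sequence ending in lang.
    best = {
        lang: (math.log(1.0 / len(LANGS)) + emission_logprob(words[0], lang), [lang])
        for lang in LANGS
    }
    for word in words[1:]:
        new_best = {}
        for cur in LANGS:
            # Pick the best previous state, charging a penalty for switching language.
            score, path = max(
                (best[prev][0] + (log_stay if prev == cur else log_switch), best[prev][1])
                for prev in LANGS
            )
            new_best[cur] = (score + emission_logprob(word, cur), path + [cur])
        best = new_best
    return max(best.values())[1]


tags = viterbi(["muchas", "gracias", "i", "want", "to", "eat"])
print(tags)
```

The transition penalty is what separates this from tagging each word independently: a single ambiguous or unknown word tends to inherit the language of its neighbours rather than triggering a spurious switch.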
