
Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get? (teaser)

9 views · 29/09/2022

Because of globalization, it is becoming more and more common to use multiple languages in a single utterance, also called code-switching. This results in special linguistic structures and therefore poses many challenges for Natural Language Processing. Existing models for language identification in code-switched data are all supervised, requiring annotated training data that is available for only a limited number of language pairs. In this paper, we explore semi-supervised approaches that exploit out-of-domain monolingual training data. We experiment with word uni-grams, word n-grams, character n-grams, Viterbi decoding, Latent Dirichlet Allocation, Support Vector Machines, and Logistic Regression. The Viterbi model was the best semi-supervised model, scoring a weighted F1 score of 92.23%, whereas a fully supervised state-of-the-art BERT-based model scored 98.43%.
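
The description mentions character n-gram models trained on out-of-domain monolingual data among the semi-supervised approaches. The following is a minimal sketch of that idea, not the paper's implementation: each token is labeled with whichever language's character n-gram model assigns it the higher smoothed probability. The tiny word lists, the n-gram order, and the add-alpha smoothing constant are illustrative assumptions standing in for real monolingual corpora.

from collections import Counter
import math

def char_ngrams(word, n=3):
    # Pad the word so boundary characters contribute n-grams too.
    padded = f"^{word.lower()}$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def train_ngram_counts(words, n=3):
    # Count character n-grams over a monolingual word list.
    counts = Counter()
    for w in words:
        counts.update(char_ngrams(w, n))
    return counts

def log_prob(word, counts, total, vocab, n=3, alpha=1.0):
    # Add-alpha smoothed log-probability of a word under one language's model.
    return sum(
        math.log((counts[g] + alpha) / (total + alpha * vocab))
        for g in char_ngrams(word, n)
    )

def label_utterance(tokens, es_counts, en_counts):
    # Assign each token to the language whose model scores it higher.
    es_total, en_total = sum(es_counts.values()), sum(en_counts.values())
    vocab = len(set(es_counts) | set(en_counts))
    labels = []
    for tok in tokens:
        es = log_prob(tok, es_counts, es_total, vocab)
        en = log_prob(tok, en_counts, en_total, vocab)
        labels.append("es" if es > en else "en")
    return labels

# Hypothetical training words; in practice these would be out-of-domain
# monolingual Spanish and English corpora, as in the abstract.
es_counts = train_ngram_counts(["hola", "gracias", "mucho", "bueno", "amigo"])
en_counts = train_ngram_counts(["hello", "thanks", "much", "good", "friend"])
print(label_utterance("much gracias my amigo".split(), es_counts, en_counts))

A sequence model such as the Viterbi decoding mentioned above would then smooth these per-token decisions by penalizing implausible language switches between adjacent words.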
