Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get? full


Sep 29, 2022

Because of globalization, it is becoming increasingly common to use multiple languages in a single utterance, a phenomenon called code-switching. This results in special linguistic structures and therefore poses many challenges for Natural Language Processing. Existing models for language identification in code-switched data are all supervised, requiring annotated training data, which is available for only a limited number of language pairs. In this paper, we explore semi-supervised approaches that exploit out-of-domain monolingual training data. We experiment with word unigrams, word n-grams, character n-grams, Viterbi decoding, Latent Dirichlet Allocation, Support Vector Machines, and Logistic Regression. The Viterbi model was the best semi-supervised model, with a weighted F1 score of 92.23%, whereas a fully supervised state-of-the-art BERT-based model scored 98.43%.
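To make the Viterbi idea concrete, here is a minimal sketch of token-level language identification that uses only monolingual word counts. It is an illustration under stated assumptions, not the model from the paper: the tiny corpora, the add-alpha smoothing, the fixed "stay in the same language" probability `p_stay`, and the function names `train_unigram` and `viterbi_langid` are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def train_unigram(tokens):
    """Count lowercased word frequencies in a monolingual corpus."""
    return Counter(t.lower() for t in tokens)


def viterbi_langid(sentence, models, p_stay=0.8, alpha=0.1):
    """Label each token with a language via Viterbi decoding.

    models: dict mapping language code -> Counter of word frequencies
            (at least two languages are assumed).
    p_stay: assumed probability of keeping the previous token's language.
    alpha:  add-alpha smoothing constant for unseen words.
    """
    langs = list(models)
    vocab = set().union(*models.values())
    vocab_size = len(vocab) + 1  # one extra slot for unseen words
    totals = {l: sum(models[l].values()) for l in langs}

    def emit(word, lang):
        # Smoothed unigram log-probability of the word under this language.
        count = models[lang][word.lower()]
        return math.log((count + alpha) / (totals[lang] + alpha * vocab_size))

    def trans(prev, cur):
        # Staying in a language is cheaper than switching.
        if prev == cur:
            return math.log(p_stay)
        return math.log((1 - p_stay) / (len(langs) - 1))

    # scores[l]: best log-probability of any path ending in language l.
    scores = {l: -math.log(len(langs)) + emit(sentence[0], l) for l in langs}
    backpointers = []
    for word in sentence[1:]:
        new_scores, pointers = {}, {}
        for cur in langs:
            best_prev = max(langs, key=lambda p: scores[p] + trans(p, cur))
            pointers[cur] = best_prev
            new_scores[cur] = scores[best_prev] + trans(best_prev, cur) + emit(word, cur)
        backpointers.append(pointers)
        scores = new_scores

    # Follow back-pointers from the best final state to recover the path.
    path = [max(langs, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy monolingual "training data" (made up for this example).
english = "thank you very much for the food my friend i like the food".split()
spanish = "muchas gracias por la comida mi amigo me gusta la comida".split()
models = {"en": train_unigram(english), "es": train_unigram(spanish)}

labels = viterbi_langid(["thank", "you", "muchas", "gracias"], models)
print(labels)
```

The transition term is what separates this from labelling each word independently: a switch is only predicted when the emission evidence outweighs the cost of leaving the current language. No code-switched annotations are needed, which is the sense in which such a setup relies only on monolingual data.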
