Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get? (teaser)


Sep 29, 2022

Because of globalization, it is becoming more and more common to use multiple languages in a single utterance, a phenomenon called code-switching. This results in special linguistic structures and therefore poses many challenges for Natural Language Processing. Existing models for language identification in code-switched data are all supervised, requiring annotated training data, which is only available for a limited number of language pairs. In this paper, we explore semi-supervised approaches that exploit out-of-domain monolingual training data. We experiment with word unigrams, word n-grams, character n-grams, Viterbi decoding, Latent Dirichlet Allocation, Support Vector Machines, and Logistic Regression. The Viterbi model was the best semi-supervised model, with a weighted F1 score of 92.23%, whereas a fully supervised state-of-the-art BERT-based model scored 98.43%.
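
To make the Viterbi idea concrete, here is a minimal sketch of word-level language identification trained only on monolingual word lists. It is not the paper's implementation: the toy corpora, the add-one smoothed word-unigram emission model, the `stay` transition probability, and the function names are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Toy monolingual word lists (hypothetical stand-ins for out-of-domain corpora).
MONO = {
    "en": "i am going to the store with my friend thank you very much".split(),
    "es": "voy a la tienda con mi amigo muchas gracias por todo".split(),
}
LANGS = list(MONO)
COUNTS = {lang: Counter(words) for lang, words in MONO.items()}
VOCAB = {word for words in MONO.values() for word in words}


def emission_logprob(word, lang):
    """Add-one smoothed unigram log-probability of `word` under `lang`."""
    counts = COUNTS[lang]
    total = sum(counts.values()) + len(VOCAB) + 1  # +1 reserves mass for unseen words
    return math.log((counts[word] + 1) / total)


def viterbi(tokens, stay=0.8):
    """Return the most likely language tag for each token.

    `stay` is an assumed probability of keeping the same language between
    adjacent tokens; the remaining mass is spread over the other languages.
    """
    if not tokens:
        return []
    switch = (1 - stay) / (len(LANGS) - 1)
    trans = {
        (prev, cur): math.log(stay if prev == cur else switch)
        for prev in LANGS
        for cur in LANGS
    }
    # Uniform prior over the language of the first token.
    scores = {
        lang: math.log(1 / len(LANGS)) + emission_logprob(tokens[0], lang)
        for lang in LANGS
    }
    backpointers = []
    for token in tokens[1:]:
        new_scores, pointers = {}, {}
        for cur in LANGS:
            best_prev = max(LANGS, key=lambda prev: scores[prev] + trans[(prev, cur)])
            new_scores[cur] = (
                scores[best_prev] + trans[(best_prev, cur)] + emission_logprob(token, cur)
            )
            pointers[cur] = best_prev
        scores = new_scores
        backpointers.append(pointers)
    # Follow the backpointers from the best final state to recover the path.
    path = [max(LANGS, key=lambda lang: scores[lang])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]
```

With these toy counts, `viterbi("muchas gracias my friend thank you".split())` tags the first two tokens as `es` and the remaining four as `en`. The transition term is what separates this from tagging each word independently: a single ambiguous or unseen word is pulled toward the language of its neighbours, and a switch is only predicted once enough evidence accumulates.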
