Much Gracias: Semi-supervised Code-switch Detection for Spanish-English: How far can we get? (full)

11 views · 29/09/2022

Because of globalization, it is becoming more and more common to use multiple languages in a single utterance, a phenomenon called code-switching. This results in special linguistic structures and therefore poses many challenges for Natural Language Processing. Existing models for language identification in code-switched data are all supervised, requiring annotated training data that is only available for a limited number of language pairs. In this paper, we explore semi-supervised approaches that exploit out-of-domain monolingual training data. We experiment with word unigrams, word n-grams, character n-grams, Viterbi decoding, Latent Dirichlet Allocation, Support Vector Machines and Logistic Regression. The Viterbi model was the best semi-supervised model, with a weighted F1 score of 92.23%, whereas a fully supervised state-of-the-art BERT-based model scored 98.43%.
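To make the idea of learning token-level language labels from monolingual data alone more concrete, here is a minimal sketch combining two of the techniques listed in the abstract: character n-gram language models, each trained on a separate monolingual word list, and Viterbi decoding over the language labels. This is not the authors' implementation. The add-one smoothing, the trigram order, the switch probability of 0.1, the labels `en`/`es`, the toy training sentences, and the names `CharNgramLM` and `viterbi` are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


class CharNgramLM:
    """Character n-gram language model with add-one smoothing (assumed, for illustration)."""

    def __init__(self, n=3):
        self.n = n
        self.ngrams = Counter()    # counts of (context, next_char)
        self.contexts = Counter()  # counts of each context
        self.vocab = set()

    def _pad(self, word):
        # "<" marks the start of a word, ">" marks its end
        return "<" * (self.n - 1) + word.lower() + ">"

    def train(self, words):
        for word in words:
            padded = self._pad(word)
            self.vocab.update(padded)
            for i in range(self.n - 1, len(padded)):
                context = padded[i - self.n + 1:i]
                self.ngrams[(context, padded[i])] += 1
                self.contexts[context] += 1
        return self

    def log_prob(self, word):
        padded = self._pad(word)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen characters
        total = 0.0
        for i in range(self.n - 1, len(padded)):
            context = padded[i - self.n + 1:i]
            count = self.ngrams[(context, padded[i])]
            total += math.log((count + 1) / (self.contexts[context] + v))
        return total


def viterbi(tokens, models, switch_prob=0.1):
    """Return the most likely language label per token (needs at least two labels).

    Assumed transition model: stay in the same language with probability
    1 - switch_prob, otherwise switch uniformly to one of the other labels.
    """
    if not tokens:
        return []
    labels = list(models)
    stay = math.log(1 - switch_prob)
    switch = math.log(switch_prob / (len(labels) - 1))

    def transition(prev, cur):
        return stay if prev == cur else switch

    # scores[label]: best log-score of any labeling so far that ends in `label`
    scores = {lab: -math.log(len(labels)) + models[lab].log_prob(tokens[0]) for lab in labels}
    backpointers = []
    for token in tokens[1:]:
        new_scores, pointers = {}, {}
        for cur in labels:
            prev = max(labels, key=lambda p: scores[p] + transition(p, cur))
            new_scores[cur] = scores[prev] + transition(prev, cur) + models[cur].log_prob(token)
            pointers[cur] = prev
        scores = new_scores
        backpointers.append(pointers)

    # Trace back from the best final label to recover the full label sequence
    path = [max(labels, key=scores.get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy monolingual "training corpora" (hypothetical, far smaller than real data)
english = CharNgramLM(n=3).train("the cat is on the table and the dog is in the house".split())
spanish = CharNgramLM(n=3).train("el gato está en la mesa y el perro está en la casa".split())
models = {"en": english, "es": spanish}

print(viterbi("the dog is in la casa".split(), models))
# ['en', 'en', 'en', 'en', 'es', 'es']
```

The switch penalty is what separates Viterbi decoding from labeling each word independently: a word whose two language scores are close tends to keep the label of its neighbours unless the evidence for a language change outweighs the cost of switching. In a realistic setup, the language models would be trained on large out-of-domain monolingual corpora rather than a single sentence each.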

Reposting is prohibited without the creator's permission.
