Lexical normalization, the translation of non-canonical data to standard language, has been shown to improve the performance of many natural language processing tasks on social media. Yet the use of multiple languages in one utterance, also called code-switching (CS), is frequently overlooked by these normalization systems, despite its common occurrence in social media.
In this paper, we propose three normalization models specifically designed to handle code-switched data, which we evaluate on two language pairs: Indonesian-English (Id-En) and Turkish-German (Tr-De). For the latter, we introduce novel normalization layers and their corresponding language ID and POS tags for the dataset, and evaluate the downstream effect of normalization on POS tagging. Results show that our CS-tailored normalization models outperform the Id-En state of the art and Tr-De monolingual models, and lead to a 5.4% relative performance increase for POS tagging compared to unnormalized input.