This is the repository for the paper 'Transfer Learning for Code-Mixed Data: Do Pretraining Languages Matter?' It contains links to all the datasets used in the paper:
- AfriSenti - train, dev
- NaijaVader - in this repo
- SAIL
- IIITH-CodeMix
- TamilMixSentiment
- MalayalamMixSentiment
- DravidianCodeMix
- Singh_et_al (NER)
- MasakhaNER
- Shah_and_Maurya (Sarcasm Detection)

Modified versions of TamilMixSentiment, MalayalamMixSentiment, and DravidianCodeMix (transliterated and with four sentiment labels) are included in this repo. The splits we created for IIITH-CodeMix and Singh_et_al (NER) are also included.

To reproduce the experiments in the paper, please refer to the MaChAmp repo.