September 25, 2020

The missing guide on data preparation for language modeling

Language models have gained popularity in NLP in recent years. Sometimes you have enough data and want to train a language model like BERT or RoBERTa from scratch. While there are many tutorials on tokenization and on how to train the model, there is not much information about how to load the data into the model. This guide aims to close that gap.
