To improve the ability of language models to handle Natural Language Processing
(NLP) tasks, an intermediate step of pre-training has recently been
introduced. In this setup, one takes a pre-trained language model, trains it on
a (set of) NLP dataset(s), and then finetunes it for a target task. It is
known that the selection of relevant transfer tasks is important, but some recent
work has shown substantial performance gains from intermediate
training on a very large set of datasets. Most previous work uses generative
language models or only focuses on one or a couple of tasks and uses a
carefully curated setup. We compare intermediate training with one or many
tasks in a setup where the choice of datasets is more arbitrary; we use all
SemEval 2023 text-based tasks. We observe performance improvements on most tasks
when using intermediate training. Gains are higher when doing intermediate
training on a single task than on all tasks, provided the right transfer task
is identified. Dataset smoothing and heterogeneous batching did not lead to
robust gains in our setup.