Our approach to using language models in macro
We have started to release our News Inflation Pressure Indices (NIPI, see here), which synthesise the inflation news in a given country and/or a specific sector. The first results show an interesting correlation with actual inflation.
Most of our language models have recently been re-estimated, as we have added more sources, two new languages and several new features. We take this opportunity to refresh the description of our NewsBots models.
We train pre-existing language models on specific tasks, mostly classification.
Specialised models beat general models
For training, we generally get better results with pre-trained models that were initially built for the same family of tasks than with very large, generalist models.
We almost exclusively use transformer (i.e. context-aware) models, the latest generation of language models, which perform many language tasks roughly on par with humans.
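To make this concrete, here is a minimal sketch of how such a fine-tuned transformer classifier is typically applied with the Hugging Face `transformers` library. The checkpoint name `nipi-relevance-roberta`, the label `RELEVANT` and the helper functions are our own illustrative assumptions, not the actual NewsBots code.

```python
# Sketch only: classifying news headlines with a fine-tuned RoBERTa-style
# checkpoint via Hugging Face `transformers`. Model and label names are
# hypothetical placeholders.

def classify_headlines(headlines, model_name="nipi-relevance-roberta"):
    """Return one predicted label per headline (requires `transformers`)."""
    from transformers import pipeline  # imported lazily: heavy dependency
    clf = pipeline("text-classification", model=model_name)
    return [pred["label"] for pred in clf(headlines)]

def filter_relevant(headlines, labels, relevant_label="RELEVANT"):
    """Keep only the headlines the classifier tagged as relevant."""
    return [h for h, lab in zip(headlines, labels) if lab == relevant_label]
```

The same two-step pattern (predict labels, then filter on them) carries over to any of the classification stages described below.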
What the NewsBot models do
1/ Identify the English news relevant to the near-term inflation forecast, using the RoBERTa model referenced below, which we have fine-tuned on this classification task.
2/ Do the same in a couple of other languages (French and Italian for now, more to come).
3/ Filter out the news about CPI and other official inflation releases, using another transformer classifier, multi-language this time (we do this because we do not want a spurious or lagging indicator).
4/ Apply Named Entity Recognition to identify the location, using a combination of pre-trained transformer models and our own algorithms.
5/ Apply further transformer classification models to detect the theme (utilities, food, airfares, etc.) and the sign (positive, negative or neutral). We have found the best results with one model per classification task. We occasionally use multi-language models to get more training examples.
6/ Finally, compile the NIPI, which is the normalised difference between the counts of positive and negative news.
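The final aggregation step can be sketched in a few lines. We assume here that the normalisation divides the positive-minus-negative balance by the total number of classified items (neutral included); the exact normalisation used for the published NIPI may differ.

```python
# Sketch of the NIPI aggregation: share of positive minus share of
# negative news among all classified items. The normalisation choice
# is our assumption, not the published methodology.

def nipi(labels):
    """Normalised positive-minus-negative news balance, in [-1, 1]."""
    pos = sum(1 for lab in labels if lab == "positive")
    neg = sum(1 for lab in labels if lab == "negative")
    total = len(labels)  # neutral news stays in the denominator
    if total == 0:
        return 0.0
    return (pos - neg) / total

# Example: 3 positive, 1 negative, 1 neutral -> (3 - 1) / 5 = 0.4
print(nipi(["positive", "positive", "positive", "negative", "neutral"]))
```

Dividing by the total rather than by positives-plus-negatives keeps a heavy flow of neutral news from exaggerating the index, which is one plausible reading of "normalised" here.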
The framework is reproducible and can be transposed to pretty much any news classification problem (or investment focus).
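As an illustration of that transposability, the whole chain reduces to filter, sign, aggregate, with the classifiers swapped per topic. The function names and the toy keyword classifiers below are ours, standing in for the transformer models.

```python
# Hedged illustration: the same filter -> classify -> aggregate chain
# applied to an arbitrary topic, with keyword rules standing in for
# the transformer classifiers used in practice.

def run_pipeline(headlines, is_relevant, sign_of):
    """Generic news-pressure index for any relevance/sign classifier pair."""
    signs = [sign_of(h) for h in headlines if is_relevant(h)]
    pos = signs.count("positive")
    neg = signs.count("negative")
    return (pos - neg) / len(signs) if signs else 0.0

# Toy transposition to supply-chain news:
idx = run_pipeline(
    ["port congestion worsens", "freight rates fall", "election rally"],
    is_relevant=lambda h: "port" in h or "freight" in h,
    sign_of=lambda h: "negative" if "worsens" in h else "positive",
)
print(idx)  # two relevant items, one positive and one negative -> 0.0
```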
We are happy to discuss the model details, our results and any extension of this work, so don't hesitate to reach out.