A machine learning approach to domain specific dictionary generation. An economic time series framework

Stellenbosch Working Paper Series No. WP06/2021
 
Publication date: March 2021
 
Author(s):
[protected email address] (Department of Economics, Stellenbosch University)
 
Abstract:

This paper offers an alternative to the labour-intensive manual process of constructing a domain-specific lexicon or dictionary, by operationalising subjective information processing. It builds on the current empirical literature by (a) constructing a domain-specific dictionary for various economic confidence indices; (b) introducing a novel weighting schema for text tokens that accounts for time dependence; and (c) operationalising subjective information processing of text data using machine learning. The results show that sentiment indices constructed from machine-generated dictionaries fit multiple indicators of economic activity better than the manually constructed dictionary of Loughran and McDonald (2011). In a five-year holdout sample from 2012 to 2017, the domain-specific dictionaries yield a lower root mean squared error (RMSE). The results also justify the time series weighting design used to overcome the p >> n problem (more predictors than observations), which is common when working with economic time series and text data.
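
As a rough illustration of the kind of comparison the abstract describes, the sketch below scores documents against a dictionary of token weights and computes an RMSE against a target series. It is not the paper's method: the function names, the toy dictionary and the simple averaging rule are all assumptions made for illustration, and the paper's time-dependent weighting schema is not reproduced here.

```python
import math


def sentiment_score(tokens, dictionary):
    """Average the dictionary weights of the tokens found in the dictionary.

    Returns 0.0 when no token matches (an assumed convention, not the paper's).
    """
    weights = [dictionary[t] for t in tokens if t in dictionary]
    return sum(weights) / len(weights) if weights else 0.0


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual):
        raise ValueError("sequences must have the same length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical toy dictionary and documents, for illustration only.
toy_dictionary = {"growth": 1.0, "recovery": 0.5, "recession": -1.0}
docs = [
    ["growth", "recovery", "the"],
    ["recession", "growth"],
    ["the", "budget"],
]

# One sentiment value per document; in practice these would be aggregated
# per period before being compared with an economic indicator.
index = [sentiment_score(d, toy_dictionary) for d in docs]
```

Under these assumptions, two candidate dictionaries could be compared by computing `rmse` between each resulting index (or a forecast built from it) and the same holdout series, with the lower value indicating the closer fit.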

 
JEL Classification:

C32, C45, C53, C55

Keywords:

Sentometrics, Machine learning, Domain-specific dictionaries

Notes:

Data download: Generated Dictionaries

Download: PDF (738 KB)
