A machine learning approach to domain specific dictionary generation. An economic time series framework
Stellenbosch Working Paper Series No. WP06/2021
Publication date: March 2021
Author(s):
This paper offers an alternative to the labour-intensive manual process of constructing a domain-specific lexicon or dictionary, by operationalising subjective information processing. It builds on the current empirical literature by (a) constructing a domain-specific dictionary for various economic confidence indices; (b) introducing a novel weighting schema for text tokens that accounts for time dependence; and (c) operationalising subjective information processing of text data using machine learning. The results show that sentiment indices constructed from machine-generated dictionaries fit multiple indicators of economic activity better than the manually constructed dictionary of Loughran and McDonald (2011). The domain-specific dictionaries also yield a lower RMSE over a five-year holdout sample period from 2012 to 2017. The results further justify the time series weighting design used to overcome the p >> n problem, which is common when working with economic time series and text data.
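As a rough illustration of the holdout comparison described above, the following minimal sketch scores tokenised text against a weighted dictionary, fits a one-regressor linear model on a training window, and reports the RMSE on the remaining holdout periods. It is not the paper's method: the function names, the toy dictionary, and the simple averaging of token weights are all assumptions made for illustration, and the paper's time-dependent weighting schema is not reproduced here.

```python
from math import sqrt


def sentiment_index(docs_by_period, dictionary):
    """Average dictionary weight per token for each period (hypothetical scoring rule)."""
    index = []
    for tokens in docs_by_period:
        score = sum(dictionary.get(tok, 0.0) for tok in tokens)
        index.append(score / len(tokens) if tokens else 0.0)
    return index


def fit_ols(x, y):
    """Fit y = a + b * x by ordinary least squares with a single regressor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def holdout_rmse(index, target, n_train):
    """Fit on the first n_train periods, then compute RMSE on the remaining periods."""
    intercept, slope = fit_ols(index[:n_train], target[:n_train])
    squared_errors = [
        (intercept + slope * x - y) ** 2
        for x, y in zip(index[n_train:], target[n_train:])
    ]
    return sqrt(sum(squared_errors) / len(squared_errors))


# Toy inputs, invented for illustration only.
toy_dictionary = {"growth": 1.0, "loss": -1.0}
toy_docs = [["growth", "loss"], ["growth"], ["loss"], ["growth", "growth", "loss"]]
toy_index = sentiment_index(toy_docs, toy_dictionary)
```

Under these assumptions, two candidate dictionaries could be compared by computing `holdout_rmse` for each resulting index against the same economic indicator and preferring the one with the lower value.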
JEL Classification: C32, C45, C53, C55
Keywords: Sentometrics, Machine learning, Domain-specific dictionaries
Notes: Data download: Generated Dictionaries
Download: PDF (738 KB)