A machine learning approach to domain specific dictionary generation. An economic time series framework

Stellenbosch Working Paper Series No. WP06/2021
 
Publication date: March 2021
 
Author(s):
[protected email address] (Department of Economics, Stellenbosch University)
 
Abstract:

This paper offers an alternative to the labour-intensive manual process of constructing a domain-specific lexicon or dictionary, through the operationalisation of subjective information processing. It builds on the current empirical literature by (a) constructing a domain-specific dictionary for various economic confidence indices; (b) introducing a novel weighting schema for text tokens that accounts for time dependence; and (c) operationalising subjective information processing of text data using machine learning. The results show that sentiment indices constructed from machine-generated dictionaries fit multiple indicators of economic activity better than Loughran and McDonald's (2011) manually constructed dictionary. The domain-specific dictionaries achieve a lower RMSE over a five-year holdout sample period from 2012 to 2017. The results also justify the time series weighting design used to overcome the p>>n problem, commonly found when working with economic time series and text data.

 
JEL Classification:

C32, C45, C53, C55

Keywords:

Sentometrics, Machine learning, Domain-specific dictionaries

Notes:

Data download: Generated Dictionaries

Download: PDF (738 KB)
