Record linkage in the Cape of Good Hope Panel

Stellenbosch Working Paper Series No. WP06/2018
 
Publication date: May 2018
 
Author(s):
[protected email address] (Department of History, Utrecht University)
[protected email address] (Department of Economic History, Lund University)
[protected email address] (Department of Economics, Stellenbosch University)
 
Abstract:

In this paper we describe the record linkage procedure used to create a panel from Cape Colony census returns, or opgaafrolle, for 1787–1828, a dataset of 42,354 household-level observations. Using a subset of manually linked records, we first evaluate which statistical models and deterministic algorithms best identify and match households over time. By exploiting household-level characteristics in the linking process, together with near-annual data, we are able to create high-quality links for 84 percent of the dataset. We compare basic analyses on the linked panel dataset with those on the original cross-sectional data, evaluate the feasibility of the strategy when linking to supplementary sources, and discuss the scalability of our approach to the full Cape panel.

 
JEL Classification:

N01, C81

Keywords:

census, machine learning, micro-data, record linkage, panel data, South Africa

Download: PDF (1.1 MB)
