Capita Hartshead Tracing Solutions

Data cleaning

When we receive data from a customer, it inevitably needs cleaning before we can do anything else with it.

By cleaning we mean correcting and improving the data internally (e.g. fixing spelling mistakes and splitting data into the correct fields) without adding or appending anything from other data sources.

This involves normalising the data into a set of separate standard fields, and includes the following processes:

  • Splitting names into prefix, title, first name, middle names, surname and suffix.
  • Matching against lists of common prefixes, titles and suffixes.
  • Dealing with missing titles via gender recognition (against lists of known name genders).
  • Recognition of common multi-word titles, surnames & prefixes.
  • Recognition of company names in names & addresses.
  • Temporary removal of operator notes - C/O & names within address data.
  • Missing apostrophe correction (e.g. in surnames).
  • Recognition of names that cannot be split, e.g. Duchess of York.
  • Correction of common Optical Character Recognition (OCR) failures.
  • Proper casing correction.
  • Flagging of companies, registrars & international contacts.
  • Postcode error correction using the BS 7666 definition.
  • Address standardisation to five lines plus postcode.
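
As an illustration only, two of the steps listed above (splitting a name using lookup lists, and tidying a postcode) might be sketched in Python as follows. The lookup lists, the function names `split_name` and `tidy_postcode`, the two OCR substitutions and the postcode pattern are all assumptions made for this example. In particular, the pattern is a simplified shape check, not the BS 7666 definition itself, and the example does not handle the separate prefix field or unsplittable names.

```python
import re

# Tiny illustrative lookup lists (assumed); real lists would be far larger.
TITLES = {"mr", "mrs", "miss", "ms", "dr", "prof", "rev", "sir", "lady"}
SUFFIXES = {"jr", "sr", "ii", "iii", "phd", "obe", "mbe"}
SURNAME_PARTICLES = {"van", "von", "de", "der", "del", "della", "le", "la"}


def split_name(raw):
    """Split a free-text name into standard fields using the lookup lists."""
    tokens = [t.strip(".,") for t in raw.split()]
    tokens = [t for t in tokens if t]
    fields = {"title": "", "first_name": "", "middle_names": "",
              "surname": "", "suffix": ""}
    if tokens and tokens[0].lower() in TITLES:
        fields["title"] = tokens.pop(0)
    if len(tokens) > 1 and tokens[-1].lower() in SUFFIXES:
        fields["suffix"] = tokens.pop()
    if not tokens:
        return fields
    # The surname is the last token plus any particles directly before it,
    # always leaving at least one token for the first name.
    start = len(tokens) - 1
    while start > 1 and tokens[start - 1].lower() in SURNAME_PARTICLES:
        start -= 1
    fields["surname"] = " ".join(tokens[start:])
    given = tokens[:start]
    if given:
        fields["first_name"] = given[0]
        fields["middle_names"] = " ".join(given[1:])
    return fields


# Simplified UK postcode shape (assumed): outward code, space, digit + 2 letters.
POSTCODE_SHAPE = re.compile(r"[A-Z]{1,2}[0-9][0-9A-Z]? [0-9][A-Z]{2}")
# Letters commonly misread for digits by OCR (assumed, deliberately minimal).
OCR_DIGIT_FIXES = {"O": "0", "I": "1"}


def tidy_postcode(raw):
    """Return a normalised postcode, or None if it cannot be made to fit."""
    compact = re.sub(r"\s+", "", raw).upper()
    if len(compact) < 5:
        return None
    outward, inward = compact[:-3], compact[-3:]
    # The inward code must start with a digit, so repair likely OCR misreads.
    inward = OCR_DIGIT_FIXES.get(inward[0], inward[0]) + inward[1:]
    candidate = f"{outward} {inward}"
    return candidate if POSTCODE_SHAPE.fullmatch(candidate) else None


if __name__ == "__main__":
    print(split_name("Dr. Jan van der Berg Jr"))
    print(tidy_postcode("br3 4tu"))
```

Returning `None` for a postcode that cannot be repaired, rather than guessing, leaves the record available for flagging or manual review.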

Once the data has been cleaned, we enhance it.

© 2011 Capita Hartshead Tracing Solutions Limited.

Capita Hartshead Tracing Solutions is a trading name of Capita Hartshead Limited with its registered office at
The Registry, 34 Beckenham Road, Beckenham, Kent BR3 4TU. (Registered in England & Wales No. 02260524).

Part of Capita plc.