DOES SIZE MATTER? IT DEPENDS ON YOUR PERSPECTIVE
Raw data can rarely be used for insights on its own because it usually needs to be combined with other sources. Typically, this process of combination results in a loss of data.
HARNESS has developed the Data Fabric platform, which makes data available in its most leveraged and connected form. To promote the technology, HARNESS has published millions of lines of price per square metre data, which may be freely interrogated and used to derive property value insights. The data is released today and will subsequently be released at approximately six-weekly intervals.
Data Fabric is underpinned by the HARNESS Property ID (HPID), a reference to a proprietary grid of positions against which the data is placed. Because the connections between address objects are understood, this approach surfaces the largest possible universe of address data to match against. More limited perspectives are available by relating HPIDs to other location identifiers, such as UPRN, TOID and Land Registry Title. This open release comprises the HPID to UPRN mapping.
To get started, download the data and import it into your favourite database.
The Data Fabric proprietary connection technology was utilised to join the following Open Government datasets:
- HM Land Registry Price Paid (price and transaction date)
- MHCLG Non-Domestic EPC (floor area)
- MHCLG Domestic EPC (floor area)
- VOA Rating List (floor area)
The data was de-duplicated by selecting the most recent floor area value either side of the price paid transaction date.
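The de-duplication rule can be read in more than one way. The sketch below shows one possible reading, assuming that "either side" means the floor area record closest in time to the sale, whether before or after it, with ties going to the later record. The function names and the tuple layout are illustrative assumptions, not the HARNESS implementation.

```python
from datetime import date


def pick_floor_area(transaction_date, floor_area_records):
    """Pick the floor area record nearest in time to the sale.

    floor_area_records is a list of (record_date, floor_area_m2) tuples
    (an assumed layout). If a record before the sale and a record after
    it are equally close, the later (more recent) record wins.
    """
    if not floor_area_records:
        return None
    return min(
        floor_area_records,
        key=lambda rec: (abs((rec[0] - transaction_date).days), -rec[0].toordinal()),
    )


def price_per_square_metre(price, floor_area_m2):
    """Land Registry price divided by floor area; None if the area is unusable."""
    if not floor_area_m2 or floor_area_m2 <= 0:
        return None
    return price / floor_area_m2


# Made-up example: three candidate floor areas for one sale.
candidates = [
    (date(2015, 1, 1), 80.0),
    (date(2018, 3, 1), 90.0),
    (date(2019, 6, 1), 95.0),
]
chosen = pick_floor_area(date(2018, 6, 1), candidates)
ppsm = price_per_square_metre(270_000, chosen[1])
```

Here the March 2018 record is selected because it is closest to the June 2018 sale, so each sale contributes a single PPSM value.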
Just for fun, the data is presented as price per square metre (PPSM) relatives using a simple, repeatable process: rolling 12-month means, with cleansing, segmented across asset class, geography and location identifier (or volume) into two size buckets, approximately the 10th to 50th and the 50th to 90th percentiles. Note that the segments are not available in the open release. The method is applied across the two size perspectives, volume and floor area, to see the impact on the trend. For the sake of clarity, HARNESS does not comment on the market, nor does it promote the trending method as an index; that is for others who wish to analyse and ‘create their own perspective’.
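For readers who want to try something similar on the open release, here is a minimal sketch of a rolling 12-month mean. It rests on two assumptions: that the mean is pooled over all transactions in the month and the preceding 11 months, and that a "relative" is the change against a chosen base month. The cleansing step and the percentile size buckets are not reproduced, and all names are illustrative.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Map a date to a running month number so windows are easy to slice."""
    return d.year * 12 + (d.month - 1)


def rolling_12_month_means(transactions):
    """transactions: iterable of (transaction_date, ppsm) pairs.

    Returns {(year, month): mean PPSM of all transactions in that month
    and the preceding 11 months}. Windows with no transactions are skipped.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for d, ppsm in transactions:
        m = month_index(d)
        sums[m] += ppsm
        counts[m] += 1
    if not counts:
        return {}
    result = {}
    for m in range(min(counts), max(counts) + 1):
        window = range(m - 11, m + 1)
        n = sum(counts.get(k, 0) for k in window)
        if n:
            total = sum(sums.get(k, 0.0) for k in window)
            result[(m // 12, m % 12 + 1)] = total / n
    return result


def relatives(means, base_key):
    """Express each rolling mean as a fractional change against a base month."""
    base = means[base_key]
    return {key: value / base - 1.0 for key, value in means.items()}


# Made-up PPSM values, for illustration only.
sample = [
    (date(2018, 1, 15), 100.0),
    (date(2018, 2, 10), 200.0),
    (date(2019, 1, 20), 300.0),
    (date(2019, 2, 5), 400.0),
]
means = rolling_12_month_means(sample)
rel = relatives(means, (2018, 1))
```

Running the same function once on all HPID records and once on the UPRN-matched subset is one way to compare the two volume perspectives.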
The charts below show the PPSM relatives for retail, office-industrial and flats from January 2018 onwards. This clearly points to the benefit of having an unlimited view of the data prior to the application of any downstream data analysis or methods. Yes, size really does matter!
- In terms of volume of data, enabled by the proprietary grid of HPIDs. For this release, there are approximately 740,000 additional HPID-matched records available for analysis that cannot be positioned using the UPRN identifier. For the commercial real estate component, this equates to an uplift in linked data utilisation of circa 30%. Figure 1 shows generally less negative PPSM relatives when the whole dataset is used in this analysis.
- In terms of physical space, enabled by stitching currently unlinked datasets to process data more effectively for insight. Figures 2 and 3 clearly show the relative impacts of property size, which is rarely considered when trending price (or PPSM) deltas.
Attempting to link address-centric data in the UK is challenging, particularly for commercial real estate. This is because there are many location identifiers in the market, and none is fully designed for, or suited to, the task of stitching multiple feeds into a unified view. The UPRN, supplied by Ordnance Survey and now open, goes some way to remedy the situation, although most datasets are not commonly identified at record level by a UPRN.
The conventional method of assigning a location identifier to a record is to match the address string to a standardised list which is cross-referenced to the required identifier. Herein lies the problem: there isn’t a definitive standardised list, and few would refer to their location via a code assigned by a third party. Address strings, which are in far more common use, are created and used by Royal Mail in the form of the Postcode Address File (PAF). PAF is a maintained database of ‘delivery points’ and postcodes in the UK, of which there are circa 29 million and 1.8 million respectively. The circa 40 million UPRNs extend the PAF addressable objects in scope and granularity by adding data from over 300 local planning authorities (LPAs).
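The conventional approach can be sketched as a normalise-and-look-up step. This is a deliberately naive illustration, not how Data Fabric or any particular address-matching product works; the addresses and identifier below are invented, and the function names are hypothetical.

```python
import re


def normalise(address):
    """Upper-case, strip punctuation and collapse whitespace."""
    text = address.upper()
    text = re.sub(r"[^A-Z0-9 ]", " ", text)
    return " ".join(text.split())


def build_lookup(reference_rows):
    """reference_rows: iterable of (address_string, identifier) pairs."""
    return {normalise(addr): ident for addr, ident in reference_rows}


def match(address, lookup):
    """Return the identifier for an address string, or None if unmatched."""
    return lookup.get(normalise(address))


# Invented reference list with a single, made-up identifier.
lookup = build_lookup([("1 Example Street, Sampletown, ZZ1 1ZZ", "000000000001")])

matched = match("1 example street sampletown zz1 1zz", lookup)
unmatched = match("Flat A, 1 Example Street, Sampletown, ZZ1 1ZZ", lookup)
```

The second lookup returns nothing because the reference list has no entry for the flat, which is the kind of record that an exact match against a single list leaves unlinked.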
Address matching a dataset to return the UPRN (or UDPRN, the PAF equivalent) works well for ‘vanilla’ property like houses. There is 1 UPRN, 1 post box, 1 set of residents and 1 address string. However, even within the residential asset class, challenges arise when, for example:
- A single address is ‘divided’ into multiple addresses.
- One or more parties believe the address no longer exists.
- The location is a PO Box.
This is demonstrated when a house is subdivided into flats for multiple occupancy but still receives mail through a communal letterbox.
The challenge is greater for commercial real estate, primarily because the common understanding of a ‘building’, and the unit level addressable objects within, is not captured in the Royal Mail or Ordnance Survey data. We see office complexes, industrial estates, and retail parks without an identifier for constituent parts of the asset, multiple identifiers to one address string or multiple strings to one identifier.
The result? Data professionals and amateurs alike have a choice between potentially linking data together inaccurately or underutilising the raw materials and potentially missing insights, opportunities or risks. In our data-driven environment, users seek to ensure that maximum value is achieved.
Data Fabric is the data-stitching system which connects IDs, both location and description, to give a view of the data and relationships without imposing limits. This is achieved through the proprietary grid of HPIDs and, as a by-product, results in an ‘enhanced’ address list in the UK market, particularly for commercial real estate internal building objects (floors, units etc.). Uniquely, the Data Fabric framework presents location data ‘as is’ without the assumption of a universal identifier, though location keys are included and related in the output, which includes proprietary IDs for buildings and estates. With this approach, all data is presented via a record-level HPID that sits within the grid of positions.
The result: the highest accuracy and data utilisation rates in the market. We encourage those with address strings to test our matching capabilities.
If you would like to analyse and use the data for your own purposes, please download it and let us know what you think. There are no barriers to usage. If you register your details, we will notify you of subsequent releases as they become available.
If you would like to:
- Utilise data lines without a UPRN, or
- Discuss how Data Fabric can supercharge your data,
please get in touch with HARNESS.
The data is available to download and use openly, and will be released after each Data Fabric build as a zipped CSV file. The feed comprises the following fields:
- InternalHPID (unique identifier, not null) – HARNESS proprietary address position of the highest resolution
- SignatureHPID (unique identifier, not null) – HARNESS proprietary address position
- UPRN (varchar(20), null) – Unique Property Reference Number assigned by the LLPG Custodian or Ordnance Survey
- PricePerSquareMetre (float, null) – Land Registry price divided by floor area from other sources
- LandRegistryTransactionDate (date, null) – date of Land Registry price
- SquareMetreSource (varchar(50), null) – Epc Non Domestic, Epc Domestic, VOA
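As a starting point for importing the feed into a database, the sketch below loads the unzipped CSV into SQLite using only the Python standard library. It assumes the file has a header row whose column names match the field list above and that nulls appear as empty strings; neither is confirmed by the release notes. The table name `ppsm` and the function name `load_ppsm` are illustrative.

```python
import csv
import sqlite3

COLUMNS = [
    "InternalHPID",
    "SignatureHPID",
    "UPRN",
    "PricePerSquareMetre",
    "LandRegistryTransactionDate",
    "SquareMetreSource",
]

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS ppsm (
    InternalHPID TEXT NOT NULL,
    SignatureHPID TEXT NOT NULL,
    UPRN TEXT,
    PricePerSquareMetre REAL,
    LandRegistryTransactionDate TEXT,
    SquareMetreSource TEXT
)
"""


def load_ppsm(csv_file, conn):
    """Load rows from an open CSV text file into the ppsm table.

    Empty strings are stored as NULL (an assumption about the feed).
    Returns the number of rows in the table after loading.
    """
    conn.execute(CREATE_TABLE)
    reader = csv.DictReader(csv_file)
    rows = (
        tuple(row[col] if row[col] != "" else None for col in COLUMNS)
        for row in reader
    )
    conn.executemany("INSERT INTO ppsm VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return conn.execute("SELECT COUNT(*) FROM ppsm").fetchone()[0]
```

With a real file this might be called as `load_ppsm(open("release.csv", newline=""), sqlite3.connect("ppsm.db"))`, where both file names are placeholders. A query such as `SELECT COUNT(*) FROM ppsm WHERE UPRN IS NULL` then counts the records that are only reachable through an HPID.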
The data ‘profile’ is summarised as follows:
- Number of records with HPID, price, date and floor area = 16,302,982
- Number of records with UPRN, price, date and floor area = 15,565,138
- Date range = 1995-01-01 to 2020-10-29
- Proportions with UPRN: Non-domestic = 77.4%; Domestic = 95.6%
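The record counts above can be cross-checked against the "approximately 740,000" figure quoted earlier. The short calculation below uses only the two published counts; the variable names are illustrative.

```python
records_with_hpid = 16_302_982  # records with HPID, price, date and floor area
records_with_uprn = 15_565_138  # records with UPRN, price, date and floor area

# Records that can be positioned by HPID but not by UPRN.
hpid_only = records_with_hpid - records_with_uprn

# Overall share of records that carry a UPRN.
uprn_share = records_with_uprn / records_with_hpid

print(f"{hpid_only:,}")     # 737,844
print(f"{uprn_share:.1%}")  # 95.5%
```

The difference of 737,844 is consistent with the rounded figure of 740,000, and the overall UPRN share of about 95.5% sits close to the domestic proportion.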
Contains OS data © Crown copyright and database right 2021
Contains HM Land Registry data © Crown copyright and database right 2021. This data is licensed under the Open Government Licence v3.0.
Contains public sector information licensed under the Open Government Licence v3.0.
GOV.UK terms and conditions apply