The rise of the machines - learning for environmental good and cost savings

May 24, 2023

Hastoe Group award-winning Passivhaus scheme, Wimbish, Essex.

Introduction

The term “machine learning” covers a collection of techniques that automatically analyse large quantities of data to build predictive models. These models in turn support better business decisions. This is a novel field for social landlords, and potential uses include:

  • Predictive maintenance – instantaneously inform call centre staff whether it is more cost effective to replace or to repair components when they fail
  • Data filling – automatic and intelligent data imputation or “cloning” to reduce stock condition survey costs
  • Rent arrears management – identify which residents are genuinely struggling and which are using arrears as cheap credit
  • Predicting cost savings – work out which home improvements reduce costs the most

Baker Homes won a funding award, in conjunction with the University of Surrey, from the UK’s innovation agency, Innovate UK, and its Knowledge Transfer Partnership (KTP) Programme. The objective of the KTP project is to build an advanced predictive tool that improves business decision making, promotes sustainable living in the social housing sector and delivers efficiency savings to housing providers.

In this case study we report our analysis of one landlord’s data on domestic energy efficiency and how energy efficiency improvements are reducing maintenance costs by around 5% a year. Describing the complex techniques used is beyond the scope of this article, so for now we’ll concentrate on the outcomes.

The basic process of this KTP project was to:

  1. Collect and understand the data
  2. Prepare for analysis, including data filling
  3. Undertake machine learning analysis
  4. Complete a what-if analysis

Collecting data

We gathered data for 3,844 homes from Hastoe Housing Association. This was a combination of data held in the asset management and housing management systems. We also drew on external databases for detail on likely credit ratings, crime and even Facebook usage rates. In all, we collected 77 attributes (or pieces of data) for each address. To give some context, most conventional analysis looks at one variable at a time, so handling 77 variables at once shows the power of machine learning.

Prepare for analysis

Essentially, this meant bringing all the data together and, crucially, linking maintenance costs and all other data to each address. One of the first findings was that energy efficiency data (SAP rating) was missing for 1,252 homes. One (expensive and lengthy) option was to commission DEA surveys for each of these homes and then enter the data. However, we were able to “fill in” this data in a few hours using machine learning. The question is: how accurate were the machine-learnt SAP ratings?

To check this, we used the same technique to predict the SAP ratings for the 2,592 homes where we had the actual data. In all cases we found the predicted SAP to be roughly 4 SAP points higher than the actual. For example, a predicted SAP of 74 compared with an actual SAP of 70, and a predicted SAP of 70 compared with an actual SAP of 66.

It seemed an obvious next step to improve the accuracy by simply subtracting 4 SAP points from the predicted value. This gave an average gap between predicted and actual of 0.05 SAP points, so SAP 70.05 predicted against an actual of 70.
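The correction described above can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the sample values are invented, and the article does not say how the offset was estimated in the project.

```python
from statistics import mean

def mean_offset(predicted, actual):
    """Average amount by which the predictions exceed the actual values."""
    return mean(p - a for p, a in zip(predicted, actual))

def subtract_offset(predicted, offset):
    """Shift every prediction down by the measured offset."""
    return [p - offset for p in predicted]

# Invented example values, not the landlord's data.
actual_sap = [70, 66, 58, 81]
predicted_sap = [74, 70, 62.5, 84.5]

offset = mean_offset(predicted_sap, actual_sap)          # 4.0 SAP points
corrected_sap = subtract_offset(predicted_sap, offset)
remaining_gap = mean_offset(corrected_sap, actual_sap)   # 0.0 for this sample
```

Measuring the offset and the remaining gap on the same homes will always make the gap look small, so in practice the offset would ideally be estimated on one set of homes and checked on another.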

What was even more interesting was that the data used to predict a SAP rating wasn’t data that is normally associated with energy efficiency (which is often missing in data sets anyway). Normally, things like insulation levels and boiler efficiency are used to calculate SAP. However, we achieved accurate predictions using data such as market valuation and number of bedrooms. Machine learning is good at finding unexpected links between data.
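As an illustration of data filling, here is a minimal sketch that estimates a missing SAP rating from the homes most similar in market valuation and number of bedrooms. The article does not name the technique actually used, so the k-nearest-neighbours approach, the field names and the sample records below are all assumptions chosen for simplicity.

```python
from math import sqrt
from statistics import mean

def scale(rows, keys):
    """Rescale each feature to the 0-1 range so no single feature dominates."""
    lo = {k: min(r[k] for r in rows) for k in keys}
    hi = {k: max(r[k] for r in rows) for k in keys}
    return [
        {k: (r[k] - lo[k]) / (hi[k] - lo[k]) if hi[k] > lo[k] else 0.0 for k in keys}
        for r in rows
    ]

def fill_missing_sap(homes, keys=("valuation", "bedrooms"), k=2):
    """Fill each missing SAP with the mean SAP of the k most similar surveyed homes."""
    scaled = scale(homes, keys)
    known = [i for i, h in enumerate(homes) if h["sap"] is not None]
    filled = []
    for i, home in enumerate(homes):
        if home["sap"] is not None:
            filled.append(dict(home, sap_source="survey"))
            continue

        def distance(j):
            # Euclidean distance between home i and surveyed home j.
            return sqrt(sum((scaled[i][key] - scaled[j][key]) ** 2 for key in keys))

        nearest = sorted(known, key=distance)[:k]
        estimate = mean(homes[j]["sap"] for j in nearest)
        filled.append(dict(home, sap=estimate, sap_source="predicted"))
    return filled

# Invented records for illustration only.
homes = [
    {"address": "A", "valuation": 150_000, "bedrooms": 2, "sap": 62},
    {"address": "B", "valuation": 155_000, "bedrooms": 2, "sap": 64},
    {"address": "C", "valuation": 260_000, "bedrooms": 4, "sap": 78},
    {"address": "D", "valuation": 250_000, "bedrooms": 4, "sap": 76},
    {"address": "E", "valuation": 152_000, "bedrooms": 2, "sap": None},
]
filled = fill_missing_sap(homes)
```

Home E is closest to homes A and B, so its SAP is filled with their average. Recording whether each value came from a survey or a prediction keeps the filled-in figures distinguishable later on.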

We have seen the same overestimation in predicted values in an entirely different machine learning project.  In that case machine learning was used to predict viewer “star” ratings of films they may wish to view.  We are still investigating to see if this overestimation is a function of this particular machine learning technique.

Machine learning analysis

We had already written the code to take prepared data and crunch the numbers, so for this project the effort to import the data and press “go” was fairly trivial. That said, the computer does carry on working overnight while we all go home.

One of the outputs of this analysis was a “features list” which showed which factors were most important in predicting asset management costs. Here’s what we found:


For this particular landlord, you can see that four features show up as having the most influence on asset management costs. These are:

  • SAP rating (the most important)
  • build year
  • maximum number of occupants
  • number of bedrooms

(Facebook usage was the 24th most important, in case you’re interested.)
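One common way to produce a features list of this kind is permutation importance: shuffle one attribute at a time and measure how much worse the predictions become. The sketch below assumes this approach, which the article does not confirm. The toy cost model, field names and figures are invented for illustration.

```python
import random
from statistics import mean

def mean_abs_error(model, rows, targets):
    """Average absolute gap between the model's predictions and the targets."""
    return mean(abs(model(r) - t) for r, t in zip(rows, targets))

def permutation_importance(model, rows, targets, features, seed=0):
    """Rank features by how much the error grows when each one is shuffled."""
    rng = random.Random(seed)
    baseline = mean_abs_error(model, rows, targets)
    scores = {}
    for feature in features:
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [dict(r, **{feature: v}) for r, v in zip(rows, shuffled)]
        scores[feature] = mean_abs_error(model, permuted, targets) - baseline
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

def toy_cost_model(home):
    # Hypothetical stand-in for a trained model: annual cost falls as SAP
    # rises and as homes get newer, and ignores Facebook usage entirely.
    return 2000 - 15 * home["sap"] - 2 * (home["build_year"] - 1900)

# Invented records for illustration only.
homes = [
    {"sap": 55, "build_year": 1935, "facebook_usage": 0.31},
    {"sap": 61, "build_year": 1952, "facebook_usage": 0.72},
    {"sap": 66, "build_year": 1968, "facebook_usage": 0.15},
    {"sap": 70, "build_year": 1979, "facebook_usage": 0.48},
    {"sap": 74, "build_year": 1988, "facebook_usage": 0.66},
    {"sap": 79, "build_year": 1996, "facebook_usage": 0.27},
    {"sap": 83, "build_year": 2005, "facebook_usage": 0.54},
    {"sap": 88, "build_year": 2014, "facebook_usage": 0.39},
]
costs = [toy_cost_model(h) for h in homes]  # toy targets, so the baseline error is zero

ranking = permutation_importance(
    toy_cost_model, homes, costs, ["sap", "build_year", "facebook_usage"]
)
```

Because the toy model never reads `facebook_usage`, shuffling it changes nothing and it lands at the bottom of the ranking with a score of zero. With real data the scores would be noisier, and the shuffle would normally be repeated several times and averaged.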

What-if analysis

We wanted to see how much asset management costs could be reduced if the stock was improved to meet fuel poverty and carbon reduction targets. Here’s what we found:

Scenario                        | Rationale                                            | Annual savings
100% of homes SAP 75 or higher  | Comfortably within government’s fuel poverty targets | £220,000
100% of homes SAP 80 or higher  | Nearing UK’s carbon reduction requirement            | £330,000
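A what-if scenario of this kind can be sketched as follows: raise every home below the target SAP up to the target, ask the cost model for a new prediction, and add up the differences. The cost model, field names and figures here are invented, and this is a simplified illustration, not the project’s actual code.

```python
def toy_cost_model(home):
    # Hypothetical stand-in for the trained model: annual maintenance
    # cost in pounds falls by 10 for every extra SAP point.
    return 1500 - 10 * home["sap"]

def what_if_savings(model, homes, sap_floor):
    """Predicted annual saving if every home were brought up to sap_floor."""
    savings = 0.0
    for home in homes:
        if home["sap"] >= sap_floor:
            continue  # already meets the target, so no change is assumed
        improved = dict(home, sap=sap_floor)
        savings += model(home) - model(improved)
    return savings

# Invented records for illustration only.
homes = [{"sap": 60}, {"sap": 72}, {"sap": 80}, {"sap": 85}]

savings_at_75 = what_if_savings(toy_cost_model, homes, 75)  # 180.0
savings_at_80 = what_if_savings(toy_cost_model, homes, 80)  # 280.0
```

The sketch counts only the predicted reduction in maintenance costs. It does not include the cost of the improvement works themselves, which a full business case would need to set against the savings.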

We have done this kind of analysis before and with similar findings.  When we presented previous results, they did attract a bit of scepticism.  Perhaps this is slightly understandable given the novelty of the technique.  One of the questions we were often asked was how do we know it wasn’t just the building age influencing the repairs costs?

Hopefully the features list above helps answer that question. Build age had only a third of the importance of energy efficiency.

Nevertheless, as an extra exercise we ran another analysis. This time we assumed that all homes less than 25 years old had not had any major upgrades. This is a somewhat conservative assumption, as we’d expect boiler replacements in that time, but we wanted to err on the side of caution.

Next, for the older homes we replaced the current SAP rating with the likely rating each home would have had when built, given its build year (extracted from BRE data). Our thinking was that if build age was the major predictor, then adjusting the SAP rating wouldn’t make a difference to current maintenance costs. In other words, the maintenance costs would be the same as now. What we actually found was that maintenance costs would be £200,000 higher if the association had not made any improvements to its stock since the homes were built.

Partnering with the University of Surrey is allowing Baker Homes Ltd to draw upon knowledge and expertise that we would not otherwise have access to. The KTP includes Dr Lilian Tang from the Department of Computer Science at the University of Surrey and Dr Wolfgang Garn from Surrey Business School.

Dr Lilian Tang said, “The important thing that we want to show here, is, with the current available data, which are often incomplete and some are implicit, we have discovered some very useful information that gives us insight of cost saving etc. There is still huge potential in the next step of the project when we are developing more understanding on the nature and the use of data, which in turn may guide future data collection.”

Conclusions

Machine learning is a potentially valuable tool that can help landlords analyse their data and, where necessary, change their processes to make substantial savings. From a sustainability point of view, we have found that the reduced maintenance costs can add to the business case for making homes energy efficient.

We are interested in working with landlords or even maintenance contractors to carry on this type of analysis. If you are interested in any of the following, please get in touch:

  • Predictive maintenance – when is it cost effective to replace components
  • Data filling – essential if you want to make strategic decisions without having to wait for lengthy and costly stock condition surveys
  • Condensation and mould – what are the main factors in predicting mould problems
  • Voids and arrears – what factors (environmental or otherwise) contribute most to arrears and voids and how can they be managed

Alternatively, please ask us about any other ideas. Nothing is off-limits: predicting resident satisfaction, general wellbeing, risk management of contractors. The possibilities are endless. Leave your comment below or contact us directly on 0875 221 2232.

Knowledge Transfer Partnerships (KTP) is Europe’s leading programme helping businesses to improve their competitiveness by enabling companies to work with higher education or research and technology organisations to obtain knowledge, technology or skills which they consider to be of strategic competitive importance. The UK-wide programme is overseen by Innovate UK, the UK’s innovation agency, and supported by 16 other public sector funding organisations.

Richard Lupo


Areas of expertise: developing and instigating stream-lined processes to ensure environmental effectiveness