
Past event: Talking to Health Education England - cutting attrition rates using advanced analytics

Although it is the largest employer of highly skilled professionals globally, the NHS faces enormous challenges in meeting workforce demand. Recruitment alone is not enough. Training lead times are significant: it takes seven years and £250,000 to train a junior doctor, and this can double for certain specialties. Reducing attrition and increasing attainment through targeted support are therefore critical missions for Health Education England (HEE).

In the second of our public sector data webinar series, Daniel Woolf, Head of Business Intelligence and System Development, was in conversation with Transform’s CDO Will Lowe; Head of Analysis, Michael Baines; and Senior Data Analyst, Patrick Greenway.

Setting the scene, Daniel explained that while HEE understood the high-level factors behind the issues affecting attrition and attainment, they needed to break down the complex interactions for each person. We tackled this challenge through advanced data analytics, using data platforms, ML environments, and predictive models to deliver breakthrough insights, building confidence in data and ML as tools to tackle workforce challenges cost-effectively.

As Michael pointed out, when most organisations start out on their analytics journey, they are already looking at what has happened in the past, usually via reporting and dashboards. The difficult step is to move from this descriptive analysis into predictive analytics, to look forward and focus on what is likely to happen in the future. Armed with this information, prescriptive analytics can be used to develop appropriate courses of action and build intervention strategies.

For HEE, building a suitable data platform was a crucial step in making sure the modelling data set accurately reflected the problem space we were trying to solve (in this case trainee attrition), with enough relevant data points for our machine to learn from.

We used Microsoft Azure services connected via a VPN into HEE’s data centre, with security and access control managed through a key vault. Machine learning code was developed in Azure ML using Python and R notebooks, which were version-controlled via DevOps processes.

Once we had built a training dataset, we followed a reasonably linear set of steps to reach a final set of candidate models. The time invested in this data preparation and understanding phase is not to be underestimated – often 60-70% of the project is spent here – and it’s crucial to get it right.
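To make the data preparation phase concrete, here is a minimal sketch of the kind of steps involved – dropping unusable records, encoding the outcome as a target variable, and converting categorical inputs into numeric features. The column names and values are illustrative only, not HEE’s actual data:

```python
import pandas as pd

# Hypothetical extract of trainee records - column names are illustrative only.
raw = pd.DataFrame({
    "trainee_id": [1, 2, 3, 4],
    "specialty": ["GP", "Surgery", "GP", None],
    "start_year": [2015, 2016, 2016, 2017],
    "outcome": ["completed", "left", "completed", "left"],
})

# Typical preparation steps: drop rows missing key fields, encode the
# outcome as a binary target, and one-hot encode categorical inputs.
clean = raw.dropna(subset=["specialty"]).copy()
clean["left"] = (clean["outcome"] == "left").astype(int)
features = pd.get_dummies(clean[["specialty", "start_year"]], columns=["specialty"])

print(features.shape, clean["left"].tolist())  # → (3, 3) [0, 1, 0]
```

In a real project these steps repeat many times as data quality issues surface, which is why this phase dominates the timeline.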

Trialling and validating several different machine learning algorithms is the norm when working towards an accurate and robust result.

Supervised machine learning

Patrick talked our audience through the concept of supervised machine learning: taking a training data set of past data where we know the outcome (e.g. whether trainees have or haven’t completed their training) and feeding it into a machine learning classifier – in this case learning the relationship between the input variables and attrition.

We used a binary classification approach – the model is trained to look for differences in the distribution of variables between two classes: those who complete the programme versus those who attrite.
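A minimal sketch of this kind of supervised binary classifier is shown below. The features, their relationship to attrition, and the choice of a random forest are all assumptions for illustration – the session did not specify HEE’s actual variables or algorithm:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 1000

# Synthetic, illustrative features - not HEE's real variables.
hours = rng.normal(48, 8, n)          # weekly working hours
exam_score = rng.normal(60, 10, n)    # assessment performance

# Toy assumption: attrition is more likely with long hours and low scores.
p_leave = 1 / (1 + np.exp(-(0.1 * (hours - 48) - 0.1 * (exam_score - 60))))
left = (rng.random(n) < p_leave).astype(int)  # 1 = attrited, 0 = completed

# Train a classifier on known outcomes, then score it on held-out data.
X = np.column_stack([hours, exam_score])
X_train, X_test, y_train, y_test = train_test_split(X, left, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

The key point is the supervised setup: the classifier only learns because the training rows carry a known outcome label.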

To understand how our model performed we looked at two concepts: recall and precision. Recall is the proportion of those who left that we managed to predict; precision is the proportion of our predicted leavers who actually left. Together these gave an F1 score of 0.6, which is essentially the weighted mean of the two metrics – in effect, a prediction in which 60% would leave.
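The relationship between these three metrics can be worked through with illustrative confusion-matrix counts (the numbers below are hypothetical, chosen only to reproduce a 0.6 score):

```python
# Confusion-matrix counts from a hypothetical validation set (illustrative only).
tp = 60   # actual leavers the model correctly flagged
fp = 40   # predicted leavers who in fact stayed
fn = 40   # actual leavers the model missed

precision = tp / (tp + fp)   # 0.6: proportion of flagged people who left
recall = tp / (tp + fn)      # 0.6: proportion of leavers we caught

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 2))  # → 0.6
```

F1 balances the two concerns: a model that flags everyone has perfect recall but poor precision, and vice versa, so either failure mode drags the score down.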

Breaking this group down further, we found two distinct sets of people: those who actually left, and those who shared the characteristics of leavers. This meant we could target individuals with relevant attrition prevention strategies.

Our attendees left armed with a wealth of insights including the following key learnings:

  • Reviewing the data sets you already hold alongside third-party data helps to highlight the areas that can benefit from improved data quality for projects like this

  • Models are not automatically transferable to other problem groups: we discovered that our model was less accurate for GP programmes where factors of attrition are different to those affecting junior doctors

  • Existing data can contribute to the final outcome – by taking HEE’s existing data and applying our approach for advanced data analytics we were able to support the objective of tackling workforce challenges for the department

If you are interested in how we can help improve the way you use your data, get in touch at

If you were unable to join us for the session but would like to watch the recording or if you’d like to be invited to future events, just drop us a note at
