A classification model for predicting diabetic retinopathy based on patient characteristics and biochemical measures

How to Cite

Kotsiliti E, Al-Diri B, Hunter A. A classification model for predicting diabetic retinopathy based on patient characteristics and biochemical measures. MAIO [Internet]. 2017 Jul. 7 [cited 2022 Dec. 2];1(4):69-85. Available from: https://www.maio-journal.com/index.php/MAIO/article/view/47

Copyright notice

Authors who publish with this journal agree to the following terms:

  1. Authors retain copyright and grant the journal right of first publication. Twelve (12) months after publication, the work is simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgement of the work’s authorship and initial publication in this journal.

  2. After 12 months from the date of publication, authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.


Keywords: area under the ROC curve (AUC); classification; gradient boosting; lasso; prevalence and risk of diabetic retinopathy; random forest; retina screening



Purpose: In the United Kingdom (UK), the NHS Diabetic Eye Screening Program offers an annual eye examination to all people with diabetes aged 12 or over, aiming at the early detection of people at high risk of visual loss due to diabetic retinopathy. The purpose of this study was to design a model that predicts which patients are at risk of developing retinopathy, using patient characteristics and clinical measures.

Methods: We investigated data from 2011 to 2016 from the population-based Diabetic Eye Screening Program in East Anglia. The data comprised retinal eye screening results, patient characteristics, and routine biochemical measures of HbA1c, blood pressure, albumin-to-creatinine ratio (ACR), estimated glomerular filtration rate (eGFR), serum creatinine, cholesterol and body mass index (BMI). Individuals were classified according to the presence or absence of retinopathy as indicated by their retinal eye examinations. Lasso regression, random forest, gradient boosting machine and regularized gradient boosting models were built, and their predictive ability was assessed by cross-validation.
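
As an illustration of how a cross-validated AUC with its spread can be obtained, the following is a minimal sketch only, not the study's pipeline. It assumes synthetic two-feature data and a toy class-centroid scorer standing in for the lasso, random forest and gradient boosting models; all function names (`auc`, `stratified_folds`, `cross_validated_auc`, `fit_centroid_scorer`) are hypothetical.

```python
import random
import statistics


def auc(labels, scores):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Count positive/negative pairs ranked correctly; ties count one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds with a similar class ratio in each fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def cross_validated_auc(X, y, fit, k=10, seed=0):
    """Return (mean, standard deviation) of the AUC over k held-out folds."""
    fold_aucs = []
    for test_idx in stratified_folds(y, k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        # Fit on the training folds only, then score the held-out fold.
        score = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_aucs.append(
            auc([y[i] for i in test_idx], [score(X[i]) for i in test_idx])
        )
    return statistics.mean(fold_aucs), statistics.stdev(fold_aucs)


def fit_centroid_scorer(X_train, y_train):
    """Toy stand-in model: project onto the difference of the class means."""
    n_features = len(X_train[0])

    def class_mean(cls):
        rows = [x for x, y in zip(X_train, y_train) if y == cls]
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]

    mean_neg, mean_pos = class_mean(0), class_mean(1)
    direction = [p - n for p, n in zip(mean_pos, mean_neg)]
    return lambda x: sum(d * v for d, v in zip(direction, x))


# Synthetic, imbalanced data (an assumption; not the screening dataset).
rng = random.Random(42)
y = [0] * 200 + [1] * 40
X = [[rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0)] for label in y]

mean_auc, sd_auc = cross_validated_auc(X, y, fit_centroid_scorer, k=10)
print(f"10-fold cross-validated AUC: {mean_auc:.2f} ± {sd_auc:.2f}")
```

Stratifying the folds keeps at least a few positive cases in every held-out fold, which matters when the positive class is rare, as it is in screening data.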


Results: A total of 6,375 subjects with recorded information for all available biochemical measures were identified from the cohorts. Of these, 5,969 individuals had no signs of diabetic retinopathy. Of the remaining 406 individuals with signs of diabetic retinopathy, 352 had background diabetic retinopathy and 54 had referable diabetic retinopathy. The highest 10-fold cross-validated area under the curve (AUC), 0.73 ± 0.03, was achieved by the gradient boosting machine, and the minimum set of variables required to yield this performance comprised four variables: duration of diabetes, HbA1c, ACR and age. A subsequent analysis of the predictive power of the biochemical measures showed that the performance of the models improved considerably when HbA1c and ACR measurements were available for longer time periods. When HbA1c and ACR measurements for the 5-year period prior to the event of study were available, the gradient boosting machine achieved a cross-validated AUC of 0.77 ± 0.04, compared with 0.68 ± 0.04 when only measurements for the 1-year period were available. Similarly, an increase from 0.70 ± 0.02 to 0.75 ± 0.04 was observed with random forest. The dataset with the 1-year measurements comprised 4,857 subjects, of whom 4,572 had no retinopathy and the remaining 285 had signs of retinopathy. The dataset with the 5-year measurements comprised 757 subjects, of whom 696 had no retinopathy and the remaining 51 had signs of retinopathy.
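
The cohort counts reported in the Results can be tabulated to obtain the proportion of subjects with signs of retinopathy in each dataset. The sketch below uses only the counts stated in the text; the dictionary layout and the `summarize` helper are hypothetical. It also reports any difference between a stated total and the sum of its two subgroups.

```python
# Counts as stated in the Results; the structure and names are illustrative.
cohorts = {
    "all measures": {"total": 6375, "no_dr": 5969, "dr": 406},
    "1-year HbA1c/ACR": {"total": 4857, "no_dr": 4572, "dr": 285},
    "5-year HbA1c/ACR": {"total": 757, "no_dr": 696, "dr": 51},
}


def summarize(cohorts):
    """Compute the retinopathy proportion and any gap in the stated counts."""
    rows = {}
    for name, c in cohorts.items():
        rows[name] = {
            # Share of subjects with signs of retinopathy.
            "prevalence": c["dr"] / c["total"],
            # Subjects in the stated total not covered by either subgroup.
            "unaccounted": c["total"] - c["no_dr"] - c["dr"],
        }
    return rows


summary = summarize(cohorts)
for name, row in summary.items():
    print(f"{name}: {row['prevalence']:.1%} with retinopathy, "
          f"{row['unaccounted']} unaccounted")
```

Retinopathy is present in well under a tenth of each dataset, so the classes are strongly imbalanced; this is one reason a threshold-independent measure such as the AUC is more informative here than raw accuracy. As stated, the 5-year subgroups sum to 747 rather than 757, and the text does not indicate which figure is intended.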

Conclusions: Patient information and routine biochemical measures can be used to identify patients at risk of developing retinopathy. Effective differentiation between patients with and without retinopathy could significantly reduce the number of screening visits without compromising patients’ health.