Probabilistic modeling of genetic barriers enables reliable prediction of HAART outcome at below 15% error rate

Poster number: 24

I Savenkov (1), N Beerenwinkel (2), T Sing (1), M Däumer (3), SY Rhee (4), M Horberg (5), A Scarsella (6), A Zolopa (4), SY Lee (6), L Hurley (5), WJ Fessel (5), RW Shafer (4), R Kaiser (3) and T Lengauer (1)

  1. Max Planck Institute for Informatics, Saarbrücken, Germany
  2. Department of Mathematics, University of California, Berkeley, CA
  3. Institute of Virology, University of Cologne, Germany
  4. Stanford University, Stanford, CA
  5. Kaiser-Permanente Medical Care Program, Oakland, CA
  6. Pacific Oaks Medical Center, Los Angeles, CA

BACKGROUND: Predicting virologic response to highly active antiretroviral therapy (HAART) is crucial for selecting a potent new drug combination after HIV therapy failure. We recently showed that a probabilistic model of the genetic barrier is highly predictive of therapy outcome, with an error rate of 13 ± 4% (95% confidence interval). This performance was estimated on the outcomes of 759 therapy switches. Here, we validate the genetic barrier coupled with logistic regression and support vector machine classification on a dataset seven times larger, drawn from three US cohorts.

OBJECTIVE: To assess the predictive power of the probabilistic genetic barrier for HAART outcome prediction on a large clinical dataset.

METHODS: We use patient data from clinics in North America comprising sequence and medication data from a total of 5089 patients (4584 failures and 974 successes). The data are fed into two statistical learning algorithms: logistic regression and support vector machine classification. The feature set consists of indicators for the presence of each drug in the regimen and an estimate of the probabilistic genetic barrier. The genetic barrier is defined as the probability of the virus not reaching a certain level of resistance in a certain time interval [Beerenwinkel et al. J Infect Dis]. The predicted therapy outcome is either success or failure. Success is defined as an undetectable viral load at least once in the course of the therapy. Failure is defined as the outcome of a therapy during which sequencing was done (based on the assumption that sequencing is usually performed for patients failing therapy). Predictive power on unseen cases is estimated by 10-fold cross-validation.
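
The evaluation scheme described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the software used, so the code below assumes a plain gradient-descent logistic regression written with the Python standard library, runs on synthetic cases rather than clinical data, and omits the support vector machine. All function names, the number of drugs, and the toy rule linking the barrier to the outcome are hypothetical choices made for illustration.

```python
import math
import random


def make_synthetic_cases(n, n_drugs=4, seed=0):
    """Hypothetical stand-in for clinical data (not the cohorts in the abstract).

    Each case is (features, label): drug-presence indicators followed by a
    genetic barrier value in [0, 1]; label 1 = success, 0 = failure.
    """
    rng = random.Random(seed)
    cases = []
    for _ in range(n):
        drugs = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(n_drugs)]
        barrier = rng.random()
        # Assumed toy rule: a higher barrier makes success more likely.
        p_success = 1.0 / (1.0 + math.exp(-(6.0 * barrier - 3.0)))
        label = 1 if rng.random() < p_success else 0
        cases.append((drugs + [barrier], label))
    return cases


def sigmoid(z):
    # Branch on the sign of z to avoid overflow in math.exp.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(train, epochs=100, lr=1.0):
    """Fit weights and bias by full-batch gradient descent on the log-loss."""
    n_features = len(train[0][0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in train:
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        b -= lr * grad_b / len(train)
        w = [wi - lr * g / len(train) for wi, g in zip(w, grad_w)]
    return w, b


def predict(model, x):
    """Return 1 (success) if the predicted probability is at least 0.5."""
    w, b = model
    return 1 if sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) >= 0.5 else 0


def cross_validated_error(cases, k=10, seed=0):
    """Estimate the error rate on unseen cases by k-fold cross-validation."""
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    errors = 0
    for i in range(k):
        # Train on all folds except fold i, then test on fold i.
        train = [c for j, fold in enumerate(folds) if j != i for c in fold]
        model = fit_logistic(train)
        errors += sum(predict(model, x) != y for x, y in folds[i])
    return errors / len(cases)


if __name__ == "__main__":
    cases = make_synthetic_cases(200)
    print(f"10-fold CV error rate: {cross_validated_error(cases):.3f}")
```

Every case is held out exactly once, so the reported rate is the fraction of held-out cases that were misclassified. The number printed for the synthetic data says nothing about the clinical error rates reported in this abstract.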

RESULTS: On average, the predictions showed an error rate of about 15%. In combination with logistic regression, an error rate of 14 ± 4% was achieved. In combination with support vector machine classification, we obtained an error rate of 15.55%.

CONCLUSIONS: We have shown on a large clinical dataset that logistic regression on probabilistic genetic barriers constitutes a powerful model for HAART outcome prediction characterized by high prediction accuracy.