The use of different data-mining techniques for the prediction of in vitro susceptibility and in vivo response of HIV-1 to antiretroviral drugs from viral genotypic data.

Poster number: 0

Andrea De Luca

  1. Istituto di Clinica delle Malattie Infettive, Policlinico Universitario “A. Gemelli”, Università Cattolica del Sacro Cuore, Rome, Italy

Owing to its high replication rate, the large number of virions in infected patients and the high error rate of its reverse transcriptase, Human Immunodeficiency Virus (HIV) type 1 has a high mutation rate. Currently, more than 20 antiretroviral drugs belonging to 4 distinct classes are licensed for clinical use; a highly active regimen consists of a combination of at least 3 antiretrovirals. Antiretroviral drug pressure drives the virus towards the selection of mutations that confer loss of in vitro drug susceptibility and of in vivo drug activity. Drug resistance mutations in the viral reverse transcriptase and protease, the main antiretroviral targets, occur in at least 50 codons, involving 80 amino acid substitutions. Under the pressure of each drug, distinct mutations in the respective target enzyme are selected in vitro and in vivo, but the effect of individual mutations and mutational patterns on resistance and cross-resistance to the various drugs of the same class is complex and only partially known. The prediction of in vitro drug susceptibility from genotypic data is relatively simple, since one drug at a time is analysed, and can be achieved with several statistical tools, such as linear and logistic regression, recursive partitioning, classification trees, support vector machines and neural networks. Nonetheless, given some technical limitations of in vitro susceptibility testing and the difficulty of translating in vitro results into the effect of drug combinations in vivo, the direct prediction of in vivo drug response should be the ultimate goal of genotypic interpretation. Given the complexity of the genotype-response correlation, large databases are needed that contain information on viral genotype, treatment type and in vivo virologic response. Major issues concern the quality of these databases and the availability of measures of potential confounders, such as medication adherence and in vivo drug exposure.
The large number of parameters that must be estimated in the prediction models represents the major challenge for clinical virologists and methodologists working in the field. We have applied several mathematical tools, using distinct models, to develop accurate estimates of response to different drug regimens: these include fuzzy operators, genetic algorithms, neural networks, case-based reasoning with k-nearest-neighbour algorithms, and standard linear regression techniques. We obtained better predictions than with available expert rule-based algorithms, but correlations with treatment response were consistently lower than with in vitro drug susceptibility, suggesting that other, not yet estimated, parameters contribute to in vivo response. The pros and cons of the different methodologies will be discussed in detail.
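
As an illustration of the case-based reasoning approach mentioned above, the following is a minimal sketch of a k-nearest-neighbour predictor of virologic response. It is not the model used in this work: the case representation (a set of mutations, a set of drugs and a change in log10 viral load), the Jaccard similarity, the equal weighting of genotype and regimen, and all names (`jaccard`, `case_similarity`, `predict_response`) are assumptions made for the example.

```python
from typing import FrozenSet, List, Tuple

# A stored case: (mutations detected, drugs in the regimen, observed response).
# The response is taken here to be the change in log10 viral load (assumption).
Case = Tuple[FrozenSet[str], FrozenSet[str], float]


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def case_similarity(mutations: FrozenSet[str], drugs: FrozenSet[str], case: Case) -> float:
    """Average of genotype similarity and regimen similarity (equal weights assumed)."""
    case_mutations, case_drugs, _ = case
    return 0.5 * jaccard(mutations, case_mutations) + 0.5 * jaccard(drugs, case_drugs)


def predict_response(
    mutations: FrozenSet[str],
    drugs: FrozenSet[str],
    cases: List[Case],
    k: int = 3,
) -> float:
    """Similarity-weighted mean response of the k most similar stored cases."""
    if not cases:
        raise ValueError("the case base is empty")
    scored = sorted(
        ((case_similarity(mutations, drugs, c), c[2]) for c in cases),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    total_weight = sum(weight for weight, _ in scored)
    if total_weight == 0.0:
        # No stored case resembles the query: fall back to an unweighted mean.
        return sum(response for _, response in scored) / len(scored)
    return sum(weight * response for weight, response in scored) / total_weight


if __name__ == "__main__":
    # Toy case base with placeholder labels and invented values, not real data.
    toy_cases: List[Case] = [
        (frozenset({"m1", "m2"}), frozenset({"drugA", "drugB", "drugC"}), -0.5),
        (frozenset({"m1"}), frozenset({"drugA", "drugB", "drugC"}), -1.5),
        (frozenset(), frozenset({"drugA", "drugB", "drugD"}), -2.5),
    ]
    query_mutations = frozenset({"m1", "m2"})
    query_drugs = frozenset({"drugA", "drugB", "drugC"})
    print(predict_response(query_mutations, query_drugs, toy_cases, k=2))
```

In a sketch of this kind, confounders such as adherence or drug exposure would have to enter as additional case attributes; none are modelled here.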