Multiple Testing and Data Adaptive Regression: An Application to HIV-1 Sequence Data.

Merrill D. Birkner; Sandra E. Sinisi; Mark J. van der Laan

doi:10.2202/1544-6115.1110

Published/Copyright: April 18, 2005

From the journal Statistical Applications in Genetics and Molecular Biology Volume 4 Issue 1

Analysis of viral strand sequence data and viral replication capacity could lead to biological insights regarding the replication ability of HIV-1. Determining specific target codons on the viral strand will facilitate the manufacturing of target-specific antiretrovirals. Various algorithmic and analysis techniques can be applied to this problem. In this paper, we apply two techniques to a data set consisting of 317 patients, each with 282 sequenced protease and reverse transcriptase codons. The first is a set of recently developed multiple testing procedures for finding codons that have significant univariate associations with the replication capacity of the virus. A single-step multiple testing procedure (Pollard and van der Laan 2003) was used to control the family wise error rate (FWER) at the five percent level, and augmentation multiple testing procedures were applied to control the generalized family wise error rate (gFWER) or the tail probability of the proportion of false positives (TPPFP). The second is a data adaptive multiple regression algorithm used to obtain a prediction of viral replication capacity based on an entire mutant/non-mutant sequence profile. This is a loss-based, cross-validated Deletion/Substitution/Addition regression algorithm (Sinisi and van der Laan 2004), which builds candidate estimators for the prediction of a univariate outcome by minimizing an empirical risk. The two methods are separate techniques with distinct goals, both applied to viral data of this structure.
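The augmentation step mentioned in the abstract can be illustrated with a minimal sketch. It assumes FWER-adjusted p-values are already available from some FWER-controlling procedure (the single-step procedure itself is not reproduced here) and then enlarges the rejection set: for gFWER(k), the next k most significant hypotheses are added; for TPPFP(q), the largest number of additional hypotheses is added such that they make up at most a proportion q of all rejections. The function name `augment_rejections`, its parameters, and the return format are hypothetical choices for illustration, not the authors' code.

```python
import numpy as np

def augment_rejections(adj_p, alpha=0.05, k=0, q=0.0):
    """Augment an FWER-controlling rejection set (illustrative sketch).

    adj_p : FWER-adjusted p-values, one per hypothesis (assumed given).
    alpha : nominal level for the initial FWER-controlling rejections.
    k     : allowed number of false positives for gFWER(k).
    q     : allowed proportion of false positives for TPPFP(q).
    Returns a dict of sorted hypothesis indices for each rejection set.
    """
    adj_p = np.asarray(adj_p, dtype=float)
    m = adj_p.size
    # Order hypotheses from most to least significant.
    order = np.argsort(adj_p, kind="stable")
    # Initial rejections: hypotheses whose adjusted p-value is <= alpha.
    r0 = int(np.sum(adj_p <= alpha))

    # gFWER(k): reject the next k most significant hypotheses as well.
    n_gfwer = min(m, r0 + k)

    # TPPFP(q): add the largest a such that a / (r0 + a) <= q.
    a = 0
    while r0 + a < m and (a + 1) / (r0 + a + 1) <= q:
        a += 1
    n_tppfp = r0 + a

    return {
        "fwer": sorted(order[:r0].tolist()),
        "gfwer": sorted(order[:n_gfwer].tolist()),
        "tppfp": sorted(order[:n_tppfp].tolist()),
    }
```

With no initial rejections, the TPPFP augmentation adds nothing for any q below 1, because every added hypothesis would count as a potential false positive; the gFWER(k) augmentation still adds k hypotheses, since up to k false positives are tolerated by definition.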

Keywords: Bootstrap; codon; generalized family wise error rate; HIV-1; multiple testing; prediction; tail probability of the proportion of false positives; type I error; variable selection.
