EpiInformatics Working Group
National Cancer Institute


Biostatistics Branch Journal Club Papers
Reference List

1. Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological) 1995;57:289-300.

Abstract: The common approach to the multiplicity problem calls for controlling the familywise error rate (FWER). This approach, though, has faults, and we point out a few. A different approach to problems of multiple significance testing is presented. It calls for controlling the expected proportion of falsely rejected hypotheses, the false discovery rate. This error rate is equivalent to the FWER when all hypotheses are true but is smaller otherwise. Therefore, in problems where the control of the false discovery rate rather than that of the FWER is desired, there is potential for a gain in power. A simple sequential Bonferroni-type procedure is proved to control the false discovery rate for independent test statistics, and a simulation study shows that the gain in power is substantial. The use of the new procedure and the appropriateness of the criterion are illustrated with examples.
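
Illustrative note (not part of the abstract): the sequential procedure referred to above is commonly stated as a step-up rule. Sort the m p-values, find the largest rank k with p_(k) <= k*q/m, and reject the k hypotheses with the smallest p-values. The following is a minimal sketch under that reading; the function name and the default level q = 0.05 are assumptions made for illustration.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Step-up FDR procedure: return True where the hypothesis is rejected.

    Hypothetical helper for illustration; q is the target false discovery rate.
    """
    m = len(p_values)
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value is at most k * q / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_hochberg([0.01, 0.045, 0.02, 0.005, 0.5]))
```

Because the rule looks for the largest qualifying rank, a small p-value can be rejected even when it misses its own threshold, as long as a larger p-value meets a later one. This is where the gain in power over a plain Bonferroni cutoff of q/m comes from.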

2. Benjamini Y, Yekutieli D. The control of the false discovery rate in multiple testing under dependency. Annals of Statistics 2001;29:1165-88.

Abstract: Benjamini and Hochberg suggest that the false discovery rate may be the appropriate error rate to control in many applied multiple testing problems. A simple procedure was given there as an FDR controlling procedure for independent test statistics and was shown to be much more powerful than comparable procedures which control the traditional familywise error rate. We prove that this same procedure also controls the false discovery rate when the test statistics have positive regression dependency on each of the test statistics corresponding to the true null hypotheses. This condition for positive dependency is general enough to cover many problems of practical interest, including the comparisons of many treatments with a single control, multivariate normal test statistics with positive correlation matrix and multivariate t. Furthermore, the test statistics may be discrete, and the tested hypotheses composite without posing special difficulties. For all other forms of dependency, a simple conservative modification of the procedure controls the false discovery rate. Thus the range of problems for which a procedure with proven FDR control can be offered is greatly increased.
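
Illustrative note (not part of the abstract): the conservative modification for arbitrary dependency is usually stated as running the same step-up rule at the reduced level q / c(m), where c(m) = 1 + 1/2 + ... + 1/m. The sketch below assumes that form; the function name and the default q = 0.05 are hypothetical.

```python
def benjamini_yekutieli(p_values, q=0.05):
    """Step-up FDR procedure with the harmonic-sum correction.

    Hypothetical helper for illustration; returns True where rejected.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Harmonic sum c(m) that shrinks the target level under general dependency.
    c_m = sum(1.0 / i for i in range(1, m + 1))
    q_adj = q / c_m
    order = sorted(range(m), key=lambda i: p_values[i])
    # Largest 1-based rank k whose p-value is at most k * q_adj / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q_adj / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    print(benjamini_yekutieli([0.001, 0.002, 0.5]))
```

With m = 5 the correction factor is about 2.28, so the effective level drops from 0.05 to roughly 0.022. That is the price paid for dropping the independence or positive-dependency assumption.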

3. Efron B, Tibshirani R. Empirical Bayes methods and false discovery rates for microarrays. Genet.Epidemiol. 2002;23:70-86.


4. Hildesheim A, Apple RJ, Chen CJ, Wang SS, Cheng YJ, Klitz W et al. Association of HLA class I and II alleles and extended haplotypes with nasopharyngeal carcinoma in Taiwan. J.Natl.Cancer Inst. 2002;94:1780-9.

Abstract: BACKGROUND: Nasopharyngeal carcinoma (NPC), which occurs at a disproportionately high rate among Chinese individuals, is associated with Epstein-Barr virus (EBV). Human leukocyte antigen (HLA) polymorphisms appear to play a role in NPC, because they are essential in the immune response to viruses. We used high-resolution HLA genotyping in a case-control study in Taiwan to systematically evaluate the association between various HLA alleles and NPC. METHODS: We matched 366 NPC case patients to 318 control subjects by age, sex, and geographic residence. Participants were interviewed and provided blood samples for genotyping. High-resolution (polymerase chain reaction-based) genotyping of HLA class I (A and B) and II (DRB1, DQA1, DQB1, and DPB1) genes was performed in two phases. In phase I, 210 case patients and 183 control subjects were completely genotyped. In phase II, alleles associated with NPC in the phase I analysis were evaluated in another 156 case patients and 135 control subjects. Extended haplotypes were inferred. RESULTS: We found a consistent association between HLA-A*0207 (common among Chinese but not among Caucasians) and NPC (odds ratio [OR] = 2.3, 95% confidence interval [CI] = 1.5 to 3.5) but not between HLA-A*0201 (most common HLA-A2 allele in Caucasians) and NPC (OR = 0.79, 95% CI = 0.55 to 1.2). Individuals with HLA-B*4601, which is in linkage disequilibrium with HLA-A*0207, had an increased risk for NPC (OR = 1.8, 95% CI = 1.2 to 2.5) as did individuals with HLA-A*0207 and HLA-B*4601 (OR = 2.8, 95% CI = 1.7 to 4.4). Individuals homozygous for HLA-A*1101 had decreased risks for NPC (OR = 0.24, 95% CI = 0.13 to 0.46). The extended haplotype HLA-A*3303-B*5801/2-DRB1*0301-DQB1*0201/2-DPB1*0401, specific to this ethnic group, was associated with a statistically significantly increased risk for NPC (OR = 2.6, 95% CI = 1.1 to 6.4). CONCLUSIONS: The restriction of the association of HLA-A2 with NPC to HLA-A*0207 probably explains previously observed associations of HLA-A2 with NPC among Chinese but not Caucasians. The extended haplotypes associated with NPC might, in part, explain the higher rates of NPC in this ethnic group.

5. Hill WG. Estimation of linkage disequilibrium in randomly mating populations. Heredity 1974;33:229-39.

6. Rieder MJ, Taylor SL, Clark AG, Nickerson DA. Sequence variation in the human angiotensin converting enzyme. Nat.Genet. 1999;22:59-62.

Abstract: Angiotensin converting enzyme (encoded by the gene DCP1, also known as ACE) catalyses the conversion of angiotensin I to the physiologically active peptide angiotensin II, which controls fluid-electrolyte balance and systemic blood pressure. Because of its key function in the renin-angiotensin system, many association studies have been performed with DCP1. Nearly all studies have associated the presence (insertion, I) or absence (deletion, D) of a 287-bp Alu repeat element in intron 16 with the levels of circulating enzyme or cardiovascular pathophysiologies. Many epidemiological studies suggest that the DCP1*D allele confers increased susceptibility to cardiovascular disease; however, other reports have found no such association or even a beneficial effect. We present here the complete genomic sequence of DCP1 from 11 individuals, representing the longest contiguous scan (24 kb) for sequence variation in human DNA. We identified 78 varying sites in 22 chromosomes that resolved into 13 distinct haplotypes. Of the variant sites, 17 were in absolute linkage disequilibrium with the commonly typed Alu insertion/deletion polymorphism, producing two distinct and distantly related clades. We also identified a major subdivision in the Alu deletion clade that enables further analysis of the traits associated with this gene. The diversity uncovered in DCP1 is comparable to that described for other regions in the human genome. The highly correlated structure in DCP1 raises important issues for the determination of functional DNA variants within genes and genetic studies in humans based on marker association.


7. Satagopan JM, Verbel DA, Venkatraman ES, Offit KE, Begg CB. Two-stage designs for gene-disease association studies. Biometrics 2002;58:163-70.

Abstract: The goal of this article is to describe a two-stage design that maximizes the power to detect gene-disease associations when the principal design constraint is the total cost, represented by the total number of gene evaluations rather than the total number of individuals. In the first stage, all genes of interest are evaluated on a subset of individuals. The most promising genes are then evaluated on additional subjects in the second stage. This will eliminate wastage of resources on genes unlikely to be associated with disease based on the results of the first stage. We consider the case where the genes are correlated and the case where the genes are independent. Using simulation results, it is shown that, as a general guideline when the genes are independent or when the correlation is small, utilizing 75% of the resources in stage 1 to screen all the markers and evaluating the most promising 10% of the markers with the remaining resources provides near-optimal power for a broad range of parametric configurations. This translates to screening all the markers on approximately one quarter of the required sample size in stage 1.
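
Illustrative note (not part of the abstract): the 75% / 10% guideline can be checked with simple arithmetic. The sketch below assumes cost is counted as one unit per marker typed on one individual, and reads "required sample size" as the total number of individuals on whom a promising marker is eventually typed. The function name, the budget, and the marker count are hypothetical.

```python
def two_stage_allocation(total_evaluations, n_markers,
                         stage1_fraction=0.75, carry_fraction=0.10):
    """Split a fixed budget of marker evaluations across two stages.

    Hypothetical helper for illustration; returns per-stage sample sizes.
    """
    # Stage 1: every marker is typed on n1 individuals.
    n1 = stage1_fraction * total_evaluations / n_markers
    # Stage 2: only the most promising markers are carried forward.
    m2 = max(1, round(carry_fraction * n_markers))
    n2 = (1.0 - stage1_fraction) * total_evaluations / m2
    total_n = n1 + n2
    return {
        "stage1_individuals": n1,
        "stage2_markers": m2,
        "stage2_individuals": n2,
        "total_individuals": total_n,
        "stage1_share_of_sample": n1 / total_n,
    }


if __name__ == "__main__":
    # Assumed budget: 100,000 evaluations spread over 100 markers.
    for key, value in two_stage_allocation(100_000, 100).items():
        print(key, value)
```

Under these assumptions stage 1 uses 750 individuals and stage 2 adds 2,500 more for the 10 retained markers, so stage 1 covers about 23% of the 3,250 individuals. This is consistent with the "approximately one quarter" remark in the abstract.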


