Background | Test for Interaction | Mantel-Haenszel Methods
When Interaction and Confounding Are Minimal | Strategy for Analysis | Exercises
This chapter considers the analysis of a binary outcome (disease D) and binary exposure (exposure E) with data stratified according to an extraneous cofactor (cofactor C). Two phenomena -- confounding and statistical interaction -- are considered.
Confounding
Confounding (from the Latin confundere: to mix together) is a distortion of an association between E and D brought about by a cofactor C. Confounding occurs when E is associated with C and C is an independent risk factor for D. In addition, C is not intermediate in the causal pathway.
For example, smoking (C) confounds the relation between alcohol consumption (E) and lung cancer (D) because alcohol users are more likely to smoke than non-users. Thus, the effects of smoking get mixed in with the effect of alcohol consumption--smoking confounds the association between alcohol consumption and lung cancer.
One way to address confounding is to subset data into relatively homogeneous subgroups ('strata') according to the confounding cofactor. Not surprisingly, data can show one thing in aggregate form and another once disaggregated.
Measures of association in the aggregate are called crude measures of association since relations are unadjusted. Let us precede symbols for measures of association with a c when referring to crude measures of association. For example, cRR will represent the crude risk ratio (i.e., the risk ratio based on all data combined in a single 2-by-2 table).
Subscripts will denote strata-specific measures of association. For example, RR1 will represent the risk ratio in stratum 1, RR2 will represent the risk ratio in stratum 2, and so on.
Suppose, in the aggregate, we see the following crude data:
D+ D-
E+ 200 800 1000
E- 50 950 1000
250 1750 2000
Therefore, p1 = 200 / 1000 = .20, p2 = 50 / 1000 = .05, and cRR = .20 / .05 = 4.0.
Now suppose we stratify by confounding factor C. In stratum 1 (positive for C) we find:
D+ D-
E+ 194 606 800
E- 24 76 100
218 682 900
In this stratum, p1,1 = 194 / 800 = .2425, p2,1 = 24 / 100 = .24, and RR1 = .2425 / .24 ≈ 1.0.
In stratum 2 (negative for factor C) we find:
D+ D-
E+ 6 194 200
E- 26 874 900
32 1068 1100
In this stratum, p1,2 = 6 / 200 = .03, p2,2 = 26 / 900 = .0289, and RR2 = .03 / .0289 ≈ 1.0.
Therefore, the strong positive association seen in the aggregate disappears in the subgroups. This indicates that C confounded the association between E and D in the aggregate.
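The arithmetic above can be checked with a few lines of Python. This is only an illustrative sketch: the function name risk_ratio and the tuple layout are assumptions made for the example, not part of Epi Info.

```python
def risk_ratio(a, n1, c, n0):
    """Risk ratio: risk in the exposed (a / n1) over risk in the unexposed (c / n0)."""
    return (a / n1) / (c / n0)

# Hypothetical layout: (exposed cases, exposed total, unexposed cases, unexposed total)
crude = (200, 1000, 50, 1000)
stratum_c_pos = (194, 800, 24, 100)   # stratum 1, positive for C
stratum_c_neg = (6, 200, 26, 900)     # stratum 2, negative for C

crude_rr = risk_ratio(*crude)
rr_c_pos = risk_ratio(*stratum_c_pos)
rr_c_neg = risk_ratio(*stratum_c_neg)

print(f"cRR = {crude_rr:.2f}")
print(f"RR1 (C+) = {rr_c_pos:.2f}")
print(f"RR2 (C-) = {rr_c_neg:.2f}")
```

The crude ratio is far from 1 while both stratum-specific ratios sit close to 1, which is the pattern the text describes as confounding.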
Statistical Interaction (Effect-Measure Heterogeneity)
The term 'interaction' has two distinct meanings in epidemiology. Biological interaction is the interdependent operation of two or more factors in a cause. There is always biological interaction in epidemiologic data. Statistical interaction occurs when the statistical model being used does not explain the joint effects of two or more independent variables. Biological interaction and statistical interaction are two distinct phenomena that should not be confused. Here, we consider statistical interaction only.
Statistical interaction is synonymous with effect-measure heterogeneity. In epidemiology, this occurs when the value of the effect measure being used (e.g., the risk ratio) differs among subgroups. A numerical example will serve to illuminate.
Once again we may start with the crude (unstratified) data:
D+ D-
E+ 200 800 1000
E- 50 950 1000
250 1750 2000
Again, p1 = 200 / 1000 = .20, p2 = 50 / 1000 = .05, and cRR = .20 / .05 = 4.0.
Suppose, on stratification, we find:
Stratum 1 (negative for C)
D+ D-
E+ 12 188 200
E- 48 752 800
60 940 1000
Therefore, p1,1 = 12 / 200 = .06, p2,1 = 48 / 800 = .06, and RR1 = .06 / .06 = 1.0.
Stratum 2 (positive for C)
D+ D-
E+ 188 612 800
E- 2 198 200
190 810 1000
Therefore, p1,2 = 188 / 800 = .2350, p2,2 = 2 / 200 = .01, and RR2 = .235 / .01 = 23.5.
Because the risk ratio is heterogeneous in the two strata, we say there is a statistical interaction between E and C as relates to D.
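As a quick check on this example, the sketch below confirms that the two strata collapse back to the crude table and then compares the stratum-specific risk ratios. The dictionary layout and the "spread" ratio are illustrative assumptions; the spread is a descriptive gauge only, not a formal test of interaction.

```python
def risk_ratio(a, n1, c, n0):
    """Risk ratio: (a / n1) / (c / n0)."""
    return (a / n1) / (c / n0)

# Hypothetical layout: (exposed cases, exposed total, unexposed cases, unexposed total)
strata = {
    "C-": (12, 200, 48, 800),   # stratum 1
    "C+": (188, 800, 2, 200),   # stratum 2
}

# Summing cell by cell over the strata should reproduce the crude table.
collapsed = tuple(sum(cells) for cells in zip(*strata.values()))

stratum_rr = {name: risk_ratio(*cells) for name, cells in strata.items()}
crude_rr = risk_ratio(*collapsed)

# Largest over smallest stratum-specific RR: descriptive only (an assumption
# of this sketch, not a statistic defined in the text).
spread = max(stratum_rr.values()) / min(stratum_rr.values())

print("collapsed table:", collapsed)
print("stratum RRs:", {k: round(v, 2) for k, v in stratum_rr.items()})
print(f"crude RR = {crude_rr:.2f}, spread = {spread:.1f}")
```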
The above demonstrations suggest a strategy for dealing with extraneous factors. In essence, data are explored through stratification.
Illustrative Data Set (SEXBIAS.REC)
To illustrate methods in this chapter, let us consider a data set that demonstrates both interaction and confounding. Data were collected as part of a University of California at Berkeley study to assess whether men were being given preferential treatment over women in admission to graduate programs (Bickel & O'Connell, 1975; Freedman et al., 1991, pp. 16-19). Assuming that the men and women who applied for admission to the graduate programs were equally well-qualified, one would expect equal acceptance rates by gender. However, it initially appeared as if men were being admitted in greater proportions than women. Hence, the investigation.
The experience of applicants to the six largest majors at the school is stored in SEXBIAS.ZIP. This data set contains 4526 records and the following variables:
Variable | Type | Len | Description |
MAJOR | Alpha | 9 | Department major: A, B, C, D, E, and F |
SEX | Alpha | 9 | 1 = Male 2 = Female |
ACCEPT | Yes/no | 1 | Application accepted: +/- |
Crude analysis (TABLES SEX ACCEPT) derives:
ACCEPT
SEX + - | Total
-----------+---------------+------
1 | 1198 1493 | 2691 Acceptance rate, men = 1198 /2691 = 0.445
2 | 557 1278 | 1835 Acceptance rate, women = 557 / 1835 = 0.304
-----------+---------------+------ RR = 0.445 / 0.304 = 1.46
Total | 1755 2771 | 4526 P < 0.00001
Therefore, men appear to have a higher acceptance rate than women (supporting evidence of preferential treatment). However, what if men had applied to majors with more favorable acceptance rates than women? Then the cofactor MAJOR would confound the observed relation. To investigate this possibility, data are stratified by MAJOR.
Stratification
Table stratification is accomplished with the command:
EPI6> TABLES <E> <D> <C>
For the illustrative example, the following command is issued:
EPI6> TABLES SEX ACCEPT MAJOR
This produces separate tables for each of the 6 majors. Annotated output is shown below:
MAJOR =A
ACCEPT
SEX | + - | Total
-----------+-------------+------
1 | 512 313 | 825 Acceptance rate, men = 512 / 825 = 0.621
2 | 89 19 | 108 Acceptance rate, women = 89 / 108 = 0.824
-----------+-------------+------ RR = 0.621 / 0.824 = 0.75
Total | 601 332 | 933 p = 0.000033
MAJOR =B
ACCEPT
SEX | + - | Total
-----------+-------------+------
1 | 353 207 | 560 Acceptance rate, men = 353 / 560 = 0.630
2 | 17 8 | 25 Acceptance rate, women = 17 / 25 = 0.680
-----------+-------------+------ RR = 0.630 / 0.680 = 0.93
Total | 370 215 | 585 p = 0.61
MAJOR =C
ACCEPT
SEX | + - | Total
-----------+-------------+------
1 | 120 205 | 325 Acceptance rate, men = 120 / 325 = 0.369
2 | 202 391 | 593 Acceptance rate, women = 202 / 593 = 0.341
-----------+-------------+------ RR = 0.369 / 0.341 = 1.08
Total | 322 596 | 918 p = 0.39
MAJOR =D
ACCEPT
SEX | + - | Total
-----------+-------------+------
1 | 138 279 | 417 Acceptance rate, men = 138 / 417 = 0.331
2 | 131 244 | 375 Acceptance rate, women = 131 / 375 = 0.349
-----------+-------------+------ RR = 0.331 / 0.349 = 0.95
Total | 269 523 | 792 p = 0.59
MAJOR =E
ACCEPT
SEX | + - | Total
-----------+-------------+------
1 | 53 138 | 191 Acceptance rate, men = 53 / 191 = 0.277
2 | 94 299 | 393 Acceptance rate, women = 94 / 393 = 0.239
-----------+-------------+------ RR = 0.277 / 0.239 = 1.16
Total | 147 437 | 584 p = 0.32
MAJOR =F
ACCEPT
SEX | + - | Total
-----------+-------------+------
1 | 22 351 | 373 Acceptance rate, men = 22 / 373 = 0.059
2 | 24 317 | 341 Acceptance rate, women = 24 / 341 = 0.070
-----------+-------------+------ RR = 0.059 / 0.070 = 0.84
Total | 46 668 | 714 p = 0.54
Therefore, only Major A demonstrates a significant difference in acceptance rates by sex -- and this in favor of women (82.4% versus 62.1%). Notice that the initial crude analysis hid this pattern, a phenomenon known as Simpson's paradox. It is now evident that application to specific MAJORs confounds the study of SEX and ACCEPTance rates, and that there is an interaction between SEX and MAJOR.
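To see why MAJOR can confound the crude comparison, it helps to set each major's overall acceptance rate beside the share of men and of women who applied there. The sketch below does this from the counts in the tables above; the dictionary layout and variable names are assumptions made for illustration.

```python
# Hypothetical layout: major -> (men accepted, men applied, women accepted, women applied)
berkeley = {
    "A": (512, 825, 89, 108),
    "B": (353, 560, 17, 25),
    "C": (120, 325, 202, 593),
    "D": (138, 417, 131, 375),
    "E": (53, 191, 94, 393),
    "F": (22, 373, 24, 341),
}

men_total = sum(v[1] for v in berkeley.values())
women_total = sum(v[3] for v in berkeley.values())

summary = {}
print("major  accept_rate  share_of_men  share_of_women    RR")
for major, (a, n1, c, n0) in berkeley.items():
    accept_rate = (a + c) / (n1 + n0)      # acceptance rate, both sexes combined
    rr = (a / n1) / (c / n0)               # men relative to women within the major
    summary[major] = (accept_rate, n1 / men_total, n0 / women_total, rr)
    print(f"{major:5}  {accept_rate:11.3f}  {n1 / men_total:12.3f}  "
          f"{n0 / women_total:14.3f}  {rr:.2f}")

crude_rr = (sum(v[0] for v in berkeley.values()) / men_total) / (
    sum(v[2] for v in berkeley.values()) / women_total)
print(f"crude RR = {crude_rr:.2f}")
```

Majors A and B have the highest acceptance rates and draw a much larger share of the male applicants than of the female applicants, which is the E-C and C-D association that confounding requires.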
A chi-square test for interaction may be used to help determine whether effect-measure heterogeneity is present. Because this test applies to both risk ratios and odds ratios (and other measures of association), let MA refer to the measure of association parameter being studied. The null and alternative hypotheses are:
H0: MA1 = MA2 = . . . = MAS (no interaction)
H1: at least one of the strata-specific measures of association differs (interaction)
The method of calculating the chi-square interaction statistic in Epi Info is unspecified, but it is assumed to be a general Wald statistic (see Epidemiology Kept Simple Formula 15.1). Under the null hypothesis, this chi-squared interaction statistic has S - 1 degrees of freedom, where S represents the number of strata being tested.
Illustrative example. In SEXBIAS.REC we test H0: RR1 = RR2 = RR3 = RR4 = RR5 = RR6. Results, printed in the summary section of the stratified output, are:
Chi Square for evaluation of interaction 18.10
P value 0.00282859
Since there are 6 strata, df = 5. This, along with the divergent incidence (risk) ratio in stratum 1 (Major A), suggests that statistical interaction is present.
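Because the text leaves Epi Info's calculation unspecified, the following is only one plausible reading: a Wald-type heterogeneity statistic on the log risk-ratio scale with inverse-variance weights. The variance formula 1/a - 1/n1 + 1/c - 1/n0 and the pooling step are assumptions of this sketch, and all names are illustrative.

```python
import math

# Hypothetical layout: (men accepted, men applied, women accepted, women applied), majors A-F
strata = [
    (512, 825, 89, 108),
    (353, 560, 17, 25),
    (120, 325, 202, 593),
    (138, 417, 131, 375),
    (53, 191, 94, 393),
    (22, 373, 24, 341),
]

def log_rr_and_var(a, n1, c, n0):
    """Log risk ratio and its large-sample variance (assumed formula)."""
    log_rr = math.log((a / n1) / (c / n0))
    var = 1 / a - 1 / n1 + 1 / c - 1 / n0
    return log_rr, var

estimates = [log_rr_and_var(*s) for s in strata]
weights = [1 / var for _, var in estimates]

# Inverse-variance weighted mean of the log risk ratios.
pooled = sum(w * lr for w, (lr, _) in zip(weights, estimates)) / sum(weights)

# Weighted squared deviations from the pooled value; df = S - 1.
chi2_int = sum(w * (lr - pooled) ** 2 for w, (lr, _) in zip(weights, estimates))
df = len(strata) - 1

print(f"chi-square for interaction = {chi2_int:.2f} on {df} df")
```

With these counts the statistic should land close to the 18.10 printed by Epi Info, which is consistent with (but does not confirm) a Wald-type calculation.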
It is often advantageous to summarize the relation being studied with a single, unconfounded measure of association and tests. This can be accomplished by pooling unconfounded strata-specific measures of association to form a summary measure of association.
Summary Measure of Association
The Mantel-Haenszel method of pooling calculates a weighted average of strata-specific estimates, with weights proportional to N1*N2/N, where N represents the total number of people in the stratum (Cochran 1954; Mantel & Haenszel 1959). This assumes the measures of association are uniform among strata. This homogeneity assumption allows us to combine strata-specific measures of association to form a single summary measure that has been adjusted for confounding. Any non-uniformity will be suppressed through summarization. The pooled measure of association may be viewed as a statistical convenience whose purpose is to draw correct conclusions about the effect of the exposure.
Illustrative Example (SEXBIAS.REC). By suppressing the non-uniformity of the incidence (risk) ratios in SEXBIAS.REC, we find:
SUMMARY RISK RATIO (RR)
Crude RR without stratification 1.47
Summary RR of (ACCEPT=+) for (SEX=1) 0.94
95% confidence limits for RR 0.87 < RR < 1.03
Comments:
(1) The crude RR estimate of 1.47 indicates higher acceptance for men, whereas the summary estimate of 0.94 indicates slightly higher acceptance rates in women. Thus, the summary RR is an unconfounded estimate of the effect of gender on acceptance to graduate school at UC Berkeley.
(2) The 95% confidence interval for the summary RR is calculated using the method in Robins et al., 1986.
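A minimal sketch of the summary calculation follows, using the standard Mantel-Haenszel risk-ratio estimator and the Greenland-Robins variance for its logarithm. Whether Epi Info implements exactly these formulas is an assumption; the names and data layout are illustrative.

```python
import math

# Hypothetical layout: (men accepted, men applied, women accepted, women applied), majors A-F
strata = [
    (512, 825, 89, 108),
    (353, 560, 17, 25),
    (120, 325, 202, 593),
    (138, 417, 131, 375),
    (53, 191, 94, 393),
    (22, 373, 24, 341),
]

num = den = var_num = 0.0
for a, n1, c, n0 in strata:
    n = n1 + n0
    num += a * n0 / n                                   # numerator term of RR_MH
    den += c * n1 / n                                   # denominator term of RR_MH
    var_num += (n1 * n0 * (a + c) - a * c * n) / n**2   # variance component (assumed formula)

rr_mh = num / den
se_log = math.sqrt(var_num / (num * den))               # SE of ln(RR_MH)
lower = rr_mh * math.exp(-1.96 * se_log)
upper = rr_mh * math.exp(1.96 * se_log)

print(f"Summary RR = {rr_mh:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

The summary estimate is the ratio of two sums rather than an average of the six stratum ratios, so sparse strata (such as Major B for women) contribute little weight.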
Mantel-Haenszel Summary Test Statistic
A test of H0: aMA = 1 (where aMA represents the parameter for the Mantel-Haenszel adjusted measure of association) is performed with a Mantel-Haenszel chi-square statistic. Under the null hypothesis, this test statistic has a chi-square sampling distribution with 1 degree of freedom.
Illustrative Example (SEXBIAS.REC). The null hypothesis H0: aRR = 1 is tested with a Mantel-Haenszel summary chi-square statistic. The Mantel-Haenszel test statistics for SEXBIAS.REC are:
** Summary of 6 Tables With Non-Zero margins **
N = 4526
Mantel-Haenszel Summary Chi Square 1.43
P value 0.23226346
Comment: The p value of .23 fails to provide evidence against H0. We conclude no significant difference in acceptance rates by gender.
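The summary test can be sketched as below. It compares the observed number of accepted men with its expectation under H0 in each stratum, using the hypergeometric variance. The 0.5 continuity correction is an assumption of this sketch, chosen because it appears to bring the result into line with the printed value; the text does not state which form Epi Info uses.

```python
import math

# Hypothetical layout: (men accepted, men applied, women accepted, women applied), majors A-F
strata = [
    (512, 825, 89, 108),
    (353, 560, 17, 25),
    (120, 325, 202, 593),
    (138, 417, 131, 375),
    (53, 191, 94, 393),
    (22, 373, 24, 341),
]

sum_dev = 0.0   # sum over strata of (observed - expected) accepted men
sum_var = 0.0   # sum over strata of the hypergeometric variances
for a, n1, c, n0 in strata:
    n = n1 + n0
    m1 = a + c          # total accepted in the stratum
    m0 = n - m1         # total rejected in the stratum
    sum_dev += a - n1 * m1 / n
    sum_var += n1 * n0 * m1 * m0 / (n**2 * (n - 1))

# Continuity-corrected Mantel-Haenszel chi-square (correction is an assumption).
chi2_mh = (abs(sum_dev) - 0.5) ** 2 / sum_var

# Upper-tail probability of a chi-square with 1 df: erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_mh / 2))

print(f"M-H chi-square = {chi2_mh:.2f}, p = {p_value:.3f}")
```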
In the absence of interaction and confounding, stratification and adjustments are unnecessary. In such instances, crude measures of association offer the benefit of better precision (compared with M-H summary measures of association).
Illustrative Example. Data from a case-control study of esophageal cancer and tobacco consumption (Breslow & Day, 1980; Tuyns, 1977) are available in BD1NEW.ZIP. We are interested in the relation between tobacco consumption (TOBHIGH: 1 = 20+ g/day, 2 = less than 20 g/day) and esophageal cancer (CASE: 1 = case, 2 = control) while considering the possible confounding or effect-measure modifying effects of alcohol consumption (ALCHIGH: 1 = 80+ g/day, 2 = less than 80 g/day). The following commands are issued to analyze the data:
EPI6> READ BDNEW
EPI6> TABLES TOBHIGH CASE ALCHIGH
Key output includes:
Stratum 1 (ALCHIGH = 1)
CASE
TOBHIGH | 1 2 | Total
-----------+-------------+------
1 | 30 23 | 53
2 | 66 86 | 152
-----------+-------------+------
Total | 96 109 | 205
Single Table Analysis Stratum 1 Odds ratio = 1.70
Stratum 2 (ALCHIGH = 2)
CASE
TOBHIGH | 1 2 | Total
-----------+-------------+------
1 | 34 127 | 161
2 | 70 539 | 609
-----------+-------------+------
Total | 104 666 | 770
Single Table Analysis Stratum 2 Odds ratio = 2.06
Thus, the strata-specific odds ratios are 1.70 and 2.06, respectively. We might now ask if it makes sense to summarize these two odds ratios with a single summary statistic. The chi-square interaction statistic (H0: OR1 = OR2) is helpful in this regard. Epi Info prints this information in the area labeled 'Summary Odds Ratio':
Chi Square for evaluation of interaction 0.24
P value 0.62621898
In this instance df = 2 - 1 = 1 (not shown by Epi Info) and χ²int = 0.24, p = 0.63. This supports an assumption that differences in strata-specific odds ratios may be random (no statistical interaction).
The crude odds ratio and M-H summary odds ratio are also listed in the area labeled 'Summary Odds Ratio':
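For two strata, a Wald-type interaction statistic on the log odds-ratio scale can be sketched as follows. The Woolf variance (the sum of reciprocal cell counts) and the inverse-variance weighting are assumptions here, since the text does not specify Epi Info's method; names are illustrative.

```python
import math

# Hypothetical layout: (exposed cases, exposed controls, unexposed cases, unexposed controls)
strata = [
    (30, 23, 66, 86),     # ALCHIGH = 1
    (34, 127, 70, 539),   # ALCHIGH = 2
]

def log_or_and_var(a, b, c, d):
    """Log odds ratio and its Woolf variance (assumed formula)."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d

estimates = [log_or_and_var(*s) for s in strata]
weights = [1 / var for _, var in estimates]
pooled = sum(w * lo for w, (lo, _) in zip(weights, estimates)) / sum(weights)

chi2_int = sum(w * (lo - pooled) ** 2 for w, (lo, _) in zip(weights, estimates))
df = len(strata) - 1

# With df = 1 the upper-tail probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2_int / 2))

print(f"chi-square for interaction = {chi2_int:.2f}, df = {df}, p = {p_value:.2f}")
```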
SUMMARY ODDS RATIO
Crude OR 1.96
Mantel-Haenszel weighted Odds ratio 1.92
We also note that the crude odds ratio and the Mantel-Haenszel weighted odds ratio are similar. Therefore, it is reasonable to report the crude odds ratio. To get the confidence interval and p value for the crude odds ratio, issue the TABLES command without the stratifying variable:
EPI6> TABLES TOBHIGH CASE
Output is:
TOBHIGH | 1 2 | Total
-----------+-------------+------
1 | 64 150 | 214
2 | 136 625 | 761
-----------+-------------+------
Total | 200 775 | 975
Odds ratio 1.96
Cornfield 95% confidence limits for OR 1.36 < OR < 2.82
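The stratum-specific, Mantel-Haenszel, and crude odds ratios reported above can be reproduced with the short sketch below. The data layout and names are illustrative assumptions; the Cornfield limits are not computed here because they require an iterative solution.

```python
# Hypothetical layout: (exposed cases, exposed controls, unexposed cases, unexposed controls)
strata = [
    (30, 23, 66, 86),     # ALCHIGH = 1
    (34, 127, 70, 539),   # ALCHIGH = 2
]

# Stratum-specific odds ratios: ad / bc.
or_strata = [(a * d) / (b * c) for a, b, c, d in strata]

# Mantel-Haenszel odds ratio: sum(ad / N) over sum(bc / N).
or_mh = (sum(a * d / (a + b + c + d) for a, b, c, d in strata)
         / sum(b * c / (a + b + c + d) for a, b, c, d in strata))

# Crude odds ratio from the table collapsed over ALCHIGH.
collapsed = tuple(sum(cells) for cells in zip(*strata))
a, b, c, d = collapsed
or_crude = (a * d) / (b * c)

print("stratum ORs:", [round(x, 2) for x in or_strata])
print(f"M-H OR = {or_mh:.2f}, crude OR = {or_crude:.2f}")
```

The closeness of the crude and adjusted values is what the text relies on when it reports the crude odds ratio.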
Although the detection and control of confounding is crucially important in epidemiologic research, there exists no single way of dealing with this problem. Nevertheless, epidemiologists agree that potential confounders must be identified before data are collected so that data on these factors can be collected to allow further evaluation. So how does one know what variables might confound an analysis? Briefly, this information comes from an understanding of the systems being investigated, and is based on previous research, clinical insight, and understanding of the processes being studied. It is essential that the investigator 'does their homework,' researching all potential confounders, before collecting data. With this said, a few rules of thumb are presented:
(1) Adjustments for confounding are contraindicated when interaction is present, as such summary measures of association would obscure important modifications of effect.
(2) Since confounding is a matter of systematic error (not random error), hypothesis tests should not be used in the detection of confounding.
(3) A pragmatic strategy for calculating good measures of association suggests:
- Before the study is begun, the investigator attempts to understand the complex causal interrelations among the exposure, disease, and various other factors. This may require lots of homework on the part of the investigator, as well as close collaboration with subject matter specialists.
- Measurements and coding for E, D, and C1, C2, ..., Ck must be valid based on understanding of phenomena.
- The research question must be defined in an insightful way. 'Finding the question is often more important than finding the answer' (Tukey, 1980).
- Study designs are based on choices that maximize the likelihood of delineating causal relations.
- After data are collected, entered and cleaned, the analyst explores inter-relations, starting with simple comparisons and descriptions. Identified relationships between E and C and C and D heighten the awareness of the potential for confounding.
- Data are stratified and explored for interaction. (The above test for interaction may be applied.) When interaction is confirmed, strata-specific estimates are reported.
- The continued consultation with a subject matter specialist may be necessary before a decision is made whether or not to control for potential confounder C.
- In the absence of interaction and confounding, crude (unadjusted) estimates of association may be reported.
- The best estimate of association is both valid and precise. If interaction is present, strata-specific measures of association are reported. If interaction is absent but confounding is present, summary (adjusted) measures of association are reported. If neither interaction nor confounding is present, crude (unadjusted) measures of association are reported.
- In practice, there will always be uncertainty about whether a given set of variables are or are not confounders. 'Science DOES NOT BEGIN WITH A TIDY QUESTION. Nor does it end with a tidy answer' (Tukey, 1980).
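The reporting rule in the list above can be written as a small decision function. This is only a schematic: the function name is hypothetical, and the two boolean inputs stand for judgments the analyst must reach from the data and subject-matter knowledge, not quantities any program can determine on its own.

```python
def estimate_to_report(interaction: bool, confounding: bool) -> str:
    """Return which measure of association the strategy above says to report."""
    if interaction:
        # Summary measures would obscure effect modification.
        return "strata-specific"
    if confounding:
        return "summary (adjusted)"
    return "crude (unadjusted)"

# Example calls: interaction and confounding both judged present, as in the
# admissions data; neither judged present, as in the esophageal cancer data.
print(estimate_to_report(interaction=True, confounding=True))
print(estimate_to_report(interaction=False, confounding=False))
```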
Exercises
(1) GENERIC.ZIP: Simpson's Paradox (Hypothetical Data). This exercise illustrates Simpson's Paradox while applying a strategy for detecting and accounting for confounding and interaction. Three case-control data sets are presented: GENERIC1.REC, GENERIC2.REC, and GENERIC3.REC. Each data set contains the variables E (exposure), D (disease), and C (potential confounder). For each data set determine if interaction is present. If interaction is present, stop there and report strata-specific odds ratios and other relevant case-control statistics. If interaction is absent, assess the potential for confounding. Summarize your assessment. If confounding is present, report an adjusted odds ratio and associated case-control statistics. If interaction and confounding are absent, report the crude (unadjusted) case-control statistics.
(2) BD2.ZIP: Breslow & Day 2: The Oxford Childhood Cancer Survey (Breslow & Day, 1980, p. 238; Kneale, 1971; Stewart & Kneale, 1970). Data are from a case-control study of childhood leukemia and lymphoid tumors and in utero X-ray exposure (Kneale et al., 1971). The primary variables of interest are CASE (1 = case, 2 = control) and XRAY (1 = exposed, 2 = unexposed). The potential confounder is AGE (years). Analyze these data and report the 'best' odds ratio estimate and a 95% confidence interval for the parameter. Summarize your results in narrative form.
(3) BI-HELM1.ZIP: Bicycle Helmet Use in Two Northern California Counties (Perales et al., 1994). This data set contains information on bicycle helmet use in Santa Clara County and Contra Costa County -- two counties in northern California (U.S.A.). Data definitions are included in a data documentation file in the ZIP archive (bi-helm1-dd.htm). Review this data documentation file and then perform the following analyses.
(A) Determine crude incidences of helmet use in Santa Clara County (p1) and Contra Costa County (p2). (The easiest way to derive these statistics is to use a two-variable tables command: TABLES COUNTY HELMETUSE.) Test whether these proportions differ, and summarize your results.
(B) Stratify the data on the socioeconomic matching variable MATCHVAR (TABLES COUNTY HELMETUSE MATCHVAR). Report strata-specific helmet use rates by school and test whether within-strata rates differ significantly. Summarize your results narratively.
(C) Test the incidence (risk) ratio parameter for interaction. Be explicit in listing the null and alternative hypotheses. Report all relevant test statistics and state your conclusion.
(D) Discuss your findings. In so doing, consider the potential for interaction and confounding. Which schools show higher helmet-use rates compared with their matched counterpart? etc.
(4) CERVICAL: Cervical Cancer and Smoking (Nischan et al., 1988; Pagano & Gauvreau, 1993, p. 359). Data from a case-control study of cervical cancer and smoking are shown below.
        | Case | Control |
Smoke + | 108  | 163     |
Smoke - | 117  | 268     |
(A) Based on these data calculate the odds ratio of smoking for cervical cancer.
(B) Data stratified by number of sexual partners are shown below. Calculate stratum specific odds ratios.
Stratum 1: Zero or One Partner
        | Case | Control |
Smoke + | 12   | 21      |
Smoke - | 25   | 118     |
Stratum 2: Two or More Partners
        | Case | Control |
Smoke + | 96   | 142     |
Smoke - | 92   | 150     |
(C) Based on these exploratory analyses, would you say there is interaction? Justify your response. How would you report your results?
(5) ASBESTOS.ZIP: Asbestos Exposure and Lung Cancer (Hypothetical data). Data are from a case-control study of lung cancer and asbestos exposure. The data set includes information on smoking (SMOKE: + / -), asbestos exposure (ASBESTOS: + / -), and lung cancer (LUNGCA: + / -).
(A) Calculate the odds ratio of lung cancer associated with smoking. Include a 95% confidence interval, and interpret your findings.
(B) Calculate the odds ratio of lung cancer associated with asbestos exposure. Include a 95% confidence interval and interpret your findings.
(C) An investigator thinks it would be interesting to sort out the inter-relationship between asbestos, smoking, and lung cancer by looking at the lung cancer risk associated with asbestos in smokers and non-smokers separately. Perform such a stratified analysis. In so doing, report strata-specific odds ratios. Perform a test for interaction. (Include all hypothesis testing steps.) Is interaction present? Calculate and report the summary (adjusted) odds ratio. Is confounding present? Would it make sense to report the adjusted odds ratio in light of your findings about interaction? Report your final results.