The Pitfalls of Observational Studies

If you study data on poverty and crime, it’s very easy to observe that neighborhoods with high poverty rates often have high crime rates. But what’s not so easy to discern is the cause of this relationship.

Does poverty cause crime? Or does crime cause poverty? Or do they feed off one another in a vicious cycle? Or does another factor or a combination of other factors cause both poverty and crime?

In this lesson from Just Facts Academy about Observational Studies, you’re going to learn how to mop up such messy matters.

Observational studies are those in which the investigators don’t intervene but only measure.[1] [2] In other words, these are studies of things in the wild, whether they be people, grasshoppers, economies, or climates.

This differs from lab studies, which are conducted in controlled environments. It also differs from gold-standard randomized controlled trials where investigators randomly assign treatments or placebos to the subjects of the studies.[3] [4]

Observational studies are widely published by academic journals and often amplified by the media,[5] [6] [7] but they have a major weakness: they can seldom measure what people really want to know: causes and effects.[8] [9] [10] [11] [12] [13] [14] [15]

That’s because observing variables like crime and poverty in uncontrolled settings can only reveal associations, and in the words of an academic textbook about analyzing data:

Association is not the same as causation. This issue is a persistent problem in empirical analysis in the social sciences. … Because we so often confuse association and causation, it is extremely easy to be convinced that a tight relationship between two variables means that one is causing the other. This is simply not true.[16] [17] [18] [19] [20]
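
To see how a lurking third factor can manufacture a tight relationship out of thin air, consider this minimal simulation, written in Python. It is purely illustrative: the variables and numbers are invented, not drawn from any real study.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

z = rng.normal(size=n)             # a hidden factor nobody measured
x = 2.0 * z + rng.normal(size=n)   # x is driven only by z
y = 3.0 * z + rng.normal(size=n)   # y is driven only by z, never by x

print(np.corrcoef(x, y)[0, 1])     # roughly 0.85: tight association, zero causation

Here x and y never influence each other, yet anyone who plotted them would see a striking relationship, because the unmeasured z drives both.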

This fact is commonly taught in high schools, and knowing it can prevent you from being duped.[21] Yet, it is often ignored by commentators,[22] journalists,[23] government agencies,[24] businesspeople,[25] and even PhD scholars who should certainly know better.[26]

Let’s look at a real-world case.

In 2014, an academic journal published an observational study of opioid overdose deaths in states that had legalized medical marijuana and those that had not. It found that opioid deaths rose in both types of states but rose more slowly in states that had legalized cannabis for medical purposes.[27]

Deep within their paper, the authors of the study warned that it only measures associations and cannot prove causation.[28] Of course, that didn’t stop weed activists, journalists, and even scholars from touting the study as evidence that medical marijuana reduces opioid deaths.[29] [30] [31] [32] [33] [34]

Highlighting the folly of this cause-and-effect claim based on an observational study, another academic journal published a study in 2019 that extended the timeframe of the original study by seven years, and do you know what it found? The exact opposite result: opioid overdose deaths rose more quickly in states that had legalized medical marijuana.[35]

In this case, however, the authors of the study did the right thing and made it crystal clear that the study merely showed an “observed association” that is “likely spurious” and is probably caused by “unmeasured variables.”[36]

Underline that phrase “unmeasured variables,” because it’s a core weakness of observational studies. See, no matter how many variables an observational study measures, there’s usually a great risk that unmeasured variables affected the outcome.[37] [38] [39]

In the case of the cannabis legalization and opioid death studies, the unmeasured variables include the socioeconomic factors and government policies that the authors specified, plus countless others they didn’t mention.[40] [41] [42]

To address the challenge of unmeasured variables, people who conduct observational studies often measure multiple variables and use a statistical technique called “regression” to “control” for them.[43] [44]

For instance, an observational study of a drug taken by heart failure patients used a regression to control for their ages, races, incomes, education levels, and reams of factors related to their health.[45]
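
Mechanically, “controlling” for a variable just means adding it as another term in the regression. Here is a bare-bones sketch with fabricated data and made-up variable names (drug, age, income), intended only to show where the controls enter the model:

import numpy as np

rng = np.random.default_rng(1)
n = 5_000

age = rng.normal(60, 10, n)                   # measured covariate
income = rng.normal(50, 15, n)                # measured covariate
drug = (age + rng.normal(0, 10, n)) > 65      # older patients take the drug more often
outcome = -0.5 * drug + 0.08 * age + 0.01 * income + rng.normal(size=n)

# Fit outcome ~ drug + age + income + intercept by ordinary least squares
X = np.column_stack([drug, age, income, np.ones(n)])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(beta[0])   # near the true -0.5 because every confounder was measured

In this toy world the adjustment works perfectly, but only because the simulation contains no confounders beyond the ones the model includes. Real data offer no such guarantee.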

Even so, it’s generally impossible to rule out the possibility that other variables are at play. This is called “omitted variable bias,”[46] [47] [48] [49] [50] [51] [52] or “residual confounding,”[53] or as one scholarly paper dubbed it, the “Phantom Menace.”[54]
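
A short simulation makes the menace concrete. The data below are fabricated so that the drug truly does nothing, but healthier patients are more likely to take it; omit the health variable, and a phantom “effect” appears:

import numpy as np

rng = np.random.default_rng(2)
n = 5_000

health = rng.normal(size=n)                  # the confounder we fail to measure
drug = (health + rng.normal(size=n)) > 0     # healthier patients tend to take the drug
outcome = 1.0 * health + rng.normal(size=n)  # the drug itself does nothing at all

# Regression omitting health: outcome ~ drug + intercept
b_omit, *_ = np.linalg.lstsq(np.column_stack([drug, np.ones(n)]), outcome, rcond=None)

# Regression including health: outcome ~ drug + health + intercept
b_full, *_ = np.linalg.lstsq(np.column_stack([drug, health, np.ones(n)]), outcome, rcond=None)

print(b_omit[0])   # about 1.1: an impressive "benefit" created entirely by confounding
print(b_full[0])   # about 0.0: the benefit vanishes once the confounder is measured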

This menace is so ubiquitous that it even haunts “animal observational studies that were rigorously controlled and executed beyond what is achieved in studies of humans.” In the words of the 10 scholars who conducted these studies, “observational studies alone, no matter how well done, cannot support conclusions of causation,” especially when it comes to “nutrition research.”[55] [56]

And an Oxford University Press textbook on criminology says virtually the same thing about crime research,[57] as does a paper about violence against women.[58]

And the authors of a paper in the European Heart Journal came to the same conclusion after comparing the results of 119 studies of drugs taken by people suffering from heart failure.[59]

Get the picture? This is a universal principle.

And on top of omitted variable bias, there’s also the problem of included variable bias, where the authors of observational studies control for variables in ways that obscure the realities that they claim to measure.[60] [61]

This is like trying to study the frequency of rainfall and then controlling for wet sidewalks. The ultimate effect is to hide vital information from anyone who doesn’t dig into the study. See the footnotes of this video’s transcript for some crafty examples of this.[62] [63] [64]
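
The rainfall analogy can be simulated directly. In the invented data below, rain causes wet sidewalks and the wetness causes slip injuries; “controlling” for the wet sidewalks blocks the very pathway being studied and makes rain look harmless:

import numpy as np

rng = np.random.default_rng(3)
n = 5_000

rain = rng.binomial(1, 0.3, n).astype(float)     # did it rain today?
wet_sidewalk = rain + rng.normal(0, 0.3, n)      # wetness is caused by the rain
slips = 1.5 * wet_sidewalk + rng.normal(size=n)  # rain's harm flows through the wetness

# Sensible model: slips ~ rain + intercept (recovers rain's total effect, about 1.5)
b1, *_ = np.linalg.lstsq(np.column_stack([rain, np.ones(n)]), slips, rcond=None)
print(b1[0])

# "Controlled" model: slips ~ rain + wet_sidewalk + intercept (rain now looks inert)
b2, *_ = np.linalg.lstsq(np.column_stack([rain, wet_sidewalk, np.ones(n)]), slips, rcond=None)
print(b2[0])   # about 0.0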

It gets worse because observational studies often use statistical methods that are highly subjective.[65] [66] [67]

Nothing drives this point home like a study of 29 teams of analysts who were given the very same dataset to answer a social science question. Using regressions and other statistical strategies, these 29 teams produced a vast range of point estimates and margins of error.[68]

In the words of the study’s authors, “researchers” often have “little appreciation” for “the fact that” their “results may depend on the chosen analytic strategy, which itself is imbued with theory, assumptions, and choice points.” The authors also note that the peer review process and post-publication critiques “rarely” fix these problems,[69] contrary to the mantra that “science is self-correcting.”
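
You can watch that subjectivity in action with a few lines of code. The sketch below fabricates one dataset in which the exposure has no true effect, then runs a separate regression for every possible subset of six candidate control variables, mimicking 64 equally “defensible” analytic strategies:

import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n = 2_000

controls = rng.normal(size=(n, 6))                   # six candidate covariates
w_x, w_y = rng.normal(size=6), rng.normal(size=6)
exposure = 0.3 * controls @ w_x + rng.normal(size=n)
outcome = 0.5 * controls @ w_y + rng.normal(size=n)  # the exposure truly does nothing

estimates = []
for k in range(7):
    for subset in combinations(range(6), k):         # all 64 subsets of controls
        X = np.column_stack([exposure, controls[:, list(subset)], np.ones(n)])
        b, *_ = np.linalg.lstsq(X, outcome, rcond=None)
        estimates.append(b[0])

print(min(estimates), max(estimates))                # one dataset, a spread of "answers"

Every one of those 64 models could be defended in a methods section, and each one reports a different number.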

So, should we just throw out all observational studies? That would be an epic mistake because they can illuminate paths for future research, rule out theories, and estimate effects if the evidence is overwhelming.[70] [71] [72]

Bottom line: we need to learn to use them, not abuse them. Here’s how we do that:

1) Never forget that observational studies can rarely determine causes or effects, despite what many politicians, journalists, activists, and scholars will tell you. This applies regardless of how many observational studies come to the same conclusion. Per the medical textbook Principles and Practice of Clinical Research:

While consistency in the findings of a large number of observational studies can lead to the belief that the associations are causal, this belief is a fallacy.[73]

2) Don’t be fooled into thinking that observational studies with lots of control variables must be accurate. Even if a study controls for hundreds of variables, there are others it likely missed.

3) Be on the lookout for included variable bias, or when the authors of observational studies control for variables in ways that conceal what their studies purport to measure.

4) Be aware that an observational study is hopelessly confounded when an association seemingly appears before an intervention has time to take effect—for example, a vaccine that appears to prevent long Covid before it even has a chance to prevent Covid.[74] [75] [76] [77]

5) Know that there’s a good possibility an observational study has identified a cause if it finds a dose-dependent relationship.[78] [79] The association between smoking and lung cancer is a prime example—the more cigarettes people smoke, the more likely they are to develop lung cancer.[80] [81] (See the sketch after this list.) But even so, observational studies can’t accurately quantify the harms of smoking because of omitted variable bias.[82]

6) Watch out for reverse causation. Again, observational studies can’t prove whether crime causes poverty, or poverty causes crime, or other factors cause them both.[83] [84]

7) Don’t take any of this to absurd extremes. Observations are fine for determining that jumping out of a plane without a parachute isn’t great for your health. However, this kind of certainty is a rare exception, not the rule.[85]
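
As promised in point 5, here is a sketch of what a dose-dependent relationship looks like in data. Everything below is fabricated: the risk is built to climb with the number of cigarettes smoked, and grouping by dose reveals the monotone pattern that makes a causal reading more plausible (though still not proven):

import numpy as np

rng = np.random.default_rng(5)
n = 20_000

cigs_per_day = rng.choice([0, 5, 10, 20, 40], size=n)  # exposure "dose"
risk = 0.01 + 0.004 * cigs_per_day                     # risk built to climb with dose
disease = rng.random(n) < risk

for dose in (0, 5, 10, 20, 40):                        # disease rate at each dose level
    print(dose, round(disease[cigs_per_day == dose].mean(), 3))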

Now that you know the common pitfalls of observational studies and are hyperaware that association is not causation, don’t be duped by people who aren’t as informed as you.

Instead, put observational studies in their proper place and keep it locked to Just Facts Academy, so you can research like a genius.



Endnotes

[1] Book: International Encyclopedia of Public Health (2nd edition). Edited by Stella R. Quah and William C. Cockerham. Elsevier, 2017. https://www.sciencedirect.com/referencework/9780128037089/international-encyclopedia-of-public-health

Chapter: “Observational Epidemiology.” By Jennifer L. Kelsey and Ellen B. Gold. Pages 295–307. https://www.sciencedirect.com/science/article/abs/pii/B9780128036785003106

In an observational epidemiologic study, an investigator observes what is occurring in a study population without intervening. Observational studies may be descriptive or analytic. Examples of analytic studies include case-control, cohort, cross-sectional, and ecologic studies, as well as hybrid designs. Measures of association often estimated from observational studies include relative risks, hazard ratios, odds ratios, standardized mortality (or incidence) ratios, and attributable fractions. Bias from inadequate measurement, suboptimal selection of study participants, and uncontrolled confounding is often of concern in observational studies.

[2] Book: Equine Internal Medicine (4th edition). Edited by Stephen M. Reed, Warwick M. Bayly, and Debra C. Sellon. Elsevier, 2018. https://www.sciencedirect.com/book/9780323443296/equine-internal-medicine

Chapter: “Clinical Epidemiology and Evidence-Based Medicine.” By Kenneth W. Hinchcliff. Pages 218–231. https://www.sciencedirect.com/science/article/abs/pii/B9780323443296000061

Observational studies are those in which there is no intervention by the investigators. Data are obtained by simply observing and recording events without any attempt to impose interventions that could alter the course of events. There is no experimental aspect to observational studies in that there is no testing of a hypothesis. These studies are solely based on observing details about a case or series of cases, or groups of animals, and summarizing associations among variables.

[3] Paper: “Randomized Controlled Trials.” Deutsches Ärzteblatt International (The German Medical Association’s Official International Bilingual Science Journal). By Maria Kabisch and others. September 30, 2011. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3196997/

In RCTs the patients are randomly assigned to the different study groups. This is intended to ensure that all potential confounding factors are divided equally among the groups that will later be compared (structural equivalence). These factors are characteristics that may affect the patients’ response to treatment, e.g., weight, age, and sex. Only if the groups are structurally equivalent can any differences in the results be attributed to a treatment effect rather than the influence of confounders.

[4] Book: Rutherford’s Vascular Surgery (8th edition, Volume 1). Edited by Jack L. Cronenwet and K. Wayne Johnston. Elsevier Saunders, 2014.

Chapter 1: “Epidemiology and Clinical Analysis.” By Louis L. Nguyen and Ann DeBord Smith. Pages 2–14.

Clinical research can be broadly divided into observational studies and experimental studies. Observational studies are characterized by the absence of a study-directed intervention, whereas experimental studies involve testing a treatment, be it a drug, a device, or a clinical pathway. Observational studies can follow ongoing treatments but cannot influence choices made in the treatment of a patient. Observational studies can be executed in a prospective or retrospective fashion, whereas experimental studies can be performed only prospectively. …

Experimental studies differ from observational studies in that the former expose patients to a treatment being tested. Many experimental trials involve randomization of patients to the treatment group or appropriate control group. Although randomization ensures that known factors are evenly distributed between the exposure and control groups, the importance of RCTs lies in the even distribution of unknown factors. Thus, a well-designed RCT will result in more simplified endpoint analyses because complex statistical models are not necessary to control for confounding factors.

[5] Paper: “Media Coverage, Journal Press Releases and Editorials Associated with Randomized and Observational Studies in High-Impact Medical Journals: A Cohort Study.” By Michael T. M. Wang and others. PLoS One, December 23, 2015. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0145294

In contrast, observational studies can generate hypotheses but not reliably test them [2]. However, observational research is conducted more frequently than randomized studies, and both types of research are published in prominent medical journals. Thus, both types of research potentially influence health beliefs and behaviours.

Publication of clinical research findings in prominent journals influences health beliefs and medical practice, in part by engendering news coverage. Randomized controlled trials (RCTs) should be most influential in guiding clinical practice. …

We specifically assessed media coverage of the most rigorous RCTs, those with >1000 participants that reported ‘hard’ outcomes. There was no difference between RCTs and observational studies in coverage by major newspapers or news agencies, or in total number of news stories generated (all P>0.63). Large RCTs reporting ‘hard’ outcomes did not generate more news coverage than small RCTs that reported surrogate outcomes and observational studies (all P>0.32). …

Large RCTs that report “hard” disease outcomes should be very influential on practice but, in comparison to studies with less rigorous design or importance, they were not more frequently accompanied by editorials or journal press releases, nor did they generate more media coverage.

[6] Commentary: “Observational Studies, Bad Science, and the Media.” By Steven E. Nissen, MD. American College of Cardiology, May 25, 2012. https://www.acc.org/Latest-in-Cardiology/Articles/2012/05/25/12/44/Observational-Studies

In a world of instant news delivered via the internet, poor-quality medical news stories now dominate the airwaves and print media. CNN actually devoted an entire program to a wacky retired physician who claimed that his diet could make patients “heart attack-proof.” The public hears these dramatic news reports and worse yet, actually believes them. …

There is a sad reality here, a problem well known to most clinical trialists—observational studies are usually unreliable. Nevertheless, the reporting of such studies in the media can lead to mass hysteria or promote a rush to judgment about unproven therapies.

[7] Paper: “Media Coverage of Medical Journals: Do the Best Articles Make the News?” By Senthil Selvaraj, Durga S. Borkar, and Vinay Prasad. PLoS One, January 17, 2014. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0085355

We compared study characteristics of 75 clinically-oriented journal articles that received coverage in the top five newspapers by circulation against 75 clinically-oriented journal articles that appeared in the top five medical journals by impact factor over a similar timespan. …

Investigations receiving coverage from newspapers were less likely to be RCTs (17% vs. 35%, p = 0.016) and more likely to be observational studies (75% vs. 47%, p<0.001). …

Newspapers were more likely to cover observational studies and less likely to cover RCTs than high impact journals. Additionally, when the media does cover observational studies, they select articles of inferior quality. Newspapers preferentially cover medical research with weaker methodology.

[8] Paper: “Observational Research Rigour Alone Does Not Justify Causal Inference.” By Keisuke Ejima and others. European Journal of Clinical Investigation, October 6, 2016. https://onlinelibrary.wiley.com/doi/10.1111/eci.12681

Observational designs remain useful in biomedical and behavioral research by allowing empirical investigations of exposures and their associations in populations experiencing all the vagaries of everyday life. Observational studies offer potential directions for randomized controlled experimental studies. However, as we have illustrated here, even an observational study that is meticulously controlled far beyond what could be achieved in a human study cannot be counted upon to reliably estimate causal effects owing to uncontrolled confounders, especially in nutrition research. Therefore, we believe that, despite public statements to the contrary [6], observational studies alone, no matter how well done, cannot support conclusions of causation.

[9] Paper: “Econometric Methods for Causal Evaluation of Education Policies and Practices: A Non-Technical Guide.” By Martin Schlotter, Guido Schwerdt, and Ludger Woessmann. Education Economics, January 2011. https://www.tandfonline.com/doi/10.1080/09645292.2010.511821

Page 110:

Using standard statistical methods, it is reasonably straightforward to establish whether there is an association between two things—for example, between the introduction of a certain education reform (the treatment) and the learning outcome of students (the outcome). However, whether such a statistical correlation can be interpreted as the causal effect of the reform on outcomes is another matter. The problem is that there may well be other reasons why this association comes about.

Page 131:

But obtaining convincing evidence on the effects on specific education policies and practices is not an easy task. As a precondition, relevant data on possible outcomes has to be gathered. What is more, showing a mere correlation between a specific policy or practice and potential outcomes is no proof that the policy or practice caused the outcome. For policy purposes, mere correlations are irrelevant, and only causation is important. What policy-makers care about is what would really happen if they implemented a specific policy or practice—would it really change any outcome that society cares about? In order to implement evidence-based policy, policy-makers require answers to such causal questions.

[10] Article: “Cause Versus Association in Observational Studies in Psychopharmacology.” By Chittaranjan Andrade. Journal of Clinical Psychiatry, August 26, 2014. https://www.psychiatrist.com/jcp/cause-versus-association-observational-studies-psychopharmacology/

Hypotheses may be generated (and conclusions drawn) from observational studies in areas where information from randomized controlled trials (RCTs) is unavailable. However, observational studies can only establish that significant associations exist between predictor and outcome variables. Observational studies cannot establish that the associations identified represent cause-and-effect relationships. This article discusses examples of associations that were identified in observational studies and that were subsequently refuted in RCTs. Examples are also provided of associations that have yet to be confirmed or refuted but that are nevertheless influential in psychopharmacologic practice. Explanations are offered about how confounding might explain significant relationships between variables that are not related by cause and effect. As a conclusion of this exercise, clinicians are cautioned against placing too much reliance on the findings of observational research.

[11] Book: Multiple Regression: A Primer. By Paul D. Allison. Pine Forge Press, 1998.

Chapter 3: “What Can Go Wrong With Multiple Regression?” https://us.sagepub.com/sites/default/files/upm-binaries/2726_allis03.pdf

Page 67: “Non-experimental data rarely tell you anything about the direction of a causal relationship. You must decide the direction based on your prior knowledge of the phenomenon you’re studying.”

[12] Commentary: “Detecting Selection Bias in Observational Studies—When Interventions Work Too Fast.” By Ghulam Rehman Mohyuddin and Vinay Prasad. JAMA Internal Medicine, June 12, 2023. https://jamanetwork.com/journals/jamainternalmedicine/article-abstract/2805974

Observational studies can find associations, not cause-and-effect relationships. If misapplied, the findings of observational studies may lead to overuse, as well as underuse, of medical interventions. To validate the findings of observational studies, randomized clinical trials are often needed.

[13] Commentary: “Covid-19 Vaccine Trial Protocols Released.” By Peter Doshi. British Medical Journal, October 2020. https://www.bmj.com/content/371/bmj.m4058

Sixty years after influenza vaccination became routinely recommended for people aged 65 or older in the US, we still don’t know if vaccination lowers mortality. Randomised trials with this outcome have never been done.9 Observational studies with results in both directions can be cited, and without definitive randomised evidence the debate will go on. Unless we act now, we risk repeating this sorry state of affairs with covid-19 vaccines.

[14] Paper: “Medicaid Increases Emergency-Department Use: Evidence from Oregon’s Health Insurance Experiment.” By Sarah L. Taubman and others. Science, January 2, 2014. Pages 263–268. http://www.sciencemag.org/content/343/6168/263.abstract

In 2008, Oregon initiated a limited expansion of a Medicaid program for uninsured, low-income adults, drawing names from a waiting list by lottery. …

… The lottery allowed us to isolate the causal effect of insurance on emergency-department visits and care; random assignment through the lottery can be used to study the impact of insurance without the problem of confounding factors that might otherwise differ between insured and uninsured populations. …

It is difficult to isolate the impact of Medicaid on emergency-department use in observational data, because the uninsured and Medicaid enrollees may differ on many characteristics (including health and income) that are correlated with use of the emergency department. Indeed, we show (table S17) that observational estimates that did not account for such confounding factors suggested much larger increases in emergency-department use associated with Medicaid coverage than the results from our randomized controlled setting.

[15] Book: Regression With Social Data: Modeling Continuous and Limited Response Variables. By Alfred DeMaris. John Wiley & Sons, 2004.

Page 10:

Nonetheless, according to the potential response model, the average causal effect can be estimated in an unbiased fashion if there is random assignment to the cost. Unfortunately, this pretty much rules out making causal inferences from nonexperimental data. … Still, hard-core adherence to the potential response framework would deny the causal status of most of the interesting variables in the social sciences because they are not capable of being assigned randomly. Holland and Rubin, for example have made up a motto that expresses this quite succinctly: “No causation without manipulation” (Holland, 1986, p. 959). In other words, only “treatments” that can be assigned randomly to any case at will are considered candidates for exhibiting causal effects. … I agree with others … who take exception to this restrictive conception of causality, despite the intuitive appeal of counterfactual reasoning.

Page 12:

Friedman … is especially critical of drawing causal inferences from observational data, since all that can be “discovered,” regardless of the statistical candlepower used, is association. Causation has to be assumed into the structure from the beginning. Or, as Friedman … says: “If you want to pull a causal rabbit out of the hat, you have to put the rabbit into the hat.” In my view, this point is well taken; but it does not preclude using regression for causal inference. What it means, instead, is that prior knowledge of the causal status of one’s regressors is a prerequisite for endowing regression coefficients with a causal interpretation, as acknowledged by Pearl 1998.

Page 13:

Sobel’s (1988, p. 346) advice is in the same vein: “[s]ociologists might follow the example of epidemiologists. Here, when an association is found in an observational study that might plausibly suggest causation, the findings are treated as preliminary and tentative. The next step, when possible, is to conduct the randomized study that will more definitively answer the causal question of interest.” …

In sum, causal modeling via regression, using nonexperimental data, can be a useful enterprise provided we bear in mind that several strong assumptions are required to sustain it. First, regardless of the sophistication of our methods, statistical techniques only allow us to examine associations among variables.

[16] Book: Introductory Econometrics: Using Monte Carlo Simulation with Microsoft Excel. By Humberto Barreto and Frank M. Howland. Cambridge University Press, 2006.

Page 43:

Association Is Not Causation

A second problem with the correlation coefficient involves its interpretation. A high correlation coefficient means that two variables are highly associated, but association is not the same as causation.

This issue is a persistent problem in empirical analysis in the social sciences. Often the investigator will plot two variables and use the tight relationship obtained to draw absolutely ridiculous or completely erroneous conclusions. Because we so often confuse association and causation, it is extremely easy to be convinced that a tight relationship between two variables means that one is causing the other. This is simply not true.

[17] Book: Business and Competitive Analysis: Effective Application of New and Classic Methods (2nd edition). By Craig S. Fleisher and Babette E. Bensoussan. Pearson Education, 2015.

Pages 338–339:

One of the biggest potential problems with statistical analysis is the quality of the interpretation of the results. Many people see cause-and-effect relationships ‘evidenced’ by statistics, which are in actuality simply describing data associations or correlation having little or nothing to do with causal factors.

[18] Textbook: Macroeconomics: A Contemporary Introduction (10th edition). By William A. McEachern. South-Western Cengage Learning, 2014.

Page 13:

Economic analysis, like other forms of scientific inquiry, is subject to common mistakes in reasoning that can lead to faulty conclusions. Here are three sources of confusion.

The Fallacy That Association Is Causation

In the past two decades, the number of physicians specializing in cancer treatment increased sharply. At the same time, the incidence of some cancers increased. Can we conclude that physicians cause cancer? No. To assume that event A caused event B simply because the two are associated in time is to commit the association-is-causation fallacy, a common error. The fact that one event precedes another or that the two events occur simultaneously does not necessarily mean that one causes the other. Remember: Association is not necessarily causation.

[19] Article: “Statistical Malpractice.” By Bruce G. Charlton. Journal of the Royal College of Physicians of London, March 1996. Pages 112–114. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5401528/pdf/jrcollphyslond90376-0016.pdf

Page 112: “Science is concerned with causes but statistics is concerned with correlations.”

Page 113: “The root of most instances of statistical malpractice is the breaking of mathematical neutrality and the introduction of causal assumptions into analysis without justifying them on scientific grounds.”

[20] Book: Regression With Social Data: Modeling Continuous and Limited Response Variables. By Alfred DeMaris. John Wiley & Sons, 2004.

Page 9:

Regression modeling of nonexperimental data for the purpose of making causal inferences is ubiquitous in the social sciences. Sample regression coefficients are typically thought of as estimates of the causal impacts of explanatory variables on the outcome. Even though researchers may not acknowledge this explicitly, their use of such language as impact or effect to describe a coefficient value often suggest a causal interpretation. This practice is fraught with controversy….

Page 12:

Friedman … is especially critical of drawing causal inferences from observational data, since all that can be “discovered,” regardless of the statistical candlepower used, is association. Causation has to be assumed into the structure from the beginning. Or, as Friedman … says: “If you want to pull a causal rabbit out of the hat, you have to put the rabbit into the hat.” In my view, this point is well taken; but it does not preclude using regression for causal inference. What it means, instead, is that prior knowledge of the causal status of one’s regressors is a prerequisite for endowing regression coefficients with a causal interpretation, as acknowledged by Pearl 1998.

Page 13:

In sum, causal modeling via regression, using nonexperimental data, can be a useful enterprise provided we bear in mind that several strong assumptions are required to sustain it. First, regardless of the sophistication of our methods, statistical techniques only allow us to examine associations among variables.

[21] “Common Core State Standards for Mathematics.” Common Core State Standards Initiative. Accessed October 3, 2018 at https://bit.ly/3QpckL8

Mathematics | High School—Statistics and Probability

Decisions or predictions are often based on data—numbers in context. These decisions or predictions would be easy if the data always sent a clear message, but the message is often obscured by variability. Statistics provides tools for describing variability in data and for making informed decisions that take it into account. …

Make inferences and justify conclusions from sample surveys, experiments and observational studies …

Distinguish between correlation and causation.

[22] Article: “Four Fabrications About Firearms.” By James D. Agresti, Just Facts, October 19, 2017. https://www.justfactsdaily.com/four-fabrications-about-firearms/

To prove his claim that “more guns means more murder,” Times columnist Bret Stephens cites a 2013 paper in the American Journal of Public Health, which found that “states with higher rates of gun ownership had disproportionately large numbers of deaths from firearm-related homicides.”

For two reasons, this study does not support Stephens’ assertion. …

Second, Stephens makes a common blunder by confusing association with causation. Even if the study he cited had found that states with higher rates of gun ownership had higher levels of murder, this would not show that “more guns means more murder.” As explained in a textbook about analyzing data:

Association is not the same as causation. This issue is a persistent problem in empirical analysis in the social sciences. Often the investigator will plot two variables and use the tight relationship obtained to draw absolutely ridiculous or completely erroneous conclusions. Because we so often confuse association and causation, it is extremely easy to be convinced that a tight relationship between two variables means that one is causing the other. This is simply not true.

The reason it’s not true is because there are numerous possible factors that impact homicide rates, and without an experimental study, it is extremely difficult to identify, measure, and account for the effects of all such factors. In fact, the authors of the study in question make this point three times, writing, “we could not determine causation,” “we could not determine causation,” and “it is not possible in a panel study such as ours to determine causality.”

Despite those explicit and repeated warnings in the study, Stephens misrepresents it as though it proves causation. Coming from Stephens, who is a Pulitzer Prize winner and has a master’s degree from the London School of Economics, this kind of error belies gross negligence.

[23] Article: “Should the U.S. Adopt Australia’s Strict Gun Laws?” By James D. Agresti. Just Facts, December 20, 2012. https://www.justfactsdaily.com/should-the-u-s-adopt-australias-strict-gun-laws

ABC News, for an example, published an article entitled, “Will Lessons From Down Under Stem the Undertaker Here?” In this piece, correspondent Nick Schifrin reports that strict Australia gun laws passed in 1996 have proved “extremely effective. In the last 16 years, the risk of dying by gunshot in Australia has fallen by more than 50 percent. The national rate of gun homicide is one-thirtieth that of the United States.”

Statistics like these do more to mislead than inform. First, a simple comparison of current firearm homicide rates between countries cannot possibly establish the impact of their gun control laws. This is because there are numerous other factors endemic to each country that impact homicide rates, such as their law enforcement and criminal justice systems, the portion of children raised in single-parent households, poverty rates, and many other relevant variables. Schifrin’s argument is analogous to an argument made by the NRA that right-to-carry states have a 28% lower murder rate than the rest of the country. Such statistics tell us little. To provide any legitimate indication of the effects of gun laws, before-and-after comparisons are almost always necessary.

Schifrin does provide a before-and-after comparison of the “risk of dying by gunshot in Australia” over the past 16 years, but this is deceptive because it accounts for lives taken with guns while failing to account for lives saved with guns. As shown in several studies summarized in the Journal of Criminal Law and Criminology, in the vast majority of cases where someone uses a gun for self-defense, a bullet is never even fired because the would-be assailant retreats when he discovers that his target is armed. Schifrin’s “risk of dying by gunshot” statistic fails to account for such scenarios.

The “risk of dying by gunshot” statistic also fails to account for weapons substitution, which occurs when murderers use whatever weapons are readily available to them. Would someone judge a gun control law to be a success if every averted gun murder were replaced by another type of murder? Of course not, but the press commonly cites statistics that fail to account for such outcomes. For these reasons, to assess the full effects of gun laws on homicides, one must look at all homicides, not just those committed with firearms.

The homicide data does not fit the storyline commonly advanced by the media. Quite to the contrary, the data shows that U.S. homicide rates have dropped more rapidly since the federal ban on assault weapons expired than homicide rates dropped in Australia after its strict gun laws were implemented. To be precise, seven full calendar years have transpired since the federal ban on assault weapons and high-capacity magazines elapsed in 2004, and over this entire period, the U.S. murder rate has averaged 3.9% lower than it was when the ban expired. Correspondingly, in the seven years that followed the implementation of Australia’s gun laws in 1997, the Australian murder rate averaged 0.4% lower than it was when the laws took effect. …

If association equals causation—as the ABC article suggests—the expiration of the federal assault weapons ban was 10 times more effective in reducing homicides than the enactment of Australia’s tight gun laws and gun buyback. Of course, cause and effect cannot be proved because many other factors affect murder rates, and it is practically impossible to accurately isolate all of these effects. Nevertheless, the above graph allows us to observe trends and constrains the impact of many variables because the data is drawn from large population sets with limited demographic changes from year to year.

[24] Article: “Everything You Always Wanted to Know About Masks, and the Deadly Falsehoods Surrounding Them.” By James D. Agresti, Just Facts, September 13, 2021. https://www.justfacts.com/news_face_masks_deadly_falsehoods

The CDC’s “Science Brief” on cloth masks is similarly deceitful. It makes its sources inaccessible, distorts the lone RCT on cloth masks, and ignores observational studies that don’t fit the narrative. Perhaps worst of all, it cites more than a dozen observational studies without ever revealing their fatal flaw: an inability to determine the actual effects of masks.

In contrast, the government health agencies of other nations, like Public Health England, are forthright about that reality and bluntly state that observational studies on masks are “highly subject to confounders,” and thus, they constitute “weak evidence.”

[25] Book: Business and Competitive Analysis: Effective Application of New and Classic Methods (2nd edition). By Craig S. Fleisher and Babette E. Bensoussan. Pearson Education, 2015.

Pages 337–338:

One of the biggest potential problems with statistical analysis is the quality of the interpretation of the results. Many people see cause-and-effect relationships “evidenced” by statistics, which are in actuality simply describing data associations or correlation having little or nothing to do with causal factors.

[26] “Immigration Facts.” By James D. Agresti and Steven Bukovec. Just Facts. Last revised January 28, 2025. https://www.justfacts.com/immigration

In 2016, Gary Johnson, the Libertarian candidate for president of the U.S., told CNN that Mexican immigrants “are more law-abiding than U.S. citizens, and that is a statistic.” PolitiFact, a group with a mission to “help you find the truth in politics,” reported that this statement is “mostly true.”

Misrepresenting Association as Causation

In support of its “mostly true” ruling, PolitiFact wrote that “crime involvement among foreign-born residents is lower than that of U.S.-born citizens.” As evidence of this, PolitiFact cited a report from the American Immigration Council written by three Ph.D.’s and accurately paraphrased it as follows:

Between 1990 and 2013, the foreign-born share of the U.S. population increased from 7.9 percent to 13.1 percent, and the number of unauthorized immigrants went up from 3.5 million to 11.2 million. At the same time, violent crime rate (murder, rape and aggravated assault) decreased 48 percent and property crime rate fell 41 percent, the report said, citing FBI data.

Per an academic textbook about analyzing data:

Association is not the same as causation. This issue is a persistent problem in empirical analysis in the social sciences. Often the investigator will plot two variables and use the tight relationship obtained to draw absolutely ridiculous or completely erroneous conclusions. Because we so often confuse association and causation, it is extremely easy to be convinced that a tight relationship between two variables means that one is causing the other. This is simply not true.

[27] Paper: “Medical Cannabis Laws and Opioid Analgesic Overdose Mortality in the United States, 1999–2010.” By Marcus A. Bachhuber and others. JAMA Internal Medicine, October 2014. https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/1898878

The mean age-adjusted opioid analgesic overdose mortality rate increased in states with and without medical cannabis laws during the study period (Figure 1). Throughout the study period, states with medical cannabis laws had a higher opioid analgesic overdose mortality rate and the rates rose for both groups; however, between 2009 and 2010 the rate in states with medical cannabis laws appeared to plateau.

In the adjusted model, medical cannabis laws were associated with a mean 24.8% lower annual rate of opioid analgesic overdose deaths (95% CI, −37.5% to −9.5%; P = .003) (Table), compared with states without laws.

[28] Paper: “Medical Cannabis Laws and Opioid Analgesic Overdose Mortality in the United States, 1999–2010.” By Marcus A. Bachhuber and others. JAMA Internal Medicine, October 2014. https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/1898878

In summary, although we found a lower mean annual rate of opioid analgesic mortality in states with medical cannabis laws, a direct causal link cannot be established.

This study has several limitations. First, this analysis is ecologic and cannot adjust for characteristics of individuals within the states, such as socioeconomic status, race/ethnicity, or medical and psychiatric diagnoses. … Third, although fixed-effects models can adjust for time-invariant characteristics of each state and state-invariant time effects, there may be important time- and state-varying confounders not included in our models.

[29] Paper: “Association Between Medical Cannabis Laws and Opioid Overdose Mortality Has Reversed Over Time.” By Chelsea L. Shover and others. Proceedings of the National Academy of Sciences, June 10, 2019. https://www.pnas.org/content/116/26/12624

A 2014 study by Bachhuber et al.1 created a sensation by showing that state medical cannabis laws were associated with lower-than-expected opioid overdose mortality rates from 1999 to 2010. Cited by more than 350 scientific articles to date, the study attracted national and international media attention and was hailed by many activists and industry representatives as proof that expanding medical cannabis would reverse the opioid epidemic.1

1 M. A. Bachhuber, B. Saloner, C. O. Cunningham, C. L. Barry. Medical cannabis laws and opioid analgesic overdose mortality in the United States, 1999-2010. JAMA Intern. Med. 174, 1668–1673 (2014).

[30] Article: “Study: Medical Pot Might Reduce Drug Overdose Deaths.” By Trevor Hughes. USA Today, August 25, 2014. https://www.usatoday.com/story/news/nation/2014/08/25/medical-marijuana-prescription-drugs-study/14572193/

Access to medical marijuana appears to have saved thousands of lives over the past few years by reducing accidental overdose deaths from drugs like Vicodin, Percocet and OxyContin, a new study says. …

“It suggests the potential for many lives to be saved,” said study senior author Colleen L. Barry, an associate professor in the Department of Health Policy and Management at the Bloomberg School.

[31] Article: “Legalized Marijuana Could Help Curb the Opioid Epidemic, Study Finds.” Reuters, March 27, 2017. https://www.nbcnews.com/health/health-news/legalized-marijuana-could-help-curb-opioid-epidemic-study-finds-n739301

In states that legalized medical marijuana, hospitals failed to see the predicted influx of pot smokers, and in an unexpected twist, they treated far fewer opioid users, a new study shows. …

In a 2014 study, Dr. Marcus Bachhuber found deaths from opioid overdoses fell by 25 percent in states that legalized medical marijuana. …

Many of Bachhuber’s patients ask for help quitting highly addictive opioids, and some have used marijuana to taper off the prescription painkillers, he said.

[32] Article: “New Study Shows Medical Marijuana Prevents Opioid Abuse.” Medical Jane, December 31, 2014. https://www.medicaljane.com/

“A study published in JAMA Internal Medicine in October 2014 found that in states where medical cannabis is legal in the US, deaths due to opioid overdose are reduced by approximately 25%.”

[33] Article: “Medical Marijuana Laws Linked to Fewer Opioid Deaths.” By Pauline Anderson. Medscape Medical News, August 25, 2014. https://www.medscape.com/viewarticle/830417

US states with laws that establish access to medical cannabis have lower rates of mortality due to opioid overdoses, according to a new study. …

The “striking” implication of the study “is that medical marijuana laws, when implemented, may represent a promising approach for stemming runaway rates of nonintentional opioid analgesic-related deaths,” commented Marie J. Hayes, PhD, and Mark S. Brown, MD, Pediatrics and Neonatal Medicine, Eastern Maine Medical Center, Bangor, Maine, in an accompanying editorial.

“If true, this finding upsets the applecart of conventional wisdom regarding the public health implications of marijuana legalization and medicinal usefulness.”

[34] Article: “Laws on Medical Cannabis Lower Opioid Overdose Mortality.” By Sanjeet Bagcchi. The Lancet, October 2014. https://www.thelancet.com/journals/lanonc/article/PIIS1470-2045(14)70436-X/abstract

[35] Paper: “Association Between Medical Cannabis Laws and Opioid Overdose Mortality Has Reversed Over Time.” By Chelsea L. Shover and others. Proceedings of the National Academy of Sciences, June 10, 2019. https://www.pnas.org/content/116/26/12624

Medical cannabis has been touted as a solution to the US opioid overdose crisis since Bachhuber et al. … found that from 1999 to 2010 states with medical cannabis laws experienced slower increases in opioid analgesic overdose mortality. … In this study, we used the same methods to extend Bachhuber et al.’s analysis through 2017. Not only did findings from the original analysis not hold over the longer period, but the association between state medical cannabis laws and opioid overdose mortality reversed direction from −21% to +23% and remained positive after accounting for recreational cannabis laws. …

… Using the same methods as Bachhuber et al. (1), we revisited the question with seven more years of data.

[36] Paper: “Association Between Medical Cannabis Laws and Opioid Overdose Mortality Has Reversed Over Time.” By Chelsea L. Shover and others. Proceedings of the National Academy of Sciences, June 10, 2019. https://www.pnas.org/content/116/26/12624

We find it unlikely that medical cannabis—used by about 2.5% of the US population—has exerted large conflicting effects on opioid overdose mortality. A more plausible interpretation is that this association is spurious. Moreover, if such relationships do exist, they cannot be rigorously discerned with aggregate data. …

We are more cautious than others have been in drawing causal conclusions from ecological correlations and conclude that the observed association between these two phenomena is likely spurious rather than a reflection of medical cannabis saving lives 10 y ago and killing people today. Medical cannabis users are about 2.5% of the population, making it unlikely that they can significantly alter population-wide indices.12 Unmeasured variables likely explain both associations (e.g., state incarceration rates and practices, naloxone availability, and the extent of insurance and services).2

The nonrobustness of the earlier findings also highlights the challenges of controlling scientific messages in controversial policy areas. Corporate actors (e.g., the medical cannabis industry) with deep pockets have substantial ability to promote congenial results, and suffering people are desperate for effective solutions.

[37] Commentary: “Observational Studies, Bad Science, and the Media.” By Steven E. Nissen, MD. American College of Cardiology, May 25, 2012. https://www.acc.org/Latest-in-Cardiology/Articles/2012/05/25/12/44/Observational-Studies

The limitations of observational studies are myriad, but the most common flaws are easily understood and explained. Since patients are not randomly assigned to a treatment group, there always exist differences in characteristics between the study groups. The best observational studies attempt to adjust for these “confounders,” but often consider only the most common demographic variables, such as age and gender. Statistical adjustment can never fully compensate for all of the differences in patient characteristics, leading a common problem known as “residual confounding.”

[38] Book: Multiple Regression: A Primer. By Paul D. Allison. Pine Forge Press, 1998.

Chapter 1: “What Is Multiple Regression?” https://us.sagepub.com/sites/default/files/upm-binaries/2725_allis01.pdf

Page 20:

To statistically control for a variable, you have to be able to measure that variable so that you can explicitly build it into the data analysis, either by putting it in the regression equation or by using it to form homogeneous subgroups. Unfortunately, there’s no way that we can measure all the variables that might conceivably affect the dependent variable. No matter how many variables we include in a regression equation, someone can always come along and say, “Yes, but you neglected to control for variable X and I feel certain that your results would have been different if you had done so.”

[39] Paper: “Observational Research Rigour Alone Does Not Justify Causal Inference.” By Keisuke Ejima and others. European Journal of Clinical Investigation, October 6, 2016. https://onlinelibrary.wiley.com/doi/10.1111/eci.12681

Differing opinions exist on whether associations obtained in observational studies can be reliable indicators of a causal effect if the observational study is sufficiently well controlled and executed. …

To test this, we conducted two animal observational studies that were rigorously controlled and executed beyond what is achieved in studies of humans. …

In both studies we observed markedly disparate results of the observational association estimates and the experimental effect estimates regarding food consumption effects on lifespan and weight gain. With the randomized experimental design as the gold standard of causal inference, the observational association estimates in study 1 showed the opposite direction of the “true effect” of daily energy intake on lifespan in mice, whereas those estimates in study 2 did not represent the “true effect” of assigned food consumption on weight gain in young mice. Even [if] we employed a study design using genetically identical mice in the same environment; it is very possible that there are some unmeasured confounders not controlled in our observational designs, such as undiagnosed disease or individual characteristics, which can be strongly correlated to both exposure and outcomes and eventually bias the statistical inferences. For example, mice with any undiagnosed diseases or specific metabolic characteristics (e.g., basal metabolic rate, daily energy expenditure) may eat poorly and die earlier than the other healthy mice, which would produce a biased result. These two studies are typical examples of un-controlled confounding in studies with self-selection feature, which is frequently involved in food consumption research. The un-controlled confounders in self-selected food consumption (or other factors more generally) apparently can result in biased inference. Thus, relying on self-selection can lead to biased estimation of causal effects. …

Observational designs remain useful in biomedical and behavioral research by allowing empirical investigations of exposures and their associations in populations experiencing all the vagaries of everyday life. Observational studies offer potential directions for randomized controlled experimental studies. However, as we have illustrated here, even an observational study that is meticulously controlled far beyond what could be achieved in a human study cannot be counted upon to reliably estimate causal effects owing to uncontrolled confounders, especially in nutrition research. Therefore, we believe that, despite public statements to the contrary [6], observational studies alone, no matter how well done, cannot support conclusions of causation.

[40] Paper: “Medical Cannabis Laws and Opioid Analgesic Overdose Mortality in the United States, 1999–2010.” By Marcus A. Bachhuber and others. JAMA Internal Medicine, October 2014. https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/1898878

This study has several limitations. First, this analysis is ecologic and cannot adjust for characteristics of individuals within the states, such as socioeconomic status, race/ethnicity, or medical and psychiatric diagnoses. Although we found that the association between medical cannabis laws and lower opioid overdose mortality strengthened in the years after implementation, this could represent heterogeneity between states that passed laws earlier in the study period vs those that passed the laws later. Second, death certificate data may not correctly classify cases of opioid analgesic overdose deaths, and reporting of opioid analgesics on death certificates may differ among states; misclassification could bias our results in either direction. Third, although fixed-effects models can adjust for time-invariant characteristics of each state and state-invariant time effects, there may be important time- and state-varying confounders not included in our models.

[41] Paper: “Association Between Medical Cannabis Laws and Opioid Overdose Mortality Has Reversed Over Time.” By Chelsea L. Shover and others. Proceedings of the National Academy of Sciences, June 10, 2019. https://www.pnas.org/content/116/26/12624

We are more cautious than others have been in drawing causal conclusions from ecological correlations and conclude that the observed association between these two phenomena is likely spurious rather than a reflection of medical cannabis saving lives 10 y ago and killing people today. Medical cannabis users are about 2.5% of the population, making it unlikely that they can significantly alter population-wide indices (12). Unmeasured variables likely explain both associations (e.g., state incarceration rates and practices, naloxone availability, and the extent of insurance and services) (2).

[42] Paper: “Observational Research Rigour Alone Does Not Justify Causal Inference.” By Keisuke Ejima and others. European Journal of Clinical Investigation, October 6, 2016. https://onlinelibrary.wiley.com/doi/10.1111/eci.12681

“The greatest challenge to drawing causal inferences in observational studies is the existence of potential confounding variables, not all of which can be specified, measured, or modeled.”

[43] Book: Multiple Regression: A Primer. By Paul D. Allison. Pine Forge Press, 1998.

Chapter 1: “What Is Multiple Regression?” https://us.sagepub.com/sites/default/files/upm-binaries/2725_allis01.pdf

Page 1:

Multiple regression is a statistical method for studying the relationship between a single dependent variable and one or more independent variables. It is unquestionably the most widely used statistical technique in the social sciences. It is also widely used in the biological and physical sciences.

[44] Lecture: “Observational Studies – Basics, Agreement with Randomized Trials, Flawed Vaccine Studies.” Delivered by Dr. Vinay Prasad at the University of California San Francisco, November 3, 2023. https://www.youtube.com/watch?v=nlGiFTsaO7M

Time marker 10:52:

Investigators doing this kind of research typically have an observational dataset. … The way we do this research is we construct some type of model. Typically, the model is a regression model where the first covariate we put in is the thing that interests us: vitamin E exposure.

And so you ask, “Is there an association between vitamin E exposure and mortality?” But, of course, you have to adjust for covariates, and I so easily could say, “Let’s adjust for age. After all, all older people take more vitamin E capsules. Let’s adjust for sex. Let’s adjust for race.”

But somebody else could do the same sort of study and also adjust for income. Maybe they live in Toronto, and they’re more cognizant of income. And somebody in North Carolina might add smoking status, okay? And then my friend at Harvard, he might add BMI and hypertension, diabetes, cholesterol, alcohol consumption, education, family history of heart disease, etc., etc.

So there are many different analytic plans that could be run in the same dataset looking at Vitamin E exposure and mortality. You could run it with two covariates, three covariates, six covariates, or 13 covariates, or something like that. And you can also run it with all sorts of combinations of these covariates.

[45] Paper: “Statin Therapy and Risks for Death and Hospitalization in Chronic Heart Failure.” By Alan S. Go and others. Journal of the American Medical Association, November 1, 2006. https://jamanetwork.com/journals/jama/fullarticle/203879

Objective To evaluate the association between initiation of statin therapy and risks for death and hospitalization among adults with chronic heart failure. …

Main Outcome Measures All-cause death and hospitalization for heart failure during a median of 2.4 years of follow-up. We examined the independent relationships between statin therapy and risks for adverse events overall and stratified by the presence or absence of coronary heart disease after multivariable adjustment for potential confounders. …

Covariates

Age, sex, and self-reported race/ethnicity were identified from health plan databases. Race/ethnicity was included because studies suggest it may be associated with differential treatment or outcomes in cardiovascular diseases. Socioeconomic status was estimated from 2000 US Census data. Low education was defined as living in a census block where more than 25% of those aged 25 years or older had less than a 12th-grade education; low income was defined as living in a block where annual household income is less than $35 000 per year.22 We ascertained information on coexisting illnesses based on diagnoses or procedures using ICD-9 codes, laboratory results, or specific therapies from health plan hospitalization discharge, ambulatory visit, laboratory, and pharmacy databases; diabetes mellitus registry23; and regional cancer registry.24 This included baseline and follow-up diagnoses of CHD, cerebrovascular disease, peripheral arterial disease, diabetes, hypertension, malignancy, thyroid disease, liver disease, lung disease, human immunodeficiency virus infection, valvular disease, dementia, depression, ventricular arrhythmias, and atrial fibrillation/flutter (ICD-9 codes available on request).

We also identified outpatient measurements of total cholesterol, low-density lipoprotein cholesterol, high-density lipoprotein cholesterol, and hemoglobin17 from laboratory databases during the 12 months before study entry and throughout follow-up. We classified kidney function using the Modification of Diet in Renal Disease equation for estimated glomerular filtration rate based on outpatient determinations of serum creatinine.25,26

We ascertained information on systolic function status from health plan databases. If available, we used a previously described approach18 to define reduced systolic function as left ventricular ejection fraction of less than 40% or a qualitative description of moderate or severely reduced systolic function; preserved left ventricular systolic function was defined by left ventricular ejection fraction of 40% or higher or a qualitative description of normal or only mildly reduced systolic function.

Finally, as a proxy for intensity of care, we identified the number of visits to a cardiologist before study entry and during follow-up. Information on participation in heart failure case management programs was not available.

[46] Book: Introductory Econometrics: Using Monte Carlo Simulation with Microsoft Excel. By Humberto Barreto and Frank M. Howland. Cambridge University Press, 2006.

Page 491:

Omitted variable bias is a crucial topic because almost every study in econometrics is an observational study as opposed to a controlled experiment. Very often, economists would like to be able to interpret the comparisons they make as if they were the outcomes of controlled experiments. In a properly conducted controlled experiment, the only systematic difference between groups results from the treatment under investigation; all other variation stems from chance. In an observational study, because the participants self-select into groups, it is always possible that varying average outcomes between groups result from systematic differences between groups other than the treatment. We can attempt to control for these systematic differences by explicitly incorporating variables in a regression. Unfortunately, if not all of those differences have been controlled for in the analysis, we are vulnerable to the devastating effects of omitted variable bias.
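NOTE: A minimal simulation (hypothetical data) makes the quoted mechanism tangible. When a variable that drives both the “treatment” and the outcome is omitted, regression attributes its influence to the treatment:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 10_000

# z is a confounder: it raises both the treatment and the outcome.
z = rng.normal(size=n)
treatment = 0.8 * z + rng.normal(size=n)
outcome = 0.0 * treatment + 1.2 * z + rng.normal(size=n)  # true effect = 0

def ols_slope(y, *columns):
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]  # coefficient on the first regressor after the intercept

print("omitting z: ", round(ols_slope(outcome, treatment), 3))     # badly biased
print("including z:", round(ols_slope(outcome, treatment, z), 3))  # near zero
```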

[47] Book: Multiple Regression: A Primer. By Paul D. Allison. Pine Forge Press, 1998.

Chapter 1: “What Is Multiple Regression?” https://us.sagepub.com/sites/default/files/upm-binaries/2725_allis01.pdf

Page 20:

Multiple regression shares an additional problem with all methods of statistical control, a problem that is the major focus of those who claim that multiple regression will never be a good substitute for the randomized experiment. To statistically control for a variable, you have to be able to measure that variable so that you can explicitly build it into the data analysis, either by putting it in the regression equation or by using it to form homogeneous subgroups. Unfortunately, there’s no way that we can measure all the variables that might conceivably affect the dependent variable. No matter how many variables we include in a regression equation, someone can always come along and say, “Yes, but you neglected to control for variable X and I feel certain that your results would have been different if you had done so.”

That’s not the case with randomization in an experimental setting. Randomization controls for all characteristics of the experimental subjects, regardless of whether those characteristics can be measured. Thus, with randomization there’s no need to worry about whether those in the treatment group are smarter, more popular, more achievement oriented, or more alienated than those in the control group (assuming, of course, that there are enough subjects in the experiment to allow randomization to do its job effectively).
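NOTE: Allison’s contrast between statistical control and randomization can be checked in the same fashion. In this hypothetical sketch, an unmeasured trait biases the estimate under self-selection but not under random assignment, even though the trait never enters either regression:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
trait = rng.normal(size=n)  # unmeasured characteristic

def slope(y, t):
    X = np.column_stack([np.ones(n), t])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Self-selection: people with more of the trait choose the treatment.
t_selected = (trait + rng.normal(size=n) > 0).astype(float)
# Randomization: a coin flip, independent of the trait.
t_random = rng.integers(0, 2, size=n).astype(float)

for name, t in [("self-selected", t_selected), ("randomized", t_random)]:
    y = 0.0 * t + 1.0 * trait + rng.normal(size=n)  # true effect = 0
    print(name, round(slope(y, t), 3))
# The self-selected design shows a spurious "effect"; the randomized
# design is near zero without measuring the trait at all.
```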

[48] Book: Theory-Based Data Analysis for the Social Sciences (2nd edition). By Carol S. Aneshensel. SAGE Publications, 2013.

Page 90:

The numerous variables that are omitted from any model are routinely assumed to be uncorrelated with the error term, a requirement for obtaining unbiased parameter estimates from regression models. However, the possibility that unmeasured variables are correlated with variables that are in the model obviously cannot be eliminated on empirical grounds. Thus, omitted variable bias cannot be ruled out entirely as a counterargument for the empirical association between the focal independent and dependent variables in observational studies.

[49] Book: Applied Statistics for Economists. By Margaret Lewis. Routledge, 2012.

In economics, our primary concern is to identify and then include all relevant independent variables as indicated by economic theory.9 Omitting such variables will cause the regression model to be underspecified, and the partial regression coefficients that are affected by the omitted variable(s) will not equal the true population parameters.

[50] Encyclopedia of Education Economics and Finance. Edited by Dominic J. Brewer and Lawrence O. Picus. Sage Publications, 2014.

Page 498:

Omitted variable bias (OVB) occurs when an important independent variable is excluded from an estimation model, such as a linear regression, and its exclusion causes the estimated effects of the included independent variables to be biased. Bias will occur when the excluded variable is correlated with one or more of the included variables. An example of this occurs when investigating the returns to education. This typically involves regressing the log of wages on the number of years of completed schooling as well as on other demographic characteristics such as an individual’s race and gender. One important variable determining wages, however, is a person’s ability. In many such regressions, a measure of ability is not included in the regression (or the measure included only imperfectly controls for ability). Since ability is also likely to be correlated with the amount of schooling an individual receives, the estimated return to years of completed schooling will likely suffer from OVB.
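NOTE: The returns-to-education example can be reproduced in a few lines of Python. In this hypothetical simulation, unmeasured “ability” raises both schooling and wages, so regressing log wages on schooling alone overstates the true return:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50_000

ability = rng.normal(size=n)                       # unmeasured
schooling = 12 + 2 * ability + rng.normal(size=n)  # years of education
log_wage = 1.0 + 0.08 * schooling + 0.15 * ability + rng.normal(scale=0.5, size=n)

X_naive = np.column_stack([np.ones(n), schooling])
naive = np.linalg.lstsq(X_naive, log_wage, rcond=None)[0][1]

X_full = np.column_stack([np.ones(n), schooling, ability])
adjusted = np.linalg.lstsq(X_full, log_wage, rcond=None)[0][1]

print(f"true return 0.080, naive {naive:.3f}, ability-adjusted {adjusted:.3f}")
```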

[51] Book: Higher Education: Handbook of Theory and Research (Volume 28). Edited by Michael B. Paulsen. Springer, 2013.

Chapter 6: “Instrumental Variables: Conceptual Issues and an Application Considering High School Course Taking.” By Rob M. Bielby and others. Pages 263–312.

Page 273:

An additional issue with the aforementioned studies is that none employ strategies to eliminate the influence of unobservable factors on course taking and attainment. Some student characteristics may be difficult or impossible to obtain information about in observational datasets, but this does not change the fact that they are confounding factors (Cellini, 2008). Examples of potential unobservable factors in course taking effects research include a student’s enjoyment of the learning process and a student’s desire to undertake and persevere through challenges. It is likely that these unobservable factors contribute to student selection into high school courses and a student’s subsequent choice to attain a bachelor’s degree. However, none of the studies we examined that employ a standard regression approach accounted for a student’s intrinsic love of learning or ability to endure through difficulties; the failure to account for these unobserved factors may bias the estimates that result from these studies.

[52] Paper: “Observational Research Rigour Alone Does Not Justify Causal Inference.” By Keisuke Ejima and others. European Journal of Clinical Investigation, October 6, 2016. https://onlinelibrary.wiley.com/doi/10.1111/eci.12681

The greatest challenge to drawing causal inferences in observational studies is the existence of potential confounding variables, not all of which can be specified, measured, or modeled. Randomization is the only method that can eliminate all potential confounders of the effect of treatment assignment per se, doing so by making the distribution of prerandomization factors identical for all treatment assignments at the population level [2].

[53] Commentary: “Observational Studies, Bad Science, and the Media.” By Steven E. Nissen, MD. American College of Cardiology, May 25, 2012. https://www.acc.org/Latest-in-Cardiology/Articles/2012/05/25/12/44/Observational-Studies

The limitations of observational studies are myriad, but the most common flaws are easily understood and explained. Since patients are not randomly assigned to a treatment group, there always exist differences in characteristics between the study groups. The best observational studies attempt to adjust for these “confounders,” but often consider only the most common demographic variables, such as age and gender. Statistical adjustment can never fully compensate for all of the differences in patient characteristics, leading to a common problem known as “residual confounding.”

[54] Paper: “The Phantom Menace: Omitted Variable Bias in Econometric Research.” By Kevin A. Clarke. Conflict Management and Peace Science, September 2005. https://journals.sagepub.com/doi/10.1080/07388940500339183

Quantitative political science is awash in control variables. The justification for these bloated specifications is usually the fear of omitted variable bias. A key underlying assumption is that the danger posed by omitted variable bias can be ameliorated by the inclusion of relevant control variables. Unfortunately, as this article demonstrates, there is nothing in the mathematics of regression analysis that supports this conclusion. The inclusion of additional control variables may increase or decrease the bias, and we cannot know for sure which is the case in any particular situation.
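NOTE: Clarke’s claim that adding controls can increase bias is easiest to see with a “collider,” a variable caused by both the treatment and the outcome. In the hypothetical sketch below, the unadjusted estimate is right and “controlling” for the collider ruins it:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50_000

treatment = rng.normal(size=n)
outcome = 0.5 * treatment + rng.normal(size=n)       # true effect = 0.5
collider = treatment + outcome + rng.normal(size=n)  # caused by both

def slope(y, *cols):
    X = np.column_stack([np.ones(n), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print("no controls:      ", round(slope(outcome, treatment), 3))           # ~0.5
print("plus the collider:", round(slope(outcome, treatment, collider), 3)) # biased
```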

[55] Paper: “Observational Research Rigour Alone Does Not Justify Causal Inference.” By Keisuke Ejima and others. European Journal of Clinical Investigation, October 6, 2016. https://onlinelibrary.wiley.com/doi/10.1111/eci.12681

Differing opinions exist on whether associations obtained in observational studies can be reliable indicators of a causal effect if the observational study is sufficiently well controlled and executed. …

To test this, we conducted two animal observational studies that were rigorously controlled and executed beyond what is achieved in studies of humans. …

In both studies we observed markedly disparate results of the observational association estimates and the experimental effect estimates regarding food consumption effects on lifespan and weight gain. With the randomized experimental design as the gold standard of causal inference, the observational association estimates in study 1 showed the opposite direction of the “true effect” of daily energy intake on lifespan in mice, whereas those estimates in study 2 did not represent the “true effect” of assigned food consumption on weight gain in young mice. Even [if] we employed a study design using genetically identical mice in the same environment, it is very possible that there are some unmeasured confounders not controlled in our observational designs, such as undiagnosed disease or individual characteristics, which can be strongly correlated to both exposure and outcomes and eventually bias the statistical inferences. For example, mice with any undiagnosed diseases or specific metabolic characteristics (e.g., basal metabolic rate, daily energy expenditure) may eat poorly and die earlier than the other healthy mice, which would produce a biased result. These two studies are typical examples of un-controlled confounding in studies with a self-selection feature, which is frequently involved in food consumption research. The un-controlled confounders in self-selected food consumption (or other factors more generally) apparently can result in biased inference. Thus, relying on self-selection can lead to biased estimation of causal effects. …

Observational designs remain useful in biomedical and behavioral research by allowing empirical investigations of exposures and their associations in populations experiencing all the vagaries of everyday life. Observational studies offer potential directions for randomized controlled experimental studies. However, as we have illustrated here, even an observational study that is meticulously controlled far beyond what could be achieved in a human study cannot be counted upon to reliably estimate causal effects owing to uncontrolled confounders, especially in nutrition research. Therefore, we believe that, despite public statements to the contrary [6], observational studies alone, no matter how well done, cannot support conclusions of causation.

[56] Book: The Education Gap: Vouchers and Urban Schools (Revised edition). By William G. Howell and Paul E. Peterson with Patrick J. Wolf and David E. Campbell. Brookings Institution Press, 2006 (first published in 2002). https://www.brookings.edu/book/the-education-gap/

Page 39:

In a perfectly controlled experiment in the natural sciences, the researcher is able to control for all factors while manipulating the variable of interest. …

Experiments with humans are much more difficult to manage. Researchers cannot give out pills or placebos and then ask subjects not to change any other aspect of their lives.

[57] Textbook: Criminology. By Steve Case and others. Oxford University Press, 2017.

Chapter 27: “Searching for the Causes of Crime.” Pages 525–548.

Page 534: “The statistical relationships identified between different factors and offending behaviors/crime are correlational, not causal.”

Page 546: “To answer the title question, searching for the cause of crime is an impossible goal in the strict experimental, positivist sense, because there are simply too many unknowns and unmeasured dark figures of crime and explanation to enable us to draw valid and reliable conclusions from research.”

[58] Paper: “Resisting Rape: The Effects of Victim Self-Protection on Rape Completion and Injury.” By Jongyeon Tark and Gary Kleck. Violence Against Women, March 2014. https://journals.sagepub.com/doi/epdf/10.1177/1077801214526050

Page 270:

The impact of victim resistance on rape completion and injury was examined utilizing a large probability sample of sexual assault incidents, derived from the National Crime Victimization Survey … and taking into account whether harm to the victim followed or preceded self-protection (SP) actions. Additional injuries besides rape, particularly serious injuries, following victim resistance are rare. Results indicate that most SP actions, both forceful and nonforceful, reduce the risk of rape completion, and do not significantly affect the risk of additional injury. …

Finally, given the impossibility of experimental research on this topic, it should be noted that our findings are necessarily based on observed associations between victim actions and assault outcomes, thereby precluding definitive conclusions about causal effects.

[59] Paper: “Association Is Not Causation: Treatment Effects Cannot Be Estimated From Observational Data in Heart Failure.” By Christopher J Rush and others. European Heart Journal, October 2018. https://academic.oup.com/eurheartj/article/39/37/3417/5063542

Treatment ‘effects’ are often inferred from non-randomized and observational studies. These studies have inherent biases and limitations, which may make therapeutic inferences based on their results unreliable. We compared the conflicting findings of these studies to those of prospective randomized controlled trials (RCTs) in relation to pharmacological treatments for heart failure (HF). …

We searched Medline and Embase to identify studies of the association between non-randomized drug therapy and all-cause mortality in patients with HF until 31 December 2017. … We identified 92 publications, reporting 94 non-randomized studies, describing 158 estimates of the ‘effect’ of the six treatments of interest on all-cause mortality, i.e. some studies examined more than one treatment and/or HF phenotype. These six treatments had been tested in 25 RCTs. For example, two pivotal RCTs showed that MRAs reduced mortality in patients with HF with reduced ejection fraction. However, only one of 12 non-randomized studies found that MRAs were of benefit, with 10 finding a neutral effect, and one a harmful effect.

This comprehensive comparison of studies of non-randomized data with the findings of RCTs in HF shows that it is not possible to make reliable therapeutic inferences from observational associations. While trials undoubtedly leave gaps in evidence and enroll selected participants, they clearly remain the best guide to the treatment of patients.

[60] Paper: “The Phantom Menace: Omitted Variable Bias in Econometric Research.” By Kevin A. Clarke. Conflict Management and Peace Science, September 2005. https://journals.sagepub.com/doi/10.1080/07388940500339183

Quantitative political science is awash in control variables. The justification for these bloated specifications is usually the fear of omitted variable bias. A key underlying assumption is that the danger posed by omitted variable bias can be ameliorated by the inclusion of relevant control variables. Unfortunately, as this article demonstrates, there is nothing in the mathematics of regression analysis that supports this conclusion. The inclusion of additional control variables may increase or decrease the bias, and we cannot know for sure which is the case in any particular situation.

[61] Book: Multiple Regression: A Primer. By Paul D. Allison. Pine Forge Press, 1998.

Chapter 3: “What Can Go Wrong With Multiple Regression?” https://us.sagepub.com/sites/default/files/upm-binaries/2726_allis03.pdf

Pages 60–61:

Do Some Variables Mediate the Effects of Other Variables?

Even if the sample is not small, there is another reason for being cautious in concluding that a variable has no effect: It’s possible that other variables mediate the effect of that variable. If those other variables are also included in the regression model, the effect of the variable you’re interested in may disappear.
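NOTE: Allison’s warning can be demonstrated with a hypothetical simulation in which the treatment works entirely through an intermediate variable; adding that mediator to the regression makes a real effect vanish:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 50_000

treatment = rng.normal(size=n)
mediator = 1.0 * treatment + rng.normal(size=n)  # pathway: treatment -> mediator
outcome = 0.7 * mediator + rng.normal(size=n)    # mediator -> outcome

def slope(y, *cols):
    X = np.column_stack([np.ones(n), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print("total effect:          ", round(slope(outcome, treatment), 3))            # ~0.7
print("'controlling' mediator:", round(slope(outcome, treatment, mediator), 3))  # ~0
```

This is the statistical face of “controlling for the pathways” discussed in the next footnote.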

[62] Paper: “Understanding How the Social Scientific Study of Same-Sex Parenting Works.” By Mark Regnerus. Annals of Social Sciences / Roczniki Nauk Społecznych, 2020. https://ojs.tnkul.pl/index.php/rns/article/download/15976/15377/

[T]he “consensus” that children from same-sex households fare no differently than children from opposite-sex households—in particular, married families—is a carefully guarded social construction. The consensus is the result of sampling decisions, analytic comparisons, and interpretations of results that often indicate baseline differences prior to statistical controls for household instability, after which they commonly disappear. …

The story of “no differences” between same-sex and opposite-sex households with children hinges on a pair of repetitive themes in the published research: small and nonrepresentative sampling strategies, and analytic strategies that all but guarantee the ability to “explain away” any baseline observable differences between children from same-sex and opposite-sex households. …

Where deficits would be considered consequential and potentially harmful to admit, controlling for household instability is nearly ubiquitous. For instance, a 2016 study using data from the population-based Early Childhood Longitudinal Study—Kindergarten cohort found that the well-being of children of same-sex parents measured significantly lower on measures such as interpersonal skills and both internalizing and externalizing well-being when compared to the children of married opposite-sex parents. But then the authors employ the familiar tactic:

After including family change and early childhood transitions in the model, differences in the externalizing well-being, internalizing well-being, and interpersonal skills of children in same-sex parent households were no longer significantly different from their peers in married, two-biological parent families.31

This is how researchers get to no “differences,” that is, by controlling for—hence ignoring—household turmoil.

Others simply obscure the results in their discussion of them. For example, in a 2018 study which compared the mental health of the NLLFS’s [National Longitudinal Lesbian Family Study’s] then-25-year-old donor-conceived children of lesbian parents to a population-based sample of children from opposite-sex households, the authors reported “no significant differences in measures of mental health” between the two groups.32 And yet the evidence presented in the study itself reveals that the NLLFS children reported demonstrably higher levels of “depression or anxiety” than the control group.33 Nevertheless, the next year the same authors mentioned how their 2018 study conclusions “provide no justification for restricted access to reproductive technologies, adoption, foster care, or civil liberties for lesbian, gay, or bisexual people.”34

My concern is not at all with the use of control variables and regression analyses. These are, after all, standard approaches. The problem is that this method is often misemployed to “control away” how reality and social processes work, and to ensure a high likelihood of a “no differences” conclusion.

The most common way this occurs is by controlling for parental relationship dissolution, rates of which vary dramatically between gay and straight couples. How dramatically? Estimates vary, but they never reveal lower breakup rates among same-sex households with children. A 2020 study of over 1.2 million children in the (gay-friendly) Netherlands revealed that 55 percent of children living with same-sex parents—the vast majority of which were female couples—experienced parental separation, well above the 19 percent of children of opposite-sex parents who experienced the same.35 The story in the Netherlands has not changed; the same pattern was observed using data from no later than 2000 in which (mostly cohabiting) same-sex couples experienced 3.1 times higher dissolution odds than opposite-sex cohabiting couples and 11.5 times higher odds compared with married couples.36

One may claim, following minority stress theory, that if societies were more tolerant, sexual minorities wouldn’t feel the need to hide their identities and enter heterosexual relationships, only to see them “inevitably” fail, followed by the formation of relationships considered more “authentic.” But this is the Netherlands—it doesn’t get more tolerant than that. Even data from Sweden shows that women in same-sex marriages have a divorce rate nearly twice that of opposite-sex married couples.37 And despite the fact that the NLLFS drew upon a particularly privileged set of recruited American lesbians, 62 percent of the young adults in that study reported in 2019 that their parents—typically a biological and social mother—had already broken up, a rate well above what we would expect to see among the offspring of opposite-sex parents.38

A recent re-examination of three nationally-representative datasets from the United States and Canada similarly revealed that dissolution rates were different—but not profoundly so—for couples with no children: 9% for same-sex vs. 5% for opposite sex in one study, 27% and 17% in another, respectively. However, for couples with children (in a formalized union), the results were strikingly different, with dissolution rates of 43% for same-sex couples vs. 8% for opposite-sex. The presence of children tended to stabilize opposite-sex couples, but destabilize same-sex couples.39 The authors suggest that “parental instability is an important factor through which parents’ sexual orientation influences children’s outcomes.”40

The elevated break-up rate of female same-sex couples is a central mechanism here that is under-theorized. The consistent story is not about a direct effect of sexual orientation on children’s outcomes, but rather about the indirect effects of consolidating sex (or gender) preferences and behaviors. Even key proponents of the “no differences consensus” had long predicted this pattern, stating their suspicion that the “asymmetrical biological and legal statuses” and “high standards of equality” present in lesbian relationships would put them at a heightened risk of dissolution.41 Their suspicion has proven to be correct.

It is plausible, even likely, that household instability—via parental romantic-relationship fragility—is a key pathway or mechanism by which children come to have difficulties in one or more domains of life. This tendency to overlook pathways in favor of controls reflects a typical misguided tendency in social science research to always search for “independent” effects of variables, thereby missing the ways in which social phenomena actually operate and outcomes come to be.42 Controlling for the effect of a parent’s same-sex relationship with a “household instability” variable and concluding that there are “no differences” between children of same-sex and opposite-sex parents is tantamount to “controlling for the pathways.” It is unhelpful for describing and understanding how social reality works.43

[63] As supposed evidence that Mexican immigrants are “less likely to commit crimes than the native-born,” a 2016 article by PolitiFact cited an American Immigration Council report and wrote that:

2010 Census data that shows incarceration rates of young, less educated Mexican, Salvadoran and Guatemalan men—which comprise the bulk of the unauthorized population—are “significantly lower” than incarceration rates of native-born young men without a high-school diploma.

If the claim above is accurate, it would not show that Mexican immigrants are “less likely to commit crimes than the native-born.” It would show that young, male Mexican immigrants with low education who remain in the U.S. are less likely to be incarcerated than native-born young men without a high-school diploma. With regard to:

  • education:
    • In 2012, 54% of Mexican and Central American immigrants aged 25–64 did not have a high school diploma or equivalent, as compared to 7% of people born in the U.S. in the same age group.
    • In 2009, 65% of male prison inmates did not have a high school diploma, as compared to 19% of adults in the general population.
  • remaining in the U.S.:
    • In the decade from 2006 to 2015, the federal government deported 1,504,934 non-citizens who were convicted of committing crimes in the U.S. This was 10 times the number of non-citizens in U.S. adult correctional facilities at the end of this period (151,324).
    • After convicts are released from state prisons, roughly 81% of them are arrested within a decade for committing completely new crimes (not probation or parole violations), with an average of five arrests per released prisoner.

[64] In October 2022, the British Journal of Sports Medicine published a small study which found that transgender women (i.e., males) on long-term estrogen therapy had higher cardiopulmonary capacity and grip strength than biological women.

In the words of the study’s authors, “These findings add new insights to the sparse information available on a highly controversial topic about the participation of TW [transgender women] in physical activities.”

Several months later in February 2023, the journal published a “correction” which stated that there was “no difference” in cardiopulmonary capacity or grip strength between the transgender women and biological women when “adjusted for fat-free mass.”

Controlling for fat-free mass obscures the fact that women generally have much higher body fat percentages than men, and that this difference in body composition is precisely what gives biological men a competitive advantage in physical activities. Adjusting it away removes the very advantage in question.

[65] Book: Multiple Regression: A Primer. By Paul D. Allison. Pine Forge Press, 1998.

Chapter 1: “What Is Multiple Regression?” https://us.sagepub.com/sites/default/files/upm-binaries/2725_allis01.pdf

Page 1: “Multiple regression is a statistical method for studying the relationship between a single dependent variable and one or more independent variables. It is unquestionably the most widely used statistical technique in the social sciences. It is also widely used in the biological and physical sciences.”

Chapter 3: “What Can Go Wrong With Multiple Regression?” https://us.sagepub.com/sites/default/files/upm-binaries/2726_allis03.pdf

Page 49:

Any tool as widely used as multiple regression is bound to be frequently misused. Nowadays, statistical packages are so user-friendly that anyone can perform a multiple regression with a few mouse clicks. As a result, many researchers apply multiple regression to their data with little understanding of the underlying assumptions or the possible pitfalls. Although the review process for scientific journals is supposed to weed out papers with incorrect or misleading statistical methods, it often happens that the referees themselves have insufficient statistical expertise or are simply too rushed to catch the more subtle errors. The upshot is that you need to cast a critical eye on the results of any multiple regression, especially those you run yourself.

Fortunately, the questions that you need to ask are neither extremely technical nor large in number. They do require careful thought, however, which explains why even experts occasionally make mistakes or overlook the obvious. Virtually all the questions have to do with situations where multiple regression is used to make causal inferences.

NOTE: Pages 49–65 detail eight possible pitfalls of regression analyses.

Page 65: “The preceding eight problems are the ones I believe most often lead to serious errors in judging the results of a multiple regression. By no means do they exhaust the possible pitfalls that may arise. Before concluding this chapter, I’ll briefly mention a few others.”

[66] Paper: “Econometric Methods for Causal Evaluation of Education Policies and Practices: A Non-Technical Guide.” By Martin Schlotter, Guido Schwerdt, and Ludger Woessmann. Education Economics, January 2011. https://www.tandfonline.com/doi/10.1080/09645292.2010.511821

Page 111:

Whenever other reasons exist that give rise to some correlation between the two things of interest—the treatment and the outcome—the overall correlation cannot be interpreted as the causal effect of the treatment on the outcome. Broadly speaking, this is what economists call the ‘endogeneity problem’. The term stems from the idea that treatment cannot be viewed as exogenous to the model of interest, as it should be, but that it is rather endogenously determined within the model—depending on the outcome or being jointly determined with the outcome by a third factor. Because of the problem of endogeneity, estimates of the association between treatment and outcome based on correlations will be biased estimates of the causal effect of treatment on outcome.2

Standard approaches try to deal with this problem by observing the other sources of possible correlation and take out the difference in outcomes that can be attributed to these other observed differences. This is the approach of multivariate models that estimate the effects of multiple variables on the outcome at the same time, such as the classical ordinary least-squares (OLS) or multilevel modeling (or hierarchical linear models, HLM) techniques. They allow estimating the association between treatment and outcome conditional on the effects of the other observed factors.

2 Other possible sources of endogeneity include self-selection (objects with different characteristics can choose whether to be treated or not) and simultaneity (treatment and outcome are choice variables that are jointly determined). In econometric terms, measurement error in the treatment variable can also be interpreted as an endogeneity problem, because it gives rise to a particular form of association between treatment and outcome (one that generally biases the estimates toward finding no effect, even if there was one).

[67] Article: “Statistical Malpractice.” By Bruce G. Charlton. Journal of the Royal College of Physicians of London, March 1996. https://www.researchgate.net/publication/14493406_Statistical_malpractice

Page 112: “Science is concerned with causes but statistics is concerned with correlations.”

Page 113:

The root of most instances of statistical malpractice is the breaking of mathematical neutrality and the introduction of causal assumptions into analysis without justifying them on scientific grounds. This amounts to performing science by sleight-of-hand: the quickness of the statistics deceives the mind. The process is often accidental, the product of misunderstanding rather than of malice—as commonly happens when statistical adjustments or standardization of populations are performed to remove the effects of confounding variables.6 These are maneuvers by which data sets are recalculated (for example, by stratified or multivariate analysis) in an attempt to eliminate the consequences of uncontrolled “interfering” variables which distort the causal relationship under study. …

There are however, no statistical rules by which confounders can be identified, and the process of adjustment involves making quantitative causal assumptions based upon secondary analysis of the database in question. …

Adjustment is therefore, implicitly, a way of modeling the magnitude of a causal process in order to subtract its effects from the data. However, modeling is not mathematically neutral and involves inputting assumptions—an activity which requires to be justified for each case. …

… Statistical malpractice occurs, however, exactly because it has not dispensed with causation, but has merely concealed it under a cloak of mathematical neutrality.

[68] Paper: “Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results.” By Raphael R. Silberzahn and others. Advances in Methods and Practices in Psychological Science, August 23, 2018. https://journals.sagepub.com/doi/full/10.1177/2515245917747646

Twenty-nine teams involving 61 analysts used the same data set to address the same research question: whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players. … Twenty teams (69%) found a statistically significant positive effect, and 9 teams (31%) did not observe a significant relationship. … These findings suggest that significant variation in the results of analyses of complex data may be difficult to avoid, even by experts with honest intentions. …

… The analytic techniques chosen ranged from simple linear regression to complex multilevel regression and Bayesian approaches. The teams also varied greatly in their decisions regarding which covariates to include….

Figure 2 shows each team’s estimated effect size, along with its 95% confidence interval (CI). As this figure and Table 3 show, the estimated effect sizes ranged from 0.89 (slightly negative) to 2.93 (moderately positive) in odds-ratio (OR) units; the median estimate was 1.31. The confidence intervals for many of the estimates overlap, which is expected because they are based on the same data. Twenty teams (69%) found a significant positive relationship, p < .05, and nine teams (31%) found a nonsignificant relationship. No team reported a significant negative relationship. …

Table 3. Analytic Approaches and Results for Each Team
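NOTE: For readers unfamiliar with the odds-ratio units above, an OR and its 95% confidence interval can be computed from a 2×2 table as follows (the counts here are invented for illustration, not taken from the paper):

```python
import math

# Hypothetical 2x2 table: red cards given / not given,
# for dark- vs. light-skin-toned players.
a, b = 40, 960   # dark-skin-toned: events, non-events
c, d = 25, 975   # light-skin-toned: events, non-events

odds_ratio = (a * d) / (b * c)
se_log_or = math.sqrt(1/a + 1/b + 1/c + 1/d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
# OR > 1 means higher odds of a red card for the first group;
# OR < 1 (like the 0.89 above) means slightly lower odds.
```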

[69] Paper: “Many Analysts, One Data Set: Making Transparent How Variations in Analytic Choices Affect Results.” By Raphael R. Silberzahn and others. Advances in Methods and Practices in Psychological Science, August 23, 2018. https://journals.sagepub.com/doi/full/10.1177/2515245917747646

Twenty-nine teams involving 61 analysts used the same data set to address the same research question: whether soccer referees are more likely to give red cards to dark-skin-toned players than to light-skin-toned players. … Twenty teams (69%) found a statistically significant positive effect, and 9 teams (31%) did not observe a significant relationship. … These findings suggest that significant variation in the results of analyses of complex data may be difficult to avoid, even by experts with honest intentions. …

In the scientific process, creativity is mostly associated with the generation of testable hypotheses and the development of suitable research designs. Data analysis, on the other hand, is sometimes seen as the mechanical, unimaginative process of revealing results from a research study. Despite methodologists’ remonstrations … it is easy to overlook the fact that results may depend on the chosen analytic strategy, which itself is imbued with theory, assumptions, and choice points. …

Researchers may understand this conceptually, but there is little appreciation for the implications in practice. In some cases, authors use a particular analytic strategy because it is the one they know how to use, rather than because they have a specific rationale for using it. Peer reviewers may comment on and suggest improvements to a chosen analytic strategy, but rarely do those comments emerge from working with the actual data set…. Moreover, it is not uncommon for peer reviewers to take the authors’ analytic strategy for granted and comment exclusively on other aspects of the manuscript. More important, once an article is published, reanalyses and critiques of the chosen analytic strategy are slow to emerge and rare … in part because of the low frequency with which data are available for reanalysis…. The reported results and implications drive the impact of published articles; the analytic strategy is pushed to the background.

But what if the methodologists are correct? What if scientific results are highly contingent on subjective decisions at the analysis stage? In that case, the process of certifying a particular result on the basis of an idiosyncratic analytic strategy might be fraught with unrecognized uncertainty … and research findings might be less trustworthy than they at first appear to be….. Had the authors made different assumptions, an entirely different result might have been observed….

… The analytic techniques chosen ranged from simple linear regression to complex multilevel regression and Bayesian approaches. The teams also varied greatly in their decisions regarding which covariates to include….

Figure 2 shows each team’s estimated effect size, along with its 95% confidence interval (CI). As this figure and Table 3 show, the estimated effect sizes ranged from 0.89 (slightly negative) to 2.93 (moderately positive) in odds-ratio (OR) units; the median estimate was 1.31. The confidence intervals for many of the estimates overlap, which is expected because they are based on the same data. Twenty teams (69%) found a significant positive relationship, p < .05, and nine teams (31%) found a nonsignificant relationship. No team reported a significant negative relationship. …

Table 3. Analytic Approaches and Results for Each Team

[70] Paper: “Observational Research Rigour Alone Does Not Justify Causal Inference.” By Keisuke Ejima and others. European Journal of Clinical Investigation, October 6, 2016. https://onlinelibrary.wiley.com/doi/10.1111/eci.12681

Observational designs remain useful in biomedical and behavioral research by allowing empirical investigations of exposures and their associations in populations experiencing all the vagaries of everyday life. Observational studies offer potential directions for randomized controlled experimental studies.

[71] Paper: “Resisting Rape: The Effects of Victim Self-Protection on Rape Completion and Injury.” By Jongyeon Tark and Gary Kleck. Violence Against Women, March 2014. https://journals.sagepub.com/doi/epdf/10.1177/1077801214526050

Page 270:

The impact of victim resistance on rape completion and injury was examined utilizing a large probability sample of sexual assault incidents, derived from the National Crime Victimization Survey … and taking into account whether harm to the victim followed or preceded self-protection (SP) actions. Additional injuries besides rape, particularly serious injuries, following victim resistance are rare. Results indicate that most SP actions, both forceful and nonforceful, reduce the risk of rape completion, and do not significantly affect the risk of additional injury. …

Finally, given the impossibility of experimental research on this topic, it should be noted that our findings are necessarily based on observed associations between victim actions and assault outcomes, thereby precluding definitive conclusions about causal effects.

[72] Book: Rutherford’s Vascular Surgery (8th edition, Volume 1). Edited by Jack L. Cronenwet and K. Wayne Johnston. Elsevier Saunders, 2014.

Chapter 1: “Epidemiology and Clinical Analysis.” By Louis L. Nguyen and Ann DeBord Smith. Pages 2–14.

Observational Studies

There are two main types of observational studies: cohort and case-control studies. A cohort is a designated group of individuals that is followed over a period of time. Cohort studies seek to identify a population at risk for the disease of interest. After a period of observation, patients in whom the disease develops are compared with the population of patients who are free of the disease. Cohort studies are most often associated with epidemiology because they comprise many of the most prominent studies in the modern era. The classic example is the Framingham Heart Study (FHS), in which 5,209 residents of Framingham, Massachusetts, were monitored prospectively, starting in 1948.3 Much of our epidemiologic knowledge regarding risk factors for heart disease comes from the FHS.4 Although the FHS was initially intended to last 20 years, the study has subsequently been extended and now involves the third generation of participants. Cohort studies also seek to identify potential risk factors for development of the disease of interest. For example, if cigarette smoking is suspected in the development of peripheral arterial disease (PAD), smokers are assessed for the development of PAD from the beginning of the observation period to the end of the observation period. Because PAD does not develop in all smokers, and conversely, not all PAD patients are smokers, a relative risk (RR) is calculated as the ratio of the incidence of PAD in smokers versus the incidence of PAD in nonsmokers.
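NOTE: The relative-risk calculation described in this excerpt is simple arithmetic; a hypothetical worked example:

```python
# Hypothetical cohort: incidence of PAD among smokers vs. nonsmokers.
pad_smokers, n_smokers = 90, 1_000
pad_nonsmokers, n_nonsmokers = 30, 1_000

incidence_smokers = pad_smokers / n_smokers           # 0.09
incidence_nonsmokers = pad_nonsmokers / n_nonsmokers  # 0.03

relative_risk = incidence_smokers / incidence_nonsmokers
print(f"RR = {relative_risk:.1f}")  # smokers have 3x the risk in this cohort
```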

[73] Textbook: Principles and Practice of Clinical Research. By John I. Gallin and Frederick P. Ognibene. Academic Press, 2012.

Page 226: “While consistency in the findings of a large number of observational studies can lead to the belief that the associations are causal, this belief is a fallacy.”

[74] Commentary: “Detecting Selection Bias in Observational Studies—When Interventions Work Too Fast.” By Ghulam Rehman Mohyuddin and Vinay Prasad. JAMA Internal Medicine, June 12, 2023. https://jamanetwork.com/journals/jamainternalmedicine/article-abstract/2805974

Early separation of the Kaplan-Meier curves is another mechanism for detecting residual confounding in observational studies. The Kaplan-Meier survival curve is a graphical representation of time-to-event end points and allows for maximal use of each participant’s time-related data. In a Kaplan-Meier analysis, participants contribute to the survival estimate until the event of interest occurs (eg, death, progression of disease) or until they are censored (eg, loss to follow-up or mandatory data lock). …

A limitation to this approach to detecting selection bias in observational studies is that it requires knowledge of the pathophysiology and natural history of a disease, the expected efficacy of the interventions, or other expertise in the subject of the study. Another is that when observational studies are published, Kaplan-Meier curves are often not included. Instead, there are tables showing relative risks or odds ratios for the comparisons between the groups of patients that are being observed; such tables do not show changes in the relative risks or odds ratios that may occur over time. To allow for the ascertainment of bias, such as imbalances between the characteristics of cohorts and residual confounding or confounding by indication, observational studies that examine a time-to-event end point or contain time-to-event data should report this information in a graphical form. When interventions appear to work too fast, the findings of a study may be too good to be true.
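NOTE: The Kaplan-Meier estimate discussed above is a running product over event times. A minimal implementation (hypothetical follow-up data, no external libraries) shows the mechanics:

```python
def kaplan_meier(times, events):
    """Return (time, survival) pairs. events[i] is 1 for a death at
    times[i] and 0 for a censored observation."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival, curve = 1.0, []
    i = 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve

# Hypothetical follow-up times (months) and event indicators.
times = [2, 3, 3, 5, 8, 9, 12, 12, 14, 15]
events = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
for t, s in kaplan_meier(times, events):
    print(f"t={t:>2}  S(t)={s:.3f}")
```

Plotting one such curve per treatment group and watching where they separate is the diagnostic Mohyuddin and Prasad recommend.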

[75] Lecture: “Observational Studies – Basics, Agreement with Randomized Trials, Flawed Vaccine Studies.” Delivered by Dr. Vinay Prasad at the University of California San Francisco, November 3, 2023. https://www.youtube.com/watch?v=nlGiFTsaO7M

Time marker 41:30:

Eric Topol, he makes this mistake all the time. Here he is publishing that Covid-19 vaccine reduces the risk of long Covid.

Look at when the curves split! Covid-19 vaccine—it prevents you from getting long Covid before it can even prevent you from getting Covid! How is this plausible?

It’s not plausible. This is an observational study, and the reason it’s not plausible is that … [it] cures you of long Covid immediately. …

If interventions affect end points that cannot be plausibly related to the intervention, or if they work too fast, then I think that’s a fair sign that there is residual confounding in observational studies.

Here is Eric Topol on Paxlovid, which he thinks is so wonderful—85% reduction in death. And look at the curves split instantly! Paxlovid works even before you take it!

[76] Article: “Vaccines Stop Long Covid Immediately! Maybe 29 Days Before You Get It!” By Dr. Vinay Prasad, March 2, 2023. https://www.drvinayprasad.com/p/vaccines-stop-long-covid-immediately

A recent paper is out in BMJ Open; it is kind of funny. It is clearly preposterous. …

Simply put, the authors take a French dataset of long covid sufferers, and compare people who were vaccinated vs. those who were not with a sophisticated matching method. (Of course, it can’t correct for unmeasured covariates—but we don’t need to go there)

They ask: is vaccination associated with resolution of long covid symptoms? Resolution is defined as the symptoms vanish entirely. Here is the key Figure. …

Long covid starts to vanish immediately after vaccination! Vaccines work faster to stop long covid than they can stop the coronavirus. But wait, it actually works even sooner, because the questionnaire says PRIOR 30 days!

NOTE: See the footnote below for the study in question.

[77] Paper: “Efficacy of First Dose of Covid-19 Vaccine Versus No Vaccination on Symptoms of Patients with Long Covid: Target Trial Emulation Based on ComPaRe E-Cohort.” By Viet-Thi Tran and others. BMJ Medicine, February 27, 2023. https://bmjmedicine.bmj.com/content/bmjmed/2/1/e000229.full.pdf

Page 1:

Adult patients (aged ≥18 years) enrolled in the ComPaRe cohort before 1 May 2021 were included in the study if they reported a confirmed or suspected SARS-CoV-2 infection, symptoms persistent for >3 weeks after onset, and at least one symptom attributable to long covid at baseline. Patients who received a first covid-19 vaccine injection were matched with an unvaccinated control group in a 1:1 ratio according to their propensity scores. Number of long covid symptoms, rate of complete remission of long covid, and proportion of patients reporting an unacceptable symptom state at 120 days were recorded. …

By 120 days, vaccination had reduced the number of long covid symptoms (mean 13.0 (standard deviation 9.4) in the vaccinated group v 14.8 (9.8) in the control group; mean difference −1.8, 95% confidence interval −3.0 to −0.5) and doubled the rate of patients in remission (16.6% v 7.5%, hazard ratio 1.93, 95% confidence interval 1.18 to 3.14). Vaccination reduced the effect of long covid on patients’ lives (mean score on the impact tool 24.3 (standard deviation 16.7) v 27.6 (16.7); mean difference −3.3, 95% confidence interval −5.7 to −1.0) and the proportion of patients with an unacceptable symptom state (38.9% v 46.4%, risk difference −7.4%, 95% confidence interval −14.5% to −0.3%). In the vaccinated group, two (0.4%) patients reported serious adverse events requiring admission to hospital.

Page 7:

Our study had some limitations. Firstly, despite the use of robust methods and statistical techniques to make causal inferences from observational data, the intervention was not randomly assigned, and potential unmeasured confounders could have biased our results. For example, patients’ motivation to receive a covid-19 vaccine was not taken into account, although it might be related to their perception of their long covid symptoms. …

… patients in this study had not been vaccinated before their infection and long covid….
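NOTE: The propensity-score matching the authors describe has a simple core: model each patient’s probability of treatment from covariates, then pair each treated patient with the untreated patient whose probability is closest. A bare-bones sketch on synthetic data (real analyses add calipers, balance checks, and usually match without replacement):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 2_000

# Hypothetical covariates; "sicker" subjects are likelier to be treated.
X = rng.normal(size=(n, 3))
treated = (X @ [0.8, 0.5, 0.0] + rng.normal(size=n) > 0).astype(int)

# Step 1: propensity score = modeled probability of treatment.
ps = LogisticRegression().fit(X, treated).predict_proba(X)[:, 1]

# Step 2: 1:1 nearest-neighbor matching (with replacement, for brevity).
treated_idx = np.flatnonzero(treated == 1)
control_idx = np.flatnonzero(treated == 0)
matches = [control_idx[np.argmin(np.abs(ps[control_idx] - ps[i]))]
           for i in treated_idx]

print("mean propensity gap across pairs:",
      round(float(np.mean(np.abs(ps[treated_idx] - ps[matches]))), 4))
```

As the authors concede, matching balances only what was measured; a motivation-to-vaccinate variable that never enters the model is never balanced.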

[78] Article: “Cause Versus Association in Observational Studies in Psychopharmacology.” By Chittaranjan Andrade. Journal of Clinical Psychiatry, August 26, 2014. https://www.psychiatrist.com/jcp/cause-versus-association-observational-studies-psychopharmacology/

In observational studies, causal explanations for identified associations may be supported in various ways. These include the presence of biological plausibility for the association, existence of a dose-dependent relationship between the suggested cause and the effect of interest, and replication of the finding across studies.29 However, none of these supports is foolproof.

[79] Article: “Dose-Response Relationship.” By Sydney Pettygrove (PhD). Encyclopædia Britannica. Accessed April 18, 2025 at https://www.britannica.com/science/dose-response-relationship

A dose–response relationship is one in which increasing levels of exposure are associated with either an increasing or a decreasing risk of the outcome. Demonstration of a dose-response relationship is considered strong evidence for a causal relationship between the exposure and the outcome.

[80] Paper: “The Dose-Response Relationship Between Cigarette Consumption, Biochemical Markers and Risk of Lung Cancer.” By MR Law and others. British Journal of Cancer, 1997. https://pmc.ncbi.nlm.nih.gov/articles/PMC2223525/pdf/brjcancer00188-0136.pdf

“There is an approximately linear relationship between the number of cigarettes per day that a person reports smoking and the age-specific risk of lung cancer—as consumption doubles, risk doubles.”
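NOTE: The quoted linear dose-response relationship implies a simple proportionality check, sketched here with invented numbers: if risk is proportional to cigarettes per day, relative risk scales in the same ratio as consumption:

```python
# Hypothetical linear dose-response: risk proportional to dose,
# as in "as consumption doubles, risk doubles."
BASELINE_RISK = 0.004  # assumed risk at 5 cigarettes/day (made up)

def risk(cigs_per_day):
    return BASELINE_RISK * (cigs_per_day / 5)

for dose in (5, 10, 20, 40):
    print(f"{dose:>2}/day  risk={risk(dose):.3f}  RR vs 5/day = {risk(dose)/risk(5):.0f}")
```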

[81] Book: Rutherford’s Vascular Surgery (8th edition, Volume 1). Edited by Jack L. Cronenwet and K. Wayne Johnston. Elsevier Saunders, 2014.

Chapter 1: “Epidemiology and Clinical Analysis.” By Louis L. Nguyen and Ann DeBord Smith. Pages 2–14.

Observational Studies

There are two main types of observational studies: cohort and case-control studies. A cohort is a designated group of individuals that is followed over a period of time. Cohort studies seek to identify a population at risk for the disease of interest. After a period of observation, patients in whom the disease develops are compared with the population of patients who are free of the disease. Cohort studies are most often associated with epidemiology because they comprise many of the most prominent studies in the modern era. The classic example is the Framingham Heart Study (FHS), in which 5,209 residents of Framingham, Massachusetts, were monitored prospectively, starting in 1948.3 Much of our epidemiologic knowledge regarding risk factors for heart disease comes from the FHS.4 Although the FHS was initially intended to last 20 years, the study has subsequently been extended and now involves the third generation of participants. Cohort studies also seek to identify potential risk factors for development of the disease of interest. For example, if cigarette smoking is suspected in the development of peripheral arterial disease (PAD), smokers are assessed for the development of PAD from the beginning of the observation period to the end of the observation period. Because PAD does not develop in all smokers, and conversely, not all PAD patients are smokers, a relative risk (RR) is calculated as the ratio of the incidence of PAD in smokers versus the incidence of PAD in nonsmokers.

[82] Paper: “Mendelian Randomization Analysis of the Causal Effect of Cigarette Smoking on Hospital Costs.” By Padraig Dixon and others. Nicotine & Tobacco Research, April 17, 2024. https://pmc.ncbi.nlm.nih.gov/articles/PMC11494471/

Causal evidence on the effect of smoking on health care costs is also necessary for the robust evaluation of specific interventions that aim to prevent smoking and to treat its downstream consequences. Decision making by individual smokers may be improved with better information about the non-health consequences of smoking.12

However, establishing the causal effect of cigarette smoking on health care costs is challenging. Observed associations of smoking with health care costs may arise because smoking is indeed a cause of health care costs, because smoking is itself partly determined by health care costs, or because smoking is associated with causes or consequences of processes that influence health care costs. In general, it is not clear if the factors that may predispose an individual to smoke are themselves independent determinants of health care costs. For example, smoking tends to cluster with other behaviors known or suspected to affect health care costs, including high body mass index (BMI), poor diet, alcohol consumption, and low physical activity.13–17 Smoking is also heavily socially patterned; globally, lower socio-economic status groups are more likely to smoke than higher status groups18,19 although this overall picture conceals variation over time and within regions. For example, in the first half of the 20th century, smoking prevalence was highest amongst higher socio-economic groups, but this pattern has now reversed.20 Similar patterns of greater smoking prevalence amongst high-income groups are observed in some low-income countries.21 The point remains that the co-occurrence of smoking and socio-economic status is a challenge for conventional analyses.

Smoking is also more prevalent amongst groups defined by some health statuses. For example, smoking is more common amongst individuals with depression and schizophrenia.22–25 These associations may affect all or some of smoking initiation, smoking intensity, and smoking cessation. Smoking also influences disease incidence (such as lung cancer) which in turn may prompt cessation. Smoking may reflect elements of self-medication26 or a desire to control one’s weight.27 Smoking may therefore be both a cause and consequence of health status and other circumstances.28,29 Measured associations of smoking with health care cost may also partly reflect wider attitudes to risk tolerance, including impulsivity and behavioral disinhibition.
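NOTE: Mendelian randomization, named in the paper’s title, treats a randomly inherited genetic variant as an instrument: it shifts the exposure but, by assumption, affects the outcome only through that exposure. A hypothetical sketch of the idea (synthetic data, using the simple Wald-ratio form rather than the authors’ actual estimators):

```python
import numpy as np

rng = np.random.default_rng(10)
n = 100_000

variant = rng.integers(0, 2, size=n).astype(float)  # randomly inherited "gene"
confounder = rng.normal(size=n)                     # unmeasured
smoking = 0.5 * variant + confounder + rng.normal(size=n)
costs = 1.0 * smoking + 2.0 * confounder + rng.normal(size=n)  # true effect = 1.0

def slope(y, x):
    X = np.column_stack([np.ones(n), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

naive = slope(costs, smoking)  # confounded, well above 1.0
# Wald/IV estimate: effect of variant on costs over effect on smoking.
iv = slope(costs, variant) / slope(smoking, variant)
print(round(naive, 3), round(iv, 3))  # the IV estimate lands near 1.0
```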

[83] Book: Multiple Regression: A Primer. By Paul D. Allison. Pine Forge Press, 1998.

Chapter 3: “What Can Go Wrong With Multiple Regression?” https://us.sagepub.com/sites/default/files/upm-binaries/2726_allis03.pdf

Pages 52–53:

When we estimate a regression model, we often interpret the coefficients as measuring the causal effects of the independent variables on the dependent variable. But what if the “dependent” variable actually affects one or more of the “independent” variables? If it does, the resulting biases can be every bit as serious as those produced by the omission of important variables. This problem—known as reverse causation—actually can be worse than the omitted variables problem because

• Every coefficient in the regression model may be biased

• It’s hard to design a study that will adequately solve this problem

Unfortunately, there’s rarely any information in the data that can help you determine the direction of causation. Instead, decisions about the direction of causation have to be based almost entirely on your knowledge of the phenomenon you’re studying. There are, in fact, several different ways to argue against the possibility of reverse causation. If the data come from a randomized experiment, then randomization assures us that the dependent variable isn’t influencing who gets the treatment. Often, the time ordering of the variables gives us a pretty clear indication of the causal direction. For example, we usually feel safe in supposing that parents’ educational attainment affects the educational attainment of their adult children, not the other way around. Even when there’s no time ordering, our knowledge of basic physical and biological processes sometimes gives us a pretty good idea of the causal direction. We feel confident, for example, that a man’s height might affect his social prestige, but social prestige couldn’t affect height.

With non-experimental data, most applications of regression analysis involve some ambiguity about the direction of causality. In such cases, the causal mechanism could run in either direction, perhaps in both.
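Allison's point can be demonstrated in a few lines of simulation. In the sketch below (all values invented), x has no causal effect on y at all; the causation runs entirely in reverse, yet ordinary regression reports a substantial coefficient:

```python
# A toy illustration of reverse-causation bias: the "independent" variable
# x has NO causal effect on y, but y feeds back into x. Ordinary least
# squares still reports a sizable coefficient for x.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

y = rng.normal(0, 1, n)                # y is determined first
x = 0.8 * y + rng.normal(0, 1, n)      # reverse causation: y drives x

# OLS slope of y on x: cov(x, y) / var(x).
slope = np.cov(x, y)[0, 1] / np.var(x, ddof=1)
print(f"estimated 'effect' of x on y: {slope:.2f}")  # ~0.49, though the true effect is 0
```

Nothing in the data itself flags the problem, which is exactly Allison's warning: the direction of causation must be argued from knowledge of the phenomenon, not read off the regression output.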

[84] Report: “Improving Health and Social Cohesion through Education.” Organization for Economic Cooperation and Development, Center for Educational Research and Innovation, 2010. https://www.oecd.org/en/publications/2010/09/improving-health-and-social-cohesion-through-education_g1ghce50.html

Pages 31–33:

(a) Reverse causality

One source of endogeneity stems from the possibility that there is reverse causality, whereby poor health or low CSE [civic and social engagement] reduces educational attainment. Poor health in youth might interfere with educational attainment by interfering with student learning because of increased absences and inability to concentrate. It may also lead to poor adult health, thus creating a correlation between education and adult health. Similarly, low CSE such as lack of trust and political interest might also reduce educational attainment. For example, a family with low CSE might reduce their involvement with schools, which might lead to poorer student outcomes.7

The bias due to reverse causality can be re-cast as an omitted variable problem after considering timing issues. Since health and CSE tend to persist over time, past health or CSE can be an important determinant of current health or CSE. Thus, past health or CSE is an omitted variable in equation (1) which is captured by the error term. The extent to which omitting past health or CSE will lead to an omitted variable bias depends on the extent to which past health or CSE is also correlated with the included variable Education. Because the current stock of education depends on past decisions about investments in education, reverse causality generates a correlation between past health or CSE and the individual’s current stock of education.8 If the estimated coefficient picks up the effect of past health or CSE … will be biased towards overestimating the causal effect of education.

(b) Hidden third variables

The second source of endogeneity comes from the possibility that there might be one or more hard-to-observe hidden third variables which are the true causes of both educational attainment and health and CSE.9 In the context of the education–earnings link, the most commonly mentioned hidden third variable is ability.10 The long-standing concern in this line of research has been that people with greater cognitive ability are more likely to invest in more education, but even without more education their higher cognitive ability would lead to higher earnings (Card, 2001). More recently, non-cognitive abilities such as the abilities to think ahead, to persist in tasks, or to adapt to their environments have been suggested as important determinants of both education and earnings outcomes (Heckman and Rubinstein, 2001).

In the context of the education–health link, Fuchs (1993) describes time preference and self-efficacy as his favorite candidates for hidden third variables. People with a low rate of time preference are more willing to forego current utility and invest more in both education and health capital that pays off in the future (Farrell and Fuchs, 1982; Fuchs, 1982). A classic example is the Stanford Marshmallow Experiment, in which 4-year-olds were given the choice between eating the marshmallow now or waiting for the experimenter’s return and getting a second marshmallow. When these children were tested again at age 18, Shoda and others (1990) found a strong correlation between delayed gratification at age 4 and mathematical and English competence. Similarly, people with greater self-efficacy, i.e., those who believe in their ability to exercise control over outcomes, will be more likely to invest in schooling and health. Most studies of the schooling–health link use data sets that do not contain direct or proxy measures of time preference and self-efficacy. Consequently, these variables are typically omitted when estimating equation (1). The resulting omitted variable bias again implies that … will be biased towards overestimating the causal effect of education on health.
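A short simulation (again with invented numbers) shows how a hidden third variable such as time preference can manufacture an education–health association out of nothing, and how the bias disappears once the hidden variable is measured and included:

```python
# A toy sketch of the "hidden third variable" problem: patience drives both
# education and health, education has NO direct effect on health, yet a
# regression that omits patience attributes a health benefit to education.
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

patience = rng.normal(0, 1, n)                    # hidden third variable
education = 1.0 * patience + rng.normal(0, 1, n)  # patient people invest in schooling...
health = 1.0 * patience + rng.normal(0, 1, n)     # ...and in their health

# Regression of health on education alone (patience omitted):
slope_naive = np.cov(health, education)[0, 1] / np.var(education, ddof=1)

# Multiple regression controlling for the hidden variable (via least squares):
X = np.column_stack([np.ones(n), education, patience])
coef, *_ = np.linalg.lstsq(X, health, rcond=None)

print(f"education coefficient, patience omitted:  {slope_naive:.2f}")  # ~0.50
print(f"education coefficient, patience included: {coef[1]:.2f}")      # ~0.00
```

The catch, of course, is that in real observational data the hidden variable is typically unmeasured, so the corrected regression in the second step is unavailable.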

[85] Paper: “Retrospective Analyses of Large Medical Databases. What Do They Tell Us?” By Richard A. Ward and Michael E. Brier. Journal of the American Society of Nephrology, February 1999. https://jasn.asnjournals.org/content/10/2/429.short

The clinical study of human disease is complicated by interdependent variables, and powerful analytical tools are necessary to establish causal relationships. Prospective studies can be randomized and blinded to the investigators. These techniques protect prospective trials, somewhat, from the problems of bias in the study design and confounding by codependent variables. However, prospective clinical trials are difficult to perform, may require extended duration for adequate observations of human diseases, and are expensive to organize and perform. Retrospective studies may take less time and are less expensive than prospective studies because the data have already been measured. However, retrospective studies are susceptible to bias in data selection and analysis. Furthermore, confounding variables may go unrecognized because of inadequate knowledge of how they interrelate with the outcome of interest. Because of these limitations, retrospective data analysis may show associations among variables, but rarely establishes causal relationships.

Dialysis has achieved widespread clinical acceptance because its efficacy is undisputed and the outcome without therapy is obvious. Patients with end-stage renal disease (ESRD) die of the complications of uremia, unless they are dialyzed or receive a renal transplant. Because the consequence of not dialyzing a patient with ESRD is so clear, dialysis was never subjected to the rigors of a prospective, randomized clinical trial.

