The Journal Club Toolbox
Critical reading of “original” medical articles

Arjun Rajagopalan

CONTENTS

STEP 1: Don't read the abstract
STEP 2: Read the introduction instead
STEP 3: The title tags the type
STEP 4: Identify the clinical question – in 4-part harmony
BREAK 1: Bias (systematic error)
STEP 5: Looking for sampling bias
STEP 6: Looking for measurement bias
STEP 7: Bias in comparison – apples with apples, not oranges
BREAK 2: Estimating the impact of the inherent variability of populations
STEP 8: Now for the results
STEP 9A: Interpreting interventional studies
STEP 9B: Interpreting studies on value of diagnostic tests
STEP 9C: Interpreting studies on risk/association/causality
STEP 10: Closing the loop – applying the results in your practice

© Dr Arjun Rajagopalan


The fact that an opinion has been widely held is no evidence whatever that it is not utterly absurd; indeed in view of the silliness of the majority of mankind, a widespread belief is more likely to be foolish than sensible. Bertrand Russell

...where the value of a treatment, new or old, is doubtful, there may be a higher moral obligation to test it critically than to continue to prescribe it year-in and year-out with the support of custom or wishful thinking. F H K Green

Familiarity with medical statistics leads inevitably to the conclusion that common sense is not enough… Many people are not capable of using common sense in the handling and interpretation of numerical data until they have been instructed. Austin Bradford Hill

It is only prudent never to place complete confidence in that by which we have even once been deceived. Rene Descartes

A reasonable probability is the only certainty. E W Howe

In the space of one hundred and seventy-six years, the Lower Mississippi has shortened itself two hundred and forty-two miles. That is an average of a trifle over one mile and a third per year. Therefore, any calm person, who is not blind or idiotic, can see that in the old Oolitic Silurian Period, just over a million years ago next November, the Lower Mississippi River was upward of one million three hundred thousand miles long, and stuck over the Gulf of Mexico like a fishing rod. And by the same token any person can see that seven hundred and forty-two years from now, the Lower Mississippi River will be only a mile and three quarters long, and Cairo and New Orleans will have joined their streets together, and be plodding along comfortably under a single mayor and a joint board of aldermen. There is something fascinating about science. One gets such wholesale returns of conjecture out of such trifling investment of fact. Mark Twain – Life on the Mississippi



There's way too much stuff out there waiting to be read. The repeated chanting of the mantra of “evidence-based medicine” leaves you with the nagging feeling that you should at least try to read some of the stuff, but, if you are like me, you have never received any formal instruction on how to go about making sense of arcane stuff like “p” values and 95% confidence intervals. Your memory of journal clubs is that of the potato chips and snacks that were provided.

There are 2 kinds of journal readers:

1. Most of us have a short list of half a dozen or fewer journals whose Table of Contents we scan now and then, rejecting most items as fluff that no one will read except the authors. An article catches our eye and we turn to the page, or mark it for inclusion in a "journal club" or some such gathering. We are not narrowly focused researchers, but we see ourselves as "evidence-based" clinicians, without any real idea of what the term means. We browse journals in the following fashion:
   a) First, we scan the title.
   b) Then, we quickly look at the institutional affiliation(s) of the authors.
   c) We jump to the Abstract and land first on the "Conclusion" section.

2. The hard-core researchers and nerds who will wade through a hundred and thirty-six references to answer a simple question and who, somehow, don't strike us as being capable of taking care of patients in the real world.

This offering is for the first group. Don't be a wallflower at journal clubs, content to eat the potato chips, allowing a small group of loud, forceful people to act like they know it all. Get in there and pitch it back at them. This handbook provides you a tool to approach your selected bunch. First of all, it requires no special knowledge of biostatistics. What you need to know is sprinkled across the document in a painless, jargon-free manner. It provides a step-wise method that, when applied systematically, will give you the ability to critically analyze medical journal articles. You don't have to depend on the word of experts and residents of academic ivory towers. What's more, as you get good at this, you can burst their bubbles with confidence. The handbook is only a tool. Like all tools, it is up to you to use it and get good at it. Enjoy.

STEP 1: DON'T READ THE ABSTRACT

Yes, I mean it. This is the worst way to read the literature. It makes you feel good but ends up implanting false ideas in the brain. Your mind is primed by biases before you begin. What's more, in the very likely event that you don't read the article critically, you will carry the biases acquired from the 1.2.3 method, through your day-to-day practice, often subliminally.



STEP 2: READ THE INTRODUCTION INSTEAD

It is more fruitful to start first with the introduction to the paper. Almost always, it will have this general format.

1. A proposition or statement. (The original shows filler text – “Lorem ipsum dolor sit amet...” – as a stand-in for the paper's opening statement.)
2. A declaration of the current state of understanding, placed on a scale running from total ignorance to complete clarity: “we are now here”.
3. A claim: we (the author(s)) would like to show by this study that the state of understanding can be shifted.

STEP 3: THE TITLE TAGS THE TYPE

Published medical evidence falls into one of three major groups:

• Observational studies: Data is collected and analyzed for significant patterns. Inferences are made from the patterns observed. There is no attempt to alter the natural course of the process that is studied. Observational studies can, again, be of three varieties:
  • Those that only look for patterns in the data that is obtained – classical epidemiological studies.
  • Risk/association estimation – the study attempts to answer the question: “Is there an association between A and X?” or “Does the presence of P pose an increased risk of developing R?”
  • Causation – “Is alpha the cause of delta?” Unlike risk, the relationship is of the “on-off” variety: if the cause is present, the outcome is inevitable; if not, there is no such event.

• Evaluating diagnostic tests: The study attempts to evaluate the utility of a diagnostic test (decision making under states of uncertainty).

• Interventional trials: An external intervention, usually therapeutic, is made in an effort to alter the outcome from a disease, and evaluated in terms of this effect. As clinicians, these are the most common types of studies that will interest us.

A necessary first step is to tag the study as one of the three types listed. The title of the paper will be sufficient in most cases.

Be wary of articles that attempt to be more than one of the three main types; they will rarely succeed in doing any one item well, let alone all that they are attempting. You will evaluate articles on the basis of their label. Tagging is an easy task. To convince you, the section below lists a few papers from the literature. Tag them using the principles outlined above.

Practice – tagging article type

Classify the articles listed below as one of three major types: observational (O), diagnostic test assessment (D) or interventional (I).

• Prevention of contrast induced nephropathy with sodium bicarbonate
• Clinical value of the total white blood cell count and temperature in the evaluation of patients with suspected appendicitis
• The risk of cesarean delivery with neuraxial analgesia given early versus late in labor
• Outcomes in young adulthood for very-low-birth-weight infants
• Homocysteine as a predictive factor for hip fracture in older persons
• Randomised trial of a brief physiotherapy intervention compared with usual physiotherapy for neck pain patients: outcomes and patients’ preference
• A longitudinal, population-based, cohort study of childhood asthma followed to adulthood
• Emergency ultrasound in the acute assessment of haemothorax

The tag is the defining element in the assessment of a study. This will become clearer when we get into the process of evaluating the results of a study.


STEP 4: IDENTIFY THE CLINICAL QUESTION – IN 4-PART HARMONY

Example: Low-dose ramipril reduces microalbuminuria in type 1 diabetic patients without hypertension: results of a randomized controlled trial. O'Hare P, Bilous R, Mitchell T, O'Callaghan CJ, Viberti GC. Diabetes Care 2000; 23(12): 1823-9.

This study attempts to answer the 4-part question: In normotensive, type 1 diabetics (population), does a low dose of 1.25 mg of the ACE inhibitor ramipril (indicator variable) prevent progression of incipient diabetic nephropathy as measured by microalbuminuria (outcome variable), as compared to standard, 5 mg doses of ramipril or placebo (comparison)?

The paper's conclusion needs to be measured against the stated question and assessed for the degree of completeness and lack of evasiveness in addressing the issues raised. In this study, the authors concluded: “Microalbuminuria was reduced significantly by ramipril treatment in type 1 diabetic patients without hypertension, as compared to placebo. Although the magnitude of the response was greater, there was no significant difference between responses to 1.25 or 5 mg ramipril.” (Note: the paper is reported here without any attempt at examining the details of the study.)


Once you have completed this task, identify the 4-part clinical question as:

1. Explicitly stated and complete.
2. Stated but incomplete.
3. Fuzzy.

The more explicitly stated and narrower the clinical question, the greater the likelihood of the authors' ability to establish the validity of their argument and overall soundness of the paper. Very broad and poorly articulated queries are unlikely to be adequately validated.

A SHORT BREAK TO EXPLAIN A CONCEPT: BIAS (SYSTEMATIC ERROR)

Assume it is everywhere. Bias is defined as a prevailing preference or tendency that inhibits impartial comparisons and judgements. Bias is also called "systematic error", as opposed to random errors that occur due to the inherent variability of human groups. Bias has to be presumed to exist in all clinical studies. The challenge in study design and evidence-based medicine is to minimise or eliminate bias – not easily done in many clinical situations. In clinical trials and studies, bias occurs at three important points in the design of a study:

1. In the process of designing and drawing a sample for the study (sampling bias).
2. In the various measurements and observations that are a part of the study (measurement bias).
3. In designing and making comparisons between groups (comparison bias).

The importance of tagging a study lies in looking hard for a specific bias. Although all three sources of bias can flaw any study, each tag has a key concern:

• Observational studies – sampling bias.
• Diagnostic accuracy – measurement bias.
• Interventional studies – comparison bias.



STEP 5: LOOKING FOR SAMPLING BIAS

Non-probability sampling – creation, by a non-random process, of a sample that is a facsimile of what would be a probability sample:

• Consecutive sample – Process: including every patient who meets criteria over a given number or time frame. Reason: simplest and most commonly used option in clinical research.
• Convenience sample – Process: using easily available members of an accessible population. Reason: easy strategy when any sample will be representative.
• Judgemental sample – Process: hand-picking the most appropriate members from an accessible population. Reason: easy strategy when any sample will be representative.

Clinical trials, necessarily, have to recruit patients from clinics and hospitals. Therefore, they are almost always non-probability samples and are inherently flawed in this respect. Multicentre trials can mitigate this bias to an extent. Given a choice, consecutive samples are the least subject to bias with convenience and judgmental samples running high risks of sampling bias.
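The risk described above can be made concrete with a toy simulation. Everything here is hypothetical: a "population" of patients with severity scores 1–100, of whom only the milder cases attend an easily accessible day clinic. A convenience sample drawn from the clinic underestimates average severity, while a consecutive sample in arrival order does not.

```python
import random

# Hypothetical sketch of sampling bias: disease severity scores 1..100.
# Only milder cases (severity < 50) attend the easily accessible day clinic,
# so a convenience sample drawn there skews mild.

random.seed(0)                                 # fixed, reproducible arrival order
population = list(range(1, 101))               # every eligible patient
arrival_order = population[:]
random.shuffle(arrival_order)                  # order in which patients present

def mean(xs):
    return sum(xs) / len(xs)

# Consecutive sample: every patient meeting criteria, in arrival order.
consecutive = arrival_order[:60]
# Convenience sample: only the easily accessible (mild) patients.
convenience = [s for s in arrival_order if s < 50][:60]

print(f"population mean severity: {mean(population):.1f}")    # 50.5
print(f"consecutive sample mean:  {mean(consecutive):.1f}")   # usually close to 50.5
print(f"convenience sample mean:  {mean(convenience):.1f}")   # 25.0 - badly biased
```

The convenience sample is not merely noisy; it is systematically wrong, and no increase in its size will fix that.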


All clinical trials, regardless of complexity, have to go through 4 stages in sampling. [A diagram in the original outlines the 4 stages, together with a worked example.]

Sampling bias can be induced at 3 points in a study:

• The geography of the sample can induce cultural and socioeconomic bias that may render the study invalid for general application and external validation. The time period of the study may be critical if significant changes have occurred during this time frame in the understanding and management of the condition that is being studied.

• Inclusion and exclusion criteria are a necessary part of good study protocols. However, if they are too tight or too loose, the resulting sample will be non-representative and biased.

• All studies will have drop-outs between the intended population (those available after application of inclusion/exclusion criteria) and those available until the study is completed as defined in the protocol. Any study that has a drop-out rate greater than 15% is suspect and automatically invalid. One has to assume that drop-outs occur because of unfavorable outcomes.

Sampling errors are the most common reasons for flawed studies, yet, the average reader spends the least amount of time, if any, in critically evaluating the sampling process. Regardless of all else, a biased sample red flags a poor study.

STRENGTHENING THE SAMPLING PROCESS – A PRIORI SAMPLE SIZE ESTIMATION

Later in this exercise, we will look at the impact of sample size (n) on estimating the significance of differences shown in studies. A large number of trials, although well done and showing differences, are weakened by being “underpowered”. Simply stated, the size of the sample is not large enough to prove the point beyond doubt. Good studies will attempt to handle this critical issue by estimating the sample size a priori. This can be done by a biostatistician if the investigator provides three criteria:

1. The size of the difference between the groups (as a percentage or proportion) that he is looking for or postulating.
2. The power of the test to detect this difference – expressed as a percentage and usually set at 80–90%.
3. The confidence level, or the margin of error that the investigator is prepared to accept – expressed as a decimal or percentage.

This example will clarify (Ref: The risk of cesarean delivery with neuraxial analgesia given early versus late in labor. NEJM 2005; 352: 655-65): “The study was designed to have 80 percent power to detect a difference of 50% in the rate of cesarean delivery, with a two-sided alpha level of 0.05. The sample size required to detect this difference was 350 subjects per group.”

A priori sample size estimation strengthens the study and protects it from the danger of small numbers or the expense of larger trials than are necessary.
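The three criteria above are all a sample-size formula needs. The sketch below is my own illustration (not taken from the cited paper) of the standard normal-approximation formula for comparing two proportions; the example proportions are invented.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, power=0.80, alpha=0.05):
    """A priori sample size per group for detecting a difference between
    two proportions p1 and p2 (two-sided, normal-approximation formula)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    p_bar = (p1 + p2) / 2
    num = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
           + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return math.ceil(num / (p1 - p2) ** 2)

# Hypothetical: to show a drop in an outcome rate from 10% to 5%,
# with 80% power and a two-sided alpha of 0.05:
print(n_per_group(0.10, 0.05))   # 435 subjects per group
```

Note how demanding 90% power instead of 80% pushes the required n up; this is exactly the trade-off the investigator hands to the biostatistician.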



STEP 6: LOOKING FOR MEASUREMENT BIAS

All studies involve measurements – as part of the study protocol, during the study itself, or in assessing results. Bias can creep into measurements through various routes.

Measurement: precision & accuracy

Any measurement has to have 2 characteristics: precision and accuracy. The diagram below describes the two entities with the simple example of the results of hitting a target.

[Diagram: four targets illustrating the combinations – precise and accurate; precise but not accurate; accurate but not precise; neither precise nor accurate.]

Measurement errors can arise from 2 sources:

1. The observer making the measurement.
2. The instrument (or device) used to measure.

We tend to overlook common bedside tools like pulse rate, BP measurement and temperature as sources of measurement error, but studies can be weakened by such simple devices. Each of these errors can again be random (due to the nature of the sample) or systematic (bias) – part of the study design.

Observer bias

Good studies will attempt to reduce observer bias through various strategies. Look for the following:

1. Training of observers: Putting the observers through a training and certification protocol to ensure reproducibility of results.
2. Standard protocols: Formal statements of procedures and processes that are to be used.
3. Using numerical scoring and scaling systems where subjective decisions are involved. A typical example would be the use of visual analogue pain scores to assess pain levels rather than descriptive terms such as "mild", "moderate", or "severe".
4. Blinding: This is the most classical of techniques to reduce what is called differential bias. The variants of blinding include:
   a. Single blinding (independent observer): where it is not possible for the investigator to administer an intervention without knowledge of its nature, but the outcome variables can be assessed by a person who is unaware of the intervention given. For example, using an independent infection control nurse to assess postoperative wound infections rather than the surgeon who performed the procedure.
   b. Double blinding: where neither the person administering the intervention nor the subject receiving the intervention is aware of the nature of the intervention. In such cases, observations need not necessarily be made by independent observers.
   c. Cross-over, double-blind studies: At some point in the protocol, the interventions are switched in the subject. For example, those on the placebo are now placed on the treatment and vice versa, the entire process being carried out double-blinded. Each patient thus serves as his own control.

Measuring observer variability: No two successive observations will be identical. Variability will occur between observations made by the same person on the same situation, on successive occasions – intra-observer variability – and between different persons making the same observation – inter-observer variability. This source of error can be estimated statistically and expressed as a “kappa” statistic, a valuable, objective figure. The final figure is stated as a decimal between 0 and 1. A kappa of "1" indicates perfect concordance between observers (virtually impossible) and a kappa of "0" represents agreement no better than chance. Practically, a kappa value above 0.6 is considered good agreement between observers.
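The kappa statistic is easy to compute by hand for a 2×2 agreement table. The sketch below uses invented numbers: two observers each classify the same 100 wounds as infected or not.

```python
def cohens_kappa(table):
    """Cohen's kappa for a 2x2 agreement table.
    table[i][j] = number of cases observer A rated i and observer B rated j."""
    n = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(2)) / n              # raw agreement
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(2)) for j in range(2)]
    # Agreement expected by chance alone, from the marginal totals:
    expected = sum(row_totals[i] * col_totals[i] for i in range(2)) / n ** 2
    return (observed - expected) / (1 - expected)                  # chance-corrected

# Hypothetical data: observer A (rows) vs observer B (columns),
# counts of wounds each rated infected / not infected.
table = [[40, 10],   # A: infected     -> B: infected, B: not infected
         [5, 45]]    # A: not infected
kappa = cohens_kappa(table)
print(f"kappa = {kappa:.2f}")   # 0.70 - good agreement by the > 0.6 rule of thumb
```

Here the observers agree on 85% of wounds, but 50% agreement was expected by chance, so the chance-corrected kappa is 0.70.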

Measurement bias

Strategies for reducing measurement bias include:

1. Repetition: Obtaining multiple readings from the same event increases the reliability of the values.
2. Calibration against a "gold standard": This aspect is particularly important when using advanced-generation instruments that are yet to be proven. Simple examples include digital blood pressure recorders and weighing machines; periodic calibration against a mercury column machine or a beam balance would be necessary.
3. Suitability of the instrument to what is being measured: Quite often this element is glossed over or missed. A well done study that uses an unsuitable instrument will raise doubt on the validity of the results.

Systematically list everything that the study measures and put them through the checklist for measurement bias. Don't take anything for granted; look for explicit statements or the absence thereof.



STEP 7: BIAS IN COMPARISON – COMPARING APPLES WITH APPLES, NOT ORANGES

Progress in Medicine is a slow, incremental process of comparing the new with what is traditional or established. “Paradigm shifts” are very rare. Fair comparisons are a critical aspect of valid studies. Against the backdrop of the vast range of variability in human populations, assuring fair comparisons between groups is not an easy task.

Comparison: being in "control"

[Diagram: Group A yields observation/outcome “A”; Group B yields observation/outcome “B”; the difference – “is it real?”]

Most clinical trials are designed on the principle of making comparisons between groups. Observations and/or outcomes of a group (study group) are compared with another reference group (control group) and the differences established. The study, in conclusion, will make a decision whether this difference is valid or not.

It is therefore mandatory in all comparative trials that there be a reference group (control) against which the comparison is made. A large number of studies are invalid as evidence because of inadequacies in the control and the compared groups. Sources of error include:

1. No controls. However large the numbers and however rigorously designed the study, an uncontrolled study is unacceptable and ranks only as anecdotal evidence. (It is like the sound of one hand clapping!)
2. Historical controls. Quite commonly, the lack of controls in the study will be addressed by comparing the results of the study with other studies on a similar topic, done elsewhere, at different times in the past. Historical controls are not fair comparisons and are unacceptable.
3. Poorly matched controls. Controls may be present, but are not comparable with the study group. As an example, those in the study group may be younger and with fewer comorbidities than the controls and therefore be associated with better outcomes – comparing apples with oranges.




Good studies will outline the salient characteristics of control and study groups and analyze differences using tests of statistical significance. These lists have to be scrutinized, certified as comparable, and significant differences noted. No interventional trial is valid unless it is compared with a control group. At present, the prospective, randomized, controlled trial (RCT) is the only method of ensuring comparable groups in clinical trials. By extension, valid interventional trials must, therefore, be RCTs. Retrospective analyses are always flawed in that allocation to control and study arms can never be unbiased. Comparisons in retrospective studies are seldom fair, the groups compared are always dissimilar, and, therefore, differences in outcome can never be proven to be true.

By now, you will have weighed the study in terms of systematic error or bias and would like to move on to looking at the results of the study, usually expressed as differences between two or more groups. There is another important source of error that needs to be examined at this juncture – random error: the consideration that differences between groups might be purely an accident and an expression of the inherent variability and inhomogeneity of human populations.
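Randomized allocation is what makes the comparison fair. Below is a minimal sketch of one common scheme, permuted-block randomization (blocks of 4, two per arm), which keeps group sizes balanced throughout recruitment. It is an illustration under assumed parameters, not any specific trial's protocol.

```python
import random

def block_randomize(n_subjects, block_size=4, arms=("treatment", "control"), seed=42):
    """Permuted-block randomization: within each block, equal numbers are
    allocated to each arm, in random order. A fixed seed makes the
    illustration reproducible; a real trial would conceal the sequence."""
    per_arm = block_size // len(arms)
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_subjects:
        block = list(arms) * per_arm      # e.g. [T, C, T, C]
        rng.shuffle(block)                # random order within the block
        allocation.extend(block)
    return allocation[:n_subjects]

alloc = block_randomize(20)
print(alloc.count("treatment"), alloc.count("control"))   # 10 10 - balanced arms
```

Because every block contributes equally to both arms, the two groups stay the same size however recruitment unfolds, which is one practical way an RCT ensures comparable groups.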

A SECOND (AND FINAL) BREAK TO TAKE IN ANOTHER IMPORTANT CONCEPT: ESTIMATING THE IMPACT OF THE INHERENT VARIABILITY OF HUMAN POPULATIONS

The commonly used term “the bell-shaped curve” encapsulates all our ideas, knowledge and emotions regarding the tendency of populations to exhibit values for single indicants that are distributed widely around an average or central tendency and not tightly bunched around a mean. An immediate implication of this phenomenon is that differences between human groups cannot be casually taken as real differences. In fact, it is assumed that any noticed difference is a random event, and the burden of proof is on the author or investigator to show that the difference is real, consistent and repeatable – in statistical jargon, “rejecting the null (no difference) hypothesis”. This is where we need the help of biostatistical methods. No study is acceptable unless put through the mill of statistical analysis to determine the significance of differences that have been demonstrated in the study. I can hear you saying, “Don't go there now!” In truth, contrary to popular belief, it is not necessary to have any knowledge of biostatistics to be able to critically analyze medical articles. You can safely consider all the arcane, background mumbo-jumbo as taking place in a black box that will finally put out two indices that you need to know about and be able to apply intelligently:

• The 'p' value, and
• The 95% confidence interval (CI).


'p' value

A quick recap first. [Diagram: Group A yields observation/outcome “A”; Group B yields observation/outcome “B”; the difference – “is it real?”] The major task ahead of us is that of deciding whether differences seen in outcomes are real or just due to chance. Errors may be systematic (bias) or random (chance). Systematic errors are minimised by good study design. Random errors are evaluated by established biostatistical methods.

Biostatistical analysis is a formal process of taking the data generated from the clinical trial and putting it through well-established procedures and processes that will permit us to make the decision regarding the truth – i.e. are we to reject the null (no difference) hypothesis or not. [Diagram: data from the clinical trial → biostatistical black box → measure of the truth: the 'p' value.]

Regardless of the type of statistical test used, a 'p' value will be generated at the end. The 'p' value is a simple estimate of the probability of error or chance in producing the observed difference. By convention, it is expressed as a decimal fraction, e.g. p = 0.03. The table below shows some examples.

• p = 0.38 – a 38% chance that the differences are not real.
• p < 0.05 – less than a 5% chance that the differences are not real.
• p = 0.001 – a 1 in 1000 chance that the differences are not real.

In biostatistical usage, a p value of 0.05 or less is taken as a significant difference; i.e. less than 5% probability that the difference observed is due to chance alone.

Errors in assigning significance:

• Accepting the null hypothesis when it is true (no difference) – correct decision.
• Accepting the null hypothesis when it is false (the difference is valid) – Type II (beta) error (false rejection of a real difference).
• Rejecting the null hypothesis when it is true – Type I (alpha) error (false validation of a non-existent difference).
• Rejecting the null hypothesis when it is false – correct decision.

Since the decision regarding the 'p' value cut-off is somewhat arbitrary, it is essential that the value be chosen with the least potential for error. If the limit is made too narrow (small 'p' value), then the likelihood of Type II errors (false rejection) rises. Conversely, if too wide (large 'p' value), the likelihood of a Type I error (false validation) increases. The 0.05 cut-off appears to be the "sweet spot" in this regard and determines significance.

Although a convenient and easily understood measure of the truth behind differences, the 'p' value has many shortcomings:

1. The arbitrary nature of the cut-off point for significance creates a false dichotomy in what is often a continuum. This is particularly important when differences are close to the borderline – e.g. p = 0.06.
2. The 'p' value tells us nothing about the size of the difference or its direction. It is not a quantitative measure.
3. Sample size has a major impact on 'p' values. A small difference in studies with large sample sizes can have the same 'p' value as a much larger difference with smaller samples. As the 'n' in the study increases, more 'p' values are likely to become significant.

In recent years, the "95% confidence interval" has been preferred as a measure for examining the differences between groups on a numerical basis. It is more likely to be useful to practitioners making clinical decisions.
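The impact of sample size on 'p' values is easy to demonstrate. In the hypothetical comparison below, both trials see exactly the same difference in proportions (a 60% vs 50% success rate); only the sample size changes, and the 'p' value swings from non-significant to highly significant. A two-proportion z-test is sketched from first principles.

```python
import math
from statistics import NormalDist

def two_proportion_p(success1, n1, success2, n2):
    """Two-sided p value for a difference between two proportions
    (pooled z-test, normal approximation)."""
    p1, p2 = success1 / n1, success2 / n2
    pooled = (success1 + success2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = abs(p1 - p2) / se
    return 2 * (1 - NormalDist().cdf(z))

# The same 10-percentage-point difference (invented numbers), two sample sizes:
p_small = two_proportion_p(30, 50, 25, 50)       # 60% vs 50%, n = 50 per group
p_large = two_proportion_p(300, 500, 250, 500)   # 60% vs 50%, n = 500 per group
print(f"n = 50:  p = {p_small:.3f}")    # ~0.31, "not significant"
print(f"n = 500: p = {p_large:.4f}")    # ~0.0015, "highly significant"
```

The effect did not change at all; only the 'n' did. This is why a 'p' value alone, divorced from the size of the difference, can mislead.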

The 95% confidence interval (CI) ...versus 'p' values

All studies rely on samples that are, hopefully, representative of the truth. The confidence interval is a quantitative estimate of the range of values within which the truth is likely to lie, with a specified degree of confidence. The 95% confidence interval is the range of values within which we can be 95% sure that the truth regarding the population lies. CIs can be calculated for any continuous variable. CIs may be expressed for any value – 90%, 99%, etc. – but, like the 'p' value, the sweet spot is the 95% cut-off. The comparison below sets the 95% CI against the 'p' value.

• Nature: 'p' value – qualitative; 95% CI – quantitative.
• Basis of decision: 'p' value – dichotomous (significant/non-significant); 95% CI – a continuous set of numbers.
• Size of difference: 'p' value – not indicated; 95% CI – indicated.
• Direction of difference: 'p' value – not indicated; 95% CI – indicated.
• Impact of 'n': present in both, but the reader cannot make a judgement about it from a 'p' value, while the 95% CI lets the reader make that judgement.

95% CIs can be derived for common indices like:
- Differences between means or proportions
- Relative risks and odds ratios
- Sensitivities, specificities and likelihood ratios
(These items are discussed later in the document.)

The concept of the 95% CI is best understood through examples.

The commonest kind of clinical trial is one that compares two groups. Specific outcome variables are measured and the mean values or proportions compared. In the light of what we know about the tendency of measured values to be distributed across a range and not tightly bunched around the mean, we need some other measure that will permit us to judge if these differences are real and valid or merely due to chance. The 95% CI is the most commonly used measure. Consider the following example:

The data summarized in the table below is from an RCT testing the efficacy of pertussis vaccine. Patients were randomly assigned to receive either pertussis vaccine (study group) or a placebo (control). The table shows the outcome of the study as measured by the number of study subjects who developed pertussis in the follow-up period.

Developed pertussis: Vaccine (n = 1670) – 72 (4.3%); Placebo (n = 1665) – 240 (14.4%).

Ref: Trollfors B, Taranger J, Lagergard T, et al. NEJM 1995; 333: 1045-50.
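The headline figures discussed next can be recomputed directly from the counts in this table. The sketch below uses the standard normal-approximation CI for a difference in proportions; tiny rounding differences from the published interval are to be expected.

```python
import math
from statistics import NormalDist

def risk_difference_ci(events1, n1, events2, n2, level=0.95):
    """Absolute risk difference (p1 - p2) and its normal-approximation
    confidence interval."""
    p1, p2 = events1 / n1, events2 / n2
    diff = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(0.5 + level / 2)   # ~1.96 for a 95% CI
    return diff, diff - z * se, diff + z * se

# Pertussis trial counts: 240/1665 on placebo vs 72/1670 on vaccine.
arr, lo, hi = risk_difference_ci(240, 1665, 72, 1670)
print(f"ARR = {arr:.1%}, 95% CI {lo:.1%} to {hi:.1%}")
# ARR = 10.1%, 95% CI roughly 8.2% to 12.1% - the whole interval stays above zero
```

Because even the lower bound of the interval is well above zero, the vaccine's benefit is unlikely to be a chance finding.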

The difference in the rate of development of pertussis between placebo and vaccination (absolute risk reduction – ARR) was 10.1%. The 95% CI for this difference was calculated and reported as 8.2–12.0%. This tells us that, even assuming the lowest value as being the truth, there is a real difference, because the interval does not reach or cross zero – the point at which one would have to consider that one end of the truth estimate reaches the point of no difference. It is possible for the value to go below zero as well, indicating a possible negative difference. (In contrast, the 'p' value gives us no feel for the magnitude or direction of the difference.)

The width of the CI is dependent on the numbers sampled – the 'n'. The more the numbers involved, the greater the degree of precision with which the 95% CI can be represented. This concept is best expressed in the following hypothetical situation involving the sensitivity of a diagnostic test.

          Sensitivity   95% CI
n = 24    95.8%         75 – 100%
n = 240   95.8%         92.5 – 98.0%

In the example shown, increasing the 'n' from 24 to 240 results in a significantly more precise 95% CI, even though the underlying outcome variable, the sensitivity of the test, remains unchanged.
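The arithmetic behind these intervals is easy to verify. The sketch below is mine, not the paper's; it uses the simple Wald approximation, so the single-proportion figures differ slightly from the exact intervals quoted above. It computes the 95% CI for the pertussis ARR and shows how the interval for a fixed sensitivity narrows as 'n' grows:

```python
from math import sqrt

def ci_diff_proportions(x1, n1, x2, n2, z=1.96):
    """Wald 95% CI for a difference between two proportions (p2 - p1)."""
    p1, p2 = x1 / n1, x2 / n2
    diff = p2 - p1
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return diff, diff - z * se, diff + z * se

def ci_proportion(p, n, z=1.96):
    """Wald 95% CI for a single proportion, eg. a sensitivity."""
    se = sqrt(p * (1 - p) / n)
    return max(0.0, p - z * se), min(1.0, p + z * se)

# Trollfors et al.: 72/1670 (vaccine) vs 240/1665 (placebo)
arr, low, high = ci_diff_proportions(72, 1670, 240, 1665)
print(f"ARR {arr:.1%}, 95% CI {low:.1%} to {high:.1%}")
# Neither limit reaches zero, so the difference is unlikely to be chance.

# Same sensitivity, ten times the sample: the interval narrows sharply
for n in (24, 240):
    lo, hi = ci_proportion(0.958, n)
    print(f"n = {n}: 95% CI {lo:.1%} to {hi:.1%}")
```

The same shrinking-interval behaviour is why small case series give such imprecise estimates.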

Two practical points in applying the 95% CI

1. When calculated for raw numbers (continuous variables), such as the differences observed between groups, the 95% CI will be expressed as a range that can include "0" as a value and can even be negative. When the 95% CI touches or extends across the zero value (eg. 3.4, -1.8), it means that there is a likelihood of no real difference between the groups.

2. When calculated for ratios such as risk or odds ratios, the 95% CI will be expressed as a range that can run from small decimal fractions to any value above 1. If the 95% CI includes "1" within its range (eg. 0.04, 2.6), it means that there is a likelihood that the differences shown are not real or statistically significant.

The 95% CI provides quantitative information and permits the reader to make finer clinical judgments and decisions without the dichotomy of 'p' values. In practice, it is worthwhile looking at both estimates.


STEP 8: NOW FOR THE RESULTS

At this point, you need to go back and recall two important elements:

1. The type of study that the paper represents (STEP 3). Depending on which of the three types the study represents, you will have to go through a specific series of steps to evaluate the article.

2. The four-part clinical question that the study is trying to address.

It is here that most studies become evasive and throw a large number of tables, charts and data at you. Hold your ground and keep focused on the major stated intentions of the study. Don't get distracted by post hoc data analysis that was not in the stated intentions.

It is a fact of life that journals don't like publishing studies with equivocal results. To get around this, many studies will descend into creating subgroups and making comparisons. If this was the intention ahead of time, the authors should have clearly stated it in their clinical question and stratified the sample at the time of enrollment of study subjects. Don't accept attempts at post hoc subgroup analysis. For example, a study may not show any significant difference with the use of intervention X in a disease, as compared with the controls. The authors may then go on to state that women who were post-menopausal and diabetic (or some such) showed a difference. If this was their aim, they should have stratified their study at enrollment into men vs women, the women into post- and premenopausal, and each of these into diabetic and non-diabetic; a strategy that would vastly increase the complexity, sample size and expense. Watch out for this trick; it is all too common.


STEP 9A: INTERPRETING INTERVENTIONAL STUDIES

Ask yourself the following questions:

• Were the interventions explicitly stated? This is particularly important when interventions involve skills in the person making the intervention. Ask:

  ○ Were protocols drawn up before the study, and was there a process for assuring basic competence in the intervention?

  ○ Was there any monitoring by independent observers to assure quality during the study?

  ○ Is this level of training/ competence/ monitoring practical in day-to-day life?

  ○ If skill levels could not be standardized, were the results analyzed on the basis of the known or assumed competence of the person making the intervention?

• What was the level of significance of the demonstrated differences? Look at the 95% CI and 'p' values for the differences.

• Simpler still, express the major difference as a fraction and invert it to get the NNT (number needed to treat). If the difference between control and study group was 12%, then the NNT is the reciprocal of 12/100, ie. 100/12, which is about 8.
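The NNT arithmetic from the last bullet can be wrapped in a two-line helper (a sketch of the rule of thumb, not anything taken from a specific study):

```python
def nnt(control_rate, treated_rate):
    """Number needed to treat: the reciprocal of the absolute risk reduction."""
    return 1 / abs(control_rate - treated_rate)

print(round(nnt(0.12, 0.0)))      # the 12% difference above: NNT about 8
print(round(nnt(0.144, 0.043)))   # pertussis trial: 14.4% vs 4.3%, NNT about 10
```

In practice the NNT is usually rounded up, since you cannot treat a fraction of a patient.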


STEP 9B: INTERPRETING STUDIES ON VALUE OF DIAGNOSTIC TESTS

First, closely scrutinize the study for measurement bias. Tests are obtained to resolve uncertainty and address the question, "Will I be better off after the test in terms of coming to a diagnosis?"

To explain the point further:

1. The LR+ compares the true positive rate with the false positive rate. An LR+ of 5 means that there will be one false positive for every five true positive tests.

2. Similarly, the LR- compares the false negative rate with the true negative rate. An LR- of 0.2 (20%) means that there will be one false negative for every five true negative tests.

Using sensitivities and specificities in a raw fashion is erroneous, because the likelihood of resolving the uncertainty behaves in a quirky fashion. Consider the table shown below.


Sensitivity   Specificity   LR+    LR-
95%           95%           19     0.05
95%           80%           4.75   0.06
80%           95%           16     0.21
85%           75%           3.4    0.20
75%           85%           5      0.29
75%           65%           2.1    0.38
65%           75%           2.6    0.47
95%           65%           2.7    0.07
65%           95%           13     0.37
70%           70%           2.33   0.43

A test that is 70% sensitive and 70% specific is practically useless. Let's look at the results of this study: "Magnetic resonance imaging for preoperative evaluation of breast cancer: a comparative study with mammography and ultrasonography" (Hata T, Takahashi H, Watanabe K, et al. J Am Coll Surg 2004; 198: 190-197). The authors claim that MRI can detect intraductal spread more accurately than the other two methods and conclude that MRI "appears to be indispensable in breast conserving surgery to minimize local recurrence". Is Hata san correct? Let's look at his numbers.

              Sensitivity   Specificity   LR+    LR-
Ultrasound    21%           85%           1.4    0.93
Mammogram     22%           86%           1.57   0.91
MRI           67%           64%           1.86   0.52

At first glance, to the uninitiated, it looks as though MRI does better, with a sensitivity/ specificity of 67/64% as compared with mammography (22/86%) and ultrasound (21/85%). Now see what happens when likelihood ratios are calculated (which the authors did not).

• All three have an LR+ below 2; that puts them in the "should we bother getting it?" category. In simple words, it means there will be at least one false positive test for every two true positives.

• As far as the LR- is concerned, none of the three earns its keep. A negative ultrasound or mammogram, with an LR- close to 1, barely changes the odds at all, and even a negative MRI only halves them; none comes near the low LR- (0.2 or less) that makes a negative result genuinely reassuring.

"Indispensable?"

Learn to use likelihood ratios, not the knee-jerk sensitivity/ specificity figures, to determine the value of a diagnostic test. It is well worth the effort. Rule of thumb: look at the LR+ when the emphasis is on diagnosing (ruling in); use the LR- when the issue is one of exclusion (ruling out, screening).
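The likelihood ratios in these tables come straight from sensitivity and specificity, and applying one to a patient is just a matter of converting probability to odds and back. A sketch follows; the 30% pretest probability is an arbitrary illustration of mine, not a figure from the Hata study:

```python
def likelihood_ratios(sens, spec):
    """LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    return sens / (1 - spec), (1 - sens) / spec

def post_test_probability(pretest, lr):
    """Post-test odds = pretest odds x LR; then convert odds back to probability."""
    odds = pretest / (1 - pretest) * lr
    return odds / (1 + odds)

lr_pos, lr_neg = likelihood_ratios(0.95, 0.95)
print(lr_pos, lr_neg)   # ~19 and ~0.05, the first row of the table

# A weak test (LR+ < 2) barely moves the needle on a 30% pretest probability:
print(round(post_test_probability(0.30, 1.86), 2))   # ~0.44
```

This is the arithmetic behind the rule of thumb: a strong LR+ drives the post-test probability up sharply, while an LR+ below 2 leaves you roughly where you started.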


STEP 9C: INTERPRETING STUDIES ON RISK/ ASSOCIATION/ CAUSALITY

Clinical trials often attempt to assess the degree of harm or risk that accompanies exposure to various events and circumstances.

Risk assessment: have measurements been made on more than one occasion?

• NO → cross-sectional study (case control) → odds ratio
• YES → longitudinal study (prospective cohort) → relative risk

Evidence from these studies is acceptable only if they fall into one of the two types shown above. The longitudinal cohort study is the better choice but may not always be practical, in which event a well done case-control study is an option. Longitudinal cohort studies involve follow-up over a period of time; they can therefore assess the incidence of the disease and measure risk, yielding a "relative risk". Cross-sectional (case-control) studies cannot provide estimates of risk; they will only yield an "odds ratio".

• Since these studies do not perform interventions, sampling strategies have to be closely scrutinized for bias.

• Studies on risk and association have to be represented as a 2x2 table of predictor variable (exposed/ not exposed) against outcome variable (outcome present/ absent), keeping in mind the 4-part clinical question.

From the numbers in such a simple 2x2 table, the relative risk or odds ratio, together with its 95% CI, will be stated. The calculation is not as hard as it looks. It's OK if you don't want to be bothered by it. Just remember the principle behind each and look for the numbers in the study. The more important aspect of using these numbers is outlined in the next section.
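For those who do want to see the principle at work, here is a sketch of mine (not from any of the cited papers), using the conventional 2x2 cell labels: a = exposed with outcome, b = exposed without, c = unexposed with outcome, d = unexposed without. The 95% CIs use the usual log-transform method:

```python
from math import exp, log, sqrt

def relative_risk(a, b, c, d, z=1.96):
    """RR with 95% CI (log method): risk in exposed / risk in unexposed."""
    rr = (a / (a + b)) / (c / (c + d))
    se = sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, exp(log(rr) - z * se), exp(log(rr) + z * se)

def odds_ratio(a, b, c, d, z=1.96):
    """OR with 95% CI (log method): cross-product ratio ad / bc."""
    orr = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return orr, exp(log(orr) - z * se), exp(log(orr) + z * se)

# Reading the pertussis trial as a cohort (vaccine = "exposure"):
rr, lo, hi = relative_risk(72, 1598, 240, 1425)
print(f"RR {rr:.2f} (95% CI {lo:.2f} - {hi:.2f})")  # well below 1: protective
```

If the interval includes 1, the association cannot be claimed as real; here the whole interval lies below 1, consistent with a protective effect of vaccination.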

From the earlier discussion on 95% CI, we learned that when calculated for ratios such as risk or odds ratios, the 95% CI will be expressed as a range that can include “1”. If the 95% CI includes “1” within its range (eg. 0.04, 2.6), it means that there is a likelihood that the differences shown are not real or statistically significant. This will be clearer as you go through the examples provided below.

Association of obesity and cancer risk in Canada. Pan SY, Johnson KC, Ugnat AM, et al. Am J Epidemiol 2004; 159: 259-68.

This is a population-based, case-control study of 21,022 incident cases of 19 types of cancer and 5,039 controls aged 20-76 years during 1994-1997, examining the association between obesity and the risks of various cancers. The study compared people with a body mass index of less than 25 kg/m2 with those with a body mass index of 30 kg/m2 or more.

           Odds ratio   95% CI
Overall    1.34         1.22 – 1.48
Colon      1.93         1.61 – 2.31
Pancreas   1.51         1.19 – 1.92
Breast     1.66         1.33 – 2.06
Ovary      1.95         1.44 – 2.64
Prostate   1.27         1.09 – 1.47

The use of estrogens and progestins and the risk of breast cancer in postmenopausal women. Colditz GA, Hankinson SE, Hunter DJ, et al. NEJM 1995; 332: 1589-93.

This is data from a prospective cohort study. During 725,550 person-years of follow-up, 1935 cases of invasive breast cancer were newly diagnosed. When compared with women who had never used hormones, the data were as shown.

                       Relative risk   95% CI
Estrogen alone         1.32            1.14 – 1.54
Estrogen + progestin   1.41            1.15 – 1.74
5-9 yrs users          1.46            1.22 – 1.74

When are risk/ odds ratios significant?

At what level do we consider risk ratios and odds ratios to be clinically significant? Although there are formulae for calculating numbers needed to harm (much like the NNT seen earlier), these are not easily applied. As a general rule of thumb, it may be stated that:

• Risk ratios of 3 or more are significant.

• Odds ratios of 4 or more are significant.

• A risk ratio greater than 20 practically implies causality.

As a general caveat, the threshold for significance is also determined by the seriousness of the adverse event. The more serious the event, the smaller the ratio that could be considered significant.

Studies on risk may sometimes present a group of risk ratios/ odds ratios in graphical form: a forest plot, in which the middle dot on each line represents the risk ratio for one study and the line represents the spread of its 95% CI. One such chart comes from a paper that asks whether women who have breast cancer detected early by mammographic screening programs have a lower risk of cancer-related death than those picked up conventionally.

With your knowledge of interpreting risk ratios and their significance, you can eyeball such a chart and come to your own conclusions. In that chart, several of the individual lines cross the “1” mark, the point of no significance, but the bottom value, the one that represents the summation of all the values above it, falls well short of the “1” mark. This means that women who were treated for mammographically detected breast cancer had a lower risk of cancer-related death on long-term follow-up than the control population.

Proving causality

When your 2x2 table shows a strong association, it is very tempting to extrapolate that the predictor variable may be the cause of the outcome. This is dangerous. It is suggested that five questions be asked and answered before a causal link can be established:

1. Is it clear that the exposure to the risk factor preceded the outcome?

2. Is a dose-response gradient demonstrable: consistently increasing harmful effects with increasing exposure?

3. Is there evidence from a "dechallenge-rechallenge" study: the adverse effect decreases/ disappears when the risk factor is withdrawn and reappears when it is reinstituted?

4. Is the association consistent and repeatable in other studies?

5. Does the association make biological sense?

Most often, evidence in support will be lacking or weak. Many common diseases are complex disorders with multiple etiological possibilities. Confounding - the process where a cause produces its effect through a less visible, but stronger, factor - is commonplace in Medicine. The process of estimating such effects involves complex mathematical processes that would make you scream. Take them on faith. They will ultimately yield the familiar odds ratios, relative risks and 95% CIs that you can use to interpret the results. When multiple risk factors are being assessed in a single situation, it is possible to evaluate the relative strengths of each. The process of establishing strength of association is a complex mathematical effort using regression analysis and other terrifying animals that we would rather take on faith than confront. Moreover, these complex regression analyses are usually pieces of sophistry that can seldom be applied in real life. If they scare you, leave them alone. You will not know the difference.

STEP 10: CLOSING THE LOOP – APPLYING THE RESULTS IN YOUR PRACTICE

Journal reading should not be a sterile exercise. If you are satisfied with the quality of the article you have read, you need to apply its conclusions in your practice. A simple mnemonic might make the task easier: INFER, where

• I – Interesting: does the topic fall within your sphere of interest?

• N – Novel: is it saying something new?

• F – Feasible: can you do it in your daily environment? Use the NNT, likelihood ratios and risk ratios to make the decision.

• E – Ethical: subtle issues like patients' preferences, cultural constraints and so on have to be looked at, besides overall ethical concerns.

• R – Resources: what impact will it have on resources – yours, the patient's, the hospital's, society's as a whole?

“I never promised you a rose garden.”

For some worked-out examples and a downloadable, 2-page study evaluation form, visit: http://www.ebm4d.org


The 2-page study evaluation form covers the following items.

Page 1 of 2

• Authors, Journal, Affiliation
• Background
• 4-part research question: Population | Predictor variable | Outcome variable | Comparison
• Inclusion criteria and exclusion criteria
• Sampling:
  ○ Probability sample: simple random | stratified random | cluster
  ○ Non-probability sample: consecutive | convenience | judgmental
• Sampling scorecard: target population → accessible population → intended population (after inclusion/ exclusion) → study population; drop outs (%)
• EBM dashboard – primary nature of study: interventional | observational; diagnosis | risk/ association/ causality
• Evidence hierarchy: double blind RCT > randomised controlled trial (RCT) > prospective cohort > case control > case series
• Systematic error (bias), each element scored 1 to 5: sampling | measurement | comparison
• Applicability, each element scored 1 to 5: interesting | novel | feasible | ethical | relevant
• Summing up: total score /200

Page 2 of 2

• Measurement: devices used | device suited to task | device error | measurement error | observer error | gold standard | protocols | training | scoring | repetition | blinding
• Comparison – controls: none | historical | non-random | case controlled | randomised; randomisation method and details; comparability; disparity
• Personal notes and observations

EBM 4 dummies – © Dr Arjun Rajagopalan
