
Errors in the Archives of Dermatology and the Journal of the American Academy of Dermatology From January Through December 2003

Julie A. Neville, MD; Wei Lang, PhD; Alan B. Fleischer Jr, MD

Author Affiliations: Departments of Dermatology (Drs Neville and Fleischer) and Public Health Science (Dr Lang), Wake Forest University School of Medicine, Winston-Salem, NC.


Arch Dermatol. 2006;142(6):737-740. doi:10.1001/archderm.142.6.737.

Objective  To assess the frequency of statistical errors in the dermatology literature.

Design  Original studies published in the Archives of Dermatology and the Journal of the American Academy of Dermatology from January through December 2003 were analyzed for correctness of statistical methods and reporting of the results.

Results  Of 364 studies published, 155 included statistical analysis. Of these, 59 (38.1%) contained errors in the methods or omissions in reporting of the statistical results. Fourteen percent of the articles with statistical analysis contained errors in the methods used (considered to be more significant errors), 26.5% contained errors in the presentation of the results, and 2.6% contained errors in both.

Conclusions  The misuse of statistical methods is prevalent in the dermatology literature, and the appropriate use of these methods is an integral component of all studies. Readers should critically analyze the methods and results of studies published in the dermatology literature.

Statistics are frequently used when reporting the results of studies in the medical literature, yet errors commonly occur in the use and presentation of statistical findings. Previous reviews in other medical disciplines have demonstrated high error rates, ranging from 45% to 95%.1-7 A review of 100 articles in the dermatopathology literature found an error rate of 36% among the 25 articles that contained statistical analysis.8 Inadequate power is also prevalent, with the results of 1 study9 indicating that most clinical trials with negative conclusions in dermatology did not have an adequate sample size to detect a difference between treatment groups. A more comprehensive review of the general dermatology literature to detect other statistical errors has not been conducted, to our knowledge.

For this reason, we performed a retrospective review of the statistical methods from all studies published in the Archives of Dermatology and the Journal of the American Academy of Dermatology in 2003. These 2 journals were chosen because they are well-respected peer-reviewed journals in the dermatology literature.

Articles published in the Archives of Dermatology and the Journal of the American Academy of Dermatology from January through December 2003 that included statistical methods were reviewed for errors. The articles included in this study were those containing statistical analysis from the sections of these journals publishing scientific studies. In the Archives of Dermatology, these included the Studies, Observations, Correspondence, Evidence-Based Dermatology, and Reviews sections, and in the Journal of the American Academy of Dermatology, these included the Reports, Therapy, Laser Surgery, Dermatologic Surgery, Dermatopathology, and Brief Reports sections. Despite the inconsistent definition of what constitutes a statistical error, we chose to include those errors that were highlighted in previous reviews from other medical disciplines.1-7 The 2 groups of statistical errors considered were in the use of a statistical test and in the presentation of the results.

ERRORS IN THE USE OF A STATISTICAL TEST

Because most articles did not provide the raw data necessary to determine the distribution, we assumed that sample sizes smaller than 30 in each group would not have a normal distribution and that a nonparametric test should be used. While parametric statistical tests assume that the data collected have a normal, continuous, bell-shaped (Gaussian) distribution, nonparametric methods are free of this assumption and work well for small sample sizes and data with skewed distributions. The sample size of 30 was selected because it is used in Basic & Clinical Biostatistics by Dawson-Saunders and Trapp10 as an arbitrary cutoff for differentiating between data sets with a normal or nonnormal distribution. Exceptions to this were the rare occasions when the authors stated that they performed a visual or statistical test to ascertain the distribution of the data and then used the appropriate test based on these results. It is also possible that sample sizes larger than 30 may not have a normal distribution and that a nonparametric test should be used with these data. Given the lack of a clearly defined cutoff value and of the data necessary to determine if the correct test was chosen, we considered these to be errors in the use of a statistical test, although arguably, they may be questionable. In addition to using the appropriate test for the data distribution, we also evaluated the correct use of unpaired and paired t tests, the use of analysis of variance (ANOVA) for multiple comparisons, the use of the Fisher exact test for small sample sizes, and the pooling of variance.
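The screening rule described above can be sketched in a few lines of Python. This is an illustration only, not the procedure the reviewers actually coded: the function name and arguments are hypothetical, and reading "smaller than 30 in each group" as "every group has fewer than 30 observations" is an assumption.

```python
SMALL_SAMPLE_CUTOFF = 30  # arbitrary cutoff cited from Dawson-Saunders and Trapp


def flag_questionable(group_sizes, parametric, normality_checked):
    """Return True when the review's rule would flag the choice of test.

    group_sizes       -- number of observations in each compared group
    parametric        -- True if a parametric test (eg, t test) was used
    normality_checked -- True if the authors report a visual or statistical
                         check of the data distribution
    Assumption: "in each group" is read as "all groups are below the cutoff".
    """
    all_small = all(n < SMALL_SAMPLE_CUTOFF for n in group_sizes)
    return parametric and all_small and not normality_checked


# Hypothetical examples
print(flag_questionable([12, 15], parametric=True, normality_checked=False))  # True
print(flag_questionable([12, 15], parametric=True, normality_checked=True))   # False
print(flag_questionable([40, 45], parametric=True, normality_checked=False))  # False
```

The rule is deliberately conservative: it cannot tell whether the data were in fact normally distributed, only whether the article gave the reader a reason to believe so.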

ERRORS IN THE PRESENTATION OF THE RESULTS

Minor errors in the presentation of the findings included failure to report the type of statistical test used in the article and whether it was 1-sided or 2-sided. Another error was presenting the results in the format of “a ± b” without reporting if b represents the standard deviation or the standard error of the mean. Although not considered an error in our analysis, we checked for the inclusion of details about the statistical analysis package and the power of the study.
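To see why an unlabeled "a ± b" is ambiguous, the following Python sketch computes both candidates for b from the same data. The measurements are made up for illustration and the helper name is hypothetical.

```python
import math
import statistics


def sd_and_sem(values):
    """Return the sample standard deviation and the standard error of the mean."""
    sd = statistics.stdev(values)        # sample SD, n - 1 in the denominator
    sem = sd / math.sqrt(len(values))    # SEM shrinks as the sample grows
    return sd, sem


values = [4.0, 6.0, 8.0, 10.0, 12.0]     # made-up measurements
mean = statistics.mean(values)
sd, sem = sd_and_sem(values)

print(f"{mean:.1f} ± {sd:.2f} (SD)")     # 8.0 ± 3.16 (SD)
print(f"{mean:.1f} ± {sem:.2f} (SEM)")   # 8.0 ± 1.41 (SEM)
```

The two summaries differ by a factor of the square root of the sample size, so a reader who guesses wrong misjudges the spread of the data considerably.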

During this 1-year period, 155 (42.6%) of 364 articles published in these 2 journals contained statistical analysis. Most articles that did not include statistical analysis were descriptive studies in which statistics would not have contributed any additional information to the article. Thirty-three (21.3%) of the articles used parametric methods only, 49 (31.6%) used nonparametric methods only, 45 (29.0%) used a combination of these methods, and 28 (18.1%) used other methods. The most frequently used tests were the χ² test (29.7%), unpaired t test (18.7%), ANOVA (16.8%), Fisher exact test (14.8%), and paired t test (10.3%) (Table).

Table. Statistical Tests Used in Articles With Statistical Analysis

Of those studies that included statistical analysis, 59 (38.1%) of 155 contained errors or omissions in statistical methods or the presentation of the results. Twenty-two articles (14.2%) contained significant errors in the use of a statistical test that could potentially change the validity of the study results, 41 (26.5%) contained errors in the presentation of the results, and 4 (2.6%) contained errors in both.

Thirty-eight of the 59 articles with errors appeared in the Journal of the American Academy of Dermatology: 14 had errors in the use of a statistical test, 26 had errors in the presentation of the results, and 2 had errors in both. The remaining 21 appeared in the Archives of Dermatology: 8 had errors in the use of a statistical test, 15 had errors in the presentation of the results, and 2 had errors in both.
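The overall and per-journal totals follow from the category counts by inclusion-exclusion, assuming that articles with both kinds of error are counted once in each category. A short Python check under that assumption (the function and variable names are illustrative):

```python
def articles_with_any_error(test_errors, presentation_errors, both):
    """Inclusion-exclusion: articles in either category, counting overlaps once."""
    return test_errors + presentation_errors - both


TOTAL_WITH_STATS = 155  # articles containing statistical analysis

overall = articles_with_any_error(22, 41, 4)
jaad = articles_with_any_error(14, 26, 2)      # J Am Acad Dermatol
archives = articles_with_any_error(8, 15, 2)   # Arch Dermatol
share = round(100 * overall / TOTAL_WITH_STATS, 1)

print(overall, jaad, archives, share)  # 59 38 21 38.1
```

The per-journal totals sum to the overall total (38 + 21 = 59), which is consistent with the reported 38.1% of 155 articles.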

Errors in the statistical test chosen included 3 articles (1.9%) not using Fisher exact test when analyzing a 2 × 2 contingency table, as should have been performed when the expected cell count for at least 1 of the cells was fewer than 5. Other errors included using an unpaired t test with paired data (3 articles [1.9%]), using t test or z test to compare multiple samples when a test such as ANOVA should have been used (2 articles [1.3%]), and comparing multiple studies without using the correct methods for pooling variance (1 article [0.6%]). The questionable error of using a parametric test (often t test) on sample sizes smaller than 30 without indicating the use of a test for normality occurred in 16 articles (10.3%).

In the presentation of statistical results, failure to state if a test was 1-sided or 2-sided was the most common omission, occurring in 32 articles (20.6%). Eight articles (5.2%) provided statistical results and P values without disclosing the statistical test used. Two articles (1.3%) did not state if they were reporting standard deviations or standard errors of the mean. Although not considered an error in our analysis, 92 articles (59.4%) did not report the statistical package used for the analysis, and only 16 (10.3%) of the articles included any information about the power of the study.

We evaluated industry sponsorship of studies to see if this affected the rates of errors. Of those studies with errors in the use of a statistical test, 4 (18.2%) of 22 were sponsored by industries, as were 10 (24.4%) of 41 studies with errors in the presentation of the results.

In this review, 59 (38.1%) of 155 studies using statistical tests contained errors in statistical methods or in the presentation of the results. Most of these errors were minor omissions in the presentation of the results, but 22 studies (14.2%) used an incorrect statistical test. Without the original data, it is impossible to know if these errors invalidate the results of studies, but such errors should prompt the reader to question the results.

Most articles that were published in these 2 journals used nonparametric statistics, often in conjunction with parametric methods. The error rate of 38.1% is consistent with error rates in published studies1-7 from other medical disciplines and in the study by Flotte et al8 in the dermatopathology literature.

Three studies used unpaired t tests with paired data. Typically, this results in a falsely elevated P value and can lead to failure in detecting a significant difference when one exists.11 Two studies did not use a test for multiple comparisons, which can result in finding a spurious difference between 2 groups.
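The effect of ignoring the pairing can be seen by computing both t statistics on the same data. The sketch below uses made-up before-and-after measurements and hypothetical function names; the unpaired version assumes equal variances (pooled Student t test). It reports t statistics only, because the Python standard library does not provide the t distribution needed for P values.

```python
import math
import statistics


def unpaired_t(x, y):
    """Student t statistic for 2 independent samples (pooled variance)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * statistics.variance(x)
              + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.mean(x) - statistics.mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))


def paired_t(x, y):
    """t statistic for paired samples, computed on within-pair differences."""
    d = [a - b for a, b in zip(x, y)]
    return statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))


# Made-up scores for 5 patients before and after treatment
before = [10.0, 20.0, 30.0, 40.0, 50.0]
after = [9.0, 18.0, 29.0, 38.0, 48.0]

print(round(unpaired_t(before, after), 2))  # small: between-patient spread dominates
print(round(paired_t(before, after), 2))    # much larger: each patient is their own control
```

Because the patients differ widely from one another but change consistently, the unpaired statistic is swamped by between-patient variability, which is how a real difference can be missed.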

Only 10.3% of the studies included information on the power of the study, usually to determine the sample size necessary to detect a statistical difference before study initiation. Studies with inadequate sample sizes have an increased risk of type II error (failing to find a difference when one actually exists).12 Although not necessary in all studies, power should be reported in studies reaching negative conclusions because the inadequate sample size may have resulted in the lack of significance.9,13 A less significant omission occurred in the failure to report the statistical package used for analysis, although we did not consider this an error because some authors only include the package details if relevant.
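As a rough illustration of an a priori power calculation, the following Python sketch estimates the sample size per group needed to compare 2 means. It relies on the normal approximation with equal group sizes, a common standard deviation, and a 2-sided test, which are all assumptions; the function name and defaults are hypothetical, and exact methods based on the t distribution give slightly larger numbers.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate sample size per group to detect a mean difference `delta`.

    delta -- smallest difference between group means worth detecting
    sigma -- assumed common standard deviation
    alpha -- 2-sided type I error rate
    power -- 1 minus the type II error rate
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


print(n_per_group(delta=1.0, sigma=1.0))  # 16
print(n_per_group(delta=0.5, sigma=1.0))  # 63
```

Halving the difference to be detected roughly quadruples the required sample size, which is why small negative trials are so often inconclusive.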

This study is limited by the inconsistent definition of what constitutes a statistical error. We chose to include those errors that were analyzed in reviews from other medical disciplines,1-7 but some of these errors can be considered questionable. One example of this was considering the use of a parametric test (usually the t test) with small sample sizes to be an error unless the authors stated that they performed a test for normality. The t test is robust, and the results likely would not be changed by minor deviations from normality. Therefore, it is difficult to know without the raw data if the use of this test with small sample sizes affected the study results. In addition, some journals may only report normality testing if the data required transformation as a result. In studies with sample sizes smaller than 30 in each group, we recommend using a nonparametric test or reporting the performance of normality testing, through visual inspection of the plotted data or by means of a statistical test.
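One nonparametric alternative to the unpaired t test is the Wilcoxon rank sum (Mann-Whitney) test. The sketch below computes an exact 2-sided P value by enumerating every possible assignment of ranks to the first group. It is an illustration with hypothetical function names and made-up data, and it is practical only for very small groups (roughly 10 or fewer per group) because the number of assignments grows rapidly.

```python
from itertools import combinations


def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def rank_sum_exact_p(x, y):
    """Exact 2-sided P value for the rank sum of x under random group labels."""
    ranks = midranks(list(x) + list(y))
    n1 = len(x)
    observed = sum(ranks[:n1])
    expected = n1 * (len(ranks) + 1) / 2   # mean rank sum under the null
    deviation = abs(observed - expected)
    sums = [sum(c) for c in combinations(ranks, n1)]
    extreme = sum(1 for s in sums if abs(s - expected) >= deviation - 1e-9)
    return extreme / len(sums)


# Made-up data: complete separation of 2 groups of 3
print(rank_sum_exact_p([1, 2, 3], [4, 5, 6]))  # 0.1
```

Even with complete separation, 3 observations per group cannot reach P < .05 with this test, which shows how little information very small samples carry.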

To correct these errors that occur within peer-reviewed journals, it has been suggested that a statistician review all articles before submission or that a statistician be included as a reviewer.14-17 Although this adds a burden of time and expense to the publication process, statistical reviewing has been shown to decrease the number of statistical errors in medical publications.7,17-22 This onus is worthwhile to ensure the validity of the study results.

Most statistical analyses are based on a limited battery of tests taught in an introductory statistics course, but many dermatologists may not be familiar with the correct statistical test that should be used with their data. As a result, unless authors are familiar with the statistical test that they are performing, they should consult a statistician before submission of their study results. In our analysis, we found a lower rate of errors in industry-sponsored studies, which typically include statisticians in the data analysis. In addition to these measures, a statistical checklist should be referenced before submission of any journal article that includes statistical analysis.18,23

It may also be beneficial to incorporate training in statistics into dermatology residency programs or as a continuing medical education program. These programs would be offered to increase awareness about the importance of critically analyzing journal articles and recognizing common statistical errors when interpreting the results.

In summary, the appropriate use of statistical methods is an integral part of all studies performed and published. Errors in statistics frequently occur in the dermatology literature, as in many other disciplines of medicine, and readers should be critical of statistical methods and conclusions drawn from studies with incorrect or incomplete statistical analysis.

Correspondence: Alan B. Fleischer, Jr, MD, Department of Dermatology, Wake Forest University School of Medicine, Medical Center Boulevard, Winston-Salem, NC 27157-1071 (afleisch@wfubmc.edu).

Financial Disclosure: None.

Previous Presentation: This study was presented as a poster at the 63rd Annual Meeting of the American Academy of Dermatology; February 18-22, 2005; New Orleans, La.

Accepted for Publication: July 25, 2005.

Author Contributions: Study concept and design: Neville and Fleischer. Analysis and interpretation of data: Neville, Lang, and Fleischer. Drafting of the manuscript: Neville. Critical revision of the manuscript for important intellectual content: Lang and Fleischer. Statistical analysis: Neville, Lang, and Fleischer. Study supervision: Fleischer.

References

1. White SJ. Statistical errors in papers in the British Journal of Psychiatry. Br J Psychiatry. 1979;135:336-342.
2. Cruess DF. Review of use of statistics in the American Journal of Tropical Medicine and Hygiene for January-December 1988. Am J Trop Med Hyg. 1989;41:619-626.
3. Gore SM, Jones IG, Rytter EC. Misuse of statistical methods: critical assessment of articles in BMJ from January to March 1976. BMJ. 1977;1:85-87.
4. Felson DT, Cupples LA, Meenan RF. Misuse of statistical methods in Arthritis and Rheumatism: 1982 versus 1967-68. Arthritis Rheum. 1984;27:1018-1022.
5. MacArthur RD, Jackson GG. An evaluation of the use of statistical methodology in the Journal of Infectious Diseases. J Infect Dis. 1984;149:349-354.
6. Hall JC. Use of the t test in the British Journal of Surgery [letter]. Br J Surg. 1982;69:55-56.
7. Schor S, Karten I. Statistical evaluation of medical journal manuscripts. JAMA. 1966;195:1123-1128.
8. Flotte TJ, Duncan LM, Lerner LH, Mihm MC. Tools of the trade: statistical analysis in dermatopathology articles. J Cutan Pathol. 1999;26:265-268.
9. Williams HC, Seed P. Inadequate size of “negative” clinical trials in dermatology [published correction appears in Br J Dermatol. 1997;136:151]. Br J Dermatol. 1993;128:317-326.
10. Dawson-Saunders B, Trapp RG. Basic & Clinical Biostatistics. East Norwalk, Conn: Appleton & Lange; 1994.
11. Glantz SA. Primer of Biostatistics. 5th ed. New York, NY: McGraw-Hill Co; 2002.
12. Ferraris VA, Ferraris SP. Assessing the medical literature: let the buyer beware. Ann Thorac Surg. 2003;76:4-11.
13. Bhardwaj SS, Camacho F, Derrow A, et al. Statistical significance and clinical relevance. Arch Dermatol. 2004;140:1520-1523.
14. Rushton L. Reporting of occupation and environmental research: use and misuse of statistical and epidemiological methods. Occup Environ Med. 2000;57:1-9.
15. McGuigan SM. The use of statistics in the British Journal of Psychiatry. Br J Psychiatry. 1995;167:683-688.
16. Weinstock MA. Statistics. J Am Acad Dermatol. 2004;51:315-316.
17. Katz KA, Crawford GH, Lu DW, et al. Statistical reviewing policies in dermatology journals: results of a questionnaire survey of editors. J Am Acad Dermatol. 2004;51:234-240.
18. Gardner MJ, Machin D, Campbell MJ. Use of check lists in assessing the statistical content of medical studies. Br Med J (Clin Res Ed). 1986;292:810-812.
19. Gore SM, Jones G, Thompson SG. The Lancet's statistical review process: areas for improvement by authors. Lancet. 1992;340:100-102.
20. Gardner MJ, Altman DG, Jones DR, Machin D. Is the statistical assessment of papers submitted to the “British Medical Journal” effective? BMJ (Clin Res Ed). 1983;286:1485-1488.
21. Gardner MJ, Bond J. An exploratory study of statistical assessment of papers published in the British Medical Journal. JAMA. 1990;263:1355-1357.
22. Altman DG, Gore SM, Gardner MJ, Pocock SJ. Statistical guidelines for contributors to medical journals. BMJ. 1983;286:1489-1493.
23. Katz KA. The (relative) risks of using odds ratios. Arch Dermatol. 2006;142:761-764.


