Null Hypothesis Significance Testing: procedure, misconceptions and some suggestions for good practices


Published: Oct 15, 2020
Keywords: Null hypothesis significance testing; Statistical reasoning
Πέτρος Ρούσσος
Abstract

The rationale of Null Hypothesis Significance Testing (NHST) is described, and the consequences of its hybrid character are discussed. The paper presents examples of how articles published in “PSYCHOLOGY: The Journal of the HPS” refer to NHST and interpret its outcomes. We examined the 445 articles published between 1992 and 2010, noted misuses of NHST, and checked whether confidence intervals or error bars were reported and whether they were used to support interpretation. Part of the paper focuses on the statistical-reform debate and provides detailed guidance on good statistical practice in analysing research data and interpreting findings. The proposed guide does not fall into the trap of mandating particular procedures; rather, it aims to support readers’ understanding of research results.

Article Details
  • Section: BRIEF RESEARCH REPORTS