======================================================================
=                     Foundations of statistics                      =
======================================================================

                             Introduction                             
======================================================================
The Foundations of Statistics are the mathematical and philosophical
bases for statistical methods. These bases are the theoretical
frameworks that ground and justify methods of statistical inference,
estimation, hypothesis testing, uncertainty quantification, and the
interpretation of statistical conclusions. Further, a foundation can
be used to explain statistical paradoxes, provide descriptions of
statistical laws, and guide the application of statistics to
real-world problems.

Different statistical foundations may provide different, contrasting
perspectives on the analysis and interpretation of data, and some of
these contrasts have been subject to centuries of debate. Examples
include Bayesian inference versus frequentist inference; the
distinction between Fisher's 'significance testing' and the
Neyman-Pearson 'hypothesis testing'; and whether the likelihood
principle holds.

Certain frameworks may be preferred for specific applications, such as
the use of Bayesian methods in fitting complex ecological models.

Bandyopadhyay & Forster identify four statistical paradigms:
classical statistics (error statistics), Bayesian statistics,
likelihood-based statistics, and information-based statistics using
the Akaike Information Criterion. More recently, Judea Pearl
reintroduced a formal mathematics for attributing causality in
statistical systems, addressing fundamental limitations of both
Bayesian and Neyman-Pearson methods, as discussed in his book
'Causality'.


Fisher's "significance testing" vs. Neyman–Pearson "hypothesis testing"
======================================================================
During the 20th century, the development of classical statistics led
to the emergence of two competing foundations for inductive
statistical testing. The merits of these models were extensively
debated. While a hybrid of the two methods is widely taught and used,
the philosophical questions raised in the debate have not been
resolved.


 Significance testing 
======================
Fisher popularized significance testing with Statistical Methods for
Research Workers, published in 1925, and The Design of Experiments,
published in 1935. Fisher was motivated to obtain scientific
experimental results without the explicit influence of prior opinion.
The significance test is a probabilistic version of modus tollens, a
classic form of deductive inference. The significance test might be
simplistically stated, "If the evidence is sufficiently discordant
with the hypothesis, reject the hypothesis". In application, a
statistic is calculated from the experimental data, and the
probability of exceeding that statistic under a default or 'null'
model is compared to a threshold. The threshold (the numeric version
of "sufficiently discordant") is arbitrary (usually decided by
convention). A common application of the method is deciding whether a
treatment has a reportable effect based on a comparative experiment.
The null hypothesis then corresponds to the model with no treatment
effect, implying the treated and the controls come from the same
population. Statistical significance is a measure of probability, not
practical importance. It can be regarded as a requirement placed on
statistical signal/noise. Note that the test cannot prove the
hypothesis (of no treatment effect), but only provide more or less
evidence against it. The method is based on the formulation of an
imaginary infinite population (i.e. a specified statistical model)
corresponding to the null hypothesis.

The Fisherian significance test involves only one hypothesis, but the
choice of test statistic requires at least a feeling for relevant
directions of deviation from the hypothesis model. Historically, the
result of the test was either to reject the hypothesis or not.
Nowadays the probability of observing a result at least as extreme as
the one obtained, assuming the hypothesis is true, can be calculated
with the aid of computers, which was impractical in Fisher's day. This
probability, known as the p-value, allows for a more precise
assessment of the significance of the result.
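
The procedure described in this section can be sketched in code. The
following is a minimal illustration under stated assumptions, not
Fisher's own computation: it uses an exact permutation test with the
difference of group means as the test statistic. The measurements and
the function name `permutation_p_value` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def permutation_p_value(treated, control):
    """One-sided exact permutation p-value for mean(treated) - mean(control).

    Under the null model (no treatment effect) the group labels are
    exchangeable, so every relabeling of the pooled data is equally likely.
    Inputs are assumed to be integers so that the arithmetic stays exact.
    """
    pooled = list(treated) + list(control)
    k = len(treated)
    total = sum(pooled)

    def mean_diff(treated_sum):
        return Fraction(treated_sum, k) - Fraction(total - treated_sum, len(pooled) - k)

    observed = mean_diff(sum(treated))
    relabelings = 0
    at_least_as_extreme = 0
    for idx in combinations(range(len(pooled)), k):
        relabelings += 1
        if mean_diff(sum(pooled[i] for i in idx)) >= observed:
            at_least_as_extreme += 1
    return Fraction(at_least_as_extreme, relabelings)


# Hypothetical measurements from a small comparative experiment.
treated = [12, 14, 15, 17]
control = [9, 10, 11, 13]

p = permutation_p_value(treated, control)
threshold = Fraction(1, 20)  # the conventional (arbitrary) 0.05 threshold
print(float(p), p <= threshold)
```

With these made-up numbers, 2 of the 70 possible relabelings are at
least as extreme as the observed one, so the p-value is 2/70 (about
0.029). The comparison with the threshold says nothing about whether
the difference is practically important.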


 Hypothesis testing 
====================
Neyman & Pearson collaborated on a different, but related, problem:
selecting among competing hypotheses based on the experimental
evidence alone. Of their joint papers, the most cited is from 1933.
The famous result of that paper is the Neyman-Pearson lemma. The lemma
says that a ratio of probabilities is an excellent criterion for
selecting a hypothesis (with the threshold for comparison being
arbitrary). The paper proved an optimality of Student's t-test (one of
the significance tests). Neyman expressed the opinion that hypothesis
testing was a generalization of and an improvement on significance
testing. The rationale for their methods is found in their joint
papers.
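
The ratio criterion can be illustrated with two simple hypotheses. The
sketch below is an assumption-laden example rather than the
construction in the 1933 paper: it assumes normally distributed data
with a known standard deviation, two candidate means, and a threshold
chosen arbitrarily. The function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def log_likelihood(data, mean, sigma):
    """Log-likelihood of the data under a normal model with known sigma."""
    dist = NormalDist(mean, sigma)
    return sum(math.log(dist.pdf(x)) for x in data)


def likelihood_ratio_choice(data, mean0, mean1, sigma, log_threshold=0.0):
    """Select H1 when log(L1 / L0) exceeds log_threshold, otherwise H0."""
    log_ratio = log_likelihood(data, mean1, sigma) - log_likelihood(data, mean0, sigma)
    return ("H1" if log_ratio > log_threshold else "H0"), log_ratio


# Hypothetical observations; H0: mean = 0, H1: mean = 1, sigma = 1.
data = [0.9, 1.4, 0.7, 1.1, 1.3]
choice, log_ratio = likelihood_ratio_choice(data, mean0=0.0, mean1=1.0, sigma=1.0)
print(choice, round(log_ratio, 3))
```

In practice the threshold is set so that the probability of wrongly
selecting H1 when H0 is true does not exceed a chosen level; the value
0.0 used here is only a placeholder.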

Hypothesis testing requires multiple hypotheses. One hypothesis is
always selected, as in a multiple-choice question. A lack of evidence
is not an
immediate consideration. The method is based on the assumption of
repeated sampling of the same population (the classical frequentist
assumption), although Fisher criticized this assumption (Rubin, 2020).


 Grounds of disagreement 
=========================
The length of the dispute allowed the debate of a wide range of issues
regarded as foundational to statistics.


 Fisher's attack (Fisher, 1955) 
================================
Repeated sampling of the same population
* Such sampling is the basis of frequentist probability
* Fisher preferred fiducial inference
Type II errors
* Which result from an alternative hypothesis
Inductive behavior
* (Vs inductive reasoning)


 Neyman's rebuttal (Neyman, 1956) 
==================================
Fisher's theory of fiducial inference is flawed
* Paradoxes are common

A purely probabilistic theory of tests requires an alternative
hypothesis. Fisher's attacks on Type II errors have faded with time.
In the intervening years, statistics has separated the exploratory
from the confirmatory. In the current environment, the concept of Type
II errors is used in power calculations that determine the sample size
of confirmatory hypothesis tests.
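
A power calculation of this kind can be sketched as follows. This is a
minimal example, assuming a one-sided z-test with known standard
deviation; the effect size, significance level, and target power are
hypothetical inputs, and the function names are illustrative.

```python
import math
from statistics import NormalDist


def power_one_sided_z(n, delta, sigma, alpha=0.05):
    """Power (1 - Type II error rate) of a one-sided z-test.

    delta is the true shift in the mean under the alternative hypothesis.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    return 1 - z.cdf(z_alpha - delta * math.sqrt(n) / sigma)


def sample_size_one_sided_z(delta, sigma, alpha=0.05, power=0.80):
    """Smallest n for which the one-sided z-test reaches the target power."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / delta) ** 2)


# Hypothetical design: detect a shift of half a standard deviation.
n = sample_size_one_sided_z(delta=0.5, sigma=1.0)
print(n, round(power_one_sided_z(n, delta=0.5, sigma=1.0), 3))
```

The calculation is only possible because an alternative hypothesis
(here, a specific shift `delta`) has been stated; without one, the
Type II error rate is undefined.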


 Discussion 
============
Fisher's attack based on frequentist probability failed but was not
without result. He identified a specific case (2×2 table) where the
two schools of testing reached different results. This case is one of
several that are still troubling. Commentators believe that the
"right" answer is context-dependent. Fiducial probability has not
fared well, being virtually without advocates, while frequentist
probability remains a mainstream interpretation.

Fisher's attack on inductive behavior has been largely successful
because he selected the field of battle. While 'operational decisions'
are routinely made on a variety of criteria (such as cost),
'scientific conclusions' from experimentation are typically made based
on probability alone.

In this exchange, Fisher also discussed the requirements for inductive
inference, with specific criticism of cost functions penalizing faulty
judgments. Neyman countered that Gauss and Laplace used them. This
exchange of arguments occurred 15 years 'after' textbooks began
teaching a hybrid theory of statistical testing.

Fisher and Neyman were in disagreement about the foundations of
statistics (although united in vehement opposition to the Bayesian
view):
* The interpretation of probability
** The disagreement between Fisher's inductive reasoning and Neyman's
inductive behavior contained elements of the Bayesian-Frequentist
divide. Fisher was willing to revise his opinion (reaching a
provisional conclusion) based on calculated probability, while Neyman
was more inclined to adjust his observable behavior (making a
decision) based on computed costs.
* The appropriate formulation of scientific questions, with a
particular focus on modelling
* Whether it is justifiable to reject a hypothesis based on a low
probability without knowing the probability of an alternative
* Whether a hypothesis could ever be accepted based solely on data
** In mathematics, deduction proves, while counter-examples disprove.
** In the Popperian philosophy of science, progress is made when
theories are disproven.
* Subjectivity: While Fisher and Neyman struggled to minimize
subjectivity, both acknowledged the importance of "good judgment".
Each accused the other of subjectivity.
** Fisher 'subjectively' selected the null hypothesis.
** Neyman-Pearson 'subjectively' determined the criterion for
selection (which was not limited to probability).
** Both 'subjectively' established numeric thresholds.

Fisher and Neyman were separated by attitudes and perhaps language.
Fisher was a scientist and an intuitive mathematician. Inductive
reasoning was natural. Neyman was a rigorous mathematician. He was
convinced by deductive reasoning rather than by a probability
calculation based on an experiment. Thus there was an underlying clash
between applied and theoretical (between science and mathematics).


 Related history 
=================
Neyman, who had occupied the same building in England as Fisher,
accepted a position on the West Coast of the United States of America
in 1938. His move effectively ended his collaboration with Pearson and
their development of hypothesis testing. Further development was
continued by others.

Textbooks provided a hybrid version of significance and hypothesis
testing by 1940. None of the principals had any known personal
involvement in the further development of the hybrid taught in
introductory statistics today.

Statistics later developed in different directions including decision
theory (and possibly game theory), Bayesian statistics, exploratory
data analysis, robust statistics, and nonparametric statistics.
Neyman-Pearson hypothesis testing contributed strongly to decision
theory which is very heavily used (in statistical quality control for
example). Hypothesis testing readily generalized to accept prior
probabilities which gave it a Bayesian flavor.

Neyman-Pearson hypothesis testing has become an abstract mathematical
subject taught in postgraduate statistics, while most of what is
taught to undergraduates and used under the banner of hypothesis
testing is from Fisher.


 Contemporary opinion 
======================
The hybrid of the two competing schools of testing can be viewed
differently: as the imperfect union of two mathematically
complementary ideas or as the fundamentally flawed union of
philosophically incompatible ideas. Fisher enjoyed some philosophical
advantage, while Neyman & Pearson employed the more rigorous
mathematics. Hypothesis testing is controversial among some users, but
the most popular alternative (confidence intervals) is based on the
same mathematics.

The history of the development left testing without a single citable
authoritative source for the hybrid theory that reflects common
statistical practice. The merged terminology is also somewhat
inconsistent. There is strong empirical evidence that the graduates
(and instructors) of an introductory statistics class have a weak
understanding of the meaning of hypothesis testing.


 Summary 
=========
* The interpretation of probability has not been resolved (but the
fiducial probability is an orphan).
* Neither test method has been rejected. Both are heavily used for
different purposes.
* Texts have merged the two test methods under the term hypothesis
testing.
** Mathematicians claim (with some exceptions) that significance tests
are a special case of hypothesis tests.
** Others treat the problems and methods as distinct (or
incompatible).
* The dispute has adversely affected statistical education.

* Bayesian theory has a mathematical advantage.
** Frequentist probability has existence and consistency problems.
** But finding good priors to apply Bayesian theory remains (very?)
difficult.
* Both theories have impressive records of successful application.
* Neither the philosophical interpretation of probability nor its
support is robust.
* There is increasing scepticism about the connection between
application and philosophy.
* Some statisticians are recommending active collaboration (beyond a
cease-fire).


           Bayesian inference versus frequentist inference            
======================================================================
Two different interpretations of probability (based on objective
evidence and subjective degrees of belief) have long existed. Gauss
and Laplace could have debated alternatives more than 200 years ago.
Two competing schools of statistics have developed as a consequence.
Classical inferential statistics was largely developed in the second
quarter of the 20th century, much of it in reaction to the (Bayesian)
probability of the time which utilized the controversial principle of
indifference to establish prior probabilities. The rehabilitation of
Bayesian inference was a reaction to the limitations of frequentist
probability. More reactions followed. While the philosophical
interpretations are old, the statistical terminology is not. The
current statistical terms "Bayesian" and "frequentist" stabilized in
the second half of the 20th century.
The (philosophical, mathematical, scientific, statistical) terminology
is confusing: the "classical" interpretation of probability is
Bayesian while "classical" statistics is frequentist. "Frequentist"
also has varying interpretations, different in philosophy than in
physics.

The nuances of philosophical probability interpretations are discussed
elsewhere. In statistics, the alternative interpretations 'enable' the
analysis of different data using different methods based on different
models to achieve slightly different goals. Any statistical comparison
of the competing schools considers pragmatic criteria beyond the
philosophical.


 Major contributors 
====================
Two major contributors to frequentist (classical) methods were Fisher
and Neyman. Fisher's interpretation of probability was idiosyncratic
(but strongly non-Bayesian). Neyman's views were rigorously
frequentist. Three major contributors to 20th century Bayesian
statistical philosophy, mathematics, and methods were de Finetti,
Jeffreys and Savage. Savage popularized de Finetti's ideas in the
English-speaking world and made Bayesian mathematics rigorous. In
1965, Dennis Lindley's two-volume work "Introduction to Probability
and Statistics from a Bayesian Viewpoint" brought Bayesian methods to
a wide audience. Statistics has advanced over the past three
generations; the "authoritative" views of the early contributors are
not all current.


 Frequentist inference 
=======================
Frequentist inference is partially and tersely described above in the
section on Fisher's "significance testing" vs. Neyman-Pearson
"hypothesis testing". Frequentist inference combines several different
views. The
result is capable of supporting scientific conclusions, making
operational decisions, and estimating parameters with or without
confidence intervals. Frequentist inference is based solely on (one
set of) evidence.


 Bayesian inference 
====================
A classical frequency distribution describes the probability of the
data. The use of Bayes' theorem allows a more abstract concept - the
probability of a hypothesis (corresponding to a theory) given the
data. The concept was once known as "inverse probability". Bayesian
inference updates the probability estimate for a hypothesis as
additional evidence is acquired. Bayesian inference is explicitly
based on the evidence and prior opinion, which allows it to be based
on multiple sets of evidence.
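
The updating step can be illustrated with a conjugate model. The sketch
below assumes a Beta prior on a success probability and binomial
evidence, a standard textbook case chosen here for convenience; the
counts, the uniform prior, and the function names are hypothetical.

```python
from fractions import Fraction


def update_beta(prior, successes, failures):
    """Posterior Beta(a, b) parameters after observing binomial evidence."""
    a, b = prior
    return (a + successes, b + failures)


def beta_mean(params):
    """Mean of a Beta(a, b) distribution, a / (a + b)."""
    a, b = params
    return Fraction(a, a + b)


prior = (1, 1)  # assumed uniform prior opinion about the success probability

# Two separate sets of evidence, incorporated one after the other.
after_first = update_beta(prior, successes=7, failures=3)
after_second = update_beta(after_first, successes=12, failures=8)

# The same evidence incorporated in a single step.
all_at_once = update_beta(prior, successes=19, failures=11)

print(after_second, all_at_once, beta_mean(after_second))
```

The posterior after the first set of evidence serves as the prior for
the second, and the result matches a single update on the combined
data.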


 Comparisons of characteristics 
================================
Frequentists and Bayesians use different models of probability.
Frequentists often consider parameters to be fixed but unknown while
Bayesians assign probability distributions to similar parameters.
Consequently, Bayesians speak of probabilities that don't exist for
frequentists; a Bayesian speaks of the probability of a theory while a
true frequentist can speak only of the consistency of the evidence
with the theory. Example: A frequentist does not say that there is a
95% probability that the true value of a parameter lies within a
confidence interval, saying instead that 95% of confidence intervals
contain the true value.
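
The frequentist reading of a confidence interval can be checked by
simulation. The following sketch assumes normally distributed data
with a known standard deviation and a fixed "true" mean, which is
known only because the data are simulated; all numeric settings and
the function name are hypothetical.

```python
import random
from statistics import NormalDist


def coverage(true_mean=10.0, sigma=2.0, n=25, trials=2000, level=0.95, seed=0):
    """Fraction of simulated confidence intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = sum(sample) / n
        # The interval varies from sample to sample; the parameter does not.
        if sample_mean - half_width <= true_mean <= sample_mean + half_width:
            hits += 1
    return hits / trials


print(coverage())
```

The printed fraction should be close to 0.95. The statement concerns
the long-run behavior of the interval-generating method, not the
probability that any one computed interval contains the parameter.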

Efron's comparative adjectives

                          Bayesian                     Frequentist
Basis                     Belief (prior)               Behavior (method)
Resulting Characteristic  Principled Philosophy        Opportunistic Methods
Distributions             One distribution             Many distributions (bootstrap?)
Ideal Application         Dynamic (repeated sampling)  Static (one sample)
Target Audience           Individual (subjective)      Community (objective)
Modeling Characteristic   Aggressive                   Defensive

Alternative comparison

Strengths (Bayesian):
* Complete
* Coherent
* Prescriptive
* Strong inference from model

Strengths (Frequentist):
* Inferences well calibrated
* No need to specify prior distributions
* Flexible range of procedures
* Strong model formulation & assessment
** Unbiasedness, sufficiency, ancillarity...
** Widely applicable and dependable
** Asymptotic theory
** Easy to interpret
** Can be calculated by hand

Weaknesses (Bayesian):
* Too subjective for scientific inference
* Denies the role of randomization in design
* Requires and relies on full specification of a model (likelihood and
prior)
* Weak model formulation & assessment

Weaknesses (Frequentist):
* Incomplete
* Ambiguous
* Incoherent
* Not prescriptive
* No unified theory
* Potential overemphasis on asymptotic properties
* Weak inference from model


 Mathematical results 
======================
Neither school is immune from mathematical criticism and neither
accepts it without a struggle. Stein's paradox (for example)
illustrated that finding a "flat" or "uninformative" prior probability
distribution in high dimensions is subtle. Bayesians regard that as
peripheral to the core of their philosophy while finding frequentism
to be riddled with inconsistencies, paradoxes, and bad mathematical
behavior. Frequentists can explain most. Some of the "bad" examples
are extreme situations - such as estimating the weight of a herd of
elephants from measuring the weight of one ("Basu's elephants"), which
allows no statistical estimate of the variability of weights. The
likelihood principle has been a battleground.


 Statistical results 
=====================
Both schools have achieved impressive results in solving real-world
problems. Classical statistics effectively has a longer record because
numerous results were obtained with mechanical calculators and printed
tables of special statistical functions. Bayesian methods have been
highly successful in the analysis of information that is naturally
sequentially sampled (radar and sonar). Many Bayesian methods and some
recent frequentist methods (such as the bootstrap) require the
computational power widely available only in the last several decades.
There is active discussion about combining Bayesian and frequentist
methods, but reservations are expressed about the meaning of the
results and reducing the diversity of approaches.


 Philosophical results 
=======================
Bayesians are united in opposition to the limitations of frequentism
but are philosophically divided into numerous camps (empirical,
hierarchical, objective, personal, subjective), each with a different
emphasis. One (frequentist) philosopher of statistics has noted a
retreat from the statistical field to philosophical probability
interpretations over the last two generations. There is a perception
that successes in Bayesian applications do not justify the supporting
philosophy. Bayesian methods often create useful models that are not
used for traditional inference and which owe little to philosophy.
None of the philosophical interpretations of probability (frequentist
or Bayesian) appears robust. The frequentist view is too rigid and
limiting while the Bayesian view can be simultaneously objective and
subjective, etc.


 Illustrative quotations 
=========================
* "Carefully used, the frequentist approach yields broadly applicable
if sometimes clumsy answers"
* "To insist on unbiased [frequentist] techniques may lead to negative
(but unbiased) estimates of variance; the use of p-values in multiple
tests may lead to blatant contradictions; conventional 0.95
confidence regions may consist of the whole real line. No wonder that
mathematicians find it often difficult to believe that conventional
statistical methods are a branch of mathematics."
* "Bayesianism is a neat and fully principled philosophy, while
frequentism is a grab-bag of opportunistic, individually optimal,
methods."
* "In multiparameter problems flat priors can yield very bad answers"
* "Bayes' rule says there is a simple, elegant way to combine current
information with prior experience to state how much is known. It
implies that sufficiently good data will bring previously disparate
observers to an agreement. It makes full use of available information,
and it produces decisions having the least possible error rate."
* "Bayesian statistics is about making probability statements,
frequentist statistics is about evaluating probability statements."
* "Statisticians are often put in a setting reminiscent of Arrow’s
paradox, where we are asked to provide estimates that are informative
and unbiased and confidence statements that are correct conditional on
the data and also on the underlying true parameter." (These are
conflicting requirements.)
* "Formal inferential aspects are often a relatively small part of
statistical analysis"
* "The two philosophies, Bayesian and frequentist, are more orthogonal
than antithetical."
* "A hypothesis that may be true is rejected because it has failed to
predict observable results that have not occurred. This seems a
remarkable procedure."


                       The likelihood principle                       
======================================================================
Likelihood is a synonym for probability in common usage. In statistics
that is not true. A probability refers to variable data for a fixed
hypothesis while a likelihood refers to variable hypotheses for a
fixed set of data. Repeated measurements of a fixed length with a
ruler generate a set of observations. Each fixed set of observational
conditions is associated with a probability distribution and each set
of observations can be interpreted as a sample from that distribution
- the frequentist view of probability. Alternatively, a set of
observations may result from sampling any of several distributions
(each resulting from a set of observational conditions). The
probabilistic relationship between a fixed sample and a variable
distribution (resulting from a variable hypothesis) is termed
likelihood - a Bayesian view of probability. A set of length
measurements may imply readings taken by careful, sober, rested,
motivated observers in good lighting.
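
The distinction can be made concrete with the ruler example. The
sketch below holds a set of measurements fixed and evaluates the
likelihood of several candidate true lengths. It assumes normally
distributed measurement error with a known standard deviation; the
measurements, the error size, and the candidate values are
hypothetical.

```python
import math
from statistics import NormalDist

measurements = [10.1, 9.9, 10.2, 10.0, 9.8]  # hypothetical readings, in cm
sigma = 0.2  # assumed known measurement error, in cm


def likelihood(true_length):
    """Likelihood of a candidate true length, given the fixed measurements."""
    dist = NormalDist(true_length, sigma)
    return math.prod(dist.pdf(x) for x in measurements)


# The data stay fixed; the hypothesis (candidate true length) varies.
candidates = [9.6, 9.8, 10.0, 10.2, 10.4]
values = {c: likelihood(c) for c in candidates}
best = max(values, key=values.get)

for c in candidates:
    print(c, values[c])
print("most likely candidate:", best)
```

The likelihood values over the candidates are not required to sum to
one, which is one reason a likelihood is not treated as a probability
distribution over hypotheses in the frequentist framework.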

A likelihood is a probability (or not) by another name that exists
because of the limited frequentist definition of probability. The
likelihood is a concept introduced and advanced by Fisher for more
than 40 years (although prior references to the concept exist and
Fisher's support was half-hearted). The concept was accepted and
substantially changed by Jeffreys. In 1962 Birnbaum "proved" the
likelihood principle from premises acceptable to most statisticians.
His "proof" has been disputed by statisticians and philosophers.
Importantly, by 1970 Birnbaum had rejected one of these premises (the
conditionality principle) and had also rejected the likelihood
principle because they were both incompatible with the frequentist
"confidence concept of statistical evidence". The likelihood
principle says that all of the information in a sample is contained in
the likelihood function, which is accepted as a valid probability
distribution by Bayesians (but not by frequentists).

Some (frequentist) significance tests are not consistent with the
likelihood principle. Bayesians accept the principle which is
consistent with their philosophy (perhaps encouraged by the
discomfiture of frequentists). "[T]he likelihood approach is
compatible with Bayesian statistical inference in the sense that the
posterior Bayes distribution for a parameter is, by Bayes's Theorem,
found by multiplying the prior distribution by the likelihood
function." Frequentists interpret the principle adversely to Bayesians
as implying no concern about the reliability of evidence. "The
likelihood principle of Bayesian statistics implies that information
about the experimental design from which evidence is collected does
not enter into the statistical analysis of the data." Many Bayesians
(Savage for example) recognize that implication as a vulnerability.
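
A standard textbook illustration of this conflict uses coin tosses
under two different stopping rules. The sketch below is such an
example, not taken from the sources quoted above: it assumes 9 heads
and 3 tails were observed, and compares a design with a fixed number
of tosses against a design that stops at the third tail. The null
hypothesis is a fair coin, tested one-sidedly against a bias toward
heads.

```python
from fractions import Fraction
from math import comb

heads, tails = 9, 3
n = heads + tails

# Design A: toss exactly 12 times (binomial sampling).
# p-value = P(at least 9 heads in 12 tosses | fair coin).
p_binomial = Fraction(sum(comb(n, k) for k in range(heads, n + 1)), 2 ** n)

# Design B: toss until the 3rd tail appears (negative binomial sampling).
# At least 9 heads before the 3rd tail means at most 2 tails in the
# first 11 tosses.
m = n - 1
p_negbin = Fraction(sum(comb(m, k) for k in range(tails)), 2 ** m)


def lik_binomial(p):
    """Likelihood of heads-probability p under design A."""
    return comb(n, heads) * p ** heads * (1 - p) ** tails


def lik_negbin(p):
    """Likelihood of heads-probability p under design B."""
    return comb(n - 1, tails - 1) * p ** heads * (1 - p) ** tails


print(float(p_binomial), float(p_negbin))
print(lik_binomial(Fraction(2, 3)) / lik_negbin(Fraction(2, 3)))
```

The two likelihood functions differ only by a constant factor, so the
likelihood principle treats the evidence as identical, yet the two
p-values fall on opposite sides of the conventional 0.05 threshold
(about 0.073 and 0.033).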

The likelihood principle's staunchest supporters claim that it offers
a better foundation for statistics than either of the two schools.
"[L]ikelihood looks very good indeed when it is compared with these
[Bayesian and frequentist] alternatives." These supporters include
statisticians and philosophers of science. While Bayesians acknowledge
the importance of likelihood for calculation, they believe that the
posterior probability distribution is the proper basis for inference.


                              Modelling                               
======================================================================
Inferential statistics is based on statistical models. Much of
classical hypothesis testing, for example, was based on the assumed
normality of the data. Robust and nonparametric statistics were
developed to reduce the dependence on that assumption. Bayesian
statistics interprets new observations from the perspective of prior
knowledge - assuming a modeled continuity between past and present.
The design of experiments assumes some knowledge of those factors to
be controlled, varied, randomized, and observed. Statisticians are
well aware of the difficulties in proving causation (more of a
modeling limitation than a mathematical one), saying "correlation does
not imply causation".


More complex statistics utilize more complex models, often with the
intent of finding a latent structure underlying a set of variables. As
models and data sets have grown in complexity, foundational questions
have been raised about the justification of the models and the
validity of inferences drawn from them. The range of conflicting
opinions expressed about modeling is large.

* Models can be based on scientific theory or ad-hoc data analysis.
The approaches use different methods. There are advocates for each.
* Model complexity is a compromise. The Akaike information criterion
and Bayesian information criterion are two less subjective approaches
to achieving that compromise.
* Fundamental reservations have been expressed about even simple
regression models used in the social sciences. A long list of
assumptions inherent to the validity of a model is typically neither
mentioned nor checked. A favorable comparison between observations and
model is often considered sufficient.
* Traditional observation-based models are inadequate to solve many
important problems. A much wider range of models, including
algorithmic models, must be utilized. "If the model is a poor
emulation of nature, the conclusions may be wrong."
* Modeling is often poorly done (the wrong methods are used) and
poorly reported.
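
The complexity compromise mentioned in the list above can be sketched
with the two information criteria. The example assumes Gaussian errors
and compares a constant model with a straight-line model on made-up
data; the data, the parameter counts (which include the error
variance), and the helper names are hypothetical.

```python
import math


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood, with variance estimated as RSS / n."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return -0.5 * n * (math.log(2 * math.pi * rss / n) + 1)


def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: k ln n - 2 ln L."""
    return k * math.log(n) - 2 * log_lik


def residuals_constant(xs, ys):
    """Residuals after fitting a single mean."""
    mean_y = sum(ys) / len(ys)
    return [y - mean_y for y in ys]


def residuals_line(xs, ys):
    """Residuals after an ordinary least-squares straight-line fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical data: a linear trend plus small fixed perturbations.
xs = list(range(10))
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.4, -0.3, 0.1]
ys = [1.0 + 2.0 * x + e for x, e in zip(xs, noise)]

n = len(xs)
results = {}
for name, res, k in [
    ("constant", residuals_constant(xs, ys), 2),  # mean + variance
    ("line", residuals_line(xs, ys), 3),          # intercept + slope + variance
]:
    ll = gaussian_log_likelihood(res)
    results[name] = (aic(ll, k), bic(ll, k, n))

for name, (a, b) in results.items():
    print(name, round(a, 2), round(b, 2))
```

Lower values are preferred under both criteria. Each adds a penalty
for the number of parameters to a measure of misfit, with the Bayesian
criterion penalizing extra parameters more heavily once the sample
size exceeds about seven observations.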

In the absence of a strong philosophical consensus review of
statistical modeling, many statisticians accept the cautionary words
of statistician George Box: "All models are wrong, but some are
useful."


                            Other reading                             
======================================================================
For a short introduction to the foundations of statistics, see

In his book 'Statistics as Principled Argument', Robert P. Abelson
articulates the position that statistics serve as a standardized means
of settling disputes between scientists who could otherwise each argue
the merits of their positions 'ad infinitum'. From this point of view,
statistics is a form of rhetoric; as with any means of settling
disputes, statistical methods can succeed only as long as all parties
agree on the approach used.


                               See also                               
======================================================================
*Philosophy of statistics
*History of statistics
*Philosophy of probability
*Philosophy of mathematics
*Philosophy of science
*Evidence
*Likelihoodist statistics
*Probability interpretations
*Founders of statistics


 License 
=========
All content on Gopherpedia comes from Wikipedia, and is licensed under CC-BY-SA
License URL: http://creativecommons.org/licenses/by-sa/3.0/
Original Article: http://en.wikipedia.org/wiki/Foundations_of_statistics

