Statistics for Non-Statisticians
Note: This page is still under construction. Several graphics are missing,
and certain statistical equation notations need to be corrected. 11/30/96
Descriptive Statistics
Some Basic Notations
=  Equal                  ≠  Not equal to
<  Less than              >  More than
≤  Less than/equal to     ≥  More than/equal to
Σ  Sum (Sigma)
Descriptive Statistics
(Statistics apply to samples; parameters apply to populations.)
Measures of Central Tendency
Frequency -- number (count) of occurrences
Mean (M) -- arithmetic average, computed by summing the values of all
scores and dividing by the number of scores
Median (Md) -- midpoint; 50% of scores are higher; 50% of scores
are lower (useful in splitting and comparing groups)
Mode (Mo) -- most frequent value or score (can be unimodal,
bimodal, trimodal or multi-modal).
Mean, median and mode are the same in a perfectly symmetrical,
unimodal distribution such as the normal distribution; in skewed
distributions they differ. They are also trivially the same when
all scores are identical, but then there is no variation to study.
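The three measures can be computed with Python's standard statistics module. This is a minimal sketch using a small hypothetical list of scores (not data from this page); because the list is skewed to the right, the mean, median and mode all differ.

```python
import statistics

# Hypothetical, right-skewed scores for illustration only
scores = [2, 3, 3, 4, 5, 7, 11]

mean = statistics.mean(scores)        # sum of scores divided by number of scores
median = statistics.median(scores)    # middle score once the scores are sorted
modes = statistics.multimode(scores)  # every most-frequent value (handles bimodal data)

print(mean, median, modes)
```

multimode returns a list so that bimodal or multi-modal data are reported in full rather than reduced to a single value.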
Measures of Dispersion
Range -- the spread from the lowest to the highest value or score
(highest minus lowest); an indication of the amount of variation
observed.
Percentiles, deciles, quartiles -- divisions of the scores into
hundredths, tenths and quarters, respectively. Often useful for
the same purpose as a median split (comparison of groups).
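A minimal sketch of the range and of quartile and decile cut points, using Python's statistics module and hypothetical scores. statistics.quantiles uses its "exclusive" interpolation method by default; statistical packages interpolate in different ways, so cut points from other software may not match exactly.

```python
import statistics

scores = [2, 4, 4, 5, 7, 9, 10, 12]  # hypothetical scores

score_range = max(scores) - min(scores)
quartiles = statistics.quantiles(scores, n=4)  # 3 cut points: Q1, median, Q3
deciles = statistics.quantiles(scores, n=10)   # 9 cut points between the tenths

print(score_range, quartiles)
```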
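Variance -- measure of dispersion around the mean, based on the
squared deviations of the scores from the mean
Variance in a population: sigma-squared, $\sigma^2 = \sum (X - \mu)^2 / N$
Variance in a sample: s-squared, $s^2 = \sum (X - M)^2 / (n - 1)$
(where X is each score, Mu the population mean, M the sample mean,
N the population size and n the sample size)
Standard deviation (SD or s.d.) -- square root of the variance.
Standard deviation in a population: sigma ($\sigma$)
Standard deviation in a sample: s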
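The population and sample formulas differ only in the divisor (N versus n - 1). This sketch, using hypothetical scores, shows the matching functions in Python's statistics module next to a hand computation of the population variance.

```python
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

# Population formulas divide the sum of squared deviations by N
pop_var = statistics.pvariance(scores)  # sigma-squared
pop_sd = statistics.pstdev(scores)      # sigma

# Sample formulas divide by n - 1
samp_var = statistics.variance(scores)  # s-squared
samp_sd = statistics.stdev(scores)      # s

# The population variance computed by hand from its definition
mean = statistics.mean(scores)
manual_pop_var = sum((x - mean) ** 2 for x in scores) / len(scores)

print(pop_var, pop_sd, round(samp_var, 3), round(samp_sd, 3), manual_pop_var)
```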
The standard deviation is a useful way of standardizing the
variation found in any population or sample, because it is
expressed in the same units as the scores. In a normal distribution:
±1 standard deviation = 68.26% of cases (34.13% in each
direction)
±2 standard deviations = 95.44% of cases (an additional 13.59% in each direction)
±3 standard deviations = 99.73% of cases (an additional 2.14% in each direction)
±3.25 standard deviations = 99.88% of cases
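These percentages can be checked against the standard normal curve with Python's statistics.NormalDist. This sketch computes the share of cases within ±k standard deviations; the exact areas may differ from rounded table figures in the last decimal place (for example, 95.45% rather than 95.44%).

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def proportion_within(k):
    """Share of a normal distribution lying within +/- k standard deviations of the mean."""
    return STANDARD_NORMAL.cdf(k) - STANDARD_NORMAL.cdf(-k)

for k in (1, 2, 3, 3.25):
    print(f"+/- {k} SD: {proportion_within(k):.2%} of cases")
```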
The following chart from Williams (1979) shows a hypothetical
normal distribution with a population mean (Mu) and population
standard deviation (sigma):
GRAPHIC UNDER CONSTRUCTION
The example assumes Mu=48 and a population standard deviation of
4.0 (so the variance is 16). Within one standard deviation of the
mean, about 68% of all scores fall between 44 and 52; within two
standard deviations, about 95% of all scores fall between 40
and 56.
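The same check can be applied to the Williams (1979) example by modeling the population as NormalDist(mu=48, sigma=4.0). The helper name band is introduced here for illustration only.

```python
from statistics import NormalDist

population = NormalDist(mu=48, sigma=4.0)  # the hypothetical population in the example

def band(k):
    """Scores k standard deviations either side of the mean, and the share between them."""
    low = population.mean - k * population.stdev
    high = population.mean + k * population.stdev
    return low, high, population.cdf(high) - population.cdf(low)

for k in (1, 2):
    low, high, share = band(k)
    print(f"{k} SD: {low:g} to {high:g} contains about {share:.0%} of scores")
```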
Inferential Statistics
While we want to draw conclusions about populations in research,
the data we have usually come from a sample. Implicit in this
fact is the idea that there will be sampling error, i.e. a
particular sample will not necessarily give an accurate measure of
the population mean and variance (which we usually don't know).
The statistical validity concern for researchers is the
probability that any particular sample does not accurately
reflect the population it is intended to measure. By convention,
researchers are willing to accept a chance finding once in every 20
samples. We denote that with a probability statement:
p≤.05: There is a 5% or smaller chance of obtaining results like
these purely by chance, i.e. if there were no real effect (95%
confidence level).
p≤.01: The probability is reduced to one-in-100 or less that the
results are by chance (99% confidence).
p≤.001: The probability is only one-in-1,000 or less that the results
are by chance (99.9% confidence).
Note: A probability of .05 should not be construed to mean that
there is a 95% chance that the results are accurate.
In hypothesis testing, behavioral researchers technically test
the null hypothesis (H0), i.e. that there is no difference beyond
what could be accounted for by chance.
Some important concepts:
Sampling distribution -- The distribution of a statistic (such as
the mean) across repeated samples taken from the same population,
compared to the hypothetical population distribution. The idea of
the sampling distribution suggests that, over repeated attempts,
the means of the samples drawn will be approximately normally
distributed around the population mean.
GRAPHIC UNDER CONSTRUCTION
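The idea of a sampling distribution can be illustrated by simulation. This sketch assumes a hypothetical, deliberately non-normal population (whole numbers 1 through 10, all equally likely), draws 5,000 random samples of 30 with a fixed, arbitrary seed, and records each sample mean. The means cluster around the population mean, with a spread close to sigma divided by the square root of n, and roughly 95% of them fall within 1.96 of those units.

```python
import math
import random
import statistics

rng = random.Random(1996)  # arbitrary fixed seed so results are repeatable

# Hypothetical population: whole numbers 1-10, all equally likely (flat, not normal)
population = list(range(1, 11))
mu = statistics.mean(population)       # population mean, 5.5
sigma = statistics.pstdev(population)  # population standard deviation

n = 30        # observations per sample
draws = 5000  # number of repeated samples

# Draw each sample with replacement and keep its mean
sample_means = [sum(rng.choices(population, k=n)) / n for _ in range(draws)]

se = sigma / math.sqrt(n)  # spread of the sample means predicted by theory
share_within = sum(abs(m - mu) <= 1.96 * se for m in sample_means) / draws

print(f"mean of sample means: {statistics.mean(sample_means):.3f} (mu = {mu})")
print(f"SD of sample means:   {statistics.stdev(sample_means):.3f} (sigma/sqrt(n) = {se:.3f})")
print(f"share within 1.96 SE: {share_within:.1%}")
```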
Sampling error -- An estimate of how statistics can be expected
to deviate from parameters when sampling randomly from a given
population. This is calculated by first computing the standard
error: the standard deviation (square root of variance), divided
by the square root of the number of observations.
Then, depending on the probability level selected, it is possible to
identify the sampling error:
For p≤.05, multiply the standard error by 1.96, the
standardized z-score that bounds exactly 95.00% of cases
(compared with 95.44% for 2.00 standard deviation units).
For p≤.01, multiply the standard error by 2.58, the
standardized z-score that bounds 99.00% of cases
(compared with 99.73% for 3.00 standard deviation units).
Important: The size of the standard error depends on the size of
the sample, since the square root of the sample size is used in the
denominator: the smaller the sample, the larger the error term. To
narrow the chance of error, increase sample size! (Because of the
square root, quadrupling the sample size halves the standard error.)
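A minimal sketch of the standard error and sampling error in Python. The critical z-values are derived from statistics.NormalDist rather than typed in, which shows where 1.96 and 2.58 come from; the standard deviation of 10 is a hypothetical value.

```python
import math
from statistics import NormalDist

def standard_error(sd, n):
    """Standard deviation divided by the square root of the number of observations."""
    return sd / math.sqrt(n)

def z_critical(p):
    """Two-tailed z cutoff for probability level p (about 1.96 for .05, 2.58 for .01)."""
    return NormalDist().inv_cdf(1 - p / 2)

def sampling_error(sd, n, p=0.05):
    """Plus-or-minus margin: critical z times the standard error."""
    return z_critical(p) * standard_error(sd, n)

# Hypothetical SD of 10: each fourfold increase in n halves the standard error
for n in (25, 100, 400):
    print(n, standard_error(10, n), round(sampling_error(10, n), 2))
```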
Consider this example from Williams (1979):
GRAPHIC UNDER CONSTRUCTION
The example above assumes that the mean obtained from the sample is
7.38 and that the standard error ($\sigma_M$) = .47. The graph shows
the values needed to calculate the sampling error at the 95% and 99% levels:
For p≤.05: .47 x 1.96 = ±.92 sampling error
For p≤.01: .47 x 2.58 = ±1.21 sampling error
Confidence interval -- the range within which the population mean
is believed to fall, based on the mean obtained plus and minus the
sampling error:
For p≤.05 (95% confidence interval): 7.38 ± .92 = 6.46 to 8.30
For p≤.01 (99% confidence interval): 7.38 ± 1.21 = 6.17 to 8.59
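The confidence intervals in the Williams (1979) example can be reproduced in a few lines; the function name confidence_interval is introduced here for illustration.

```python
def confidence_interval(mean, se, z):
    """Mean plus and minus the sampling error (critical z times the standard error)."""
    margin = z * se
    return mean - margin, mean + margin

# Values from the Williams (1979) example: mean 7.38, standard error .47
for label, z in (("95%", 1.96), ("99%", 2.58)):
    low, high = confidence_interval(7.38, 0.47, z)
    print(f"{label} confidence interval: {low:.2f} to {high:.2f}")
```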
Statistics Measuring Relationships
Some Basic Notations
    Number of treatments
χ²  Chi-square
φ   Phi
r   Correlation coefficient
r²  Coefficient of determination
Statistics Describing Relationships
Between Two or More Variables
While describing the central tendency and variance for one
variable in a population or a sample is important, more
interesting questions often involve examining two or more
variables at one time.
Nominal (Categorical) and Ordinal Data
Cross-Tabulation -- This technique is used to compare two or more
sets of categorical data. The simplest example is a 2x2
contingency table, which can be presented using frequencies or
percentages.
Department Enrollments by Gender and Classification

                    Males   Females   Total
Grad Students           9        12      21
Undergraduates        150       200     350
Total                 159       212     371
Cross-tabs can also be used to analyze combinations of
categorical and ordinal measures. For example:
Department Enrollments by Gender and Class Level

                      Males   Females   Total
Grad Students             9        12      21
Seniors (90-125)         65        85     150
Juniors (60-89)          42        58     100
Sophomores (30-59)       32        43      75
Freshmen (0-29)          11        14      25
Total                   159       212     371
For nominal and ordinal data, a normal distribution cannot be
assumed. Therefore, a series of non-parametric tests can be
performed instead.
Chi-square (χ²) is a test of the statistical independence of
categorical data. Using a table of χ² critical values,
a researcher compares the value obtained in the calculation
with the critical value for the chosen p-level and the table's
degrees of freedom, (rows - 1) x (columns - 1). If the obtained
value exceeds the critical value, the variables are not independent.
Phi (φ) is a measure of association based on the chi-square
statistic that is specifically for 2x2 contingency
tables. Computationally, it is the square root of the
chi-square statistic divided by the number of
observations: $\phi = \sqrt{\chi^2 / N}$.
Cramér's V is a more general chi-square-based
statistic that can be used for tables larger than
2x2 and is based on Phi (φ); for a 2x2 table the two are equal.
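A minimal sketch of chi-square, Phi and Cramér's V in plain Python (the function names are illustrative). Applied to the first enrollment table, it gives a chi-square of zero: graduate and undergraduate enrollments are split between males and females in exactly the same proportions, so the expected counts equal the observed counts and the two variables are independent in these data. The sketch assumes no row or column total is zero.

```python
import math

def chi_square(table):
    """Pearson chi-square for a table of observed counts given as a list of rows.
    Returns (chi-square, degrees of freedom, number of observations)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, n

def phi(table):
    """Square root of chi-square divided by N (intended for 2x2 tables)."""
    chi2, _, n = chi_square(table)
    return math.sqrt(chi2 / n)

def cramers_v(table):
    """Chi-square-based association for tables of any size; equals phi for 2x2."""
    chi2, _, n = chi_square(table)
    k = min(len(table), len(table[0]))  # the smaller of rows or columns
    return math.sqrt(chi2 / (n * (k - 1)))

# First enrollment table: rows = grad, undergrad; columns = males, females
enrollment = [[9, 12], [150, 200]]
enroll_chi2, enroll_df, enroll_n = chi_square(enrollment)
print(enroll_chi2, enroll_df, enroll_n, phi(enrollment))
```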
Several nonparametric measures can be used to analyze
ordinal and categorical/ordinal data. These include:
Kendall's Tau-b and Tau-c
Gamma
Spearman's rho (a correlation of rank orders)
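Spearman's rho is the Pearson correlation computed on ranks. This sketch (illustrative function names, hypothetical ordinal data) assigns tied scores the average of their ranks and then correlates the two rank orders; it assumes neither variable is constant.

```python
import math

def ranks(values):
    """Rank scores from 1 (smallest); tied scores share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next score ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            result[order[k]] = average_rank
        start = end + 1
    return result

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman_rho(x, y):
    """Spearman's rho: Pearson r computed on the two sets of ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical ordinal data: class level (1-4) and a 5-point satisfaction rating
class_level = [1, 1, 2, 2, 3, 3, 4, 4]
satisfaction = [2, 3, 3, 2, 4, 3, 5, 4]
print(round(spearman_rho(class_level, satisfaction), 3))
```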
Interval and Ratio Data
Most research involving interval data involves more than 30
observations and, according to the Central Limit Theorem, it is
therefore possible to assume that the sampling distribution of the
mean is approximately normal. As such, it is possible to use a
series of statistics specially designed to test the relationship
between two interval measures.
Scatter-plots are used to plot two sets of interval data against
each other, such as grade point average and ACT test scores:
GRAPHIC UNDER CONSTRUCTION
Linear regression can be thought of as building on the means of
both measures: the regression line always passes through the point
where the two means meet. A regression line can be drawn through
the data that provides the "best fit" (least-squares) description
of the data pattern. Given the score on one variable, regression
allows prediction of the score or value on the second. A
regression line can be drawn by knowing where the line crosses the
Y-axis and knowing the slope of the line.
The general formula is: Y = bX + a, where:
Y = predicted value on the Y variable
X = value on the X variable
b = the slope of the line: the rise on the Y axis for each
one-unit run along the X axis (rise over run)
a = the intercept: the value of Y where the line crosses
the Y axis (where X = 0)
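The best-fit line is usually found by least squares. This sketch computes b and a for Y = bX + a; the function name and the ACT/GPA pairs are hypothetical, not data from this page.

```python
def fit_line(x, y):
    """Least-squares slope b and intercept a for Y = bX + a."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx    # slope: change in Y per one-unit change in X
    a = my - b * mx  # intercept: places the line through (mean X, mean Y)
    return b, a

# Hypothetical ACT scores (X) and GPAs (Y), for illustration only
act = [18, 21, 24, 27, 30]
gpa = [2.2, 2.6, 2.9, 3.3, 3.6]
b, a = fit_line(act, gpa)
print(f"GPA = {b:.4f} x ACT + {a:.4f}; predicted GPA at ACT 25: {b * 25 + a:.2f}")
```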
Correlation can be thought of as a measure of how two sets of
scores vary together, and is based on calculating their variances
and covariance. The most common measure is the Pearson
product-moment correlation coefficient (Pearson r), which ranges in
value from -1.00 (perfect negative correlation) through .00 (no
relationship) to +1.00 (perfect positive correlation). It provides
a single-figure measure of relationship.
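A sketch of Pearson r in plain Python, computed as the covariance of the two measures divided by the product of their standard deviations (the n - 1 divisors cancel, so sums of deviations are used directly). Python 3.10 and later also provide statistics.correlation. Squaring r gives the coefficient of determination. The data are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation: covariance / (SD of x * SD of y)."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # the n - 1 divisors cancel out

# Hypothetical ACT scores and GPAs, for illustration only
act = [18, 21, 24, 27, 30]
gpa = [2.2, 2.6, 2.9, 3.3, 3.6]
r = pearson_r(act, gpa)
r_squared = r ** 2  # coefficient of determination: share of variance in common
print(round(r, 3), round(r_squared, 3))
```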
It is rare that a perfect correlation is found. In social
science, Pearson r correlations (ignoring the sign) can be
interpreted as follows:
less than .20: slight (possible relationship)
.20-.39: low (some relationship)
.40-.69: moderate (substantial relationship)
.70-.89: high (marked relationship)
.90 or more: strong (definitely related)
Again, the number of observations is critical. With a very large
number of observations, somewhat small correlations can be
statistically significant. With a small number of observations,
however, statistical significance (a conclusion that the result
is other than by chance) requires relatively high correlation
numbers.
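The effect of sample size shows up in the usual t-test for a correlation, t = r√(n - 2) / √(1 - r²), with n - 2 degrees of freedom. This sketch holds r at a hypothetical .20 and varies n. The comparison values 2.101 (d.f. = 18) and about 1.96 (very large d.f.) are standard two-tailed .05 critical values from a t-table: at n = 20 the t falls short, while at n = 1,000 it clears the bar easily.

```python
import math

def t_for_r(r, n):
    """t statistic (d.f. = n - 2) for testing whether a correlation differs from zero."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# The same hypothetical r = .20 with increasing numbers of observations
for n in (20, 100, 1000):
    print(f"n = {n:4d}: t = {t_for_r(0.20, n):.2f} with d.f. = {n - 2}")
```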
Some important enhancements to these approaches include:
Multiple regression -- Using two or more variables to predict the
value of a dependent variable
Curvilinear regression -- Assumes other than a straight line
(linear) relationship, such as a U-shape or inverted U-shape.
Factor analysis -- A technique based on correlation that involves
reducing a large number of similar items, such as items used in a
scale, to a smaller number of underlying constructs or
dimensions that they represent.
Statistics Measuring Differences
Some Basic Notations
p≤     probability less than or equal to     M      Mean (M1, M2 etc.)
d.f.   degrees of freedom                    n      no. observations
t      t statistic                           σdiff  standard error of difference
F      F statistic                           k      any number
1Q     one-tailed test                       v1     column d.f.
2Q     two-tailed test                       v2     row d.f.
Statistics Comparing Differences Between Two or More Groups
Much research involves determining the means and variances for
different groups of observations, and then comparing the results
to see if the scores can be attributed to causes other than
chance.
Such comparisons are common in survey research, where it is
valuable to compare scores between demographic categories (e.g.
gender). Such comparisons also are the foundation for most
experiments, which involve exposing subjects to two or more
levels of an independent (categorical) variable and then
comparing responses on a dependent variable (usually an interval
measure).
Comparing Two Means: t-test
The simplest differences test involves two means, and is called a
Student's t-test. Researchers need to know: a) the means
obtained, b) the variance or standard deviation for each group,
or the combined (pooled) variance for both groups, and c) the number of
subjects in each group.
$t = \dfrac{M_1 - M_2}{\sigma_{diff}}$
where M1 and M2 represent the two means obtained, and $\sigma_{diff}$
represents the standard error of the difference (which is
calculated from the variance and sample size of each
sample).
Consider this example:
                 Group A    Group B
Means                 57         52
n                      5          5
Standard error of the difference ($\sigma_{diff}$) = 2 (groups of equal size and variance)
The t (obtained) is calculated from the formula:
$t = \dfrac{57 - 52}{2} = \dfrac{5}{2} = 2.50$
(Note: In this example the two groups have the same sample size and the
same variability. If this were not the case, the standard error of the
difference would be computed with a weighting formula that pools the two
variances according to each group's degrees of freedom.)
To determine whether the means are different statistically, the
researcher compares the t (obtained)=2.50 to a critical value of t that
can be found in a table included in most statistics books.
A t-table is organized based on p-values (columns) and the
number of observations (rows). The latter is expressed
as degrees of freedom (d.f. or df). To read the table:
First, choose a desired p-value: p=.05, p=.01 or p=.001.
Second, if you can predict in advance the direction of the
difference (i.e. which numbers are higher and lower), you
can choose to use a one-tailed test (which improves your
chances of finding a significant difference). Otherwise,
rely on the two-tailed option.
Third, determine the degrees of freedom (d.f) which apply,
based on the number of observations:
d.f.= n1 + n2 - 2
where n1 is the number of subjects in group 1 and n2 is
the number of subjects in group 2.
Fourth, go down the column (showing p-values and one- or two-
tailed direction, 1Q versus 2Q) to the row showing the
number of degrees of freedom (or the number that most
closely approximates it). The value at the intersection
is the critical value of t.
If the t (obtained) is larger than the t (critical), the difference
is statistically significant. If it is smaller, the difference is
not significant: the result could plausibly have occurred by chance
at the p-level selected.
In our example:
From calculation: t (obtained) = 2.50
From t table: t(8).05,2t = 2.306
Because t (obtained) = 2.50 > t (critical) = 2.306, we can conclude
that the 5-point difference in means is statistically significant
(a 1-in-20 or smaller chance that a difference this large arose by
chance, at the .05 level using a two-tailed test).
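The whole procedure can be sketched from raw scores. The two groups below are hypothetical, constructed so that their means (57 and 52), sizes (5 each) and variances (10 each) reproduce a standard error of the difference of 2 and t = 2.50 from the example. The weighting step is the standard pooled-variance formula for Student's t-test, and 2.306 is the table value quoted above.

```python
import math
import statistics

# Hypothetical raw scores chosen to match the example: means 57 and 52, n = 5, variance 10
group_a = [53, 55, 57, 59, 61]
group_b = [48, 50, 52, 54, 56]

def independent_t(a, b):
    """Student's t for two independent groups, pooling the two variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Weighting formula: each variance weighted by its own degrees of freedom
    pooled_var = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / df
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))  # standard error of the difference
    t = (statistics.mean(a) - statistics.mean(b)) / se_diff
    return t, df

t_obtained, df = independent_t(group_a, group_b)
T_CRITICAL = 2.306  # t-table value for d.f. = 8, p = .05, two-tailed (from the text)
significant = abs(t_obtained) > T_CRITICAL
print(f"t({df}) = {t_obtained:.2f}; significant at .05: {significant}")
```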
Analysis of Variance: Fundamentals
The t-test is appropriate only when researchers analyze
differences between two groups or two measurements of subjects
within the same group. A t-test is a special case of a more
general procedure for comparing differences, analysis of
variance (ANOVA).
In ANOVA, all sources of variation, and their combinations, are
analyzed at the same time. The aim is to determine whether
the proportion of variance accounted for by any particular
variable, or combination of variables (called an interaction), is
substantial compared to all of the remaining unexplained
variance. Think of ANOVA in terms of a partitioning process:
100% of the variance can be put together, then partitioned and
repartitioned (sliced and diced) in combinations.
To compute ANOVAs:
1) The deviation of each score from the mean is calculated, then
squared (to eliminate the effects of negative and positive values).
The squared deviations are then summed; the result is the sum
of squares (SS). The SS for the variance explained and the SS
for the variance unexplained (residual) add up to the total SS.
2) Each sum of squares is divided by the applicable degrees
of freedom (for a variable, the number of treatments
related to the variable under analysis, less one). This
results in a mean square.
3) An F ratio is computed by dividing the resulting mean
square for each variable or combination of variables by
the mean square computed for the residual, representing
all the remaining variance.
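The three steps above can be sketched for a one-way design in plain Python (the function name and the tiny data set are illustrative). The sketch also confirms that the between-groups and within-groups sums of squares add up to the total.

```python
def one_way_anova(groups):
    """One-way ANOVA on a list of groups (each a list of scores)."""
    scores = [x for g in groups for x in g]
    grand_mean = sum(scores) / len(scores)
    means = [sum(g) / len(g) for g in groups]

    # Step 1: sums of squared deviations (SS)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in scores)

    # Step 2: mean squares = SS / degrees of freedom
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    # Step 3: F ratio = explained mean square / residual mean square
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df": (df_between, df_within), "F": ms_between / ms_within,
    }

# Tiny illustrative data set with three treatments
result = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)
```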
As with the t-test, tables of the critical values of F can be
found in most statistics books. However:
1) Separate F tables exist for each probability level or p-
value adopted: Upper 5 Percent Points (p=.05), Upper 1
Percent Points (p=.01), etc.
2) Two different degrees of freedom are reported and used to
determine the critical value. Most F statistics are
reported as follows:
F(1,26).05=3.76
The first number in the parentheses (v1) represents the
degrees of freedom for the effect being tested. For a
single variable, it is the number of treatments less one;
for an interaction, it is calculated by multiplying the
number of treatments for variable 1 less one, times the
number of treatments for variable 2 less one:

Treatments   d.f.      Treatments   d.f.      Treatments   d.f.
1x3          2         2x2          1         3x3          4
1x4          3         2x3          2         3x4          6
1x5          4         2x4          3         3x5          8
The corresponding critical value is found in the
applicable columns of the F-table.
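The second number in the parentheses (v2) represents the
degrees of freedom for the residual (error) term: the total
number of subjects less the number of groups (cells) being
compared. It is found in the rows of the F-table.
For 10 subjects in 2 groups: d.f. = 8
For 20 subjects in 4 groups: d.f. = 16 etc.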
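A sketch of the two degrees-of-freedom calculations, with illustrative function names (math.prod requires Python 3.8 or later). For the residual term it assumes the usual between-subjects case, subjects less the number of groups or cells; that rule matches the t(8) example with 10 subjects in 2 groups and the F(1,1301) values reported for 1,305 complete cases in the 2x2 news/advertising study.

```python
import math

def effect_df(*treatments):
    """v1: product of (treatments - 1) over the variable(s) in the effect."""
    return math.prod(k - 1 for k in treatments)

def residual_df(subjects, groups):
    """v2 for a between-subjects design: subjects less the number of groups (cells)."""
    return subjects - groups

print(effect_df(3), effect_df(2, 2), effect_df(3, 5))  # 1x3, 2x2 and 3x5 designs
print(residual_df(10, 2), residual_df(1305, 4))        # t-test example; 2x2 news/ads study
```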
Given an F-statistic, the same general procedure is used to
determine whether it is significant statistically: Compare the
F (obtained) to the F (critical) found in the table. If the value exceeds
the tabled critical value, the difference is significant.
Note: The t distributions and F distributions are related: a t-statistic is
equivalent to the square root of F in the case of an ANOVA with only two means
being compared, F(1,k). Thus, in the example above, t(8).05,2t = 2.306 corresponds to
F(1,8).05 = 5.32 (2.306 squared). The first d.f. quoted in an F statistic is understood
always to be 1 in the case of a t-test. Thus, in a t-table, v1 does not need to be
specified and only v2 is used.
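As a worked check of this relationship, squaring the critical t from the two-group example reproduces the critical F, and squaring the obtained t gives the F that a two-group ANOVA on the same data would produce:

```latex
t^2 = F: \qquad 2.306^2 \approx 5.32 = F_{(1,8)\,.05}, \qquad 2.50^2 = 6.25 = F_{\text{obtained}}
```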
There are two basic types of ANOVAs: one-way and factorial.
Comparing Three or More Means For One Variable: One-Way ANOVA
The One-Way ANOVA allows a researcher to compare three or more
treatments based upon one variable and to determine if they are
statistically different. Operationally, the F value for a One-Way
ANOVA is computed by dividing the mean square between groups by the
mean square within groups:
$F = MS_{bet} / MS_{with}$
where the mean square between is the variance explained by the
variable being investigated, and the mean square within is all of
the random (unexplained) variance found within the groups.
A One-Way ANOVA differs from the t-test because a) more than two
means can be compared, but b) only the fact that a difference
exists can be determined. To determine which mean(s) is (are)
different from the other(s) requires the use of multiple comparison
procedures. Multiple comparisons (often referred to as a priori
or post-hoc comparisons) operate like multiple, simultaneous
t-tests, but hold the overall error rate to the desired probability level.
Factorial Designs: General ANOVA Model
Analysis of variance is useful because it allows multiple
variables to be analyzed simultaneously. In addition, it permits
exploration of interactions between variables or factors.
Consider this simple example from a study comparing news
and advertising:
Summary Table of Means: Believability Scale
(Means based on 7-point scale: 1=not believable, 7=highly believable)

                  Content Class
Gender          News     Ads     All
Females         4.84    4.43    4.64
Males           5.06    4.76    4.91
Totals          4.96    4.62    4.79
Instead of a single ratio, multiple F ratios are computed, and
are presented in a single table such as this:
F Table

Source                  Sum of Squares    D.F.   Mean Square        F   Signif. (p)
Main Effects
  Gender                        24.658       1        24.658   13.496          .000
  Content Class                 38.885       1        38.885   21.284          .000
Interaction
  Gender X Class                  .965       1          .965     .528          .468
Explained                       64.507       3        21.502   11.769          .000
Residual (Unexplained)        2376.830    1301         1.827
Total                         2441.337    1304         1.872

Fs were computed as follows:
Gender (24.658 divided by 1.827): F(1,1301) = 13.496, p<.001
Class (38.885 divided by 1.827): F(1,1301) = 21.284, p<.001
Interaction (.965 divided by 1.827): F<1, n.s.
Note: 1304 d.f. is based on 1,316 observations, less 11 incomplete cases,
minus 1 degree of freedom.
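The F ratios can be recomputed from the table by dividing each effect's mean square by the residual mean square of 1.827, as described above. This short sketch does only that arithmetic; p-values would need an F-distribution function, which Python's standard library does not provide.

```python
# Mean squares copied from the F table (each effect has 1 d.f.)
mean_squares = {"Gender": 24.658, "Content Class": 38.885, "Gender X Class": 0.965}
MS_RESIDUAL = 1.827

# F = effect mean square / residual mean square
f_ratios = {effect: ms / MS_RESIDUAL for effect, ms in mean_squares.items()}
for effect, f in f_ratios.items():
    print(f"{effect}: F(1,1301) = {f:.3f}")
```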
A critical issue for researchers relates to the correct
calculation of the error term (residual) used as the denominator
in the F computation. Slightly different error terms are used
depending on whether a between-subjects or within-subjects design
is used. Most computer programs require the researcher to
specify the type of design.
Some related procedures:
ANCOVA -- Analysis of covariance follows generally the same
procedure, but first adjusts the scores based on some other
(spurious) variable that is suspected of affecting the results.
The effect of this covariate is netted out before the regular
ANOVA is calculated. The effects of the covariate are examined
by computing a mean square that is divided by the mean square of
the residual in the same way as a regular ANOVA. The same kind
of F ratio and p-values are calculated.
MANOVA -- Multivariate analysis of variance analyzes several
dependent variables at once, treating them as a single combined
dependent measure upon which the analysis of variance is performed.
Most often MANOVA is used when the several dependent measures are
correlated, suggesting that the effects should be similar on each
of the underlying variables.
Example: In consumer behavior research, attitude toward the
advertisement, attitude toward the brand, and purchase intent are
inter-related ideas. All of these dependent variables are
frequently analyzed together as if they were a single measure,
which eliminates the need to duplicate similar analyses on each.
However, additional v1
degrees of freedom are used when dependent variables are combined
in this procedure.
MANCOVA -- Combines ANCOVA and MANOVA, allowing the researcher
to control for one or more extraneous or spurious variables
before conducting an analysis of variance procedure on multiple
dependent measures.