Statistics for Non-Statisticians

Note: This page is still under construction. Several graphics are missing, and certain statistical equation notations need to be corrected. 11/30/96

Descriptive Statistics

Some Basic Notations

=   Equal             <   Less than       ≤   Less than/equal to
≠   Not equal to      >   More than       ≥   More than/equal to
                                          Σ   Sum (Sigma)
                                     

(Statistics apply to samples; parameters apply to populations)

Measures of Central Tendency 

Frequency  --  number (count) of occurrences
Mean (M) -- arithmetic average, computed by summing values of all
   scores, divided by number of all scores
Median (Md) --  midpoint; 50% of scores are higher; 50% of scores
   are lower (useful in splitting and comparing groups)
Mode (Mo) -- most frequent value or score (a distribution can be
   unimodal, bimodal, trimodal or multi-modal).
   Mean, median and mode coincide in a perfectly symmetrical,
   unimodal distribution, such as the normal distribution.  They
   also coincide when all scores are the same, which negates the
   value of most research (because there is no variation to study).
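As a minimal sketch of these measures, the following uses Python's
standard statistics module.  The scores are invented for illustration
and do not come from any study.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration only.
scores = [70, 75, 75, 80, 85, 90, 95]

frequencies = Counter(scores)    # frequency: count of occurrences of each score
mean_score = mean(scores)        # sum of all scores / number of scores
median_score = median(scores)    # midpoint of the ordered scores
modes = multimode(scores)        # most frequent value(s); two values would mean bimodal

print(f"Frequency of 75 = {frequencies[75]}")
print(f"Mean (M)    = {mean_score:.2f}")
print(f"Median (Md) = {median_score}")
print(f"Mode (Mo)   = {modes}")
```

Because the scores are not symmetrical around the middle, the three
measures differ here.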


Measures of Dispersion

Range --  distance from the lowest to the highest value or score,
   an indication of the amount of variation that can be observed.
Percentiles, deciles, quartiles -- divisions of the scores by
   hundredths, tenths, quarters, respectively.  Often useful for
   same purpose as a median split (comparison of groups).
Variance --  measure of dispersion of scores around the mean
      Variance in a population:  sigma-squared
      Variance in a sample:  s-squared

Standard deviation  (SD or s.d.) -- square root of the variance.
      Standard deviation in a population:  sigma
      Standard deviation in a sample:  s
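A short sketch of both measures, using Python's standard statistics
module on invented scores.  One assumption to note: Python divides by
N for the population versions and by N - 1 for the sample versions;
the text does not say which divisor it intends for s-squared.

```python
from statistics import pvariance, pstdev, variance, stdev

# Hypothetical scores, invented for illustration only.
scores = [44, 46, 48, 50, 52]

# Treating the scores as a whole population (divide by N).
sigma_squared = pvariance(scores)
sigma = pstdev(scores)

# Treating the scores as a sample (divide by N - 1; an assumption,
# since the text does not state the divisor).
s_squared = variance(scores)
s = stdev(scores)

print(f"Population: variance = {sigma_squared}, SD = {sigma:.3f}")
print(f"Sample:     variance = {s_squared}, SD = {s:.3f}")
```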

This is a useful method for standardizing the variance found in
any population or sample.  In a normal distribution:

   ± 1 standard deviation = 68.26% of cases (34.13% each
       direction)
   ± 2 standard deviations = 95.45% of cases (additional 13.59%
       each direction)
   ± 3 standard deviations = 99.73% of cases (additional 2.14%
       each direction)
   ± 3.25 standard deviations = 99.88% of cases

The following chart from Williams (1979) shows a hypothetical
normal distribution, with a population mean (Mu), and population
standard deviation:

GRAPHIC UNDER CONSTRUCTION

The example assumes Mu=48 and the standard deviation of the
population=4.0 (the variance would be 16).  Within one standard
deviation of the mean, 68% of all scores fall between 44 and 52;
within two standard deviations, 95% of all scores fall between 40
and 56.
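The coverage figures for this example can be checked with a small
sketch.  It uses NormalDist from Python's standard statistics module
(Python 3.8 or later) with the Mu=48 and standard deviation=4.0 given
above; the helper name share_within is only illustrative.  Computed
values may differ from table-based figures in the second decimal.

```python
from statistics import NormalDist

mu, sigma = 48, 4.0              # values from the example in the text
dist = NormalDist(mu, sigma)

def share_within(k):
    """Proportion of cases within plus-or-minus k standard deviations of the mean."""
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    low, high = mu - k * sigma, mu + k * sigma
    print(f"+/-{k} SD: scores {low:.0f} to {high:.0f} cover {share_within(k):.2%} of cases")
```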



Inferential Statistics



While we want to draw conclusions about populations in research,
the data we have usually come from a sample.  Implicit in this
fact is the idea that there will be sampling error, i.e. a
particular sample will not necessarily give an accurate measure of
the population mean and variance (which we usually do not know).

The statistical validity concern to researchers is the
probability that any particular sample does not accurately
reflect the population it is intended to measure.  By convention,
researchers are willing to accept a chance finding once in every
20 samples.  We denote that with a probability statement:
   p ≤ .05:  If chance alone were at work, results like those
             obtained would occur 5% of the time or less (95%
             confidence level).
   p ≤ .01:  The probability is reduced to one-in-100 that results
             like these would occur by chance (99% confidence).
   p ≤ .001: The probability is only one-in-1,000 that results
             like these would occur by chance.
   Note: Probability should not be construed to mean that there
   is a 95% chance that the results are accurate.

In hypothesis testing, behavioral researchers technically test
the null hypothesis (H0), i.e. that there is no difference
beyond what could be accounted for by chance.

Some important concepts:

Sampling distribution --  The distribution of a statistic (such
as the mean) across the many samples that could be drawn, compared
to the hypothetical population distribution.  The idea of a
sampling distribution suggests that over repeated attempts, the
means of the samples drawn will be approximately normally
distributed around the population mean.

GRAPHIC UNDER CONSTRUCTION

Sampling error  --  An estimate of how statistics can be expected
to deviate from parameters when sampling randomly from a given
population.  This is calculated by first computing the standard
error:  the standard deviation (square root of variance), divided
by the square root of the number of observations.   

Then, depending on the probability level selected, it is possible
to identify the sampling error:
  For p ≤ .05, multiply the standard error by 1.96 -- the
            standardized z-score for 95.00%, compared to 95.45%
            (2.00) using standard deviation units.
  For p ≤ .01, multiply the standard error by 2.58 -- the
            standardized z-score for 99.00%, compared to 99.73%
            (3.00) using standard deviation units.

Important:  The size of the standard error is an inverse function
of the size of the sample, since the square root of the sample
size is used in the denominator:  The smaller the sample, the
larger the error term.  To narrow the chance of error, increase
sample size!
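The effect of sample size can be seen in a brief sketch.  The
standard deviation of 4.0 and the sample sizes are assumed values
chosen only to illustrate the formula (standard deviation divided by
the square root of the number of observations).

```python
from math import sqrt

def standard_error(sd, n):
    """Standard deviation divided by the square root of the number of observations."""
    return sd / sqrt(n)

sd = 4.0  # hypothetical standard deviation, for illustration only

# Quadrupling the sample size halves the standard error.
for n in (25, 100, 400):
    print(f"n = {n:3d}: standard error = {standard_error(sd, n):.2f}")
```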

Consider this example from Williams (1979):

GRAPHIC UNDER CONSTRUCTION


The example above assumes that the mean of the population is 7.38
and that the standard error (σm)=.47.   The graph shows values
needed to calculate the sampling error at the 95% and 99% levels:
  For p ≤ .05, .47 x 1.96 = ±.92  sampling error
  For p ≤ .01, .47 x 2.58 = ±1.21 sampling error

Confidence Interval.  The range of values within which the mean is
believed to fall, based on the mean obtained and the upper and
lower values of the sampling error:
  For p ≤ .05 (95% confidence interval):  7.38 ± .92 =  6.46-8.30
  For p ≤ .01 (99% confidence interval):  7.38 ± 1.21 =  6.17-8.59
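The sampling error and confidence interval arithmetic can be
reproduced with a few lines.  This is a sketch using the mean (7.38),
standard error (.47) and multipliers (1.96, 2.58) given in the text;
the function names are only illustrative.

```python
Z_95 = 1.96   # multiplier for the .05 level, from the text
Z_99 = 2.58   # multiplier for the .01 level, from the text

def sampling_error(standard_error, z):
    """Standard error multiplied by the z-score for the chosen level."""
    return standard_error * z

def confidence_interval(mean, standard_error, z):
    """Lower and upper bounds: mean minus and plus the sampling error."""
    margin = sampling_error(standard_error, z)
    return mean - margin, mean + margin

mean, se = 7.38, 0.47   # values from the Williams (1979) example

low95, high95 = confidence_interval(mean, se, Z_95)
low99, high99 = confidence_interval(mean, se, Z_99)

print(f"95% interval: {low95:.2f} to {high95:.2f}")
print(f"99% interval: {low99:.2f} to {high99:.2f}")
```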



Statistics Measuring Relationships


Some Basic Notations

      Number of treatments
      Chi-square (χ²)
      Phi (φ)
      Correlation coefficient (r)
      Coefficient of determination (r²)

     
                   Statistics Describing Relationships 
                       Between Two or More Variables


While describing the central tendency and variance for one
variable in a population or a sample is important, more
interesting questions often involve examining two or more
variables at one time.


Nominal (Categorical) and Ordinal Data

Cross-Tabulation -- This technique is used to compare two or more
sets of categorical data.  The simplest example is a 2x2
contingency table, which can be presented using frequencies or
percentages.


                    Department Enrollments By 
                    Gender and Classification    

                         Males      Females
                         __________________

       Grad Students       9           12       21
                         __________________

       Undergraduates      150        200      350
                         ___________________ 
                           159        212      371
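The marginal totals and percentages in a table like this can be
derived from the four cell counts.  A minimal sketch, using the
counts from the table above (the variable names are only
illustrative):

```python
# Cell counts from the Department Enrollments table in the text.
table = {
    "Grad Students":  {"Males": 9,   "Females": 12},
    "Undergraduates": {"Males": 150, "Females": 200},
}

# Row totals: one per classification.
row_totals = {row: sum(cells.values()) for row, cells in table.items()}

# Column totals: one per gender.
col_totals = {}
for cells in table.values():
    for col, count in cells.items():
        col_totals[col] = col_totals.get(col, 0) + count

grand_total = sum(row_totals.values())

# Row percentages: share of each classification that is male or female.
row_pct = {
    row: {col: 100 * count / row_totals[row] for col, count in cells.items()}
    for row, cells in table.items()
}

for row, cells in row_pct.items():
    shares = ", ".join(f"{col} {pct:.1f}%" for col, pct in cells.items())
    print(f"{row} (n={row_totals[row]}): {shares}")
print(f"Column totals: {col_totals}, grand total: {grand_total}")
```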


Cross-tabs can also be used to analyze combinations of
categorical and ordinal measures.  For example:

                          Department Enrollments by
                          Gender and Class Level

                                 Males Females
                                 ___________________
              Grad Students        9     12      21
                                 ___________________
              Seniors (90-125)    65     85     150
              Juniors (60-89)     42     58     100
              Sophomores (30-59)  32     43      75
              Freshmen (0-29)     11     14      25
                                 ___________________
                                 159    212     371


For nominal and ordinal data, normal distribution cannot be
assumed.  Therefore, there is a series of non-parametric tests
that can be performed.

      Chi-square (χ²) is a test of statistical independence of
      categorical data.  Using a table of χ² critical values,
      a researcher can compare the result obtained in the
      calculation to determine whether the value obtained exceeds
      the critical value necessary.

          Phi (φ) is a measure of association using the Chi-square
          statistic that is specifically for 2x2 contingency
          tables.  Computationally, it is the square root of the
          quotient of the chi-square statistic divided by the
          number of observations.

          Cramer's V is a more generalized chi-square based
          statistic that can be used for categorical
          situations larger than 2x2 and is based on Phi (φ).
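A minimal sketch of these three statistics follows.  It assumes the
usual Pearson chi-square (observed versus expected counts, with no
continuity correction), which the text does not spell out.  The first
table is the 2x2 enrollment table from the text; the second is
invented for contrast.  The critical value 3.841 is the standard
tabled value for 1 degree of freedom at p=.05.

```python
from math import sqrt

def chi_square(table):
    """Pearson chi-square for a table of observed counts (a list of rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, n

def phi(chi2, n):
    """Square root of chi-square divided by the number of observations (2x2 only)."""
    return sqrt(chi2 / n)

def cramers_v(chi2, n, n_rows, n_cols):
    """Generalization of phi for tables larger than 2x2."""
    return sqrt(chi2 / (n * min(n_rows - 1, n_cols - 1)))

CRITICAL_05_DF1 = 3.841  # tabled critical value, 1 d.f., p = .05

tables = {
    "Enrollment (from text)": [[9, 12], [150, 200]],
    "Hypothetical":           [[30, 10], [20, 40]],
}

for name, table in tables.items():
    chi2, n = chi_square(table)
    verdict = "exceeds" if chi2 > CRITICAL_05_DF1 else "does not exceed"
    print(f"{name}: chi-square = {chi2:.3f} ({verdict} {CRITICAL_05_DF1}), "
          f"phi = {phi(chi2, n):.3f}")
```

In the enrollment table the proportion of males is identical for
graduate students and undergraduates, so the observed counts equal
the expected counts and chi-square is zero.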

      Several nonparametric measures can be used to analyze
      ordinal and categorical/ordinal data:  These include: 

          Kendall's Tau and Tau          
          Gamma
          Spearman's rho (tests ordering)


Interval and Ratio Data

Most research involving interval data involves more than 30
observations and, according to the Central Limit Theorem, it is
therefore generally reasonable to assume normality of the sampling
distribution.  As such, it is possible to use a series of
statistics specially designed to test the relationship between two
interval measures.

Scatter-plots are used to plot two sets of interval data,
such as grade point average and ACT test scores:

GRAPHIC UNDER CONSTRUCTION

Linear regression can be thought of as a summary of the joint
trend in both measures.  A regression line can be drawn through
the data that provides a "best fit" explanation of the data
pattern; it always passes through the point defined by the means
of both measures.  Given the score on one variable, regression
allows prediction of the score or value of the second.  A
regression line can be drawn by knowing where the line crosses the
Y-axis and knowing the slope of the line.

       The general formula is:  Y = bX + a, where:
            Y = predicted value on the Y variable
            X = value on the X variable
            b = the slope of the line:  the rise on the Y axis
                compared to the run on the X axis (rise over run)
            a = the intercept:  the value of Y where the line
                crosses the Y axis (i.e. where X = 0)
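A sketch of how the slope and intercept might be obtained.  It
assumes ordinary least squares, the usual way of finding the "best
fit" line, although the text does not name a method.  The ACT and
grade point figures are invented for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope (b) and intercept (a) for Y = bX + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_xx = sum((x - mean_x) ** 2 for x in xs)
    b = sum_xy / sum_xx          # slope: rise over run
    a = mean_y - b * mean_x      # intercept: the line passes through both means
    return b, a

# Hypothetical data, invented for illustration only.
act = [18, 21, 24, 27, 30]
gpa = [2.4, 2.8, 3.0, 3.4, 3.6]

b, a = fit_line(act, gpa)
predicted = b * 25 + a           # predicted GPA for an ACT score of 25

print(f"Y = {b:.2f}X + {a:.2f}")
print(f"Predicted GPA for ACT 25: {predicted:.2f}")
```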


Correlation can be thought of as a measure of how closely two sets
of scores vary together, and is based on calculating the variance.
The most common measure is the Pearson product-moment correlation
coefficient (Pearson r), which ranges in value from -1.00 (perfect
negative correlation) through .00 (no relationship) to +1.00
(perfect positive correlation).  It provides a single-figure
measure of relationship.

It is rare that a perfect correlation is found.  In social
science, the size of a Pearson r correlation (ignoring its sign)
can be interpreted as follows:

          less than .20:  slight (possible relationship)
                .20-.39:  low (some relationship)
                .40-.69:  moderate (substantial relationship)
                .70-.89:  high (marked relationship)
          more than .90:  strong (definitely related)

Again, the number of observations is critical.  With a very large
number of observations, somewhat small correlations can be
statistically significant.  With a small number of observations,
however, statistical significance (a conclusion that the result
is other than by chance) requires relatively high correlation
numbers.

Some important enhancements to these approaches include:

Multiple regression -- Using two or more variables to predict the
value of a dependent variable.

Curvilinear regression -- Assumes other than a straight line
(linear) relationship, such as a U-shape or inverted U-shape.

Factor analysis -- A technique based on correlation that involves
reducing a large number of similar items, such as items used in a
scale, to a smaller number of underlying constructs or
dimensions that they represent.


Statistics Measuring Differences


Some Basic Notations

p ≤   probability less    M      mean (M1, M2 etc.)   k   any number
      than or equal to    n      no. observations     1Q  one-tailed test
d.f.  degrees of freedom  σdiff  standard error       2Q  two-tailed test
t     t statistic                of difference        v1  column d.f.
F     F statistic                                     v2  row d.f.

                                     
    Statistics Comparing Differences Between Two or More Groups

Much research involves determining the means and variances for
different groups of observations, and then comparing the results
to see if the scores can be attributed to causes other than
chance.  

Such comparisons are common in survey research, where it is
valuable to compare scores between demographic categories (e.g.
gender). Such comparisons also are the foundation for most
experiments, which involve exposing subjects to two or more
levels of an independent (categorical) variable and then
comparing responses on a dependent variable (usually an interval
measure).  


Comparing Two Means: t-test

The simplest differences test involves two means, and is called a
Student's t-test.  Researchers need to know:  a) the means
obtained, b) the variance or standard deviation for each group,
or the combined variance for all groups, and  c) the number of
subjects in each group.
                              t = M1 - M2
                                  _______
                                   σdiff

where M1 and M2 represent the two means obtained, and σdiff
represents the standard error of the difference (which is
calculated by knowing the variance and sample size for each
sample).

Consider this example:  
                               Group A    Group B
                      Means       57        52
                      n            5         5
                      σdiff        2         2   (equal)

The t (obtained) is calculated based on the formula:

                         t = 57 - 52  =  5  = 2.50
                             _______    ___
                                2        2

(Note: In this example the sample sizes and standard errors for each group are
the same.  If this were not the case, it would be necessary to compute the
standard error using a weighting formula.)

To determine whether the means are different statistically, the
researcher compares the t (obtained)=2.50 to a critical value of t that
can be found in a table included in most statistics books.

     A t-table is organized based on p-values (columns) and the   
     number of observations (rows). The latter is expressed       
     as degrees of freedom (d.f. or df).  To read the table:  

     First, choose a desired p-value:  p=.05, p=.01 or p=.001.

     Second, if you can predict in advance the direction of the
        difference (i.e. which numbers are higher and lower), you
        can choose to use a one-tailed test (which improves your  
        chances of finding a significant difference).  Otherwise, 
        rely on the two-tailed option. 

     Third, determine the degrees of freedom (d.f) which apply,
        based on the number of observations:

                           d.f.=  n1 + n2 - 2

        where n1 is the number of subjects in group 1 and n2 is
        the number of subjects in group 2.

     Fourth, go down the column (showing p-values and one- or two-  
        tailed direction, 1Q versus 2Q) to the row showing the    
        number of degrees of freedom (or the number that most     
        closely approximates it).   The value at the intersection 
        is the critical value of t.

     If the t (obtained) is larger than the t (critical), the
     difference is significant statistically.  If it is smaller,
     chance cannot be ruled out as the explanation for the result
     at the p-level selected.

     In our example:
           From calculation:  t (obtained)    =   2.50
           From t table:      t(8).05,2t      =   2.306

     Because t (obtained)=2.50 > t (critical)=2.306, we can conclude
     the 5-point difference in means is statistically significant
     (a difference this large would arise by chance less than one
     time in 20, at the .05 level using a two-tailed test).
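The worked example can be traced in a few lines.  This sketch uses
only the figures given in the text (means of 57 and 52, standard
error of the difference of 2, five subjects per group, and the tabled
critical value 2.306); the function names are only illustrative.

```python
def t_statistic(mean1, mean2, se_diff):
    """Difference between two means divided by the standard error of the difference."""
    return (mean1 - mean2) / se_diff

def degrees_of_freedom(n1, n2):
    """d.f. = n1 + n2 - 2 for two independent groups."""
    return n1 + n2 - 2

T_CRITICAL = 2.306   # t(8), p = .05, two-tailed, as read from the t table in the text

t_obtained = t_statistic(57, 52, 2)
df = degrees_of_freedom(5, 5)
significant = abs(t_obtained) > T_CRITICAL

print(f"t({df}) = {t_obtained:.2f}, critical value = {T_CRITICAL}")
print("Significant at .05" if significant else "Not significant at .05")
```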


Analysis of Variance: Fundamentals

The t-test is appropriate only when researchers analyze
differences between two groups or two measurements of subjects
within the same group.  A t-test is a streamlined version of a
more general procedure for comparing differences, analysis of
variance.  

In ANOVA, the sources of all possible combinations of variation
are analyzed at the same time.  The aim is to determine whether
the proportion of variance accounted for by any particular
variable, or combination of variables (called an interaction), is
substantial compared to all of the remaining unexplained
variance.  Think of ANOVA in terms of a partitioning process: 
100% of the variance can be put together, then partitioned and
repartitioned (sliced and diced) in combinations.  


To compute ANOVAs:

    1) The deviation of each score from the mean is calculated,
       then squared (to eliminate effects of negative and positive
       values).  The squared deviations are then summed; the result
       is the sum of squares (SS).  The SS for the variance
       explained and the SS for the variance unexplained (residual)
       add up to the total SS.

    2) Each sum of squares is divided by the applicable degrees
       of freedom (based on the total number of treatments
       related to the variable under analysis, less one).  This
       results in a mean square.

    3) An F ratio is computed by dividing the resulting mean     
       square for each variable or combination of variables by
       the mean square computed for the residual, representing
       all the remaining variance.     
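The three steps can be sketched for the simplest case, one variable
with three treatments.  The scores are invented for illustration.
The residual degrees of freedom are taken here as the number of
subjects less the number of groups, which is the standard choice for
this design and is an assumption beyond what the steps above state.

```python
def one_way_anova(groups):
    """Sums of squares, mean squares and F ratio for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    n_total = len(all_scores)
    grand_mean = sum(all_scores) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Step 1: sums of squared deviations (explained and residual).
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    # Step 2: divide each sum of squares by its degrees of freedom.
    df_between = len(groups) - 1          # number of treatments, less one
    df_within = n_total - len(groups)     # assumption: subjects less groups
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within

    # Step 3: F ratio of explained to residual mean square.
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "ss_total": ss_between + ss_within,
        "df": (df_between, df_within),
        "F": ms_between / ms_within,
    }

# Hypothetical scores for three treatments, invented for illustration.
result = one_way_anova([[4, 5, 6], [6, 7, 8], [8, 9, 10]])
print(result)
```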

As with the t-test, statistical tables can be found in statistics
books that provide the critical values of F.  However:

     1) Separate F tables exist for each probability level or p-  
        value adopted: Upper 5 Percent Points (p=.05), Upper 1    
        Percent Points (p=.01), etc.

     2) Two different degrees of freedom are reported and used to
        determine the critical value.   Most F statistics are 
        reported as follows:
                          F(1,26).05=3.76

        The first number in the parentheses (v1) represents the
        degrees of freedom for the treatments analyzed.  For a
        single variable, this is the number of treatments less one.
        For a combination of two variables, it is the number of
        treatments for variable 1 less one, times the number of
        treatments for variable 2 less one:
             
        Treatments  1x3  =  2 d.f.   2x2 = 1 d.f.   3x3 = 4 d.f.
                    1x4  =  3 d.f.   2x3 = 2 d.f.   3x4 = 6 d.f.
                    1x5  =  4 d.f.   2x4 = 3 d.f.   3x5 = 8 d.f.

        The corresponding critical value is found in the
        applicable columns of the F-table.
                                     
        The second number in the parentheses (v2) represents the
        degrees of freedom for the residual, i.e. the total number
        of subjects less the number of groups compared, and is
        found in the rows of the F-table.
                      For 10 subjects in 2 groups:  d.f. = 8
                      For 20 subjects in 2 groups:  d.f. = 18 etc.

Given an F-statistic, the same general procedure is used to
determine whether it is significant statistically:  Compare the
F (obtained) to the F (critical) found in the table.  If the value
exceeds the tabled critical value, the difference is significant.

Note:  The t distributions and F distributions are related: A t-statistic is the
equivalent of the square root of F in the case of an ANOVA with only two means
being compared (F1,k). Thus, in the example above, t(8).05,2t=2.306 is the same as
F(1,8).05=5.32 (2.306 squared).   The first d.f. quoted in an F statistic is
understood always to be 1 in the case of a t-test.  Thus, in a t-table, v1 does not
need to be specified and only v2 is used.
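The relationship between t and F can be checked numerically.  In this
sketch the raw scores are invented, chosen so that the means (57 and
52) and the standard error of the difference (2) match the earlier
t-test example.  The pooled-variance formula for the standard error
is the standard one and is an assumption, since the text does not
give it.

```python
from math import sqrt

def pooled_t(a, b):
    """t for two independent groups, using the pooled variance (assumed formula)."""
    n1, n2 = len(a), len(b)
    m1, m2 = sum(a) / n1, sum(b) / n2
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    pooled_var = (ss1 + ss2) / (n1 + n2 - 2)
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se_diff

def f_two_groups(a, b):
    """F ratio from a one-way ANOVA with only two groups (v1 = 1)."""
    scores = a + b
    grand_mean = sum(scores) / len(scores)
    m1, m2 = sum(a) / len(a), sum(b) / len(b)
    ss_between = len(a) * (m1 - grand_mean) ** 2 + len(b) * (m2 - grand_mean) ** 2
    ss_within = sum((x - m1) ** 2 for x in a) + sum((x - m2) ** 2 for x in b)
    ms_between = ss_between / 1
    ms_within = ss_within / (len(scores) - 2)
    return ms_between / ms_within

# Hypothetical raw scores, invented to reproduce the example's summary figures.
group_a = [53, 55, 57, 59, 61]
group_b = [48, 50, 52, 54, 56]

t = pooled_t(group_a, group_b)
f = f_two_groups(group_a, group_b)
print(f"t = {t:.2f}, t squared = {t ** 2:.2f}, F = {f:.2f}")
print(f"Critical t squared = {2.306 ** 2:.2f}")
```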

There are two basic types of ANOVAs:  one-way and factorial.


Comparing Three or More Means For One Variable:  One-Way ANOVA

The One-Way ANOVA allows a researcher to compare three or more
treatments based upon one variable and to determine if they are
statistically different.  Operationally, the F value for a One-Way
ANOVA is computed as a ratio:

           F Value =  MSbet      (Mean square between)
                      _______
                      MSwith     (Mean square within)

where the mean square between is the variance explained by the
variable being investigated, and the mean square within is all
random (unexplained) variance found within the groups of subjects.

A One-Way ANOVA differs from the t-test because a) more than two
means can be compared, but b) only the fact that a difference
exists can be determined.   To determine which mean(s) is (are)
different from the other(s) requires use of multiple comparison
procedures.  Multiple comparisons (often referred to as a priori
or post-hoc comparisons) operate like multiple, simultaneous
t-tests, but are conducted within a desired probability level.


Factorial Designs: General ANOVA Model

Analysis of variance is useful because it allows multiple
variables to be analyzed simultaneously.  In addition, it permits
exploration of interactions between variables or factors. 
Consider this simple example from a study comparing news
and advertising:


Summary Table of Means:

                               Believability Scale 
     (Means based on 7-point scale: 1=not believable, 7=highly believable)    

                                  Content Class
                    Gender               
                               News       Ads     All
                     
                    Females     4.84     4.43     4.64  
                    Males       5.06     4.76     4.91

                     Totals     4.96     4.62     4.79


Instead of a single ratio, multiple F ratios are computed, and
are presented in a single table such as this:  


                                  F Table

                     Sum of      D.F.   Mean       F     Signif-
                     Squares            Square           icance (p)
Main Effects
 Gender               24.658      1     24.658   17.390  .000
 Content Class        38.885      1     38.885   21.284  .000
Interaction
 Gender X Class         .965      1       .965     .528  .468

Explained             64.507      3     21.502   11.700  .000

Residual            2387.803   1301      1.827
(Unexplained)

Total               2441.337   1304      1.872

Fs were computed as follows:
 Gender      (24.658 divided by 1.827) =   F(1,1301).05=17.390, p=.000
 Class       (38.885 divided by 1.827) =   F(1,1301).05=21.284, p=.000
 Interaction (.965 divided by 1.827)   =   F<1, n.s.
Note: 1304 d.f. is based on 1,316 observations, less 11 incomplete cases, 
minus 1 degree of freedom.  
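The arithmetic behind an F table of this kind can be sketched using
the Content Class and interaction rows, together with the residual
mean square and the case counts stated above.  The function name
f_ratio is only illustrative.

```python
MS_RESIDUAL = 1.827   # residual mean square, from the F table

# (sum of squares, degrees of freedom) for two rows of the F table.
effects = {
    "Content Class":  (38.885, 1),
    "Gender X Class": (0.965, 1),
}

def f_ratio(sum_of_squares, df, ms_residual):
    """Mean square (SS / d.f.) divided by the residual mean square."""
    return (sum_of_squares / df) / ms_residual

for name, (ss, df) in effects.items():
    print(f"{name}: F = {f_ratio(ss, df, MS_RESIDUAL):.3f}")

# Total degrees of freedom: observations, less incomplete cases, less one.
observations, incomplete = 1316, 11
total_df = observations - incomplete - 1
print(f"Total d.f. = {total_df}")
```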

A critical issue for researchers relates to the correct
calculation of the error term (residual) used as the denominator
in the F computation.  Slightly different error terms are used
depending on whether a between-subjects or within-subjects design
is used.  Most computer programs require the researcher to
specify the type of design.


Some related procedures:

ANCOVA -- Analysis of covariance follows generally the same
procedure, but first adjusts the scores based on some other
(spurious) variable that is suspected of affecting the results.
The effect of this covariate is netted out before the regular
ANOVA is calculated.  The effects of the covariate are examined
by computing a mean square that is divided by the mean square of
the residual in the same way as a regular ANOVA.  The same kinds
of F ratios and p-values are calculated.

MANOVA --  Multivariate analysis of variance treats several
variables as a single dependent measure, upon which the analysis
of variance is performed.  Most often MANOVA is used when the
several dependent measures are very highly correlated, suggesting
that the effects should be the same on each of the underlying
variables.   

Example:  In consumer behavior research, attitude toward the
advertisement, attitude toward the brand, and purchase intent are
inter-related ideas.  All of the dependent variables are
frequently analyzed as if they were the same, which eliminates the
need to duplicate similar analyses on each.  However, additional v1
degrees of freedom are used when dependent variables are combined
in this procedure.

MANCOVA --  Combines ANCOVA and MANOVA, allowing the researcher
to control for one or more extraneous or spurious variables
before conducting an analysis of variance procedure on multiple
dependent measures.