*! version 1.1.0 - April 2006 Mike Lacy. * Many helpful modifications from Richard Williams ***************************************************************** * April 2006: Changed display of results so as to scale them * by 1/N. They are thus analogous to variances rather than sums of squares * and match what is reported in the article. * No standard Stata help file is available yet. * See Comments and "Usage" below. ***************************************************************** * capture program drop R2o program define R2o, rclass version 7 * This program calculates the ordinal R2o measure and its adjusted * version as described in: * Lacy, Michael G. 2006. "An Explained Variation Measure for * Ordinal Response Models with Comparisons to Other Ordinal R2 * Measures." Sociological Methods and Research 32,4. * * The program below presumes that a categorical response model (ologit * mlogit oprobit gologit2) that can yield predicted probabiliites * via -predict- has just been run and that its return list is still * intact. * * The "trustme" option can be used to force the program to run even if * the user has not just executed one of preceding response model programs. * This option would be relevant for the user who has a nonstandard response model * program that works with -predict-. * * Usage: R2o [, trustme] * Returns: The regular R2o and its bias adjusted version in * r(R2o) and r(uR2o) * * Begin program * Detect whether an appropriate categorical response estimation command has been run syntax [, TRustme] tempname ok scalar `ok' = 0 local clist = "ologit mlogit oprobit gologit2" foreach c of local clist { if !`ok' & e(cmd)== "`c'" { scalar `ok' = 1 } } * User can force the program to work with his or her command if "`trustme'"!="" { scalar `ok' = 1 } if !`ok' { display as error "Preceding estimation command must be able to generate predicted probabilities." display as error "Command -R2o- not executed" exit } * Else, we've got a response model that gives predicted probs., so we can go ahead. * First capture stuff from the response model just run. * This will be used to yield the predicted Prob(Y), given the X vector local depvar = "`e(depvar)'" /* name of dependent variable */ local kcat = e(k_cat) /* # response categories */ tempname numcovar N scalar `numcovar' = e(df_m) /* number of covariates */ scalar `N' = e(N) /* Sample size from model just run */ * * Get the weighting information, if any * Need to change pweights to aweights so tabulate & sum can handle them if "`e(wtype)'"=="pweight"{ local wgt "[aweight`e(wexp)']" } else if "`e(wtype)'"!="" { local wgt "[`e(wtype)'`e(wexp)']" } * Or, if you just want to kill support for weights altogether, uncomment these lines. *if "`e(wtype)'"!=""{ * display as error "Weights are not supported" * exit *} tempvar touse mark `touse' if e(sample) /* mark the cases used to yield the prediction */ * * Obtain total variation of the ordinal response variable from its marginal distribution. * The marginal distribution comes from a simple tabulation, and then * the ordinal variation measure is calculated on this distribution. di " " tempname F /* to eventually hold a vector of predicted cum probabilities*/ display as text "Marginal distribution for cases in the estimation sample." tabulate `depvar' if `touse' `wgt', matcell(`F') * Marginal frequencies are now in the vector F, ordered from 1 to r(r). * Convert them to cumulative relative frequencies * matrix `F'[1,1] = `F'[1,1]/`N' forvalues ir = 2/`r(r)' { matrix `F'[`ir',1] = (`F'[`ir'-1, 1]) + `F'[`ir', 1]/`N' } * * Now calculate the marginal variation, SY, by applying ordinal * variation function, Sum (F_j(1-F_j), j = 1..k to the marginal * cum F distribution. * The sum actually needs only to go to `r(r)' -1 but since F_r(r) = 1, * there is no harm in running up to r(r). tempname SY scalar `SY' = 0 forvalues ir = 1/`r(r)' { scalar `SY' = `SY' + ( `F'[`ir',1] * ( 1.0 - `F'[`ir',1])) } scalar `SY' = `SY' matrix drop `F' /* avoid confusion later with another F */ * * * Start on conditional variation SYX, i.e., the variation of Y given X . * Create list of temporary variables, `F1'... to hold predicted * probabilities for each case as created by the response model * the user just executed. local Flist = "" forvalues i = 1/`kcat' { tempvar F`i' local Flist = "`Flist'`F`i'' " /* `F1' `F2' ... */ } quietly predict `Flist' if `touse' /* predicted prob for each case into F1, ... */ * Cumulate the predicted probabilities for each case tempvar prev gen `prev' = `F1' forvalues i = 2/`kcat' { quietly replace `F`i'' = `prev' + `F`i'' /* double quote for tempvar and i */ quietly replace `prev' = `F`i'' } * The sum actually needs only to go to `kcat'-1, but since F_kcat = 1, * there is no harm in running up to kcat. tempvar iSYX gen `iSYX' = 0 forvalues i = 1/`kcat' { quietly replace `iSYX' = `iSYX' + `F`i'' * (1.0 - `F`i'') } * * Mean of the iSYX across all cases is the SYX sum `iSYX' if `touse' `wgt', meanonly tempname SYX scalar `SYX' = r(mean) tempname R2o uR2o scalar `R2o' = 1.0 - `SYX'/`SY' /* the basic ordinal R^2 measure */ return scalar R2o = `R2o' * * Now do the bias corrected version of R2o, calculated like * the adjusted version of a conventional R^2. `numcovar' denotes the * number of covariates used in the response model scalar `uR2o' = 1.0 - (`SYX'/`SY') * ((`N'-1)/(`N'- `numcovar' -1)) return scalar uR2o = `uR2o' * display "" display as text /// "Total Model Error Lacy Bias Adj." _newline /// "Variation Variation Variation R2o R2o" _newline /// _dup(58) char(175) _newline /// as result /// %-12.6f `SY' %-12.6f (`SY' - `SYX') %-12.6f `SYX' %-12.5f `R2o' %-12.5f `uR2o' * end