Unstandardized vs. Standardized Regression Coefficients
Standardization.
The 1981 reader by Peter Marsden (Linear Models in Social Research) contains some useful and readable papers, and his introductory sections deserve to be read (as an unusually perceptive book reviewer noted in the journal Social Forces in 1983). One paper in that collection that has become a standard reference is "Standardization in Causal Analysis" by Kim and Ferree. Standardization, in the social and behavioral sciences, refers to the practice of redefining regression equations in terms of standard deviation units. An ordinary ("raw") regression coefficient b is replaced by b times s(X)/s(Y), where s(Y) is the standard deviation of the dependent variable, Y, and s(X) is the standard deviation of the predictor, X. An equivalent result can be achieved by imagining that all variables in a regression have been rescaled to z-scores by subtracting their respective means and dividing by their standard deviations. This is often referred to as a change of scale or linear transformation of the data.
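The equivalence of the two routes just described can be checked numerically. The following is a minimal sketch, using entirely synthetic data (the sample size, means, and slopes are invented for illustration, not taken from any source cited here): rescaling the raw slope by s(X)/s(Y) gives the same number as regressing z-scored Y on z-scored X.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data, purely illustrative: one predictor X, one response Y.
n = 200
x = rng.normal(10.0, 3.0, n)
y = 2.0 * x + rng.normal(0.0, 5.0, n)

# Raw (unstandardized) least-squares slope b.
b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)

# Route 1: rescale the raw coefficient by s(X)/s(Y).
beta_rescaled = b * np.std(x, ddof=1) / np.std(y, ddof=1)

# Route 2: regress the z-scores of Y on the z-scores of X.
zx = (x - x.mean()) / np.std(x, ddof=1)
zy = (y - y.mean()) / np.std(y, ddof=1)
beta_zscore = np.cov(zx, zy, ddof=1)[0, 1] / np.var(zx, ddof=1)

# Both routes give the same standardized coefficient, which in the
# one-predictor case is simply the Pearson correlation.
print(np.isclose(beta_rescaled, beta_zscore))  # prints True
```

In the single-predictor case the standardized coefficient equals the correlation r, which is one way to see why it is bounded by 1 in absolute value; with several predictors the same rescaling applies coefficient by coefficient.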
Changes of scale are trivial in one sense, for they do not affect the underlying reality or the degree of fit of a linear model to data. Choosing to measure distance in meters rather than feet is a matter of taste or convention, not a matter for the theoretical physicist or statistician to worry about. But since such changes affect the values of numbers, they may have an impact on a naive researcher whose goal is to evaluate "the relative importance of different explanatory variables" or "the relative importance of a given variable in two or more different populations" (Marsden, p. 15). While there are an infinite number of ways to change scales of measurement, the standardization technique is the one most often adopted by social and behavioral scientists. The standardized regression coefficients are often called "beta weights" or simply "betas" in some books and are routinely calculated and reported in SPSS.
Agresti and Finlay (p. 416) illustrate standardization in a model in which the subject's "life events" and "socio-economic status" have been used to predict "mental impairment". The respective coefficients are .103 and -.097, indicating that "there is a .1-unit increase in the estimated mean of mental impairment for every 1-unit increase in the life events score, controlling for SES" (p. 392), compared to a decrease of .097 in estimated mean mental impairment when SES increases by one point and life events are held constant. These two "effects" are hard to compare since the two predictors have entirely different units of measurement. After standardizing, the regression coefficients are .43 and -.45, respectively, and A&F conclude that the two coefficients have similar magnitudes: a "standard deviation increase in X2, controlling for X1" has about the same effect on mental impairment as "a standard deviation increase in X1, controlling for X2", but in the opposite direction.
The attenuation problem also arises in this context, unless the data being used are a simple random sample from the population. If stratified sampling has been used, or if the data are from a designed experiment, the standard deviations of the predictors may not be unbiased estimates of their population analogs. While the unstandardized regression coefficients will usually be good estimates of the population model parameters, the standardized coefficients will not be generalizable and thus are difficult to interpret.
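A short simulation can make this concrete. In the sketch below (synthetic data; the truncation rule |x| > 1 is only a stand-in for a stratified design that oversamples extreme strata), selecting cases on the predictor leaves the raw slope essentially unchanged, while the standardized coefficient is inflated because s(X) in the sample overstates s(X) in the population.

```python
import numpy as np

rng = np.random.default_rng(1)

def slopes(x, y):
    """Return (raw slope, standardized slope) from simple least squares."""
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    return b, b * np.std(x, ddof=1) / np.std(y, ddof=1)

# A large simple random sample stands in for the population.
x = rng.normal(0.0, 1.0, 100_000)
y = 2.0 * x + rng.normal(0.0, 2.0, x.size)

# A non-random sample that keeps only extreme predictor values,
# so s(X) in the sample overstates s(X) in the population.
keep = np.abs(x) > 1.0

b_pop, beta_pop = slopes(x, y)
b_sel, beta_sel = slopes(x[keep], y[keep])

print(round(b_pop, 2), round(b_sel, 2))        # raw slopes: both near 2
print(round(beta_pop, 2), round(beta_sel, 2))  # standardized slopes differ
```

The raw slope survives selection on X because least squares conditions on the predictor; the standardized slope does not, which is the generalizability problem described above.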
Kim & Ferree argued forcefully that routine use of standardized coefficients to solve the problem of comparing apples and oranges is not justifiable, and that it is possible to evaluate relative importance of predictors only when some legitimate common unit of measurement is available for all predictors. Agresti and Finlay (p. 419) warn against using standardized coefficients when comparing the results of the same regression analysis on different groups. Hubert Blalock, of course, had made the same points many years before (see Chapter 8 of his 1971 reader Causal Models in the Social Sciences, which reproduces his 1967 article).
Despite these warnings, social and behavioral science applications of regression analysis in the period 1960–1990 were very likely to use standardized variables. My opinion is that it is only in the last decade that the tide has turned toward analysis that emphasizes measured units and de-emphasizes the goal of comparative effect evaluation.
These issues apply to single-equation regression models, but become even more involved when a multiple-equation causal model is being studied. Early converts to Sewall Wright's path analysis methodology saw as their goal the decomposition of X-Y correlations into direct effects, indirect effects, and effects due to common causes. The Pearson correlations among the variables served as the raw data for such analyses, and the path coefficients used in the decomposition of effects were standardized regression coefficients. Standardization was taken for granted, not considered a problematic step in the research process. (See Agresti and Finlay, Section 16.2 for an example.)

To summarize, correlations (whether r or R) can be considered as characteristics of a population as well as descriptions of a sample. Non-random samples will not necessarily provide good estimates of these correlations. Under such circumstances standardized regression coefficients, R-squares, and "path coefficients" computed from the sample data in routine ways may not be good estimates of the population phenomena the researcher is seeking to understand. The aforementioned reviewer of Marsden's reader, noting that some of the articles in the book used data from designed experiments or non-simple random samples, made exactly this point.
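The decomposition that early path analysts pursued can be sketched for the smallest interesting case: a three-variable model X -> Z -> Y with a direct X -> Y path. The path coefficients below are hypothetical, chosen only for illustration. With all variables standardized, the normal equations guarantee that the X-Y correlation splits exactly into a direct effect plus an indirect effect through Z.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000

# Hypothetical standardized path model: X -> Z -> Y plus a direct X -> Y path.
# The path coefficients a, b, c are invented for illustration.
a, b, c = 0.5, 0.3, 0.4
x = rng.normal(size=n)
z = a * x + np.sqrt(1 - a**2) * rng.normal(size=n)
y = b * x + c * z + np.sqrt(1 - b**2 - c**2 - 2 * a * b * c) * rng.normal(size=n)

# Path coefficients are standardized regression coefficients:
# p_zx from regressing Z on X, (p_yx, p_yz) from regressing Y on X and Z.
p_zx = np.corrcoef(x, z)[0, 1]
preds = np.column_stack([x, z])
zpreds = (preds - preds.mean(0)) / preds.std(0, ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
p_yx, p_yz = np.linalg.lstsq(zpreds, zy, rcond=None)[0]

# The X-Y correlation decomposes exactly into direct + indirect effects.
r_xy = np.corrcoef(x, y)[0, 1]
direct = p_yx
indirect = p_yz * p_zx
print(round(direct, 2), round(indirect, 2), round(r_xy, 2))
```

The identity direct + indirect = r_xy holds exactly in the sample, which is precisely why the decomposition seemed so natural; whether the sample correlations estimate anything about the population is the question the paragraph above raises.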