The Pure Characteristics
Discrete Choice Model
with Application to Price Indices
still preliminary
Steven Berry
Dept. of Economics, Yale
and NBER
Ariel Pakes
Dept. of Economics, Harvard
and NBER
June 12, 2001
Abstract
In this paper we consider a class of discrete choice models in which consumers
care about a finite set of product characteristics. These models have been
used extensively in the theoretical literature on product differentiation, but
have not as yet been translated into a form that is useful for empirical work.
Most recent econometric applications of discrete choice models implicitly
let the dimension of the characteristic space increase with the number of
products. The models in this paper have very different theoretical properties.
After developing those properties and comparing them to the properties of
models where there is a taste for the product per se, we provide an algorithm
for an estimator for the parameters of our model. In this version of the
paper, we particularly consider how the modeling choices discussed in the
paper affect the calculation of ideal consumer price indices, especially in the
presence of new goods.
1 Introduction.
Discrete choice models have recently gained importance in the study of new
goods and of differentiated products oligopoly. These models allow for a
parsimonious treatment of demand, via Lancaster’s (1971) idea that products
be treated as bundles of characteristics. Utility is then defined over a limited
set of characteristics rather than a potentially very large number of products.
Griliches’s (1961) work on hedonic price functions revived empirical work
on models based on characteristics. Hedonic price functions were introduced
as a way of accounting for “quality change” in the prices of new goods. The
reasoning was that since newer models of goods often had more desirable
characteristics, the difference between the prices of the newer and the older
models should not be entirely attributed to inflation. On the other hand,
if we build our price indices entirely from inter-period price comparisons
of goods sold in both periods, that is if we never compare “old” to “new”
goods directly, we will never capture the effect switching to new goods has on
welfare, and this will bias price index calculations upward. Griliches suggests
estimating a surface which relates prices to characteristics, and then using
the estimated surface to obtain estimates of “quality adjusted” price changes
for products with given sets of characteristics. This suggestion leads to a
lower bound for the benefits from new goods (see Pakes 1998), but to go
further than this we need a more complete analysis of characteristics based
demand systems.
Rosen (1974) introduced a class of equilibrium hedonic models with a
continuum of products and with perfect competition on the supply side.
We will start instead from the literature that uses discrete choice models of
demand and which often assumes oligopoly pricing.
McFadden’s work on discrete choice models (e.g. McFadden 1974) in-
troduced feasible techniques for estimating a complete characteristics-based
discrete choice model of demand. However, very particular assumptions were
needed and the literature that followed (including many contributions by Mc-
Fadden himself) has been concerned that the structure of a specific discrete
choice model would in some way restrict the range of possible outcomes,
thereby providing misleading empirical results. This paper is a continuation
of that tradition. In contrast to the typical empirical model, we consider es-
timation of a class of discrete choice models in which consumers care about a
finite set of product characteristics. These models have been used extensively
in the theoretical literature on product differentiation, but have not as yet
been translated into a form that is useful for empirical work.
Typical discrete choice empirical models implicitly assume that the di-
mension of the product space increases with the number of products. This
assumption is often embedded in an otherwise unexplained i.i.d. additive
random term in the utility function. This term might be interpreted as a
direct “taste for the product”, as opposed to a taste for the characteris-
tics of the products. In many cases, such models probably do a good job
of approximating demand, but we worry because these models have some
counter-intuitive implications as the number of products increases. Thus,
they might not do a great job of answering questions that are specifically
about changes in the number of goods – one example being the evaluation
of the benefits from introducing new goods into the market.
We begin by explaining why we might want to use a “pure characteris-
tics model”, with a finite set of product characteristics and no tastes for the
products themselves. We then develop some of the properties of our model.
These properties enable us to build an algorithm for estimating the pure
characteristics model. The paper provides some Monte Carlo evidence
both on the estimation of utility parameters and on the use of those param-
eters to construct price indices. We conclude with a discussion of proposed
empirical work on the personal computer industry.
2 Discrete Choice Models and Empirical Work.
We consider models in which each consumer chooses to buy at most one
product from some set of differentiated products. Consumer i’s (indirect)
utility from the purchase of product j is
Uij = U(Xj, Vi, θ), (1)
where Xj is a vector of product characteristics (including the price of the
good), Vi is a vector of consumer tastes and θ is some vector of parameters.
Probably the earliest model of this sort in the economic literature is the
Hotelling (1929) model of product differentiation on the line. In that model
X is the location of the product and V is the location of the consumer.
Subsequent applications were concerned both with demand conditional on
product locations and the determination of those locations.
To obtain the market share, sj, of good j, we simply add up the number
of consumers who prefer good j over all other goods. That is
sj = Pr{Vi : U(Xj, Vi, θ) > U(Xk, Vi, θ), ∀k ≠ j}. (2)
To make the transition to empirical work easier we follow Berry, Levinsohn
and Pakes (1998) and partition the vector of consumer attributes, Vi, into zi,
which an econometrician with a micro data set might observe, and νi, which
the econometrician does not observe. We also partition product characteris-
tics, Xj, into xj, which is observed by the econometrician, and ξj, which is
not. All market participants are assumed to have perfect information.
Typically, empirical studies write the utility function as additively sepa-
rable in a deterministic function of the data and a disturbance term
Uij = f(Xj, zi; θ) + µij, (3)
where θ is a parameter to be estimated. A natural interpretation of (3) in
terms of (1) is to think of the µij as resulting from interactions between
unobserved consumer tastes (the ν) and the product characteristics X (both
observed and unobserved). The specification of the model in (3) is completed
by making a detailed set of assumptions on the joint distribution of the
{µij, Xj, zi} tuples.
For example, if there were K product characteristics then one might spec-
ify
µij ≡ ∑_{k=1}^{K} νik Xjk, (4)
and assume a parametric distribution for ν conditional on (X, z). We would
then have a “random coefficients” model. The observations on (x, z) and the
distributional assumption on the ν (together with either product dummies
or a distributional assumption on the unobserved ξ component of X) would then
generate a joint distribution for {µij, Xj}.
However, it is hard to interpret the typical specifications used in empirical
work in this way. Empirical work typically assumes that µij contains an i.i.d.
(across products and consumers) additive component that has support on
the entire real line. This i.i.d. component ensures that the distribution of
random utilities in turn has full support on R^J (where J is the number of
products) no matter what characteristics and prices define the products. It
is not important that there is literally an i.i.d. component in the model, but
rather that the µ contain an additive component with the property that its
density, conditional on the realizations of the additive components for the
other products, is positive on the entire real line. Then for every possible set
of products, there will always be some consumers who like any given product
“infinitely” more than the others. Familiar examples of specifications that
have additive components with full support include the random coefficient
logit model discussed in Berry, Levinsohn and Pakes (1995) as well as the
random coefficient probit (see Hausman and Wise 1978, McFadden 1981).
To generate an additive component with full support from a specification
like (4), we have to make the dimension of the characteristic space, K, be a
function of the number of products. Caplin and Nalebuff’s (1991) suggestion
is to think of the additive component as being formed from the interaction
of a set of product-specific dummy variables and a set of i.i.d. tastes for each
product. We will refer to this class of models as including “tastes for
products,” as opposed to just tastes for product characteristics. Though this
assumption does justify empirical work, it contradicts the spirit of the liter-
ature on characteristic based demand models (which focus on demand and
product location in a given characteristic space), and has several questionable
implications (as outlined in the next subsection).
On the other hand, models that include a taste for products have a num-
ber of important practical advantages. In particular the additive component
with full support ensures that all the purchase probabilities are nonzero (at
every value of the parameter vector), and have particularly simple derivatives.
This makes most estimation algorithms, particularly maximum likelihood,
relatively easy to implement. Further, the additivity of this disturbance sim-
plifies the limits of integration for the integrals defining the needed shares.1
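As an illustration of these practical advantages, the following is a minimal sketch for the plain logit case only. It assumes mean utilities δj and an outside good with utility normalized to zero; the function names and numerical values are hypothetical and are not taken from this paper. Every share is strictly positive, however unattractive the product, and the derivatives of the shares with respect to the mean utilities have the closed form sj(1[j = k] − sk).

```python
import numpy as np

def logit_shares(delta):
    """Logit shares for mean utilities delta; the outside good has utility 0."""
    m = max(0.0, float(np.max(delta)))      # shift for numerical stability
    num = np.exp(delta - m)
    return num / (np.exp(-m) + num.sum())

def logit_share_jacobian(delta):
    """d s_j / d delta_k = s_j * (1[j == k] - s_k)."""
    s = logit_shares(delta)
    return np.diag(s) - np.outer(s, s)

# Hypothetical mean utilities; the third product is very unattractive.
delta = np.array([1.0, -2.0, -30.0])
s = logit_shares(delta)
jac = logit_share_jacobian(delta)

print("shares:", s)                 # all strictly positive
print("jacobian diagonal:", np.diag(jac))
```

The closed-form Jacobian is what makes gradient-based estimation routines straightforward in this class of models.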
2.1 Properties of Empirical Models.
Recall that models with tastes for products behave as if every new good is
introduced with its own characteristic, which has full support and is valued by
consumers independently of the characteristics of all other products.
This is the source of the familiar “red-bus blue-bus” problem. When we in-
troduce a new product that is virtually identical to an existing product (say
product “A”) we expect the combined market shares of the new product and
product “A” to be approximately the same as was the market share of
product “A” before we introduced the new product. The fact that we introduce
a “new characteristic” with the new product ensures that the model
will predict a larger combined share than that (consumers who value the new
characteristic enough will switch from products other than “A” to the new
product). It should not be surprising then that we are most worried about
the implications of the model when we use it to compare situations which
involve different numbers of products.
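The “red-bus blue-bus” argument can be checked numerically. The sketch below is only an illustration under stated assumptions: the product qualities, prices, the common logit price coefficient of one, and the lognormal distribution of price sensitivities are all hypothetical. It adds an exact clone of product “A” and compares the combined share of “A” and its clone before and after, once in a logit model and once in a model in which utility depends only on quality and price.

```python
import numpy as np

# Hypothetical products: index 0 is the outside good, 1 is product B, 2 is product "A".
quality = np.array([0.0, 1.0, 2.0])
price = np.array([0.0, 0.5, 1.5])
A = 2

def with_clone(values, j):
    """Append an exact copy of product j."""
    return np.append(values, values[j])

def logit_shares(v):
    """Logit shares: an i.i.d. extreme value taste is added to every product."""
    e = np.exp(v - v.max())
    return e / e.sum()

def pure_characteristics_shares(quality, price, alpha):
    """Shares when u_ij = quality_j - alpha_i * price_j and nothing else."""
    u = quality[None, :] - np.outer(alpha, price)
    choice = u.argmax(axis=1)  # exact ties go to the lowest index
    return np.bincount(choice, minlength=len(quality)) / len(alpha)

# Logit with a common price coefficient of 1 (an assumption).
v = quality - 1.0 * price
logit_before = logit_shares(v)[A]
logit_after = logit_shares(with_clone(v, A))[[A, -1]].sum()

# Pure characteristics model with simulated price sensitivities (an assumption).
rng = np.random.default_rng(0)
alpha = rng.lognormal(mean=0.0, sigma=1.0, size=50_000)
pure_before = pure_characteristics_shares(quality, price, alpha)[A]
pure_after = pure_characteristics_shares(
    with_clone(quality, A), with_clone(price, A), alpha
)[[A, -1]].sum()

print(f"logit: {logit_before:.3f} -> {logit_after:.3f}")
print(f"pure characteristics: {pure_before:.3f} -> {pure_after:.3f}")
```

In the logit case the combined share rises because the clone arrives with its own taste draw. Without that draw the clone gives every consumer exactly the utility of “A”, so the combined share cannot change; how the tie is split between the two is arbitrary (here it goes to the lower index).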
Our biggest worry, and a leading reason to consider the model introduced
in this paper, concerns the implications of models with tastes for products
for the welfare gains from product introductions. This is because such models
ensure that there will be consumers who like the new good infinitely more
than any of the previously existing products (independent of either the observed
characteristics of the new good, or of the relationship between those
characteristics and the characteristics of the products already marketed).
1 See McFadden (1981) for a discussion of the transformation that maps the probit
model’s region of integration into the positive orthant.
We hasten to add here that similar problems plague the other demand
models that have been used to evaluate new products. This is because ob-
taining the change in welfare that results from the introduction of a product
requires one to integrate over the marginal utility gains from every unit of the
product bought. At best the data can identify the marginal utilities that observed
price movements sweep over. The marginal utility gains from units
that were purchased at all observed prices are obtained by extrapolating the
estimated demand system into a region for which there is no data (for a more
detailed discussion of this problem, including how it manifests itself in de-
mand systems estimated in product, in contrast to characteristic, space see
Pakes (1998) and also Hausman (1997), who reports that infinite benefits are
implied by some of his demand specifications). Welfare analysis of product
introductions is one of the more important uses of demand systems. Still,
because this type of analysis necessarily involves imputing benefits outside
the range of the data, when we engage in it we should be particularly careful
about the implications of the model’s assumptions.
Discrete choice systems have been applied to analyzing the benefits from
new goods at least since the CT scanner study of Trajtenberg (1989). Pakes,
Berry and Levinsohn (1993) go one step further and use the discrete choice
system estimated by Berry, Levinsohn and Pakes (1995) to compute an
“ideal” price index for autos that accounts for new good introduction, but
Petrin (2000), in his study of minivans, shows how the importance of the
additive component can drive the welfare implications of these models and
implicitly raises a cautionary note for all of these studies. Petrin tries to
reduce the impact of the additive component by using household-level data.
This gives more room for differences in preferences for observable attributes
to explain choices and lessens the impact of the additive component (a simi-
lar strategy is used in Berry, Levinsohn and Pakes (1998)). In this paper we
provide an alternative procedure: do away with the additive component entirely.
At the very least we hope our alternative will give some indication of the
robustness of the results from these studies to the presence of the additive
component with full support.
The additive component also has other implications that are suspect.
Assume that the appropriate model has a finite-dimensional characteristics
space (no “tastes for products”). Then if we held the environment constant
we would expect the space itself to “fill up” as the number of products grew
large.2 This has two implications that are at odds with the model with
additive components with full support.
2 The caveat on the environment is to rule out either technological changes, or changes
First, as the number of products increases (holding population fixed)
products will become increasingly good substitutes for one another and oligopolis-
tic competition will approach the competitive case, with prices driven toward
marginal cost. In models with additive components with full support there
are always some consumers with a nearly infinite preference for each product.
As a result as more goods are added markups do not generally go to zero but
are bounded from below by some positive constant (a similar point is made
by Anderson, DePalma and Thisse (1992) in the context of the logit model).
This fact might lead us to worry about the implications of the model with
the additive components on the incentives for product development, at least
in markets with large numbers of products.
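The lower bound on markups can be made explicit in the simplest case. The derivation below is a sketch under assumptions that are not taken from the text: a plain logit with mean utility δj net of price, a common price coefficient α > 0, single-product firms with constant marginal cost cj, and Nash pricing.

```latex
% Assumptions (ours): plain logit, single-product firms, constant marginal
% cost c_j, common price coefficient \alpha > 0, Nash equilibrium in prices.
\begin{align*}
s_j &= \frac{\exp(\delta_j - \alpha p_j)}{1 + \sum_{k=1}^{J} \exp(\delta_k - \alpha p_k)},
\qquad
\frac{\partial s_j}{\partial p_j} = -\alpha\, s_j (1 - s_j), \\
0 &= s_j + (p_j - c_j)\,\frac{\partial s_j}{\partial p_j}
\;\Longrightarrow\;
p_j - c_j = \frac{1}{\alpha (1 - s_j)} \;\ge\; \frac{1}{\alpha} > 0 .
\end{align*}
```

As J grows and each share sj falls toward zero, the markup in this special case falls toward 1/α rather than toward zero.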
Second, pure characteristics models with finite marginal preferences for
each characteristic imply that the benefits that a consumer can gain from
consuming a single product from the given market are bounded (no matter
the number of products marketed). As we increase the number of products
in a model with an additive component with full support we ensure that each
consumer’s benefits grow without bound. This might lead us to worry about
the implications of the model with additive components on estimates of the
benefits to “variety”.
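A small numerical sketch of this contrast follows. It assumes a plain logit with i.i.d. type 1 extreme value tastes, for which the expected maximum utility is Euler's constant plus the log of the sum of exponentiated mean utilities. The mean utilities are hypothetical draws from the unit interval, and prices are ignored (utilities should be read as net of price).

```python
import numpy as np

EULER_GAMMA = 0.5772156649015329

def logit_expected_max(delta):
    """E[max_j (delta_j + eps_ij)] with i.i.d. type 1 extreme value eps."""
    m = delta.max()
    return EULER_GAMMA + m + np.log(np.exp(delta - m).sum())

rng = np.random.default_rng(0)
sizes = [1, 10, 100, 1000, 10000]
logit_benefit = []
pure_benefit = []
for J in sizes:
    # Hypothetical mean utilities, bounded in [0, 1].
    delta = rng.uniform(0.0, 1.0, size=J)
    logit_benefit.append(logit_expected_max(delta))
    pure_benefit.append(delta.max())  # no additive taste: best available product

for J, lb, pb in zip(sizes, logit_benefit, pure_benefit):
    print(f"J={J:6d}  logit={lb:7.3f}  pure characteristics={pb:5.3f}")
```

With the additive component the expected benefit grows roughly like log J, while without it the benefit can never exceed the upper bound on the mean utilities.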
2.2 Finite Dimensional Models
Here we provide a brief review of the literature on pure characteristic mod-
els. The theoretical literature on these models includes the Hotelling model
of competition on a line, the “vertical” model of product differentiation of
Mussa and Rosen (1978) (see also Gabszewicz and Thisse (1979) and Shaked and
Sutton (1982)) and Salop’s (1979) model of competition on a circle. In all
these models, demand is determined by the location or “address” of the prod-
ucts in the characteristics space and an exogenous distribution of consumer
preferences over this space. As the number of products increases, the product
space fills up, with products becoming very good substitutes for one another.
The vertical model was first brought to data by Bresnahan (1987) and has
been subsequently used by a few others including Greenstein (1996). Since
we will want to explicitly allow for unobserved product characteristics we use
a specification due to Berry (1994)
uij = Xjβ − αipj, (5)
in competing and complementary products, which alter the relative benefits of producing
in different parts of the characteristic space.
where the unobserved and observed characteristics both enter through Xj =
(xj, ξj). In this model the quality of the good is the single index
δj ≡ Xjβ (6)
and its value increases over the entire real line. All consumers agree on
this quality ranking. The reason consumers differ in their choices is that
different consumers have different marginal utilities of income (this generates
the differences in their coefficients on price).
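The vertical model in (5) and (6) can be simulated in a few lines. The sketch below makes assumptions that are not in the text: an outside good with utility normalized to zero, a lognormal distribution for αi, and hypothetical values for δj and pj. Because all consumers agree on the quality ranking, a product with lower quality and a higher price than some rival is chosen by no one.

```python
import numpy as np

def vertical_shares(delta, price, alpha):
    """Shares when u_ij = delta_j - alpha_i * p_j; index 0 is the outside good."""
    n = len(alpha)
    u_inside = delta[None, :] - np.outer(alpha, price)
    u = np.hstack([np.zeros((n, 1)), u_inside])  # outside good: utility 0 (assumption)
    choice = u.argmax(axis=1)
    return np.bincount(choice, minlength=u.shape[1]) / n

# Hypothetical qualities and prices. Product 3 has lower quality and a higher
# price than product 2, so it is dominated for every alpha > 0.
delta = np.array([1.0, 2.0, 1.5])
price = np.array([0.5, 1.5, 1.6])

rng = np.random.default_rng(0)
alpha = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

shares = vertical_shares(delta, price, alpha)
print("outside, product 1, product 2, product 3:", shares)
```

In this example consumers sort by αi: those with αi below 1 buy the high quality product, those between 1 and 2 buy the cheaper one, and the rest choose the outside good.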
Other examples of pure characteristics models are given in Caplin and
Nalebuff (1991) and Anderson et al. (1992). These include the “ideal point”
models in which consumers care about the distance between their location
(νi) and the products’ location (Xj) in R^k:
uij = ‖Xj − νi‖ − αipj, (7)
where ‖ ·‖ is some distance metric. A special case of this is Hotelling’s model
of competition on the line.
If we interpret ‖ · ‖ as Euclidean distance, expand (7) and eliminate
individual specific constant terms that have the same effect on all choices
(since these do not affect preference orderings), this last specification becomes
uij = Xjβi − αipj. (8)
Equation (8) is a pure characteristics random coefficients model. It allows
consumers to differ in their tastes for different product characteristics, in
addition to differences in their marginal utility of income. Note that unlike
the standard random coefficients model used in the econometric literature,
(8) does not have tastes for the products, aside from the tastes for the X’s
themselves.
2.2.1 A Finite Dimensional Model for Empirical Work.
We will investigate models which are special cases of (8). The extra con-
straint we impose on this model is that there be only one unobserved product
characteristic; i.e. X = (x, ξ) ∈ R^k × R^1. If, in addition, we constrain the
coefficient of the unobserved characteristic ξ to be the same for all consuming
units, so that
uij = xjβi − αipj + ξj, (9)
our model becomes identical to the model in Berry, Levinsohn and Pakes
(1995) without their additive component with full support. If we allow for
coefficients on ξ which vary over consumers, so that
uij = xjβi − αipj + λiξj, (10)
where λi is an additional random coefficient, then our model is identical
to the one in Das, Pakes and Olley (1995), but without an i.i.d. additive
component.
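To fix ideas, the shares implied by specification (9) can be approximated by simulating consumers and recording their choices. The sketch below is an illustration only and is not the estimation algorithm of this paper. The outside good with utility zero, the normal and lognormal distributions of the random coefficients, and all numerical values are assumptions.

```python
import numpy as np

def pure_characteristics_shares(x, p, xi, beta_draws, alpha_draws):
    """Simulated shares for u_ij = x_j beta_i - alpha_i p_j + xi_j (no additive taste).

    x: (J, K) observed characteristics, p: (J,) prices, xi: (J,) unobserved
    characteristic, beta_draws: (n, K), alpha_draws: (n,).
    Index 0 of the result is the outside good, normalized to utility 0 (assumption).
    """
    n = len(alpha_draws)
    u_inside = beta_draws @ x.T - np.outer(alpha_draws, p) + xi[None, :]
    u = np.hstack([np.zeros((n, 1)), u_inside])
    choice = u.argmax(axis=1)
    return np.bincount(choice, minlength=u.shape[1]) / n

rng = np.random.default_rng(0)
n = 20_000
# Hypothetical products: product 3 has the same x and xi as product 2 but a higher price.
x = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 1.0]])
p = np.array([1.0, 1.0, 1.2])
xi = np.array([0.5, 0.5, 0.5])
beta_draws = rng.normal(loc=[1.0, 1.0], scale=[1.0, 1.0], size=(n, 2))
alpha_draws = rng.lognormal(mean=0.0, sigma=0.5, size=n)

shares = pure_characteristics_shares(x, p, xi, beta_draws, alpha_draws)
print("outside, product 1, product 2, product 3:", shares)
```

Unlike a model with an additive component with full support, this specification can predict a share of exactly zero: product 3 is never chosen because product 2 offers the same characteristics at a lower price.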
Our previous work has emphasized the reasons for (and the empirical
importance of) allowing for unobserved product characteristics in estimat-
ing discrete choice demand systems (see, in particular, Berry (1994), Berry,
Levinsohn and Pakes (1995) and Berry, Levinsohn and Pakes (1998)). Those
papers note inconsistencies in estimation techniques that do not allow for
unobserved product characteristics, explain the bias in price elastic