The Box-Jenkins Forecasting Technique
Joseph George Caldwell, PhD
January 1971
(Reformatted September 2006)
© 1971, 2006 Joseph George Caldwell. All Rights Reserved.
Posted at Internet websites http://www.foundationwebsite.org and http://www.foundation.bw.
Contents
3. Some Classical Time Series Models
B. Forecasters Derived from Tested Mathematical Models
A. Stochastic Box-Jenkins Models
2. Derivation of the Optimal Forecaster
3. Simulation Using a Stochastic Box-Jenkins Model
B. Stochastic-Dynamic Box-Jenkins Models
2. Identification of Stochastic-Dynamic Model Structure
3. Process Control Applications
Note: This document (The Box-Jenkins Forecasting Technique), posted at
http://www.foundationwebsite.org/BoxJenkins.htm, presents a nontechnical
description of the Box-Jenkins methodology. For a technical description of the
Box-Jenkins approach, see the document TIMES Box-Jenkins Forecasting System,
posted at http://www.foundationwebsite.org/TIMESVol1TechnicalBackground.htm.
The operations
of most businesses are continually affected by events beyond the control of the
management. Some of these events are
unanticipated or of such importance that they require special management
consideration to determine the response to the new situation. New legislation, a large fire,
or the introduction of a significant new competitive product are examples of
changes in the environment which will require custom-tailored analysis by
management. On the other hand, there are
in the course of normal operations a tremendous number of events that cannot be
controlled or predicted in detail, but whose occurrence is nevertheless
anticipated. For example, product
demand, raw material costs, and interest rates are continually changing. There is uncertainty associated with the
exact magnitude of the variation in these items, but such variation represents
a normal part of the environment. So
long as the environment remains essentially unchanged, this variation is
expected, and the response to this variation should be automatic, not requiring
unusual management analysis. The
effectiveness of management response to this variation often depends to a great
degree on the extent to which it can reduce the uncertainty about the magnitude
and direction of the variation. For
example, if a company can improve its ability to predict product demand, it can
likely lower its inventory requirements or improve its scheduling
efficiency. While it is often desirable
to predict long-term changes in the environment, a very basic problem in the
routine business operation is that of short-term forecasting: predicting
variation in the near future, under the assumption that the essential nature of
the variability will continue as in the past (or in some other specified
fashion). Management's solution to
problems of this sort represents its standard response to the variability
problem; its analysis of and response to long-term changes in the environment will
no doubt require special attention.
The term
"near future" used above may refer to several days, months, or years
into the future, depending on the time frame of the application. The number of days, months, or years for which
we are potentially able to significantly reduce the variability will, of
course, depend on the nature of the process we are studying. For example, in forecasting monthly product
sales, we may be able to forecast well only a few months ahead if the product
is nonseasonal, or a year or two ahead if the sales exhibit strong annual
seasonal behavior.
Since
short-term forecasting problems are concerned with prediction of variation
under the assumption (explicit or not) that the essential nature of the process
will continue, it is appropriate to try to determine fixed rules to predict the
near future from the recent past. We
refer to any forecasting procedure that is a specified function of observed
data as a mathematical forecaster. Quite a number of short-term forecasting
procedures have been developed in the past.
Depending on the nature of the problem, these procedures have ranged
from very simple smoothing to the use of formulas based on elaborate
econometric models. For example, if a
company is forecasting monthly sales of hundreds of products as part of its
inventory control system, forecasting procedures (or "forecasters"
for short) having low data and computational requirements are appropriate. Alternatively, the same company might use a
detailed econometric model and sample survey data to obtain short-term
forecasts of the highest possible accuracy for the cost of a key raw material
or for total quarterly sales.
In any event,
no matter what type of forecaster is appropriate, it is desirable to obtain the
best possible forecast for the amount of effort expended. Many short-term forecasters in wide use fall
far short of providing the maximum accuracy possible, given the information on
which they are based and the computational efforts they require. If little or no historical data are
available, then it is quite possible for the errors of judgment in choosing a
short-term forecaster to be smaller than the random errors associated with
estimating a forecaster from data. If
extensive historical data are available, however, it becomes possible to derive
a mathematical forecaster based on this data which can predict better than a
mathematical forecaster that is chosen by judgment. Since the business situations in which
short-term forecasting is of concern generally continue over long periods of
time, or for a number of similar items, such data are often available.
This article
describes a new technique, developed by Profs. G.E.P. Box and G.M. Jenkins, that
enables the development from historical data of forecasters that have both high
accuracy and low computational requirements.
The technique may be applied to quickly determine forecasters that are
as uncomplicated in form as the simple smoothing methods, or that involve a
number of economic variables. In either
case, use of the technique enables efficient utilization of the predictive
information contained in the data and offers assurance of obtaining the highest
forecasting accuracy possible in terms of the variables on which the forecast
is based.
Although the
Box-Jenkins model first appeared in book form (Reference 2) in 1967, the
business forecasting community seems still largely unaware of the potential of
the method. This situation is perhaps
understandable since published applications of the technique have appeared, to
this author's knowledge, only in technical journals (e.g., Reference 9). The situation has improved somewhat with the
publication of Box and Jenkins' book (Reference 1) in late 1970. A recent Harvard
Business Review article (Reference 4) includes the Box-Jenkins technique in
a discussion of the problem of selecting the appropriate forecasting techniques
for various applications. There is no
question, however, that business awareness of the Box-Jenkins forecasting
method is much less than the importance of forecasting problems would
warrant. This article is intended to
help increase this awareness.
In addition to
describing the Box-Jenkins technique, this article briefly describes some
previous forecasting techniques and cites some comparisons between the
forecasting ability of the old and new techniques. The article avoids mathematical detail, but
does involve discussion of certain technical concepts considered fundamental to
the problem of developing good forecasters.
Two approaches
have been used in the past to develop forecasters from past observations. On the one hand, forecasters have been
developed which possess intuitively satisfying properties. Alternatively, they have been developed from
tested mathematical models of the process under study. These two methods represent fundamentally
different philosophical approaches to the forecasting problem, and, as we shall
see later, they differ considerably in their forecasting abilities.
A procedure
which has been used widely in the past to produce forecasts of future
observations is smoothing. Smoothing
methods represent attempts to determine some sort of "average" value
around which the observations appear to be fluctuating. Two examples of smoothing procedures are the
moving average method and exponential smoothing. In the method of moving averages, the
forecast value is computed as the average of a fixed number of preceding
observations. The number of observations
to be included in the average is usually determined arbitrarily to compromise
between the "responsiveness" and the "stability" of the
forecaster. (Alternatively, the number
of observations included might be chosen to remove some sort of periodic behavior,
such as a monthly cycle in weekly data.
In this case, the moving average is in fact a forecast of the mean level
over the next month, rather than a forecast of the next week's level.)
With
exponential smoothing, the forecast value is a weighted average of the
preceding observations, where the weights decrease with the age of the past
observations. In exponential smoothing,
there are one or more parameters (constants) which determine the rate of
decrease of the weights. These
parameters, called smoothing constants, are determined either arbitrarily or by
some formal estimation procedure such as the method of least squares.
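The two smoothing forecasters just described can be sketched in a few lines of Python. This is only an illustration: the function names, the made-up sales figures, and the choice to start the smoothed level at the first observation are our assumptions, and only the single-parameter form of exponential smoothing is shown.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window


def exponential_smoothing_forecast(history, alpha):
    """Forecast the next value by simple (one-constant) exponential smoothing.

    The level is updated as level = alpha * x + (1 - alpha) * level, so an
    observation of a given age receives weight alpha * (1 - alpha) ** age.
    Starting the level at the first observation is an assumed convention.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1.0 - alpha) * level
    return level


sales = [20.0, 22.0, 21.0, 25.0, 24.0, 26.0]  # made-up monthly figures
ma3 = moving_average_forecast(sales, window=3)
es = exponential_smoothing_forecast(sales, alpha=0.5)
print(ma3, es)
```

A small window or a smoothing constant near 1 makes the forecaster responsive; a large window or a constant near 0 makes it stable. Nothing in either formula tells us which setting suits a given series.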
The smoothing
methods described above are easy to describe and straightforward to
implement. As their simplicity might
suggest, they can generally be improved upon as methods for forecasting. We shall now describe some forecasting
methods that are much more elaborate than, but offer little if any improvement
over, smoothing methods.
A graph of the
history of any process generally contains characteristic "patterns"
which appear over and over. The tendency
to extrapolate such patterns is often hard to resist, and a number of forecasting
methods have been based on the premise that such extrapolation is a reasonable
thing to do.
Consider for
the moment the graph of the process known as a "random walk." At each successive point in time, there is an
equal likelihood that the process will move upward or downward. In the short term, stock prices often exhibit
such behavior. Although quadratic-like
curves often occur throughout the graph, any attempt to forecast by
extrapolating these curves will result in forecast errors of larger average
magnitude than the "random walk" forecaster that forecasts the next
period's value to be the same as the current value. Reference 1 contains an interesting example
of an elaborate method of exponential smoothing derived from fitting curves to
a process that was later shown to be a random walk. Not surprisingly, the "curve
fitting" method produced forecast errors of substantially larger magnitude
than those produced by the "random walk" forecaster.
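The point can be checked with a small simulation. The sketch below is our own illustration, not the example cited in Reference 1: it generates a walk that steps up or down by one unit with equal likelihood, then compares the "no change" forecaster with a curve-fitting forecaster that passes a quadratic through the last three observations and extends it one period ahead. The function names and the seed are hypothetical.

```python
import random


def simulate_random_walk(n, seed=0):
    """Return n values of a walk that steps +1 or -1 with equal probability."""
    rng = random.Random(seed)
    level, path = 0, []
    for _ in range(n):
        level += rng.choice((-1, 1))
        path.append(level)
    return path


def mean_squared_errors(path):
    """Lead-one mean squared error of two forecasters over the same path."""
    naive_sq, curve_sq, count = 0.0, 0.0, 0
    for t in range(2, len(path) - 1):
        actual = path[t + 1]
        naive = path[t]  # "random walk" forecaster: next value = current value
        # Quadratic through the last three points, evaluated one step ahead.
        curve = 3 * path[t] - 3 * path[t - 1] + path[t - 2]
        naive_sq += (actual - naive) ** 2
        curve_sq += (actual - curve) ** 2
        count += 1
    return naive_sq / count, curve_sq / count


walk = simulate_random_walk(5000)
naive_mse, curve_mse = mean_squared_errors(walk)
print(f"naive MSE = {naive_mse:.2f}, quadratic-extrapolation MSE = {curve_mse:.2f}")
```

For this particular walk the squared error of the naive forecaster is exactly 1 at every step. The error of the quadratic extrapolation is a second difference of the steps, whose mean square is 6 in theory, so the curve fitter should come out several times worse.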
A series of
observations taken over various points in some time interval is technically
called a time series. During the past several decades, a
considerable amount of statistical theory has been developed to analyze time
series. As with all techniques of
statistical analysis, the conclusions of time series analysis are critically
dependent on the assumptions underlying the analysis. One of the tragedies of modern
"scientific" investigation is that the computational procedures of
statistics have often been uncritically applied to data, the underlying
assumptions ignored, and false conclusions drawn from this so-called
"statistical" analysis. An
example of this situation is the application of regression analysis to
historical data to derive an equation which is then (wrongly) purported to
indicate what will happen if the regressor variables are varied. With respect to the forecasting problem, a
similar misapplication of statistical procedures has been carried out. As in the case of regression analysis, the
inappropriateness has been obscured by the complexity of the estimation
procedures.
Statistical
time series analysis provides us, among other things, with good procedures for
estimating parameters of various types of models. The structure of the models often corresponds
to physically meaningful behavior. For
example, in structural analysis, the physics of a situation may dictate a model
which is trigonometric in nature. In
this situation, a model involving sine and cosine terms would be reasonable,
and good statistical estimation procedures (e.g., multiple regression analysis)
are available for estimating the model parameters. The use of such models for describing
economic time series, however, is not necessarily appropriate simply because
the time series data exhibit periodic behavior.
Nevertheless, such models have been used in the past to describe, for
example, a seasonal time series. In
this same vein of fitting an arbitrarily chosen model to data, a number of
"seasonal adjustments" could be estimated from the data. If the postulated model structure were
appropriate, then use of the corresponding estimation procedure would be
reasonable. In any event, the
statistical adequacy of the model must be determined before accepting any such
model as a basis for forecasting.
Thus we see
that the statistical estimation procedures can be uncritically applied to fit a
particular type of model to time series data, and this fitted model then used
as a basis for forecasting. The two key
points that are often overlooked are that the particular model structure chosen
may be inappropriate and that the fitted model may fail to be an adequate
representation of the process. These
points appear to be overlooked because of the elaborate statistical procedures
that are used to derive estimates of the parameters of the model. Of course, if the underlying model
assumptions are satisfied, then these estimation procedures -- often developed
out of sophisticated theoretical analysis -- do provide good estimates of the
model parameters and a good fitted model.
As we shall see later, however, there is a vast difference between
"model fitting" and "model building" -- the process of
determining an adequate mathematical representation of the process generating
the time series data. This process
involves subjecting a fitted model to a variety of diagnostic checks of its
adequacy to represent the process being modeled. A competent time series analysis, of course,
includes this latter process. This
section does not take issue with the methods of statistical time series
analysis -- they are valid. The trouble
arises with the uncritical application of an arbitrarily selected technique of
time series analysis -- parameter estimation (model fitting) -- and
the subsequent failure to subject the fitted model to various tests of
adequacy.
The fact is
that many classical time series models were derived for physical situations (as
in astronomical, electrical, and mechanical fields) in which underlying
deterministic components (such as sine and cosine terms) are obscured by simple
random noise, and the various estimation procedures were developed to estimate
these components. Economic time series,
however, are essentially stochastic rather than deterministic in nature, and
for this reason, many classical time series models are simply inappropriate. The use of these essentially deterministic
time series models (also referred to as "unobserved component"
models) has been remarkably widespread, apparently a carryover resulting from
their successful (and appropriate) application in many physical situations for
more than a century. It appears,
however, that the questionable use of such models in economic applications is
finally drawing to an end (Reference 6).
The previous
paragraphs have illustrated a number of procedures for fitting a model to
data. In view of the large amount of
effort that has been expended using fitted models as a basis for forecasting,
it is indeed unfortunate that the fact that a particular model is efficiently
fitted to data is no assurance that it is an appropriate basis for forecasting.
The reason for
referring to the techniques described above as "intuitive" is no
doubt clear by now -- there is no substantiation that the model underlying the
forecaster is an adequate representation of the process under study. Whatever properties the forecaster might
possess relate to the model to which it corresponds, but these properties don't
mean very much if there is little assurance that the model is a good
representation of the actual process. To
determine a good short-term forecaster for a process, we need a good model of
the short-term behavior of the process.
We shall now turn our attention to forecasters based on models which are
good representations (in the short term) of the processes they represent.
The process of
developing a good model for a process is called model building. Model
building involves four basic steps. (See
Reference 1 for a detailed discussion of model building.) First, we must have available a class of
models which is capable of exhibiting the essential characteristics of the
process under study. Second, a
preliminary analysis of the problem under study will suggest a tentative
subclass of models which is reasonable to entertain. Third, observed data are used to
"fit" one of these models, i.e., to estimate the parameters of the
model. Fourth, the fitted model is
subjected to a number of diagnostic checks to test whether the model is an
adequate representation of the process under study. If the tests are not satisfied, the tentative
model is modified. Steps 3 and 4 are
then repeated until the tests are satisfied, i.e., until we have an adequate
model. In summary, model building
involves: (1) selection of a general model class; (2) model identification
(tentative model selection); (3) model fitting (estimation of parameters); (4)
diagnostic checking (tests of model adequacy) and model modification; (5)
repetition of steps 2, 3, and 4 until an adequate model is found. Thus model building is an iterative process
involving much more than simply fitting a model to data, the basis for
intuitive forecasting. Good procedures
for estimating parameters are easy to specify in terms of formulas and,
understandably, there has been a great deal of model fitting done in the past,
with the subsequent use of intuitive forecasters. Diagnostic checking and model modification
require critical statistical analysis, rather than the straightforward
application of standard statistical estimation formulas, and in fact represent
the bulk of the human effort required in model building (the computations
required to fit tentative models and compute the statistics required to test
model adequacy can be performed by computers).
Thus we see that model building involves a wide range of time series
analysis techniques, not just those associated with parameter estimation.
The first step
in model building is to choose a class of time series models with which to
work. At the extremes, there are two
basic approaches. On the one hand, we
may attempt to develop a dynamic, or causal model of the process under
study. In such a model we attempt to
relate the behavior of the variable of primary interest to the behavior of
other variables. The variables to tentatively
include in the model, and the tentative model structure, are suggested by
economic theory, and the model is referred to as an econometric model. As an
alternative to the econometric model, we may attempt to develop a purely stochastic, or empirical model of the process.
With this approach we strive simply to develop a model which exhibits the
same essential characteristics as the process under study, without attempting
to identify the causal nature of the relationships between the various relevant
interacting variables.
The terms used
above are not perfectly descriptive. For
example, a "dynamic" model will always possess a simple stochastic term to describe the
variation unexplained by the dynamic structure. Furthermore, a
"causal" economic model will undoubtedly employ empirically derived relationships between the variables. Intermediate between the two extremes of the
dynamic and the stochastic models is the class of stochastic-dynamic models, which contain both causal components and
nontrivial stochastic components.
In forecasting
with a stochastic model of a process, we are in essence attempting to predict
the next few moves of the process based generally on all of the past behavior
of the process, and in particular, on the recent history of the process. The model derived from the data describes how
the process behaves, and the forecaster predicts the near future of the time
series from the recent past, based on the stochastic behavior of the process as
characterized by the model. With an
econometric model, the forecaster predicts the near future from the recent past,
based on the economic relationships characterized in the model.
The problem of
choosing a model class involves balancing accuracy requirements against the
costs associated with developing and implementing the forecaster. For example, an elaborate econometric model
would clearly be inappropriate for forecasting the short-term sales of each of
thousands of items of an inventory system.
Reference 4 includes a discussion of the problem of choosing an
appropriate model class.
To construct
an econometric time series model, we identify the variables that are considered
to have an effect on the variable of interest, and then pose a tentative
structure for the model. In most cases a
linear or linearized model relating the variables is considered, and standard
regression analysis is used to estimate the model parameters (regression
coefficients). The estimation is,
however, not always straightforward. For
example, suppose that we wish to forecast sales, and it is known that sales are
related to price, and price in turn to sales.
Then a simultaneous system of equations is necessary to describe the
system, the usual regression estimates are inappropriate, and some other
method, such as "two-stage" least squares, must be used to estimate
the parameters.
Once the
estimation has been completed, i.e., we have a fitted model, then it is
necessary to subject the model to various diagnostic checks. These checks often involve the model "residuals." A "residual" is the difference, or
"error," between an actual observation and the value predicted by the
model. In an econometric model, one of
the usual underlying assumptions is that the residuals are unrelated to each
other in a certain statistical sense. In
technical terms, the residuals are not autocorrelated. If, for a fitted model, the residuals are
autocorrelated, then the current tentative model must be modified and a new
tentative model entertained. The new
tentative model might involve either a somewhat different dynamic structure or
include some new variable.
Alternatively, if the residuals are relatively small in magnitude, their
autocorrelation can be taken into account in the estimation procedure, without
changing the dynamic model structure. If
the latter approach is followed, then we in fact have a simple example of a
stochastic-dynamic model.
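A rough version of this residual check can be sketched as follows. The function names are ours, the residuals are made up, and the bound of 2 divided by the square root of the number of residuals is a common rule of thumb for "noticeably different from zero", assumed here for illustration rather than taken from the article.

```python
import math


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given positive lag."""
    n = len(values)
    mean = sum(values) / n
    denom = sum((v - mean) ** 2 for v in values)
    numer = sum((values[t] - mean) * (values[t + lag] - mean)
                for t in range(n - lag))
    return numer / denom


def flagged_lags(residuals, max_lag):
    """Lags whose residual autocorrelation exceeds the rough bound 2 / sqrt(n)."""
    bound = 2.0 / math.sqrt(len(residuals))
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(residuals, k)) > bound]


# Made-up residuals that alternate in sign, hence strongly autocorrelated.
residuals = [1.0, -1.0] * 20
print(flagged_lags(residuals, max_lag=2))
```

An empty list would be consistent with unrelated residuals; flagged lags suggest that the tentative model has left some systematic behavior unexplained.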
In developing
an econometric model (or any other model for that matter) it can be unwise to
include a very large number of variables in the model, particularly if the
inclusion is based solely on empirically observed relationships. With a large enough number of variables, it
would not be surprising to find a model that seemed generally adequate and yet
proved, for several reasons, to be a poor basis for forecasting. First, the number of parameters may simply be
so large compared to the number of observations that it becomes difficult to
perform sensitive tests of model adequacy.
Second, by increasing the number of variables in a model we increase the
risk of discovering, by chance, an apparent relationship of the variable we wish
to forecast to some other totally unrelated variable. This, of course, is a principal drawback
associated with using regression analysis to simply "fit" a forecasting
model to data. An additional problem
associated with a many-variable forecasting model is that in using it we
implicitly assume that the relationship between all the variables of the model
will continue in the future as in the past.
The larger the number of variables involved, the less reasonable such an
assumption becomes. In general, if a
large number of parameters seems necessary, it is reasonable to suspect that
the identified model structure is inappropriate.
In order to
develop a stochastic time series model to represent a process, it is necessary
to have a flexible class of models available.
We have already mentioned a few classical time series models above and
noted their applicability to essentially deterministic situations. Two other models that have been used and that
are essentially stochastic in nature are the (finite) autoregressive model and
the (finite) moving average model. In
the autoregressive model, the current observation is represented as a weighted
average of previous observations, plus a random term that is uncorrelated with
the random terms of other observations.
In the moving average model, the current observation is represented as a
weighted average of uncorrelated random terms.1
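In symbols, the two models may be written as below. The notation is our assumption, since the article itself avoids formulas: z_t is the observation at time t measured from its mean level, a_t is the uncorrelated random term, and the phi and theta coefficients are the weights. The minus signs in the moving average form follow a common convention; the weights themselves may take either sign.

```latex
\begin{align}
\text{autoregressive, order } p:\quad
  & z_t = \phi_1 z_{t-1} + \phi_2 z_{t-2} + \cdots + \phi_p z_{t-p} + a_t \\
\text{moving average, order } q:\quad
  & z_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q}
\end{align}
```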
In general, the
preceding models have not proved sufficiently general to model arbitrary
economic time series with a reasonable number of parameters, and this fact
probably accounts in part for the limited use of stochastic models (or
stochastic-dynamic models) for model building.
This situation has now been remedied, for the models investigated by Box
and Jenkins possess the capability to efficiently represent a tremendous
variety of economic time series.
This section
has described in brief detail the major categories of techniques that have been
used in the past to develop forecasting models.
We shall now turn our attention to a description of the Box-Jenkins
forecasting method.
As the
preceding section has suggested, not a great deal of forecasting has been done
using tested stochastic or stochastic-dynamic time series models. The essential reason for this situation is
that a suitable class of stochastic models had not been identified as
possessing the flexibility necessary to represent efficiently (i.e., using a
reasonable number of parameters) the tremendous variety of characteristics of
economic time series. This situation no
longer holds, for Box and Jenkins have thoroughly investigated a class of
models that prove to be quite satisfactory for both the stochastic and
stochastic-dynamic situations. We shall
first describe the purely stochastic models.
Technically, these models are called autoregressive integrated moving
average (ARIMA) models, or simply Box-Jenkins models for short.
The
Box-Jenkins models can be used to represent processes that are stationary or
nonstationary. A stationary process is
one whose statistical properties are the same over time; in particular, such a
time series fluctuates around a fixed mean value. Examples of nonstationary time series include
series which include changes in level, trends, changes in trends, or seasonal
behavior.
The purely
stochastic Box-Jenkins model is remarkably simple in form. The current observation is represented by a
linear combination (weighted average) of previous observations, plus an error
term associated with the current observation, plus a linear combination of
error terms associated with previous observations. The error terms have zero mean, constant
variance, and are uncorrelated with each other.
The portion of the model involving the observations is called the autoregressive part of the model, and
the portion involving the error terms is called the moving average part of the model.2
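The verbal definition above translates directly into a recursion. The sketch below is a minimal illustration under our own assumptions: the function name is hypothetical, moving average weights enter with a minus sign (a common convention), and values before the start of the series are treated as zero. Cumulatively summing the output is one way to obtain an "integrated", and hence nonstationary, series.

```python
import itertools
import random


def arma_series(phi, theta, shocks):
    """Observations z_t = sum(phi_i * z_(t-i)) + a_t - sum(theta_j * a_(t-j)).

    `shocks` holds the error terms a_t; pre-sample values are taken as zero.
    """
    z = []
    for t, a_t in enumerate(shocks):
        # Autoregressive part: weighted previous observations.
        ar_part = sum(p * z[t - 1 - i]
                      for i, p in enumerate(phi) if t - 1 - i >= 0)
        # Moving average part: weighted previous error terms.
        ma_part = sum(q * shocks[t - 1 - j]
                      for j, q in enumerate(theta) if t - 1 - j >= 0)
        z.append(ar_part + a_t - ma_part)
    return z


# Response to a single unit shock, easy to verify by hand.
impulse = arma_series(phi=[0.5], theta=[0.25], shocks=[1.0, 0.0, 0.0, 0.0])

# A stationary series driven by random shocks, and its running sum.
rng = random.Random(0)
stationary = arma_series([0.5], [0.25], [rng.gauss(0.0, 1.0) for _ in range(200)])
integrated = list(itertools.accumulate(stationary))
print(impulse)
```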
The problem of
building a stochastic Box-Jenkins model of a process in essence involves
determining the number of terms in the autoregressive and moving average parts
of the model, and determination of values for the parameters associated with
those parts. By statistical analysis of
the time series data, it is possible to choose from the full class of
autoregressive integrated moving average models, subclasses of models having a
specific structure appropriate to the particular time series under
examination. By determining such a
reasonable structure for the model, the number of parameters to be estimated in
the model can be substantially reduced.
This parameter reduction is quite important, for "nonlinear"
statistical estimation procedures are generally required to fit a tentative
Box-Jenkins model.
After a
tentative Box-Jenkins model has been fitted, it is subjected to various
diagnostic checks to test its adequacy as a stochastic representation of the
process under study. If the model is
found to be inadequate, analysis of the model residuals suggests ways to modify
the model structure to obtain a new tentative model which will likely do an
improved job of representing the process.
The basic statistic for assisting identification of a reasonable
structure for a new tentative model is the autocorrelation function of the
residuals of the current model. For
testing model adequacy, the power spectrum of the residuals can also be used as
an alternative to the autocorrelation function (see Reference 7). This process of fitting a tentative model,
testing it, and determining a new tentative model, is repeated until a model is
found which does an adequate job of representing the process. (References 1 and 3 contain detailed descriptions
of this model building process.)
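The fit, test, and modify cycle can be sketched in miniature. The code below is a deliberately restricted illustration and not the procedure of References 1 and 3: it entertains only autoregressive models of increasing order, fits them by the Yule-Walker equations (a linear shortcut that sidesteps the nonlinear estimation needed for the full model class), and tests adequacy with a portmanteau statistic of the Box-Pierce type on the residual autocorrelations. All function names are hypothetical, and the critical values are the upper 5 percent points of the chi-square distribution.

```python
import random

CHI2_95 = {7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}  # upper 5% points


def autocovariances(x, max_lag):
    n = len(x)
    mean = sum(x) / n
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
            for k in range(max_lag + 1)]


def fit_ar(x, p):
    """Yule-Walker AR(p) weights via the Levinson-Durbin recursion."""
    c = autocovariances(x, p)
    phi, var = [], c[0]
    for k in range(1, p + 1):
        kappa = (c[k] - sum(phi[j] * c[k - 1 - j] for j in range(k - 1))) / var
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        var *= 1.0 - kappa ** 2
    return phi


def residuals(x, phi):
    """Lead-one errors of the fitted autoregression on mean-corrected data."""
    mean = sum(x) / len(x)
    d = [v - mean for v in x]
    p = len(phi)
    return [d[t] - sum(phi[j] * d[t - 1 - j] for j in range(p))
            for t in range(p, len(d))]


def portmanteau(res, max_lag):
    """n times the sum of squared residual autocorrelations up to max_lag."""
    c = autocovariances(res, max_lag)
    return len(res) * sum((c[k] / c[0]) ** 2 for k in range(1, max_lag + 1))


def build_ar_model(x, max_order, max_lag, critical_values):
    """Return (order, weights) of the first tentative model that passes."""
    for p in range(max_order + 1):          # tentative model
        phi = fit_ar(x, p)                  # model fitting
        q_stat = portmanteau(residuals(x, phi), max_lag)  # diagnostic check
        if q_stat < critical_values[max_lag - p]:
            return p, phi
    return None                             # nothing adequate in this subclass


# Simulated first-order autoregressive data (weight 0.7), for illustration.
rng = random.Random(1)
series, prev = [], 0.0
for _ in range(600):
    prev = 0.7 * prev + rng.gauss(0.0, 1.0)
    series.append(prev)

result = build_ar_model(series, max_order=3, max_lag=10, critical_values=CHI2_95)
print(result)
```

On data of this kind the constant-mean model (order 0) is rejected decisively, and the loop would typically settle on order 1. A return value of None signals that the subclass of models entertained is itself inadequate and a different structure should be tried.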
After a model
has been determined that, according to the various tests to which it is
subjected, is considered to be an adequate representation of the time series
under examination, we are in a position to derive a forecaster for the time
series. In order to determine a
forecaster from a model, it is necessary to specify a criterion which the
forecaster satisfies. Ideally, the
criterion should take the "cost" associated with forecast errors into
account. We would then like to derive
the forecaster for which the expected cost associated with forecast errors is
minimized. If the cost function is not
apparent, a reasonable approach is to determine the (linear) forecaster that
has the least forecast error variance, or mean squared error of
prediction. That is, for each specified
time in the future we wish to determine the forecast that has minimum mean
squared error at that point. The minimum
mean squared error forecaster is usually referred to as the "optimal"
forecaster. The optimal forecaster
corresponding to a Box-Jenkins model turns out to be the expected (mean) value
of the process, conditional on the past observations. For example, the optimal forecast one time
period into the future is the expected value of the process at that point in
time, given that the past values of the time series are as observed.
Computation of
the optimal forecast is quite easy for a stochastic Box-Jenkins model. To compute the lead-one forecast, for
example, we simply substitute, into the formula defining the model, the
observed (past) values and the optimal estimates of the past error terms. These error terms can be estimated
recursively: they are simply the past lead-one forecast errors. To compute the optimal forecast beyond lead
one, we substitute forecasts for future observations and zeros for future error
terms. Thus we see that the
computational effort required to forecast using a Box-Jenkins model is on the
same order as that required by the simple smoothing techniques.
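The substitution rule can be made concrete for the simplest mixed model, with one autoregressive and one moving average term. The sketch below rests on our own assumptions: the function name is hypothetical, the moving average weight enters with a minus sign, and the recursion is started by taking the pre-sample observation equal to the mean and the pre-sample error equal to zero.

```python
def arma11_forecasts(history, phi, theta, mean, horizon):
    """Forecasts for z_t - mean = phi * (z_(t-1) - mean) + a_t - theta * a_(t-1)."""
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    # Estimate past error terms recursively as past lead-one forecast errors.
    prev_dev, prev_err = 0.0, 0.0  # assumed start-up values
    for z in history:
        dev = z - mean
        one_step = phi * prev_dev - theta * prev_err
        prev_err = dev - one_step
        prev_dev = dev
    # Lead one: the last observation and last estimated error enter the formula.
    dev_hat = phi * prev_dev - theta * prev_err
    forecasts = [mean + dev_hat]
    # Beyond lead one: forecasts stand in for observations, zeros for errors.
    for _ in range(horizon - 1):
        dev_hat = phi * dev_hat
        forecasts.append(mean + dev_hat)
    return forecasts


print(arma11_forecasts([12.0, 9.0, 11.0], phi=0.5, theta=0.25, mean=10.0, horizon=3))
```

Each new observation costs one multiplication per model term, which is the sense in which the effort is comparable to that of exponential smoothing.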
Computation of
tolerance (probability) limits around the forecasts is straightforward for a
Box-Jenkins model. If we have a
stationary process, the distance between these limits will gradually increase
for forecasts further and further into the future, approaching a fixed value
proportional to the standard deviation of the process.
For a nonstationary process, the variance of the process is undefined,
and the tolerance interval continues to widen without bound as we forecast
further into the future.
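The behavior of the limits can be illustrated with a short Python sketch. It relies on the standard result that the forecast error variance at lead l equals the variance of the random shocks times the sum of the first l squared weights of the model, when the model is written as a weighted sum of current and past shocks. The function name, the factor 1.96 (approximate 95 percent limits under normally distributed shocks), and the two example models are assumptions chosen for the illustration.

```python
import math


def limit_half_widths(psi, sigma_a, horizon, z_value=1.96):
    """Half-widths of approximate probability limits at leads 1..horizon.

    psi(j) returns the weight on the shock j periods back when the model is
    written as a weighted sum of current and past shocks (psi(0) equals 1).
    """
    widths, cumulative = [], 0.0
    for lead in range(1, horizon + 1):
        cumulative += psi(lead - 1) ** 2
        widths.append(z_value * sigma_a * math.sqrt(cumulative))
    return widths


phi = 0.8
# Stationary first-order autoregressive process: weights phi**j die out.
stationary = limit_half_widths(lambda j: phi ** j, sigma_a=1.0, horizon=50)
# Random walk (nonstationary): every weight equals 1, so the sum never settles.
random_walk = limit_half_widths(lambda j: 1.0, sigma_a=1.0, horizon=50)
# Standard deviation of the stationary process itself.
process_sd = 1.0 / math.sqrt(1.0 - phi ** 2)
```

For the stationary example the half-widths level off at 1.96 times `process_sd`; for the random walk they grow like the square root of the lead time and never level off.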
It is of
interest to note that a forecaster derived from a stochastic model is an
adaptive forecaster. Adaptivity is, of
course, a very desirable property for a forecaster to possess and a necessary
one for a forecaster of a nonstationary process. (The various smoothing methods -- exponential
smoothing, moving average -- have been so widely used because, in addition to
requiring few computations, they are adaptive in nature.) A very important feature of the Box-Jenkins
models is their ability to efficiently represent quite general nonstationary
processes. By its very definition, a
stochastic model embodies the changing nature of the process, and the
corresponding forecaster is in essence a description of how the future is
likely to turn out, given the recent past behavior of the process. Suppose, for example, that a process appears
to have "trends" that change slope over time, or that it exhibits
pseudo-periodic behavior with a stochastically varying phase and
amplitude. If such a process is modeled
using, for example, trends, trend adjustments, seasonals, and seasonal
adjustments, then the parameters of the model will probably have to be updated
rather frequently; i.e., the forecaster cannot adapt automatically to the
changing situation, since the model does not represent the process well. The key point here is that we want an
adaptive forecaster rather than an
adaptive model. We wish to derive a fixed model and to derive
an adaptive forecaster from it. An
adequately tested stochastic model of a process meets this requirement. By taking the essential stochastic
characteristics of the process properly into account, it produces a forecaster
that in essence tells how to take the
changing features of the process into account.
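A small Python sketch may make the distinction concrete. It assumes the simplest nonstationary model of this kind, z[t] = z[t-1] + a[t] - theta*a[t-1], whose optimal lead-one forecaster is known to be exponential smoothing with smoothing constant 1 - theta. The function name and the start-up rule are assumptions for the example. The model, that is the single parameter theta, stays fixed throughout; only the forecast adapts.

```python
def ima11_one_step_forecasts(series, theta):
    """Lead-one forecasts for the assumed model
    z[t] = z[t-1] + a[t] - theta*a[t-1], with theta held fixed."""
    forecast = series[0]  # start-up assumption: first forecast equals first value
    forecasts = []
    for z in series:
        error = z - forecast                          # latest forecast error
        forecast = forecast + (1.0 - theta) * error   # adapt the forecast, not the model
        forecasts.append(forecast)                    # forecast of the next value
    return forecasts


# A level shift from 10 to 20: the forecasts follow it without re-estimating theta.
shifted = ima11_one_step_forecasts([10.0, 10.0, 10.0, 20.0, 20.0, 20.0, 20.0], theta=0.5)
```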
Although we
are primarily concerned with forecasting in this article, a few words are in
order regarding the use of the Box-Jenkins models for simulation. For example, we may be interested in
determining the effect of sales variability on a new inventory policy, starting
from the current sales position. In such
a case, we need a model which exhibits the same statistical properties as the
actual sales themselves. An econometric
model would be difficult to use in such a situation because, as the model is
simulated into the future, we need to substitute values for all of the
variables of the model, and the econometric model does not specify the behavior
of any of these variables other than the variable(s) of primary interest. A Box-Jenkins model can readily be used in
such a situation, for it is in fact a description of the statistical properties
of each succeeding observation in terms of the preceding observations. Each simulated observation becomes input to
the succeeding simulated observation. It
should be remembered, however, that a Box-Jenkins model is concerned
essentially with the local (short-term) behavior of a process, since it is
through understanding the local behavior that we are able to predict the near
future from the recent past. If a
long-term simulation model is desired, then special care must be taken to make
sure that the model reflects the longer-term properties of the process.
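As a rough sketch of such a simulation, the Python fragment below generates several possible future paths starting from the current position. It assumes a fitted model of the form z[t] = phi*z[t-1] + a[t] - theta*a[t-1] with normally distributed shocks; the function name, the argument names, and the use of a seeded generator are assumptions made for the illustration.

```python
import random


def simulate_paths(last_value, last_error, phi, theta, sigma_a,
                   horizon, n_paths, seed=0):
    """Simulate future paths of the assumed model
    z[t] = phi*z[t-1] + a[t] - theta*a[t-1], starting from the present."""
    rng = random.Random(seed)          # seeded so that runs can be repeated
    paths = []
    for _ in range(n_paths):
        z, a_prev, path = last_value, last_error, []
        for _step in range(horizon):
            a = rng.gauss(0.0, sigma_a)        # fresh random shock
            z = phi * z + a - theta * a_prev   # each simulated value feeds the next
            a_prev = a
            path.append(z)
        paths.append(path)
    return paths


paths = simulate_paths(last_value=100.0, last_error=0.0, phi=0.9, theta=0.3,
                       sigma_a=2.0, horizon=12, n_paths=5, seed=1)
```

Each path could then be fed through the inventory policy under study, and the results summarized over all paths.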
As we mentioned
earlier, the Box-Jenkins method can be used to develop stochastic-dynamic
models, in which the behavior of the variable of primary interest (the
endogenous variable, or variable we wish to forecast) is related not only to
its past behavior, but to the behavior of other (exogenous) variables as
well. The reason for including exogenous
variables is obvious: since the model class is expanded, the precision of the
forecast may be increased over that corresponding to the pure stochastic
model. If an exogenous variable is
included in a model to be used for forecasting, however, then the values of
that variable must be known or forecast over the forecasting period of
interest. Typically, the behavior of the
exogenous variable presages that of the variable of primary interest; that is,
the current behavior of the variable of primary interest is related to the past
behavior of the exogenous variable. Such
an exogenous variable is called a leading
indicator. Obviously, if there is a
strong lagged relationship between an exogenous variate and the variable of
primary interest, the precision of the forecasts will be considerably enhanced.
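The role of a leading indicator can be sketched in Python as follows. The model form, y[t] = const + phi*y[t-1] + omega*x[t-lag] + a[t], is a deliberately simple assumption chosen for the illustration, as are the function and argument names. The sketch shows that forecasts up to `lag` periods ahead need only observed values of the exogenous variable x, while longer leads require forecasts of x as well.

```python
def forecast_with_leading_indicator(y, x, phi, omega, lag, horizon,
                                    const=0.0, x_future=()):
    """Forecast the assumed model y[t] = const + phi*y[t-1] + omega*x[t-lag] + a[t].

    y and x are observed over the same periods; x_future holds any
    forecasts of x for the periods that follow.
    """
    x_all = list(x) + list(x_future)
    n = len(y)
    prev, forecasts = y[-1], []
    for lead in range(1, horizon + 1):
        idx = n - 1 + lead - lag   # period of the x value that drives y at this lead
        if not 0 <= idx < len(x_all):
            raise ValueError(
                f"lead {lead} needs a value of x that is neither observed nor forecast")
        # Future error terms are replaced by zero, as in the pure stochastic case.
        prev = const + phi * prev + omega * x_all[idx]
        forecasts.append(prev)
    return forecasts


two_ahead = forecast_with_leading_indicator(
    y=[1.0, 2.0, 3.0], x=[10.0, 20.0, 30.0], phi=0.5, omega=0.1, lag=2, horizon=2)
```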
Recall
that a stochastic-dynamic model is a hybrid between a purely econometric model
and a purely stochastic model. In it,
the behavior of the variable of interest that cannot be explained in terms of
the behavior of the exogenous variates is represented by a stochastic portion
of the model. As we add additional
exogenous variables to the model, the stochastic portion of the model becomes
less and less elaborate, and we may approach the "purely dynamic"
model in which the stochastic portion of the model is simple uncorrelated
random variation. (In a particular situation
it may not be feasible to derive such a "purely dynamic" model; the
practical limit may be a stochastic-dynamic model with a nontrivial stochastic
component.)
Note that once
we have introduced an exogenous variate into a model, we might refer to such a
stochastic-dynamic model as an econometric model. Rather than simply implying the inclusion of
more than one economic variable, however, the term econometric usually indicates that the model has a special
structure dictated by economic theory, rather than empirically determined. Usually, the stochastic part of such
econometric models is quite simple in form, and the predictive power of the
model arises out of the dynamic rather than the stochastic relationships. Hence, while an econometric model is certainly
an example of a stochastic-dynamic model (in which the stochastic portion is
usually simple in form), we shall generally use the term
"stochastic-dynamic" to describe models in which the relationships
between the variables are empirically determined.
A
multivariable³ Box-Jenkins stochastic-dynamic model may include all
of the relevant variables of a so-called econometric model. The Box-Jenkins stochastic-dynamic models
simply represent a particular class of empirical models that are capable of
efficiently representing a wide variety of processes involving more than one
variable. While the structure of these
models is quite flexible, the economic nature of a process may suggest a
special model structure which could be an even more efficient representation of
the process. In a sense, a Box-Jenkins
stochastic-dynamic model could be viewed as an empirically derived
"econometric" model, in contrast to a causally derived
"econometric" model (we are using the term "econometric
model" to refer to this latter type of model). To the extent possible, of course,
mathematical model building should always take advantage of special features or
understanding of the real-world process being modeled. Such understanding forms the fundamental
basis for selecting a particular class of models to represent a process.
The
performance of a forecaster based on an econometric model is often compared to
that of a forecaster based on a simple stochastic model, such as a finite
autoregressive scheme. The forecasting
ability of the autoregressive forecaster is often taken as a minimum standard
of performance for an econometric model (References 5, 8 and 10). Needless to say, a more reasonable comparison
would be that of the econometric forecaster to a forecaster based on a tested
stochastic model, rather than on an arbitrarily selected and probably
inappropriate stochastic model.
Often
econometric models have been constructed using linear regression analysis. The term "linear" refers, of
course, to the nature of the statistical estimation, rather than the functional
form of the regression function, which may be highly nonlinear. Computationally, about the most difficult
problem that might arise with this approach would be that the residuals (error
terms) of the model may be autocorrelated, but the standard regression methods
can usually still be used for the analysis after appropriate steps have been
taken. There are essentially two reasons
for such autocorrelation. First, it may
simply not be possible or practical to include a sufficient number of exogenous
variables in the model so that the stochastic part of the model would be simple
in form (not autocorrelated). Second,
the structure of the model may not be representing the relationship between the
variables of the model in a very efficient fashion.
By way of
analogy, suppose that a one-parameter moving average process best described the
stochastic behavior of a time series, but that an autoregressive model was
fitted instead to the data. By allowing
a sufficient number of parameters in the autoregressive model, a satisfactory
fit could be obtained, with residuals having negligible autocorrelation. However, if we restrict the number of
autoregressive parameters, we would have neither a very good fit nor negligible
autocorrelation in the residuals.
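The arithmetic behind this analogy can be sketched in Python. Assuming the moving average model is written z[t] = a[t] - theta*a[t-1] with |theta| less than 1, the equivalent autoregressive form has an unending sequence of coefficients, the j-th being -(theta**j). The function names and the cutoff of 0.01 used to count the "non-negligible" coefficients are assumptions for the illustration.

```python
def ar_weights_for_ma1(theta, n_terms):
    """First n_terms autoregressive coefficients equivalent to the
    assumed model z[t] = a[t] - theta*a[t-1]."""
    return [-(theta ** j) for j in range(1, n_terms + 1)]


def terms_needed(theta, tolerance=0.01):
    """Count the autoregressive coefficients whose size is at least tolerance."""
    if abs(theta) >= 1.0:
        raise ValueError("theta must be smaller than 1 in absolute value")
    j = 1
    while abs(theta) ** j >= tolerance:
        j += 1
    return j - 1


weights = ar_weights_for_ma1(0.5, 3)   # first three coefficients for theta = 0.5
count = terms_needed(0.8)              # coefficients of size 0.01 or more for theta = 0.8
```

With theta equal to 0.8, a single moving average parameter corresponds to some twenty autoregressive coefficients of non-negligible size, which is why a short autoregressive model fits such a series poorly.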
Similarly, with the stochastic-dynamic model, it may be quite
inefficient to insist that the variable of interest be explained solely in
terms of linear combinations of other variables. A much more efficient structure for the
model, allowing for a better fit with fewer parameters and less autocorrelation
in the residuals, may be possible if we also represent the process in terms of
a linear combination of past values of the variable of primary interest.