The Box-Jenkins Forecasting Technique

 

 

Joseph George Caldwell, PhD

 

January 1971

(Reformatted September 2006)

 

 

 

© 1971, 2006 Joseph George Caldwell.  All Rights Reserved.

Posted at Internet websites http://www.foundationwebsite.org and http://www.foundation.bw .

 


Contents

 

 

I.  Introduction

II.  Previous Techniques

A.  Intuitive Methods

1.  Smoothing

2.  Curve Fitting

3.  Some Classical Time Series Models

B.  Forecasters Derived from Tested Mathematical Models

1.  The Model Building Process

2.  Classes of Models

3.  Econometric Models

4.  Stochastic Models

III.  The Box-Jenkins Method

A.  Stochastic Box-Jenkins Models

1.  Estimation of the Model

2.  Derivation of the Optimal Forecaster

3.  Simulation Using a Stochastic Box-Jenkins Model

B.  Stochastic-Dynamic Box-Jenkins Models

1.  General Considerations

2.  Identification of Stochastic-Dynamic Model Structure

3.  Process Control Applications

IV.  Applications

A.  Purely Stochastic Models

B.  Stochastic-Dynamic Models

V.  Some Comparisons

VI.  Development Effort

VII.  Summary

References

Endnotes

 

 

Note: This document (The Box-Jenkins Forecasting Technique), posted at http://www.foundationwebsite.org/BoxJenkins.htm , presents a nontechnical description of the Box-Jenkins methodology.  For a technical description of the Box-Jenkins approach, see the document, TIMES Box-Jenkins Forecasting System, posted at http://www.foundationwebsite.org/TIMESVol1TechnicalBackground.htm .

 

 



 

 

I.  Introduction

 

The operations of most businesses are continually affected by events beyond the control of management.  Some of these events are unanticipated, or of such importance that they require special management consideration to determine the response to the new situation.  For example, new legislation, a large fire, or the introduction of a significant new competitive product are changes in the environment which will require custom-tailored analysis by management.  On the other hand, there are in the course of normal operations a tremendous number of events that cannot be controlled or predicted in detail, but whose occurrence is nevertheless anticipated.  For example, product demand, raw material costs, and interest rates are continually changing.  There is uncertainty associated with the exact magnitude of the variation in these items, but such variation represents a normal part of the environment.  So long as the environment remains essentially unchanged, this variation is expected, and the response to this variation should be automatic, not requiring unusual management analysis.  The effectiveness of management response to this variation often depends to a great degree on the extent to which it can reduce the uncertainty about the magnitude and direction of the variation.  For example, if a company can improve its ability to predict product demand, it can likely lower its inventory requirements or improve its scheduling efficiency.  While it is often desirable to predict long-term changes in the environment, a very basic problem in the routine business operation is that of short-term forecasting: predicting variation in the near future, under the assumption that the essential nature of the variability will continue as in the past (or in some other specified fashion).  Management's solutions to problems of this sort represent its standard response to the variability problem; its analysis of and response to long-term changes in the environment will no doubt require special attention.

 

The term "near future" used above may refer to several days, months, or years into the future, depending on the time frame of the application.  The number of days, months, or years for which we are potentially able to significantly reduce the variability will, of course, depend on the nature of the process we are studying.  For example, in forecasting monthly product sales, we may be able to forecast well only a few months ahead if the product is nonseasonal, or a year or two ahead if the sales exhibit strong annual seasonal behavior.

 

Since short-term forecasting problems are concerned with prediction of variation under the assumption (explicit or not) that the essential nature of the process will continue, it is appropriate to try to determine fixed rules to predict the near future from the recent past.  We refer to any forecasting procedure that is a specified function of observed data as a mathematical forecaster. Quite a number of short-term forecasting procedures have been developed in the past.  Depending on the nature of the problem, these procedures have ranged from very simple smoothing to the use of formulas based on elaborate econometric models.  For example, if a company is forecasting monthly sales of hundreds of products as part of its inventory control system, forecasting procedures (or "forecasters" for short) having low data and computational requirements are appropriate.  Alternatively, the same company might use a detailed econometric model and sample survey data to obtain short-term forecasts of the highest possible accuracy for the cost of a key raw material or for total quarterly sales.

 

In any event, no matter what type of forecaster is appropriate, it is desirable to obtain the best possible forecast for the amount of effort expended.  Many short-term forecasters in wide use fall far short of providing the maximum accuracy possible, given the information on which they are based and the computational efforts they require.  If little or no historical data are available, then it is quite possible for the errors of judgment in choosing a short-term forecaster to be smaller than the random errors associated with estimating a forecaster from data.  If extensive historical data are available, however, it becomes possible to derive a mathematical forecaster based on this data which can predict better than a mathematical forecaster that is chosen by judgment.  Since the business situations in which short-term forecasting is of concern generally continue over long periods of time, or for a number of similar items, such data are often available.

 

This article describes a new technique, developed by Profs. G.E.P. Box and G.M. Jenkins, that enables the development from historical data of forecasters that have both high accuracy and low computational requirements.  The technique may be applied to quickly determine forecasters that are as uncomplicated in form as the simple smoothing methods, or that involve a number of economic variables.  In either case, use of the technique enables efficient utilization of other predictive information contained in the data and offers assurance of obtaining the highest forecasting accuracy possible in terms of the variables on which the forecast is based.

 

Although the Box-Jenkins model first appeared in book form (Reference 2) in 1967, the business forecasting community seems still largely unaware of the potential of the method.  This situation is perhaps understandable since published applications of the technique have appeared, to this author's knowledge, only in technical journals (e.g., Reference 9).  The situation has improved somewhat with the publication of Box and Jenkins' book (Reference 1) in late 1970.  A recent Harvard Business Review article (Reference 4) includes the Box-Jenkins technique in a discussion of the problem of selecting the appropriate forecasting techniques for various applications.  There is no question, however, that business awareness of the Box-Jenkins forecasting method is much less than the importance of forecasting problems would warrant.  This article is intended to help increase this awareness.

 

In addition to describing the Box-Jenkins technique, this article briefly describes some previous forecasting techniques and cites some comparisons between the forecasting ability of the old and new techniques.  The article avoids mathematical detail, but does involve discussion of certain technical concepts considered fundamental to the problem of developing good forecasters.

 

II.  Previous Techniques

 

Two approaches have been used in the past to develop forecasters from past observations.  On the one hand, forecasters have been developed which possess intuitively satisfying properties.  Alternatively, they have been developed from tested mathematical models of the process under study.  These two methods represent fundamentally different philosophical approaches to the forecasting problem, and, as we shall see later, they differ considerably in their forecasting abilities.

 

A.  Intuitive Methods

 

1.  Smoothing

 

A procedure which has been used widely in the past to produce forecasts of future observations is smoothing.  Smoothing methods represent attempts to determine some sort of "average" value around which the observations appear to be fluctuating.  Two examples of smoothing procedures are the moving average method and exponential smoothing.  In the method of moving averages, the forecast value is computed as the average of a fixed number of preceding observations.  The number of observations to be included in the average is usually determined arbitrarily to compromise between the "responsiveness" and the "stability" of the forecaster.  (Alternatively, the number of observations included might be chosen to remove some sort of periodic behavior, such as a monthly cycle in weekly data.  In this case, the moving average is in fact a forecast of the mean level over the next month, rather than a forecast of the next week's level.)

 

With exponential smoothing, the forecast value is a weighted average of the preceding observations, where the weights decrease with the age of the past observations.  In exponential smoothing, there are one or more parameters (constants) which determine the rate of decrease of the weights.  These parameters, called smoothing constants, are determined either arbitrarily or by some formal estimation procedure such as the method of least squares.
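
The two smoothing forecasters just described can be sketched in a few lines of Python.  This is only an illustration: the function names, the choice to start the smoothed level at the first observation, and the sales figures are assumptions made here, not part of any particular published procedure.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window < 1 or window > len(history):
        raise ValueError("window must be between 1 and len(history)")
    recent = history[-window:]
    return sum(recent) / window


def exponential_smoothing_forecast(history, alpha):
    """Forecast the next value by simple exponential smoothing.

    `alpha` is the smoothing constant (0 < alpha <= 1).  The weight given to
    an observation k periods old is proportional to (1 - alpha) ** k.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    level = history[0]          # assumption: start the level at the first observation
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


sales = [100, 104, 98, 110, 107, 103]   # made-up illustrative data
print(moving_average_forecast(sales, window=3))
print(exponential_smoothing_forecast(sales, alpha=0.5))
```

A small window or a large smoothing constant makes the forecaster more "responsive"; a large window or a small constant makes it more "stable."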

 

The smoothing methods described above are easy to describe and straightforward to implement.  As their simplicity might suggest, they can generally be improved upon as methods for forecasting.  We shall now describe some forecasting methods that are much more elaborate than, but offer little if any improvement over, smoothing methods.

 

2.  Curve Fitting

 

A graph of the history of any process generally contains characteristic "patterns" which appear over and over.  The tendency to extrapolate such patterns is often hard to resist, and a number of forecasting methods have been based on the premise that such extrapolation is a reasonable thing to do.

 

Consider for the moment the graph of the process known as a "random walk."  At each successive point in time, there is an equal likelihood that the process will move upward or downward.  In the short term, stock prices often exhibit such behavior.  Although quadratic-like curves often occur throughout the graph, any attempt to forecast by extrapolating these curves will result in forecast errors of larger average magnitude than the "random walk" forecaster that forecasts the next period's value to be the same as the current value.  Reference 1 contains an interesting example of an elaborate method of exponential smoothing derived from fitting curves to a process that was later shown to be a random walk.  Not surprisingly, the "curve fitting" method produced forecast errors of substantially larger magnitude than those produced by the "random walk" forecaster.
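
The point can be checked with a small simulation.  The sketch below is an illustration under assumptions made here: the walk moves up or down by exactly one unit, and "curve fitting" is represented by extending the parabola through the last three observations.  It is not the exponential smoothing method discussed in Reference 1.

```python
import random


def simulate_random_walk(n_steps, seed=0):
    """Return a random walk that moves up or down by 1 with equal probability."""
    rng = random.Random(seed)
    path = [0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.choice((-1, 1)))
    return path


def mean_squared_error(series, forecaster, n_history):
    """Average squared one-step-ahead error of `forecaster` over the series."""
    errors = []
    for t in range(n_history, len(series)):
        forecast = forecaster(series[t - n_history:t])
        errors.append((series[t] - forecast) ** 2)
    return sum(errors) / len(errors)


def random_walk_forecaster(recent):
    """Forecast the next value to equal the current value."""
    return recent[-1]


def quadratic_extrapolation_forecaster(recent):
    """Extend the parabola through the last three points one step ahead."""
    y0, y1, y2 = recent[-3:]
    return 3 * y2 - 3 * y1 + y0


walk = simulate_random_walk(5000)
naive_mse = mean_squared_error(walk, random_walk_forecaster, 3)
curve_mse = mean_squared_error(walk, quadratic_extrapolation_forecaster, 3)
print(naive_mse, curve_mse)
```

For this walk the error of the "random walk" forecaster is always a single step, so its mean squared error is exactly 1.  The error of the extrapolated parabola combines three successive steps with weights 1, -2, and 1, so its mean squared error is about six times as large.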

 

3.  Some Classical Time Series Models

 

A series of observations taken over various points in some time interval is technically called a time series.  During the past several decades, a considerable amount of statistical theory has been developed to analyze time series.  As with all techniques of statistical analysis, the conclusions of time series analysis are critically dependent on the assumptions underlying the analysis.  One of the tragedies of modern "scientific" investigation is that the computational procedures of statistics have often been uncritically applied to data, the underlying assumptions ignored, and false conclusions drawn from this so-called "statistical" analysis.  An example of this situation is the application of regression analysis to historical data to derive an equation which is then (wrongly) purported to indicate what will happen if the regressor variables are varied.  With respect to the forecasting problem, a similar misapplication of statistical procedures has been carried out.  As in the case of regression analysis, the inappropriateness has been obscured by the complexity of the estimation procedures.

 

Statistical time series analysis provides us, among other things, with good procedures for estimating parameters of various types of models.  The structure of the models often corresponds to physically meaningful behavior.  For example, in structural analysis, the physics of a situation may dictate a model which is trigonometric in nature.  In this situation, a model involving sine and cosine terms would be reasonable, and good statistical estimation procedures (e.g., multiple regression analysis) are available for estimating the model parameters.  The use of such models for describing economic time series, however, is not necessarily appropriate simply because the time series data exhibit periodic behavior.  Nevertheless, such models have been used in the past to describe, for example, a seasonal time series.  Along this same vein of fitting an arbitrarily chosen model to data, a number of "seasonal adjustments" could be estimated from the data.  If the postulated model structure were appropriate, then use of the corresponding estimation procedure would be reasonable.  In any event, the statistical adequacy of the model must be determined before accepting any such model as a basis for forecasting.

 

Thus we see that the statistical estimation procedures can be uncritically applied to fit a particular type of model to time series data, and this fitted model then used as a basis for forecasting.  The two key points that are often overlooked are that the particular model structure chosen may be inappropriate and that the fitted model may fail to be an adequate representation of the process.  These points appear to be overlooked because of the elaborate statistical procedures that are used to derive estimates of the parameters of the model.  Of course, if the underlying model assumptions are satisfied, then these estimation procedures -- often developed out of sophisticated theoretical analysis -- do provide good estimates of the model parameters and a good fitted model.  As we shall see later, however, there is a vast difference between "model fitting" and "model building" -- the process of determining an adequate mathematical representation of the process generating the time series data.  This process involves subjecting a fitted model to a variety of diagnostic checks of its adequacy to represent the process being modeled.  A competent time series analysis, of course, includes this latter process.  This section does not take issue with the methods of statistical time series analysis -- they are valid.  The trouble arises with the uncritical application of an arbitrarily selected technique of time series analysis -- parameter estimation (model fitting) -- and subsequently neglecting to subject the fitted model to various tests of adequacy.

 

The fact is that many classical time series models were derived for physical situations (as in astronomical, electrical, and mechanical fields) in which underlying deterministic components (such as sine and cosine terms) are obscured by simple random noise, and the various estimation procedures were developed to estimate these components.  Economic time series, however, are essentially stochastic rather than deterministic in nature, and for this reason, many classical time series models are simply inappropriate.  The use of these essentially deterministic time series models (also referred to as "unobserved component" models) has been remarkably widespread, apparently a carryover resulting from their successful (and appropriate) application in many physical situations for more than a century.  It appears, however, that the questionable use of such models in economic applications is finally drawing to an end (Reference 6).

 

The previous paragraphs have illustrated a number of procedures for fitting a model to data.  In view of the large amount of effort that has been expended using fitted models as a basis for forecasting, it is indeed unfortunate that efficiently fitting a particular model to data provides no assurance that the model is an appropriate basis for forecasting.

 

B.  Forecasters Derived from Tested Mathematical Models

 

1.  The Model Building Process

 

The reason for referring to the techniques described above as "intuitive" is no doubt clear by now -- there is no substantiation that the model underlying the forecaster is an adequate representation of the process under study.  Whatever properties the forecaster might possess relate to the model to which it corresponds, but these properties don't mean very much if there is little assurance that the model is a good representation of the actual process.  To determine a good short-term forecaster for a process, we need a good model of the short-term behavior of the process.  We shall now turn our attention to forecasters based on models which are good representations (in the short term) of the processes they represent.

 

The process of developing a good model for a process is called model building.  Model building involves four basic steps.  (See Reference 1 for a detailed discussion of model building.)  First, we must have available a class of models which is capable of exhibiting the essential characteristics of the process under study.  Second, a preliminary analysis of the problem under study will suggest a tentative subclass of models which is reasonable to entertain.  Third, observed data are used to "fit" one of these models, i.e., to estimate the parameters of the model.  Fourth, the fitted model is subjected to a number of diagnostic checks to test whether the model is an adequate representation of the process under study.  If the tests are not satisfied, the tentative model is modified.  Steps 3 and 4 are then repeated until the tests are satisfied, i.e., until we have an adequate model.  In summary, model building involves: (1) selection of a general model class; (2) model identification (tentative model selection); (3) model fitting (estimation of parameters); (4) diagnostic checking (tests of model adequacy) and model modification; (5) repetition of steps 2, 3, and 4 until an adequate model is found.  Thus model building is an iterative process involving much more than simply fitting a model to data, which is the basis for intuitive forecasting.  Good procedures for estimating parameters are easy to specify in terms of formulas and, understandably, there has been a great deal of model fitting done in the past, with the subsequent use of intuitive forecasters.  Diagnostic checking and model modification require critical statistical analysis, rather than the straightforward application of standard statistical estimation formulas, and in fact represent the bulk of the human effort required in model building (the computations required to fit tentative models and compute the statistics required to test model adequacy can be performed by computers).  Thus we see that model building involves a wide range of time series analysis techniques, not just those associated with parameter estimation.

 

2.  Classes of Models

 

The first step in model building is to choose a class of time series models with which to work.  At the extremes, there are two basic approaches.  On the one hand, we may attempt to develop a dynamic, or causal model of the process under study.  In such a model we attempt to relate the behavior of the variable of primary interest to the behavior of other variables.  The variables to tentatively include in the model, and the tentative model structure, are suggested by economic theory, and the model is referred to as an econometric model.  As an alternative to the econometric model, we may attempt to develop a purely stochastic, or empirical model of the process.  With this approach we strive simply to develop a model which exhibits the same essential characteristics as the process under study, without attempting to identify the causal nature of the relationships between the various relevant interacting variables.

 

The terms used above are not perfectly descriptive.  For example, a "dynamic" model will always possess a simple stochastic term to describe the variation unexplained by the dynamic structure.  Furthermore, a "causal" economic model will undoubtedly employ empirically derived relationships between the variables.  Intermediate between the two extremes of the dynamic and the stochastic models is the class of stochastic-dynamic models, which contain both causal components and nontrivial stochastic components.

 

In forecasting with a stochastic model of a process, we are in essence attempting to predict the next few moves of the process based generally on all of the past behavior of the process, and in particular, on the recent history of the process.  The model derived from the data describes how the process behaves, and the forecaster predicts the near future of the time series from the recent past, based on the stochastic behavior of the process as characterized by the model.  With an econometric model, the forecaster predicts the near future from the recent past, based on the economic relationships characterized in the model.

 

The problem of choosing a model class involves balancing accuracy requirements against the costs associated with developing and implementing the forecaster.  For example, an elaborate econometric model would clearly be inappropriate for forecasting the short-term sales of each of thousands of items of an inventory system.  Reference 4 includes a discussion of the problem of choosing an appropriate model class.

 

3.  Econometric Models

 

To construct an econometric time series model, we identify the variables that are considered to have an effect on the variable of interest, and then pose a tentative structure for the model.  In most cases a linear or linearized model relating the variables is considered, and standard regression analysis is used to estimate the model parameters (regression coefficients).  The estimation is, however, not always straightforward.  For example, suppose that we wish to forecast sales, and it is known that sales are related to price, and price in turn to sales.  Then a simultaneous system of equations is necessary to describe the system, the usual regression estimates are inappropriate, and some other method, such as "two-stage" least squares, must be used to estimate the parameters.

 

Once the estimation has been completed, i.e., we have a fitted model, then it is necessary to subject the model to various diagnostic checks.  These checks often involve the model "residuals."  A "residual" is the difference, or "error," between an actual observation and the value predicted by the model.  In an econometric model, one of the usual underlying assumptions is that the residuals are unrelated to each other in a certain statistical sense.  In technical terms, the residuals are not autocorrelated.  If, for a fitted model, the residuals are autocorrelated, then the current tentative model must be modified and a new tentative model entertained.  The new tentative model might involve either a somewhat different dynamic structure or include some new variable.  Alternatively, if the residuals are relatively small in magnitude, their autocorrelation can be taken into account in the estimation procedure, without changing the dynamic model structure.  If the latter approach is followed, then we in fact have a simple example of a stochastic-dynamic model.
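
A residual check of this kind can be sketched as follows.  The function names are hypothetical, and the two-standard-error rule (based on the large-sample approximation that the sample autocorrelations of unrelated residuals have standard error of about one over the square root of the number of residuals) is an assumption adopted here for illustration; it is one common rule of thumb, not the only possible test.

```python
def residuals(actual, predicted):
    """Differences between the observations and the values given by the model."""
    return [a - p for a, p in zip(actual, predicted)]


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given positive lag."""
    n = len(values)
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    denominator = sum(d * d for d in deviations)
    numerator = sum(deviations[t] * deviations[t - lag] for t in range(lag, n))
    return numerator / denominator


def flag_autocorrelated_lags(resid, max_lag):
    """Return the lags whose sample autocorrelation exceeds about two standard errors."""
    limit = 2 / len(resid) ** 0.5
    return [lag for lag in range(1, max_lag + 1)
            if abs(autocorrelation(resid, lag)) > limit]


# Made-up residuals that alternate in sign, a clear sign of an inadequate model.
alternating = [1, -1] * 50
print(flag_autocorrelated_lags(alternating, max_lag=3))
```

An empty list suggests no evidence of autocorrelation at the lags examined; a non-empty list indicates that the tentative model should be modified or the autocorrelation taken into account.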

 

In developing an econometric model (or any other model for that matter) it can be unwise to include a very large number of variables in the model, particularly if the inclusion is based solely on empirically observed relationships.  With a large enough number of variables, it would not be surprising to find a model that seemed generally adequate and yet proved, for several reasons, to be a poor basis for forecasting.  First, the number of parameters may simply be so large compared to the number of observations that it becomes difficult to perform sensitive tests of model adequacy.  Second, by increasing the number of variables in a model we increase the risk of discovering, by chance, an apparent relationship of the variable we wish to forecast to some other totally unrelated variable.  This, of course, is a principal drawback associated with using regression analysis to simply "fit" a forecasting model to data.  An additional problem associated with a many-variable forecasting model is that in using it we implicitly assume that the relationship between all the variables of the model will continue in the future as in the past.  The larger the number of variables involved, the less reasonable such an assumption becomes.  In general, if a large number of parameters seems necessary, it is reasonable to suspect that the identified model structure is inappropriate.

 

4.  Stochastic Models

 

In order to develop a stochastic time series model to represent a process, it is necessary to have a flexible class of models available.  We have already mentioned a few classical time series models above and noted their applicability to essentially deterministic situations.  Two other models that have been used and that are essentially stochastic in nature are the (finite) autoregressive model and the (finite) moving average model.  In the autoregressive model, the current observation is represented as a weighted average of previous observations, plus a random term that is uncorrelated with the random terms of other observations.  In the moving average model, the current observation is represented as a weighted average of uncorrelated random terms (Endnote 1).
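
To make the two definitions concrete, the following sketch generates artificial series from each kind of model.  The function names, the weights, the zero start-up values, and the use of normally distributed random terms with unit variance are all assumptions made here for illustration.

```python
import random


def simulate_autoregressive(weights, n_obs, seed=0):
    """Each observation = weighted sum of previous observations + a new random term."""
    rng = random.Random(seed)
    p = len(weights)
    series = [0.0] * p                      # start-up values (an assumption)
    for _ in range(n_obs):
        shock = rng.gauss(0, 1)
        past = sum(w * series[-1 - j] for j, w in enumerate(weights))
        series.append(past + shock)
    return series[p:]


def simulate_moving_average(weights, n_obs, seed=0):
    """Each observation = current random term + weighted sum of previous random terms."""
    rng = random.Random(seed)
    q = len(weights)
    shocks = [rng.gauss(0, 1) for _ in range(n_obs + q)]
    return [shocks[t] + sum(w * shocks[t - 1 - j] for j, w in enumerate(weights))
            for t in range(q, n_obs + q)]


ar_series = simulate_autoregressive([0.6, 0.2], n_obs=200)
ma_series = simulate_moving_average([0.5], n_obs=200)
print(ar_series[:3], ma_series[:3])
```

With no weights at all, both functions return the random terms themselves, which is the simplest member of either class.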

 

In general, the preceding models have not proved sufficiently general to model arbitrary economic time series with a reasonable number of parameters, and this fact probably accounts in part for the limited use of stochastic models (or stochastic-dynamic models) for model building.  This situation has now been remedied, for the models investigated by Box and Jenkins possess the capability to efficiently represent a tremendous variety of economic time series.

 

This section has described in brief detail the major categories of techniques that have been used in the past to develop forecasting models.  We shall now turn our attention to a description of the Box-Jenkins forecasting method.

 

III.  The Box-Jenkins Method

 

A.  Stochastic Box-Jenkins Models

 

1.  Estimation of the Model

 

As the preceding section has suggested, not a great deal of forecasting has been done using tested stochastic or stochastic-dynamic time series models.  The essential reason for this situation is that a suitable class of stochastic models had not been identified as possessing the flexibility necessary to represent efficiently (i.e., using a reasonable number of parameters) the tremendous variety of characteristics of economic time series.  This situation no longer holds, for Box and Jenkins have thoroughly investigated a class of models that prove to be quite satisfactory for both the stochastic and stochastic-dynamic situations.  We shall first describe the purely stochastic models.  Technically, these models are called autoregressive integrated moving average (ARIMA) models, or simply Box-Jenkins models for short.

 

The Box-Jenkins models can be used to represent processes that are stationary or nonstationary.  A stationary process is one whose statistical properties are the same over time; in particular, such a time series fluctuates around a fixed mean value.  Examples of nonstationary time series are series which exhibit changes in level, trends, changes in trends, or seasonal behavior.

 

The purely stochastic Box-Jenkins model is remarkably simple in form.  The current observation is represented by a linear combination (weighted average) of previous observations, plus an error term associated with the current observation, plus a linear combination of error terms associated with previous observations.  The error terms have zero mean, constant variance, and are uncorrelated with each other.  The portion of the model involving the observations is called the autoregressive part of the model, and the portion involving the error terms is called the moving average part of the model (Endnote 2).
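
In symbols, the verbal description above can be written as shown below.  The notation is an assumption adopted here for illustration: z_t denotes the observation at time t, a_t the error term, the phi's and theta's the weights, and p and q the number of terms in the autoregressive and moving average parts.  The minus signs on the moving average weights follow the convention used by Box and Jenkins; they are a matter of convention only.

```latex
z_t = \phi_1 z_{t-1} + \cdots + \phi_p z_{t-p}
      + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q},
\qquad
E[a_t] = 0, \quad \operatorname{Var}(a_t) = \sigma_a^2, \quad
\operatorname{Cov}(a_t, a_s) = 0 \ \ (t \neq s)
```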

 

The problem of building a stochastic Box-Jenkins model of a process in essence involves determining the number of terms in the autoregressive and moving average parts of the model, and determining values for the parameters associated with those parts.  By statistical analysis of the time series data, it is possible to choose, from the full class of autoregressive integrated moving average models, subclasses of models having a specific structure appropriate to the particular time series under examination.  By determining such a reasonable structure for the model, the number of parameters to be estimated in the model can be substantially reduced.  This parameter reduction is quite important, for "nonlinear" statistical estimation procedures are generally required to fit a tentative Box-Jenkins model.

 

After a tentative Box-Jenkins model has been fitted, it is subjected to various diagnostic checks to test its adequacy as a stochastic representation of the process under study.  If the model is found to be inadequate, analysis of the model residuals suggests ways to modify the model structure to obtain a new tentative model which will likely do an improved job of representing the process.  The basic statistic for assisting identification of a reasonable structure for a new tentative model is the autocorrelation function of the residuals of the current model.  For testing model adequacy, the power spectrum of the residuals can also be used as an alternative to the autocorrelation function (see Reference 7).  This process of fitting a tentative model, testing it, and determining a new tentative model, is repeated until a model is found which does an adequate job of representing the process.  (References 1 and 3 contain detailed descriptions of this model building process.)
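
The fit-check-modify cycle can be sketched as a simple loop.  The sketch below is a deliberately reduced illustration, not the Box-Jenkins identification procedure itself: it entertains only purely autoregressive models of increasing order, fits them from the sample autocovariances (the Yule-Walker equations, solved by the Levinson-Durbin recursion), and accepts the first model whose residual autocorrelations all lie within a rough two-standard-error bound.  The function names, the bound, and the simulated data are assumptions made here.

```python
import random


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at a positive lag."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    return sum(dev[t] * dev[t - lag] for t in range(lag, n)) / sum(d * d for d in dev)


def fit_autoregression(series, order):
    """Estimate autoregressive weights from the sample autocovariances."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    cov = [sum(dev[t] * dev[t - k] for t in range(k, n)) / n for k in range(order + 1)]
    weights, error_variance = [], cov[0]
    for k in range(1, order + 1):
        partial = (cov[k] - sum(weights[j] * cov[k - 1 - j] for j in range(k - 1))) / error_variance
        weights = [weights[j] - partial * weights[k - 2 - j] for j in range(k - 1)] + [partial]
        error_variance *= 1 - partial ** 2
    return weights


def residuals_of(series, weights):
    """One-step-ahead errors of the fitted autoregression."""
    mean = sum(series) / len(series)
    dev = [x - mean for x in series]
    p = len(weights)
    return [dev[t] - sum(w * dev[t - 1 - j] for j, w in enumerate(weights))
            for t in range(p, len(dev))]


def build_model(series, max_order, max_lag):
    """Fit, check, and modify until the residuals look uncorrelated.

    Returns (order, weights), or None if no order up to max_order passes.
    """
    for order in range(max_order + 1):
        weights = fit_autoregression(series, order)
        resid = residuals_of(series, weights)
        limit = 2 / len(resid) ** 0.5              # rough two-standard-error bound
        if all(abs(autocorrelation(resid, lag)) <= limit
               for lag in range(1, max_lag + 1)):
            return order, weights                  # diagnostic check passed
    return None                                    # consider a different model structure


# Illustrative data: a simulated series that depends strongly on its previous value.
rng = random.Random(1)
series = [0.0]
for _ in range(400):
    series.append(0.8 * series[-1] + rng.gauss(0, 1))
print(build_model(series, max_order=5, max_lag=10))
```

In practice the structure of the residual autocorrelation function, and not merely its size, guides the choice of the next tentative model, and moving average terms would be entertained as well.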

 

2.  Derivation of the Optimal Forecaster

 

After a model has been determined that, according to the various tests to which it is subjected, is considered to be an adequate representation of the time series under examination, we are in a position to derive a forecaster for the time series.  In order to determine a forecaster from a model, it is necessary to specify a criterion which the forecaster satisfies.  Ideally, the criterion should take the "cost" associated with forecast errors into account.  We would then like to derive the forecaster for which the expected cost associated with forecast errors is minimized.  If the cost function is not apparent, a reasonable approach is to determine the (linear) forecaster that has the least forecast error variance, or mean squared error of prediction.  That is, for each specified time in the future we wish to determine the forecast that has minimum mean squared error at that point.  The minimum mean squared error forecaster is usually referred to as the "optimal" forecaster.  The optimal forecaster corresponding to a Box-Jenkins model turns out to be the expected (mean) value of the process, conditional on the past observations.  For example, the optimal forecast one time period into the future is the expected value of the process at that point in time, given that the past values of the time series are as observed.
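
Stated compactly, and using notation assumed here for illustration (z_t for the observation at time t, and a hat with lead time l for the forecast made at time t of the observation l periods ahead), the optimal forecaster is the conditional expectation, and it is the forecast for which the mean squared error at that lead is smallest:

```latex
\hat{z}_t(\ell) = E\left[\, z_{t+\ell} \mid z_t, z_{t-1}, \ldots \,\right],
\qquad
E\left[ \left( z_{t+\ell} - \hat{z}_t(\ell) \right)^2 \right] \ \text{is a minimum}
```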

 

Computation of the optimal forecast is quite easy for a stochastic Box-Jenkins model.  To compute the lead-one forecast, for example, we simply substitute, into the formula defining the model, the observed (past) values and the optimal estimates of the past error terms.  These error terms can be estimated recursively: they are simply the past lead-one forecast errors.  To compute the optimal forecast beyond lead one, we substitute forecasts for future observations and zeros for future error terms.  Thus we see that the computational effort required to forecast using a Box-Jenkins model is of the same order as that required by the simple smoothing techniques.
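
As a minimal sketch of this substitution rule, consider a model with one autoregressive and one moving average parameter, z_t = phi*z_{t-1} + a_t - theta*a_{t-1}.  The model form, the zero-mean series, the function name arma11_forecasts, and the start-up assumption that the error term preceding the data is zero are all illustrative choices, not taken from the original.

```python
def arma11_forecasts(z, phi, theta, horizon):
    """Forecasts at leads 1..horizon for z_t = phi*z_{t-1} + a_t - theta*a_{t-1}.

    z is the observed (zero-mean) series; horizon must be at least 1.
    """
    a_prev = 0.0      # start-up assumption: the error before the data began is zero
    z_prev = z[0]
    for obs in z[1:]:
        one_step = phi * z_prev - theta * a_prev   # lead-one forecast made one period earlier
        a_prev = obs - one_step                    # past error term = past forecast error
        z_prev = obs
    # Lead one: substitute the last observation and the last estimated error.
    f = phi * z_prev - theta * a_prev
    forecasts = [f]
    # Beyond lead one: forecasts replace observations, zeros replace future errors.
    for _ in range(horizon - 1):
        f = phi * f
        forecasts.append(f)
    return forecasts
```

For example, arma11_forecasts([1.0, 1.0], 0.5, 0.5, 2) returns [0.25, 0.125]: the single estimated error is 0.5, so the lead-one forecast is 0.5*1.0 - 0.5*0.5, and the lead-two forecast is 0.5 times the lead-one forecast.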

 

Computation of tolerance (probability) limits around the forecasts is straightforward for a Box-Jenkins model.  If we have a stationary process, the distance between these limits will gradually increase for forecasts further and further into the future, approaching a fixed value proportional to the standard deviation of the process.  For a nonstationary process, the variance of the process is undefined, and the tolerance interval continues to widen, without bound, as we forecast further into the future.
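
The contrast between the two cases can be illustrated with the simplest possible model, z_t = phi*z_{t-1} + a_t, which is stationary when phi is less than one in absolute value and is a random walk (nonstationary) when phi equals one.  The sketch below is an illustration under stated assumptions: the function name is hypothetical, the error terms are assumed to be normally distributed with known standard deviation sigma, and the multiplier 1.96 gives approximate 95 percent limits.

```python
import math

def ar1_limit_half_widths(phi, sigma, horizon, z_value=1.96):
    """Half-widths of the tolerance limits at leads 1..horizon for z_t = phi*z_{t-1} + a_t.

    The lead-l forecast error variance is sigma^2 * (1 + phi^2 + ... + phi^(2(l-1))).
    """
    widths = []
    var = 0.0
    for lead in range(1, horizon + 1):
        var += sigma ** 2 * phi ** (2 * (lead - 1))
        widths.append(z_value * math.sqrt(var))
    return widths
```

With phi = 0.5 the half-widths level off near 1.96 times the standard deviation of the process, sigma/sqrt(1 - phi^2); with phi = 1 they grow in proportion to the square root of the lead time and never level off.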

 

It is of interest to note that a forecaster derived from a stochastic model is an adaptive forecaster.  Adaptivity is, of course, a very desirable property for a forecaster to possess and a necessary one for a forecaster of a nonstationary process.  (The various smoothing methods -- exponential smoothing, moving average -- have been so widely used because, in addition to requiring few computations, they are adaptive in nature.)  A very important feature of the Box-Jenkins models is their ability to efficiently represent quite general nonstationary processes.  By its very definition, a stochastic model embodies the changing nature of the process, and the corresponding forecaster is in essence a description of how the future is likely to turn out, given the recent past behavior of the process.  Suppose, for example, that a process appears to have "trends" that change slope over time, or that it exhibits pseudo-periodic behavior with a stochastically varying phase and amplitude.  If such a process is modeled using, for example, trends, trend adjustments, seasonals, and seasonal adjustments, then the parameters of the model will probably have to be updated rather frequently; i.e., the forecaster cannot adapt automatically to the changing situation, since the model does not represent the process well.  The key point here is that we want an adaptive forecaster rather than an adaptive model.  We wish to derive a fixed model and to derive an adaptive forecaster from it.  An adequately tested stochastic model of a process meets this requirement.  By taking the essential stochastic characteristics of the process properly into account, it produces a forecaster that in essence tells how to take the changing features of the process into account.

 

3.  Simulation Using a Stochastic Box-Jenkins Model

 

Although we are primarily concerned with forecasting in this article, a few words are in order regarding the use of the Box-Jenkins models for simulation.  For example, we may be interested in determining the effect of sales variability on a new inventory policy, starting from the current sales position.  In such a case, we need a model which exhibits the same statistical properties as the actual sales themselves.  An econometric model would be difficult to use in such a situation because, as the model is simulated into the future, we need to substitute values for all of the variables of the model, and the econometric model does not specify the behavior of any of these variables other than the variable(s) of primary interest.  A Box-Jenkins model can readily be used in such a situation, for it is in fact a description of the statistical properties of each succeeding observation in terms of the preceding observations.  Each simulated observation becomes input to the succeeding simulated observation.  It should be remembered, however, that a Box-Jenkins model is concerned essentially with the local (short-term) behavior of a process, since it is through understanding the local behavior that we are able to predict the near future from the recent past.  If a long-term simulation model is desired, then special care must be taken to make sure that the model reflects the longer-term properties of the process.
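
The way each simulated observation feeds the next can be sketched as follows, again for the illustrative model z_t = phi*z_{t-1} + a_t - theta*a_{t-1}.  The function name, the normally distributed error terms, and the parameters last_z and last_a (the most recent observation and estimated error term, from which the simulation starts) are assumptions made for this example.

```python
import random

def simulate_arma11(last_z, last_a, phi, theta, sigma, steps, seed=None):
    """Simulate `steps` future values of z_t = phi*z_{t-1} + a_t - theta*a_{t-1}.

    The simulation starts from the current position (last_z, last_a).
    """
    rng = random.Random(seed)
    path = []
    z_prev, a_prev = last_z, last_a
    for _ in range(steps):
        a = rng.gauss(0.0, sigma)                # new random error term
        z = phi * z_prev + a - theta * a_prev    # next simulated observation
        path.append(z)
        z_prev, a_prev = z, a                    # simulated value becomes input to the next step
    return path
```

Repeating the simulation with different seeds yields a collection of possible future paths, each starting from the current position, against which a policy such as an inventory rule could be evaluated.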

 

B.  Stochastic-Dynamic Box-Jenkins Models

 

1.  General Considerations

 

As we mentioned earlier, the Box-Jenkins method can be used to develop stochastic-dynamic models, in which the behavior of the variable of primary interest (the endogenous variable, or variable we wish to forecast) is related not only to its past behavior, but to the behavior of other (exogenous) variables as well.  The reason for including exogenous variables is obvious: since the model class is expanded, the precision of the forecast may be increased over that corresponding to the pure stochastic model.  If an exogenous variable is included in a model to be used for forecasting, however, then the values of that variable must be known or forecast over the forecasting period of interest.  Typically, the behavior of the exogenous variable presages that of the variable of primary interest; that is, the current behavior of the variable of primary interest is related to the past behavior of the exogenous variable.  Such an exogenous variable is called a leading indicator.  Obviously, if there is a strong lagged relationship between an exogenous variate and the variable of primary interest, the precision of the forecasts will be considerably enhanced.

 

It is recalled that a stochastic-dynamic model is a hybrid between a purely econometric model and a purely stochastic model.  In it, the behavior of the variable of interest that cannot be explained in terms of the behavior of the exogenous variates is represented by a stochastic portion of the model.  As we add additional exogenous variables to the model, the stochastic portion of the model becomes less and less elaborate, and we may approach the "purely dynamic" model in which the stochastic portion of the model is simple uncorrelated random variation.  (In a particular situation it may not be feasible to derive such a "purely dynamic" model; the practical limit may be a stochastic-dynamic model with a nontrivial stochastic component.)

 

Note that once we have introduced an exogenous variate into a model, we might refer to such a stochastic-dynamic model as an econometric model.  Rather than simply implying the inclusion of more than one economic variable, however, the term econometric usually indicates that the model has a special structure dictated by economic theory, rather than empirically determined.  Usually, the stochastic part of such econometric models is quite simple in form, and the predictive power of the model arises out of the dynamic rather than the stochastic relationships.  Hence, while an econometric model is certainly an example of a stochastic-dynamic model (in which the stochastic portion is usually simple in form), we shall generally use the term "stochastic-dynamic" to describe models in which the relationships between the variables are empirically determined.

 

A multivariable (Endnote 3) Box-Jenkins stochastic-dynamic model may include all of the relevant variables of a so-called econometric model.  The Box-Jenkins stochastic-dynamic models simply represent a particular class of empirical models that are capable of efficiently representing a wide variety of processes involving more than one variable.  While the structure of these models is quite flexible, the economic nature of a process may suggest a special model structure which could be an even more efficient representation of the process.  In a sense a Box-Jenkins stochastic-dynamic model could be viewed as an empirically-derived "econometric" model, in contrast to a causally-derived "econometric" model (we are using the term "econometric model" to refer to this latter type of model).  To the extent possible, of course, mathematical model building should always take advantage of special features or understanding of the real-world process being modeled.  Such understanding forms the fundamental basis for selecting a particular class of models to represent a process.

 

The performance of a forecaster based on an econometric model is often compared to that of a forecaster based on a simple stochastic model, such as a finite autoregressive scheme.  The forecasting ability of the autoregressive forecaster is often taken as a minimum standard of performance for an econometric model (References 5, 8 and 10).  Needless to say, a more reasonable comparison would be that of the econometric forecaster to a forecaster based on a tested stochastic model, rather than on an arbitrarily selected and probably inappropriate stochastic model.

 

2.  Identification of Stochastic-Dynamic Model Structure

 

Often econometric models have been constructed using linear regression analysis.  The term "linear" refers, of course, to the nature of the statistical estimation (the model is linear in the parameters to be estimated), rather than to the functional form of the regression function, which may be highly nonlinear in the variables.  Computationally, about the most difficult problem likely to arise with this approach is that the residuals (error terms) of the model may be autocorrelated; even then, the standard regression methods can usually still be used for the analysis after appropriate steps have been taken.  There are essentially two reasons for such autocorrelation.  First, it may simply not be possible or practical to include a sufficient number of exogenous variables in the model so that the stochastic part of the model would be simple in form (not autocorrelated).  Second, the structure of the model may not be representing the relationship between the variables of the model in a very efficient fashion.

 

By way of analogy, suppose that a one-parameter moving average process best described the stochastic behavior of a time series, but that an autoregressive model was fitted instead to the data.  By allowing a sufficient number of parameters in the autoregressive model, a satisfactory fit could be obtained, with residuals having negligible autocorrelation.  However, if we restrict the number of autoregressive parameters, we would have neither a very good fit nor negligible autocorrelation in the residuals.  Similarly, with the stochastic-dynamic model, it may be quite inefficient to insist that the variable of interest be explained solely in terms of linear combinations of other variables.  A much more efficient structure for the model, allowing for a better fit with fewer parameters and less autocorrelation in the residuals, may be possible if we represent the process also in terms of a linear combination of past values of the variable of primary interest.
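
The analogy can be made concrete.  A one-parameter moving average process z_t = a_t - theta*a_{t-1}, with theta less than one in absolute value, can be rewritten as an autoregression of unlimited length, z_t = -theta*z_{t-1} - theta^2*z_{t-2} - ... + a_t.  The sketch below counts how many autoregressive coefficients are not negligible; the function name and the cutoff tol = 0.01 for "negligible" are arbitrary illustrative choices.

```python
def ma1_as_ar_coefficients(theta, tol=0.01):
    """Coefficients of z_{t-1}, z_{t-2}, ... in the autoregressive form of
    z_t = a_t - theta*a_{t-1}, kept until they fall below tol in magnitude."""
    if not abs(theta) < 1:
        raise ValueError("theta must be less than 1 in absolute value")
    coeffs = []
    power = theta
    while abs(power) >= tol:
        coeffs.append(-power)   # coefficient of z_{t-j} is -theta**j
        power *= theta
    return coeffs
```

With theta = 0.8, for example, len(ma1_as_ar_coefficients(0.8)) is 20: twenty autoregressive parameters would be needed to approximate, to this tolerance, what the moving average model expresses with one.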