Daniel Hellerstein, 202-694-5613, danielh@ers.usda.gov  Updated 2/06 

	
        	  The GRBL2 Maximum Likelihood Routines


     This document describes the GRBL2 maximum likelihood routines.  These
routines can be easily used to estimate a user-defined function.  Among the features
supported are constrained optimization (using linear inequalities), and step-by-step
control of maximization strategy.  

     This document gives a detailed description of these routines.  In addition,
the MLE_TEST.PRG file contains an example of the use of these routines -- we
highly recommend perusing MLE_TEST.PRG before attempting your own projects.


Contents:

    A.    Optimization Strategy
    A.1       Constrained Optimization
    B.   How to Use GRBL2 Maximum Likelihood Routines
    B.1     Create a GAUSS program that calls the MAXLIKC
    B.2     Create a GAUSS procedure that computes the log-likelihood, gradient, and hessian
    B.3     Optimization control parameters
    C.  Create constraints matrix 
    D.  Compute covariance matrix 
    E.   Other useful routines.  

NOte: GRBL2 is a descendant of the GRBL set of procedures (GRBL was
written for DOS GAUSS 3.1).

			--------------------------------

A.   Optimization Strategy
     

The basic "hill climbing" algorithm used by the GRBL2 maximum likelihood
routines is:

                i+1 =  i - sHi-1 Gi

with:
          i = Kx1 coefficient vector, at the ith iteration.
         Hi = KxK "direction" matrix.
         Gi = Kx1 gradient vector (for all observations).
         s = 1x1 step size (eg; 1.0)
Gi and Hi are both computed at  i.
              
         A number of variants of this algorithm are supported, where each variant
uses a different mechanism for computing the direction and the step size.  Several
methods for computing H are supported, including:

         Newton Raphson (NR).The hessian (KxK matrix of 2nd derivatives) of
         the log-likelihood is used.

         Berndt, Hall, Hall, Hausman (BHHH).  BHHH is a quasi-Newton
         strategy that uses the moment matrix of the gradient to approximate H:
         H=GN'GN; with GN the NxK matrix of gradients (for all N observations
         and all K variables).

         Davidson, Fletcher, Powell (DFP) and Broyden, Fletcher, Goldfarb,
         Shannon (BFGS) are quasi-Newton techniques that use an updating
         strategy to approximate H.

         Steepest descent uses the gradient (normalized to 1.0).  H is set to the
         identity matrix divided by the norm of G.

         Alternatively, instead of the updating strategy, a Random Search strategy
is available.  In this technique, the likelihoods from a randomly generated set of
coefficient vectors are computed.  The coefficient vector yielding the best
likelihood is then selected.  A multi-variate normal distribution is used to generate
the random coefficient  vectors.

         In addition, the step size (S) can also be selected.  There are basically two
methods: use a step size equal to 1.0, or use the GOLDEN algorithm to select the
optimal step size (given current G and H).

         
         The GRBL2 package is especially useful when attempting to solve difficult
functions.  When using a simple optimization strategy, these functions often "get
stuck", with no possible improvements in log-likelihood even though the computed
gradient is not close to zero.  To overcome this problem, GRBL2 gives the user the
capability of fine-tuning the set of optimization techniques used.  This fine-tuning
consists of changing, on an iteration-at-a-time basis, the strategy adopted (say,
choosing GOLDEN step size every 5th iteration); and through a repetitive strategy
(for a given  i), where more sophisticated techniques (say, NR) are attempted
first, followed by progressively less sophisticated (say BHHH) techniques should
the "more sophisticated" techniques fail. 

			--------------------------------

         
A.1  Constrained Optimization


    GRBL2 also offers a constrained optimization solution algorithm.  This
algorithm is easily invoked by setting constraints.  GRBL2 will then proceed with a
"step-direction constrained" variant of MLE that guarantees that these constraints
will be satisfied.  
         
   The basic approach is iterative.  Starting with the non-constrained step
direction, GRBL2 finds the largest step size that will not violate the constraints. 
If this step size equals zero, GRBL2 will determine which constraint(s) are
binding, and selectively bind the variables in these constraints.  Bound variables
are then treated as constants, and the step direction is re-computed using the
remaining "free to vary" variables.  In other words, the "constrained" variables
are temporarily removed from the problem, and optimization proceeds for one
step as if these variables were constants.  Presuming that a likelihood improving
step can be found, we then re-introduce the "bound" variables back into the
problem, and the process is reiterated.  

   GRBL2 attempts to limit the number of variables that are "removed" from
the problem.  Thus, if a 3 variable constraint (say, X1 + 2 * X2 - Z3 < 100 )
is binding, rather then setting all 3 variables to be constants, GRBL2 will constrain
a number of different combinations of these (3) variables, starting with the least
stringent.  Alternatively, GRBL2 can be instructed to try all combinations of these
3 and choose the best (where best means the combination with the largest log-likelihood). 
             
   This constrained optimization algorithm has the advantage of flexibility,
but it is of a heuristic nature and is not guaranteed to return a global (constrained)
maximum, even in cases where the unconstrained model is known to be globally
concave (e.g., the homoscedastic TOBIT or the POISSON).    

			--------------------------------


B.   How to Use GRBL2 Maximum Likelihood Routines

In order to use the GRBL2 maximum likelihood routines, we assume you have
installed the GRBL2 package, and have GRBL2_MLE.INC (and other GRBL2_* files)
in the SOURCE subdirectory (of your GRBL2 directory).

There are basically two steps:
   1) Create a gauss program that calls the GRBL2 MAXLIKC (MAXimum LIKelihood with Constraints) procedure.
   2) Create a "model" procedure (that computes log-likelihood, etc), that is called by MAXLIKC.

			--------------------------------

B.1  Create a GAUSS program that calls the MAXLIKC

The call to the MAXLIKC should have the syntax:

          {loglik,beta,freeindx}=
                maxlikc(bstart,cnstlist,ditype,dogolden,prntit,header,&MODELNAM,parnames);

where the 3 arguments returned are:

              loglik:    a scalar value equal to the log-likelihood computed
                         over the entire sample
              beta:      an estimate of Kx1 coefficient vector
              freeindx:  a Jx1 (JK) vector indexing the coefficients that are
                         not constrained.


         and the 8 input arguments are:

              bstart:      A Kx1 vector of starting values .
              cnstlist:  A M x K+2 matrix used if constrained optimization
                         desired.  For unconstrained optimization, this should
                         be set to missing (e.g.; cnstlist=miss(1,1)).  See
                         step 5 below for more information on creating a
                         constraints matrix.
              ditype:    Default type of "direction matrix".  Values between
                         1 and 7 are valid, where:
                               1 =   Use hessian 
                       2 = Use inner product of gradient = G'G
                       3 = Use Steepest descent
                       4 = Use DFP
                       5 = Use BFGS
                       6 = Reserved for A User-specified method
                           (e.g.;  Grid Search).  See step 3 for
                           more details.
                       7 = Use Random Search.
                      Alternatively, instead of values between 1 and
                      7, you can use the character strings: "NR",
                      "BHHH", "STEEPEST", "DFP", "BFGS",
                      "USER", and "RANDOM".

              dogolden:  If dogolden=1, use GOLDEN algorithm for
                         determining step size.  Otherwise, use a step size of
                         1.0.
              prntit:    If prntit=1, print intermediate results to screen and
                         to output file.
              header:    A character string displayed during maximum
                         likelihood estimation.
              MODELNAM:  Name of the GAUSS procedure that computes
                         log-likelihood, gradient, and hessian.  Note
                         that the & character must appear before the
                         MODELNAM.  See step 2 below for more details.
              parnames:  Kx1 vector of names of the coefficients.


         In addition, several global matrices are set by MAXLIKC:

              _mlegrad = Gradient (summed) at maximum (a Kx1 vector).
              _mlehess = Hessian (BHHH or NR, or quasi-Newton) at  maximum (KxK)
              _mliter = Actual # of iterations required. 
              _mle_buse = Current estimate of beta coefficients (useful if you want to examine current values)

              _CtCall = A two element counter of how many times the MODEL has been called
                        This vector is available to your model procedure.
                   _CtCall[1]= total number of calls 
                               you can use this to initialize stuff at the beginning of the  1st iteration.
                   _CtCall[2]= number of call in this iteration
                              (the model may be called several times in a single iteration). 
 		               you can use this to initialize stuff at the beginning of each iterations.

			--------------------------------

B.2   Create a GAUSS procedure that computes the log-likelihood, gradient, and hessian
      of the function you wish to maximize.

  You, the user, must create a procedure that computes the log-likelihood,
  the gradient, and (optionally) the hessian of your model at a specified
  value of b (a candidate coefficient vector).  
  This procedure will be called by MAXLIKC repeatedly. 

  The procedure will be called using the following syntax:

              {li,grad,dirmtx}=modelnam(b,arg1)

         The 3 arguments your procedure must be able to return are:
              li = log-likelihood (a scalar)
              grad = gradient vector (Kx1)
              dirmtx = direction matrix (KxK)

         ** The log-likelihood, gradient and hessian of the true log-likelihood
         ** should be computed (not the negative log-likelihood).

         The 2 input arguments provided to your procedure (by MAXLIKC) are:
                b = Kx1 value of coefficients at which to estimate li, grad,
                    and dirmtx.
              arg = Requests what should be computed.  
                     0) Li, Grad and DirMtx=G'G
                     1) Li, Grad and DirMtx=Hessian
                     2) DirMtx=Hessian only
                     3) Li only
                     4) DirMtx=G'G only
                     5) Grad and Li only
                     9) Nx1 likelihood only (for use by gradp and hessp)

           Reminder:
             If arg is not 9
                  li and grad must be (1x1) and (Kx1) respectively.
                  DirMtx, if requested, must be KxK.  
             if arg=9
		  li must be Nx1
		  grad and dirmtx are ignored (they can be anything)
            
           Note that G is the NxK gradient matrix (N being the total number of observations in the
           sample). 

	   MAXLIKC will use li, grad, and dirmtx in its hill climbing. Thus, your procedure makes no
	   decisions -- it simply provided results to MAXLIKC
	   However --  your procedure can set a global variable: _GRBLCVG.
	         Setting _GRBLCVG=1 will force convergence,
			 _GRBLCVG=2 will force the maximization to abort 
           (note that these are equivalent to hitting C and A while the program is running).

              
           For arg values of 2, 3, 4, and 5, some of the three output
           arguments need not be computed.  In these cases, such as the
           DirMtx variable when arg=5, a missing value can be returned.

              Note: arg tells this procedure what values MAXLIKC needs. 
              Thus, if arg=3, there is no need to compute the gradient or the
              direction matrix.  In other words, you should write this
              procedure to take arg into account, thereby avoiding needless
              computations.
             
         Your procedure (i.e.; modelnam) is assumed to use all the observations in 
         your dataset.

            It is easiest if your procedures use global variables, that contain the dependent,
            independent, etc variables; these globals being created  before your call to MAXLIKC. 
            However, that is entirely up to you -- your procedure could read the dataset 
            (using a global filename, or file handle) each time it is called.


         For many models, the hessian will be unavailable.  In these cases, you
         can skip the computation of the hessian; just be sure that arg never equals
         1 or 3.  This means that NR can not be used -- that ditype (in the call to
         MAXLIKC) can never equal 1.

            Alternatively, you can set up a "stub" version of your procedure that 
            calls your procedure with an arg of 9.  This stub can be used as an argument to
            hessp (or gradp, if you don't want to code up the gradient).  This recursive
            method makes it easy to compute numeric hessians and gradients.

	    Example:
		a) Your function is TRUNC_NEGBIN
	        b) The hessian for this is complicated
		c) But you want to use NR
	        d) So, when MAXLIKC wants the hessian (say, arg=1), you would call
		    dirmtx=hessp(&trunc_negbin_2,b) ;
	        e) trunc_negbin_2 should look like:
			proc trunc_negbin_2(b1);
			  local a,b,c;
			  {a,b,c}=trunc_negbin(b1,9);
			  retp(a);
		        endp;

             Hint: when computing the gradient, gradp returns an NxK matrix. 

These two steps are the minimum requirements. In addition, a number
of other features can be manipulated.  These include setting optimization control
parameters, creating a "constraints" matrix, fine tuning control of optimization
strategy, and computing the coefficient covariance matrix.  

			--------------------------------

B.3          Optional optimization control parameters

Typically, these are set in the main GAUSS program before MAXLIKC is  called.

           _dsp_Frc = If=1:  MAXLIKC will output the gradient if the user forced convergence (dsp_frc= 1).

           _error_out  = 0  Do not output error messages to output file
                         1  Output error messages to output file
			 Error messages include "gradient not zero at the maximum".
				For certain models, such as iterative simulations, these can be both
				annoyingly frequent and essentially uninformative. 
			        In these cases, setting _error_out=0 can be convenient.

          _maxiter Maximum # of iterations. Default value is 50.
          __rows   # of observations per read. Default value is 0, which
                   means "program set automatically". THIS IS DEPRECATED -- we don't use it anymore.
          _mlmnstp Minimum step size in several places. Default=0.001.
          _mlepsb  Stopping criteria for coefficient elasticity.
                   Default=0.00001.
          _mlepsg  Stopping criteria for gradient elasticity.
                   Default=0.00001.
          _mlatmax    How many stopping criteri a to satisfy (1,2, or 3).  The
                      three criteria are _mlespb, _mlepsg, and change in log-likelihood (value of _mlepsb is used as stopping value
                      for change in log-likelihood).  Default=2.
          _mlbhhh  Controls use of BHHH as backup technique (used if
                   default technique, see ditype in step 2, fails). 1=Yes,
                   0=No. Default=1.
          _mlsteep Controls steepest descent as backup technique. 
                    Default=0.
                           If 0, don't do it.
                        If 1, use all coefficients.
                        If 2, then first use all coefficients, then try one-at-a-time (starting with largest absolute value) 
                        if no  improvement
          _mlmarq  Use Marquardt algorithm if direction matrix is not
                   positive definite:: H = (H + t*I), t>0.  0=No,
                   1=Yes. Default=1.
          _mldornd Controls use of random search as a backup technique.
                   The value sets how many alternate  vectors to try. If
                   0, don't do it.  Default=0.
          _mldorn2 Used with _mldornd.  How many successes to look for. 
                   Default=5 (note that if _mldornd=0, _mldorn2 is
                   ignored).
          _mlbkup  Try moving backwards (set S=-1.0) if stuck (e.g.; at
                   an inflection point). 1=Yes, 0=No. Default=1.
          _mlrndvc Controls H*, in N(beta,H*) selection of random beta (used by
                   RANDOM technique).  Default=2.
                  1 = H* = Hessian.
                  2 = H* = G'G. 
                  < 1  =   Values < 1.0 mean use a diagonal matrix equal to _mlrndvc * beta


          _mlelist    List of techniques. These are done first. 
                      Set to 0.0 to suppress. E.g.; _mlelist=0.
        
                      Otherwise, use this to specify a fairly intricate optimization methodology.
                      _mlelist should be Hx5, with the 5 columns (H is up to you), with each row having:

                           arg~step~freq~min~max
 
                           arg  == Technique; where:  1=NR, 2=BHHH, 3=STEEPEST, 4=DFP,  and 5=BFGS
                           step ==  -1 = Backward (useful if at an inflection point)
                                   0 = One
                                    1 = Golden
                           freq ==  every nth iteration.
                               Note that less frequent methods should precede  more frequent methods in _mlelist 
                               (at each iteration, _mlelist is scanned, and techniques are considered in order of 
                               appearance).
                           min  == First iteration to consider this on
                           max  == Last iteration to consider this on

                Default=0.


          _no_cls controls whether a CLS is issued before each iteration (before the MAXLIKC Iter= line is 
		  written at the top of the screen)
                     0 = clear screen between iterations
                     1 = do NOT clear screen (useful if you write status messages to lower portion of screen)

          _useallb Controls the constrained optimization heuristic. 
                   Default=2.
                0 =   If first unbound direction yields no
                      improvement, then do next strategy (e.g.; give
                      up after only 1 unbound direction).
                1 =   Keep trying unbound directions before doing
                      next technique (e.g.; keep trying unbound
                      directions until either none left to try, or
                      success).
                2 =   Try all unbound directions for each technique,
                      & choose best.  If none offer an improvement
                      then attempt next technique.

         These parameters can be changed interactively by using the
         ASKMLALG routine (see step 7 below).

			--------------------------------

 
C.  Create constraints matrix (optional).

              The constraints matrix (CNSTLIST in step 2 above) is an M by
         K+2 matrix.  Each row of the matrix is a different equality, or
         inequality, constraint.  The first K columns correspond to the K
         coefficients.  The K+1 column identifies the type of constraint, with:
               1 = "less than" constraint
               2 = equality constraint.
         The K+2 column is the value.  Note that a negative value, in conjunction
         with a "less than" constraint, is equivalent to a "greater then" constraint
         (with a positive value).

         If no constraints are desired, the constraints matrix should be set to
         missing (cnstlist=miss(1,1) in step 2).

         Note that equality constraints can only incorporate one variable, thus the
         first K columns of the corresponding row of CNSTLIST will contain K-1
         zeros, and one value of 1.0.

         A constraint matrix can be created interactively by using the
         MLE_ASK_CONTRAINTS and MLE_CONSTRAINTS (see step 7 below).


			--------------------------------

D.  Compute covariance matrix (optional).

              The covariance matrix can be computed using the COMPCVC
         routine.  COMPCVC is called using:

                 {vc,vcid}=compcvc(beta,vctype,freeindx,&MODELNAM);

         Where the two arguments returned are:
              vc      KxK Covariance matrix.  The square root of the
                      diagonal of vc are the standard errors of the
                      coefficients. 
              vcid    Flag to indicate which method of computing the
                      covariance matrix was used:
                      1 = -inv(H)                : -inverse
                                                 of hessian
                      2 = inv(G'G)               : G'G
                                                 matrix
                      3 = inv(H)*(G'G)*inv(H)    : 'white'
                                                 heteroscedastic
                                                 consistent
                                                 matrix
                                                 =inv(H)*(G'G)*inv(H)
         and the arguments are:
              beta       Kx1 coefficient vector
              vctype       Type of covariance matrix to compute (see
                           vcid above)
              freeindx     Index of "unconstrained variables"; see step 2
                           above.
              MODELNAM   Name of procedure that computes log-likelihood, gradient, and hessian; see step 2
                         above.

         Vctype will differ from vcid only if the hessian can not be computed (the
         hessian is computed when vctype equals 1 or 3).  If this should occur, the
         G'G matrix is used, and vcid is set accordingly.

         Note that all elements of the covariance matrix are computed, including
         equality constrained variables and variables that are "bound", with all
         constraints ignored (the unconstrained hessian, or G'G matrix, is
         computed).  Thus, when binding constraints occur, for standard error
         estimation a Bayesian or bootstrap approach may be preferable to the
         heuristic approach adopted here.


			--------------------------------

E.   Other useful routines.  


 ASKMLALG.
    ASKMLALG is an interactive routine that allows the user to select and
    specify a number of options, including:

              Default maximization algorithm (see ditype in step 2). 
              Default step-type (see dogolden in step 2).
              Custom control of maximization strategy (see _mlelist in step 4).
              Type of covariance matrix (see vctype in step 5).
              Set optimization control parameters (see step 4).

    Askmlalg is called:

              
         {ditype,vctype,dogolden}=askmlalg(othr_typ,ditype,vctype,dogolden);

      where ditype and dogolden are defined in step 2 above, and VCTYPE
      defined in step 6 above.

      Othr_typ should be a character string.  It is used to describe the "user
      specified direction matrix" (see the description of ditype under step 2
      above).

    Note that Askmlalg, in addition to changing ditype, vctype and dogolden,
    will change a number of global variables (such as the optimization control
    variables).


   MLE_ASK_CONSTRAINTS and MLE_CONSTRAINTS (and MLE_SHOW_CONSTRAINTS)

         MLE_ASK_CONTRAINTS and mle_constraints are used to create a constraints matrix.  The
         usage should be as follows: 
         
              cnststrg=mle_ask_constraints(parnames) ;
              if scalmiss(cnststrg) ;
                 cnstlist=miss(1,1);
              else;
                 cnstlist=mle_constraints(cnststrg,parnames,0);
              endif;

         where parnames is the Kx1 vector of coefficient names.  MLE_ASK_CONSTRAINTS will
         ask the user to enter a list of constraints, or to enter a filename containing
         a list of constraints.  mle_constraints will take the output of mle_ask_constraints 
         (cnststrg),    and convert it into a constraint matrix (cnstlist).  

        Note that mle_constraints does not ask for user input.
        MLE_CONSTRAINTS can also process a cnststrg provided directly (say, as read from an input file).
 
         When using mle_constraints, the cnststrg argument should have the following syntax:
              Val1 * Var1 + Val2 * Var2 ... < ValC ;  ... ; 
         For example:
                2 * X1  +  Z1 * 3 < 10  ; X2 = 3 ;
                3  X1 - Z3 > 100 ;

         Note that constraints are separated by semi-colons.  You can
         have several constraints on a single line, or you can spread one constraint
         over several lines. The asterisks are optional, - can be used instead of +,
         and if the Val is left out, a coefficient of 1.0 is assumed.  In addition,
         exact matches on parameter names are not necessary (sub-string matches
         are allowed).  Lastly, you must leave a space between each item in the
         constraint -- in other words: 2*X1+3*Z1<10 will be unreadable!

         Equality constraints can only have one variable: X1 + X2 = 10
         is not allowed.  Equality constraints are enforced, regardless of user
         specification of starting values.  Furthermore, equality constrained
         variables are treated as constants throughout the optimization.

        MLE_SHOW_CONSTRAINTS will display, on screen and to the output file, the 
        current constraints. 

	Syntax:
           call mle_show_constraints(cnstlist,parnames); ?;
        where the cnstlist argument is what was produced by MLE_CONSTRAINTS.


