Updated 21 June 2006 -- a work in progress!

                        The Discrete Choice Models Estimator
Abstract:

    DISCRETE is a GRBL2-based GAUSS program that estimates multiple-bounded and related
    discrete choice models, including multi-bounded logit, multi-bounded probit, two-stage
    probit, and ordered probit models. To use DISCRETE, you must create a simple
    "commands" file that specifies the dataset, the model, variable names, etc.
    This document discusses how to create these "commands" files.


Contents:

  I. Introduction

  II. Description of Commands
  IIa. List of keywords (with examples):

  IIb. Descriptions of DISCRETE keywords
        IIb1. Non-model specific keywords

  IIc. Description of Models
    IIc1. STATS: Basic statistics on the data
    IIc3. CONVERT: Convert a "one-row" dataset into the multi-row format used by DISCRETE
    IIc4. TEST: Compute an F-Test of parameter significance.
    IIc5. PROBIT and LOGIT:  The PROBIT and LOGIT models take the same keywords.
    IIc6. MNL (Multinomial logit)
    IIc7. MNL2 (2-stage, nested, Multinomial logit)
    IIc8. MIXED (mixed logit) -- a varying parameters MNL
    IIc9. DOUBLE -- a variety of single and double bounded discrete choice models
       IIc9.a. Synopsis:
       IIc9.b. Syntax:
       IIc9.c. Notes on -2STAGE and -BIVAR
       IIc9.d. Keywords used with DOUBLE are
       IIc9.e. Description of the CHOICE keyword
       IIc9.f. Description of the other DOUBLE keywords


  III. Examples
    (under construction)


   ------------------------------------------


I. Introduction

  DISCRETE is a GRBL2 program -- please see GRBL2.TXT for a general description. In particular,
  DISCRETE uses a GRBL2 "command file" to select models, variables, etc.

   DISCRETE currently supports the following estimators.

       PROBIT and LOGIT  -- plain vanilla probit and logit
       DOUBLE            -- single, double, and multiple bounded probit and logit (dichotomous choice)
       MNL               -- multinomial logit
       MNL2              -- nested multinomial logit
       MIXED             -- a variety of mixed logit models

   You can also use DISCRETE to create new datasets, convert from "one-row" to "multi-row",
   perform F-tests, display some statistics, and create/modify values in a GAUSS dataset file.
   
   And we will be adding more models in the future!    

   -----------------------------

II. Description of Commands


DISCRETE is built using the GRBL2 interface. GRBL2 supports a number of sophisticated
keyphrases (that begin with keywords) for use in "commands files".

The following keywords are those that are either unique to DISCRETE, or are sufficiently
important that they bear repeating. Note that keywords described in GRBL2_BATCH.TXT,
but not mentioned here, WILL work in DISCRETE command files.

In other words, if you want to get fancy, please see GRBL2_BATCH.TXT.

   -----------------------------
 

IIa. List of keywords (with examples):


Keywords specific to DISCRETE

 HEADER           HEADER Using US data ;
 MODEL            MODEL STATS ; MODEL MBL ; MODEL MBP ; MODEL MBP TRUNC ; MODEL OPROBIT ;
 RESET2           RESET2 ; RESET2 -ALWAYS ;
 SAVE_VC          SAVE_VC MDL1a ;
 WRITE_VC         WRITE_VC YES ;
 CREATE           CREATE ; ..... ; RUN ;

In addition, each model supports a number of keywords; some of them unique to the model,
some shared by several models, and some of them supported by GRBL2.
These are described in the model-specific sections under IIc.

The CREATE keyword is used to create a new Gauss dataset. It uses the same syntax as the
CREATE keyword in the MAKEDATA program -- see MAKEDATA.TXT for the details.

   -----------------------------

IIb. Descriptions of DISCRETE keywords

To repeat, the generic GRBL2 keywords will all work in DISCRETE -- see GRBL2_BATCH.TXT for the details!

Section IIc discusses each model, listing both the keywords shared with other models,
and the keywords specific to the model.

   -----------------------------

IIb1. Non-model specific keywords

These DISCRETE keywords are not specific to a model. 


MODEL: Model to estimate.

   Select a model.

   Syntax:
    MODEL model_name [modifiers] ;
  
   There are several models currently supported, each of which may take its own
   set of modifiers:

       CONVERT            convert a one-row dataset to a multi-row dataset
       DOUBLE             single, double, and multiple bounded logit and probit
       MIXED              mixed multinomial logit
       MNL                multinomial logit, single stage
       MNL2               multinomial logit, two stage
       PROBIT and LOGIT   plain vanilla probit and logit estimators
       STATS              basic statistics
       TEST               compute an F-test

    For details on what keywords to use with these models, see section IIc below.

    The modifiers expected by the models are:

       STATS : Basic statistics
         SHOWY=1 :
               If included, display a table listing how often each alternative was chosen.
               The table columns list:
                   # of times the alternative was chosen
                   sum of choices (if non 0/1 values are used in the response variable)
                   # of times the alternative was chosen when it was part of a group

               Note that the number of rows (1 row per alternative) will be the maximum number
               of alternatives across all observations (or all observations/choice-occasions).
               The last column is only displayed if a GROUP command is included (GROUP is used by the
               MIXED model).


       CONVERT: Convert one-row to multi-row dataset.
         One modifier is required:
             out=newfile_name ;
         newfile_name is the name of the new file that will be created.
         If no path is given, the file will be saved in GAUSS's
         current working directory.

       TEST: Do an F-Test.
         No modifiers are expected.

       PROBIT and LOGIT: A plain vanilla probit and logit.
           No modifiers are expected.

       DOUBLE: Double bounded logit or probit.
         DOUBLE can take several modifiers. The complete list is in the DOUBLE documentation below:

         -PROBIT  : Estimate a double bounded probit model (normal distribution)
         -WEIBIT  : Estimate a double bounded weibit model (extreme value distribution)
         -LOGIT   : Estimate a double bounded logit model (logistic distribution)
         -LOG     : Use the log of bid values
         -BHHH    : Use a BHHH (rather than a Newton-Raphson) estimator (see below for the -WHITE option)
         -2STAGE  : Estimate a 2-stage model
         -BIVAR   : Estimate a bivariate probit model
         -NOSTART : Suppress display of starting values (only used with 2-stage probit)

         Examples:  MODEL DOUBLE -PROBIT ;
                    MODEL DOUBLE -LOGIT -LOG ;
                    MODEL DOUBLE -BHHH ;

         Note: -LOGIT is the default.

       MIXED: The mixed logit (a la Train).
         -FULL     : Type of correlation (of the random portion of the coefficients).
                     By default, a diagonal VC matrix (of random components) is estimated: each "random"
                     coefficient is affected by only a single random component.
                     However, if you specify -FULL, then a full covariance matrix will be estimated.

         -NOSTART  : Suppress display of starting values.
                     By default, starting values (as predicted by an MNL regression) are displayed.
                     To suppress this display, use -NOSTART.

         -SHOWINFO : Display some information on the structure of the data: choice occasions per individual,
                     alternatives per choice occasion, and number of groups per choice occasion.

         Examples:  MODEL MIXED -FULL ;
                    MODEL MIXED -NOSTART -FULL ;

       MNL: The joint multinomial logit.
         BH_EXP: Modify how the BHHH direction matrix is computed during maximization.

         Examples:
             MODEL MNL BH_EXP=1 ;   -- use the expected value of the BHHH direction matrix
             MODEL MNL BH_EXP=0 ;   -- (the default) use the BHHH direction matrix as is

         If not specified, BH_EXP=0 is assumed.


       MNL2: The nested multinomial logit.
         Several optional modifiers can be specified:
             TYPE=atype,
             BH_EXP=1, and
             FORCE=1

         TYPE: Select the maximization algorithm.
               "atype" can be one of four values:
                   SIMPLE : simple, 2-stage estimator
                   SE_FIX : same as SIMPLE, but with Amemiya's correction for 2nd stage standard errors
                   LIML   : a one-step Limited Information Maximum Likelihood estimator
                   FIML   : Full Information Maximum Likelihood
               If a TYPE is not specified, SIMPLE is used.

         FORCE: Constrain the inclusive value.
               FORCE=1 : force the inclusive value parameter to be between 0 and 1.
               If you do not specify FORCE=1, this constraint will not be imposed.

               Alternatively, you can use the CONSTRAINTS and CONSTRAINTS2 GRBL2 keywords to impose
               a constraint on the INCLUSVE parameter.
               The FORCE=1 option is a shortcut that imposes a 0 to 1 constraint.

         BH_EXP: Enable the BH_EXP first stage maximization method.
               BH_EXP=1 ;   -- use the expected value of the BHHH direction matrix
               BH_EXP=0 ;   -- (the default) use the BHHH direction matrix as is

         Examples:  MODEL MNL2 TYPE=SE_FIX BH_EXP=1 ;
                    MODEL MNL2 FORCE=1 TYPE=FIML ;
                    MODEL MNL2 ;
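
   For the DOUBLE modifiers listed above, the sketch below shows the textbook double-bounded
   response probability in Python. It is an illustration under our own assumptions, not DISCRETE's
   code: willingness to pay (WTP) follows a location-scale distribution with location mu and scale
   sigma, a respondent says YES to a bid when WTP exceeds it, and the follow-up bid is higher after
   a YES and lower after a NO. The function names are hypothetical; cdf=logistic_cdf stands in for
   -LOGIT, cdf=normal_cdf for -PROBIT, and use_log=True for our reading of -LOG.

```python
import math


def logistic_cdf(z):
    """Logistic CDF (the -LOGIT case)."""
    return 1.0 / (1.0 + math.exp(-z))


def normal_cdf(z):
    """Standard normal CDF (the -PROBIT case)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def double_bounded_prob(bid1, bid2, yes1, yes2, mu, sigma,
                        cdf=logistic_cdf, use_log=False):
    """Probability of one respondent's pair of YES/NO answers.

    bid1    : first bid offered
    bid2    : follow-up bid (higher than bid1 after a YES, lower after a NO)
    yes1/2  : True for a YES answer to the first/follow-up bid
    mu      : location of the willingness-to-pay (WTP) distribution
    sigma   : scale of the WTP distribution
    use_log : work with log bids (our assumed reading of -LOG)
    """
    if use_log:
        bid1, bid2 = math.log(bid1), math.log(bid2)

    def below(bid):
        # Probability that WTP is below `bid` (the respondent says NO to it).
        return cdf((bid - mu) / sigma)

    if yes1 and yes2:          # WTP above the higher follow-up bid
        return 1.0 - below(bid2)
    if yes1:                   # YES then NO: WTP between bid1 and bid2
        return below(bid2) - below(bid1)
    if yes2:                   # NO then YES: WTP between bid2 and bid1
        return below(bid1) - below(bid2)
    return below(bid2)         # NO, NO: WTP below the lower follow-up bid
```

   Summed over the four possible answer pairs (each with its matching follow-up bid), these
   probabilities add to 1.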



RESET2 
   Reset GRBL2 and MLE variables.
   Similar to the GRBL2 RESET command, but also resets a few DISCRETE specific variables,
   and resets the DEFINE string replacements.
   RESET2 can take two options: -ALWAYS or -ONCE

     RESET2 -ALWAYS ;     (the default)
         GRBL2 will reset all parameters for each model. This means that you need to specify SELECT, X, Y, etc.
         for each model. 
     RESET2 -ONCE ;
         GRBL2 will NOT reset all parameters for each model. This means that SELECT, X, Y, etc. specifications
         will hold across several models.
   
    RESET2 -ALWAYS is the default: parameter reset occurs every time a MODEL is processed.

    Notes:

       * RESET2 will reset the input file and DEFINEd strings.

       * However, the "always reset" (that occurs with each MODEL) will NOT reset the following:
           -- the input file
           -- the DEFINEd strings
           -- the OUTPUT file
           -- the TITLE

       * Some models reset some variables regardless of the above (e.g., DOUBLE resets BOUND and NEVERYES
         each time it is called).


       * If you choose to disable the -ALWAYS default setting,
         be sure to keep in mind that settings can accumulate in odd ways!
       * The GRBL2 "RESET" command is "called" by the DISCRETE "RESET2" command.

SAVE_VC
    Save parameters (parameter names, beta, and VC matrices) to a GAUSS .FMT file.
    Example:
        SAVE_VC MOD1 ;
    Note: do NOT include the .FMT extension.

    If no path is specified, the file will be written to the working directory, or to the
    current directory (if you did not specify a working directory).
    Special values:
        SAVE_VC NO  ;   -- do NOT save results
        SAVE_VC YES ;   -- save results to a .FMT file with the same name as your dataset
              So, if FILE E:\STUFF\FOO ; is used, then results will be saved to
              E:\STUFF\FOO.FMT

WRITE_VC:  Write the variance matrix to the OUTPUT file.

        Expects either a NO or YES  modifier.

        Example:
            WRITE_VC no  ;


   -----------------------------


IIc. Description of Models


Currently, DISCRETE supports the following models:
  STATS     basic statistics
  CONVERT   convert a "one-row" dataset into the multi-row format used by DISCRETE
  TEST      hypothesis testing (WALD tests do not work yet -- see IIc4)
  PROBIT    simple probit model
  LOGIT     simple logit model
  MNL       simple MNL model
  MNL2      2-stage, nested MNL model
  MIXED     mixed MNL, with many variants
  DOUBLE    single, double, and multiple bounded probit and logit


In addition, you can CREATE new datasets from old datasets.
To do this, use:
   CREATE  ;
     .. ;
   RUN ;
where the .. signifies CREATE keyphrases. These CREATE keyphrases are the same as those used by the
MAKEDATA program -- see MAKEDATA.TXT for the details.


            --------------------------------


IIc1. STATS: Basic statistics on the data

   Generate basic statistics on the data.

   This command is useful because the statistics are computed on the data actually used in the model.
   That is,
      * excluded observations (due to use of a SELECT GRBL2 command), and
      * observations with missing values
   are removed before any statistics are generated.

   In addition, STATS will report information on alternatives per observation (if an unbalanced
   design is used).

   You need to specify which variables to examine, using the following keywords:

        X : Generates min, max, mean, and sd for the "X" variables.
      AUX : Generates min, max, mean, and sd for the "AUX" variables.

        Y : The "choice" variable
    GROUP : The "group" variable
   WEIGHT : A "weight" variable

       ID : The "observation identifier"
      ID2 : The "nest" identifier (or, in the MIXED logit model, the OCCASION identifier)
      ID3 : The "x-replication" identifier (in the MIXED logit model)


  Notes:
     * STATS does not treat "X" and "AUX" variables differently -- it just displays them in different tables.
     * For the MIXED model, you can use OCC instead of ID2.
     * Note the use of Y, instead of RESP, to indicate the "dependent" variable.
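
   For the SHOWY=1 table described under the MODEL keyword, here is a minimal Python sketch of
   the tally. It assumes that each observation's rows are contiguous and identified by an ID value,
   that an alternative's number is its row position within the observation, and that a non-zero
   response means "chosen"; showy_table is a hypothetical name, and the GROUP column is omitted.

```python
from collections import defaultdict


def showy_table(ids, y):
    """Tally how often each alternative (row position within an observation) is chosen.

    ids : observation identifier for each row; an observation's rows are contiguous
    y   : response value for each row; a non-zero value counts as "chosen"
    Returns (alternative, times_chosen, sum_of_choices) tuples, one per
    alternative up to the largest number of alternatives in any observation.
    """
    times = defaultdict(int)
    sums = defaultdict(float)
    max_alts = 0
    position = 0
    previous_id = object()          # sentinel that never equals a real ID
    for obs_id, resp in zip(ids, y):
        # A change in the ID value starts a new observation at position 1.
        position = position + 1 if obs_id == previous_id else 1
        previous_id = obs_id
        max_alts = max(max_alts, position)
        if resp != 0:
            times[position] += 1
            sums[position] += resp
    return [(alt, times[alt], sums[alt]) for alt in range(1, max_alts + 1)]
```

   The number of table rows is the largest number of alternatives seen in any observation, as
   the SHOWY description notes; shorter observations simply never reach the last positions.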


   --------------------------------


IIc2. MODEL CREATE and MODEL MAKE are no longer supported (as of 1/1/06).
     Use the CREATE keyphrase, or use the MAKEDATA program, instead.
   

   --------------------------------

IIc3. CONVERT: Convert a "one-row" dataset into the multi-row format used by DISCRETE

  DISCRETE uses a data architecture based on "one-row per alternative".
  Thus, each individual observation is specified across multiple rows in the dataset.
  In this format, the Kth attribute of alternative M is found in the Kth column of the Mth row
  of the set of rows describing this observation.

  However, some data (and some other programs) expect data in a "one-row per observation" format.
  In this format, the Kth attribute of alternative M is found in the Mth instance of the Kth variable
  in the single row used for this observation.

  For example, if you have 4 alternatives and two variables (XA and XB), each row of the dataset would
  contain 8 variables: XA1 XA2 XA3 XA4 XB1 XB2 XB3 and XB4 -- with XA1 being the "XA variable for the first
  alternative", etc.

  CONVERT is used to convert this one-row format to DISCRETE's multiple row format.

  Note that CONVERT will read data from the GAUSS dataset specified in a FILE command.
  It will write the new data file to the file specified in the OUT=newfile_name option.
  In both cases, if no path is specified, DISCRETE will read from (write to) the GAUSS working directory.
  For example:
   file mydata0;
        model convert out=mydata1 ;
      ... CONVERT options (as described below)  ...
   run;


  To use CONVERT, specify the following:

    ID   -- an integer specifying how many alternatives per individual.
     Y   -- the "response variable"
     Z   -- a list of "observation specific" variables
     X   -- a list of "alternative specific" variables


 Details:

     ID : MUST be an integer
     Example: ID 5 ;

        Note: There MUST be a balanced number of alternatives per observation (that is, the same number
              of alternatives for each respondent).

     Y : MUST be an integer between 1 and NALTS, where NALTS is the value used in ID. Thus,
         Yi=3 means "the ith observation chose the 3rd alternative".
         Example: Y MYCHOICE ;


     Z : Optional. Can be a list of variables. The values of these variables are copied to each
    of the newly created rows for this observation. Thus, each row of an observation will have
    the same values of the Z variable.
       Example:  Z  ZIPCODE AGE INCOME ;

     X : A list of "alternative specific" variables. 
         Actually, this list should contain several sets of variables.
         Each set must have NALTS different names specified, and these variable names MUST be defined in the dataset.

         For example, if COST and TIME are alternative specific variables, and there are 5 alternatives,
         then you should specify:
             X  COST1 COST2 COST3 COST4 COST5 TIME1 TIME2 TIME3 TIME4 TIME5 ;

         Note that you can specify these in any order (e.g., COST1 COST5 TIME1 TIME5 COST2 ...).

         The end result is that the new (multi-row) dataset will have two variables: COST and TIME.

         Naming convention:
               Each alternative specific variable name should end with nn, where nn runs from 1 to NALTS.
               If there are fewer than 10 alternatives: nn should be a single digit.
                             >9 and <100: nn should be two digits (use a leading 0 for values <10).
                           >99 and <1000: nnn should be three digits (use leading 0s if need be).
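
  A minimal Python sketch of the CONVERT reshaping for a single observation, following the rules
  above. one_row_to_multi_row is a hypothetical helper, not DISCRETE's code; rows are plain dicts,
  and turning Y into a 0/1 indicator on each new row is our assumption about the output, since the
  text above does not describe the converted response variable.

```python
def one_row_to_multi_row(row, nalts, y_name, z_names, x_names):
    """Expand one "one-row" observation into NALTS rows (one per alternative).

    row     : dict mapping variable name -> value for one observation
    nalts   : number of alternatives (the integer given with ID)
    y_name  : response variable; holds the chosen alternative, 1..nalts
    z_names : observation-specific variables, copied to every new row
    x_names : alternative-specific variables such as COST1 ... COST5
    """
    choice = int(row[y_name])
    if not 1 <= choice <= nalts:
        raise ValueError(f"{y_name}={choice} is not between 1 and {nalts}")

    # Hypothetical output choice: Y becomes a 0/1 indicator on each new row.
    out = [{y_name: 1 if alt == choice else 0} for alt in range(1, nalts + 1)]
    for new_row in out:
        for z in z_names:
            new_row[z] = row[z]

    # Per the naming convention, the suffix width depends on NALTS:
    # 1 digit for <10 alternatives, 2 digits for <100, 3 digits for <1000.
    width = len(str(nalts))
    for name in x_names:
        suffix = name[-width:]
        if len(name) <= width or not suffix.isdigit():
            raise ValueError(f"{name} does not end with a {width}-digit alternative number")
        alt = int(suffix)
        if not 1 <= alt <= nalts:
            raise ValueError(f"{name}: alternative {alt} is out of range")
        out[alt - 1][name[:-width]] = row[name]   # e.g. COST3 -> column COST of row 3
    return out
```

  Applying the function to every row of the one-row dataset and concatenating the results gives
  the multi-row layout, with the rows of each observation kept together.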


   --------------------------------

IIc4. TEST:    Compute an F-test of parameter significance.

    Two types of tests are supported: LRT (likelihood ratio) and WALD.
    In both cases, the "results matrices" saved with the SAVE_VC keyword
    are used.

    Syntax:
       MODEL TEST LRT  con_file , uncon_file1 ... ;
    or 
       MODEL  TEST WALD  con_file , constraint_list ... ;
    (in both cases, the comma is optional).

    For LRT, you supply the results matrix of the constrained model (con_file),
    and (1 or more) results matrices of the unconstrained model(s) (uncon_fileN).

    If you specify one matrix (i.e., the results from one unconstrained model),
    the df of the chi-square test is the difference between the numbers of
    parameters in the unconstrained and constrained models.

    If you supply several matrices (i.e., model results from different
    sub-samples), then each model MUST have the same set of parameters,
    and the d.f. equals:
            ((number of unconstrained models)-1) * (# of parameters).
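
    The df rules above, together with the usual likelihood ratio statistic 2*(LLu - LLc), can be
    sketched in Python as follows. This is a hypothetical helper (lrt is not a DISCRETE keyword)
    that takes log-likelihoods and parameter counts directly rather than reading SAVE_VC files;
    with several sub-sample models, LLu is assumed to be the sum of their log-likelihoods. The
    statistic is compared to a chi-square distribution with df degrees of freedom.

```python
def lrt(loglik_con, nparams_con, unconstrained):
    """Likelihood ratio statistic and degrees of freedom.

    loglik_con, nparams_con : log-likelihood and parameter count of the constrained model
    unconstrained           : list of (loglik, nparams) pairs, one per unconstrained model
    """
    if len(unconstrained) == 1:
        # One unconstrained model: df = difference in the number of parameters.
        loglik_unc, nparams_unc = unconstrained[0]
        df = nparams_unc - nparams_con
    else:
        # Sub-sample models share one parameter set: df = (models - 1) * parameters.
        counts = {nparams for _, nparams in unconstrained}
        if len(counts) != 1:
            raise ValueError("sub-sample models must have the same number of parameters")
        df = (len(unconstrained) - 1) * counts.pop()
        loglik_unc = sum(loglik for loglik, _ in unconstrained)
    return 2.0 * (loglik_unc - loglik_con), df
```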


    For WALD, you supply the results matrix of the model, followed by
    a comma-separated list of constraints. Constraints take the form
         cVal1 * VAR + cVal2 * VAR = rVal ,
         cVal3 * VAR = rVal ;
    where the cVals and rVal are numeric values, and each VAR is a variable name.

    Example:
      Suppose that VER1, VER1Z, VER2, VER1A, VER1B and VER1C contain the
      "same data", with:
         VER1 has N observations and X containing X1 and X2
         VER1Z has N observations and X containing X1, X2, Z1, and Z2
         VER2 has N observations and X containing X1, X2A, X2B, and X2C
              where X2A, X2B, and X2C are "sub-sample specific" versions of X2.
         VER1A, VER1B and VER1C are subsets of VER1, with
               Na, Nb and Nc observations (Na+Nb+Nc = N ),
               and X of X1, X2

      Then:

         MODEL TEST LRT  VER1 , VER1A VER1B VER1C  ;
           tests the equality of X1 and X2 across the three subsets.
           ... VER1 is the constrained model
               VER1A + VER1B + VER1C is the unconstrained model
               df = (3-1) * 2 = 4

         MODEL TEST LRT VER1 , VER1Z ;
          tests whether Z1=0 and Z2=0 (df=2)
          .... VER1 is the constrained model, VER1Z is the unconstrained
               model

         MODEL TEST WALD  VER2 X2A - X2B = 0 , X2C - X2B = 0 ;
          tests that X2A = X2B = X2C (with df=2).
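
    For the WALD form (which, per the warning below, does not yet work in DISCRETE), the standard
    Wald statistic for linear constraints R*b = r is (R*b - r)' (R*V*R')^(-1) (R*b - r), with df
    equal to the number of constraints. Below is a stdlib-only Python sketch, assuming the estimates
    beta and their covariance matrix vc are already available (in DISCRETE they would come from a
    SAVE_VC results file); wald_test and solve are hypothetical names.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for i in range(col + 1, n):
            factor = aug[i][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[i][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - known) / aug[i][i]
    return x


def wald_test(names, beta, vc, constraints):
    """Wald statistic and df for linear constraints R @ beta = r.

    names       : parameter names, in the order of beta and of vc's rows/columns
    vc          : covariance matrix of beta (list of rows)
    constraints : list of ({name: coefficient, ...}, rhs) pairs; for example
                  ({"X2A": 1, "X2B": -1}, 0) encodes  X2A - X2B = 0
    """
    index = {name: i for i, name in enumerate(names)}
    k, q = len(names), len(constraints)
    R = [[0.0] * k for _ in range(q)]
    r = []
    for row, (coefs, rhs) in zip(R, constraints):
        for name, c in coefs.items():
            row[index[name]] = float(c)
        r.append(float(rhs))
    # d = R beta - r: how far the estimates are from satisfying each constraint
    d = [sum(R[i][j] * beta[j] for j in range(k)) - r[i] for i in range(q)]
    # M = R V R': covariance matrix of R beta
    rv = [[sum(R[i][m] * vc[m][j] for m in range(k)) for j in range(k)] for i in range(q)]
    M = [[sum(rv[i][m] * R[j][m] for m in range(k)) for j in range(q)] for i in range(q)]
    w = solve(M, d)  # w = M^(-1) d
    return sum(di * wi for di, wi in zip(d, w)), q
```

    For the example above, X2A - X2B = 0 , X2C - X2B = 0 becomes
    [({"X2A": 1, "X2B": -1}, 0), ({"X2C": 1, "X2B": -1}, 0)], giving df=2.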

     WARNING: as of 28 May 05
         * WALD tests do not work
         * results files must have 8 character names
         * and be in the "current directory"
         This will be updated soon!


   --------------------------------


IIc5. PROBIT and LOGIT:  The PROBIT and LOGIT models take the same keywords.

  The important keywords used with PROBIT and LOGIT are:
  
  RESPONSE: used to specify the NO/YES dependent variable. 

      RESPONSE specifies a single response variable.

      You can also specify which values of this response variable correspond to "YES".

      Syntax:
         RESPONSE VAR=varname COND=type VAL=aval ;
       where
           varname is a variable name
           COND is optional. If specified, it should be one of: EQ GT GE LT LE NE
           aval is optional, and is only used if COND is used. It should be a single numeric value.

       If COND is not specified, a value of 1 means YES (all other values mean NO).

       Alternate syntax:
             RESPONSE varname ;
       which is shorthand for:
             RESPONSE VAR=varname COND=EQ VAL=1 ;

       Example:
        RESPONSE DIDIT  ;                        -- DIDIT=1 means "YES"
        RESPONSE VAR=NOTPAY COND=ne VAL=1   ;    -- NOTPAY=0 (or 2) means "YES"


  X, DUMMY, and XNEW: used to specify the independent variables.

     However, we advise using the CREATE option to create DUMMY and XNEW variables, rather than generating them
     on-the-fly.

     Note: to include a constant term, use -CONST as one of the X variables.

  WEIGHT: (optional) used to weight observations in a linear fashion -- basically,
     observations are replicated (a weight of 2 yields twice as much contribution to the log-likelihood as a
     weight of 1). However, WEIGHT does not affect the degrees of freedom.

  NORMALIZE: used to normalize the independent variables
     NORMALIZE ;    -- force normalization of all the independent variables
                       (though "dummy" independent variables will not be normalized)
     NORMALIZE 0 ;  -- do NOT normalize the independent variables

  AUX and BAUX: AUX is used to specify a vector of "fixed-coefficient" variables, and BAUX specifies
     the "fixed coefficients" used with these "AUX" variables.
     This product (AUX*BAUX) is used when computing the probability (along with the independent
     variables) -- but the BAUX coefficients do NOT change (they are not estimated).
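
  A minimal Python sketch tying together RESPONSE, WEIGHT, and AUX/BAUX for the PROBIT and LOGIT
  models. response_is_yes and binary_loglik are hypothetical names; the log-likelihood is the
  standard binary one, with weights multiplying each row's contribution (so a weight of 2 counts a
  row twice) and AUX*BAUX added to the index as a fixed offset. We make no claim that DISCRETE
  computes it in exactly this way (for example, NORMALIZE is not modeled here).

```python
import math
import operator

# COND values accepted by RESPONSE, mapped to Python comparisons.
CONDITIONS = {"EQ": operator.eq, "NE": operator.ne, "GT": operator.gt,
              "GE": operator.ge, "LT": operator.lt, "LE": operator.le}


def response_is_yes(value, cond="EQ", val=1.0):
    """YES/NO for one row; the defaults mirror the shorthand  RESPONSE varname ;"""
    return CONDITIONS[cond.upper()](value, val)


def binary_loglik(beta, x, yes, aux=None, baux=None, weight=None, model="LOGIT"):
    """Weighted log-likelihood of a plain probit or logit.

    beta   : estimated coefficients, one per column of x (a column of 1s plays -CONST)
    x      : list of rows of independent variables
    yes    : list of booleans (e.g. from response_is_yes)
    aux    : optional rows of AUX variables, with fixed coefficients baux (not estimated)
    weight : optional per-row weights; a weight of 2 counts a row twice
    """
    if weight is None:
        weight = [1.0] * len(yes)
    total = 0.0
    for i, is_yes in enumerate(yes):
        index = sum(b * v for b, v in zip(beta, x[i]))
        if aux is not None:
            index += sum(c * v for c, v in zip(baux, aux[i]))   # AUX*BAUX offset
        if model == "LOGIT":
            p = 1.0 / (1.0 + math.exp(-index))
        else:                                                    # PROBIT
            p = 0.5 * (1.0 + math.erf(index / math.sqrt(2.0)))
        total += weight[i] * math.log(p if is_yes else 1.0 - p)
    return total
```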

   --------------------------------

IIc6. MNL (Multinomial logit)


    There are multiple rows per observation.
    Each row corresponds to a unique alternative.
    Each row should contain explanatory variables that describe this alternative.

    These explanatory variables may be "conditional" variables, or "multinomial variables".

       * Conditional variables vary across alternatives (and across observations).
       * Multinomial variables vary across observations, but not across alternatives.

     Multinomial variables will be used to create "dummy variables", with  #_alternatives-1
     dummies per observation. For each row (for each alternative), only one of these dummies will be 
     non-zero -- the dummy corresponding to the alternative.


  The important keywords used with MNL:

  ID: specify an observation identifier. This can either be a variable name, or an integer --
      the integer is used when you have a balanced design (exact same number of alternatives per
      observation).

      For ID, you can specify either a -ID or -COUNT type:
          -ID    : ID variable contains an observation specific value.
          -COUNT : ID variable contains a count of how many (contiguous) rows belong to this individual.
      Examples:
          ID MY_NUM -ID ;
          ID IN_OBS -COUNT ;

      See GRBL2_BATCH.TXT for the details.

      To use "multinomial" variables, you MUST have a balanced design!

      Examples:   ID OBSNUM ;
                  ID 9 ;      @ 9 alternatives per observation @

  RESPONSE : the choice variable. 

       This is typically a 0/1 dummy, where only one row (alternative) of each observation is non-zero.

       However, you can estimate "share" and "aggregated response" models by having non-zero 
       values for this "response" variable.


   Example:  RESPONSE CHOICE ;

  WEIGHT: Weight variable.

         Or use WEIGHT NO ; for no weighting.
         Note that weights are analytically equivalent to using a counts type of YUSE (in
         multi-row datasets).


  X, DUMMY, XNEW : "Conditional" independent variables.

  Z:  "Multinomial" variables.  Sorry, you can't use on-the-fly defined variables as multinomial variables 
       -- use CREATE first!

      Examples:
          Z NO ;
          Z INCOME AGE  ;
  

  NORMALIZE: Normalize the independent variables (both conditional and multinomial). See the description in
         PROBIT for the details.

  AUX and BAUX:  Fixed-coefficient variables. See the description in PROBIT for the details.
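
  To make the ID forms above concrete (a variable with -ID or -COUNT, or an integer for a balanced
  design), here is a minimal Python sketch that turns them into row ranges, one per observation.
  split_observations is a hypothetical helper, not part of DISCRETE or GRBL2, and the -COUNT reading
  (the count is taken from the first row of each observation) is our assumption; GRBL2_BATCH.TXT has
  the authoritative rules.

```python
def split_observations(nrows, ident, id_type="-ID"):
    """Return (start, stop) row ranges, one per observation.

    ident is either an integer (balanced design, e.g. ID 9 ;) or a list with
    one ID value per row.
      -ID    : a change in value between consecutive rows starts a new observation.
      -COUNT : the count is read from the first row of each observation (assumed).
    """
    if isinstance(ident, int):
        if ident <= 0 or nrows % ident:
            raise ValueError("rows are not a whole number of balanced observations")
        return [(start, start + ident) for start in range(0, nrows, ident)]

    ranges = []
    if id_type == "-COUNT":
        start = 0
        while start < nrows:
            count = int(ident[start])
            if count <= 0 or start + count > nrows:
                raise ValueError(f"bad count {count} at row {start}")
            ranges.append((start, start + count))
            start += count
        return ranges

    start = 0
    for row in range(1, nrows + 1):
        # Close the current observation at the end of the data or when the ID changes.
        if row == nrows or ident[row] != ident[row - 1]:
            ranges.append((start, row))
            start = row
    return ranges
```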


  Details on "conditional" and "multinomial" variables:
     Conditional:
        Values differ across alternatives. The coefficient on each XUSE variable
        is the same for all alternatives.
        A conditional variable (say, X1) has an implicit normalization: one of the alternatives
        can have its values set to 0, with the other alternatives' values recentered
        around this "set to 0" value of X1. In other words, what drives the
        estimation is the differences in X1 across alternatives (within an observation), not the
        actual values.

     Multinomial:
        Values are the same across alternatives. The coefficients depend on the alternative.
        A normalization is enforced, with the coefficient on the last alternative dropped
        (implicitly set to 0).

        For example, if there are 4 alternatives, for a "multinomial" variable Z1,
        observation i would look like:
            Z1i    0      0
             0    Z1i     0
             0     0     Z1i
             0     0      0

   Notes:

     * Three coefficients will be estimated: Z1_1, Z1_2, and Z1_3 (where the _1 refers to alternative 1, etc.)

     * The last line is all 0s -- the last alternative is the one that is "normalized".

     * Multinomial variables can be used ONLY when the number of alternatives is balanced
       (same set of alternatives for each observation).

     * When specifying a multinomial variable, typically the same value will be used for all rows (alternatives)
       in an observation. However, this is not enforced -- within an observation, you can specify different values
       for each alternative (we are not sure why one would do this, but we do not forbid it).
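
   The multinomial layout shown above, and the point that only differences in a conditional
   variable across alternatives matter, can both be checked with a short Python sketch. Both
   functions are hypothetical illustrations: multinomial_columns builds the NALTS-1 dummy columns
   for one observation's Z value, and mnl_probabilities computes the standard conditional-logit
   choice probabilities exp(x_j*b) / sum_k exp(x_k*b).

```python
import math


def multinomial_columns(z_value, nalts):
    """Dummy layout of one "multinomial" variable for one observation.

    Returns nalts rows (one per alternative) of nalts-1 columns: row j holds
    z_value in column j, and the last alternative (the normalized one) is all 0s.
    """
    return [[z_value if col == alt else 0.0 for col in range(nalts - 1)]
            for alt in range(nalts)]


def mnl_probabilities(x_rows, beta):
    """Conditional logit probabilities for one observation (one row per alternative)."""
    utils = [sum(b * x for b, x in zip(beta, row)) for row in x_rows]
    top = max(utils)                        # subtract the max for numerical stability
    exps = [math.exp(u - top) for u in utils]
    total = sum(exps)
    return [e / total for e in exps]
```

   Adding the same constant to a conditional variable in every row of an observation leaves
   mnl_probabilities unchanged, which is the implicit normalization described above. In a full
   MNL row, the Z dummies would simply be appended to that row's X values.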


   --------------------------------

   
IIc7. MNL2 (2-stage, nested, Multinomial logit)

  Data is assumed to be in multi-rows -- with one row per "choice", and either a balanced
  number of choices per observation, or an observation ID used to identify the alternatives
  available to an observation.

  In addition to choices, nests must be identified.  

   You can specify a balanced design. In a balanced design:
     a) the number of alternatives is the same for all individuals
     b) the number of nests is the same for all individuals
     c) the number of alternatives per nest is the same for all nests

   Thus, if there are 20 alternatives, and 4 nests, there will be 5 alternatives
   per nest.

   To specify alternatives and nests, use the ID and ID2 variables.

   Unbalanced design:
    ID and ID2 are variable names:
       ID identifies an observation ID. As with the MNL model, all alternatives for an individual must
       appear sequentially in the data file (a change in the ID variable's value signals a new observation).

       For ID, you can specify either a -ID or -COUNT type:
           -ID    : ID variable contains an observation specific value.
           -COUNT : ID variable contains a count of how many (contiguous) rows belong to this individual.
       Examples:
           ID MY_NUM -ID ;
           ID IN_OBS -COUNT ;
       See GRBL2_BATCH.TXT for the details.



       ID2 identifies nests within an observation. These do NOT need to be sequential: the program
       will sort alternatives (within a single observation) into their proper nests.

       You can NOT specify a -COUNT type for ID2.

    Balanced design:
      ID and ID2 are numeric values.
        ID:  number of alternatives per observation
        ID2: number of nests per observation
      Thus, the number of alternatives per nest is ID/ID2. If this is not a whole number, an error occurs.
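
   A minimal Python sketch of the balanced-design arithmetic. balanced_nests is hypothetical,
   and it assumes that each observation's rows are ordered nest by nest (the first ID/ID2 rows in
   nest 1, and so on); the text above does not say how rows map to nests in a balanced design.

```python
def balanced_nests(n_alts, n_nests):
    """Nest number (1-based) of each alternative in one balanced observation.

    n_alts is the ID value (alternatives per observation) and n_nests the ID2
    value (nests per observation). Assumes rows are ordered nest by nest.
    """
    if n_alts % n_nests:
        raise ValueError(f"ID/ID2 = {n_alts}/{n_nests} is not a whole number")
    per_nest = n_alts // n_nests
    return [alt // per_nest + 1 for alt in range(n_alts)]
```

   With 20 alternatives and 4 nests, as in the example above, each nest receives 5 alternatives.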

   The MNL2 keywords are similar to the MNL keywords, with the addition of "2nd stage" options.

   They are:

     X      : 1st stage independent variables
     XNEW   : 1st stage "created" variables
     DUMMY  : 1st stage "dummy" variables
     WEIGHT : 1st stage "weight" variables -- NOT SUPPORTED IN LIML AND FIML VARIANTS
     Z      : 1st stage "multinomial independent" variables
     AUX    : 1st stage "auxiliary" variables
     BAUX   : Fixed coefficients (required if AUX is specified)

     Y      : Choice indicators (used in both stages).
              Note: the choice indicator for a nest is the SUM of Y values (for an observation's
              alternatives within the nest).

     X2     : 2nd stage independent variables
     Z2     : 2nd stage "multinomial independent" variables
     AUX2   : 2nd stage "auxiliary" variables
     BAUX2  : Fixed coefficients for AUX2

     BSTART : Starting values -- must match the # of variables specified
                LIML and FIML : first stage vars, inclusive value, 2nd stage vars
                otherwise     : first stage vars
     BSTART2: Starting values for the second stage (must match)
                NOT used by the FIML and LIML variants.


    MAKE_NEST: Create a "nest identifier" variable

         MAKE_NEST is used to assign observations to subsets. It is used instead of ID2.

         Typically, this is used to assign the "alternatives" available to an individual into
         "nests" -- where each alternative is a row in the dataset, and an observation consists
         of a set of these rows (say, all rows sharing the same value of an ID variable).

    Syntax:
         MAKE_NEST  nest_var  n1 n2 n3 ~ n4 n5 n6 ~ ... ~ nJ nK nM ;

         nest_var    : the variable used to assign a row to a nest
         ni (i=1...) : a value that nest_var might take. These values SHOULD be
                       integers, though real numbers can be used (albeit with no guarantee
                       of an exact match).
         The ~ (or |) is used to signal "end of subset".

         Example: MAKE_NEST IDVAR1  1 2 4 ~ 5 3 8 ~ 7 6 9 10 ;
             This would look at the IDVAR1 variable, and if the value is:
                 1, 2, or 4:   assign the row to the 1st nest
                 5, 3, or 8:   assign the row to the 2nd nest
                 6, 7, 9, 10:  assign the row to the 3rd nest
             All other values will be assigned to a 4th (non-specified) nest.

         Note: You should NOT specify ID2 and MAKE_NEST in the same model.
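
    The MAKE_NEST value lists can be read as a lookup table from nest_var values to nest numbers.
    A hypothetical Python sketch of that reading follows (parse_make_nest and assign_nest are
    illustrative names, not DISCRETE functions); matching real-valued keys by exact float equality
    mirrors the "no guarantee of an exact match" caveat above.

```python
def parse_make_nest(spec):
    """Map each listed nest_var value to its nest number (1, 2, ...).

    spec is the value list from a MAKE_NEST keyphrase, without the variable
    name or the closing semicolon; "~" or "|" ends a subset.
    """
    mapping = {}
    groups = spec.replace("|", "~").split("~")
    for nest, group in enumerate(groups, start=1):
        for token in group.split():
            mapping[float(token)] = nest
    return mapping


def assign_nest(value, mapping):
    """Nest for one row; values not listed go to one extra, unspecified nest."""
    extra_nest = max(mapping.values()) + 1
    return mapping.get(float(value), extra_nest)
```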

    Notes:

       *  There is NO 2nd stage equivalent for WEIGHT, XNEW, or DUMMY.
          WEIGHT is NOT supported for FIML and LIML models.

       *  The LIML model is just the FIML model, but with only 1 iteration.

       *  For non-FIML and non-LIML estimators, DISCRETE will NOT consider constraints
          when computing the covariance matrix.
          For FIML and LIML, DISCRETE will consider constraints (binding variables are treated
          as constants).

       *  For X2 and AUX2: within a nest (for an observation), the values of X2 and AUX2
          should be the same. Actually, DISCRETE will use the value of the first alternative
          encountered (within the nest).

       *  For Z variables, all alternatives within a nest (within an observation) should have the same
          value.
          DISCRETE will create #alternatives_within_a_nest - 1 "multinomial" variables
          (where #alternatives_within_a_nest = ID/ID2).

       *  For Z2, all alternatives within an observation should have the same value.
          #nests-1 "multinomial variables" will be created (where #nests=ID2).

       *  When using BSTART and BSTART2, do not forget to start with a YES.
          Example (assuming 3 variables in the first stage): BSTART YES 1.0 1.0 -0.5 ;


   --------------------------------

   
IIc8. MIXED (mixed logit) -- a varying parameters (across individuals) MNL

  Several variants of the mixed logit can be estimated:
   a) standard mixed logit (varying betas)
   b) uncertain X-variables (imprecise measures of independent variables) -- the UNC option
   c) repeated choice occasions for each individual -- the OCC option
   d) grouping of alternatives (a simple nesting) -- the GROUP option

  These variants can be combined. You can also use MIXED to estimate a plain-vanilla MNL, but
  MNL is quicker for that.

  Data is assumed to be in a multi-row format (the one-row format is NOT supported).
  The number of alternatives per observation can be balanced or unbalanced.

  Note that Z (multinomial variables), XNEW, and DUMMY are not currently supported by the MIXED model.
  If you need these kinds of variables, you can use CREATE to create a new dataset.

  Options supported by MIXED (that appear on the MODEL MIXED line):
        -FULL
        -NOSTART
        -SHOWINFO

  Keywords supported by MIXED (that appear after MODEL MIXED and before RUN):
             ID : Specify the observation ID variable
            OCC : Specify the choice-occasion ID variable
            UNC : Specify the "uncertain X" and "uncertain Y" models
              X : "Conditional" independent variables
       RESPONSE : The dependent variable
         2STAGE : Specify a 2-stage model
             X1 : First-stage independent variables
             Y1 : First-stage dependent variable
            AGG : Specify an aggregate-alternatives model
         SAMPLE : Specify a "sample of alternatives"
              Z : Aggregate-sites "size" variables
         WEIGHT : Alternative-specific weight variable, used to estimate the uncertain-Y variant
   AUX and BAUX : Fixed-coefficient variables and betas
      NORMALIZE : Normalize the independent variables
         NORMAL : A list of the "normal" varying coefficients
        UNIFORM : A list of the "uniform" varying coefficients
       TRIANGLE : A list of the "triangle" varying coefficients
      LOGNORMAL : A list of the "log-normal" varying coefficients

           REPS : # of replications of the beta vector
           SEED : Random number seed
        RND_SEQ : The type of random values to generate
         BSTART : Starting values

            WTP : Estimate willingness to pay

 In addition, the generic GRBL2 commands are supported, such as:
    PRNTIT 1  : display intermediate results

 Also, you can instruct MIXED to NOT estimate a model by using:
    RUN -noest ;
 This is a useful trick if you use SAMPLE to create a subset of the data, or if you want to
 compute WTP values using a known beta-vector.

 The DISCRETE specific options and keywords are described below.


  The -FULL option selects the type of covariance (of the random components of the betas):

     If the -FULL option is NOT specified, a diagonal VC matrix (of random components) is estimated --
     each "random" coefficient is affected by only a single random component.
     You can use the -FULL option to specify that a full covariance matrix be estimated instead.

     Example:
        MODEL MIXED -FULL  ;

    More precisely, when -FULL is specified the upper triangle (including the diagonal) of the Cholesky
    decomposition of the covariance matrix is estimated. Otherwise, just the diagonal components are estimated.
    Note that in either case eps_ik is drawn from a multivariate normal distribution;
    -FULL just controls the covariance structure (of eps_ik's elements).
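    The role of the Cholesky factor, and of REPS and SEED, can be illustrated with a small
    simulated mixed-logit probability. This is a hedged Python sketch, not DISCRETE's code: it
    assumes NORMAL varying coefficients only (no UNIFORM, TRIANGLE or LOGNORMAL), uses ordinary
    pseudo-random normals rather than any RND_SEQ choice, and draw_beta / simulated_prob are
    hypothetical names. C is the upper-triangular factor, so C'C is the covariance of the betas.

```python
import math
import random
from typing import List, Sequence


def draw_beta(b: Sequence[float], c: Sequence[Sequence[float]], z: Sequence[float]) -> List[float]:
    """beta = b + C'z, where C is upper triangular (C'C = covariance) and z ~ N(0, I).
    With a diagonal C (the non -FULL case) this reduces to beta_k = b_k + c_kk * z_k."""
    k = len(b)
    return [b[i] + sum(c[r][i] * z[r] for r in range(k)) for i in range(k)]


def simulated_prob(x: Sequence[Sequence[float]], chosen: int, b: Sequence[float],
                   c: Sequence[Sequence[float]], reps: int = 100, seed: int = 1) -> float:
    """Simulated probability of alternative `chosen` (one row of x per alternative):
    the MNL probability averaged over `reps` draws of the beta vector."""
    rng = random.Random(seed)  # a fixed seed makes the draws reproducible
    total = 0.0
    for _ in range(reps):
        beta = draw_beta(b, c, [rng.gauss(0.0, 1.0) for _ in b])
        v = [sum(xk * bk for xk, bk in zip(row, beta)) for row in x]
        m = max(v)  # subtract the max before exponentiating, for numerical stability
        e = [math.exp(vj - m) for vj in v]
        total += e[chosen] / sum(e)
    return total / reps
```

    With a zero C the draws collapse to b, and the simulated probability equals the plain MNL one.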


 The -NOSTART option controls whether the "starting values" are displayed.
    By default, the starting values (by default, the estimates from an MNL regression) are displayed.
    To suppress this display, use the -NOSTART option on the MODEL MIXED line of your
    command file.


 The -SHOWINFO option
    Displays some statistics on the structure of the data.
    The mean, sd, min, and max of the
         choice occasions per individual,
         alternatives per choice occasion, and
         number of groups per choice occasion
    will be written to your output file.

  Note: to do everything EXCEPT estimate the model, use the -NOEST option of the RUN keyphrase.

   :::::::::::::

  The following keywords are also used in MIXED:

  ID:  specify an observation identifier. This can be either a variable name or an integer --
       the integer is used when you have a balanced design (exactly the same number of alternatives per
       observation).

      For ID, you can specify either a -ID or a -COUNT type:
        -ID    : the ID variable contains an observation-specific value.
        -COUNT : the ID variable contains a count of how many (contiguous) rows belong to this individual.
      Examples:
        ID MY_NUM  -ID ;
        ID IN_OBS -COUNT ;
      See GRBL2_BATCH.TXT for the details.


 RESPONSE : the choice variable (the dependent variable).

    This is typically a 0/1 dummy, where only one row (alternative) per observation is non-zero.
    However, you can estimate "share" and "aggregated response" models by using values other than
    0 and 1 for this dependent variable.

     You can also use RESPONSE to account for sample weights.
     Say you have
        OBSWT:   a sample weight (measuring how many people in the population are just like this observation), and
        YCHOSEN: the number of times this alternative was chosen (during a choice occasion).
     Then the new variable:
        WT1 = OBSWT * YCHOSEN
     can be used as the RESPONSE to control for both "sample weighting" and "many choices".

     In this example, we assume that OBSWT does not vary across the rows belonging to an observation.
     However, DISCRETE does not test for this.
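     To see why WT1 works, here is a Python sketch of a response-weighted MNL log-likelihood for
     one choice occasion. It assumes (this file does not spell it out) that each alternative's
     log-probability is multiplied by its RESPONSE value, the usual way counts and shares enter an
     MNL; weighted_mnl_loglik is a hypothetical name.

```python
import math
from typing import Sequence


def weighted_mnl_loglik(v: Sequence[float], y: Sequence[float]) -> float:
    """Log-likelihood contribution of one choice occasion.
    v[j] = X_j*b for alternative j; y[j] = RESPONSE value for alternative j.
    Assumption: each alternative's log-probability is weighted by its response."""
    m = max(v)  # log-sum-exp trick for numerical stability
    log_denom = m + math.log(sum(math.exp(vj - m) for vj in v))
    return sum(yj * (vj - log_denom) for vj, yj in zip(v, y))


# OBSWT = 3 and YCHOSEN = [0, 2, 1] give WT1 = [0, 6, 3]:
v = [0.2, -0.4, 1.0]
print(weighted_mnl_loglik(v, [0, 6, 3]) / weighted_mnl_loglik(v, [0, 2, 1]))
```

     Under this assumption, scaling every response in an observation by OBSWT simply scales that
     observation's log-likelihood contribution by OBSWT.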


     Notes:

       * RESPONSE and the Full-information aggregated alternatives model.
   
         In the full-information aggregated alternatives model (AGG TYPE=FULL), 
         if you use non 0/1 dependent variables, for each group
         DISCRETE will sum all the values of the RESPONSE variable, and use this 
         value as the dependent variable for the group.  
         See AGG for a further  description.

       * The WEIGHT keyphrase is NOT used for "sample weights" -- it is used by
         the uncertain-dependent variables model to specify posterior probabilities.


 2STAGE : specify a 2-stage model

    The 2STAGE keyphrase specifies that a two-stage model should be estimated.
    The two-stage model correlates the random term in the first (participation) stage with
    a selected set of the varying parameters.

    2STAGE 0 means "do NOT estimate a 2-stage model".

    Otherwise, 2STAGE can take two options: a "-1" flag, and a list of variable names.

       -1   : If specified, a single correlation coefficient is estimated.
              If NOT specified, a separate correlation coefficient is estimated for each selected variable.
              The -1 should appear BEFORE the list of variables.

       List-of-variables:
            A list of variable names -- these MUST be variables that have non-lognormal varying
            parameters; that is, variables specified in NORMAL, UNIFORM, and/or TRIANGLE.
            A correlation coefficient between the varying parameter and the first-stage error term
            will be estimated (separately for each variable, unless -1 is given).

            Hint: to specify ALL the non-lognormal variables, use * as the "list of variables".
            Hint: to specify ALL the non-lognormal variables, use * as the "list of variable"

     Examples:
        NORMAL AGE PRICE EDUC ;

        2STAGE 0 ;

        2STAGE * ;
        2STAGE AGE PRICE EDUC ;   @ 2STAGE * is synonymous with this, given the above NORMAL keyphrase @

        2STAGE -1 * ;
        2STAGE -1 AGE  PRICE  ;


     Notes:
      * 2STAGE models can NOT have LOGNORMAL varying parameters, and can NOT have a FULL correlation matrix.
        That is, 2-stage models are only available for varying parameters that are not lognormal and
        that are independent.
      * If you specify 2STAGE, you must also specify X1 and Y1.
      * The first stage is a probit.
        The second stage is the mixed logit, augmented with information from the first stage.
        In particular,
          * the varying parameters are now modeled using a Mills ratio as an additional explanatory
            variable. This Mills ratio is computed from the first-stage coefficients.
          * the first stage (the PROBIT stage) is estimated using all observations.
          * the second stage (the MNL stage) is estimated using only observations with non-zero first-stage values.
      * NORMAL (and UNIFORM and TRIANGLE) can be specified after 2STAGE.
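   The first-stage quantities can be sketched in Python. This uses the standard (Heckman-style)
   inverse Mills ratio phi(x1'g)/Phi(x1'g) for participants; exactly how DISCRETE enters this
   ratio into the varying parameters is not specified here, so the sketch stops at computing it
   and at selecting the observations used in the MNL stage. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def std_normal_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def inverse_mills(x1: Sequence[float], gamma: Sequence[float]) -> float:
    """lambda = phi(x1'gamma) / Phi(x1'gamma) for a participant (Y1 = 1),
    given first-stage probit coefficients gamma."""
    z = sum(a * g for a, g in zip(x1, gamma))
    return std_normal_pdf(z) / std_normal_cdf(z)


def second_stage_rows(y1: Sequence[int], x1: Sequence[Sequence[float]],
                      gamma: Sequence[float]) -> List[Tuple[int, float]]:
    """(observation index, lambda) for observations kept in the MNL stage (non-zero Y1).
    The probit itself is assumed to have been fitted on all observations."""
    return [(i, inverse_mills(x1[i], gamma)) for i, y in enumerate(y1) if y != 0]
```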

  Y1 : the first-stage dependent variable (the PARTICIPATION variable).

   Only used if you have specified 2STAGE.
   This should be the name of a 0/1 dummy variable (1 means "participant").
   Only the value from the first row (of each observation's rows) will be used.


  WEIGHT: Weight variable -- used to form posterior probabilities for the uncertain dependent
         variables model.

      In a standard MNL, the probability of a choice occasion is exp(X_i*b) / Sum_j{exp(X_j*b)}

      However, instead of using just one value in the numerator (the "ith alternative"), you
      can use a weighted value of all the j=1..J alternatives.
      Thus, the likelihood contribution for a choice occasion could be:
          Sum_j{W_j * exp(X_j*b)}  /  Sum_j{exp(X_j*b)}
      or
          Prod_j{exp(X_j*b)^W_j}  /  Sum_j{exp(X_j*b)}

      with the sums and the product taken over the j=1..J alternatives (rows) in this choice occasion.

      Its primary purpose is the "uncertain dependent variables" correction.
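      The three probabilities above can be written out directly. This Python sketch is for
      illustration only (the function names are hypothetical); it normalizes the weights within
      the choice occasion, as described under UNC YTYPE, and writes V_j for X_j*b.

```python
import math
from typing import List, Sequence


def normalize(w: Sequence[float]) -> List[float]:
    """Weights are normalized within a choice occasion (they sum to 1)."""
    s = sum(w)
    return [wj / s for wj in w]


def standard_prob(v: Sequence[float], i: int) -> float:
    """exp(V_i) / Sum_j{exp(V_j)}."""
    return math.exp(v[i]) / sum(math.exp(vj) for vj in v)


def additive_prob(v: Sequence[float], w: Sequence[float]) -> float:
    """Sum_j{W_j * exp(V_j)} / Sum_j{exp(V_j)}  (the additive form)."""
    w = normalize(w)
    return sum(wj * math.exp(vj) for wj, vj in zip(w, v)) / sum(math.exp(vj) for vj in v)


def multiplicative_prob(v: Sequence[float], w: Sequence[float]) -> float:
    """Prod_j{exp(V_j)^W_j} / Sum_j{exp(V_j)}  (the multiplicative form)."""
    w = normalize(w)
    return math.prod(math.exp(vj) ** wj for wj, vj in zip(w, v)) / sum(math.exp(vj) for vj in v)
```

      When only one row has a non-zero weight, both weighted forms reduce to the standard MNL
      probability of that row, matching "that row was chosen with certainty".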
   
 
  
  AUX and BAUX:  Fixed-coefficient variables. See the description in PROBIT for the details.

  X : "Conditional" independent variables.
      A list of variable names.

      Reminder: MIXED does not estimate "multinomial" variables. You can use CREATE to generate
      a new data file that contains "multinomial"-style variables.
      You can then use this new file, and specify each of these multinomial-style variables
      in the X list.

  X1 : the first stage independent variables
   
        A list of variable names.

        Only the values from the first  row (of each observation's rows) will be used.


  NORMALIZE:  Normalize the independent variables.  See the description in PROBIT for the details.


  AGG:  Specify an aggregate-alternatives model

    In many cases the actual alternative chosen is not known, but you do know which of several
    separate, aggregated alternative "groups" was chosen.

         That is, you know which "group" was chosen, but not which of the alternatives in the group.

    For example, one may not know which site a trip was taken to, but one may know the region
    that the visited site is in.

    DISCRETE supports two kinds of aggregated-alternatives models: full information and limited information.

      i) Full information:
           Information on the attributes of all the alternatives is available.
           You know which group each alternative belongs to.
           You don't know which alternative was chosen, but you do know which group was chosen.

     ii) Limited information (based on the Ben-Akiva/Lerman Logit Model With Aggregate Alternatives):
           Information on the attributes of individual alternatives is NOT available.
           Aggregate information on the attributes of groups is available. In particular: the
           mean and variance-covariance of attributes (across alternatives belonging to a group), and the size of the
           group.
           You don't know which alternative was chosen, but you do know which group was chosen.

    Syntax:
    AGG TYPE=FULL VAR=GRP_VAR ;
    AGG TYPE=LIM  VCI=vci_file CLASS=class_var APPROX=aa ; Z SIZEVAR ;
    AGG NO ;

        * AGG NO means: do NOT estimate an aggregate-alternatives model.
        * AGG TYPE=FULL selects the full-information model.
        * AGG TYPE=LIM selects the limited-information model.

    Full information: TYPE=FULL ;

        You must have X information on all the alternatives. What is unknown is which alternative
        (within a group) was chosen.

        VAR= specifies a variable that identifies which group an alternative is part of.

        If an alternative is NOT in a group, the value of the GRP_VAR variable should be 0.
        Otherwise, all alternatives with the same value of the GRP_VAR variable will be placed in the
        same group.

        For example, if there are 4 alternatives, and alternatives 3 and 4 are in a "group", then
        you can distinguish between:
              Alt 1 was chosen  (GRP_VAR=0)
              Alt 2 was chosen  (GRP_VAR=0)
              Alt 3 or 4 was chosen, but not which of 3 or 4 (GRP_VAR=1)

        In other words, a value of 0 does NOT mean "in group 0".

       Note that each choice occasion can consist of a mixture of single alternatives and groups.
       Thus, there can be zero, one, or more than one group in a choice occasion.
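       The full-information rule can be sketched in Python: a group's probability is the sum of
       the MNL probabilities of its members, and GRP_VAR=0 rows stand alone. The log-likelihood
       function assumes that each outcome is weighted by the summed RESPONSE of its rows (as the
       note on aggregated responses below states); the function names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Key = Tuple[str, int]


def outcome_key(j: int, g: int) -> Key:
    """GRP_VAR = 0 means 'not in a group', NOT 'group 0'."""
    return ("alt", j) if g == 0 else ("grp", g)


def group_probabilities(v: Sequence[float], grp: Sequence[int]) -> Dict[Key, float]:
    """Probability of each observable outcome in one choice occasion.
    v[j] = X_j*b for alternative j; grp[j] = GRP_VAR value for alternative j."""
    m = max(v)
    expv = [math.exp(vj - m) for vj in v]
    denom = sum(expv)
    out: Dict[Key, float] = {}
    for j, (e, g) in enumerate(zip(expv, grp)):
        key = outcome_key(j, g)
        out[key] = out.get(key, 0.0) + e / denom  # members of a group pool their probability
    return out


def full_info_loglik(v: Sequence[float], grp: Sequence[int], y: Sequence[float]) -> float:
    """Each outcome's log-probability weighted by the summed response of its rows."""
    probs = group_probabilities(v, grp)
    totals: Dict[Key, float] = {}
    for j, (g, yj) in enumerate(zip(grp, y)):
        key = outcome_key(j, g)
        totals[key] = totals.get(key, 0.0) + yj
    return sum(t * math.log(probs[k]) for k, t in totals.items() if t > 0)


# 4 alternatives; alternatives 3 and 4 (indices 2 and 3) form group 1.
v, grp = [0.1, 0.3, -0.2, 0.5], [0, 0, 1, 1]
print(full_info_loglik(v, grp, [0, 0, 2, 1]) == full_info_loglik(v, grp, [0, 0, 0, 3]))  # True
```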

    Limited information: TYPE=LIM

       The limited-information model uses two sources of information on the aggregates: the variance of
       attributes, and the size of the aggregate.

       Syntax:  AGG TYPE=LIM VCI=vci_file CLASS=classvar APPROX=xx ;
               Z Zvars ;

         VCI and CLASS information is used to add a "heterogeneity" correction
         to the MNL likelihood.
         APPROX is used to modify how the heterogeneity correction is computed.
         Z is used to add a "size" correction to the MNL likelihood.

         Specifying Z variables, or a VCI file, is optional:
           * If you do not specify Z, no size correction is included.
           * If you do not specify a VCI, no heterogeneity correction is included.
         However, you must specify one or the other!

         vci_file specifies the GRBL2 variance-covariance-information file.
         classvar specifies a class-identification variable, which must be
         present in both your main GAUSS dataset and in the VCI file.

           Example: AGG TYPE=LIM VCI=fipsvc  CLASS=FIPS ;

         VCI files are described in detail at the end of this section.

         Briefly, for each observation DISCRETE will match the class variable to
         rows in the VCI file with the same value of its class variable.
         These rows are then used to construct a variance-covariance matrix for the
         chosen X variables.
           *  If there is no such class in the VCI file, a covariance of 0 is used.
           *  If the class id of an observation is 0, a covariance of 0 is used.
           *  If a class exists, but a variable is not specified, a covariance (for
              this variable) of 0 is used.

         Note that the values in your main GAUSS dataset are interpreted as "means" of
         variables in an aggregation. That is, each observation summarizes an aggregation --
         unlike the full-information model, one has NO direct measures of individual alternatives.

         Actually, alternatives that are not in aggregates (class id=0) are treated as "aggregates of 1,
         with 0 variance".

         The APPROX can take two kinds of values:
             APPROX=NO    -- do not use the "linear approximation" for the heterogeneity correction
             APPROX=YES   -- use the "linear approximation to the heterogeneity" correction,
                             using the default parameters
             APPROX="v1 v2 v3" (where v1,v2, and v3 are values) --  use the "linear approximation
                             to the heterogeneity" correction, using the v1, v2, and v3 parameters
         The linear approximation attempts to correct for bias in the Ben-Akiva/Lerman "approximation to
         the true heterogeneity". See DISCRETE.PDF for the details.

         Zvars is used to specify "size of alternative".
         In the simplest case, you specify a single variable.
             Example:  Z ACRES ;
         Or, a more complicated model specifies multiple "indirect" measures of size.
             Example:  Z RVR_MILE LAKEACRE ;
         In the simple case, the single variable is used as is.
         In the multiple variables case, size is computed using exp(Z*beta_z), where beta_z
         is an estimated vector of parameters (with the constraint that beta_z[1]=1).

         See DISCRETE.PDF for the details.
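         For orientation, here is a Python sketch of the textbook Ben-Akiva/Lerman utility of an
         aggregate alternative, with the scale parameter normalized to 1 and normally distributed
         within-group utilities: V = xbar'beta + 0.5 * beta'(VC)beta + theta * ln(size).
         It is NOT DISCRETE's exact formula: the APPROX linear correction and the multi-variable
         Z index are not reproduced (see DISCRETE.PDF), theta stands in for the scale parameter on
         the logged size variable, and the function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def aggregate_utility(xbar: Sequence[float], beta: Sequence[float],
                      vc: Optional[Sequence[Sequence[float]]] = None,
                      size: Optional[float] = None, theta: float = 1.0) -> float:
    """Utility of one aggregate alternative (group).
    xbar : mean attributes of the group (a row of the main dataset)
    vc   : within-group covariance of attributes (from the VCI file); None -> no heterogeneity term
    size : number of elemental alternatives (a single Z variable); None -> no size term"""
    v = sum(x * b for x, b in zip(xbar, beta))
    if vc is not None:
        k = len(beta)
        v += 0.5 * sum(beta[i] * vc[i][j] * beta[j] for i in range(k) for j in range(k))
    if size is not None:
        v += theta * math.log(size)
    return v


def group_choice_probs(utilities: Sequence[float]) -> List[float]:
    """MNL probabilities over the aggregate alternatives of one choice occasion."""
    m = max(utilities)
    e = [math.exp(u - m) for u in utilities]
    s = sum(e)
    return [ei / s for ei in e]
```

         An ungrouped alternative (class id 0, zero variance, size 1) keeps its plain utility
         xbar'beta, consistent with treating it as an "aggregate of 1, with 0 variance".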

    Notes for both models:

       * The number of groups can vary across observations -- some observations can have 0 groups (that is,
         all of their alternatives are "groups" with 1 element, with a class id of 0), while others
         may have many.

       * AGG TYPE=FULL can NOT be combined with the uncertain dependent variables correction (UNC YTYPE=xxx).
       * AGG TYPE=FULL CAN be combined with the uncertain independent variables correction.

       * AGG TYPE=LIM CAN be combined with the uncertain dependent variables correction.
       * AGG TYPE=LIM can NOT be combined with the uncertain independent variables correction.

         Note that it is permitted, but somewhat odd, to combine the uncertain dependent variables model
         with the limited-information group model -- it means that one is unsure of two things: which group
         (where a group can consist of one element) was chosen, and which element of the group was chosen
         (if a group has more than one member).

       * The XVC option in the MAKEDATA program can be used to construct VCI files. It can also
         be used to compute average values of X variables across groups.

       * Comparison to the nested MNL:
         in the nested MNL you know the elements of a subset, you know whether a subset was chosen, and you know
         which alternative within a subset was chosen
         (knowing the chosen subset being an obvious result of knowing which alternative was chosen).

       * On aggregated responses in the full-information aggregated-alternatives model:
         for the set of alternatives comprising a group, the sum of Y is used as the dependent variable.
         It does NOT matter how this sum is achieved.
         Thus, using the above example with a group containing alternatives 3 and 4,
         the following rows are analytically equivalent:

              Y_for_choice_3        Y_for_choice_4
                 2                    1
                 3                    0
                 1                    2
                 0                    3

  Z   : Aggregate-site "size" variables.
   A list of variable names.

        These variables will be used to compute the "size" of an aggregate. This estimate is included in the
        limited-information aggregated-site model.

   If you specify Z 0 ; then the "size" estimate will NOT be included in the likelihood computation.
   If you specify one variable, it is used as is (actually, it is logged and multiplied by a scale
   parameter).
   If you specify more than one variable, the first is used as is, and the others are multiplied by
   estimated betas.


 SAMPLE : Specify a sample of alternatives
       If there are many alternatives per choice occasion, estimating a model can take a long time.
       Using a subset (a "sample") of alternatives is one strategy for reducing
       the size of the problem.

       The SAMPLE keyphrase is used to specify how to select a sample of alternatives.

   Syntax:
            SAMPLE TYPE=atype N=nalts  VAR=pvar NORM=donorm THRESH=sthresh SAVE=filename;

   Details:

     TYPE : the type of sampling.
        atype = RANDOM
           Random sampling. For each observation (or each choice occasion within an
           observation), a subset of alternatives is sampled. The "non-sampled"
           rows are discarded (they do NOT contribute to the likelihood in any way).
        atype = IMP   (IMPORTANCE)
           Importance sampling. Rows are chosen with a probability given by the VAR=pvar
           variable -- rows with higher values are more likely to be sampled.
        atype = NO
           No sampling -- use all rows.

      N : Number of rows (alternatives) to use.
            This can be 1, an integer > 1, or a value < 1.0.
            If an integer > 1, it is the number of rows used per observation (or per choice occasion).
            If < 1.0, it is the fraction of rows used.
            If 1, then ALL rows are used (see below for why this might be useful).

        Notes:
          * Rows with a response (dependent variable) > 0 are ALWAYS used.
            Example:
                  # rows = 100
                  N=10
                  Row 22 chosen (it is the only row with a non-zero response value)
                  Then row 22 is extracted, along with 9 others.

          * N=1 is meant to be used with:
               *  TYPE=IMPORTANCE,
               *  a dataset that has already been sampled (a dataset where unsampled rows have been removed), and
               *  VAR=pvar specified.
            See below for a scenario where N=1 is useful: a two-step estimation that is handy with huge datasets.

         VAR : The probability-of-being-sampled variable.
                Not used if TYPE=RANDOM is selected.
                This can be an actual probability -- implying that pvar summed across all
                of the "original" rows of an observation equals 1.0.
                Or, you can use NORM to normalize arbitrary values.
                Notes:
                 * a value of 0 ALWAYS means "do NOT sample this row" (even if you normalize).
                 * when N=1 (and TYPE=IMP), this value is used as is -- it is directly added to the
                   likelihood function (it is NOT used to choose which rows to use).


         NORM: Normalize pvar (across all rows within an observation).
               donorm=NO  : do not normalize.
               donorm=YES : normalize.
                    Sampling probability for row r: Sprob_r = pvar_r / (sum_j=1..#rows pvar_j)
               donorm=INV : normalize, then take the inverse.
                    Sampling probability for row r: Sprob_r = 1 - [pvar_r / (sum_j=1..#rows pvar_j)]

              HOWEVER: if pvar_r = 0, Sprob_r ALWAYS equals 0.

         THRESH : Y-uncertainty probability threshold.
                sthresh should be >= 0.0 (and < 1.0).
                This is ONLY used if you are estimating an uncertain-dependent variables model.
                If the probability associated with an alternative (a row) is > sthresh, the
                row is ALWAYS sampled.

         SAVE : filename to save sampled data to
                 filename can be relative (to current working directory), or fully-qualified.

    Examples:
        SAMPLE TYPE=RANDOM N=0.5 ;
        SAMPLE TYPE=IMP   N=20 VAR=DISTANCE NORM=INV  ;
        SAMPLE TYPE=IMP N=15 VAR=WACRES NORM=YES THRESH=0.5 ;

     Reminder: Rows with higher probabilities are more likely to be "sampled".
               Rows that are "chosen" are ALWAYS sampled.
               A "chosen" row either has a response variable >0.0,
               Or if you are estimating an uncertain-dependent variables model, a
               weight > sthresh.
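     The row-selection rules above can be sketched for a single observation. This is a Python
     illustration, not DISCRETE's GAUSS code: the function names are hypothetical, rounding a
     fractional N to a row count is an assumption, and weighted sampling without replacement is
     done here with the Efraimidis-Spirakis key method (keep the rows with the largest
     u**(1/p)), which may differ from what DISCRETE does internally.

```python
import random
from typing import List, Optional, Sequence


def sampling_probs(pvar: Sequence[float], norm: str = "NO") -> List[float]:
    """Sampling probability per row under NORM=NO, YES, or INV."""
    total = sum(pvar)
    probs = []
    for p in pvar:
        if p == 0:
            probs.append(0.0)              # 0 ALWAYS means "do NOT sample this row"
        elif norm == "YES":
            probs.append(p / total)
        elif norm == "INV":
            probs.append(1.0 - p / total)
        else:                              # NORM=NO: pvar is already a probability
            probs.append(float(p))
    return probs


def sample_rows(response: Sequence[float], pvar: Sequence[float], n: float, norm: str = "NO",
                weight: Optional[Sequence[float]] = None, thresh: float = 0.0,
                seed: int = 0) -> List[int]:
    """0-based indices of the rows kept for one observation under TYPE=IMP."""
    rows = len(response)
    if n == 1:
        return list(range(rows))           # N=1: all rows are used
    target = int(n) if n > 1 else max(1, round(n * rows))  # assumption: fractions are rounded
    # "chosen" rows are always kept: response > 0, or Y-uncertainty weight > thresh
    forced = {r for r in range(rows)
              if response[r] > 0 or (weight is not None and weight[r] > thresh)}
    probs = sampling_probs(pvar, norm)
    candidates = [r for r in range(rows) if r not in forced and probs[r] > 0]
    k = max(0, min(target - len(forced), len(candidates)))
    rng = random.Random(seed)
    keys = {r: rng.random() ** (1.0 / probs[r]) for r in candidates}
    picked = sorted(candidates, key=keys.get, reverse=True)[:k]
    return sorted(forced.union(picked))
```

     With 100 rows, N=10 and row 22 (index 21) chosen, the result holds index 21 plus 9 other
     rows, and a row whose pvar is 0 can never appear.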


     For model details, see Ben-Akiva and Lerman, chapter 9.3.

     Note: On using sampling to save a subset of the data

        If you have a huge dataset (say, 5000 observations, each with 200 alternatives = 1 million rows), it is quite likely
        that DISCRETE will run out of memory and fail.
        This is especially likely if you are using the X-uncertainty corrections (which require the generation of
        many random numbers).
        To deal with this case, you can sample the alternatives, save the results, and then re-run the model using this saved file.

        This requires two steps:

           1)   i) Run DISCRETE with the usual commands, and with a SAMPLE keyphrase that has a SAVE=filename option.
               ii) Use RUN -NOEST.
              Example:  MODEL MIXED ;
                        id .. ; x ... ; response .. ;
                        SAMPLE var=selwt n=25 thresh=0.3 type=imp norm=YES save=samp1 ;
                        RUN -NOEST ;
             A SAMP1 dataset will be created containing all the X, response, etc. variables.

          2) Run DISCRETE with the usual commands, using SAMP1 as the input dataset, and with a SAMPLE keyphrase that has
             an N=1 and a var=_SPROB option.
              Example:  MODEL MIXED ;
                        id .. ; x ... ; response .. ;
                        sample var=_SPROB n=1 type=imp ;
                        RUN ;

         Note that SAVEd datasets have the following structure:
                 Id_variable -- the id variable
                 _XREP     -- x-replication -- always equal to 1 if you do not have x-replication data in your original input file
                 _OCC      -- occasion -- always equal to 1 if there is only one occasion per observation
                 _ALT      -- alternative number (from the original, unsampled, data)
                 class_var -- class variable (not included if VCI information is not required)
                 _SPROB    -- sampling weight (for use in a VAR= statement)
                 weight    -- Y-uncertainty weight (only included if one is specified)
                 response  -- the response variable
                 x         -- the X variables
                 aux       -- the auxiliary variables (if any are specified)
                 y1        -- first-stage Y variables (if any are specified)
                 x1        -- first-stage X variables (if any are specified)

      Hint: when you use RUN -NOEST, you can use AUX to specify extra variables to include in the saved dataset
      (you do NOT need to specify a BAUX keyphrase).

  UNC :   Specify an "uncertain data" model.

     In addition to variance in the betas, the MIXED model can control for variance in the independent
     and dependent variables.

  * Uncertainty in X

      MIXED can control for "uncertainty in the independent variables" using three approaches:

      1) Static replication of X (independent) variables -- the dataset contains "replicated rows":
         for each individual, each alternative appears numerous times (numerous
         replications), with the independent variables of each replication being a distinct
         measure of the underlying true X.
      2) Dynamic replication of X (independent) variables -- similar to static, but the replications
         are generated on the fly, and are used in a multi-round estimation process.
      3) Replication of X*beta (the mean). This requires information on the variance matrix of the X
         values for each and every alternative in the dataset. Note that alternatives can share the same
         variance matrix (say, if alternative A is the same for all observations).

 * Uncertainty in Y

      MIXED can control for "uncertainty in the dependent variables" using two approaches:

      1) ADDitive. The probability of a choice occasion is:
             Sum_j{W_j * exp(X_j*b)}  /  Sum_j{exp(X_j*b)}

      2) MULTiplicative. The probability of a choice occasion is:
             Prod_j{exp(X_j*b)^W_j}  /  Sum_j{exp(X_j*b)}

    where W_j is an alternative-specific weight variable.

 UNC is used with several options, the main ones being XTYPE and YTYPE.
         XTYPE can take values of X, XB, XREP, SAVE, or NO.
         YTYPE can take values of ADD, MULT, or NO.

Other options depend on the value of XTYPE:

      XTYPE=X or XTYPE=X_STATIC
           VAR=X-replication identifier variable

      XTYPE=XREP or XTYPE=X_DYNAMIC
           VCI=vci_file
           REP=#_replications
           CLASS=class_id_variable
           VCI_CLASS=class_id_variable
           FRAC=fraction_to_dynamically_replicate
           NROOT=stringency_modifier
           ROUNDS=#_rounds

      XTYPE=XB
           VCI=vci_file
           REP=#_replications
           CLASS=class_id_variable
           VCI_CLASS=class_id_variable

      XTYPE=SAVE
           VCI=vci_file
           REP=#_replications
           CLASS=class_id_variable
           VCI_CLASS=class_id_variable
           OUT=output_file

      XTYPE=NO

     or ...
          UNC NO ; means "no uncertainty" in the X or Y data.


   The YTYPE option -- estimate an uncertain-dependent variables model

   If you include YTYPE, then the weights specified in a WEIGHT keyphrase
   are used to estimate an "uncertain dependent variables" correction.

   Thus: if you specify YTYPE, you MUST also specify a WEIGHT variable.

           As described in the WEIGHT command:
             The uncertain dependent variables correction is meant for cases where you are not sure which
             alternative was actually chosen -- but where you have some kind of "reporting accuracy" measure.
             An alternative with a higher reporting-accuracy value is more likely to be the alternative
             that was actually chosen.

             The WEIGHT variable is used as the reporting accuracy. A 0 value means "certainly did not
             choose this alternative". Non-zero values are normalized, with higher values meaning a higher
             probability that this is the chosen alternative. If there is only one row with a non-zero value,
             then that row was chosen with certainty.

     Note that these weights will be normalized (within a choice occasion).

   YTYPE can take the values:
      YTYPE=ADD  -- estimate the additive (quasi-Bayesian) model
      YTYPE=MULT -- estimate the multiplicative (implicit replications) model
      YTYPE=NO   -- no Y uncertainty

   The XTYPE options -- what kind of "uncertain independent variable" correction.

     XTYPE=X (or XTYPE=X_STATIC):

        The dataset has "x-replicated" data embedded in it.
        The variable identifying which "replication" a row belongs to is specified using the VAR= option,
        or, if you have a balanced design, you can specify a numeric value.

       Examples:
          UNC XTYPE=X VAR=XREP ; @ variable that identifies which x-replication a row belongs to @
          UNC XTYPE=X VAR=6  ;

        On balanced designs:
         if ID 120 is used (120 rows per observation), and XREP 20 is used, then each observation
              has 20 x-replications, with each replication containing 6 alternatives (one row per alternative):
                replication 1:  rows 1 to 6
                            2:  rows 7 to 12
                    ...
                           20:  rows 115 to 120
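        The balanced-design layout is simple arithmetic. This Python sketch assumes, as the example
        shows, that each replication's rows are stored contiguously and in order; it does not settle
        which of the two counts VAR=nn itself refers to. replication_rows is a hypothetical name.

```python
from typing import List, Tuple


def replication_rows(rows_per_obs: int, n_reps: int) -> List[Tuple[int, int]]:
    """1-based (first_row, last_row) of each x-replication within one observation."""
    if rows_per_obs % n_reps != 0:
        raise ValueError("the number of replications must divide evenly into ID")
    alts = rows_per_obs // n_reps  # rows (alternatives) per replication
    return [(r * alts + 1, (r + 1) * alts) for r in range(n_reps)]


blocks = replication_rows(120, 20)
print(blocks[0], blocks[1], blocks[-1])  # (1, 6) (7, 12) (115, 120)
```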

        Notes:
          * For balanced designs (when ID is an INTEGER), nn in VAR=nn MUST ALSO BE AN INTEGER!
            Furthermore, nn must divide evenly into ID.
          * For unbalanced designs, the X-replication VARiable must be unique within an observation.
            It does NOT have to be unique across observations.
          * For unbalanced designs, rows in the same x-replication do NOT have to be adjacent.
          * The "random beta" is the same across all x-replications for an individual.
          * See below for a discussion of the differences between OCC and UNC.

     XTYPE=XREP (or XTYPE=X_DYNAMIC):

       Control for uncertainty in X by replicating X data on the fly, using a multi-round optimization
       process.

       Basically, given Bt (an estimate of beta), new values of X (x-replications)
       are drawn for a subset of the "worst" observations -- those observations with the most
       negative log-likelihoods.
       For each of these observations, the X-replication that yields the best log-likelihood (for the
       observation, using Bt) is treated as the "best guess of X". These are used, along with the
       unmodified X values (for the non-worst observations), to re-estimate beta.
       The process is repeated, using this re-estimated beta as Bt, until no improvement occurs.
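       The round structure described above can be sketched in Python. This is a hedged outline, not
       DISCRETE's implementation: the model-specific pieces are passed in as hypothetical callbacks
       (draw, loglik, fit), "no improvement" is assumed to mean the total log-likelihood does not
       rise, and the NROOT probability weighting is not included.

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def dynamic_xrep(
    xs: Sequence[Vector],
    draw: Callable[[int, random.Random], Vector],   # one new x-replication for observation i
    loglik: Callable[[int, Vector, Vector], float],  # log-likelihood of obs i given (x, beta)
    fit: Callable[[Sequence[Vector]], Vector],       # re-estimate beta given all X values
    frac: float = 0.15,
    rep: int = 100,
    rounds: int = 50,
    seed: int = 0,
) -> Tuple[List[Vector], Vector, float]:
    """Return (X values, beta, total log-likelihood) from the last improving round."""
    rng = random.Random(seed)
    xs = [list(x) for x in xs]
    beta = fit(xs)
    total = sum(loglik(i, x, beta) for i, x in enumerate(xs))
    n_worst = max(1, round(frac * len(xs)))
    for _ in range(rounds):
        # the "worst" observations: most negative log-likelihoods under the current beta
        worst = sorted(range(len(xs)), key=lambda i: loglik(i, xs[i], beta))[:n_worst]
        candidate = [list(x) for x in xs]
        for i in worst:
            draws = [draw(i, rng) for _ in range(rep)]
            candidate[i] = max(draws, key=lambda x: loglik(i, x, beta))  # "best guess of X"
        new_beta = fit(candidate)
        new_total = sum(loglik(i, x, new_beta) for i, x in enumerate(candidate))
        if new_total <= total:  # no improvement: stop and keep the previous round
            break
        xs, beta, total = candidate, new_beta, new_total
    return xs, beta, total
```

       FRAC, REP and ROUNDS map onto frac, rep and rounds; in practice draw would sample from the
       covariance supplied by the VCI file for the row's class.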


       Syntax:
           XTYPE=xrep  REP=j ROUNDS=r VCI=filename CLASS=alt_id VCI_CLASS=vci_class_id FRAC=v0 NROOT=n ;

        where:
           VCI=filename
              REQUIRED.  Filename is the name of the VCI gauss data file that contains variance
              of X information. The structure of this datafile is at the end of this section.

           REP=j      (default value=100)
              OPTIONAL. j must be an integer. It is the number of x-replications to create for each
              alternative.

           ROUNDS=r   (default value=50)
              OPTIONAL.  r must be an integer. It is the maximum number of rounds, where each round consists of
              finding the "best" X-replication (values of X), and maximum likelihood estimation using
              these updated X values.


           CLASS=class_id

              A variable used to identify the "class" for this alternative -- in the main (observations) dataset.
              If CLASS= is not specified, then _CLASS_ is used.

           VCI_CLASS=class_id_variable
              A variable used to identify the "class" for this alternative -- in the VCI (covariances) dataset.
              If VCI_CLASS is not specified, the variable specified in CLASS is used (or the _CLASS_ default is used).

           FRAC=0.xx  -- value between 0 and 1.0 (default value of 0.15)
              The fraction of observations to x-replicate. Larger values (up to 1.0) mean more
              of the observations are x-replicated during each round.

              Example: FRAC=0.15 means the worst 15% of observations will be x-replicated during a round.

           NROOT=integer   (default value of 5)
              The use of dynamic x-replication carries a risk of overfitting, since the X values
              can conceivably be twisted to make any coefficient vector yield precise results.
              To discourage such pathologies, DISCRETE uses a probability weighting -- X-replications
              that are "far" from the original X values are less likely to be used as the "best
              guess of X". Thus, the choice of what X-replication to use in estimation is a function
              of the likelihood associated with it, and its distance from the original X

              NROOT controls this "stringency". A value of 0 means "stringent" -- the probability
              weighting is used as is.  A value of infinity means "non-stringent" -- the X-replication
              with the best likelihood is used, regardless of its location in X space.


    For a complete description of the dynamic X-replication algorithm, see the end of this section.

     XTYPE=XB:

       Control for uncertainty in X by modeling uncertainty in X*beta using on-the-fly replications.

       This collapses the dimensions of the uncertainty -- instead of uncertainty across
       M*K variables (M alternatives, each described by K independent variables), the uncertainty
       is across M variables.

       Each of these M variables is an alternative's underlying "mean" -- its X * Beta.
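       The collapse works because, for a fixed Beta, Var(X*Beta) = Beta' * Cov(X) * Beta, a single
       number per alternative. The Python sketch below shows that step and then averages MNL
       probabilities over simulated XB draws. Drawing each XB from a normal distribution,
       independently across alternatives, is an assumption made for illustration; the function
       names are hypothetical and do not describe DISCRETE's internal routines.

```python
import math
import random


def xb_variance(beta, cov):
    """Var(x'beta) = beta' Cov(x) beta for one alternative."""
    k = len(beta)
    return sum(beta[i] * cov[i][j] * beta[j] for i in range(k) for j in range(k))


def xb_replicated_prob(xs, covs, beta, chosen, reps=100, seed=0):
    """Average MNL probability of alternative `chosen` over simulated XB draws.

    ASSUMPTION: each alternative's XB is normal with mean x'beta and
    variance beta' Cov beta, drawn independently across alternatives.
    """
    rng = random.Random(seed)
    means = [sum(b * x for b, x in zip(beta, xm)) for xm in xs]
    sds = [math.sqrt(xb_variance(beta, c)) for c in covs]
    total = 0.0
    for _ in range(reps):                       # REP=j replications
        v = [rng.gauss(m, s) for m, s in zip(means, sds)]
        vmax = max(v)                           # subtract max for numerical stability
        e = [math.exp(vi - vmax) for vi in v]
        total += e[chosen] / sum(e)
    return total / reps                         # probabilities are averaged
```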

        Syntax:
           XTYPE=xb  REP=j VCI=filename CLASS=alt_id ;

         where:

           VCI=filename  (required)
               Filename is the name of the VCI gauss data file that contains variance/covariance
               information on the X variables. The structure of this datafile is described at the end of this section.

           REP=j
              j must be an integer. It is the number of x-replications to create for each
              alternative. If REP=j is not specified, j=100 is used.

           CLASS=class_id
              A variable used to identify the "class" for this alternative.
              This variable MUST exist in BOTH the main gauss data file, and in the VCI file.
              If CLASS= is not specified, then _CLASS_ is used.


     XTYPE=SAVE:

       Using information on the variance of each alternative's X values,
       create a new dataset with "x-replicated" data embedded.
       You can then use this new dataset with the TYPE=X option.

        Syntax:
           XTYPE=save  REP=j VCI=filename CLASS=alt_id OUT=filename ;

         where:
           REP, VCI, and CLASS are as defined above (in XTYPE=XB).

           OUT=filename
              REQUIRED. Filename should be the name you want to give the new dataset.
              If it is not fully qualified, it will be relative to the working directory
              (see WORK_DIR in GRBL2_BATCH.TXT for instructions on how to select a working directory).

              A gauss dataset containing an "x-replicated" version of the current data will
              be generated. This file can be used with UNC TYPE=X ...

              CAUTION: if a file with this name exists, it will be overwritten.

              The following are included in a SAVEd (x-replicated) datafile:
                The ID variable (or, if a balanced design was specified, an _ID_ variable)
                An _XREP_ variable (use it as the var_name in TYPE=X VAR=var_name)
                The choice variable
                X variables
                If specified:
                   The OCC variable (or, if a balanced design was specified, an _OCC_ variable)
                   Choice-group variable
                   Weight variables
                   Auxiliary variables

      XTYPE=NO
         No uncertainty in the X variables.


     Examples:

        UNC NO ;
        UNC YTYPE=ADD XTYPE=NO ; WEIGHT CPROB ;
        UNC YTYPE=NO XTYPE=x Var=XREP3  ;
        UNC YTYPE=NO  XTYPE=XB  REP=150 VCI=xvc3 CLASS=ALTCLASS  ;
        UNC YTYPE=MULT XTYPE=XB REP=200 VCI=xvc3 CLASS=ALTCLASS ; WEIGHT SITEPROB ;


  OCC: Specify a "choice occasion" identifier.

       Rows within the same "choice occasion"  form the set of choices -- the alternative specified by
       a row is compared against other alternatives with the same choice occasion id.

       Syntax:
         OCC  OCCID ;
         OCC 3  ;
         OCC  NO   ;

       OCC can either be a variable name, or an integer -- an integer is used to identify the 
       number of choice occasions (NOT the number of rows belonging to a choice occasion).

      
      Thus, if ID 12 is used (12 rows per observation), and OCC 3 is used, then each "choice occasion"
      consists of 4 rows (1: rows 1 to 4, 2: rows 5 to 8, 3: rows 9 to 12).

      Notes:
         * For balanced designs (when ID is an INTEGER), OCC MUST ALSO BE AN INTEGER!
           Furthermore, OCC must divide ID evenly [trunc(ID/OCC) must equal ID/OCC].

           If UNC TYPE=X var=nn is also specified, then OCC must also evenly divide ID/nn
           (the number of rows in each x-replication).

         * For unbalanced designs, rows with the same OCC do not have to be adjacent.
         * The "random beta" is the same across all choice occasions for an individual.
         * Example: if ID 12 and OCC 3 are used
              P1=e1/(e1+e2+e3+e4) ... P4=e4/(e1+e2+e3+e4), P5=e5/(e5+e6+e7+e8)... P12=e12/(e9+e10+e11+e12)
           where:
             Pj is the probability of the ith individual choosing the jth alternative
            ej=exp(Xj*Bi)   
            Bi is the "B coefficient vector for the ith individual", and
            Xj is the vector of X variables for the jth alternative for the ith individual.
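       The example formulas above can be checked with a short Python sketch for one individual in a
       balanced design: rows are split into OCC equal blocks, a logit is computed within each block,
       and the individual's likelihood is the product of the chosen probabilities (each raised to
       its Y value, so non 0/1 Y counts repeated choices). Function names are hypothetical; the
       utilities v_j = Xj*Bi are taken as given.

```python
import math


def occ_probabilities(v, n_occ):
    """Within-occasion logit probabilities for one individual (balanced OCC).

    v: utilities X_j * B_i for the individual's rows, in row order.
    n_occ: the OCC integer (number of choice occasions).
    """
    if len(v) % n_occ:
        raise ValueError("OCC must divide the number of rows (ID) evenly")
    size = len(v) // n_occ
    probs = []
    for start in range(0, len(v), size):
        block = v[start:start + size]
        vmax = max(block)                      # stabilize exp()
        e = [math.exp(u - vmax) for u in block]
        s = sum(e)
        probs.extend(ei / s for ei in e)       # P_j = e_j / sum over the occasion
    return probs


def panel_likelihood(v, y, n_occ):
    """Product over rows of P_j ** y_j (y_j = times row j was chosen)."""
    lik = 1.0
    for p, yj in zip(occ_probabilities(v, n_occ), y):
        lik *= p ** yj
    return lik
```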

         * OCC complements non 0/1 Y values -- they can be used together.

       Non 0/1 Y values mean "this alternative was chosen this many times".
            However, the attributes (the X values) are the same in each choice.

       In contrast, OCC means "this alternative was available for this choice occasion".
            Thus, the attributes of this choice (and of the other choices within the choice
            occasion) can vary across occasions.

            Reiterating: beta does NOT vary across choice occasions for a single observation.

            Thus, OCC is more powerful, but a non 0/1 Y is easier to specify.
   
  In addition to these keywords, you should specify which variables are varying, using the following 4 keywords:
  NORMAL, TRIANGLE, UNIFORM, and LOGNORMAL.

  NORMAL : a list of "normal" varying coefficients

  TRIANGLE : a list of "triangular" varying coefficients

  UNIFORM  : a list of "uniform" varying coefficients

   Examples: 
        NORMAL X3  x2  ;
        TRIANGLE X4   ;
        UNIFORM X6 X9 ;

    For observation i, the kth NORMAL, UNIFORM, or TRIANGLE coefficient:  B2_ik = b2_k + s2_k * eps_ik
     where b2_k and s2_k are coefficients to be estimated, and eps_ik is an unobservable random variable.
    If NORMAL is used: eps_ik has a standard normal distribution
    If UNIFORM is used: eps_ik has a -1 to 1 uniform distribution
    If TRIANGLE is used: eps_ik has a -1 to 1 triangular distribution
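    A Python sketch of one draw of B2_ik = b2_k + s2_k * eps_ik for each of the three types. It
    uses Python's pseudo-random generator for clarity, whereas DISCRETE requires a HALTON sequence
    when TRIANGLE or UNIFORM is used; the symmetric triangular shape (mode 0) is an assumption, and
    the function names are hypothetical.

```python
import random


def draw_eps(kind, rng):
    """One draw of eps_ik for the given varying-coefficient type."""
    if kind == "NORMAL":
        return rng.gauss(0.0, 1.0)               # standard normal
    if kind == "UNIFORM":
        return rng.uniform(-1.0, 1.0)            # uniform on [-1, 1]
    if kind == "TRIANGLE":
        return rng.triangular(-1.0, 1.0, 0.0)    # triangular on [-1, 1], ASSUMED mode 0
    raise ValueError(f"unknown type: {kind}")


def draw_coefficient(kind, b, s, rng):
    """B2_ik = b2_k + s2_k * eps_ik."""
    return b + s * draw_eps(kind, rng)
```

    So s2_k is a standard deviation for NORMAL, but a half-width of the support for UNIFORM and
    TRIANGLE.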


  LOGNORMAL: a list of "log-normal" varying coefficients
   Example: 
        LOGNORMAL X1 X10 ;
         for the "diagonal" model;
         for observation i, the kth LOGNORMAL coefficient: B3_ik = exp(b3_k + s3_k * eps_ik)
         where b3_k and s3_k are coefficients to be estimated, and eps_ik is an unobservable
         standard normal random variable.
   Note that for the coefficient B3_ik,
            median = exp(b3_k)
              mean = exp(b3_k + ((s3_k^2)/2) )
              sd =  [exp(b3_k + ((s3_k^2)/2))] * sqrt[exp(s3_k^2)-1]
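   The three moment formulas can be checked numerically. The Python sketch below computes them in
   closed form and compares them with a Monte Carlo simulation of exp(b3_k + s3_k * eps_ik); the
   function names are illustrative only.

```python
import math
import random


def lognormal_moments(b3, s3):
    """Median, mean and sd of B3 = exp(b3 + s3 * eps), eps ~ N(0, 1)."""
    median = math.exp(b3)
    mean = math.exp(b3 + s3 ** 2 / 2)
    sd = mean * math.sqrt(math.exp(s3 ** 2) - 1)
    return median, mean, sd


def simulated_moments(b3, s3, n=100000, seed=0):
    """Monte Carlo check of the closed-form moments."""
    rng = random.Random(seed)
    draws = sorted(math.exp(b3 + s3 * rng.gauss(0.0, 1.0)) for _ in range(n))
    mean = sum(draws) / n
    sd = math.sqrt(sum((d - mean) ** 2 for d in draws) / n)
    return draws[n // 2], mean, sd
```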

 *ALERT*  
     If NORMAL, UNIFORM, TRIANGLE and LOGNORMAL are specified, they MUST each be a subset of the variables
     specified in the X keyword.

     All variables specified in the X keyword that are NOT specified in any of NORMAL, UNIFORM, TRIANGLE or
     LOGNORMAL are assumed to have "non-varying" coefficients.

     Variables specified in NORMAL, UNIFORM, TRIANGLE, and LOGNORMAL must not overlap (you can't specify the
     same variable in more than one of these 4 keywords).

  Several other keywords are supported:
   

   REPS : # of replications of the beta vector. 
     Each replication is associated with a different "draw" of the beta vector.
     The more replications, the closer the numerical estimator will be to the actual integral 
     (the integral across the range of support of the random coefficients). 
     By default, 50 replications will be used.

     Examples:    REPS 200     (within an iteration, use 200 replications for each observation)
                  REPS 0    -- suppress varying parameters correction.
   
     REPS 0 is useful if you have multiple occasions per individual, or you wish to estimate an X-replication
     model without varying parameters.

     Notes:
        * The actual number of replications is also affected by the use of X-replication
          (see the above descriptions of UNC).
        * Along with REPS, the type of RND_SEQ used can affect the accuracy of the
          numerical simulation (of the underlying probability distribution of betas).
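     The role of REPS can be illustrated with a Python sketch of the simulated likelihood for one
     individual in the diagonal NORMAL model: draw the beta vector REPS times, compute the product of
     the within-occasion logit probabilities for each draw, and average. It uses plain pseudo-random
     normal draws (like RND_SEQ NORMAL) rather than the default scrambled Halton sequence, and the
     function name and arguments are hypothetical.

```python
import math
import random


def simulated_likelihood(x_rows, y, n_occ, b, s, reps=50, seed=0):
    """Simulated panel likelihood for one individual (diagonal NORMAL model).

    x_rows: one list of X values per row; y: choice counts per row;
    b, s: means and standard deviations of the varying coefficients.
    """
    rng = random.Random(seed)
    size = len(x_rows) // n_occ
    total = 0.0
    for _ in range(reps):                                   # REPS draws of beta
        beta = [bj + sj * rng.gauss(0.0, 1.0) for bj, sj in zip(b, s)]
        v = [sum(bk * xk for bk, xk in zip(beta, row)) for row in x_rows]
        lik = 1.0
        for start in range(0, len(v), size):                # one block per choice occasion
            block = v[start:start + size]
            vmax = max(block)
            e = [math.exp(u - vmax) for u in block]
            denom = sum(e)
            for ej, yj in zip(e, y[start:start + size]):
                lik *= (ej / denom) ** yj
        total += lik
    return total / reps                                     # average over draws
```

     The observation's log-likelihood contribution would then be the log of this average.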

   SEED : Random number seed
     This seeds the random number generator used in generating "replicated" data. Thus, a different SEED ought to yield
     somewhat different coefficient estimates. By default, a seed value of 0 is used.
      Example:  SEED 1521  ;


   RND_SEQ  : the type of random values to generate (note: SEQ can be used as a shorthand for RND_SEQ).
       By default, MIXED uses a "scrambled halton sequence" to generate the random values used when simulating integration.

       Although it's not something that most people will need to tinker with, if desired
       you can specify other sequences.

       Basic syntax:
           RND_SEQ type options ;
       where type can be:
           HALTON : generate a halton sequence
           MLHS   : generate a modified latin hypercube sequence
           ANTI   : generate using random draws and their antithetics
           NORMAL : generate independently -- using GAUSS's rndn procedure. 
      
       and options that depend on the type.

       Options should be in a space delimited list.

       For HALTON, the options include
        SCRAMBLE       : use "scrambling". This breaks correlation observed for low
                          number of replications when there are many (over 10) randomly 
                          varying parameters. 
        PRIME_START=n  : n is an integer < 100. This sets the Halton sequences to be
                          generated starting from the "nth" prime (e.g., n=4 means use 7
                          to generate rvs for the first variable).
        DISCARD=n      : discard n numbers from the beginning of the Halton sequence
                          (of each variable)
        ADD=n          : add, and then remove, n draws to each observation.
                          Use ADD=0 and DISCRETE will compute an appropriate value of n.
        RANDOM=aseed   : generate a "randomized" halton sequence, using aseed as a random 
                          number seed.        
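       To show what PRIME_START and DISCARD refer to, here is a Python sketch of a plain Halton
       sequence (the radical-inverse construction), with an optional transform to standard normal
       values via the inverse CDF. It does not implement SCRAMBLE, ADD, or RANDOM, and starting the
       index at 1 is an assumption about how DISCRETE counts; the function names are hypothetical.

```python
from statistics import NormalDist


def first_primes(n):
    """The first n primes (trial division is plenty for n < 100)."""
    primes = []
    cand = 2
    while len(primes) < n:
        if all(cand % p for p in primes if p * p <= cand):
            primes.append(cand)
        cand += 1
    return primes


def radical_inverse(i, base):
    """Van der Corput radical inverse of integer i in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r


def halton(n_draws, n_vars, prime_start=1, discard=0):
    """Plain Halton draws in (0, 1); variable v uses the (prime_start + v)-th prime."""
    primes = first_primes(prime_start + n_vars - 1)[prime_start - 1:]
    return [[radical_inverse(discard + i + 1, p) for p in primes]
            for i in range(n_draws)]


def halton_normal(n_draws, n_vars, **kwargs):
    """Map Halton draws to standard normal values with the inverse CDF."""
    inv = NormalDist().inv_cdf
    return [[inv(u) for u in row] for row in halton(n_draws, n_vars, **kwargs)]
```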


       For MLHS, the options include
           RANDOM=aseed : set the random number seed value used to generate the MLHS sequence
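       A Python sketch of the usual modified latin hypercube construction: for each variable, one
       uniform shift places exactly one draw in each of the N equal-width intervals of (0, 1), and
       the column is then shuffled. As noted below, DISCRETE's implementation may differ; this
       function and its seed argument are hypothetical stand-ins for RANDOM=aseed.

```python
import random


def mlhs(n_draws, n_vars, seed=None):
    """Modified Latin hypercube draws in [0, 1): a shifted grid per variable, shuffled."""
    rng = random.Random(seed)
    cols = []
    for _ in range(n_vars):
        shift = rng.random()                                  # one uniform shift per variable
        col = [(i + shift) / n_draws for i in range(n_draws)]
        rng.shuffle(col)                                      # break ordering across variables
        cols.append(col)
    return [list(row) for row in zip(*cols)]                  # one row per draw
```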


       For ANTI, the options include
          ANTI=atype    : where atype is the "type" of antithetic rvs generated.

        The atype options work by taking a generated random vector and ...
          1  =  generate another one by taking its negative
          2  =  generate 2^NV by taking all combinations of the positive/negative
                values of each component (NV = # of varying parameters)
          3  =  Same as 2, but also swap the values of the last two components
          4  =  Same as 2, but also scramble the values of all the components
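        Types 1 and 2 above can be sketched in Python as follows. Types 3 and 4 are left out because
        the exact swap and scramble rules are not specified here; the function name is hypothetical.

```python
from itertools import product


def antithetic(v, atype=1):
    """Expand one random vector into antithetic partners (types 1 and 2 only)."""
    if atype == 1:
        return [list(v), [-x for x in v]]            # the vector and its negative
    if atype == 2:
        # all 2^NV sign combinations of the components
        return [[s * x for s, x in zip(signs, v)]
                for signs in product((1, -1), repeat=len(v))]
    raise NotImplementedError("types 3 and 4 are not sketched here")
```

        With type 2, each component averages exactly to zero across the 2^NV vectors, which is the
        source of the improved coverage.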
     

    Notes:
      * By default (if no options are specified, or if RND_SEQ is not specified at all), a
            RND_SEQ HALTON SCRAMBLE ;
        is used.

      * If TRIANGLE or UNIFORM are used, the HALTON sequence MUST be used.

      * RND_SEQ HALTON ;  is synonymous with SEQ HALTON SCRAMBLE ;
      * RND_SEQ ANTI   ;  is synonymous with SEQ ANTI=3 ;
     
      *  HALTON sequences are thought to lead to random values that "cover" the 
         distribution much more efficiently. Train suggests that by using random 
         values generated using a Halton sequence, one can use about 1/10 the 
         number of replications (as compared to using a simple random
         number generator).

      *  Antithetic draws work by taking a generated random vector, and modifying its
         components to create additional random vectors. Each of these modifications
         basically generates a new random vector in a different quadrant.
         The notion is that these modifications help ensure better coverage than
         would be obtained by independently generating the same number of additional random vectors.

      * Advantages and disadvantages:
          RND_SEQ NORMAL : quick generation of random values, minimal storage ... but not efficient draws
          RND_SEQ ANTI   : fairly quick generation of random values, minimal storage -- somewhat efficient
          RND_SEQ HALTON : slower generation of random values, significant storage requirements
                           (# observations * # replications * # varying parameters) -- but efficient
          RND_SEQ MLHS   : similar to HALTON, but may have better efficiencies.
                           Somewhat of an experimental method, and the GRBL2 implementation of it may
                           not be exactly as specified (by the method's authors).

      * SCRAMBLE and ADD are used to fix some pathologies of Halton sequences that occur when
        there are a large number of varying parameters and a small number of draws.
        Thus, they are less necessary with many (say, 1000) draws, and when the
        number of varying parameters is small (say, less than 4).

  BSTART: Specify starting values

   If BSTART is not specified (or BSTART NO is specified), DISCRETE will use MNL to 
        find starting values.
   If you wish to provide your own starting values, you can use BSTART.
   The basic syntax is (as with other DISCRETE models):
      BSTART YES v1 v2 .... ;
   
   The order of appearance of variables should be:
       i) First, the betas as specified in your X command.
          These will be the non-varying coefficients, and the "means" of the 
           varying coefficients.
   Depending on the model, these should be followed by:
       iia) For the diagonal model, the starting standard deviations, in the order listed in
            the NORMAL, TRIANGLE, UNIFORM, and LOGNORMAL options (NORMAL, TRIANGLE,
            and UNIFORM variables first, followed by LOGNORMAL variables).
       iib) For the full covariance matrix model, the vectorization of the upper triangle of
            the Cholesky decomposition of the covariance matrix:
          * the first rows are the NORMAL variables, followed by the LOGNORMAL variables
          * the vectorization takes a row at a time, starting from the diagonal.

   Example using 5 variables, 3 of them varying:

      a) If starting "mean" values are : 1 2 3 4 5
      b) If diag model, with starting "sd" values  1.1 2.2 3.3
         Then use:
             BSTART YES 1 2 3 4 5 1.1 2.2 3.3 ;
      b2) If full model, with the starting "Cholesky decomposition":
            11 12  13
            0  22  23
            0   0  33
         Then use:
             BSTART YES 1 2 3 4 5 11 12 13 22 23 33 ;
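    The ordering rule can be expressed as a short Python sketch that builds the BSTART YES line from
    the means plus either the diagonal standard deviations or the upper-triangular Cholesky factor
    (read row by row, starting at the diagonal). The helper names are hypothetical; they only
    reproduce the ordering shown in the example.

```python
def vech_upper_rows(u):
    """Row-wise vectorization of the upper triangle, starting at each diagonal."""
    return [u[i][j] for i in range(len(u)) for j in range(i, len(u))]


def bstart_values(means, sds=None, chol_upper=None):
    """Means first, then diagonal sds or the vectorized Cholesky factor."""
    if (sds is None) == (chol_upper is None):
        raise ValueError("give exactly one of sds (diagonal) or chol_upper (full)")
    extra = list(sds) if sds is not None else vech_upper_rows(chol_upper)
    return list(means) + extra


def bstart_command(values):
    return "BSTART YES " + " ".join(f"{v:g}" for v in values) + " ;"
```

    For example, bstart_command(bstart_values([1, 2, 3, 4, 5], chol_upper=[[11, 12, 13],
    [0, 22, 23], [0, 0, 33]])) gives the full-model line shown above.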

    Reminder: if you specify more than one of NORMAL, TRIANGLE, UNIFORM and LOGNORMAL variables, the
             first items should be the NORMAL variables, followed by TRIANGLE,
             followed by UNIFORM, followed by LOGNORMAL variables (in the order entered
             in the respective keyphrases).


  WTP:   Estimate willingness to pay values
  
       If WTP is specified, an expected "per choice occasion" willingness to pay is computed.
       This is:
           -ln(Sum(exp(x*beta))) / beta_price
       where the sum is over the alternatives in the choice occasion. Hence, it is the expected utility,
       in money terms, from having a choice occasion "available" to the respondent
       (it is the expected value of the maximum utility from all the alternatives available).

       Note that this is computed across choice occasions (so an observation with more than one
       choice occasion is basically treated as multiple individuals).

       If static X-replication, XB-replication, or limited-info aggregation (Ben-Akiva/Lerman aggregation) is selected,
       an appropriate "average" WTP is computed; either the sum across the replications (for X-rep and XB),
       or incorporating heteroscedasticity or size correction factors (limited-info aggregation).

       For now, WTP estimation is NOT supported for two-stage models.

       Also, for now the WTP for the dynamic X-replication model uses the "average" X values (which is probably
       not appropriate).

       Four values are reported:
          * the average and sd of the WTP (across choice occasions)
          * the average and sd of the coefficient of variation of WTP (across choice occasions)

       The latter two statistics use:
           CV_i =  E[WTP_i] / SD[WTP_i]
        where CV_i is the "coefficient of variation of occasion i".
        Note that if there is NO replication, the coefficient of variation is undefined (missing values will be reported).

        Syntax:
           WTP PRICE=avar ;
        where avar is the "price" variable (it is used to select the beta_price).

        Or,
            WTP NO ;
         to NOT estimate WTP.
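       The per-occasion WTP formula can be sketched in Python using a numerically stable log-sum-exp.
       The mean and sd across replications illustrate the replication-based statistics; dividing by
       the number of replications (rather than n-1) is an assumption, and the function names are
       hypothetical.

```python
import math


def logsum(v):
    """ln(sum(exp(v))) computed stably."""
    m = max(v)
    return m + math.log(sum(math.exp(x - m) for x in v))


def occasion_wtp(v, beta_price):
    """Per-choice-occasion WTP: -ln(sum(exp(x*beta))) / beta_price."""
    return -logsum(v) / beta_price


def wtp_over_replications(v_reps, beta_price):
    """Mean and sd of one occasion's WTP across replications."""
    w = [occasion_wtp(v, beta_price) for v in v_reps]
    mean = sum(w) / len(w)
    sd = math.sqrt(sum((x - mean) ** 2 for x in w) / len(w))
    return mean, sd
```

       With a negative price coefficient, the division turns the log-sum (in utility units) into a
       positive money amount.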



   :::::::::::::

Comparing OCC and UNC

     * OCC is used to implement "panels" -- more than one choice occasion per individual.
       For each individual, the "Pj" probabilities are multiplied together to compute the likelihood
       for this observation.

    *  UNC is used to account for uncertainty in X.
       TYPE=X and TYPE=XB use simulated integration over uncertain measures of X: for each individual,
       the probabilities for a number of "X" (or "XB") replications are averaged together
       to compute the likelihood.
       In TYPE=XREP, instead of an average, a maximum (over a number of replications) is used.

     * OCC and UNC TYPE=X can be used together. In this case, for each observation:
         i) For replication r (r=1..#replications), the rows of data are extracted,
        ii) rows belonging to each choice occasion are identified (within these extracted rows),
       iii) a probability for each of these "within x-replication" choice occasions is computed,
        iv) these are multiplied together to form this x-replication's contribution to the likelihood,
         v) the average of the "x-replication contributions" is used as this observation's contribution.

     * OCC and UNC TYPE=XREP can be used together. As above, but the "maximum", rather than an average,
       is used.
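       Steps iv) and v) can be sketched in Python, given the probability of the chosen alternative in
       each choice occasion of each x-replication (steps i to iii). The plain maximum used for XREP
       is a simplification that ignores the NROOT distance weighting; the function name is hypothetical.

```python
def observation_contribution(xrep_occ_probs, mode="X"):
    """Combine chosen-alternative probabilities for one observation.

    xrep_occ_probs[r][o]: probability of the chosen alternative in
    choice occasion o of x-replication r.
    """
    per_xrep = []
    for occ_probs in xrep_occ_probs:
        lik = 1.0
        for p in occ_probs:
            lik *= p                              # occasions are multiplied (panel)
        per_xrep.append(lik)
    if mode == "X":
        return sum(per_xrep) / len(per_xrep)      # average over x-replications
    if mode == "XREP":
        return max(per_xrep)                      # best x-replication (simplified)
    raise ValueError(mode)
```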

     * To combine balanced OCC and UNC (TYPE=X):
         i) All the ID rows for an individual must be adjacent.
        ii) The ID/nn rows for each x-replication (where nn is from VAR=nn) must be adjacent.
            Thus, if ID=100 and TYPE=X VAR=5, there will be 20 rows per x-replication:
              * rows 1 to 20 belong to x-replication 1,
              * rows 21 to 40 belong to x-replication 2,
                ...
              * rows 81 to 100 belong to x-replication 5.
       iii) Within these x-replication clumps, the (ID/nn)/OCC rows forming a choice occasion must be
            adjacent. Thus, in the above example, if OCC=2:
              * rows 1 to 10 belong to choice occasion 1 of x-replication 1,
              * rows 11 to 20 belong to choice occasion 2 of x-replication 1,
              * rows 21 to 30 belong to choice occasion 1 of x-replication 2,
                ...
              * rows 91 to 100 belong to choice occasion 2 of x-replication 5.
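       The balanced layout rules can be written as a small Python function that maps each row of an
       observation to its (x-replication, choice occasion) pair, including the divisibility checks.
       The function name is hypothetical; with n_xrep=1 it reduces to the OCC-only case, and with
       n_occ=1 to the x-replication-only case.

```python
def balanced_row_layout(n_id, n_xrep=1, n_occ=1):
    """Map 1-based rows within an observation to (x-replication, occasion)."""
    if n_id % n_xrep:
        raise ValueError("VAR=nn must divide ID evenly")
    per_xrep = n_id // n_xrep                  # rows per x-replication
    if per_xrep % n_occ:
        raise ValueError("OCC must divide ID/nn evenly")
    per_occ = per_xrep // n_occ                # rows per choice occasion
    layout = {}
    for row in range(1, n_id + 1):
        xrep = (row - 1) // per_xrep + 1
        occ = ((row - 1) % per_xrep) // per_occ + 1
        layout[row] = (xrep, occ)
    return layout
```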

    * Summarizing:
        The values in rows corresponding to the same alternative (for an individual) will
        be different across different X-replications.
     In contrast,
        The values across the same alternative in different choice-occasions are typically (but not necessarily)
        the same -- each choice occasion is often a choice over the same set of alternatives.
        This is also true for choice-occasions within the same x-replication.

      Furthermore, the Y values should also NOT VARY across these x-replications -- since the observation
      is choosing the same alternative. In other words, the X-replications do not capture additional
      information about individual behavior, they capture additional information about the measurement
      of the attributes of the choices.

      Note on x-replication and beta-replications:
        In a sense, the use of X-replication extends the mixed logit's uncertainty to include both the betas
        and the independent variables; an uncertainty that is captured by replicating the observation.
        In the standard mixed logit, this replication is done across REPS draws of the beta vector.
        X-replication expands the number of replications -- the number of replications will be
        REPS * #_of_x-replications.
            For TYPE=X, DISCRETE takes the average of these replications (for each
            individual) and adds the log of that average to the log-likelihood function.

            For TYPE=XREP, DISCRETE finds the X that yields the best log-likelihood (at a starting value of
            beta), and uses it in standard MLE (no x-uncertainty).

            It is our experience that TYPE=XREP is far more powerful -- but carries a risk of overfitting.


GOF Statistics:

     MIXED produces several goodness of fit statistics (they are based on statistics noted in W. Greene, 4th edition, pg 831):

     a) Wald Test (H0 Prob):
            This is a Wald test of the hypothesis that all coefficients equal 0:
                WALD = B' inv(VC) * B
            where B are the estimated coefficients, and VC is the estimated coefficient covariance matrix.
     b) McFadden Index:
           The McFadden likelihood ratio index:
              LRI = 1 - [ln(L) / ln(L0)]
           where L is the likelihood, and L0 is the likelihood computed with the coefficients set to 0.
           Values range between 0 and 1, with 0 meaning "no explanatory power for coefficients" and 1
           meaning "perfect fit". However, values between 0 and 1 have no natural interpretation.

    c) Ben-Akiva/Lerman R-Squared:
       This is a prediction rule:
            BLR = 1/N SUM { y*F + (1-y)*(1-F) }
       y equals 1 if the alternative was chosen; otherwise it equals 0.
       F is the MNL probability that the alternative was chosen.
       The sum is across ALL alternatives for ALL observations.
       For mixed and uncertain-X models, an average value is reported
       (the average y*F, using observation specific randomly drawn beta values, is computed for
       each alternative).
       This measure has been criticized as not doing well in unbalanced samples (where some alternatives
       are rarely chosen).


    d) Cramer R-square:
       This is a prediction rule:
           C = (average F| y=1) - (average  F | y=0)
           
       This accounts for imbalance by averaging within YES and NO responses -- it "heavily penalizes
       the incorrect predictions".


       Note that (c) and (d) are extensions to the MNL of measures that were proposed for binary choice models.
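       Measures (b) to (d) can be sketched in Python from per-row choice indicators y and predicted
       probabilities F (one entry per alternative per observation). Taking N in BLR as the number of
       alternative rows is an assumption; the Wald test is omitted because it needs a matrix inverse.
       Function names are hypothetical.

```python
def mcfadden_lri(loglik, loglik0):
    """LRI = 1 - ln(L) / ln(L0)."""
    return 1.0 - loglik / loglik0


def ben_akiva_lerman(y, f):
    """BLR = 1/N * sum(y*F + (1-y)*(1-F)); N assumed to be the number of rows."""
    return sum(yi * fi + (1 - yi) * (1 - fi) for yi, fi in zip(y, f)) / len(y)


def cramer(y, f):
    """C = mean(F | y=1) - mean(F | y=0)."""
    chosen = [fi for yi, fi in zip(y, f) if yi == 1]
    other = [fi for yi, fi in zip(y, f) if yi == 0]
    return sum(chosen) / len(chosen) - sum(other) / len(other)
```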



VCI Files:

  Variance-covariance information (VCI) files are specially structured gauss datasets that contain
  variance/covariance information on the X variables of the alternatives.

  These files are described in the MAKEDATA program -- which you can use to generate a VCI file.

  Basically, the variance/covariance matrix for an alternative is constructed by:
       i) Getting the class-id for this alternative,
      ii) Finding the rows in the VCI file that have the same value for their class-id,
     iii) Constructing a "class" specific variance-covariance matrix using the
          variable specific information stored in these rows.

    If this information is missing (if no such class exists in the VCI file, or if information on
    a variable is not stored), a value of 0 (zero covariance) is used.

    Example: (first row are the variable names)
        _CLASS_  _NAME_    X1  X12   X5   X4 ;
           1      X1       20    1     5   -3
           1      X12       1   51    -2    6
           1      X5        5   -2    22   12
           1      X4       -3    6    12    8

           2      X1        23    2    0   -13
           2      X12        2   31    0    12
           2      X4       -13   12    0     2

           3      X1        13   72    1    5
                ...........

    Notes:

      *  You can use MAKEDATA, XVC option, to create VCI files. XVC will create properly
         structured VCI files.

      *  Symmetry should be maintained: the value of row XA and column XB should match that
         of row XB column XA. DISCRETE will enforce symmetry, but not necessarily in a way you
         intend.

      *  Note that the X5 row is missing for CLASSID=2 -- values of 0 are used (as reflected in the
         X5 column for CLASSID=2).

      * If a model uses the X variables: X1 Z1 X12 X4,
        then the variance/covariance matrix used for CLASSID=1 will be
                  20   0   1  -3
                   0   0   0   0
                   1   0  51   6
                  -3   0   6   8
         Note that since Z1 is not specified in either the columns or the rows, 0s are used.

      * The variance/covariance matrix formed from rows with CLASSID=1 is used for ALL rows with a
        CLASSID=1

            Thus, if a set of alternatives have different "observed" values of X, but the
            same variance/covariance (say, the same noisiness features), one should
            use the same CLASSID value for each alternative -- even if these alternatives
            are associated with different observations.
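    The lookup rules above can be sketched in Python, with each VCI row represented as a dict
    holding _CLASS_, _NAME_ and the variable columns (a hypothetical in-memory stand-in for the
    gauss dataset). Missing classes, rows, or columns yield 0. Symmetry is not enforced here,
    since DISCRETE's own symmetry rule is not described.

```python
def class_covariance(vci_rows, class_id, model_vars):
    """Covariance matrix for one class, in the order of the model's X variables.

    vci_rows: dicts with keys "_CLASS_", "_NAME_" and variable columns.
    Entries missing from the VCI data are set to 0.
    """
    lookup = {row["_NAME_"]: row for row in vci_rows if row["_CLASS_"] == class_id}
    return [[lookup.get(r, {}).get(c, 0) for c in model_vars] for r in model_vars]
```

    Using the class 1 rows of the example with the variables X1 Z1 X12 X4 reproduces the 4 x 4
    matrix shown above.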


The Dynamic-X model.

 The dynamic-X model, for correcting for uncertain X, uses a multi-round approach.

 At the beginning of each round, a mixed-logit model is used to estimate Beta_d (beta, round d).
 A no-uncertainty X model is estimated, with almost all of the other DISCRETE model options
 available. For example, you can specify choice occasions and Y uncertainty.
 Two-stage models can also be estimated, though the first stage variables are assumed
 to be measured with no uncertainty.

 After estimating Beta_d, DISCRETE will use Beta_d to compute the likelihood for each
 observation. Then, DISCRETE will identify which observations seem funny (very low likelihoods),
 hence may be suffering from an uncertain measure of X.

 For each of these observations, DISCRETE will generate a number of X-replications
 around their "measured" X values. For each of these generated X-replications, DISCRETE will compute a likelihood.
 The X-replication that generates the "best" likelihood is then used as a "better measure"
 of the value of X (for this observation).

 Once these "better measures of X" have been determined for each of the "funny" observations, a new
 round commences, that uses these "better measures" as the independent variables. Note that for the
 non-funny observations, either the original X values are used (without modification), or a "better
 measure" of X (from a prior round) that yielded a non-funny likelihood.

 Note that an observation may be one of the "funny" observations for several rounds. DISCRETE will
 generate x-replications around the "better measure" (determined in a prior round). Thus, over the course
 of several rounds DISCRETE will modify its prior guess of the "better measure of X".

 Generation of x-replications uses several factors:
    1) The variance of the X data in a row -- this uses the CLASSID variable.
    2) The round. In later rounds, the variance is shrunk. Thus, in later rounds the neighborhood of
       searching (for "better values of X") shrinks.

 In practice, choosing what is the "better measure of X", from a set of X-replications, is
 based on two factors: the value of the likelihood (at this X-replication, given a beta),
 and the distance between the X-replication and the original X.

   Note that this is done on a 1-observation-at-a-time basis -- each observation is treated as
   an independent entity when determining its "better X" value.

 The use of distance is meant to discourage overfitting of X. You can use the NROOT option to
 adjust how important distance is. If NROOT=0, then distance is used fully -- X-replications that
 are far from the measured X are less likely to be used. Large NROOT values (say, 100) cause
 distance to be de-emphasized, so what matters is the likelihood.

 Setting NROOT to a large value will lead to better likelihoods (for the entire dataset), but greatly
 increases the chance of overfitting -- it is more likely that X values will be chosen that just happen
 to work with an arbitrary (hence incorrect) beta value.
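 The trade-off controlled by NROOT can be illustrated with a Python sketch. The scoring rule below
 is HYPOTHETICAL: it adds a normal log-weight on the distance from the original X, scaled by
 1/(1+NROOT), to the likelihood, so that NROOT=0 uses the weight fully and a very large NROOT
 ignores distance. DISCRETE's actual formula is not given in this document.

```python
import math


def pick_best_x(candidates, x_orig, var, loglik, nroot=5):
    """Hypothetical distance-penalized choice of the "better" X for one observation.

    score = loglik(x) + log_w(x) / (1 + nroot), where log_w is the log of an
    ASSUMED normal weight on the distance from the original X.
    """
    def log_w(x):
        return -sum((a - b) ** 2 / (2 * v) for a, b, v in zip(x, x_orig, var))

    return max(candidates, key=lambda x: loglik(x) + log_w(x) / (1 + nroot))
```

 In this toy form, as NROOT grows the distance term vanishes and the choice reduces to the
 highest-likelihood replication, which is the overfitting risk described above.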

  --------------------------------


IIc9. DOUBLE -- single and double bounded models

 DOUBLE can estimate the following dichotomous choice models:

   * Single and double bounded logit  (logistically distributed)
   * Single and double bounded probit (normally distributed)
   * Single and double bounded weibit (more precisely, extreme value distributed)
   * 2-stage probit (participation stage, followed by response stage)
   * Bivariate double bounded probit (different betas and sds for first and second responses)

   Furthermore, DOUBLE provides a number of different ways to specify bid and response variables.

IIc9.a. Synopsis:

 DOUBLE is used to estimate the double bounded logit, weibit, and probit models.
 In contrast to the LOGIT and PROBIT estimators, which use a Yes/No response as the dependent variable,
 DOUBLE uses a "lower" and "upper" bound as dependent variables.

 You can also use DOUBLE to estimate single-bounded models.

 Basically, in the double bounded model you know the lower and upper bounds
 that bracket a respondent's value.
 In the single bounded model, you know if the value is less than or greater
 than a single value.

 In the double bounded model, the respondent is typically offered two bids:
 a starting bid and a follow-up bid.
     Depending on the answer to the first bid, the follow-up bid will be higher
     (if the respondent answered Yes to the first bid), or
     lower (if the respondent answered No).

 Thus, DOUBLE assumes 4 different response patterns, patterns that bracket a respondent's "value",
 where "value" is measured using "bids":
        YY  - Yes to first, and Yes to followup bid (the followup bid will be higher)
        YN  - Yes to first, but No to followup bid (the followup bid will be higher)
        NY  - No to first, but Yes to followup bid (the followup bid will be lower)
        NN  - No to first, and No to followup bid (the followup bid will be lower)

 In information terms, this implies the following knowledge:
    YY :  value is greater than the (high valued) followup bid
    YN :  value is between the starting bid and the (high valued) followup bid
    NY :  value is between the (low valued) followup bid and the starting bid
    NN :  value is less than the (low valued) followup bid
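 The following is a minimal Python sketch (not DISCRETE's GAUSS code) of how a response
 pattern becomes a likelihood contribution. It assumes WTP = X*beta + sd*e, with e following
 the same logistic, normal, or extreme value CDFs that DOUBLE uses for its WTP measures, so the
 probability of a pattern is F(upper) - F(lower) for the bracket that pattern implies. The
 function names (model_cdf, pattern_bounds, pattern_prob, log_likelihood) are hypothetical.

```python
import math


def model_cdf(s, xb, sd, model="LOGIT"):
    """P(WTP <= s), assuming WTP = xb + sd*e (xb is X*beta)."""
    if s == math.inf:
        return 1.0
    if s == -math.inf:
        return 0.0
    z = (s - xb) / sd
    if model == "PROBIT":                 # normal
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    if model == "LOGIT":                  # logistic, arranged to avoid exp() overflow
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)
    if model == "WEIBIT":                 # extreme value
        return 1.0 if z > 700 else 1.0 - math.exp(-math.exp(z))
    raise ValueError("unknown model: " + model)


def pattern_bounds(pattern, start_bid, high_bid, low_bid):
    """(lower, upper) bracket on WTP implied by a YY/YN/NY/NN response."""
    return {
        "YY": (high_bid, math.inf),
        "YN": (start_bid, high_bid),
        "NY": (low_bid, start_bid),
        "NN": (-math.inf, low_bid),
    }[pattern]


def pattern_prob(pattern, start_bid, high_bid, low_bid, xb, sd, model="LOGIT"):
    """Probability that WTP falls in the bracket implied by the pattern."""
    lower, upper = pattern_bounds(pattern, start_bid, high_bid, low_bid)
    return model_cdf(upper, xb, sd, model) - model_cdf(lower, xb, sd, model)


def log_likelihood(observations, model="LOGIT"):
    """observations: (pattern, start_bid, high_bid, low_bid, xb, sd) tuples."""
    return sum(math.log(pattern_prob(*obs, model=model)) for obs in observations)
```

 Because the four brackets tile the whole real line, the four pattern probabilities for a
 given set of bids always sum to one.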

 For user ease, DOUBLE provides several ways of indicating what the bid values were, and 
 which response pattern was chosen.

 In addition, several bivariate/2-stage bivariate models are supported:

    i) a bivariate double-bounded probit model is supported. This is an extension
       to the double bounded probit model. Instead of assuming the same equation
       for both "first" and "second" responses, one can have separate equations
       for each response, with separate (though possibly correlated) error terms,
       etc. Several variants of this bivariate probit are supported (see
       -BIVAR below).

   ii) 2-stage models, for both the single and double bounded probit.
       This is a simultaneously estimated two-stage model. 
       The first stage is a simple participation stage (with NO bid information) estimated
       with a Probit. 
       The second stage is a response stage (using bid values) estimated with single or double
       bounded Probits.

 Lastly, you can compute several measures of willingness to pay.

IIc9.b. Syntax:

 DOUBLE can take several modifiers:

    -LOGIT   : Estimate a double bounded logit model (logistic). This is the default.
    -WEIBIT  : Estimate a double bounded weibit model (extreme value).
    -PROBIT  : Estimate a double bounded probit model (normal).

    -BIVAR_x : Bivariate model of type x. x can be SD_RHO, RHO, or ALL
               (-BIVAR alone is the same as -BIVAR_ALL)
    -2STAGE  : Estimate a 2-stage model (participation stage/response stage)

    -LOG     : Use log of bid values

    -BHHH    : Use BHHH optimization; and use G'G for variance matrix
    -NR      : Use Newton Raphson optimization; and use hessian for variance matrix
    -WHITE   : Use White's robust (G'G and hessian) covariance matrix (NR for optimization)

    -NOSTART : Suppress display of starting values -- used with 2-stage and bivariate models


 Notes:
 
   *  Use only one of -PROBIT, -WEIBIT, and -LOGIT.

   * As an alternative, you can use:
     MODEL DLOGIT .. ; instead of MODEL DOUBLE -LOGIT 
       or
     MODEL DPROBIT .. ; instead of MODEL DOUBLE -PROBIT 

    * On -BHHH, -NR, and -WHITE:

       -- If none of these is specified, defaults are used (NR for DLOGIT, BHHH for others).
       -- If you specify -NR or -WHITE for non-DLOGIT models, a numeric hessian is
          used. This can be slow.
    

 Examples:  MODEL DOUBLE -PROBIT ;
            MODEL DOUBLE ;
            MODEL DOUBLE -LOGIT ;
            MODEL DOUBLE -LOG ;
            MODEL DOUBLE -PROBIT -BIVAR ;
            MODEL DOUBLE -PROBIT -BIVAR_SD_RHO ;
            MODEL DOUBLE -PROBIT -2STAGE ;


IIc9.c. Notes on 2-stage and bivariate models

You can specify a 2-stage or bivariate model by using the -2STAGE and -BIVAR
options in the MODEL statement.

   *  -BIVAR and -2STAGE are only used with PROBIT models.

   *  The -BIVAR model is used only with double-bounded models.

   *  In the 2-stage models:
        The first stage is a simple yes/no participation response.
        Two beta vectors are estimated, one for each stage.
        A rho and a standard deviation (for the 2nd stage) are estimated.
        Thus, you MUST specify both X and X1 variables -- X1 is used in the first
        (participation) stage, X in the second (bid-response) stage.

   *  For the bivariate double bounded model, separate epsilons (possibly correlated) are
      assumed to apply to each response:

       *  -BIVAR_ALL (or just -BIVAR) estimates separate betas for first
          and second bids, separate sigmas (standard deviations of the two
          random factors), and rho (correlation between the random factors)
       *  -BIVAR_SD_RHO estimates one beta vector for both responses,
          but separate sigmas, and a rho
       *  -BIVAR_RHO estimates the same beta and sigma for both responses, but
          does estimate a rho

  * Due to their complexity, the bivariate and 2-stage models do not converge as readily
    as other models. Moreover, they seem to be sensitive to starting values.
    One useful trick is to try several different starting values for RHO (the correlation
    coefficient parameter). The STARTRHO keyword is used for this purpose.

  * Sorry, a 2-stage bivariate (double bounded) Probit model is NOT currently available.



IIc9.d. Keywords used with DOUBLE are:

The most important keyword is CHOICE. It is required by all models.

     CHOICE   : Specify the choice pattern, variables used to indicate choice, and bid values.

Other keywords are:

    AUX and BAUX
    AUX1 and BAUX1
    AUX2 and BAUX2 : Fixed-coefficient variables, and their coefficient values.

    BID       : Specify BID values -- used if TYPE=M
    BOUND     : Recode upper and lower bounds
    JACKKNIFE : Use a Jackknife procedure to estimate the distribution of coefficients, and of WTP
    NEVERYES  : Identify non-participants, and set bounds accordingly
    RESPONSE  : Specify response variables -- used if TYPE=M
    RESP1     : First stage response variables -- used in 2-stage models
    STARTRHO  : Starting values for the RHO coefficient -- used in bivariate and 2-stage models
    WTP       : Compute willingness to pay
    X         : Independent variables
    X1        : X variables used in the first stage -- used in 2-stage probit models


 The AUXn, BAUXn, and X keywords are described in the PROBIT writeup. The following
 describes the other keywords, starting with CHOICE.

 Notes:
   *  DOUBLE does not support XNEW or DUMMY. Use CREATE instead!

   *  The AUXn and BAUXn keywords are used to specify "auxiliary variables with fixed coefficients".
      These can be used to add various corrections (such as aggregation corrections).

      How these auxiliary variables are specified depends on the model:
       -- 2-stage models: AUX1 and AUX2 (and BAUX1 and BAUX2) are used to specify first stage and
          second stage auxiliary variables (respectively).
       -- BIVAR_ALL models: AUX1 and AUX2 (and BAUX1 and BAUX2) are used to specify first response
          and second response auxiliary variables (respectively).
       -- All other models use AUX and BAUX. This includes BIVAR models that are not BIVAR_ALL.


   
IIc9.e. Description of the CHOICE keyword

  CHOICE: Specify the type of choice variables, and the bid variables

     Syntax:  CHOICE TYPE=atype [options]
              where options depend on the atype

    atype can take one of the following values (and options).
      X (or 2)        -- explicit (uses 2 variables)
      D (or 4)        -- dummies (uses 4 variables)
      I (or 1)        -- indicator (uses 1 variable)
      M (or MULTIPLE) -- multiple bounds (uses an arbitrary number of RESPONSE and BID variables)
      S (or SINGLE)   -- single bounded (uses 1 variable)

         (in what follows, "dumYN", "start_bid", etc. should be variable names of your choice)

    TYPE=X  U=bid_upper L=bid_lower ;

          Use explicit lower/upper bounds.

          Note that you must specify the following options: U and L.

          "bid_upper" must be a variable that contains the value of the upper bid.
          "bid_lower" must be a variable that contains the value of the lower bid.

          Note that for TYPE=X, all observations are assumed to be YN (or NY) responses.
          Thus information about the "response pattern" is not required.

          You can use the BOUND keyword to recode selected values of the upper and lower bid
          to infinity and -infinity (i.e., to convert a response to a YY or NN).

       
      TYPE=I VAR=cvar 1=start_bid U=bid_upper L=bid_lower [YY=aval YN=aval NY=aval NN=aval] ;

          Use an indicator variable to specify the response pattern.

          Note that you must specify the following options: VAR, 1, U, and L.
          The YY, YN, NY, and NN options are optional.

          A single variable, with name specified by cvar, identifies which
          response pattern was chosen. By default, the following values are used:
             1=YY, 2=YN, 3=NY, and 4=NN

          However, you can specify a different "mapping" of values to response patterns.
          To do this, use the YY, YN, NY, and NN modifiers.

          For example:
             YY=10 YN=20 NY=30 NN=0
          means that a value of 10 means "YY response", a value of 30 means "NY response", etc.

          bid_upper and bid_lower are the upper and lower bid values;
          start_bid is the starting bid.

          Therefore (assuming default values are used for the patterns),
          the following are used as the bounds:

         cvar   lower-bound-value   upper-bound-value
         -----------------------------------------------
         1 (YY)      bid_upper      "infinity"
         2 (YN)      start_bid      bid_upper
         3 (NY)      bid_lower      start_bid
         4 (NN)      "-infinity"    bid_lower
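 A hypothetical Python sketch of the TYPE=I recoding in the table above, including a
 user-supplied mapping such as YY=10 YN=20 NY=30 NN=0. How DISCRETE treats a cvar value
 that matches no pattern is not described here; this sketch simply raises an error. The
 names indicator_bounds and DEFAULT_CODES are illustrative.

```python
import math

DEFAULT_CODES = {"YY": 1, "YN": 2, "NY": 3, "NN": 4}


def indicator_bounds(cvar, start_bid, bid_upper, bid_lower, codes=None):
    """(lower, upper) bounds for TYPE=I; codes maps each pattern to its cvar value."""
    codes = DEFAULT_CODES if codes is None else codes
    pattern = next((p for p, v in codes.items() if v == cvar), None)
    if pattern is None:
        raise ValueError("cvar value %r matches no response pattern" % (cvar,))
    return {
        "YY": (bid_upper, math.inf),
        "YN": (start_bid, bid_upper),
        "NY": (bid_lower, start_bid),
        "NN": (-math.inf, bid_lower),
    }[pattern]
```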
   

     TYPE=D  YY=dumYY YN=dumYN NY=dumNY NN=dumNN 1=start_bid U=bid_upper L=bid_lower ;

         Use dummy variables to specify a response pattern.

         Note that you must specify the following options: YY, YN, NY, NN, 1, U, and L.

         YY, YN, NY, and NN are used to specify four dummy variables, only one of which
         should have a non-zero value.
         Thus, if a respondent answered yes-no, the YN variable (dumYN) should equal 1,
         and the other three variables should equal 0.

         The start_bid, bid_upper, and bid_lower variables are interpreted as above.

      TYPE=MULTIPLE BIDTYPE=btype YES=VList

         A multiple bounded model, where respondents answer YES/NO to several different bids.

         For the MULTIPLE type, you must also specify a BID and a RESPONSE keyword -- see
         the section below for details on BID and RESPONSE. Note that the
         syntax of the BID keyword depends on the btype option.

         btype is used to specify how to read the BID keyword (VAL is the default):

         VAL = BID should provide a set of values. Thus, each respondent faces the
               same set of bid values.
         VAR = BID should contain a set of variable names. Each variable points to a
               bid value for the respondent. The variable names should be entered in
               ascending order (the first variable in the list points to the lowest bid
               value).

         In both cases, RESPONSE should contain a set of variable names. Each variable name
         indicates whether a YES or NO answer was given for the corresponding bid.
         If a RESPONSE variable has a value in the VList, a YES is assumed.
         If VList is not specified, non-zero values mean YES.
             VList is either a single value (YES=2),
             or a list of values in quotes (YES="3 6 7").

         Note that the number of elements specified in the BID keyword should correspond
         to the number of elements specified in the RESPONSE keyword. That is, the same
         number of elements should be entered in both BID and RESPONSE, and in the same order.

         Examples:
            CHOICE TYPE=MULTIPLE BIDTYPE=VAL ;
               BID  10 20 50 100 200 500 ;
               RESPONSE R1 R2 R3 R4 R5 RTOP ;

            CHOICE TYPE=MULTIPLE BIDTYPE=VAR ;
               BID  BL1 BL2 BL3 BL4 BL5 BLHIGH ;
               RESPONSE R1 R2 R3 R4 R5 RH ;

            CHOICE TYPE=MULTIPLE BIDTYPE=VAR YES="4 5" ;
               BID  BL BL2 BL3 BL4 BL5 BLHIGH ;
               RESPONSE RL R2 R3 R4 R5 RH ;
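 A minimal Python sketch of one way the BID/RESPONSE lists can be turned into a bracket:
 the lower bound is the highest bid answered YES and the upper bound is the lowest bid
 answered NO. This rule, and the choice to reject inconsistent answers (YES to a bid above
 one answered NO), are assumptions for illustration; this writeup does not spell out how
 DISCRETE handles them. With BIDTYPE=VAR the same function would be applied to each
 respondent's own bid values. All function names are hypothetical.

```python
import math


def parse_vlist(vlist):
    """YES=2 or YES="3 6 7" -> set of floats; None means non-zero is YES."""
    if vlist is None:
        return None
    return {float(tok) for tok in str(vlist).split()}


def is_yes(value, yes_values):
    return value != 0 if yes_values is None else float(value) in yes_values


def multiple_bounds(bids, responses, yes=None):
    """bids: ascending bid values; responses[i] is the answer to bids[i]."""
    if len(bids) != len(responses):
        raise ValueError("BID and RESPONSE must list the same number of elements")
    yes_values = parse_vlist(yes)
    yes_bids = [b for b, r in zip(bids, responses) if is_yes(r, yes_values)]
    no_bids = [b for b, r in zip(bids, responses) if not is_yes(r, yes_values)]
    lower = max(yes_bids) if yes_bids else -math.inf   # never said YES: like NN
    upper = min(no_bids) if no_bids else math.inf      # never said NO: like YY
    if lower >= upper:
        raise ValueError("inconsistent answers: YES to a bid above one answered NO")
    return lower, upper
```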
         
      TYPE=SINGLE VAR=cvar BID=bid_var [YES=VList] ;

         Estimate a single bounded model (possibly 2-stage).

         In essence, responses in single bounded models are YY or NN
         (YN and NY responses can not be specified).

         Note that you must specify the following options: VAR and BID.
         The YES option is optional.

         cvar should be a variable indicating whether YES was chosen.
         bid_var is a bid variable. Note that in the SINGLE model, only one bid variable
         is used (obviously).

         The YES=VList option allows you to specify which values of cvar are treated as YES.
         VList is either a single value (YES=2),
         or a list of values in quotes (YES="3 6 7").

         If YES= is not specified, all non-zero values of cvar are treated as YES.

         Example: TYPE=SINGLE VAR=DIDIT BID=CASH2 YES="2 3 4"
         (say, 1 is used for NO, and 2, 3, and 4 are certainty levels of YES).
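 A small Python sketch of the SINGLE recoding: a YES answer gives the bracket
 (bid, infinity), a NO answer gives (-infinity, bid). The function name single_bounds is
 hypothetical, and YES= is treated as a whitespace separated list of values.

```python
import math


def single_bounds(cvar, bid, yes=None):
    """(lower, upper) for a single bounded answer; yes is an optional VList."""
    if yes is None:
        said_yes = cvar != 0                       # default: non-zero means YES
    else:
        said_yes = float(cvar) in {float(t) for t in str(yes).split()}
    return (bid, math.inf) if said_yes else (-math.inf, bid)
```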


   Examples:
       CHOICE TYPE=X U=BIDU L=BIDL ;
       CHOICE TYPE=I VAR=ACHOICE 1=SBID U=ubid L=L2BID ;
       CHOICE TYPE=I VAR=ACHOICE YY=10 NN=0 YN=5 NY=1 1=SBID U=ubid L=L2BID ;
       CHOICE TYPE=D YY=dyy YN=M3 NY=M4 NN=N_ALWAYS 1=STRT U=UU L=L2BID ;
       CHOICE TYPE=SINGLE VAR=LIKEIT BID=CASHVAL YES="2 3" ;

       MODEL DOUBLE -PROBIT -2STAGE ;
       CHOICE TYPE=SINGLE VAR=LIKEIT BID=CASHVAL YES="2 3" ;
       RESP1 DO_CARE ;


  Notes:
     * To estimate a 2-stage probit:
         i)  include -PROBIT (in the MODEL DOUBLE keyphrase)
        ii)  include -2STAGE (in the MODEL DOUBLE keyphrase)
       iii)  specify a RESP1 keyword (identifies participants) -- see below for the details
        iv)  specify an X1 keyword (the first stage independent variables).

     * The 2-stage probit models are simultaneously estimated (FIML) two-stage models:
         i)  The first, "participation", stage is estimated with a standard Probit model.
             RESP1 is used to form the dependent variable for the first stage.
        ii)  The second, bid-response, stage is estimated with a single-bounded probit.
             The choice and bid variables are used in this stage.
     * Since different X values are used in each stage, you MUST specify X1 (first stage) and
       X (second stage).


IIc9.f. Description of the other DOUBLE keywords

The following describes other keywords used with the DOUBLE model:

 BID       : Specify BID values -- used if TYPE=M
 BOUND     : Recode upper and lower bounds
 JACKKNIFE : Use a Jackknife procedure to estimate the distribution of coefficients, and of WTP
 NEVERYES  : Identify non-participants, and set bounds accordingly
 RESPONSE  : Specify response variables -- used if TYPE=M
 RESP1     : First stage response variables -- used in 2-stage models
 STARTRHO  : Starting values for the RHO coefficient -- used in bivariate and 2-stage models
 WTP       : Compute willingness to pay
 X1        : X variables used in the first stage -- used in 2-stage probit models


Detailed descriptions of the above keywords:


 BID: Specify bid values (or variables) for use by the MULTIPLE type of CHOICE
      
     Syntax:

   BID val_list ;
     or
   BID var_list;
  
     where 
    val_list: space delimited list of numeric values
    var_list: space delimited list of variable names (variables in the dataset)

     Note that the BIDTYPE option (specified in the CHOICE TYPE=M keyphrase)
     dictates whether BID should contain a val_list or a var_list.

    Examples:
   BID  MBID1 MBID2 MBID3 MBID4 FBID ;
   BID 10 20 50 100 500 1000 ;



 BOUND: Specify truncation limits for upper and lower bids

     BOUND is used to recode upper and lower bounds.
     Basically, BOUND is used to convert YY and NN responses into bounded (YN- or NY-like)
     responses, or to convert YN (or NY) responses into YY and NN responses.

     Note: BOUND is NOT used in single-bounded 2-stage models!

     For YY and NN responses:

       When a YY (or NN) response is given (or in the single bounded model), the upper (lower)
       bid value is implicitly equal to infinity (-infinity).
       That is,
           * the probability of a YY is computed by integrating from -(XB - BID_HIGH) to infinity.
           * the probability of an NN is computed by integrating from -infinity to -(XB - BID_LOW)

        In many cases, these bounds are not reasonable. In particular, the implicit use of
        -infinity for a lower bound is extreme -- a more likely value would be zero.
        That is, individuals who answered NN should be treated as answering YES to a
        zero-bid value, but NO to the low bid value. In other words, they should be treated as
        NY responses -- NO to the lowest bid they were actually presented with, but YES to
        a bid of 0 (even though they weren't actually asked to respond to a zero bid).

        Similarly, you might want to use total income as an upper bid value -- 
        so that individuals answering YY would answer NO to a bid value equal to 
        their entire income, but YES to the upper bid value.
    
      For YN and NY responses provided using the eXplicit TYPE:

        When the X (explicit) CHOICE TYPE is used, you must specify actual values for the
        upper and lower bounds. However, some respondents may have been YY (or NN) respondents,
        in which case arbitrary "placeholder" values (say, 100000 or 0) may have been entered
        as their upper (or lower) bounds.

        When such "placeholder" values, with no real meaning, are used,
        you can use BOUND to convert these observations to YY (or NN) observations --
        essentially, you recode a high (low) placeholder value to infinity (-infinity).

    BOUND is used to impose these bounds. You can also use the NEVERYES option (described below)
    along with BOUND -- this allows you to impose two kinds of bounds (on different classes of
    respondents).

    Basic syntax:

         BOUND UP_VAL=aval  LOW_VAL=aval  LOW_VAR=avarname UP_VAR=avarname ;

     where:
      *  avarname should be one of the variables in the dataset.
      *  UP_VAL and LOW_VAL are used to specify explicit values to use as upper and lower bounds.
      *  LOW_VAR and UP_VAR are used to specify variables that contain values to use as
         lower and upper bounds.

     For YY and NN responses, these values are used (instead of the implicit infinity values).
     For YN and NY responses, when the actual bound is equal to UP_VAL or LOW_VAL (or to the
     value of the LOW_VAR or UP_VAR variable), the observation is converted to a YY or NN
     response (the bounds are converted to infinity or -infinity).
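 A minimal Python sketch of these basic BOUND rules for one observation, with -infinity and
 infinity marking the implicit NN and YY bounds. The function name apply_bound is
 hypothetical; for UP_VAR or LOW_VAR, pass that observation's value of the variable as
 up_val or low_val. The restriction of YN/NY recoding to the X and M types is not modelled.

```python
import math


def apply_bound(lower, upper, up_val=None, low_val=None):
    """Apply BOUND UP_VAL=/LOW_VAL= to one observation's (lower, upper) bracket."""
    if upper == math.inf or lower == -math.inf:
        # YY or NN: replace the implicit infinite bound with the user's value
        if upper == math.inf and up_val is not None:
            upper = up_val
        if lower == -math.inf and low_val is not None:
            lower = low_val
        return lower, upper
    # YN or NY: a bound equal to a placeholder value is recoded to +/- infinity
    hit_up = up_val is not None and upper == up_val
    hit_low = low_val is not None and lower == low_val
    if hit_up and hit_low:
        raise ValueError("both bounds match the BOUND values")
    if hit_up:
        upper = math.inf
    if hit_low:
        lower = -math.inf
    return lower, upper
```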

    Alternative syntax:
       BOUND  UP_IND=ivarname  UP_IND_VALS=VList  UP_VAL=aval  UP_VAR=avarname
              LOW_IND=ivarname LOW_IND_VALS=VList LOW_VAL=aval LOW_VAR=avarname ;

    where VList is either a single value, or a list of values in quotes (e.g., "3 6 7").

    Using this syntax ...

       YY and NN:  the upper (or lower) bounds are only set (to the UP_VAL and LOW_VAL, or
       the UP_VAR and LOW_VAR) when the UP_IND (or LOW_IND) variable equals 1 (by default), 
       or equals the value (or one of the values) set in UP_IND_VALS (or LOW_IND_VALS).  
    
       NY and YN: the upper (or lower) bounds are recoded (to infinity or -infinity)
       when the UP_IND (or LOW_IND) variable equals 1 (by default),
       or equals the value (or one of the values) set in UP_IND_VALS (or LOW_IND_VALS).

       Examples ...
           BOUND LOW_IND=PAY0 LOW_VAL=0 ;
         yields  (for an NN response): 
           PAY0=0 -- respondent would not pay 0 : the default is used (lower bound of -infinity)
           PAY0=1 -- respondent would pay 0     : a lower bound of 0 is used
    
           BOUND  UP_IND=BIDU UP_IND_VALS="1000 2000" ;
         yields (for a YN response, perhaps as specified using TYPE=X):
           BIDU=1000 -- an infinite upper bound (the observation is converted to a YY)
           BIDU=400  -- no change

     Notes:
        * You should NOT use UP_VAL and UP_VAR at the same time -- if you do, UP_VAR is used.   
        * You should NOT use LOW_VAL and LOW_VAR at the same time -- if you do, LOW_VAR is used  
        * For YN and NY responses, if both the upper and lower bounds match (say, both are equal
          to the UP_VAL and LOW_VAL), a fatal error will occur.
        * Caution: The YN and NY recoding is ONLY done for X and M types.
          If, for some reason, you want to recode YN or NY responses to YY or NN,
          you will need to use DISCRETE's data manipulation tools (such as RECODE or XNEW).

       * If NEVERYES is also used:
            * BOUND is done first.
            * NEVERYES is done second.
         See the description of NEVERYES for further details.
      
     Examples:
       CHOICE TYPE=I VAR=ACHOICE ; BOUND LOW_VAL=0 ;
       CHOICE TYPE=I VAR=ACHOICE ;
                 BOUND LOW_IND=USER1 LOW_IND_VALS="1 2 3" LOW_VAL=0 UP_VAR=INCOME ;
    

 JACKKNIFE: Use a Jackknife procedure to estimate the distribution of coefficients.

       JACKKNIFE is used to re-compute coefficient values by a resampling method.
       Many different betas (coefficient vectors) are computed, with each beta computed using
       a different "sample" of the observations in the actual dataset.

       Using the matrix of betas, the averages, standard deviations, and confidence intervals
       of each coefficient are reported.

       Syntax:
          JACKKNIFE  REPS=nreps  N=nobs

       where...
          nreps : the number of "replications" -- the number of different coefficient vectors to
                  compute
          nobs  : the number of observations per replication. Or, if a fraction between 0 and 1,
                  a "percent of available".

       Examples:
          JACKKNIFE ;
          JACK REPS=1000 N=400 ;
          JACK REPS=200 N=0.75 ;

       Note that JACK is a shorthand for JACKKNIFE.

       If N=nobs is not specified, or if N=1, nobs will equal the number of available
       observations. That is, if the dataset has 320 usable observations, nobs will equal 320.

       If REPS=nreps is not specified, 100 is used.

       Notes:

         * Since it requires re-estimating REPS times, JACKKNIFE can be very time consuming.

         * In each replication, nobs observations are chosen WITH REPLACEMENT from the set of
           observations. Thus, each replication uses a different set of observations (a set
           which may contain multiple copies of one or more observations).

         * Probably the major reason for using JACKKNIFE is to estimate a distribution of
           WTP. See the description of WTP below for the details.
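 A minimal Python sketch of the resampling described above, assuming the estimator is any
 Python callable that maps a sample to an estimate (DISCRETE re-estimates the full model
 instead). Drawing nobs observations WITH replacement is what is usually called a bootstrap;
 the classic jackknife instead drops one observation at a time. The rounding of a fractional
 N, and the names resolve_nobs and resample_estimates, are assumptions.

```python
import random


def resolve_nobs(n, available):
    """N= rule: missing or 1 -> all; between 0 and 1 -> a fraction; else a count."""
    if n is None or n == 1:
        return available
    if 0 < n < 1:
        return max(1, int(round(n * available)))   # rounding rule is an assumption
    return int(n)


def resample_estimates(data, estimator, reps=100, n=None, seed=None):
    """Return `reps` estimates, each computed from nobs draws WITH replacement."""
    rng = random.Random(seed)
    nobs = resolve_nobs(n, len(data))
    estimates = []
    for _ in range(reps):
        sample = rng.choices(data, k=nobs)        # the same row may appear twice
        estimates.append(estimator(sample))
    return estimates
```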

       
 NEVERYES: Identify non-participants, and set bounds accordingly

   There may be respondents who would never say YES, regardless of how low the bid is.
   In particular, even if the bid was 0, they would not say YES.

   These respondents may be different from respondents who say NO to the lowest bid -- such
   respondents may say YES at a bid slightly higher than zero.

   You can use NEVERYES to identify these "non-participants".

   Basically...

        * All responses identified as non-participants are treated as NO responses with
          an upper bound of 0 and a lower bound of -infinity.

   Syntax:
      NEVERYES  avar ;
   or
      NEVERYES  VAR=avar  vals=VList ;

   avar should be a variable that identifies "non-participants".
   vals=VList is optional:
      VList is either a single value (vals=2),
      or a list of values in quotes (vals="3 6 7").

   If you do not specify a VList, a non-zero value of avar identifies a non-participant.
   If VList is specified, non-participants are identified by having an avar value that appears
   in the VList.

   Notes:
     * NEVERYES is NOT used in the single-bounded 2-stage probit model.
     * If you use the "v1 v2 ..." form of VList, you
          * must include the values between double quotes
          * can not have more than 100 values
     * The first syntax (NEVERYES avar) is a shortcut for:
          NEVERYES var=avar vals=1 ;

   Examples:
      NEVERYES var=DONTCARE ;
      NEVERYES var=LIKEIT vals=0 ;
      NEVERYES var=YOUNG vals="10 11 12 13 14 15 16" ;
      NEVERYES var=CYNICS ;



   Using NEVERYES with BOUND.

       NEVERYES is designed to be used along with BOUND.
       First, BOUND is used to set the upper and lower bounds (and the type of response).
       Then NEVERYES is used.

       Hence, the NEVERYES indicator takes precedence.

       For example:
          BOUND LOW_IND=DOCARE LOW_VAL=0
                UP_VAR=INCOME ;
          NEVERYES VAR=DONTCARE ;
       means:
          For respondents who do care, but answered N (or NN), use:
              Lower bound: 0
              Upper bound: lowest bid value
          For respondents who do care, and answered Y (or YY), use:
              Lower bound: highest bid value
              Upper bound: value of INCOME
          For respondents who do NOT care (their response is implicitly NN), use:
              Lower bound: -infinity
              Upper bound: 0
       In fact, the actual response of non-participants is ignored -- DISCRETE assumes that
       non-participants must answer N (or NN), and ignores evidence to the contrary!

       Alternative -- you can use the same variable in BOUND and in NEVERYES:
          BOUND LOW_IND=DONTCARE low_ind_vals=0 LOW_VAL=0
                UP_VAR=INCOME ;
          NEVERYES VAR=DONTCARE vals=1 ;
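 A minimal Python sketch of the ordering described above: BOUND produces a (lower, upper)
 pair, then NEVERYES overrides it for non-participants, whatever they actually answered.
 The function names in_vlist and apply_neveryes are hypothetical.

```python
import math


def in_vlist(value, vlist=None):
    """Non-zero means flagged, unless a VList such as "10 11 12" is given."""
    if vlist is None:
        return value != 0
    return float(value) in {float(tok) for tok in str(vlist).split()}


def apply_neveryes(bounds_after_bound, flag_value, vals=None):
    """NEVERYES runs after BOUND and takes precedence over it."""
    if in_vlist(flag_value, vals):
        return (-math.inf, 0.0)       # non-participant: implicitly NN, WTP below 0
    return bounds_after_bound         # participant: keep what BOUND produced
```

 In the DOCARE/DONTCARE example, an NN respondent who cares arrives from BOUND with
 (0, lowest bid) and keeps it, while a DONTCARE respondent gets (-infinity, 0) even after
 answering YES.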


 RESPONSE: Specify response variables, for use by the MULTIPLE type of CHOICE

     Syntax:

        RESPONSE var_list ;

     where
        var_list: space delimited list of variable names (variables in the dataset)

     RESPONSE should contain a set of variable names. Each variable name
     indicates whether a YES or NO answer was given for the corresponding bid (the values
     or variables listed in the BID keyword).

     The actual coding (or what values signal YES) depends on the value of the YES option
     (specified in the CHOICE TYPE=M keyphrase).

     Example:
        RESPONSE R1 R2 RMID R4 TOBIGBID ;

      
 RESP1: First stage response variables -- used in 2-stage models.

      RESP1 is used to identify participants -- respondents who answered yes to a first stage
      question.

      Basically ...
         * All responses identified as non-participants are treated as first-stage NO responses;
           participants are either YN (2nd stage NO), or YY (2nd stage YES) observations.

      Syntax:
         RESP1 VAR=varname COND=type VAL=aval
      where
         varname is a variable name
         COND is optional. If specified, it should be one of: EQ GT GE LT LE NE
         aval is optional, and is only used if COND is used. It should be a single numeric value.
            If COND is EQ, aval can be a quoted string of numeric values (YES if varname equals
            any of the values in this list).

      If COND is not specified, a value of 1 means YES (all other values mean NO).

      Shortcuts:

         * RESP1 varname ;
             is a shorthand for:
           RESP1 VAR=varname COND=EQ VAL=1 ;

         * RESP1 VAR=varname YES=vlist ;
             is a shorthand for:
           RESP1 VAR=varname COND=EQ VAL=vlist ;

         * RESP1 varname YES=vlist ;   (varname MUST precede YES=vlist)
             is a shorthand for:
           RESP1 VAR=varname COND=EQ VAL=vlist ;

      Examples:
         RESP1 DIDIT ;                     -- DIDIT=1 means "YES"
         RESP1 DIDIT YES="1 2" ;           -- DIDIT=1 or 2 means "YES"
         RESP1 VAR=AMTPAY COND=NE VAL=0 ;  -- any non-zero AMTPAY (1, 2, 3, ...) means "YES"
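 A minimal Python sketch of how the RESP1 options classify a respondent as a first-stage
 participant, following the rules above (default: 1 means YES; with COND=EQ, VAL may be a
 quoted list). The name resp1_yes is hypothetical; conditions are compared case-insensitively.

```python
import operator

_CONDITIONS = {"EQ": operator.eq, "NE": operator.ne, "GT": operator.gt,
               "GE": operator.ge, "LT": operator.lt, "LE": operator.le}


def resp1_yes(value, cond=None, val=None):
    """True if a respondent counts as a first-stage participant (YES)."""
    if cond is None:
        return value == 1                          # default: 1 means YES
    cond = cond.upper()
    if cond == "EQ" and isinstance(val, str):      # quoted list, e.g. "1 2"
        return float(value) in {float(tok) for tok in val.split()}
    return _CONDITIONS[cond](float(value), float(val))
```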


 STARTRHO: Starting values for the RHO coefficient (bivariate and 2-stage models)

    STARTRHO is used to specify one, or more, starting values for the RHO parameter
    used in the bivariate and 2-stage models.  

    Syntax:
    STARTRHO vlist ;
    where vlist is a space delimited list of starting values for rho.
    These values must be between -1.0 and 1.0.

    Example:
       STARTRHO -0.5 -0.2 0.2 .06 ;




 WTP: Compute willingness to pay

     Syntax:
        WTP [KR=n] ;

     where KR is optional, and n is a positive integer.
     If specified, KR requests "n" iterations of a Krinsky Robb computation of the
     distribution of willingness to pay.

     Examples:
        WTP ;
        WTP KR=1000 ;
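 For readers unfamiliar with Krinsky Robb: it draws many parameter vectors from a multivariate
 normal distribution centred on the estimated coefficients, using their estimated covariance
 matrix, and computes WTP for each draw. A minimal Python sketch of the drawing step follows;
 it is not DISCRETE's code, and the names cholesky and krinsky_robb_draws are hypothetical.

```python
import math
import random


def cholesky(a):
    """Lower-triangular L with L * L' = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def krinsky_robb_draws(beta_hat, cov, n_draws, seed=None):
    """n_draws parameter vectors drawn from N(beta_hat, cov)."""
    rng = random.Random(seed)
    L = cholesky(cov)
    k = len(beta_hat)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(k)]     # independent N(0,1) draws
        draws.append([beta_hat[i] + sum(L[i][j] * z[j] for j in range(i + 1))
                      for i in range(k)])
    return draws
```

 Each draw would then be plugged into the WTP measures; the spread of the resulting WTP
 values gives the reported distribution.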

     Several different measures of willingness to pay are computed (depending on the model).

       * Linear: X*beta.

       * Expectation: The expected value, given WTP>0. This is computed as:
            Integral (s=0 to infinity) [ 1 - F(s,X*beta,sd) ]

         where F is the model specific CDF:
            PROBIT:  cdfn((s-X*beta)/sd)
            LOGIT:   1/(1+exp(-(s-X*beta)/sd))
            WEIBIT:  1 - exp(-exp((s-X*beta)/sd))

       * Expectation allowing <0: The expected value, allowing WTP to be <0.
         This is computed as:
            Integral (s=0 to infinity) [ 1 - F(s,X*beta,sd) ]  -  Integral (s=-infinity to 0) F(s,X*beta,sd)

       * Truncated and Censored "analytic" measures:

          * TOBIT means are used in the probit model:
               Truncated: X*beta + sd * pdfn(X*beta/sd) / cdfn(X*beta/sd)
               Censored:  cdfn(X*beta/sd) * X*beta + sd * pdfn(X*beta/sd)

          * HECKIT means are used in the 2-stage probit model:
               Truncated: X*beta + rho * sd * pdfn(X1*beta1) / cdfn(X1*beta1)
               Censored:  cdfn(X1*beta1) * X*beta + rho * sd * pdfn(X1*beta1)

          * LOGISTIC means are used in the logit model:
               Truncated: sd * ln(1 + exp(X*beta/sd))
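 A minimal Python sketch of these measures for a single X*beta (xb) and sd, not DISCRETE's
 code. This writeup only says that numeric integration is used; the sketch swaps in Simpson's
 rule over a finite range (up to 40 standard deviations past the relevant end), which is an
 assumption. As a check, for the probit the "Expectation" integral equals the censored TOBIT
 formula, and for the logit it equals the LOGISTIC formula; allowing WTP below zero, both
 symmetric models return X*beta. All function names are hypothetical.

```python
import math


def cdfn(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def pdfn(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def logistic(z):                              # overflow-safe 1/(1+exp(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def extreme_value(z):                         # 1 - exp(-exp(z)), overflow-safe
    return 1.0 if z > 700 else 1.0 - math.exp(-math.exp(z))


def make_cdf(model, xb, sd):
    """F(s) = P(WTP <= s) for the model-specific CDFs."""
    base = {"PROBIT": cdfn, "LOGIT": logistic, "WEIBIT": extreme_value}[model]
    return lambda s: base((s - xb) / sd)


def simpson(f, a, b, n=4000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def expectation(model, xb, sd, allow_negative=False, width=40.0):
    """'Expectation' (or 'Expectation allowing <0') by numeric integration."""
    F = make_cdf(model, xb, sd)
    hi = max(0.0, xb) + width * sd            # 1 - F is negligible beyond hi
    value = simpson(lambda s: 1.0 - F(s), 0.0, hi)
    if allow_negative:
        lo = min(0.0, xb) - width * sd        # F is negligible below lo
        value -= simpson(F, lo, 0.0)
    return value


def tobit_truncated(xb, sd):
    z = xb / sd
    return xb + sd * pdfn(z) / cdfn(z)


def tobit_censored(xb, sd):
    z = xb / sd
    return cdfn(z) * xb + sd * pdfn(z)


def logistic_truncated(xb, sd):
    z = xb / sd                               # sd * ln(1 + exp(z)), overflow-safe
    return sd * (z + math.log1p(math.exp(-z))) if z > 0 else sd * math.log1p(math.exp(z))
```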

      Notes:
         * For both expectation measures, numeric integration methods are used.
         * pdfn and cdfn refer to the pdf and cdf of the standard normal.
         * For the bivariate double-bounded probit, WTP is computed using just the beta and
           sd from the "2nd response". First response information (and rho) are NOT used.

      Note that if you have specified JACKKNIFE, KR is ignored -- the matrix of betas produced
      by the Jackknife procedure will be used instead.

      If you specified KR, or a JACKKNIFE keyword:
          Several statistics are computed for several different measures.
          These are the mean, standard deviation, median, and several confidence intervals (5, 10,
          90, and 95%). Confidence intervals are computed by sorting the "n" different computations
          of WTP (each using a different beta), and taking the appropriate row.
          For example, if 1000 vectors are used, the 10% confidence interval is simply the value of
          the 100th row (of the sorted values of WTP).

     The number of beta vectors to compute is determined by:
   * By the REPS= argument of the JACKKNIFE keyword, 
     or
   * By the KR=n argument of the WTP keyword.

     If JACKKNIFE is specified, the KR argument is ignored.
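 A minimal Python sketch of the summary statistics described above, given the WTP values
 computed from the KR or JACKKNIFE beta vectors. Taking row round(p*n) of the sorted values
 reproduces the 1000-vector example; the exact row rule DISCRETE uses for other sizes is an
 assumption, as is the name summarize_wtp.

```python
import statistics


def summarize_wtp(values, levels=(0.05, 0.10, 0.90, 0.95)):
    """Mean, sd, median and row-based percentiles of simulated WTP values."""
    ordered = sorted(values)
    n = len(ordered)
    summary = {
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),
        "median": statistics.median(ordered),
    }
    for p in levels:
        row = min(n, max(1, int(round(p * n))))   # 1-based row: 10% of 1000 -> row 100
        summary[p] = ordered[row - 1]
    return summary
```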


  X1:  X variables used in the first stage (of a 2-stage probit)
      Syntax: same as X.

      Example:  X1 -CONST V1 V15 V135 ;



     ===================================================================

III. Examples

 THESE NEED TO BE FIXED !!!

