Documentation for SNP.TXT

This is a legacy program, and lacks decent documentation.

Instead, the following sample GRBL2 command files (used to specify
variables, etc.) may be helpful.


A note on GRBL2 command files:
  GRBL2 command files use "keyphrases".
  
  The syntax of these keyphrases is:

     KEYWORD option_list ;

  where:
    KEYWORD: one of several keywords understood by SNP
    option_list: a list of one or more space-delimited options.
		 The syntax of these options depends on the keyword.
		 For example: for X, the option_list is a list of variable names.
		              for RSQ, it is a 0/1 flag.
  Notes: 
     * comments are enclosed between at-sign (@) characters.
     * each keyphrase MUST end with a semicolon
     * keywords and options are case-INsensitive
     * to run the model, you must include a  RUN ;  keyphrase (after you have specified all your options)
     * Several keyphrases (the FOURIER, OTHER, MISC, and BOOTSTRAP sets of phrases) are optional
       (default values are used if they are not specified)
     * All options are reset to default values by MODEL keyphrases.
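
  Example: a minimal command file, sketched from the rules above and not
  tested against SNP. It reuses the dataset and variable names from the COUNT
  sample in this document; the output file name minimal.out is hypothetical.
  Only the REQUIRED keyphrases plus OUTPUT are given, so all other options
  are assumed to take their default values.

     @ minimal COUNT run @
     INPUT snp_tcm ;
     OUTPUT minimal.out -RESET ;
     MODEL COUNT -PARAM ;
     Y TRIPS ;
     COST TCOST ;
     X INC89 PSBAG H20DEL ;
     RUN ;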


===================================================

For the COUNT model:


 The following keyphrases are supported (in alphabetical order):
    COST   : The travel cost variable. REQUIRED.
    INCREM : increments for the second stage cs approach, >500 best 
    INPUT  : Specify a GAUSS dataset.  REQUIRED.
    ITERB  : Bootstrap iterations
    LENGTH : Used by fourier technique
    LNTC   : linear/logged computation of TC 
    MODEL  : Specify which model to estimate.  REQUIRED.
    MXINC  : number of increments wanted for the second (by individual cs) method 
    ORDER  : used by fourier technique
    OUTPUT : The output file to write results to. 
    QUADX  : include quadratic terms
    RANDOM : seed handling: 1=use built-in seed values; 2=reset seed values for each outer loop
    RSQ    : get rsquared 
    STAGE2 :  calc second stage (and dY/dTC)
    TAKELN : log other regressors  
    TC     : what to use for TC 
    TITLE  : A Title to display before displaying model results.
    WEIGHT : Observation weight
    WEIGHT2 : Aggregation weight
    X      : List of independent variables.  REQUIRED.
    Y      : the  dependent variable.   REQUIRED.

For options set by keyphrases that are not REQUIRED, defaults are used if necessary.

-------------------------------------------- 

@ INPUT: specify a gauss dataset. 
   If you do not specify a path, SNP will look in the  current GAUSS working directory 
   (which you can change using the File-Change_working_directory option in the GAUSS taskbar).
   You do NOT have to include the .DAT extension
@


INPUT snp_tcm ;

@ OUTPUT:  The output file to write results to. 
    If you do not specify a path, SNP will write to the current GAUSS working directory.
    To delete any pre-existing version of this output file, include -RESET (after the filename).
    Example: OUTPUT SNP.OUT -RESET ;
@

OUTPUT countsnp.out  -reset ;	       @TRIPS POP87 TCOST INC89 PSBAG H20DEL BAG@ 


@ TITLE: A Title to display before displaying model results.
  The title can be several lines long.
  If you include "$DATE" (without the " quotes) in the title, the current date and time will be inserted.
@

TITLE Results from SNP (generated on $date ) ;


@ MODEL: Specify which model to estimate
 Currently, SPIKE and COUNT are supported.

 To specify the COUNT model, use: MODEL COUNT ;

 For the SPIKE model, see the SNP_SPIKE.IN file.

 For the COUNT model, you can specify several options:
   * Parametric or Fourier regression:
       To do a flexible Fourier transformation :  MODEL COUNT ; or MODEL COUNT -FOURIER ;
       To do a parametric regression           :  MODEL COUNT -PARAM ;

   * Truncation, or endogenous stratification
       All observations available                      : MODEL COUNT ; or MODEL COUNT -ALL ;
       Truncated (>0 observations only are available)  : MODEL COUNT -TRUNC ;
       Endogenous stratification (truncated, and large 
                    values more likely to be observed) : MODEL COUNT -ENDOG ;
     
       Note: if you use -ENDOG or -TRUNC, your Y variable must NOT have values < 1.

   You can combine both of these options. For example: MODEL COUNT -PARAM -TRUNC ;

@
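
@ Further MODEL COUNT combinations (a sketch, assuming one regression flag and
  one sampling flag can be paired as described above; these lines are inside
  a comment and are not executed):

     MODEL COUNT -FOURIER -TRUNC ;
     MODEL COUNT -FOURIER -ENDOG ;
     MODEL COUNT -PARAM -ENDOG ;
     MODEL COUNT -PARAM -ALL ;
@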

MODEL count -param ;

@  Y : the  dependent variable. Required.
      For the COUNT model, the dependent variable should be a count
      (TRIPS in this example). With -TRUNC or -ENDOG, it must not
      have values < 1. @

Y TRIPS ;


@ COST: The travel cost variable.  Required. @

  COST TCOST ;



@ X: List of independent variables.
     Note that a constant is ALWAYS included.
     To specify no X variables, use:  X 0 ; @

  X   INC89 PSBAG H20DEL ;

@ WEIGHT: weight variable 
  0 : do NOT weight
  varname : use value of varname variable @

  WEIGHT 0 ;

@  --  FOURIER TERMS   
  Notes:
      * The current implementation allows a maximum LENGTH of 2. 
      * ORDER is unlimited.  
      * QUADX is not fully tested for QUADX = 2.
@


@ LENGTH: sum of the absolute values of the elements in each 1 x g multi-index
  vector; the number of vectors is a function of LENGTH and of g, the number
  of variables you chose, excluding the constant. 
  For example, with g = 2 the vector (1,1) has length |1| + |1| = 2.
    1 = max if you want to use the original variable names in the output. 
    2 = max allowed length in the current implementation (negative elements are
         ignored for now)  @

 LENGTH 1 ;


@ ORDER: Must be an integer >= 1.
     1 = default order of transformation; 
     2 = maximum if you want to use the original variable names; 
         if you use more, default names are used @

  ORDER  1; 

@ QUADX: 
   1 = no quadratic terms, 
   2 = quadratic terms @

 QUADX  1; 

 

@  -- bootstrap stuff @

@ RANDOM:
   1=use built-in seed values; 
   2=reset seed values for each outer loop
        Built-in seed values are useful for comparing outputs as the results are
        directly comparable except for the parameter you change @

  RANDOM  1 ;

@ITERB: bootstrap iterations: max = 1000.
 Setting ITERB = 1 causes the program to estimate the results for the actual 
  data set.  Setting it to higher values produces bootstrap results. @

 
  iterB  5 ;  


@ -- Misc stuff  @


@ WEIGHT2: regression weight (e.g., zonal population)
    WEIGHT2 0 : Do NOT weight
    WEIGHT2 varname  : Use varname variable as the weight

   Note that WEIGHT and WEIGHT2 are different. @
  
  WEIGHT2 POP87 ;




@ LNTC : linear/logged computation of TC 
    0 = linear in TC or rtdist
    1 = TC is logged  @

  LNTC 0 ;   


@ TAKELN : log other regressors  
  0=don't take log of other regressors
  1= do   @

  TAKELN   0 ; 


@ TC: what to use for TC 
    1 = TC is used
    2 = rtdist is used @

  TC 1;


@ RSQ: get rsquared 
  0 = No 
  1 = Yes   @

  RSQ  1; 



@ STAGE2:  calc second stage (and dY/dTC)
   0 = NO
   1=  Yes @

  STAGE2 0 ; 


@ INCREM: increments for the second stage cs approach, >500 best @
  
  INCREM 500  ;


@ MXINC: number of increments wanted for the second (by individual cs) method @
  
   MXINC 500 ;




run ;



===================================

For the SPIKE model:

 The following keyphrases are supported (in alphabetical order):
    BID    : The bid variable.  REQUIRED.
    INCOME : The income variable, or whatever you want to use  for the upper bound. REQUIRED.
    INPUT  : Specify a GAUSS dataset.   REQUIRED.
    ITERB  : Bootstrap iterations
    LENGTH : Used by fourier technique
    MODEL  : Specify which model to estimate.  REQUIRED.
    ORDER  : used by fourier technique
    OUTPUT : The output file to write results to. 
    QUADX  : include quadratic terms
    RANDOM : seed handling: 1=use built-in seed values; 2=reset seed values for each outer loop
    REPS   : Default number of repetitions for Krinsky and Robb  
    TITLE  : A Title to display before displaying model results.
    WEIGHT : Weight the observations
    X      : List of independent variables.
    XLOG   : Which X variables to log 
    Y      : the  dependent variable.   REQUIRED.

For options set by keyphrases that are not REQUIRED, defaults are used if necessary.


-------------------------------------------- 

@ INPUT: specify a gauss dataset. 
   If you do not specify a path, SNP will look in the  current GAUSS working directory 
   (which you can change using the File-Change_working_directory option in the GAUSS taskbar).
   You do NOT have to include the .DAT extension
@

INPUT snp_test ;



@ OUTPUT:  The output file to write results to. 
    If you do not specify a path, SNP will write to the current GAUSS working directory.
    To delete any pre-existing version of this output file, include -RESET (after the filename).
    Example: OUTPUT SNP.OUT -RESET ;
@

OUTPUT snpspike.out  -reset ;

@ TITLE: A Title to display before displaying model results.
  The title can be several lines long.
  If you include "$DATE" (without the " quotes) in the title, the current date and time will be inserted.
@

TITLE Results from SNP (generated on $date ) ;

@ MODEL: Specify which SNP model to estimate
 Currently, SPIKE and COUNT are supported.
 See SNP_COUNT.IN for the COUNT model.

 For the SPIKE model, you can specify an option:

      To use LOGISTIC for starting values, use    : MODEL SPIKE -LOG ;
      To use PROBIT for starting values, use      : MODEL SPIKE -PROBIT ;

     Note: MODEL SPIKE  ; is the same as MODEL SPIKE -PROBIT ;

@

MODEL spike -probit  ;


@  Y : the  dependent variable. 
      The dependent variable should be a 0/1 dummy variable, with
      1 for a YES and 0 for a NO @
Y Y ;


@ INCOME: The income variable, or whatever you want to use  for the upper bound; 
  To specify a value, use VAL=aval (e.g., INCOME VAL=50000 ; )
  To specify a variable, use VAR=varname (e.g., INCOME VAR=YEARINC ; ) @

  INCOME VAR=INCOME ;

@ BID: The bid variable. 
   To use log of bid variable, use: BID varname -LOG ; @

  BID BID ;
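
@ Example (a sketch, assuming the INCOME and BID syntax described above): to use
  a fixed upper bound of 50000 and the log of the bid, instead of the settings
  used in this file, the two keyphrases would read
     INCOME VAL=50000 ;
     BID BID -LOG ;
  These lines are inside a comment and are not executed. @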


@ X: List of independent variables.
     Note that a constant is ALWAYS included.
     To specify no X variables, use:  X 0 ; @

  X   INCOME ;


@ XLOG:  which X variables to log 
    xlog x1name ... xkname ;   -- specify  X variables to log
  or
    xlog 0 ;     -- do NOT log variables 
  or
    xlog * ;	   -- log ALL the variables 

 Note: x1name ... xkname MUST be specified FIRST by an X keyphrase
@

  XLOG 0  ;
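
@ Example (a sketch, assuming the XLOG syntax described above): with the X list
  in this file ( X INCOME ; ), the single X variable could be logged with either
     XLOG INCOME ;
  or
     XLOG * ;
  These lines are inside a comment and are not executed. @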

@ WEIGHT: weight variable 
  0 : do NOT weight
  varname : use value of varname variable @

  WEIGHT 0 ; 

@  --  FOURIER TERMS   
  Notes:
      * The current implementation allows a maximum LENGTH of 2. 
      * ORDER is unlimited.  
      * QUADX is not fully tested for QUADX = 2.
@

@ LENGTH: sum of the absolute values of the elements in each 1 x g multi-index
  vector; the number of vectors is a function of LENGTH and of g, the number
  of variables you chose, excluding the constant. 
  For example, with g = 2 the vector (1,1) has length |1| + |1| = 2.
    1 = max if you want to use the original variable names in the output. 
    2 = max allowed length in the current implementation (negative elements are
         ignored for now)  @

 LENGTH 1 ;


@ ORDER: Must be an integer >= 1.
     1 = default order of transformation; 
     2 = maximum if you want to use the original variable names; 
         if you use more, default names are used @

  ORDER  1; 

@ QUADX: 
   1 = no quadratic terms, 
   2 = quadratic terms @

 QUADX  1; 

 
@ -- Other settings @

@ REPS: default number of repetitions for Krinsky and Robb @ 

  REPS  1000; 


@  -- bootstrap stuff @

@ RANDOM:
   1=use built-in seed values; 
   2=reset seed values for each outer loop
        Built-in seed values are useful for comparing outputs as the results are
        directly comparable except for the parameter you change @

  RANDOM  1 ;

@ITERB: bootstrap iterations: max = 1000.
 Setting ITERB = 1 causes the program to estimate the results for the actual 
  data set.  Setting it to higher values produces bootstrap results. @
 
  iterB  5 ;  

run ;

