! Discriminant analysis by integer programming (DiscrmSwiss.lng);
! This is a form of categorical regression in which the dependent
  variable is categorical, e.g., Good or Bad.
  Basic idea:
    Given values of various characteristics of an object,
    predict its category, e.g.,
      Is a prospective customer a good or a bad credit risk?
      Is a paper banknote good or counterfeit?
      Does a patient have a certain disease or not?
  We compute the weights in a scoring formula so that
    Score(i) >= 0 implies a good item, and Score(i) < 0 a bad one.
  Various objectives can be used to find an optimal scoring
  function. Here the objective is to
    minimize the number of misclassifications;
! Keywords: Discriminant analysis, Classification, Clustering,
     Categorical regression, ChartScatter, Data Mining, Grouping, 
     Scatter chart, Statistics;
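! Formulation sketch (an editorial summary of the DISCRAMP submodel
  below, not an additional model). Writing M for WGTMX and K for
  WGTSUSEDMX, the submodel can be read as:
    Minimize  sum over items i of DROP(i)
    subject to
      SCORE(i) = WGT0 + sum over measures j of WGT(j)*TSCR(i,j),
      SCORE(i) >=  1 - M*DROP(i)     for each good item i,
      SCORE(i) <= -1 + M*DROP(i)     for each bad item i,
      -M*ZUSE(j) <= WGT(j) <= M*ZUSE(j)  for each measure j,
      sum over measures j of ZUSE(j) <= K,
      DROP(i) and ZUSE(j) are 0 or 1.
  With DROP(i) = 0 the item must score on its correct side of 0 with
  a margin of 1. With DROP(i) = 1 the margin constraint is relaxed and
  a companion constraint keeps the score on the other side
  (SCORE(i) <= 0 for a good item, SCORE(i) >= 0 for a bad item), so
  the objective counts the items that are not separated with the
  required margin;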
SETS: 
 TEST: WGT, ZUSE;
 OBS: DROP, SCORE;
 OXT(OBS, TEST): TSCR;
 OBS1(OBS): X1, Y1;
 OBS2(OBS): X2, Y2;
ENDSETS
DATA:
! Genuine and counterfeit banknotes (100 Swiss francs), various
  measurements.
  Banknotes BN1 to BN100 are genuine (Good=1), all others are
  counterfeit (Good=0).
  Dataset courtesy of H. Riedwyl, Bern, Switzerland;
 WGTSUSEDMX = 2;  ! Max # of weights to use;
 WGTMX = 99999;   ! Max absolute value of any weight;
 DEPVAR = 7;      ! Index of the dependent variable (Good);
 TEST= Length Left Right Bottom Top Diagonal Good;
 OBS, TSCR=
 BN1 214.8 131.0 131.1 9.0 9.7 141.0 1
 BN2 214.6 129.7 129.7 8.1 9.5 141.7 1
 BN3 214.8 129.7 129.7 8.7 9.6 142.2 1
 BN4 214.8 129.7 129.6 7.5 10.4 142.0 1
 BN5 215.0 129.6 129.7 10.4 7.7 141.8 1
 BN6 215.7 130.8 130.5 9.0 10.1 141.4 1
 BN7 215.5 129.5 129.7 7.9 9.6 141.6 1
 BN8 214.5 129.6 129.2 7.2 10.7 141.7 1
 BN9 214.9 129.4 129.7 8.2 11.0 141.9 1
 BN10 215.2 130.4 130.3 9.2 10.0 140.7 1
 BN11 215.3 130.4 130.3 7.9 11.7 141.8 1
 BN12 215.1 129.5 129.6 7.7 10.5 142.2 1
 BN13 215.2 130.8 129.6 7.9 10.8 141.4 1
 BN14 214.7 129.7 129.7 7.7 10.9 141.7 1
 BN15 215.1 129.9 129.7 7.7 10.8 141.8 1
 BN16 214.5 129.8 129.8 9.3 8.5 141.6 1
 BN17 214.6 129.9 130.1 8.2 9.8 141.7 1
 BN18 215.0 129.9 129.7 9.0 9.0 141.9 1
 BN19 215.2 129.6 129.6 7.4 11.5 141.5 1
 BN20 214.7 130.2 129.9 8.6 10.0 141.9 1
 BN21 215.0 129.9 129.3 8.4 10.0 141.4 1
 BN22 215.6 130.5 130.0 8.1 10.3 141.6 1
 BN23 215.3 130.6 130.0 8.4 10.8 141.5 1
 BN24 215.7 130.2 130.0 8.7 10.0 141.6 1
 BN25 215.1 129.7 129.9 7.4 10.8 141.1 1
 BN26 215.3 130.4 130.4 8.0 11.0 142.3 1
 BN27 215.5 130.2 130.1 8.9 9.8 142.4 1
 BN28 215.1 130.3 130.3 9.8 9.5 141.9 1
 BN29 215.1 130.0 130.0 7.4 10.5 141.8 1
 BN30 214.8 129.7 129.3 8.3 9.0 142.0 1
 BN31 215.2 130.1 129.8 7.9 10.7 141.8 1
 BN32 214.8 129.7 129.7 8.6 9.1 142.3 1
 BN33 215.0 130.0 129.6 7.7 10.5 140.7 1
 BN34 215.6 130.4 130.1 8.4 10.3 141.0 1
 BN35 215.9 130.4 130.0 8.9 10.6 141.4 1
 BN36 214.6 130.2 130.2 9.4 9.7 141.8 1
 BN37 215.5 130.3 130.0 8.4 9.7 141.8 1
 BN38 215.3 129.9 129.4 7.9 10.0 142.0 1
 BN39 215.3 130.3 130.1 8.5 9.3 142.1 1
 BN40 213.9 130.3 129.0 8.1 9.7 141.3 1
 BN41 214.4 129.8 129.2 8.9 9.4 142.3 1
 BN42 214.8 130.1 129.6 8.8 9.9 140.9 1
 BN43 214.9 129.6 129.4 9.3 9.0 141.7 1
 BN44 214.9 130.4 129.7 9.0 9.8 140.9 1
 BN45 214.8 129.4 129.1 8.2 10.2 141.0 1
 BN46 214.3 129.5 129.4 8.3 10.2 141.8 1
 BN47 214.8 129.9 129.7 8.3 10.2 141.5 1
 BN48 214.8 129.9 129.7 7.3 10.9 142.0 1
 BN49 214.6 129.7 129.8 7.9 10.3 141.1 1
 BN50 214.5 129.0 129.6 7.8 9.8 142.0 1
 BN51 214.6 129.8 129.4 7.2 10.0 141.3 1
 BN52 215.3 130.6 130.0 9.5 9.7 141.1 1
 BN53 214.5 130.1 130.0 7.8 10.9 140.9 1
 BN54 215.4 130.2 130.2 7.6 10.9 141.6 1
 BN55 214.5 129.4 129.5 7.9 10.0 141.4 1
 BN56 215.2 129.7 129.4 9.2 9.4 142.0 1
 BN57 215.7 130.0 129.4 9.2 10.4 141.2 1
 BN58 215.0 129.6 129.4 8.8 9.0 141.1 1
 BN59 215.1 130.1 129.9 7.9 11.0 141.3 1
 BN60 215.1 130.0 129.8 8.2 10.3 141.4 1
 BN61 215.1 129.6 129.3 8.3 9.9 141.6 1
 BN62 215.3 129.7 129.4 7.5 10.5 141.5 1
 BN63 215.4 129.8 129.4 8.0 10.6 141.5 1
 BN64 214.5 130.0 129.5 8.0 10.8 141.4 1
 BN65 215.0 130.0 129.8 8.6 10.6 141.5 1
 BN66 215.2 130.6 130.0 8.8 10.6 140.8 1
 BN67 214.6 129.5 129.2 7.7 10.3 141.3 1
 BN68 214.8 129.7 129.3 9.1 9.5 141.5 1
 BN69 215.1 129.6 129.8 8.6 9.8 141.8 1
 BN70 214.9 130.2 130.2 8.0 11.2 139.6 1
 BN71 213.8 129.8 129.5 8.4 11.1 140.9 1
 BN72 215.2 129.9 129.5 8.2 10.3 141.4 1
 BN73 215.0 129.6 130.2 8.7 10.0 141.2 1
 BN74 214.4 129.9 129.6 7.5 10.5 141.8 1
 BN75 215.2 129.9 129.7 7.2 10.6 142.1 1
 BN76 214.1 129.6 129.3 7.6 10.7 141.7 1
 BN77 214.9 129.9 130.1 8.8 10.0 141.2 1
 BN78 214.6 129.8 129.4 7.4 10.6 141.0 1
 BN79 215.2 130.5 129.8 7.9 10.9 140.9 1
 BN80 214.6 129.9 129.4 7.9 10.0 141.8 1
 BN81 215.1 129.7 129.7 8.6 10.3 140.6 1
 BN82 214.9 129.8 129.6 7.5 10.3 141.0 1
 BN83 215.2 129.7 129.1 9.0 9.7 141.9 1
 BN84 215.2 130.1 129.9 7.9 10.8 141.3 1
 BN85 215.4 130.7 130.2 9.0 11.1 141.2 1
 BN86 215.1 129.9 129.6 8.9 10.2 141.5 1
 BN87 215.2 129.9 129.7 8.7 9.5 141.6 1
 BN88 215.0 129.6 129.2 8.4 10.2 142.1 1
 BN89 214.9 130.3 129.9 7.4 11.2 141.5 1
 BN90 215.0 129.9 129.7 8.0 10.5 142.0 1
 BN91 214.7 129.7 129.3 8.6 9.6 141.6 1
 BN92 215.4 130.0 129.9 8.5 9.7 141.4 1
 BN93 214.9 129.4 129.5 8.2 9.9 141.5 1
 BN94 214.5 129.5 129.3 7.4 10.7 141.5 1
 BN95 214.7 129.6 129.5 8.3 10.0 142.0 1
 BN96 215.6 129.9 129.9 9.0 9.5 141.7 1
 BN97 215.0 130.4 130.3 9.1 10.2 141.1 1
 BN98 214.4 129.7 129.5 8.0 10.3 141.2 1
 BN99 215.1 130.0 129.8 9.1 10.2 141.5 1
 BN100 214.7 130.0 129.4 7.8 10.0 141.2 1
 BN101 214.4 130.1 130.3 9.7 11.7 139.8 0
 BN102 214.9 130.5 130.2 11.0 11.5 139.5 0
 BN103 214.9 130.3 130.1 8.7 11.7 140.2 0
 BN104 215.0 130.4 130.6 9.9 10.9 140.3 0
 BN105 214.7 130.2 130.3 11.8 10.9 139.7 0
 BN106 215.0 130.2 130.2 10.6 10.7 139.9 0
 BN107 215.3 130.3 130.1 9.3 12.1 140.2 0
 BN108 214.8 130.1 130.4 9.8 11.5 139.9 0
 BN109 215.0 130.2 129.9 10.0 11.9 139.4 0
 BN110 215.2 130.6 130.8 10.4 11.2 140.3 0
 BN111 215.2 130.4 130.3 8.0 11.5 139.2 0
 BN112 215.1 130.5 130.3 10.6 11.5 140.1 0
 BN113 215.4 130.7 131.1 9.7 11.8 140.6 0
 BN114 214.9 130.4 129.9 11.4 11.0 139.9 0
 BN115 215.1 130.3 130.0 10.6 10.8 139.7 0
 BN116 215.5 130.4 130.0 8.2 11.2 139.2 0
 BN117 214.7 130.6 130.1 11.8 10.5 139.8 0
 BN118 214.7 130.4 130.1 12.1 10.4 139.9 0
 BN119 214.8 130.5 130.2 11.0 11.0 140.0 0
 BN120 214.4 130.2 129.9 10.1 12.0 139.2 0
 BN121 214.8 130.3 130.4 10.1 12.1 139.6 0
 BN122 215.1 130.6 130.3 12.3 10.2 139.6 0
 BN123 215.3 130.8 131.1 11.6 10.6 140.2 0
 BN124 215.1 130.7 130.4 10.5 11.2 139.7 0
 BN125 214.7 130.5 130.5 9.9 10.3 140.1 0
 BN126 214.9 130.0 130.3 10.2 11.4 139.6 0
 BN127 215.0 130.4 130.4 9.4 11.6 140.2 0
 BN128 215.5 130.7 130.3 10.2 11.8 140.0 0
 BN129 215.1 130.2 130.2 10.1 11.3 140.3 0
 BN130 214.5 130.2 130.6 9.8 12.1 139.9 0
 BN131 214.3 130.2 130.0 10.7 10.5 139.8 0
 BN132 214.5 130.2 129.8 12.3 11.2 139.2 0
 BN133 214.9 130.5 130.2 10.6 11.5 139.9 0
 BN134 214.6 130.2 130.4 10.5 11.8 139.7 0
 BN135 214.2 130.0 130.2 11.0 11.2 139.5 0
 BN136 214.8 130.1 130.1 11.9 11.1 139.5 0
 BN137 214.6 129.8 130.2 10.7 11.1 139.4 0
 BN138 214.9 130.7 130.3 9.3 11.2 138.3 0
 BN139 214.6 130.4 130.4 11.3 10.8 139.8 0
 BN140 214.5 130.5 130.2 11.8 10.2 139.6 0
 BN141 214.8 130.2 130.3 10.0 11.9 139.3 0
 BN142 214.7 130.0 129.4 10.2 11.0 139.2 0
 BN143 214.6 130.2 130.4 11.2 10.7 139.9 0
 BN144 215.0 130.5 130.4 10.6 11.1 139.9 0
 BN145 214.5 129.8 129.8 11.4 10.0 139.3 0
 BN146 214.9 130.6 130.4 11.9 10.5 139.8 0
 BN147 215.0 130.5 130.4 11.4 10.7 139.9 0
 BN148 215.3 130.6 130.3 9.3 11.3 138.1 0
 BN149 214.7 130.2 130.1 10.7 11.0 139.4 0
 BN150 214.9 129.9 130.0 9.9 12.3 139.4 0
 BN151 214.9 130.3 129.9 11.9 10.6 139.8 0
 BN152 214.6 129.9 129.7 11.9 10.1 139.0 0
 BN153 214.6 129.7 129.3 10.4 11.0 139.3 0
 BN154 214.5 130.1 130.1 12.1 10.3 139.4 0
 BN155 214.5 130.3 130.0 11.0 11.5 139.5 0
 BN156 215.1 130.0 130.3 11.6 10.5 139.7 0
 BN157 214.2 129.7 129.6 10.3 11.4 139.5 0
 BN158 214.4 130.1 130.0 11.3 10.7 139.2 0
 BN159 214.8 130.4 130.6 12.5 10.0 139.3 0
 BN160 214.6 130.6 130.1 8.1 12.1 137.9 0
 BN161 215.6 130.1 129.7 7.4 12.2 138.4 0
 BN162 214.9 130.5 130.1 9.9 10.2 138.1 0
 BN163 214.6 130.1 130.0 11.5 10.6 139.5 0
 BN164 214.7 130.1 130.2 11.6 10.9 139.1 0
 BN165 214.3 130.3 130.0 11.4 10.5 139.8 0
 BN166 215.1 130.3 130.6 10.3 12.0 139.7 0
 BN167 216.3 130.7 130.4 10.0 10.1 138.8 0
 BN168 215.6 130.4 130.1 9.6 11.2 138.6 0
 BN169 214.8 129.9 129.8 9.6 12.0 139.6 0
 BN170 214.9 130.0 129.9 11.4 10.9 139.7 0
 BN171 213.9 130.7 130.5 8.7 11.5 137.8 0
 BN172 214.2 130.6 130.4 12.0 10.2 139.6 0
 BN173 214.8 130.5 130.3 11.8 10.5 139.4 0
 BN174 214.8 129.6 130.0 10.4 11.6 139.2 0
 BN175 214.8 130.1 130.0 11.4 10.5 139.6 0
 BN176 214.9 130.4 130.2 11.9 10.7 139.0 0
 BN177 214.3 130.1 130.1 11.6 10.5 139.7 0
 BN178 214.5 130.4 130.0 9.9 12.0 139.6 0
 BN179 214.8 130.5 130.3 10.2 12.1 139.1 0
 BN180 214.5 130.2 130.4 8.2 11.8 137.8 0
 BN181 215.0 130.4 130.1 11.4 10.7 139.1 0
 BN182 214.8 130.6 130.6 8.0 11.4 138.7 0
 BN183 215.0 130.5 130.1 11.0 11.4 139.3 0
 BN184 214.6 130.5 130.4 10.1 11.4 139.3 0
 BN185 214.7 130.2 130.1 10.7 11.1 139.5 0
 BN186 214.7 130.4 130.0 11.5 10.7 139.4 0
 BN187 214.5 130.4 130.0 8.0 12.2 138.5 0
 BN188 214.8 130.0 129.7 11.4 10.6 139.2 0
 BN189 214.8 129.9 130.2 9.6 11.9 139.4 0
 BN190 214.6 130.3 130.2 12.7 9.1 139.2 0
 BN191 215.1 130.2 129.8 10.2 12.0 139.4 0
 BN192 215.4 130.5 130.6 8.8 11.0 138.6 0
 BN193 214.7 130.3 130.2 10.8 11.1 139.2 0
 BN194 215.0 130.5 130.3 9.6 11.0 138.5 0
 BN195 214.9 130.3 130.5 11.6 10.6 139.8 0
 BN196 215.0 130.4 130.3 9.9 12.1 139.6 0
 BN197 215.1 130.3 129.9 10.3 11.5 139.7 0
 BN198 214.8 130.3 130.4 10.6 11.1 140.0 0
 BN199 214.7 130.7 130.8 11.2 11.2 139.4 0
 BN200 214.3 129.9 129.9 10.2 11.5 139.6 0
 ;
ENDDATA

SUBMODEL DISCRAMP:
! Minimize number of observations dropped to get a partition;
 MIN = OBJV;
 OBJV = @SUM( OBS( I): DROP( I));

! For bad observations, if DROP(I)=0, we want a strictly negative score;
 @FOR( OBS( I) | TSCR( I, DEPVAR) #EQ# 0:
   SCORE( I) <= -1 + WGTMX*DROP( I);
   SCORE( I) >= - WGTMX*(1 - DROP( I));
   SCORE( I) = WGT0 + @SUM( TEST( J) | J #NE# DEPVAR: WGT( J)*TSCR( I, J));
   @FREE( SCORE( I));
 );

! For good observations, if DROP(I)=0, we want a strictly positive score;
 @FOR( OBS( I) | TSCR( I, DEPVAR) #EQ# 1:
   SCORE( I) >= 1 - WGTMX*DROP( I);
   SCORE( I) <= WGTMX*(1 - DROP( I));
   SCORE( I) = WGT0 + @SUM( TEST( J) | J #NE# DEPVAR: WGT( J)*TSCR( I, J));
   @FREE( SCORE( I));
 );

! The constant WGT0 and the WGT(J) are unrestricted in sign;
 @FREE( WGT0);
 @FOR( TEST( J): @FREE( WGT( J)););

! The DROP(I) are 0 or 1;
 @FOR( OBS( I): @BIN( DROP( I)));

! Constraints limit number of nonzero weights;
 @FOR( TEST( K) | K #NE# DEPVAR:
   WGT( K) <= WGTMX*ZUSE( K);
   -WGT( K) <= WGTMX*ZUSE( K);
   @BIN( ZUSE( K));
 );
 @SUM( TEST( K) | K #NE# DEPVAR: ZUSE( K)) <= WGTSUSEDMX;
ENDSUBMODEL
CALC:
 @SOLVE( DISCRAMP);

! Get ready to plot a 2-dimensional subspace;
! Set D1, D2 = the 2 dimensions used;
 D1 = 0;
 @FOR( TEST( K) | ZUSE( k) #GT# 0.5:
   @IFC( D1 #EQ# 0:
     D1 = K;
   @ELSE
     D2 = K;
   );
 );

! Create set of the GOOD ones, with 2 dimensions in X1, Y1;
 @FOR( OBS( I) | TSCR( I, DEPVAR) #EQ# 1:
   @INSERT( OBS1, I);
   X1( I) = TSCR( I, D1);
   Y1( I) = TSCR( I, D2);
 );

! Create set of the BAD ones, with 2 dimensions in X2, Y2;
 @FOR( OBS( I) | TSCR( I, DEPVAR) #EQ# 0:
   @INSERT( OBS2, I);
   X2( I) = TSCR( I, D1);
   Y2( I) = TSCR( I, D2);
 );

! Report the constant and the weight on each measure;
 @WRITE( ' Measure WGT', @NEWLINE(1));
 @WRITE( ' CONSTANT ', @FORMAT( WGT0, '10.3f'), @NEWLINE(1));
 @FOR( TEST( J) | J #NE# DEPVAR:
   @WRITE( @FORMAT( TEST( J),'9s'), @FORMAT( WGT( J), '10.3f'), @NEWLINE(1));
 );
 @WRITE( @NEWLINE(1));
 @WRITE(' If CONSTANT + @SUM( TEST( j): WGT(j)*TSCR(i,j)) >= 0,', @NEWLINE(1));
 @WRITE(' Then predict as GOOD, else Predict as BAD.', @NEWLINE(1));
 @WRITE( @NEWLINE(1),'Number items incorrectly predicted= ', OBJV, @NEWLINE(1));

! Now do a scatter plot;
 @CHARTSCATTER(
   'Swiss Bank Notes: Good vs. Counterfeit', !Chart title;
   @FORMAT(TEST(D1),"7s")+' MEASURE',        !Legend for X axis;
   @FORMAT(TEST(D2),"7s")+' MEASURE',        !Legend for Y axis;
   'Good', x1, y1,                           !Point set 1;
   'Counterfeit', x2, y2);                   !Point set 2;
ENDCALC