! Discriminant analysis by integer programming (DiscrmSwiss.lng);
! This is a form of categorical regression in which the dependent
variable is a categorical variable, e.g., Good or Bad.
Basic idea:
Given values of various characteristics of an object,
predict its category, e.g.,
Is a prospective customer a good credit risk, or bad?
Is a paper banknote good or counterfeit?
Does a patient have a certain disease or not?
We compute the weights in a scoring formula, so that
Score(i) >= 0 implies a good item, < 0 implies bad.
There are various objectives one can use in finding an
optimal scoring function. Here the objective is to
minimize the number of misclassifications;
! Keywords: Discriminant analysis, Classification, Clustering,
Categorical regression, ChartScatter, Data Mining, Grouping,
Scatter chart, Statistics;
SETS:
TEST: WGT, ZUSE;
OBS: DROP, SCORE;
OXT(OBS, TEST): TSCR;
OBS1(OBS): X1, Y1;
OBS2(OBS): X2, Y2;
ENDSETS
DATA:
! Genuine and counterfeit banknotes (100 Swiss francs),
various measurements.
Banknotes BN1 to BN100 are genuine (Good=1),
all others are counterfeit (Good=0).
Dataset courtesy of H. Riedwyl, Bern, Switzerland;
WGTSUSEDMX = 2; ! Max # of weights to use;
WGTMX = 99999; ! Max absolute value of any weight;
DEPVAR = 7; ! Index of the dependent variable (Good);
TEST=
Length Left Right Bottom Top Diagonal Good;
OBS, TSCR=
BN1 214.8 131.0 131.1 9.0 9.7 141.0 1
BN2 214.6 129.7 129.7 8.1 9.5 141.7 1
BN3 214.8 129.7 129.7 8.7 9.6 142.2 1
BN4 214.8 129.7 129.6 7.5 10.4 142.0 1
BN5 215.0 129.6 129.7 10.4 7.7 141.8 1
BN6 215.7 130.8 130.5 9.0 10.1 141.4 1
BN7 215.5 129.5 129.7 7.9 9.6 141.6 1
BN8 214.5 129.6 129.2 7.2 10.7 141.7 1
BN9 214.9 129.4 129.7 8.2 11.0 141.9 1
BN10 215.2 130.4 130.3 9.2 10.0 140.7 1
BN11 215.3 130.4 130.3 7.9 11.7 141.8 1
BN12 215.1 129.5 129.6 7.7 10.5 142.2 1
BN13 215.2 130.8 129.6 7.9 10.8 141.4 1
BN14 214.7 129.7 129.7 7.7 10.9 141.7 1
BN15 215.1 129.9 129.7 7.7 10.8 141.8 1
BN16 214.5 129.8 129.8 9.3 8.5 141.6 1
BN17 214.6 129.9 130.1 8.2 9.8 141.7 1
BN18 215.0 129.9 129.7 9.0 9.0 141.9 1
BN19 215.2 129.6 129.6 7.4 11.5 141.5 1
BN20 214.7 130.2 129.9 8.6 10.0 141.9 1
BN21 215.0 129.9 129.3 8.4 10.0 141.4 1
BN22 215.6 130.5 130.0 8.1 10.3 141.6 1
BN23 215.3 130.6 130.0 8.4 10.8 141.5 1
BN24 215.7 130.2 130.0 8.7 10.0 141.6 1
BN25 215.1 129.7 129.9 7.4 10.8 141.1 1
BN26 215.3 130.4 130.4 8.0 11.0 142.3 1
BN27 215.5 130.2 130.1 8.9 9.8 142.4 1
BN28 215.1 130.3 130.3 9.8 9.5 141.9 1
BN29 215.1 130.0 130.0 7.4 10.5 141.8 1
BN30 214.8 129.7 129.3 8.3 9.0 142.0 1
BN31 215.2 130.1 129.8 7.9 10.7 141.8 1
BN32 214.8 129.7 129.7 8.6 9.1 142.3 1
BN33 215.0 130.0 129.6 7.7 10.5 140.7 1
BN34 215.6 130.4 130.1 8.4 10.3 141.0 1
BN35 215.9 130.4 130.0 8.9 10.6 141.4 1
BN36 214.6 130.2 130.2 9.4 9.7 141.8 1
BN37 215.5 130.3 130.0 8.4 9.7 141.8 1
BN38 215.3 129.9 129.4 7.9 10.0 142.0 1
BN39 215.3 130.3 130.1 8.5 9.3 142.1 1
BN40 213.9 130.3 129.0 8.1 9.7 141.3 1
BN41 214.4 129.8 129.2 8.9 9.4 142.3 1
BN42 214.8 130.1 129.6 8.8 9.9 140.9 1
BN43 214.9 129.6 129.4 9.3 9.0 141.7 1
BN44 214.9 130.4 129.7 9.0 9.8 140.9 1
BN45 214.8 129.4 129.1 8.2 10.2 141.0 1
BN46 214.3 129.5 129.4 8.3 10.2 141.8 1
BN47 214.8 129.9 129.7 8.3 10.2 141.5 1
BN48 214.8 129.9 129.7 7.3 10.9 142.0 1
BN49 214.6 129.7 129.8 7.9 10.3 141.1 1
BN50 214.5 129.0 129.6 7.8 9.8 142.0 1
BN51 214.6 129.8 129.4 7.2 10.0 141.3 1
BN52 215.3 130.6 130.0 9.5 9.7 141.1 1
BN53 214.5 130.1 130.0 7.8 10.9 140.9 1
BN54 215.4 130.2 130.2 7.6 10.9 141.6 1
BN55 214.5 129.4 129.5 7.9 10.0 141.4 1
BN56 215.2 129.7 129.4 9.2 9.4 142.0 1
BN57 215.7 130.0 129.4 9.2 10.4 141.2 1
BN58 215.0 129.6 129.4 8.8 9.0 141.1 1
BN59 215.1 130.1 129.9 7.9 11.0 141.3 1
BN60 215.1 130.0 129.8 8.2 10.3 141.4 1
BN61 215.1 129.6 129.3 8.3 9.9 141.6 1
BN62 215.3 129.7 129.4 7.5 10.5 141.5 1
BN63 215.4 129.8 129.4 8.0 10.6 141.5 1
BN64 214.5 130.0 129.5 8.0 10.8 141.4 1
BN65 215.0 130.0 129.8 8.6 10.6 141.5 1
BN66 215.2 130.6 130.0 8.8 10.6 140.8 1
BN67 214.6 129.5 129.2 7.7 10.3 141.3 1
BN68 214.8 129.7 129.3 9.1 9.5 141.5 1
BN69 215.1 129.6 129.8 8.6 9.8 141.8 1
BN70 214.9 130.2 130.2 8.0 11.2 139.6 1
BN71 213.8 129.8 129.5 8.4 11.1 140.9 1
BN72 215.2 129.9 129.5 8.2 10.3 141.4 1
BN73 215.0 129.6 130.2 8.7 10.0 141.2 1
BN74 214.4 129.9 129.6 7.5 10.5 141.8 1
BN75 215.2 129.9 129.7 7.2 10.6 142.1 1
BN76 214.1 129.6 129.3 7.6 10.7 141.7 1
BN77 214.9 129.9 130.1 8.8 10.0 141.2 1
BN78 214.6 129.8 129.4 7.4 10.6 141.0 1
BN79 215.2 130.5 129.8 7.9 10.9 140.9 1
BN80 214.6 129.9 129.4 7.9 10.0 141.8 1
BN81 215.1 129.7 129.7 8.6 10.3 140.6 1
BN82 214.9 129.8 129.6 7.5 10.3 141.0 1
BN83 215.2 129.7 129.1 9.0 9.7 141.9 1
BN84 215.2 130.1 129.9 7.9 10.8 141.3 1
BN85 215.4 130.7 130.2 9.0 11.1 141.2 1
BN86 215.1 129.9 129.6 8.9 10.2 141.5 1
BN87 215.2 129.9 129.7 8.7 9.5 141.6 1
BN88 215.0 129.6 129.2 8.4 10.2 142.1 1
BN89 214.9 130.3 129.9 7.4 11.2 141.5 1
BN90 215.0 129.9 129.7 8.0 10.5 142.0 1
BN91 214.7 129.7 129.3 8.6 9.6 141.6 1
BN92 215.4 130.0 129.9 8.5 9.7 141.4 1
BN93 214.9 129.4 129.5 8.2 9.9 141.5 1
BN94 214.5 129.5 129.3 7.4 10.7 141.5 1
BN95 214.7 129.6 129.5 8.3 10.0 142.0 1
BN96 215.6 129.9 129.9 9.0 9.5 141.7 1
BN97 215.0 130.4 130.3 9.1 10.2 141.1 1
BN98 214.4 129.7 129.5 8.0 10.3 141.2 1
BN99 215.1 130.0 129.8 9.1 10.2 141.5 1
BN100 214.7 130.0 129.4 7.8 10.0 141.2 1
BN101 214.4 130.1 130.3 9.7 11.7 139.8 0
BN102 214.9 130.5 130.2 11.0 11.5 139.5 0
BN103 214.9 130.3 130.1 8.7 11.7 140.2 0
BN104 215.0 130.4 130.6 9.9 10.9 140.3 0
BN105 214.7 130.2 130.3 11.8 10.9 139.7 0
BN106 215.0 130.2 130.2 10.6 10.7 139.9 0
BN107 215.3 130.3 130.1 9.3 12.1 140.2 0
BN108 214.8 130.1 130.4 9.8 11.5 139.9 0
BN109 215.0 130.2 129.9 10.0 11.9 139.4 0
BN110 215.2 130.6 130.8 10.4 11.2 140.3 0
BN111 215.2 130.4 130.3 8.0 11.5 139.2 0
BN112 215.1 130.5 130.3 10.6 11.5 140.1 0
BN113 215.4 130.7 131.1 9.7 11.8 140.6 0
BN114 214.9 130.4 129.9 11.4 11.0 139.9 0
BN115 215.1 130.3 130.0 10.6 10.8 139.7 0
BN116 215.5 130.4 130.0 8.2 11.2 139.2 0
BN117 214.7 130.6 130.1 11.8 10.5 139.8 0
BN118 214.7 130.4 130.1 12.1 10.4 139.9 0
BN119 214.8 130.5 130.2 11.0 11.0 140.0 0
BN120 214.4 130.2 129.9 10.1 12.0 139.2 0
BN121 214.8 130.3 130.4 10.1 12.1 139.6 0
BN122 215.1 130.6 130.3 12.3 10.2 139.6 0
BN123 215.3 130.8 131.1 11.6 10.6 140.2 0
BN124 215.1 130.7 130.4 10.5 11.2 139.7 0
BN125 214.7 130.5 130.5 9.9 10.3 140.1 0
BN126 214.9 130.0 130.3 10.2 11.4 139.6 0
BN127 215.0 130.4 130.4 9.4 11.6 140.2 0
BN128 215.5 130.7 130.3 10.2 11.8 140.0 0
BN129 215.1 130.2 130.2 10.1 11.3 140.3 0
BN130 214.5 130.2 130.6 9.8 12.1 139.9 0
BN131 214.3 130.2 130.0 10.7 10.5 139.8 0
BN132 214.5 130.2 129.8 12.3 11.2 139.2 0
BN133 214.9 130.5 130.2 10.6 11.5 139.9 0
BN134 214.6 130.2 130.4 10.5 11.8 139.7 0
BN135 214.2 130.0 130.2 11.0 11.2 139.5 0
BN136 214.8 130.1 130.1 11.9 11.1 139.5 0
BN137 214.6 129.8 130.2 10.7 11.1 139.4 0
BN138 214.9 130.7 130.3 9.3 11.2 138.3 0
BN139 214.6 130.4 130.4 11.3 10.8 139.8 0
BN140 214.5 130.5 130.2 11.8 10.2 139.6 0
BN141 214.8 130.2 130.3 10.0 11.9 139.3 0
BN142 214.7 130.0 129.4 10.2 11.0 139.2 0
BN143 214.6 130.2 130.4 11.2 10.7 139.9 0
BN144 215.0 130.5 130.4 10.6 11.1 139.9 0
BN145 214.5 129.8 129.8 11.4 10.0 139.3 0
BN146 214.9 130.6 130.4 11.9 10.5 139.8 0
BN147 215.0 130.5 130.4 11.4 10.7 139.9 0
BN148 215.3 130.6 130.3 9.3 11.3 138.1 0
BN149 214.7 130.2 130.1 10.7 11.0 139.4 0
BN150 214.9 129.9 130.0 9.9 12.3 139.4 0
BN151 214.9 130.3 129.9 11.9 10.6 139.8 0
BN152 214.6 129.9 129.7 11.9 10.1 139.0 0
BN153 214.6 129.7 129.3 10.4 11.0 139.3 0
BN154 214.5 130.1 130.1 12.1 10.3 139.4 0
BN155 214.5 130.3 130.0 11.0 11.5 139.5 0
BN156 215.1 130.0 130.3 11.6 10.5 139.7 0
BN157 214.2 129.7 129.6 10.3 11.4 139.5 0
BN158 214.4 130.1 130.0 11.3 10.7 139.2 0
BN159 214.8 130.4 130.6 12.5 10.0 139.3 0
BN160 214.6 130.6 130.1 8.1 12.1 137.9 0
BN161 215.6 130.1 129.7 7.4 12.2 138.4 0
BN162 214.9 130.5 130.1 9.9 10.2 138.1 0
BN163 214.6 130.1 130.0 11.5 10.6 139.5 0
BN164 214.7 130.1 130.2 11.6 10.9 139.1 0
BN165 214.3 130.3 130.0 11.4 10.5 139.8 0
BN166 215.1 130.3 130.6 10.3 12.0 139.7 0
BN167 216.3 130.7 130.4 10.0 10.1 138.8 0
BN168 215.6 130.4 130.1 9.6 11.2 138.6 0
BN169 214.8 129.9 129.8 9.6 12.0 139.6 0
BN170 214.9 130.0 129.9 11.4 10.9 139.7 0
BN171 213.9 130.7 130.5 8.7 11.5 137.8 0
BN172 214.2 130.6 130.4 12.0 10.2 139.6 0
BN173 214.8 130.5 130.3 11.8 10.5 139.4 0
BN174 214.8 129.6 130.0 10.4 11.6 139.2 0
BN175 214.8 130.1 130.0 11.4 10.5 139.6 0
BN176 214.9 130.4 130.2 11.9 10.7 139.0 0
BN177 214.3 130.1 130.1 11.6 10.5 139.7 0
BN178 214.5 130.4 130.0 9.9 12.0 139.6 0
BN179 214.8 130.5 130.3 10.2 12.1 139.1 0
BN180 214.5 130.2 130.4 8.2 11.8 137.8 0
BN181 215.0 130.4 130.1 11.4 10.7 139.1 0
BN182 214.8 130.6 130.6 8.0 11.4 138.7 0
BN183 215.0 130.5 130.1 11.0 11.4 139.3 0
BN184 214.6 130.5 130.4 10.1 11.4 139.3 0
BN185 214.7 130.2 130.1 10.7 11.1 139.5 0
BN186 214.7 130.4 130.0 11.5 10.7 139.4 0
BN187 214.5 130.4 130.0 8.0 12.2 138.5 0
BN188 214.8 130.0 129.7 11.4 10.6 139.2 0
BN189 214.8 129.9 130.2 9.6 11.9 139.4 0
BN190 214.6 130.3 130.2 12.7 9.1 139.2 0
BN191 215.1 130.2 129.8 10.2 12.0 139.4 0
BN192 215.4 130.5 130.6 8.8 11.0 138.6 0
BN193 214.7 130.3 130.2 10.8 11.1 139.2 0
BN194 215.0 130.5 130.3 9.6 11.0 138.5 0
BN195 214.9 130.3 130.5 11.6 10.6 139.8 0
BN196 215.0 130.4 130.3 9.9 12.1 139.6 0
BN197 215.1 130.3 129.9 10.3 11.5 139.7 0
BN198 214.8 130.3 130.4 10.6 11.1 140.0 0
BN199 214.7 130.7 130.8 11.2 11.2 139.4 0
BN200 214.3 129.9 129.9 10.2 11.5 139.6 0
;
ENDDATA
SUBMODEL DISCRAMP:
! Minimize number of observations dropped to get a partition;
MIN = OBJV;
OBJV = @SUM( OBS( I): DROP( I));
! For bad observations, if DROP(I)=0, we want a strictly negative score;
@FOR( OBS(I)| TSCR( I, DEPVAR) #EQ# 0:
SCORE( I) <= -1 + WGTMX*DROP( I);
SCORE( I) >= - WGTMX*(1- DROP(I));
SCORE( I) =
WGT0 + @SUM( TEST( J) | J #NE# DEPVAR: WGT( J)* TSCR(I,J));
@FREE( SCORE(I));
);
! For good observations, if DROP(I)=0, we want a strictly positive score;
@FOR( OBS(I)| TSCR( I, DEPVAR) #EQ# 1:
SCORE( I) >= 1 - WGTMX*DROP( I);
SCORE( I) <= WGTMX*(1-DROP(I));
SCORE( I) =
WGT0 + @SUM( TEST( J) | J #NE# DEPVAR: WGT( J)* TSCR(I,J));
@FREE( SCORE(I));
);
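! Worked reading of the big-M constraints above (an explanatory note,
  not part of the original model). For a bad observation:
    DROP(I) = 0 gives  -WGTMX <= SCORE(I) <= -1         (kept, scored bad),
    DROP(I) = 1 gives   0     <= SCORE(I) <= WGTMX - 1  (dropped).
  For a good observation:
    DROP(I) = 0 gives   1         <= SCORE(I) <= WGTMX  (kept, scored good),
    DROP(I) = 1 gives   1 - WGTMX <= SCORE(I) <= 0      (dropped).
  This reading assumes that the absolute value of SCORE(I) never needs
  to exceed WGTMX, which is a property of the data and weights rather
  than something the model enforces separately;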
@FREE( WGT0);
@FOR( TEST( J): @FREE( WGT( J));); ! The WGT(J) are unrestricted in sign;
@FOR( OBS(I): @BIN( DROP(I)) ! The DROP(I) are 0 or 1;
);
! Constraints limit number of nonzero weights;
@FOR( TEST( K) | K #NE# DEPVAR:
WGT( K) <= WGTMX*ZUSE( K);
-WGT( K) <= WGTMX*ZUSE( K);
@BIN( ZUSE( K));
);
@SUM( TEST( K) | K #NE# DEPVAR: ZUSE( K)) <= WGTSUSEDMX;
ENDSUBMODEL
CALC:
@SOLVE( DISCRAMP);
! Get ready to plot a 2-dimensional projection of the data;
! Set D1, D2 = 2 dimensions used;
D1 = 0;
! Skip DEPVAR, because ZUSE( DEPVAR) is not determined by the submodel;
@FOR( TEST( K) | K #NE# DEPVAR #AND# ZUSE( K) #GT# 0.5:
@IFC( D1 #EQ# 0:
D1 = K;
@ELSE
D2 = K;
);
);
! Create set of the GOOD ones, with 2 dimensions in X1, Y1;
@FOR( OBS(I) | TSCR( I, DEPVAR) #EQ# 1:
@INSERT( OBS1, I);
X1( I) = TSCR(I,D1);
Y1( I) = TSCR(I,D2);
);
! Create set of the BAD ones, with 2 dimensions in X2, Y2;
@FOR( OBS(I) | TSCR( I, DEPVAR) #EQ# 0:
@INSERT( OBS2, I);
X2( I) = TSCR(I,D1);
Y2( I) = TSCR(I,D2);
);
@WRITE( ' Measure WGT', @NEWLINE(1));
@WRITE( ' CONSTANT ', @FORMAT( WGT0, '10.3f'), @NEWLINE(1));
@FOR( TEST( J) | J #NE# DEPVAR:
@WRITE( @FORMAT( TEST( J),'9s'), @FORMAT( WGT( J), '10.3f'), @NEWLINE(1));
);
@WRITE( @NEWLINE(1));
@WRITE(' If CONSTANT + @SUM( TEST( j): WGT(j)*TSCR(i,j)) >= 0,', @NEWLINE(1));
@WRITE(' Then predict as GOOD, else Predict as BAD.', @NEWLINE(1));
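! A minimal sketch, not in the original model: apply the fitted rule to
  one observation to show how a prediction is made. ISHOW and SSHOW are
  hypothetical names introduced here for illustration, and the choice of
  observation 1 is arbitrary;
ISHOW = 1;
SSHOW = WGT0 + @SUM( TEST( J) | J #NE# DEPVAR: WGT( J)* TSCR( ISHOW, J));
@IFC( SSHOW #GE# 0:
   @WRITE( ' Observation ', ISHOW, ' has score ', @FORMAT( SSHOW, '10.3f'),
           ', so predict GOOD.', @NEWLINE(1));
@ELSE
   @WRITE( ' Observation ', ISHOW, ' has score ', @FORMAT( SSHOW, '10.3f'),
           ', so predict BAD.', @NEWLINE(1));
);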
@WRITE( @NEWLINE(1),'Number of items incorrectly predicted = ', OBJV, @NEWLINE(1));
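! A hedged cross-check, not in the original model: recount the errors
  by applying the printed rule (score >= 0 means GOOD) to every
  observation. NWRONG is a hypothetical name introduced here. The
  recount should normally agree with OBJV, but it can be smaller when a
  dropped good observation has a score of exactly zero, and solver
  tolerances near zero can also shift it;
NWRONG = 0;
@FOR( OBS( I):
   @IFC( ( SCORE( I) #GE# 0 #AND# TSCR( I, DEPVAR) #EQ# 0) #OR#
         ( SCORE( I) #LT# 0 #AND# TSCR( I, DEPVAR) #EQ# 1):
      NWRONG = NWRONG + 1;
   );
);
@WRITE( 'Recount of items incorrectly predicted = ', NWRONG, @NEWLINE(1));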
! Now do a scatter plot;
@CHARTSCATTER( 'Swiss Bank Notes: Good vs. Counterfeit', !Chart title;
@FORMAT(TEST(D1),"7s")+' MEASURE', !Legend for X axis;
@FORMAT(TEST(D2),"7s")+' MEASURE', !Legend for Y axis;
'Good', x1, y1, !Point set 1;
'Counterfeit', x2, y2); !Point set 2;
ENDCALC