  ! The sample weighting problem.               (SampBal.lng)
Suppose you have a sample of 1000 people, classified according to
the two dimensions:
  Gender: female or male, and 
  Income:  Low, MediumLow, Medium, or High
In our target population, the "rim" fractions are:
Female: 50%,  Male: 50%,
LowIncome: 40%, MediumLow: 30%, Medium: 20%, High: 10%.
Our sample of 1000 has the following number in the various cells:
           Low    MediumLow   Medium   High
  Female:  220 	  147	       106	    55
    Male:  193	  149         94	    36

We want to predict a dependent variable, e.g., how the population will vote
on a particular issue.
   We say the sample is representative if the fraction of the sample at
a given income level is the same as in the target population, the
fraction of the sample that is female is the same as in the target population, etc.
   Our sample above is not representative of the target population in this sense.
Notice that (220 + 193)/1000 = 0.413 > 0.4 so the first income level is over-represented.
Similarly, (220 + 147 + 106 + 55)/1000 = 0.528 > 0.5
so females are over-represented.
    One thing we could do is to discard "unrepresentative"  observations 
from the sample so that the remaining observations accurately match the target population. 
Thus, we have a smaller sample size. We want to drop the smallest number
of observations to achieve a close match to the population.
A slightly more general approach is to not completely drop an observation,
but rather, reduce the weights given to "unrepresentative" observations. 
Which weights should be adjusted, and by how much? 
   If the sample is not representative, we may want to choose a weight other than 1.0 
for each observation so that the weighted fractions of the sample match the
population fractions. For our example there are 2 + 4 = 6 target fractions to
be matched.  There are 1000 different weights to be chosen (or 8 weights if we
apply the same weight to all observations in the same cell), so there are lots of
different weight combinations that match the target population fractions.
Which weight combination should we choose?
The "Max effective sample Size" (MS) approach chooses the weights that
   a) match the population targets, and
   b) minimize the variance of the resulting estimate, or equivalently,
      maximize the size of an equivalent representative sample that has
      the same variance.
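      In the notation of the model below, if cell c contains n(c) observations
      and is given weight w(c), with the weights scaled so that the weighted
      counts sum to the original sample size N, the effective sample size is
          N^2 / ( sum over cells c of n(c)*w(c)^2 ),
      which can also be written as  N^2 / ( N + sum of n(c)*(1 - w(c))^2 ).
      It equals N when every weight is 1 and shrinks as the weights become
      more uneven.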

 Here are some cell weights that perfectly match the rim targets.
            Low        MediumLow   Medium      High
   Female:  0.31       1.065170    1.700566    1.726545
     Male:  1.719171   0.96255     0.21        0.14
Notice that
(0.31*220 + 1.719171*193)/1000 = 400/1000 = 0.4
so the first income level is matched.
Are these the best possible weights?  It can be shown that these
weights have an effective sample size of 729, considerably lower than
the unweighted sample size of 1000.
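Using the formula above, the sum over cells of n(c)*w(c)^2 is
   220*0.31^2 + 147*1.065170^2 + 106*1.700566^2 + 55*1.726545^2
 + 193*1.719171^2 + 149*0.96255^2 + 94*0.21^2 + 36*0.14^2, which is about 1372,
so the effective sample size is 1000^2/1372, about 729.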
   If, on the other hand, you choose the following weights: 
           Low     MediumLow   Medium     High
  Female:  0.731    1.182      1.444     0.228
    Male:  1.240    0.847      0.499     2.430

the effective sample size is 843, considerably better than 729.
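These weights also essentially match the rim targets, apart from rounding the
weights to three decimals. For example (0.731*220 + 1.240*193)/1000 = 400.1/1000,
about 0.4 for the first income level, and the sum of n(c)*w(c)^2 is about 1186,
giving an effective sample size of 1000^2/1186, about 843.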
   There are various intermediate positions, in the sense that we
could consider the spectrum ranging from 
  placing great importance on closely matching the population rim targets, to
  not letting the weights vary too much from 1 while 
      not exactly matching the population target fractions.
In either case, we want to minimize the variance of our estimator.
If you are willing to allow a very slight violation of the rim targets,
then the following weights:
           Low     MediumLow   Medium     High
  Female:  0.940    0.971      0.96      1.059
    Male:  1.015    1.045      1.038     1.134
have an effective sample size of 997, almost as good as 1000.
It does a fairly good job of matching the rim targets as follows:
 Achieved rim target percent
 Variable  Level  WgtPercent
   GENDER    1    51.0
   GENDER    2    49.0
   INCOME    1    40.3
   INCOME    2    29.9
   INCOME    3    20.0
   INCOME    4     9.9
        
For example:  (0.940*220 + 1.015*193) = 402.695, and 100*402.695/1000 = 40.3.

! Ref: Madansky, A. and L. Schrage (2017), "Maximizing effective sample size
 while sample balancing," Quirks Marketing Research Review;

! Keywords:  Marketing, Quadratic optimization, Sampling, Statistics,
   Sample balancing, Survey analysis, Re-weighting, Raking;
sets:
 CELL :
   count, HWgt, ! HWgt is just for comparison;
   weightcell, 
   income, age, region, achvcountc;
 variable;
 level;
 VXL( variable, level): targpcent, achvcount, sampno,
                                   achvcounth;
 CXV( CELL, variable): Lnum, VarLvl;
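 ! VarLvl( c, i) = level taken by explanatory variable i in cell c;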
endsets
data:

!Case 2x4: We sampled 1000 people. There are two explanatory variables,
Gender and Income level.  In the target population we expect the percentages:
   Female: 50%
     Male: 50%
  Income1: 40%
  Income2: 30%
  Income3: 20%
  Income4: 10%
; 
! Minimizing rim error is all that matters;
!Case2x4;  alpha = 1;
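! Note on the case convention used below: a case's data statements are live only
  when a semicolon immediately follows its tag, as in the Case2x4 lines here.
  The other cases have no semicolon after their tags, so each tag starts a comment
  that runs on through the data that follows. To run a different case, move the
  trailing semicolon from the Case2x4 tags to the tags of that case;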

!Case2x4;  level = 1 2 3 4;

!Case2x4;  variable = Gender Income;

! The population rim target percentages for
  each variable and level;
!Case2x4;
    VXL  Targpcent =
  Gender 1  50
  Gender 2  50
  Income 1  40
  Income 2  30
  Income 3  20
  Income 4  10;

! Cell number and combination of levels sampled;
!Case2x4;
CELL VarLvl=	
  1   1 1 
  2   1 2 
  3   1 3 
  4   1 4 
  5   2 1 
  6   2 2 
  7   2 3 
  8   2 4 
   ;

! Count in each cell & heuristic weights.
The actual numbers in our sample for each of
the  2 genders and 4 income levels are below.
Notice that (220 + 193)/1000 = 0.413 > 0.4 so
the first income level is over-represented.
Similarly, (220 + 147 + 106 + 55)/1000 = 0.528 > 0.5
so the first gender is over-represented.;
!Case2x4;
count	 =
220	147	106	55	
193	149	94	36	
;
! Here are some cell weights that do perfectly
match the rim targets. Note that
(0.31*220 + 1.719171*193)/1000 = 400/1000 = 0.4
so the first income level is matched;
!Case2x4;
HWgt =
0.31	    1.065170	1.700566	1.726545	
1.719171  0.96255       0.21	      0.14	
;
! Various interesting settings of alpha;
!Case01   alpha = 1; 
!Case01  level = 1 2 3 4 5 6 7 8 9 10;
!Case01  variable = income_ age_ region_;
! The population rim target percentages;
!Case01 
   VXL,   targpcent =
 income_ 1  17.95
 income_ 2  23.20
 income_ 3  27.28
 income_ 4  14.34
 income_ 5  17.23
 age_    1   3.87
 age_    2   9.07
 age_    3  19.09
 age_    4  21.60
 age_    5  18.02
 age_    6   6.44
 age_    7   5.17
 age_    8   8.80
 age_    9   5.91
 age_   10   2.03
 region_   1   5.14
 region_   2  14.21
 region_   3  16.44
 region_   4   7.14
 region_   5  19.05
 region_   6   6.30
 region_   7  10.91
 region_   8   6.40
 region_   9  14.41 
;
! Sample data. This data set has
   5 possible income levels,
  10 age levels,
   9 regions, giving
 450 potential cells.
  ;
!Case01 
CELL income age region count   HWgt =
1	1	1	5	1	7.03868	
2	1	2	3	5	2.64602	
3	1	2	5	1	2.89112	
4	1	2	6	1	2.83734	
5	1	2	7	1	2.80156	
6	1	3	1	1	1.44292	
7	1	3	3	3	1.29764	
8	1	3	4	4	0.84916	
9	1	3	5	1	1.54273	
10	1	3	6	2	1.48895	
11	1	3	7	3	1.45317	
12	1	3	8	2	1.27067	
13	1	3	9	1	1.46797	
14	1	4	1	2	1.1352	
15	1	4	2	2	1.87272	
16	1	4	3	3	0.98992	
17	1	4	4	7	0.54144	
18	1	4	5	11	1.23501	
19	1	4	6	2	1.18123	
20	1	4	7	4	1.14545	
21	1	4	8	4	0.96296	
22	1	4	9	5	1.16026	
23	1	5	1	2	1.10644	
24	1	5	2	3	1.84395	
25	1	5	3	4	0.96116	
26	1	5	4	2	0.51268	
27	1	5	5	5	1.20625	
28	1	5	6	1	1.15247	
29	1	5	8	2	0.9342	
30	1	5	9	1	1.13149	
31	1	6	1	1	1.05367	
32	1	6	2	1	1.79119	
33	1	6	3	1	0.90839	
34	1	6	4	2	0.45992	
35	1	6	5	2	1.15349	
36	1	6	6	1	1.09971	
37	1	6	8	1	0.88143	
38	1	6	9	1	1.07873	
39	1	7	1	1	1.20926	
40	1	7	2	1	1.94677	
41	1	7	5	2	1.30907	
42	1	7	6	1	1.25529	
43	1	7	7	1	1.21951	
44	1	7	9	1	1.23431	
45	1	8	2	4	2.38434	
46	1	8	3	2	1.50154	
47	1	8	4	3	1.05307	
48	1	8	5	1	1.74664	
49	1	8	9	1	1.67188	
50	1	9	3	3	2.45789	
51	1	9	4	2	2.00941	
52	1	9	5	2	2.70298	
53	1	9	9	3	2.62822	
54	2	2	3	1	2.84731	
55	2	2	4	2	2.39883	
56	2	2	5	3	3.0924	
57	2	2	7	4	3.00284	
58	2	2	8	1	2.82035	
59	2	3	3	4	1.49892	
60	2	3	4	4	1.05045	
61	2	3	5	4	1.74402	
62	2	3	6	4	1.69024	
63	2	3	7	3	1.65446	
64	2	3	8	4	1.47196	
65	2	3	9	5	1.66926	
66	2	4	1	3	1.33649	
67	2	4	2	6	2.07401	
68	2	4	3	8	1.19121	
69	2	4	4	6	0.74273	
70	2	4	5	6	1.4363	
71	2	4	6	2	1.38252	
72	2	4	7	4	1.34674	
73	2	4	8	2	1.16425	
74	2	4	9	3	1.36154	
75	2	5	1	1	1.30773	
76	2	5	2	1	2.04524	
77	2	5	3	6	1.16245	
78	2	5	4	5	0.71397	
79	2	5	5	6	1.40754	
80	2	5	6	2	1.35376	
81	2	5	7	3	1.31798	
82	2	5	8	2	1.13548	
83	2	5	9	6	1.33278	
84	2	6	1	1	1.25496	
85	2	6	3	3	1.10968	
86	2	6	4	1	0.6612	
87	2	6	5	2	1.3547	
88	2	6	6	2	1.30099	
89	2	6	7	2	1.26521	
90	2	6	8	2	1.08272	
91	2	6	9	2	1.28002	
92	2	7	8	1	1.2383	
93	2	7	9	5	1.4356	
94	2	8	2	1	2.58563	
95	2	8	3	4	1.70283	
96	2	8	4	1	1.25435	
97	2	8	5	1	1.94792	
98	2	8	6	2	1.89414	
99	2	8	7	2	1.85836	
100	2	8	8	1	1.67587	
101	2	8	9	2	1.87317	
102	2	9	3	1	2.65918	
103	2	9	4	2	2.2107	
104	2	10	3	1	5.3832	
105	3	1	4	1	5.92843	
106	3	2	1	1	2.37462	
107	3	2	2	3	3.11213	
108	3	2	5	2	2.47443	
109	3	2	6	1	2.42065	
110	3	2	8	1	2.20238	
111	3	2	9	1	2.39967	
112	3	3	1	3	1.02623	
113	3	3	2	5	1.76375	
114	3	3	3	16	0.88095	
115	3	3	4	7	0.43247	
116	3	3	5	9	1.12604	
117	3	3	6	5	1.07226	
118	3	3	7	7	1.03648	
119	3	3	8	6	0.85399	
120	3	3	9	11	1.05128	
121	3	4	1	3	0.71852	
122	3	4	2	5	1.45603	
123	3	4	3	15	0.57324	
124	3	4	4	13	0.12476	
125	3	4	5	14	0.81833	
126	3	4	6	3	0.76455	
127	3	4	7	4	0.72877	
128	3	4	8	8	0.54628	
129	3	4	9	5	0.74357	
130	3	5	1	2	0.68975	
131	3	5	2	12	1.42727	
132	3	5	3	16	0.54447	
133	3	5	4	12	0.096	
134	3	5	5	14	0.78956	
135	3	5	6	3	0.73578	
136	3	5	7	9	0.70001	
137	3	5	8	3	0.51751	
138	3	5	9	8	0.71481	
139	3	6	1	1	0.63699	
140	3	6	2	2	1.3745	
141	3	6	3	5	0.49171	
142	3	6	4	4	0.04323	
143	3	6	5	7	0.7368	
144	3	6	6	2	0.68302	
145	3	6	7	2	0.64724	
146	3	6	8	2	0.46475	
147	3	6	9	2	0.66204	
148	3	7	1	2	0.79257	
149	3	7	2	3	1.53009	
150	3	7	3	9	0.64729	
151	3	7	4	5	0.19881	
152	3	7	5	2	0.89238	
153	3	7	6	2	0.8386	
154	3	7	8	4	0.62033	
155	3	7	9	2	0.81763	
156	3	8	1	1	1.23014	
157	3	8	2	1	1.96765	
158	3	8	3	2	1.08486	
159	3	8	4	3	0.63638	
160	3	8	5	1	1.32995	
161	3	8	8	2	1.0579	
162	3	8	9	2	1.25519	
163	3	9	2	1	2.924	
164	3	9	3	1	2.0412	
165	3	9	5	1	2.2863	
166	3	9	8	1	2.01424	
167	3	9	9	3	2.21154	
168	3	10	4	1	4.31675	
169	3	10	5	1	5.01032	
170	4	1	2	1	7.0697	
171	4	2	3	1	2.03933	
172	4	2	5	1	2.28442	
173	4	2	7	1	2.19486	
174	4	3	1	5	0.83622	
175	4	3	2	2	1.57374	
176	4	3	3	4	0.69094	
177	4	3	4	7	0.24247	
178	4	3	5	7	0.93604	
179	4	3	6	1	0.88226	
180	4	3	7	3	0.84648	
181	4	3	8	2	0.66398	
182	4	3	9	10	0.86128	
183	4	4	1	3	0.52851	
184	4	4	2	8	1.26603	
185	4	4	3	14	0.38323	
186	4	4	4	5	0.001	
187	4	4	5	8	0.62832	
188	4	4	6	2	0.57454	
189	4	4	7	7	0.53876	
190	4	4	8	8	0.35627	
191	4	4	9	9	0.55357	
192	4	5	1	3	0.49975	
193	4	5	2	4	1.23726	
194	4	5	3	17	0.35447	
195	4	5	4	7	0.001	
196	4	5	5	11	0.59956	
197	4	5	6	2	0.54578	
198	4	5	7	6	0.51	
199	4	5	8	3	0.32751	
200	4	5	9	6	0.5248	
201	4	6	1	2	0.44698	
202	4	6	3	7	0.3017	
203	4	6	5	3	0.54679	
204	4	6	7	4	0.45723	
205	4	6	8	4	0.27474	
206	4	6	9	4	0.47204	
207	4	7	3	2	0.45729	
208	4	7	4	2	0.00881	
209	4	7	5	1	0.70238	
210	4	7	9	1	0.62762	
211	4	8	1	1	1.04013	
212	4	8	2	1	1.77765	
213	4	8	3	2	0.89485	
214	4	8	4	2	0.44637	
215	4	8	5	8	1.13994	
216	4	8	6	2	1.08616	
217	4	8	7	1	1.05038	
218	4	8	8	2	0.86789	
219	4	8	9	2	1.06519	
220	4	9	4	1	1.40272	
221	4	9	6	1	2.04251	
222	5	1	3	1	6.31875	
223	5	1	4	1	5.87028	
224	5	1	7	1	6.47429	
225	5	2	1	1	2.31647	
226	5	2	7	1	2.32672	
227	5	2	9	1	2.34152	
228	5	3	1	2	0.96808	
229	5	3	3	5	0.8228	
230	5	3	4	4	0.37432	
231	5	3	5	3	1.06789	
232	5	3	6	1	1.01411	
233	5	3	7	6	0.97833	
234	5	3	8	3	0.79584	
235	5	3	9	6	0.99313	
236	5	4	1	3	0.66037	
237	5	4	2	4	1.39788	
238	5	4	3	5	0.51509	
239	5	4	4	6	0.06661	
240	5	4	5	11	0.76018	
241	5	4	6	4	0.7064	
242	5	4	7	10	0.67062	
243	5	4	8	5	0.48812	
244	5	4	9	14	0.68542	
245	5	5	1	4	0.6316	
246	5	5	2	4	1.36912	
247	5	5	3	15	0.48632	
248	5	5	4	4	0.03784	
249	5	5	5	11	0.73141	
250	5	5	6	5	0.67763	
251	5	5	7	7	0.64185	
252	5	5	8	2	0.45936	
253	5	5	9	6	0.65666	
254	5	6	2	3	1.31635	
255	5	6	3	1	0.43356	
256	5	6	4	1	0.001	
257	5	6	5	5	0.67865	
258	5	6	6	1	0.62487	
259	5	6	7	1	0.58909	
260	5	6	8	2	0.4066	
261	5	6	9	2	0.60389	
262	5	7	1	4	0.73442	
263	5	7	4	1	0.1406	
264	5	7	7	5	0.74467	
265	5	7	8	2	0.56218	
266	5	7	9	4	0.75948	
267	5	8	1	1	1.17199	
268	5	8	5	1	1.2718	
269	5	8	7	1	1.18224	
270	5	8	9	4	1.19704	
271	5	9	1	1	2.12833	
272	5	9	4	1	1.53458	
273	5	9	6	1	2.17437	
274	5	9	9	1	2.15339	
275	5	10	2	1	5.58987	
;


! Sample data. This data set has
   5 possible income levels,
  10 age levels,
   9 regions, giving
 450 potential cells.
  ;
!CaseOK    alpha = 1; 
!CaseOK   level = 1 2 3 4 5 6 7 8 9 10;
!CaseOK   variable = income_ age_ region_;
! The population rim target percentages;
!CaseOK  
   VXL,   targpcent =
 income_ 1  17.95
 income_ 2  23.20
 income_ 3  27.28
 income_ 4  14.34
 income_ 5  17.23
 age_    1   3.87
 age_    2   9.07
 age_    3  19.09
 age_    4  21.60
 age_    5  18.02
 age_    6   6.44
 age_    7   5.17
 age_    8   8.80
 age_    9   5.91
 age_   10   2.03
 region_   1   5.14
 region_   2  14.21
 region_   3  16.44
 region_   4   7.14
 region_   5  19.05
 region_   6   6.30
 region_   7  10.91
 region_   8   6.40
 region_   9  14.41 
;
! Cell number and combination of levels sampled;
!CaseOK 
CELL	  VarLvl=	
1	1	1	5	
2	1	2	3	
3	1	2	5	
4	1	2	6	
5	1	2	7	
6	1	3	1	
7	1	3	3	
8	1	3	4	
9	1	3	5	
10	1	3	6	
11	1	3	7	
12	1	3	8	
13	1	3	9	
14	1	4	1	
15	1	4	2	
16	1	4	3	
17	1	4	4	
18	1	4	5	
19	1	4	6	
20	1	4	7	
21	1	4	8	
22	1	4	9	
23	1	5	1	
24	1	5	2	
25	1	5	3	
26	1	5	4	
27	1	5	5	
28	1	5	6	
29	1	5	8	
30	1	5	9	
31	1	6	1	
32	1	6	2	
33	1	6	3	
34	1	6	4	
35	1	6	5	
36	1	6	6	
37	1	6	8	
38	1	6	9	
39	1	7	1	
40	1	7	2	
41	1	7	5	
42	1	7	6	
43	1	7	7	
44	1	7	9	
45	1	8	2	
46	1	8	3	
47	1	8	4	
48	1	8	5	
49	1	8	9	
50	1	9	3	
51	1	9	4	
52	1	9	5	
53	1	9	9	
54	2	2	3	
55	2	2	4	
56	2	2	5	
57	2	2	7	
58	2	2	8	
59	2	3	3	
60	2	3	4	
61	2	3	5	
62	2	3	6	
63	2	3	7	
64	2	3	8	
65	2	3	9	
66	2	4	1	
67	2	4	2	
68	2	4	3	
69	2	4	4	
70	2	4	5	
71	2	4	6	
72	2	4	7	
73	2	4	8	
74	2	4	9	
75	2	5	1	
76	2	5	2	
77	2	5	3	
78	2	5	4	
79	2	5	5	
80	2	5	6	
81	2	5	7	
82	2	5	8	
83	2	5	9	
84	2	6	1	
85	2	6	3	
86	2	6	4	
87	2	6	5	
88	2	6	6	
89	2	6	7	
90	2	6	8	
91	2	6	9	
92	2	7	8	
93	2	7	9	
94	2	8	2	
95	2	8	3	
96	2	8	4	
97	2	8	5	
98	2	8	6	
99	2	8	7	
100	2	8	8	
101	2	8	9	
102	2	9	3	
103	2	9	4	
104	2	10	3	
105	3	1	4	
106	3	2	1	
107	3	2	2	
108	3	2	5	
109	3	2	6	
110	3	2	8	
111	3	2	9	
112	3	3	1	
113	3	3	2	
114	3	3	3	
115	3	3	4	
116	3	3	5	
117	3	3	6	
118	3	3	7	
119	3	3	8	
120	3	3	9	
121	3	4	1	
122	3	4	2	
123	3	4	3	
124	3	4	4	
125	3	4	5	
126	3	4	6	
127	3	4	7	
128	3	4	8	
129	3	4	9	
130	3	5	1	
131	3	5	2	
132	3	5	3	
133	3	5	4	
134	3	5	5	
135	3	5	6	
136	3	5	7	
137	3	5	8	
138	3	5	9	
139	3	6	1	
140	3	6	2	
141	3	6	3	
142	3	6	4	
143	3	6	5	
144	3	6	6	
145	3	6	7	
146	3	6	8	
147	3	6	9	
148	3	7	1	
149	3	7	2	
150	3	7	3	
151	3	7	4	
152	3	7	5	
153	3	7	6	
154	3	7	8	
155	3	7	9	
156	3	8	1	
157	3	8	2	
158	3	8	3	
159	3	8	4	
160	3	8	5	
161	3	8	8	
162	3	8	9	
163	3	9	2	
164	3	9	3	
165	3	9	5	
166	3	9	8	
167	3	9	9	
168	3	10	4	
169	3	10	5	
170	4	1	2	
171	4	2	3	
172	4	2	5	
173	4	2	7	
174	4	3	1	
175	4	3	2	
176	4	3	3	
177	4	3	4	
178	4	3	5	
179	4	3	6	
180	4	3	7	
181	4	3	8	
182	4	3	9	
183	4	4	1	
184	4	4	2	
185	4	4	3	
186	4	4	4	
187	4	4	5	
188	4	4	6	
189	4	4	7	
190	4	4	8	
191	4	4	9	
192	4	5	1	
193	4	5	2	
194	4	5	3	
195	4	5	4	
196	4	5	5	
197	4	5	6	
198	4	5	7	
199	4	5	8	
200	4	5	9	
201	4	6	1	
202	4	6	3	
203	4	6	5	
204	4	6	7	
205	4	6	8	
206	4	6	9	
207	4	7	3	
208	4	7	4	
209	4	7	5	
210	4	7	9	
211	4	8	1	
212	4	8	2	
213	4	8	3	
214	4	8	4	
215	4	8	5	
216	4	8	6	
217	4	8	7	
218	4	8	8	
219	4	8	9	
220	4	9	4	
221	4	9	6	
222	5	1	3	
223	5	1	4	
224	5	1	7	
225	5	2	1	
226	5	2	7	
227	5	2	9	
228	5	3	1	
229	5	3	3	
230	5	3	4	
231	5	3	5	
232	5	3	6	
233	5	3	7	
234	5	3	8	
235	5	3	9	
236	5	4	1	
237	5	4	2	
238	5	4	3	
239	5	4	4	
240	5	4	5	
241	5	4	6	
242	5	4	7	
243	5	4	8	
244	5	4	9	
245	5	5	1	
246	5	5	2	
247	5	5	3	
248	5	5	4	
249	5	5	5	
250	5	5	6	
251	5	5	7	
252	5	5	8	
253	5	5	9	
254	5	6	2	
255	5	6	3	
256	5	6	4	
257	5	6	5	
258	5	6	6	
259	5	6	7	
260	5	6	8	
261	5	6	9	
262	5	7	1	
263	5	7	4	
264	5	7	7	
265	5	7	8	
266	5	7	9	
267	5	8	1	
268	5	8	5	
269	5	8	7	
270	5	8	9	
271	5	9	1	
272	5	9	4	
273	5	9	6	
274	5	9	9	
275	5	10	2 ;	

!CaseOK 
count	HWgt =	
1	7.031564	
5	2.643345	
1	2.888197	
1	2.834471	
1	2.798728	
1	1.441461	
3	1.296328	
4	0.848302	
1	1.54117	
2	1.487445	
3	1.451701	
2	1.269385	
1	1.466486	
2	1.134052	
2	1.870827	
3	0.988919	
7	0.540893	
11	1.233761	
2	1.180036	
4	1.144292	
4	0.961986	
5	1.159087	
2	1.105321	
3	1.842086	
4	0.960188	
2	0.512162	
5	1.20503	
1	1.151305	
2	0.933256	
1	1.130346	
1	1.052605	
1	1.789379	
1	0.907472	
2	0.459455	
2	1.152324	
1	1.098598	
1	0.880539	
1	1.077639	
1	1.208037	
1	1.944802	
2	1.307747	
1	1.254021	
1	1.218277	
1	1.233062	
4	2.381929	
2	1.500022	
3	1.052005	
1	1.744874	
1	1.67019	
3	2.455405	
2	2.007379	
2	2.700247	
3	2.625563	
1	2.844431	
2	2.396405	
3	3.089274	
4	2.999804	
1	2.817499	
4	1.497405	
4	1.049388	
4	1.742257	
4	1.688531	
3	1.652787	
4	1.470472	
5	1.667572	
3	1.335139	
6	2.071913	
8	1.190006	
6	0.741979	
6	1.434848	
2	1.381122	
4	1.345378	
2	1.163073	
3	1.360164	
1	1.306408	
1	2.043172	
6	1.161275	
5	0.713248	
6	1.406117	
2	1.352391	
3	1.316648	
2	1.134332	
6	1.331433	
1	1.253691	
3	1.108558	
1	0.660532	
2	1.35333	
2	1.299675	
2	1.263931	
2	1.081625	
2	1.278726	
1	1.237048	
5	1.434149	
1	2.583016	
4	1.701108	
1	1.253082	
1	1.945951	
2	1.892225	
2	1.856481	
1	1.674176	
2	1.871276	
1	2.656492	
2	2.208465	
1	5.377758	
1	5.922436	
1	2.372219	
3	3.108984	
2	2.471928	
1	2.418203	
1	2.200153	
1	2.397244	
3	1.025192	
5	1.761967	
16	0.880059	
7	0.432033	
9	1.124902	
5	1.071176	
7	1.035432	
6	0.853127	
11	1.050217	
3	0.717794	
5	1.454558	
15	0.57266	
13	0.124634	
14	0.817503	
3	0.763777	
4	0.728033	
8	0.545728	
5	0.742818	
2	0.689053	
12	1.425827	
16	0.54392	
12	0.095903	
14	0.788762	
3	0.735036	
9	0.699302	
3	0.516987	
8	0.714087	
1	0.636346	
2	1.37311	
5	0.491213	
4	0.043186	
7	0.736055	
2	0.682329	
2	0.646586	
2	0.46428	
2	0.661371	
2	0.791769	
3	1.528543	
9	0.646636	
5	0.198609	
2	0.891478	
2	0.837752	
4	0.619703	
2	0.816803	
1	1.228896	
1	1.965661	
2	1.083763	
3	0.635737	
1	1.328605	
2	1.05683	
2	1.253921	
1	2.921044	
1	2.039136	
1	2.283989	
1	2.012204	
3	2.209304	
1	4.312386	
1	5.005255	
1	7.062553	
1	2.037268	
1	2.28211	
1	2.192641	
5	0.835375	
2	1.572149	
4	0.690241	
7	0.242225	
7	0.935094	
1	0.881368	
3	0.845624	
2	0.663309	
10	0.860409	
3	0.527976	
8	1.26475	
14	0.382843	
5	0.000999	
8	0.627685	
2	0.573959	
7	0.538215	
8	0.35591	
9	0.55301	
3	0.499245	
4	1.236009	
17	0.354112	
7	0.000999	
11	0.598954	
2	0.545228	
6	0.509484	
3	0.327179	
6	0.524269	
2	0.446528	
7	0.301395	
3	0.546237	
4	0.456768	
4	0.274462	
4	0.471563	
2	0.456828	
2	0.008801	
1	0.70167	
1	0.626985	
1	1.039078	
1	1.775853	
2	0.893945	
2	0.445919	
8	1.138788	
2	1.085062	
1	1.049318	
2	0.867013	
2	1.064113	
1	1.401302	
1	2.040445	
1	6.312362	
1	5.864345	
1	6.467745	
1	2.314128	
1	2.324368	
1	2.339153	
2	0.967101	
5	0.821968	
4	0.373942	
3	1.06681	
1	1.013085	
6	0.977341	
3	0.795035	
6	0.992126	
3	0.659702	
4	1.396467	
5	0.514569	
6	0.066543	
11	0.759411	
4	0.705686	
10	0.669942	
5	0.487627	
14	0.684727	
4	0.630961	
4	1.367736	
15	0.485828	
4	0.037802	
11	0.730671	
5	0.676945	
7	0.641201	
2	0.458896	
6	0.655996	
3	1.315019	
1	0.433122	
1	0.000999	
5	0.677964	
1	0.624238	
1	0.588494	
2	0.406189	
2	0.603279	
4	0.733678	
1	0.140458	
5	0.743917	
2	0.561612	
4	0.758712	
1	1.170805	
1	1.270514	
1	1.181045	
4	1.19583	
1	2.126178	
1	1.533029	
1	2.172172	
1	2.151213	
1	5.584219	
;

! Data set from Dorofeev and Grant, which suggests negative weights;
!CaseDG   alpha = 1; 
!CaseDG   level = 1 2 3 ;
!CaseDG   variable = EXV1 EXV2;
! The population rim target percentages for
  each variable and level;
!CaseDG 
    VXL  Targpcent =
  EXV1 1  55.5556
  EXV1 2  33.3333
  EXV1 3  11.1111
  EXV2 1  22.2222
  EXV2 2  33.3333
  EXV2 3  44.4444 ;

! Cell number and combination of levels sampled;
!CaseDG 
CELL VarLvl=	
  1   1 1 
  2   1 2 
  3   1 3 
  4   2 1 
  5   2 2 
  6   2 3 
  7   3 1 
  8   3 2 
  9   3 3 ;

! Count in each cell & heuristic weights;
!CaseDG 
count	HWgt =
  5    0
  7    1.06881
 10    1.76551
  3    4.91677
  0    0
  0    0
  9    0
 10    0.417
  1    0.94301 ;
enddata
submodel samplebal:
! If alpha = 0 we get straight (unweighted) sampling. If alpha = 1 we get weighted
    sampling with a perfect match to the rim targets, provided we allow weightcell < 0;
 min = alphat* rimerr +(1- alphat)* cellerr;
   rimerr <= rimerrUL;    ! In case we want to constrain rimerr;
   cellerr <= cellerrUL;  ! In case we want to constrain cellerr;
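! alphat, rimerrUL, and cellerrUL are set in the calc section. When alpha is exactly
   0 or 1 the model is solved twice: the first solve finds the minimum of the primary
   error, the upper limit above then fixes that error, and the second solve minimizes
   the other error to break ties among alternative optima;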

!  For each explanatory variable i and level j, the achieved count over all
   cells sampled, when weightcell( c) is applied to each observation in cell c, is... ;
   @for( VXL( i, j) :
     [ACNT]  achvcount( i, j) = @sum( CELL( c) | VarLvl( c, i) #EQ# j : weightcell( c) * count( c))
       );

! Special case. If a cell is empty, its weight = 0;
     @for( CELL( c) | count( c) #EQ# 0: weightcell( c) = 0);

! Add this if we want the estimator to be unbiased;
   [SIZE] @sum( CELL( c): weightcell( c) * count( c)) = sampsize;

! Compute rim error measure. Should watch out for sampno( i, j) = 0;
   [CRIMER] rimerr =  @sum( VXL( i, j) :
                    (( achvcount( i, j) - targpcent( i, j)* sampsize/100) / sampno( i, j))^2 );
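! Each term above is the deviation of the achieved count from the target count
   ( targpcent( i, j)*sampsize/100), scaled by the number actually sampled at that
   variable and level;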

! Compute cost of increased variance;
    [CCELLR] cellerr = @sum( CELL( c): count( c)*(1 - weightcell( c))^2);
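! cellerr is tied directly to effective sample size. Because the SIZE constraint
   forces the weighted counts to sum to sampsize, sampsize + cellerr equals the sum
   of count( c)*weightcell( c)^2, so the effective sample size reported in the calc
   section is sampsize^2/( sampsize + cellerr);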
! Standard formula for rim error;
! rimerrstd = (rimerr/m)^0.5;
! @free(rimerrstd);
endsubmodel

calc:
  @SET( 'TERSEO',2);    ! Output level (0:verbose, 1:terse, 2:errors only, 3:none);
  m = @size( VXL);      ! number of rim targets;
  sampsize = @sum( CELL( c): count( c));

! Compute number in sample for each variable i that have level j;
   @for( VXL( i, j):
       sampno( i, j) = @sum( CELL( c) | VarLvl( c, i) #eq# j: count( c));
       );

 ! Compute achieved counts for the heuristic weights;
     
! Add this to check the heuristic weights estimator to be unbiased;
   sampsizeh = @sum( CELL( c): HWgt( c) * count( c));
  alphat = alpha; ! Working copy of alpha, used in the objective of samplebal;
 @solve( samplebal);

 ! If alpha = 1, re-solve holding rimerr at its minimum and minimizing cellerr
   as a secondary objective;
  @ifc( alpha #EQ# 1:
    alphat = 0;
    rimerrUL = rimerr;
    @solve( samplebal);
      );


 ! If alpha = 0, re-solve holding cellerr at its minimum and minimizing rimerr
   as a secondary objective;
  @ifc( alpha #EQ# 0:
    alphat = 1;
    cellerrUL = cellerr;
    @solve( samplebal);
      );

 ! Compute standard error measure: (Sum of squared rim errors/NumberRimTargets)^0.5;
  rimerrstd = ( rimerr/ m)^0.5;

 ! Compute cell error measure for existing heuristic method;
  cellHeur = @sum( CELL( c): count( c)*(1 - HWgt( c))^2);

 @write('     Input data:', @newline( 1));
 @for( VXL( i, j) :
     achvcounth( i, j) = @sum( CELL( c) | VarLvl( c, i) #EQ# j : HWgt( c) * count( c))
       );
 
! Compute rim error measure. Should watch out for sampno( i, j) = 0;
    rimerrh =  @sum( VXL( i, j) : (( achvcounth( i, j) - targpcent( i, j)* sampsize/100) / sampno( i, j))^2 );
    rimerrstdh = ( rimerrh/ m)^0.5;

 @write( @format( @size( variable),'12.0f'),' = number of explanatory variables', @newline(1));
 @write( @format( @size( CELL),'12.0f'),' = number of cells with count > 0', @newline(1));
 @write( @format( m,'12.0f'),' = number of rim targets (levels) over all variables', @newline(1));
 @write( @format( sampsize,'12.0f'),' = sample size (sum of the counts)', @newline(1));
 EffSmpSzH = ( sampsize^2)/( sampsize + cellHeur);
 @write( @format( EffSmpSzH,'12.2f'), ' = Effective sample size of heuristic wgts',  @newline(1));

 @write( @format( rimerrstdh,'12.5f'), ' = root mean standardized rim error of heuristic weights',  @newline(1));
 @write( @newline( 1));

 @write('     Sample Balancing Results', @newline(1));
  @write( @format( alpha,'12.4f'),' = alpha ( weight on rim errors, 1-alpha= weight on max sample size) ',
                @newline(1));
! Report effective sample size;
  EffSmpSz = ( sampsize^2)/( sampsize + cellerr);
 @write( @format( EffSmpSz,'12.2f'), ' = Effective sample size (Optimal for given alpha) ',  @newline(1));
 @write( @format( rimerrstd,'12.5f'), ' = root mean standardized rim error',  @newline(1));
 @write( @format( cellerr, '12.3f'), ' = count-weighted sum of squared weight deviations from 1.0 ',  @newline(1));

 @write('  CELL  ');
 @for( variable( v):
   @write( @format( variable( v),'9s'));
     ); 
 
 @write( '  Count       Weight  ', @newline( 1));
  @for( CELL( c) :
   @write( @format( c,'5.0f'),' ');
   @for( variable( v):
     @write( @format( VarLvl( c, v),'9.0f'))
       );
   @write( @format( count( c),'8.0f'), '   ', @format( weightcell( c),'10.3f'),@newline(1));
     );

  @write(@newline( 1), ' Achieved rim target percent', @newline( 1));
    @write(' Variable  Level  WgtPercent', @newline( 1));
     @for( VXL( i, j) :
       @write( @format( Variable( i),'9s'), ' ', @format(  j,'4.0f'),
          ' ', @format( 100* achvcount( i, j)/Sampsize,'7.1f'), @newline( 1));
       );
endcalc