Lindo Systems
! Districting/Clustering example in LINGO (district.lng);
! Partition or group a set of small population units into
a specified number of larger clusters/districts/territories,
so that each cluster looks good according to various measures.
This problem arises in the design of political districts,
sales territories, and other clustering applications.
This model considers six features/measures:
1) Size Similarity: the population assigned to each cluster should be
close to the same for all clusters,
2) Closeness: the average distance of the units in a cluster from the
cluster center should be small, e.g., if transport costs are important,
3) Compactness: The total boundary length of the clusters should be small.
A unitless measure of compactness, called the Polsby-Popper score,
is the ratio 4*PI*Area/Perimeter^2.
For the most compact shape, a circle, this ratio is 1. You cannot
tile any area with circles, the best you can do is regular hexagons,
for which this ratio is .
4) Connected: Each cluster should be connected geographically, i.e., for
any two units in the same cluster, there must be a path connecting them.
In network terminology, each district contains a spanning tree,
5) Feature Similarity: The maximum width of cluster by some measure, e.g.,
time zone difference, should be small,
6) Gerrymandering feature: If each unit has some fraction of its
population belonging to our political party, we want to have a large
number of clusters in which our party has the majority,
! Keywords: Assignment, Compactness, Connectedness, Cluster analysis, Districting,
Gerrymander, LINGO, Location, Plant location, Political districting,
Polsby, Popper, Redistricting, Sales districting, Territory design, ;
SETS:
unit: popln, pmsru, pmsrd, ! Population measure items;
distc, ! Distance measure for a cluster;
ourpart, vmsr,! Gerrymander measure;
tzone, ! Longitude or time zone measure;
tzmn, tzmx, ! Min and max time zone in cluster;
tborder;! Total border length of this unit;
basecan(unit) : circum;! Base candidate;
uxu(unit, unit): dist, y;! All ordered pairs of units;
neighbor(unit,unit): sborder;! Neighbor pairs, unordered;
! Combinations of bases and neighbors;
bothin(basecan,unit,unit)| @IN(basecan,&1) #AND# @IN(neighbor,&2,&3): z;
ENDSETS
DATA:
!This data set based in part on:
Garfinkel, R. and G. Nemhauser(1970), "Optimal Political
Districting by Implicit Enumeration Techniques",
Management Science, vol. 16, no. 8, pp.495-508.
The region, with district(population) looks like(:
_____________________________________
/ | | \
| 1 (17) | 2 (16) | |
|---------+--+--------+------| |
| | 4 (14)| 5 (8)| |
| 3 |--------+--+-----+| 9 |
| (12) | | | (7)|
|---------+ 7 | 8 | |
| 6 (15)| (16)| (15)| |
\_________|________|_________|________/
;
! Number of districts/clusters required;
nclust = 3;
! Target population size for a district;
targpop = 40;
! Max population in a district;
maxpop = 70;
! Weights to apply to the measures of district cost;
pwgt = 2;! Weight for not coming close to ideal cluster
population;
dwgt = 10;! Weight to apply to the total distance
of units from cluster base;
vwgt = 1;! Weight for having a cluster our party cannot win;
cwgt = 1;! Weight for having compact clusters;
! Upper limits on how bad various measures may be;
plim = 9999;! Upper limit for not coming close to ideal district
population;
nlim = 9999;! Upper limit on total distance from center
over all districts;
vlim = 9999;! Upper limit on how many districts our party cannot win;
tdlim = 2;! Upper limit on time difference of units within a cluster;
! Specify the units, population, population
that will vote for our party, longitude/time zone, total border;
unit popln ourpart tzone tborder =
1 17 7 15 7.8
2 16 9 17 9.2
3 12 8 15 8.0
4 14 8 16 9.0
5 8 3 18 4.8
6 15 7 15 5.8
7 16 6 16 12.2
8 15 4 17 9.6
9 7 5 19 13.4;
! Candidates to be a base;
basecan = 2 3 4 6 8 9;
! A distance matrix between units i and j.
The metric used here is: # other units need
to pass thru to get from i to j;
! 1 2 3 4 5 6 7 8 9;
dist=
0 0 0 0 1 1 1 1 1 !1;
0 0 1 0 0 2 1 0 0 !2;
0 1 0 0 1 0 0 1 2 !3;
0 0 0 0 0 1 0 0 1 !4;
1 0 1 0 0 2 1 0 0 !5;
1 2 0 1 2 0 0 1 2 !6;
1 1 0 0 1 0 0 0 1 !7;
1 0 1 0 0 1 0 0 0 !8;
1 0 2 1 0 2 1 0 0 !9;
;
! Specify the neighbor states and the
length of their shared border;
neighbor, sborder=
1, 2, 1.4
1, 3, 1.4
1, 4, 1.1
2, 4, 2.1
2, 5, 1.1
2, 9, 1.4
3, 4, 1.3
3, 6, 1.4
3, 7, 1.3
4, 5, 2.3
4, 7, 2.3
4, 8, 0.9
5, 8, 1.1
5, 9, 1.3
6, 7, 1.5
7, 8, 2.8
8, 9, 2.8;
ENDDATA
! Definitions:
y(i,i) = 1 if population unit i is a district/cluster center,
y(i,j) = 1, for i <> j, if i is a center and
j is assigned to it.
z(i,j,k) = 1 if both j and k are assigned to i and j and k
are neighbors;
! This simple, direct formulation is fine for modest sized
problems. For large problems one will probably have to
resort to a column generation approach;
!Constraints related to forming districts/clusters/territories;
! Each unit k must be assigned to some cluster i,
or be a cluster center/base;
@FOR( unit(k):
@SUM( unit(i)| @IN( basecan, i): y(i,k)) = 1;
);
! If i not in the set basecan, then i cannot be a base;
@for( unit(i) | #NOT# @IN( basecan, i): y(i,i) = 0);
@FOR(uxu(i,j):
! If i assigned to j, then i must be a center;
y(i,j) <= y(i,i);
! No fractional assignments allowed, make them binary(0/1);
@BIN(y(i,j));
);
! Must have the target number of districts/clusters;
@SUM( unit(i): y(i,i)) = nclust;
! Constraints related to computing various badness measures;
! Compute up and down deviation from target measure for each cluster;
@for( unit(i):
pmsru(i) - pmsrd(i) = @sum(unit(j): popln(j)*y(i,j)) - targpop*y(i,i);
! Max population in a cluster;
@sum(unit(j): popln(j)*y(i,j)) <= maxpop*y(i,i);
);
! Overall, treat up/over and down/under the same;
popdev = @sum(unit(i): pmsru(i) + pmsrd(i));
popdev <= plim;
! Compute the distance cost for each cluster, the
sum of the distances from the base for each cluster;
@for( unit(i):
distc(i) = @sum(unit(j): dist(i,j)*y(i,j));
);
! and overall;
dmsr = @sum(unit(i): distc(i));
dmsr <= nlim;
! Gerrymander related constraints;
! Set vmsr(i) = 1 if our party gets < 50% of vote in
this district;
@for(unit(i):
@bin(vmsr(i));
.5*maxpop*vmsr(i) >= @sum(unit(j): (.5*popln(j)-ourpart(j))*y(i,j))
);
! Overall Gerrymander/vote measure;
voteloss=@sum(unit(i): vmsr(i));
voteloss <= vlim;
! Connectedness contraints. If j is assigned to i, then there
must be a connected path from j to i. More specifically,
either i and j are neighbors, or j must have
a neigbor, k, who is closer to i than j is, and
k must also be assigned to i;
@for( basecan(i):
@for( unit(j)| ( j #NE# i) #AND# (#NOT# @IN(neighbor,i,j)) #AND# (#NOT# @IN(neighbor,j,i)):
y(i,j) <=
@sum(neighbor(j,k) | dist(k,i) #LT# dist(j,i) #OR# (dist(k,i) #EQ# dist(j,i) #AND# k #LT# i) : y(i,k))
+ @sum(neighbor(k,j) | dist(k,i) #LT# dist(j,i) #OR# (dist(k,i) #EQ# dist(j,i) #AND# k #LT# i) : y(i,k));
);
);
! Constraints related to the width of a cluster or some dimension, e.g., time zone,
! Force tzmx(i) >= largest time zone assigned to cluster i,
tzmn(i) <= smallest time zone assigned to cluster i.
Note, we do not require equality;
@FOR( basecan(i):
@FOR( UNIT(j):
tzmx(i) >= tzone(j)*y(i,j);
tzmn(i) <= tzone(i)*y(i,i) - (tzone(i)-tzone(j))*y(i,j);
);
);
! Upper limit on maximum time difference in a cluster. Observe that
if tzmx(i) - tzmn(i) <= tdlim, then given the direction of the
inequalities above, it implies that
max( time zone assigned to i) - min(time zone assigned to i) <= tdlim;
@FOR( basecan(i):
tzmx(i) - tzmn(i) <= tdlim*y(i,i)
);
! Constraints related to compactness. Note, total boundary of a cluster
= total boundary of individual units in cluster minus 2*shared
boundaries of neighbors in the cluster.
A unitless measure of compactness = (boundary length)/area^0.5.
The most compact shape, a circle, has
2*r*pi/(pi*r^2)^0.5 = 2*pi^0.5 ~ 3.545.
To model compactness, allow
z(i,j,k) = 1 only if both j and k are assigned to i,
and j and k are neighbors;
@FOR( basecan(i):
@FOR( neighbor(j,k):
z(i,j,k) <= y(i,j);
z(i,j,k) <= y(i,k);
);
! Set circum(i) = circumference/periphery of cluster i;
circum(i) = @sum(uxu(i,j): tborder(j)*y(i,j))
- @sum(neighbor(j,k): 2*sborder(j,k)*z(i,j,k));
);
compactness = @sum(basecan(i): circum(i));
! Minimize the weighted cost of the clusters constructed;
Min = pwgt*popdev + dwgt*dmsr + vwgt*voteloss + cwgt*compactness;