Lindo Systems

! Districting/Clustering example in LINGO    (district.lng);
! Partition or group a set of small population units into 
  a specified number of larger clusters/districts/territories,
  so that each cluster looks good according to various measures.
  This problem arises in the design of political districts,
  sales territories, and other clustering applications.
 This model considers six features/measures:
   1) Size Similarity: the population assigned to each cluster should be
       close to the same for all clusters,
   2) Closeness: the average distance of the units in a cluster from the
      cluster center should be small, e.g., if transport costs are important,
   3) Compactness: The total boundary length of the clusters should be small.
      A unitless measure of compactness, called the Polsby-Popper score,
      is the ratio 4*PI*Area/Perimeter^2.
      For the most compact shape, a circle, this ratio is 1. You cannot
      tile any area with circles, the best you can do is regular hexagons,
      for which this ratio is .
   4) Connected: Each cluster should be connected geographically, i.e., for
       any two units in the same cluster, there must be a path connecting them.
       In network terminology, each district contains a spanning tree,
   5) Feature Similarity: The maximum width of cluster by some measure, e.g., 
       time zone difference, should be small,
   6) Gerrymandering feature: If each unit has some fraction of its
       population belonging to our political party, we want to have a large
       number of clusters in which our party has the majority,

! Keywords: Assignment, Compactness, Connectedness, Cluster analysis, Districting, 
    Gerrymander, LINGO, Location, Plant location, Political districting, 
    Polsby, Popper, Redistricting, Sales districting, Territory design, ;
SETS:
  unit: popln, pmsru, pmsrd,  ! Population measure items;
        distc,        ! Distance measure for a cluster;
        ourpart, vmsr,! Gerrymander measure;
        tzone,        ! Longitude or time zone measure;
        tzmn, tzmx,   ! Min and max time zone in cluster;
        tborder;! Total border length of this unit;
  basecan(unit) : circum;! Base candidate;
  uxu(unit, unit): dist, y;! All ordered pairs of units;
  neighbor(unit,unit): sborder;! Neighbor pairs, unordered;
                      ! Combinations of bases and neighbors;
  bothin(basecan,unit,unit)| @IN(basecan,&1) #AND# @IN(neighbor,&2,&3): z;
ENDSETS
DATA:

!This data set based in part on:
 Garfinkel, R. and G. Nemhauser(1970), "Optimal Political
  Districting by Implicit Enumeration Techniques",
  Management Science, vol. 16, no. 8, pp.495-508.

The region, with district(population) looks like(:
  _____________________________________
 /           |                |        \
 |   1 (17)  |     2 (16)     |        |
 |---------+--+--------+------|        |
 |         |     4 (14)| 5 (8)|        |   
 |   3     |--------+--+-----+|    9   |
 |   (12)  |        |         |     (7)|
 |---------+    7   |   8     |        |
 |   6 (15)|    (16)|     (15)|        |
 \_________|________|_________|________/
 ;

! Number of districts/clusters required;
   nclust = 3;
! Target population size for a district;
   targpop =  40;
! Max population in a district;
   maxpop = 70;

! Weights to apply to the measures of district cost;
  pwgt =  2;! Weight for not coming close to ideal cluster
                population;
  dwgt = 10;! Weight to apply to the total distance
                of units from cluster base;
  vwgt =  1;! Weight for having a cluster our party cannot win;
  cwgt =  1;! Weight for having compact clusters;

! Upper limits on how bad various measures may be;
  plim = 9999;! Upper limit for not coming close to ideal district
                   population;
  nlim = 9999;! Upper limit on total distance from center 
                   over all districts;
  vlim = 9999;! Upper limit on how many districts our party cannot win;

  tdlim = 2;! Upper limit on time difference of units within a cluster;

! Specify the units, population, population
  that will vote for our party, longitude/time zone, total border;
   unit popln ourpart tzone tborder =
     1    17    7      15     7.8
     2    16    9      17     9.2
     3    12    8      15     8.0
     4    14    8      16     9.0
     5     8    3      18     4.8
     6    15    7      15     5.8
     7    16    6      16    12.2
     8    15    4      17     9.6
     9     7    5      19    13.4;

 ! Candidates to be a base;
    basecan = 2  3  4  6  8  9;

! A distance matrix between units i and j. 
  The metric used here is: # other units need
  to pass thru to get from i to j;
!      1  2  3  4  5  6  7  8  9;
  dist= 
       0  0  0  0  1  1  1  1  1 !1;
       0  0  1  0  0  2  1  0  0 !2;
       0  1  0  0  1  0  0  1  2 !3;
       0  0  0  0  0  1  0  0  1 !4;
       1  0  1  0  0  2  1  0  0 !5;
       1  2  0  1  2  0  0  1  2 !6;
       1  1  0  0  1  0  0  0  1 !7;
       1  0  1  0  0  1  0  0  0 !8;
       1  0  2  1  0  2  1  0  0 !9;
  ;

! Specify the neighbor states and the 
   length of their shared border;
  neighbor, sborder=
   1, 2,    1.4
   1, 3,    1.4
   1, 4,    1.1
   2, 4,    2.1
   2, 5,    1.1
   2, 9,    1.4
   3, 4,    1.3
   3, 6,    1.4
   3, 7,    1.3
   4, 5,    2.3
   4, 7,    2.3
   4, 8,    0.9
   5, 8,    1.1
   5, 9,    1.3
   6, 7,    1.5
   7, 8,    2.8
   8, 9,    2.8;
  ENDDATA 

 ! Definitions:
    y(i,i) = 1 if population unit i is a district/cluster center,
    y(i,j) = 1, for i <> j, if i is a center and 
                j is assigned to it.
    z(i,j,k) = 1 if both j and k are assigned to i and j and k
                  are neighbors;

! This simple, direct formulation is fine for modest sized
 problems.  For large problems one will probably have to 
 resort to a column generation approach;

!Constraints related to forming districts/clusters/territories;
! Each unit k must be assigned to some cluster i,
    or be a cluster center/base;
   @FOR( unit(k):
       @SUM( unit(i)| @IN( basecan, i): y(i,k)) = 1;
       );
! If i not in the set basecan, then i cannot be a base;
   @for( unit(i) | #NOT# @IN( basecan, i): y(i,i) = 0);

   @FOR(uxu(i,j):
! If i assigned to j, then i must be a center;
       y(i,j) <= y(i,i);
! No fractional assignments allowed, make them binary(0/1);
      @BIN(y(i,j));
       );

! Must have the target number of districts/clusters;
     @SUM( unit(i): y(i,i)) = nclust;

! Constraints related to computing various badness measures;

! Compute up and down deviation from target measure for each cluster;
   @for( unit(i):
     pmsru(i) - pmsrd(i) = @sum(unit(j): popln(j)*y(i,j)) - targpop*y(i,i);
 ! Max population in a cluster;
     @sum(unit(j): popln(j)*y(i,j)) <= maxpop*y(i,i);
       );
! Overall, treat up/over and down/under the same;
     popdev = @sum(unit(i): pmsru(i) + pmsrd(i));
     popdev   <= plim;

 ! Compute the distance cost for each cluster, the
    sum of the distances from the base for each cluster;
   @for( unit(i):
     distc(i) = @sum(unit(j): dist(i,j)*y(i,j));
       );
 ! and overall;
     dmsr = @sum(unit(i): distc(i));
     dmsr   <= nlim;

 ! Gerrymander related constraints;
 !  Set vmsr(i) = 1 if our party gets < 50% of vote in
      this district;
   @for(unit(i):
      @bin(vmsr(i));
      .5*maxpop*vmsr(i) >= @sum(unit(j): (.5*popln(j)-ourpart(j))*y(i,j))  
       );
 
 ! Overall Gerrymander/vote measure;
     voteloss=@sum(unit(i): vmsr(i));
     voteloss <= vlim;


 ! Connectedness contraints.  If j is assigned to i, then there
     must be a connected path from j to i.  More specifically,
     either i and j are neighbors, or j must have 
     a neigbor, k, who is closer to i than j is, and
     k must also be assigned to i;
     @for( basecan(i):
      @for( unit(j)| ( j #NE# i) #AND# (#NOT# @IN(neighbor,i,j)) #AND# (#NOT# @IN(neighbor,j,i)):
        y(i,j) <= 
           @sum(neighbor(j,k) | dist(k,i) #LT# dist(j,i) #OR# (dist(k,i) #EQ# dist(j,i) #AND# k #LT# i) : y(i,k)) 
         + @sum(neighbor(k,j) | dist(k,i) #LT# dist(j,i) #OR# (dist(k,i) #EQ# dist(j,i) #AND# k #LT# i) : y(i,k));
          );
         );

 ! Constraints related to the width of a cluster or some dimension, e.g., time zone,
 ! Force tzmx(i) >= largest  time zone assigned to cluster i,
         tzmn(i) <= smallest time zone assigned to cluster i.
      Note, we do not require equality;
      @FOR( basecan(i):
        @FOR( UNIT(j):
         tzmx(i) >= tzone(j)*y(i,j);
         tzmn(i) <= tzone(i)*y(i,i) - (tzone(i)-tzone(j))*y(i,j);
              );
          );

 ! Upper limit on maximum time difference in a cluster. Observe that
   if tzmx(i) - tzmn(i) <= tdlim, then given the direction of the
   inequalities above, it implies that
    max( time zone assigned to i) - min(time zone assigned to i) <= tdlim;
       @FOR( basecan(i):
         tzmx(i) - tzmn(i) <= tdlim*y(i,i)
          );

  ! Constraints related to compactness. Note, total boundary of a cluster
    = total boundary of individual units in cluster minus 2*shared
      boundaries of neighbors in the cluster.  
    A unitless measure of compactness = (boundary length)/area^0.5.
    The most compact shape, a circle, has
    2*r*pi/(pi*r^2)^0.5 = 2*pi^0.5 ~ 3.545.
     To model compactness, allow
      z(i,j,k) = 1 only if both j and k are assigned to i,
                 and j and k are neighbors;
      @FOR( basecan(i):
        @FOR( neighbor(j,k):
          z(i,j,k) <= y(i,j);
          z(i,j,k) <= y(i,k);
            );
 !     Set circum(i) = circumference/periphery of cluster i;
         circum(i) = @sum(uxu(i,j): tborder(j)*y(i,j))
                   - @sum(neighbor(j,k): 2*sborder(j,k)*z(i,j,k));
          );
       compactness = @sum(basecan(i): circum(i));

 ! Minimize the weighted cost of the clusters constructed;
     Min = pwgt*popdev + dwgt*dmsr + vwgt*voteloss + cwgt*compactness;