! Chi-square goodness-of-fit test. We want to test
  how well a hypothesized distribution fits actual data.
  In this case, the hypothesized distribution is the
  Poisson distribution with the same mean as the actual data;
! Keywords: Chi-square test, Goodness-of-fit test, 
            Poisson distribution;
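! Method summary (an explanatory note only, not part of the computation).
  With mean m, the Poisson probability of k unit sales in a day is
      P(k) = exp(-m) * m^k / factorial(k),
  which is what differencing the cdf, @PPS(m, k) - @PPS(m, k-1), yields.
  With observed counts O(j) and expected counts E(j) = ns * P(j) over
  the cells actually used in the test (small cells pooled into one),
  the Pearson statistic is
      X^2 = sum over j of (O(j) - E(j))^2 / E(j).
  Under the Poisson hypothesis X^2 is approximately chi-square with
  (number of cells used) - 1 - (number of estimated parameters)
  degrees of freedom. Here one parameter, the mean, is estimated.
  Cells with small expected counts are pooled because, by a common
  rule of thumb, the chi-square approximation is unreliable for them.
  This model treats a cell as small when its expected count is <= 4;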
SETS:
  cell: actual, expected, pd ;
ENDSETS
DATA:
! Data on (sales/day, num_days) for sales of reams of a certain
  type of letter-size paper in a store;
 actual =
  66 ! days with 0 sales;
  73 ! days with 1 unit sale;
  63 ! days with 2 unit sales;
  41 ! days with 3 unit sales;
  37 ! days with 4 unit sales;
  24 ! days with 5 unit sales;
  12 ! days with 6 unit sales;
  15 ! days with 7 unit sales;
  10 ! days with 8 unit sales;
   9 ! days with 9 unit sales;
   1 ! days with 10 unit sales;
   2 ! days with 11 unit sales;
   3 ! days with 12 unit sales;
   0 ! days with 13 unit sales;
   1 ! days with 14 unit sales;
   3 ! days with 15 unit sales;
   1 ! days with 16 unit sales;
   0 ! days with 17 unit sales;
   0 ! days with 18 unit sales;
   0 ! days with 19 unit sales;
   0 ! days with 20 unit sales;
   0 ! days with 21 unit sales;
   0 ! days with 22 unit sales;
   1 ! days with 23 unit sales;
  ;
ENDDATA

CALC:
! Compute the expected number in each cell, assuming:
   1) the cells are 0, 1, 2, ... unit sales per day
   2) the distribution is Poisson;
! Set ns = total number of observations (days);
 ns = @SUM(cell(i): actual(i));
! Compute the sample mean and variance;
 mean = @SUM(cell(i): (i-1)*actual(i))/ns;
 var  = @SUM(cell(i): (((i-1) - mean)^2)*actual(i))/ns;
! Compute the probability of each cell by differencing the cdf;
 pd(1) = @PPS(mean, 0); ! Probability of zero;
 @FOR(cell(i)| i #GT# 1:
   pd(i) = @PPS(mean, i-1) - @PPS(mean, i-2);
 );
! Because the cells stop at the largest observed value, the
  probabilities may sum to less than 1. Rescale so they sum to 1;
 sump = @SUM(cell: pd);
 @FOR(cell: pd = pd/sump);
! Compute the expected number in each cell;
 @FOR(cell(i):
   expected(i) = pd(i)*ns;
 );
! Compute the chi-square value. Keep a cell on its own only if its
  expected count exceeds 4;
 chibig = @SUM(cell(i)| expected(i) #GT# 4:
               ((actual(i) - expected(i))^2)/expected(i));
! Pool the remaining small cells into a single group;
 esmall = (1 - @SUM(cell(i)| expected(i) #GT# 4: pd(i)))*ns;
 chitot = chibig +
   ((@SUM(cell(i)| expected(i) #LE# 4: actual(i)) - esmall)^2)/esmall;
! Compute the degrees of freedom;
 df = @SUM(cell(i)| expected(i) #GT# 4: 1) ! number of cells with expected count > 4;
      + 1  ! the pooled cell of all cells with expected count <= 4;
      - 1  ! the cell counts sum to ns, so if all but one cell match,
             the last one must match too;
      - 1; ! the mean was estimated from the data;
! Probability that a chi-square random variable with df degrees
  of freedom is at least this large (the p-value);
 conf = 1 - @PCX(df, chitot);
ENDCALC

DATA:
! Write out some results;
 @TEXT() = ' ';
 @TEXT() = ' Mean, var=', mean, var;
 @TEXT() = 'Sum of unscaled probabilities =', sump;
 @TEXT() = 'Chi-square stat, df=', chitot, df;
 @TEXT() = 'Prob Chi-square stat could be that large=', conf;
 @TEXT() = 'Sales Actual Expected';
 @TEXT() = @WRITEFOR(cell(i): @FORMAT(i-1, '3.0f'), ' ',
             @FORMAT(actual(i), '3.0f'), ' ',
             @FORMAT(expected(i), '8.4f'), @NEWLINE(1));
ENDDATA
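! Worked check (a hand calculation from the data above, not model
  output, so treat the figures as approximate).
  ns = 362 days and the total units sold is 1094, so
  mean = 1094/362, about 3.02.
  The sum of (sales)^2 times days is 6852, so
  var = 6852/362 - mean^2, about 18.93 - 9.13 = 9.80.
  A Poisson variable has variance equal to its mean, so a sample
  variance about three times the mean already hints at a poor fit.
  With mean 3.02 the expected counts exceed 4 only for 0 through 7
  unit sales (8 cells), so df = 8 + 1 - 1 - 1 = 7.
  The 0-sales cell alone contributes about (66 - 17.6)^2/17.6,
  roughly 133, to the chi-square statistic, far above typical
  critical values for 7 degrees of freedom, so the reported
  probability should be very close to 0;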