! We want to estimate what fraction of the items in a certain
  population are Red. How big a sample do we need, as a function of:
    M = population size,
    F = actual (but unknown to us) fraction of the population that is Red,
    N = our sample size,
    R = resulting number (a random variable) of items in our sample that are Red,
    P = R/N = our estimate of the fraction of items in the population that are Red,
    T = target standard deviation for P that we specify.
  Then, based on the Hypergeometric distribution:
    Var(R) = N*F*M*(M-F*M)*(M-N)/(M*M*(M-1))
           = N*F*(1-F)*(M-N)/(M-1)
    Var(P) = Var(R)/(N*N)
           = F*(1-F)*(M-N)/(N*(M-1))
  If we want our estimator to have a standard deviation of at most T
  (i.e., a variance of at most T*T), we solve:
    T*T = F*(1-F)*(M-N)/(N*(M-1)),
  or, multiplying through by N:
    N*T*T = F*(1-F)*(M-N)/(M-1),
  or, collecting the terms in N on the left:
    N*(T*T + F*(1-F)/(M-1)) = F*(1-F)*M/(M-1),
  or
    N = (F*(1-F)*M/(M-1))/(T*T + F*(1-F)/(M-1)).
  As M goes to infinity, notice that N approaches
  (from below, provided T*T < F*(1-F)):
    (F*(1-F))/(T*T);

! Keywords: Chart, ChartPCurve, Graph, Hypergeometric distribution,
  Sampling;

! Let's do some plotting;
PROCEDURE SAMPSIZE:
! Compute the required sample size N, given M, F, and T;
   N = (F*(1-F)*M/(M-1))/(T*T + F*(1-F)/(M-1));
ENDPROCEDURE

CALC:
   F = 0.5;  ! F = 0.5 is the obvious/worst-case null/strawman
               hypothesis, because F*(1-F) is largest there;
   T = 0.05; ! Target standard deviation for P;
   ATOTE = (F*(1-F))/(T*T); ! Asymptotic sample size as M goes to infinity;
   MUL = 500; ! Upper limit on M (population size) for plotting purposes;
! Generate a chart of required sample size N vs. population size M,
  for M = 2 to MUL;
   @CHARTPCURVE(
     'How Big a Sample Do We Need for T = ' + @FORMAT( T, "4.2f") +
     ' F = ' + @FORMAT( F, "4.2f") +
     ' (Asymptote= ' + @FORMAT( ATOTE, "5.0f") + ')',
     'Population Size', 'Sample Size',
     SAMPSIZE, M, 2, MUL,
     'SampSize vs PopSize', N);
ENDCALC
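
! A worked check of the sample size formula, as a rough sketch
  using the values set in the CALC section (F = 0.5, T = 0.05).
  The two population sizes below are chosen only for illustration.
    F*(1-F) = 0.25 and T*T = 0.0025, so the asymptote is
      0.25/0.0025 = 100.
    Multiplying numerator and denominator of the formula by (M-1) gives
      N = F*(1-F)*M/(T*T*(M-1) + F*(1-F)).
    For M = 500 (the right edge of the chart):
      N = 0.25*500/(0.0025*499 + 0.25) = 125/1.4975, about 83.5.
    For M = 10000:
      N = 0.25*10000/(0.0025*9999 + 0.25) = 2500/25.2475, about 99.0.
  Both values lie below the asymptote of 100, and the larger
  population is the closer of the two, as expected;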