! We want to estimate what fraction of the items in a
certain population are Red.
How large a sample do we need? Define:
M = population size,
F = actual, but unknown to us, fraction of the population that is Red,
N = our sample size,
R = resulting number (a random variable) of items in our sample that are Red,
P = R/N = our estimate of the fraction of items in the population that are Red,
T = Target standard deviation for P that we specify.
Then, based on the Hypergeometric distribution:
Var( R) = N*F*M*(M-F*M)*(M-N)/(M*M*(M-1))
= N*F*(1-F)*(M-N)/(M-1)
Var(P) = Var( R)/( N*N) = F*(1-F)*(M-N)/(N*(M-1))
If we want our estimator P to have a standard deviation of at most T
(that is, a variance of at most T*T), we solve for the N at which equality holds:
T*T = F*(1-F)*(M-N)/(N*(M-1)), or if we multiply through by N:
N*T*T= F*(1-F)*(M-N)/(M-1), or
N*(T*T + F*(1-F)/(M-1)) = F*(1-F)*M/(M-1), or
N = (F*(1-F)*M/(M-1))/(T*T + F*(1-F)/(M-1)).
In practice, round N up to the next integer.
As M goes to infinity, N approaches
(from below, provided F*(1-F) > T*T): (F*(1-F))/(T*T);
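! Worked check of the formula above, using illustrative values
  (F = 0.5, T = 0.05, M = 500) that are assumptions chosen for this example:
  F*(1-F) = 0.25 and T*T = 0.0025, so
  N = (0.25*500/499)/(0.0025 + 0.25/499) = 0.250501/0.003001 = 83.47 (approximately).
  Rounding up, a sample of 84 items should suffice, compared with the
  infinite-population limit of 0.25/0.0025 = 100;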
! Keywords: Chart, ChartPCurve, Graph, Hypergeometric distribution, Sampling;
! Plot the required sample size N against the population size M;
PROCEDURE SAMPSIZE:
! Compute the required sample size, given M, F, and T;
N = (F*(1-F)*M/(M-1))/(T*T + F*(1-F)/(M-1));
ENDPROCEDURE
CALC:
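F = 0.5; ! F = 0.5 maximizes F*(1-F), so it is the worst case and the natural strawman hypothesis;
T = 0.05; ! Target standard deviation of the estimate P;
ATOTE = (F*(1-F))/(T*T) ; ! Limiting sample size as M goes to infinity;
MUL = 500; ! Upper limit on M (population size) for plotting purposes;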
! Generate a chart;
@CHARTPCURVE( 'How Big a Sample Do We Need for T = ' +@FORMAT(T,"4.2f")+', F = '+@FORMAT(F,"4.2f")+
' (Asymptote= '+ @FORMAT( ATOTE,"5.0f")+')',
'Population Size','Sample Size',
SAMPSIZE, M, 2, MUL, 'SampSize vs PopSize', N);
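! Illustrative spot check, added as a sketch and not part of the original model.
  It assumes a procedure can be invoked by name inside a CALC section and that
  @WRITE and @NEWLINE are available for text output. The population size
  M = 200 is a hypothetical value chosen only for this example. With the
  F and T set earlier in this section, N should come out near 66.9;
M = 200;
SAMPSIZE;
@WRITE( ' M = ', M, '  required N = ', @FORMAT( N, "8.2f"), @NEWLINE( 1));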
ENDCALC