The Poisson distribution can be deduced from the binomial distribution. Consider a stochastic variable \(X\) that is binomially distributed with \(n\) draws and success probability \(p\).
$$X \sim \mathcal{B}(n,p)$$The probability \(P(X=x)\) that \(x\) out of \(n\) trials are successful is given by
$$P(X=x) =\frac{n!}{x!(n-x)!} p^x (1-p)^{n-x} $$Defining the parameter \(\mu\) as
$$\mu \triangleq p \cdot n$$the probability becomes
$$P(X=x) =\frac{n!}{x!(n-x)!} \Big(\frac{\mu}{n}\Big)^x \Big(1-\frac{\mu}{n}\Big)^{n-x} = \frac{n!}{x!(n-x)!} \frac{\mu^x}{n^x} \Big(1-\frac{\mu}{n}\Big)^{n} \Big(1-\frac{\mu}{n}\Big)^{-x}$$The Poisson distribution is obtained by taking the limit of this expression for \(n \to \infty\) while keeping \(\mu\) fixed (so that \(p = \mu/n \to 0\)). This limit is usually an excellent approximation for nuclear decay, since a sample of a radionuclide contains a very large number of atoms, each with a tiny decay probability.
\begin{equation} \lim_{n \to \infty} P(X=x) = \frac{\mu^x}{x!} \lim_{n \to \infty} \frac{n!}{(n-x)!n^x} \Big(1-\frac{\mu}{n}\Big)^{n} \Big(1-\frac{\mu}{n}\Big)^{-x} \label{eq:limit} \end{equation}The first factor converges to 1 as \(n \to \infty\).
$$\lim_{n \to \infty} \frac{n!}{(n-x)!n^x} = \lim_{n \to \infty} \frac{n \cdot (n-1) \ldots (n-x+1)}{n \cdot n \ldots n} = \lim_{n \to \infty} \frac{n}{n} \cdot \lim_{n \to \infty} \frac{n-1}{n} \ldots \lim_{n \to \infty} \frac{n-x+1}{n} = 1$$The second factor resembles the definition of \(e\).
$$\lim_{n \to \infty} \Big(1-\frac{\mu}{n}\Big)^{n} = \Big[ \lim_{n \to \infty} \Big(1+\frac{1}{-\frac{n}{\mu}}\Big)^{-\frac{n}{\mu}} \Big]^{-\mu} = e^{-\mu}$$The third factor simply converges to 1.
$$\lim_{n \to \infty} \Big(1-\frac{\mu}{n}\Big)^{-x} = 1^{-x} = 1$$Equation \eqref{eq:limit} then becomes
$$\lim_{n \to \infty} P(X=x) = \frac{\mu^x e^{-\mu}}{x!}$$This distribution is called the Poisson distribution.
$$\boxed{X \sim \text{Pois}(\mu) \rightarrow P(X=x) = \frac{\mu^x e^{-\mu}}{x!}}$$The expected value and variance of the Poisson distribution are
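The limit can also be checked numerically. The Python sketch below (the values \(\mu = 5\) and the chosen \(n\) are arbitrary illustration choices) compares the exact binomial probabilities of \(\mathcal{B}(n, \mu/n)\) against the Poisson probabilities as \(n\) grows:

```python
import math

def binom_pmf(x, n, p):
    """Exact binomial probability P(X = x) for B(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x, mu):
    """Poisson probability P(X = x) for Pois(mu)."""
    return mu**x * math.exp(-mu) / math.factorial(x)

mu = 5.0
for n in (10, 100, 10_000):
    p = mu / n  # keep mu = n * p fixed while n grows
    diff = max(abs(binom_pmf(x, n, p) - poisson_pmf(x, mu))
               for x in range(15))
    print(f"n = {n:6d}: max pmf difference = {diff:.1e}")
```

The maximum difference between the two distributions shrinks as \(n\) increases, in line with the limit derived above.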
\begin{align*} E\{X\} &= \mu \\ \sigma^2\{X\} &= \mu \end{align*}Let's assume that we have a sample of \(n\) radioactive atoms. \(p\) is the probability that 1 atom decays in a certain time \(T\).
\begin{align*} &n = \text{number of atoms in sample}\\ &p = \text{probability that 1 atom decays in a certain time } T \\ &\mu = n \cdot p \end{align*}Assuming that \(n\) is very large, we can use the Poisson distribution to model the decay of the sample.
$$\text{Number of decays in time } T = X \sim \text{Pois}(\mu)$$If we repeat the experiment many times (keeping the time \(T\) constant), we expect the average number of decays to be \(\mu\) with variance \(\mu\), since the expected value and variance of the Poisson distribution are both equal to \(\mu\).
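This can be illustrated with a small simulation (a hypothetical sketch; \(\mu = 50\) and the number of repetitions are arbitrary). It draws many repetitions of the counting experiment from \(\text{Pois}(\mu)\) using Knuth's multiplication algorithm and checks that the sample mean and sample variance are both close to \(\mu\):

```python
import math
import random

def poisson_sample(mu, rng):
    """Draw one Pois(mu) sample with Knuth's multiplication method."""
    threshold = math.exp(-mu)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

rng = random.Random(42)  # fixed seed for reproducibility
mu = 50.0                # hypothetical mean number of decays in time T
counts = [poisson_sample(mu, rng) for _ in range(5000)]

mean = sum(counts) / len(counts)
var = sum((c - mean) ** 2 for c in counts) / (len(counts) - 1)
print(f"sample mean     = {mean:.1f}  (expected {mu})")
print(f"sample variance = {var:.1f}  (expected {mu})")
```

Both printed values land near 50, as the equality of the Poisson mean and variance predicts.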
In real life, we only perform 1 experiment in a certain time \(T\). After all, if you perform 100 experiments of duration \(T\), you might as well combine them into 1 big experiment of duration \(100T\). Thus, when we count the number of decays, the result will be close to the actual mean \(\mu\), but with some statistical deviation.
$$\text{draw a certain value from } X \rightarrow \hat\mu$$The actual standard deviation is \(\sqrt{\mu}\), but this value is not known to us, which is why \(\sqrt{\hat\mu}\) is the best estimate we have of the standard deviation.
$$\boxed{ \text{Estimate of } \mu \rightarrow \hat{\mu} \text{ with standard deviation } \sqrt{\hat\mu}}$$How does the activity of a radioactive sample relate to the average number of counts \(\mu\) in a certain time \(T\)? The answer is quite simple: the activity is just the average number of counts divided by the experiment time.
$$A = \frac{\mu}{T}$$An estimate of the activity can be obtained by estimating \(\mu\)
$$\hat A = \frac{\hat \mu}{T}$$The standard deviation of this estimate is
$$\sigma\{\hat A\} = \frac{\sigma\{\hat \mu\}}{T} = \frac{\sqrt{\hat \mu}}{T} = \frac{\sqrt{\hat \mu}}{T} \frac{\sqrt{\hat \mu}}{\sqrt{\hat \mu}} = \frac{\hat A}{\sqrt{\hat\mu}}$$The relative uncertainty on the activity \(A\) is then given by
$$\boxed{\frac{\sigma\{{\hat A}\}}{\hat A} = \frac{1}{\sqrt{\hat\mu}}}$$This is the \(\sqrt{N}\)-law. It shows that if we want to reduce the relative uncertainty by a factor of 2, we need to measure 4 times more counts, meaning that we have to increase the measurement time by a factor of 4.
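As a concrete, purely illustrative example: suppose a single measurement over a hypothetical \(T = 60\,\text{s}\) registers \(N\) counts. The sketch below computes the activity estimate and its relative uncertainty for \(N = 400\) and \(N = 1600\); quadrupling the counts halves the relative uncertainty, exactly as the \(\sqrt{N}\)-law predicts:

```python
import math

T = 60.0  # hypothetical measurement time in seconds
for counts in (400, 1600):            # mu_hat: counts observed in time T
    activity = counts / T             # A_hat = mu_hat / T
    sigma_activity = math.sqrt(counts) / T
    relative = 1 / math.sqrt(counts)  # sigma{A_hat} / A_hat
    print(f"N = {counts:4d}: A = {activity:.2f} +/- {sigma_activity:.2f} Bq, "
          f"relative uncertainty = {relative:.1%}")
```

For \(N = 400\) the relative uncertainty is \(1/\sqrt{400} = 5\%\); for \(N = 1600\) it drops to \(1/\sqrt{1600} = 2.5\%\).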