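====== Kolmogorov-Smirnov test ======

The Kolmogorov-Smirnov (KS) test is used to test whether a sample follows a particular distribution, by comparing its [[kb:probstat:empirical_cumulative_distribution|empirical cdf]] with the cdf of that distribution.

===== Glivenko-Cantelli Theorem (Fundamental theorem of statistics) =====

Let $F(t)$ be the true cdf of $X_1, \dots, X_n \stackrel{iid}{\sim} X$, and let $F_n(t)$ be the empirical cdf of $X_1, \dots, X_n$. Then,

$$\sup_{t\in \mathbb{R}}|F_n(t)-F(t)|\xrightarrow[n\rightarrow\infty]{a.s.}0$$

This tells us that as the number of samples increases, the empirical cdf converges to the true cdf uniformly in $t$: the largest gap between the two, taken over all values of $t$, goes to zero almost surely.

===== Asymptotic normality =====

For each fixed $t$, $F_n(t)$ is the average of the $n$ i.i.d. Bernoulli($F(t)$) indicators $\mathbb{1}\{X_i\leq t\}$, so the central limit theorem gives

$$\sqrt{n}(F_n(t)-F(t))\xrightarrow[n\to\infty]{(d)}\mathcal{N}(0,F(t)(1-F(t)))$$

===== Donsker's Theorem =====

If $F$ is continuous, then

$$\sqrt{n}\sup_{t\in\mathbb{R}}|F_n(t)-F(t)|\xrightarrow[n\to\infty]{(d)}\sup_{0\leq t\leq 1}|\mathbb{B}(t)|$$

where $\mathbb{B}$ is a Brownian bridge on $[0,1]$. Write $Z=\sup_{0\leq t\leq 1}|\mathbb{B}(t)|$ for this limit; its distribution does not depend on $F$.

To test $H_0: F=F^0$ against $H_1: F\neq F^0$ for a continuous cdf $F^0$, take the test statistic $T_n=\sup_{t\in\mathbb{R}}|F_n(t)-F^0(t)|$. The test with asymptotic level $\alpha$ rejects $H_0$ on the event

$$\left\{T_n > \frac{q_\alpha}{\sqrt{n}}\right\}$$

where $q_\alpha$ is the ($1-\alpha$) quantile of $Z$. Because the limit above applies to $\sqrt{n}\,T_n$, the p-value is:

$$p = \mathbb{P}[Z > \sqrt{n}\,T_n]$$

===== Calculating the KS test statistic =====

Let $X_{(i)}$ be the $i$th smallest sample ($X_{(1)}\leq X_{(2)}\leq \dots \leq X_{(n)}$). Then, the formula for the test statistic $T_n$ becomes:

$$T_n=\max_i\left\{\max\left(\left|\frac{i-1}{n}-F^0(X_{(i)})\right|, \left|\frac{i}{n}-F^0(X_{(i)})\right|\right)\right\}$$

This formula checks all of the samples (the discontinuities in the [[kb:probstat:empirical_cumulative_distribution|empirical cdf]]) and finds the maximum distance between the empirical cdf and the cdf we are checking against ($F^0$). Only the samples need checking: the empirical cdf is constant between consecutive samples and $F^0$ is nondecreasing, so the largest gap must occur just before or at a jump.

Under $H_0$ with continuous $F^0$, this test statistic is pivotal: its distribution does not depend on the distribution of the $X_i$, because $F^0(X_i)$ is then uniform on $[0,1]$. In other words, the same quantiles $q_\alpha$ serve every KS test, whatever $F^0$ is.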
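
The order-statistic formula for $T_n$ can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names (''ks_statistic'', ''kolmogorov_sf'', ''ks_test'') are hypothetical, and the asymptotic p-value relies on an assumption not stated on this page, namely the standard series for the Kolmogorov distribution, $\mathbb{P}[Z > x] = 2\sum_{k\geq 1}(-1)^{k-1}e^{-2k^2x^2}$.

```python
import math


def ks_statistic(sample, cdf0):
    """Return T_n = sup_t |F_n(t) - F^0(t)| using the order-statistic formula.

    `cdf0` is the hypothesized cdf F^0, passed as a plain Python callable.
    """
    xs = sorted(sample)  # X_(1) <= X_(2) <= ... <= X_(n)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")
    t_n = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = cdf0(x)
        # Compare F^0(X_(i)) with the empirical cdf just before the jump,
        # (i-1)/n, and at the jump, i/n.
        t_n = max(t_n, abs((i - 1) / n - f0), abs(i / n - f0))
    return t_n


def kolmogorov_sf(x):
    """Approximate P[Z > x] for Z = sup of |Brownian bridge| (assumed series)."""
    if x < 0.2:
        # Below 0.2 the probability differs from 1 by less than about 1e-12.
        return 1.0
    total = 0.0
    for k in range(1, 101):
        term = math.exp(-2.0 * k * k * x * x)
        total += term if k % 2 == 1 else -term
        if term < 1e-16:
            break
    return min(1.0, max(0.0, 2.0 * total))


def ks_test(sample, cdf0):
    """Return (T_n, asymptotic p-value), with p = P[Z > sqrt(n) * T_n]."""
    t_n = ks_statistic(sample, cdf0)
    return t_n, kolmogorov_sf(math.sqrt(len(sample)) * t_n)


if __name__ == "__main__":
    # Hypothetical data, tested against the Uniform(0, 1) cdf F^0(x) = x.
    data = [0.4, 0.1, 0.7]
    print(ks_test(data, lambda x: x))
```

The p-value here is asymptotic, so for small $n$ it is only a rough guide; exact finite-sample tables or a statistics library would be the safer choice in practice.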