====== Goodness of fit test ======

Let $X$ be a random variable. Given i.i.d. copies $X_1, \dots, X_n$ of $X$, a goodness of fit test determines whether $X$ follows a specified distribution (e.g. normal, uniform, Student's T).

===== Multinomial distribution =====

The multinomial distribution is a generalization of the binomial distribution. A multinomial distribution with $K$ modalities has $K$ possible outcomes (for the binomial distribution, $K = 2$). The parameters are as follows:

  * $n'$ is the number of trials.
  * $p_i$ is the probability of observing the $i$th outcome in a single trial, with $\sum_{i=1}^K p_i = 1$.

Let $\textbf{p} = [p_1 \ p_2 \ \dots \ p_K]^T$.

The multinomial distribution can be represented as a random vector $\textbf{N} \in \mathbb{Z}^K$, where $N^{(i)}$ is the count of outcome $i$. For every $\textbf{n}$ with $n^{(i)} \in \mathbb{Z}$ and $n^{(i)} \geq 0$ for $i = 1, \dots, K$, and $\sum_{i=1}^{K}n^{(i)}=n'$, the multinomial PMF is

$$p_{\textbf{N}}(N^{(1)}=n^{(1)}, \dots ,N^{(K)}=n^{(K)}) = \frac{n'!}{n^{(1)}!\,n^{(2)}! \cdots n^{(K)}!}\prod_{i=1}^{K}p_i^{n^{(i)}}$$

Now let $X_1, X_2, \dots, X_n \sim X$ be i.i.d. observations, each a single trial taking a value in $\{1, \dots, K\}$, and let $N_i = \sum_{j=1}^n \mathbf{1}\{X_j = i\}$ be the number of observations equal to $i$ (so $(N_1, \dots, N_K)$ is multinomial with $n' = n$). The likelihood is

$$L_n(X_1, \dots ,X_n; p_1, \dots , p_K) = p_1^{N_1}p_2^{N_2} \cdots p_K^{N_K}$$

Maximizing $\log L_n = \sum_{i=1}^K N_i \log p_i$ subject to $\sum_{i=1}^K p_i = 1$ (e.g. with a Lagrange multiplier) gives the maximum likelihood estimator of each probability:

$$\hat{p}_i = \frac{N_i}{n}$$

===== $\chi^2$ test for multinomial distribution =====

The $\chi^2$ test tests whether a discrete random variable with $K$ possible values follows a specified PMF $\textbf{p}^0 = [p_1^0 \ \dots \ p_K^0]^T$.
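The $\chi^2$ statistic is built from the empirical frequencies $\hat{p}_j$. As a minimal sketch of the multinomial setup above, the MATLAB snippet below counts hypothetical categorical observations, forms the MLE $\hat{p}_i = N_i/n$ (with $n$ the number of observations), and evaluates the multinomial PMF of the observed count vector. The data vector ''x'' and its coding as integers $1, \dots, K$ are hypothetical illustration values.

<code matlab>
% Hypothetical data: n = 10 categorical observations coded as integers 1..K
x = [1 3 2 3 3 1 2 3 2 3];
K = 3;
n = numel(x);

% Counts N_i = number of observations equal to i (column vector of length K)
N = accumarray(x(:), 1, [K 1]);

% Maximum likelihood estimate p_hat_i = N_i / n
p_hat = N / n;

% Multinomial PMF of the observed count vector N with n' = n trials,
% evaluated at p = p_hat (any valid probability vector could be used here)
pmf = factorial(n) / prod(factorial(N)) * prod(p_hat .^ N);
</code>

Here $\hat{\textbf{p}} = [0.2 \ 0.3 \ 0.5]^T$ and the PMF value is $\frac{10!}{2!\,3!\,5!}\,0.2^2\,0.3^3\,0.5^5 = 2520 \cdot 3.375\times 10^{-5} \approx 0.085$. For large $n$, ''factorial'' overflows to ''Inf'', so working with the log-PMF via ''gammaln'' is the safer choice. If the Statistics and Machine Learning Toolbox is available, ''mnpdf'' applied to the row vectors of counts and probabilities should return the same value.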
**Theorem.** If $\textbf{p} = \textbf{p}^0$ (with $p_j^0 > 0$ for all $j$), then

$$n\sum_{j=1}^K\frac{(\hat{p}_j-p_j^0)^2}{p_j^0}\xrightarrow[n\rightarrow\infty]{(d)}\chi^2_{K-1}$$

For a test of the form

$$H_0: \textbf{p} = \textbf{p}^0$$
$$H_1: \textbf{p} \neq \textbf{p}^0$$

the test statistic is

$$T_n=n\sum_{j=1}^K\frac{(\hat{p}_j-p_j^0)^2}{p_j^0}$$

A $\chi^2$ test with asymptotic level $\alpha$ is

$$\psi = \mathbf{1}\left\{ T_n > q_\alpha^{K-1} \right\}$$

where $q_\alpha^{K-1}$ is the $(1-\alpha)$-quantile of the $\chi^2_{K-1}$ distribution. The asymptotic p-value is $P(\chi^2_{K-1} > T_n)$, evaluated at the observed value of $T_n$.

**Note that the number of degrees of freedom is $K-1$, the number of modalities minus one (one is lost because the $\hat{p}_j$ must sum to 1), not $n-1$, the number of observations minus one.**

The p-value can be computed in MATLAB (Statistics and Machine Learning Toolbox) with:

<code matlab>
% Asymptotic p-value P(chi2_{K-1} > T_n); T_n and K must already be defined
chi2cdf(T_n, K-1, 'upper')
</code>

kb/goodness_of_fit_test.txt Last modified: 2024-04-30 04:03 by 127.0.0.1
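As an end-to-end sketch of the $\chi^2$ test described above, the MATLAB snippet below computes $\hat{\textbf{p}}$, $T_n$, the decision $\psi$ and the asymptotic p-value. The counts ''N'', the null PMF ''p0'' and the level ''alpha'' are hypothetical illustration values, and ''chi2inv''/''chi2cdf'' assume the Statistics and Machine Learning Toolbox is installed.

<code matlab>
% Hypothetical observed counts for K = 4 categories (n = 100 observations)
N     = [30; 20; 25; 25];
p0    = [0.25; 0.25; 0.25; 0.25];  % hypothetical null PMF p^0, all entries > 0
alpha = 0.05;                      % asymptotic level

n = sum(N);
K = numel(N);
p_hat = N / n;                     % MLE of p

% Test statistic T_n = n * sum_j (p_hat_j - p0_j)^2 / p0_j
T_n = n * sum((p_hat - p0).^2 ./ p0);

% q_alpha^{K-1}: the (1 - alpha)-quantile of chi2 with K-1 degrees of freedom
q = chi2inv(1 - alpha, K - 1);

psi     = T_n > q;                         % true = reject H0, false = do not reject
p_value = chi2cdf(T_n, K - 1, 'upper');    % asymptotic p-value P(chi2_{K-1} > T_n)
</code>

With these counts $\hat{\textbf{p}} = [0.3 \ 0.2 \ 0.25 \ 0.25]^T$ and $T_n = 100 \cdot (0.01 + 0.01) = 2$, which is below $q_{0.05}^{3} \approx 7.81$, so $H_0$ is not rejected (the p-value is about 0.57). Since $n(\hat{p}_j - p_j^0)^2/p_j^0 = (N_j - np_j^0)^2/(np_j^0)$, $T_n$ equals Pearson's observed-versus-expected form $\sum_j (N_j - np_j^0)^2/(np_j^0)$.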