For the plugin estimator, simply plug in the data, weighting each data point by its associated probability.
==== Mean ====
$$ \mu = \mathbb{E}[X] $$
$$ \hat{M} = \frac{1}{n} \sum_{i=1}^{n} X_i = \hat{\mathbb{E}}[X] $$
==== Variance ====
$$ v = \mathbb{E}\left[\left(X - \mathbb{E}[X] \right)^2 \right] $$
$$ \hat{V} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{M})^2 $$
==== Median ====
$$ a = \mathrm{median}(\mathbb{P}) $$
$$ \hat{A} = \mathrm{median}(\hat{\mathbb{P}}) $$
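
As a quick illustration (this example is not from the original page), here is a minimal Python sketch of the plugin estimates above, assuming an unweighted sample so that each data point carries probability $1/n$:

<code python>
# Minimal sketch with hypothetical data: plugin estimates of the mean,
# variance, and median, treating the empirical distribution as putting
# probability 1/n on each observed point.
import numpy as np

x = np.array([2.1, 3.4, 1.8, 4.0, 2.9])    # hypothetical sample

mean_hat = np.mean(x)                       # M_hat = (1/n) * sum_i X_i
var_hat = np.mean((x - mean_hat) ** 2)      # V_hat, plugging M_hat in for the mean
median_hat = np.median(x)                   # A_hat = median of the empirical distribution

print(mean_hat, var_hat, median_hat)
</code>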

===== Feature matching =====

A feature is a property of a distribution, such as its mean, variance, or median.

The goal of feature matching is to estimate the parameter(s) of the distribution so that the feature(s) of the distribution match the corresponding features of the data.

For a given probability distribution $\mathbb{P}$ with parameter $\theta$, we can extract feature(s) $h^\theta = g(\mathbb{P}^\theta)$. We can also calculate the features for the empirical distribution $\hat{h} = g(\hat{\mathbb{P}})$. Then solve for $\theta$ by setting $h^\theta = \hat{h}$.
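
For example (the exponential model here is an assumption of this illustration, not something discussed on the page), an Exponential($\lambda$) distribution has mean $1/\lambda$; matching that feature to the empirical mean gives $\hat{\lambda} = 1/\hat{M}$. A minimal Python sketch:

<code python>
# Hedged sketch: feature matching for an assumed Exponential(lambda) model.
# Feature g = mean. Model mean is 1/lambda; empirical mean is np.mean(x).
# Setting 1/lambda = np.mean(x) and solving gives lambda_hat = 1 / np.mean(x).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)   # hypothetical data (true lambda = 0.5)

lambda_hat = 1.0 / np.mean(x)               # solve h^theta = h_hat for theta = lambda
print(lambda_hat)                           # should be close to 0.5
</code>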

==== Method of moments ====

Moments of distributions are commonly used as features for feature matching. The $k$-th moment of a random variable $X$ is $\mathbb{E}[X^k]$.

To estimate the $k$-th moment from empirical data $X_1, ..., X_n$, replace the expectation with the sample average:

$$ \hat{\mathbb{E}}[X^k] = \frac{1}{n} \sum_{i=1}^n X_i^k $$
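
As a worked illustration (the normal model is an assumption of this example, not of the page), matching the first two moments of a Normal($\mu$, $\sigma^2$) model uses $\mathbb{E}[X] = \mu$ and $\mathbb{E}[X^2] = \mu^2 + \sigma^2$, so $\hat{\mu} = \hat{\mathbb{E}}[X]$ and $\hat{\sigma}^2 = \hat{\mathbb{E}}[X^2] - \hat{\mu}^2$:

<code python>
# Hedged sketch: method of moments for an assumed Normal(mu, sigma^2) model.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=1000)   # hypothetical data

m1 = np.mean(x)         # empirical first moment,  E_hat[X]
m2 = np.mean(x ** 2)    # empirical second moment, E_hat[X^2]

mu_hat = m1                     # from E[X] = mu
sigma2_hat = m2 - m1 ** 2       # from E[X^2] = mu^2 + sigma^2
print(mu_hat, sigma2_hat)       # roughly 3.0 and 4.0
</code>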
===== Maximum likelihood estimator =====

Assume a probability mass function or probability density function with parameter(s) $\theta$. Given a set of data points $ X = (X_1, ..., X_n) $, the likelihood function is the product of the PMFs of all of the points for a discrete distribution, or the product of the PDFs of all of the points for a continuous distribution.

Discrete (PMF):

$$ L^\theta(x_1, ..., x_n) = \prod_{i=1}^n \mathbb{P}^\theta (X_i = x_i) $$

Continuous (PDF):

$$ L^\theta(x_1, ..., x_n) = \prod_{i=1}^n f_{X_i}^\theta (x_i) $$
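
As a concrete instance of the discrete case (the Bernoulli model is an assumption of this example, not something stated on the page), the likelihood of i.i.d. Bernoulli($\theta$) observations is the product of the individual PMFs:

<code python>
# Hedged sketch: likelihood of assumed i.i.d. Bernoulli(theta) data,
# L^theta(x_1, ..., x_n) = prod_i theta^(x_i) * (1 - theta)^(1 - x_i).
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1])    # hypothetical observations

def likelihood(theta):
    pmf = theta ** x * (1.0 - theta) ** (1 - x)   # P^theta(X_i = x_i) for each point
    return np.prod(pmf)

print(likelihood(0.5), likelihood(2 / 3))   # theta = 2/3 (the sample mean) scores higher
</code>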

==== Log-likelihood ====

It is usually easier to maximize the log of the likelihood function, known as the log-likelihood function. Since the logarithm is strictly increasing, this is equivalent to maximizing the likelihood function itself.

Discrete (PMF):

$$ \max_\theta \sum_{i=1}^n \log \mathbb{P}^\theta (X_i = x_i) $$

Continuous (PDF):

$$ \max_\theta \sum_{i=1}^n \log f_{X_i}^\theta (x_i) $$
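
As a closing illustration (the exponential model and the use of SciPy are assumptions of this example), the continuous log-likelihood can be maximized numerically; for Exponential($\lambda$) data the closed-form MLE $\hat{\lambda} = 1/\hat{M}$ provides a check:

<code python>
# Hedged sketch: numerically maximizing the log-likelihood of assumed
# i.i.d. Exponential(lambda) data, where log f^lambda(x_i) = log(lambda) - lambda * x_i.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=500)    # hypothetical data (true lambda = 0.5)

def neg_log_likelihood(lam):
    return -np.sum(np.log(lam) - lam * x)   # minimize the negative log-likelihood

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print(result.x, 1.0 / np.mean(x))           # numerical MLE vs. closed-form 1/mean(x)
</code>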