For the plugin estimator, simply plug in the data, weighting each data point by its associated probability.
==== Mean ====
$$ \mu = \mathbb{E}[X] $$
$$ \hat{M} = \frac{1}{n} \sum_{i=1}^{n} X_i = \hat{\mathbb{E}}[X] $$
==== Variance ====
$$ v = \mathbb{E}\left[\left(X - \mathbb{E}[X] \right)^2 \right] $$
$$ \hat{V} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \hat{M})^2 $$
==== Median ====
$$ a = \mathrm{median}(\mathbb{P}) $$
$$ \hat{A} = \mathrm{median}(\hat{\mathbb{P}}) $$
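
As a quick illustration (this example is not from the original page), here is a minimal Python sketch of the plugin estimates above, assuming an unweighted sample so that each data point carries probability $1/n$:

<code python>
# Minimal sketch with hypothetical data: plugin estimates of the mean,
# variance, and median, treating the empirical distribution as putting
# probability 1/n on each observed point.
import numpy as np

x = np.array([2.1, 3.4, 1.8, 4.0, 2.9])    # hypothetical sample

mean_hat = np.mean(x)                       # M_hat = (1/n) * sum_i X_i
var_hat = np.mean((x - mean_hat) ** 2)      # V_hat, plugging M_hat in for the mean
median_hat = np.median(x)                   # A_hat = median of the empirical distribution

print(mean_hat, var_hat, median_hat)
</code>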

===== Feature matching =====

A feature is a property of a distribution, such as its mean, variance, or median.

The goal of feature matching is to estimate the parameter(s) of the distribution so that the feature(s) of the distribution match the corresponding features of the data.

For a given probability distribution $\mathbb{P}$ with parameter $\theta$, we can extract feature(s) $h^\theta = g(\mathbb{P}^\theta)$. We can also calculate the features for the empirical distribution $\hat{h} = g(\hat{\mathbb{P}})$. Then solve for $\theta$ by setting $h^\theta = \hat{h}$.
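
For example (the exponential model here is an assumption of this illustration, not something discussed on the page), an Exponential($\lambda$) distribution has mean $1/\lambda$; matching that feature to the empirical mean gives $\hat{\lambda} = 1/\hat{M}$. A minimal Python sketch:

<code python>
# Hedged sketch: feature matching for an assumed Exponential(lambda) model.
# Feature g = mean. Model mean is 1/lambda; empirical mean is np.mean(x).
# Setting 1/lambda = np.mean(x) and solving gives lambda_hat = 1 / np.mean(x).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=1000)   # hypothetical data (true lambda = 0.5)

lambda_hat = 1.0 / np.mean(x)               # solve h^theta = h_hat for theta = lambda
print(lambda_hat)                           # should be close to 0.5
</code>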

==== Method of moments ====

Moments of distributions are commonly used as features for feature matching. The $k$-th moment of a random variable $X$ is $\mathbb{E}[X^k]$.

To estimate the $k$-th moment from empirical data $X_1, ..., X_n$, replace the expectation with the sample average:

$$ \hat{\mathbb{E}}[X^k] = \frac{1}{n} \sum_{i=1}^n X_i^k $$
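
As a worked illustration (the normal model is an assumption of this example, not of the page), matching the first two moments of a Normal($\mu$, $\sigma^2$) model uses $\mathbb{E}[X] = \mu$ and $\mathbb{E}[X^2] = \mu^2 + \sigma^2$, so $\hat{\mu} = \hat{\mathbb{E}}[X]$ and $\hat{\sigma}^2 = \hat{\mathbb{E}}[X^2] - \hat{\mu}^2$:

<code python>
# Hedged sketch: method of moments for an assumed Normal(mu, sigma^2) model.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=3.0, scale=2.0, size=1000)   # hypothetical data

m1 = np.mean(x)         # empirical first moment,  E_hat[X]
m2 = np.mean(x ** 2)    # empirical second moment, E_hat[X^2]

mu_hat = m1                     # from E[X] = mu
sigma2_hat = m2 - m1 ** 2       # from E[X^2] = mu^2 + sigma^2
print(mu_hat, sigma2_hat)       # roughly 3.0 and 4.0
</code>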
===== Maximum likelihood estimator =====

Assume a probability mass function or probability density function with parameter(s) $\theta$. Given a set of data points $ X = (X_1, ..., X_n) $, the likelihood function is the product of the PMFs of all of the points for a discrete distribution, or the product of the PDFs of all of the points for a continuous distribution.

Discrete (PMF):

$$ L^\theta(x_1, ..., x_n) = \prod_{i=1}^n \mathbb{P}^\theta (X_i = x_i) $$

Continuous (PDF):

$$ L^\theta(x_1, ..., x_n) = \prod_{i=1}^n f_{X_i}^\theta (x_i) $$
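
As a concrete instance of the discrete case (the Bernoulli model is an assumption of this example, not something stated on the page), the likelihood of i.i.d. Bernoulli($\theta$) observations is the product of the individual PMFs:

<code python>
# Hedged sketch: likelihood of assumed i.i.d. Bernoulli(theta) data,
# L^theta(x_1, ..., x_n) = prod_i theta^(x_i) * (1 - theta)^(1 - x_i).
import numpy as np

x = np.array([1, 0, 1, 1, 0, 1])    # hypothetical observations

def likelihood(theta):
    pmf = theta ** x * (1.0 - theta) ** (1 - x)   # P^theta(X_i = x_i) for each point
    return np.prod(pmf)

print(likelihood(0.5), likelihood(2 / 3))   # theta = 2/3 (the sample mean) scores higher
</code>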

==== Log-likelihood ====

It is usually easier to maximize the log of the likelihood function, known as the log-likelihood function. Since the logarithm is strictly increasing, this is equivalent to maximizing the likelihood function itself.

Discrete (PMF):

$$ \max_\theta \sum_{i=1}^n \log \mathbb{P}^\theta (X_i = x_i) $$

Continuous (PDF):

$$ \max_\theta \sum_{i=1}^n \log f_{X_i}^\theta (x_i) $$
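
As a closing illustration (the exponential model and the use of SciPy are assumptions of this example), the continuous log-likelihood can be maximized numerically; for Exponential($\lambda$) data the closed-form MLE $\hat{\lambda} = 1/\hat{M}$ provides a check:

<code python>
# Hedged sketch: numerically maximizing the log-likelihood of assumed
# i.i.d. Exponential(lambda) data, where log f^lambda(x_i) = log(lambda) - lambda * x_i.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)
x = rng.exponential(scale=2.0, size=500)    # hypothetical data (true lambda = 0.5)

def neg_log_likelihood(lam):
    return -np.sum(np.log(lam) - lam * x)   # minimize the negative log-likelihood

result = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 10.0), method="bounded")
print(result.x, 1.0 / np.mean(x))           # numerical MLE vs. closed-form 1/mean(x)
</code>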