The expectation-maximization (EM) algorithm is an unsupervised learning method that uses observable data to estimate latent variables.
Real-world machine learning applications often involve many relevant variables, only a fraction of which are observable. When a variable is sometimes visible and sometimes not, the instances in which it is observed can be used to learn patterns and predict its value in the instances where it is not.
This situation is commonly referred to as the missing-data problem. Machine learning algorithms can learn patterns and correlations from the available instances where the variable is observed, and these learned patterns can subsequently be used to predict the variable's value where it is missing.
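As a minimal illustration of this idea (the toy data and the simple mean-imputation rule below are invented for demonstration, not part of the original text), the observed entries of a feature can be used to fill in its missing entries:

```python
import numpy as np

# Toy feature vector with missing entries encoded as NaN
x = np.array([2.0, np.nan, 3.0, 5.0, np.nan, 4.0])

# Learn from the observed instances: here, simply the mean of the visible values
observed = x[~np.isnan(x)]
fill_value = observed.mean()  # (2 + 3 + 5 + 4) / 4 = 3.5

# Predict (impute) the missing entries from what was learned
x_imputed = np.where(np.isnan(x), fill_value, x)
print(x_imputed)  # [2.  3.5 3.  5.  3.5 4. ]
```

EM generalizes this intuition: instead of a single fixed fill rule, it alternates between inferring the hidden values given the current model and refitting the model given those inferences.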
The EM algorithm can be used to deal with variables that are only partially observable: the observations we do have are used to learn and estimate the variables' values, which can then be predicted even when the variables are not observed.
The EM method is an iterative procedure for finding maximum likelihood or maximum a posteriori estimates of parameters in statistical models with unobserved latent variables, and it underlies many unsupervised machine learning techniques. Because it can handle missing data, it is often used for latent variable models. Each iteration comprises an expectation step (E-step) and a maximization step (M-step), which together progressively improve the model fit.
By repeating these two steps, the EM method iteratively increases the likelihood of the observed data. It is often used for unsupervised learning tasks such as clustering, where latent variables are inferred, and it has applications in domains including machine learning, computer vision, and natural language processing.
The snippet below fits a two-component Gaussian mixture. The original code omits the imports, the data, and the initial parameter guesses, so those are filled in here with illustrative values (the synthetic data and initializations are assumptions, not from the source):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# Synthetic 1-D data from two Gaussians (not shown in the original snippet)
rng = np.random.default_rng(42)
X = np.concatenate([rng.normal(0.0, 1.0, 500), rng.normal(5.0, 1.5, 500)])

# Initial parameter guesses (also assumed for illustration)
mu1_hat, mu2_hat = 1.0, 4.0
sigma1_hat, sigma2_hat = 1.0, 1.0
pi1_hat, pi2_hat = 0.5, 0.5

# Perform EM algorithm for 20 epochs
num_epochs = 20
log_likelihoods = []
for epoch in range(num_epochs):
    # E-step: compute responsibilities (posterior probability of each component)
    gamma1 = pi1_hat * norm.pdf(X, mu1_hat, sigma1_hat)
    gamma2 = pi2_hat * norm.pdf(X, mu2_hat, sigma2_hat)
    total = gamma1 + gamma2
    gamma1 /= total
    gamma2 /= total

    # M-step: update parameters using the responsibilities as soft weights
    mu1_hat = np.sum(gamma1 * X) / np.sum(gamma1)
    mu2_hat = np.sum(gamma2 * X) / np.sum(gamma2)
    sigma1_hat = np.sqrt(np.sum(gamma1 * (X - mu1_hat) ** 2) / np.sum(gamma1))
    sigma2_hat = np.sqrt(np.sum(gamma2 * (X - mu2_hat) ** 2) / np.sum(gamma2))
    pi1_hat = np.mean(gamma1)
    pi2_hat = np.mean(gamma2)

    # Compute the log-likelihood of the observed data (non-decreasing under EM)
    log_likelihood = np.sum(np.log(pi1_hat * norm.pdf(X, mu1_hat, sigma1_hat)
                                   + pi2_hat * norm.pdf(X, mu2_hat, sigma2_hat)))
    log_likelihoods.append(log_likelihood)

# Plot log-likelihood values over epochs
plt.plot(range(1, num_epochs + 1), log_likelihoods)
plt.xlabel('Epoch')
plt.ylabel('Log-Likelihood')
plt.title('Log-Likelihood vs. Epoch')
plt.show()
```
Code source: Geeksforgeeks
EM generally converges to a local optimum rather than the global optimum, and there is no general guarantee on the convergence rate: in high dimensions convergence can be arbitrarily slow, and the likelihood surface can have a very large number of local optima. As a result, alternative techniques with stronger learning guarantees are of interest, particularly in high-dimensional settings. Moment-based approaches and spectral methods are EM alternatives that provide stronger consistency guarantees.
Moment-based techniques for learning a probabilistic model's parameters have gained popularity because they offer guarantees such as global convergence under certain conditions, in contrast to EM, which frequently gets stuck in local optima. Algorithms with learning guarantees can be derived for various relevant models such as mixture models, hidden Markov models (HMMs), and others.
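As a sketch of the moment-based idea (the symmetric mixture model and all numbers below are illustrative assumptions, not from the source): for an equal-weight mixture of N(-mu, 1) and N(+mu, 1), the second moment satisfies E[X^2] = mu^2 + 1, so mu can be recovered in closed form from a sample moment, with no iteration and no local optima:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a symmetric two-component mixture: 0.5*N(-mu, 1) + 0.5*N(+mu, 1)
mu_true = 2.0
n = 100_000
signs = rng.choice([-1.0, 1.0], size=n)
X = signs * mu_true + rng.standard_normal(n)

# Method of moments: E[X^2] = mu^2 + 1, hence mu_hat = sqrt(m2 - 1)
m2 = np.mean(X ** 2)
mu_hat = np.sqrt(max(m2 - 1.0, 0.0))
print(mu_hat)  # close to 2.0
```

General mixtures require higher-order moments and more machinery (e.g., tensor decompositions), but this toy case shows why moment matching avoids the local-optima issue that affects EM.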