大学,考研,学习大数据,机器学习,人工智能机器学习与数据挖掘

模式识别课程(二)-高斯混合模型EM算法GMM实现 python

2019-10-19  本文已影响0人  阿瑟_TJRS

前言

GMM-EM

观测的数据情况

P(Y|\theta)=\sum^c_{j=1}\Pi_jf(Y|\theta_j)

参数估计的目标

EM算法主要步骤

实现代码

'''
Author: Arthur_Pang

Time: 2019/10/19 18:47

Description: Pattern Recognition Course 2th assignment - Gaussian mixture model Expectation Maximum 

'''
'''

高斯混合模型具体实现

'''
from sklearn.datasets.samples_generator import make_blobs
import logging
import numpy as np
from scipy.stats import multivariate_normal
from matplotlib import pyplot
logging.basicConfig(level = logging.INFO,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
logger = logging.getLogger()
class MyGMM(object):
    def __init__(self,s,c,t):
        self.data=None
        self.label=None
        
        self.sample_num=s #样本数量
        self.clusters=c    #样本类别数量
        self.features_num=2 #数据维度
        self.iter_num=t # 迭代次数
        self.mu=None
        self.conv=None
        self.pc=None
        
        pass
    
    def update(self):
        pass
    
    '''
    func:调用sklearn中的make_blobs生成高斯混合分布的样本
    
    '''
    def make_gauss_probs(self):
        logger.info('生成{}条{}类别样本'.format(self.sample_num,self.clusters))
        self.data,self.label=make_blobs(n_samples=self.sample_num,n_features=self.features_num,centers=self.clusters)
        pyplot.scatter(self.data[:,0],self.data[:,1],c=self.label);
        pyplot.title('samples distribution')
        pyplot.show()
        
        
        #print(data,target)

        pass
    '''
    func:EM算法第一步,初始化参数 
    
    2类别数据的话 有6个参数:\mu_1 \mu_2 \sigma_1 \sigma_2 P_c_1  P_c_2
    
    使用矩阵表示的话:用mu conv 和 pc表示
    '''
    def init_params(self):
        N=self.sample_num
        D=self.clusters
        K=self.features_num
        mu = np.random.rand(K, D)
        conv = np.array([np.eye(D)] * K)
        pc = np.array([1.0 / K] * K)
        logger.info("Parameters initialized.")
        print('初始化参数',"mu:", mu, "conv:", conv, "pc:", pc,sep='\n')
        self.mu=mu
        self.conv=conv
        self.pc=pc

        

    '''
    func: 
    返回高斯模型对应的概率密度函数值 即f(X|\theta)
    '''
    def phi(self,Y, mu_k, cov_k):
        norm = multivariate_normal(mean=mu_k, cov=cov_k)
        return norm.pdf(Y)

    '''
    func:EM算法的第E步,计算期望
    return: 返回T矩阵 
    '''
    def get_Expectation(self):
        # 记录样本
        gamma = np.mat(np.zeros((self.sample_num, self.features_num)))
        #
        prob = np.zeros((self.sample_num, self.features_num))
        #计算各高斯模型中样本出现的概率
        for k in range(self.clusters):
            prob[:, k] = self.phi(self.data, self.mu[k], self.conv[k])
        prob = np.mat(prob)
        #计算各高斯模型的响应 
        for k in range(self.clusters):
            gamma[:, k] = self.pc[k] * prob[:, k]
        # 除去分母
        for i in range(self.sample_num):
            gamma[i, :] /= np.sum(gamma[i, :])
        return gamma
    '''
    func: EM算法第M步 迭代模型参数
    '''
    def maximize(self,gamma):
        
        #初始化参数值
      
        cov = []
        
        # 更新每个模型的参数
        for k in range(self.clusters):
            # 第 k 个模型对所有样本的响应度之和
            Nk = np.sum(gamma[:, k])
            # 更新 mu
            # 对每个特征求均值
            self.mu[k, :] = np.sum(np.multiply(self.data, gamma[:, k]), axis=0) / Nk
            # 更新 conv
            cov_k = ( self.data - self.mu[k]).T * np.multiply((self.data - self.mu[k]), gamma[:, k]) / Nk
            cov.append(cov_k)
            # 更新 pc
            self.pc[k] = Nk / self.sample_num
        self.conv = np.array(cov)
    '''
    func: 执行EM算法
    '''
    def EM(self):
        self.make_gauss_probs()
        self.init_params()
        logger.info('begin iterations')
        for i in range(self.iter_num):
            gamma=self.get_Expectation()
            self.maximize(gamma)
        logger.info(' we get the params ')
        print("mu:", self.mu, "conv:", self.conv, "pc:", self.pc,sep='\n')
      
       
    
gmm=MyGMM(100,2,100)
gmm.EM()



参考资料

https://github.com/wrayzheng/gmm-em-clustering/blob/master/gmm.py
https://blog.csdn.net/chasdmeng/article/details/38709063

上一篇 下一篇

猜你喜欢

热点阅读