leetcode字符串匹配算法之KMP算法

2020-02-11  本文已影响0人  Domibaba

本篇介绍一种高效的字符串匹配算法——KMP算法。

KMP算法是一种改进的字符串匹配算法,由D.E.Knuth,J.H.Morris和V.R.Pratt提出,取三个人名字的第一个字母,简称KMP算法。KMP的核心思想就是在某次匹配失败的时候,利用已经匹配的成功的字符串的信息——称之为部分匹配表,来减少模式串和主串的匹配次数,因此KMP算法很高效,的时间复杂度O(m+n)

从上面的介绍可以看出,KMP的核心就是要利用好已经匹配的信息,避免重复匹配,关键就在于如何得到“部分匹配表”,要理解这一过程,我们一步一步来分解下字符串的匹配过程。假设我们需要在字符串BDABCDABCEABCDABCSF(主串)中来查找ABCDABCSF(模式串)

字符 A B C D A B C S F
索引 0 1 2 3 4 5 6 7 8
部分匹配值 0 0 0 0 1 2 3 0 0
模式字符串移动的距离 = K - next[K - 1]
重新开始匹配的字符位置 = K,也就是从pattern[K]开始与主串匹配

部分匹配表和KMP算法的实现

#include <stdio.h>
#include <string.h>
#include <stdlib.h>

void printPMT(const char *pattern, int *next, int len) {
    printf("------Partial Match Table--------\n");
    for (int i = 0; i < len; i++) {
        printf("%c ", pattern[i]);
    }
    printf("\n");

    for (int i = 0; i < len; i++) {
        printf("%d ", next[i]);
    }
    printf("\n");
    printf("----------------------------------\n");
}

void next(const char *pattern, int *next, int len) {
    for (int i = 0; i < len; i++) {
        next[i] = 0;
        for (int preIndex = i; preIndex > 0;) {
            if (pattern[next[preIndex - 1]] == pattern[i]) {
                next[i] = next[preIndex - 1] + 1;
                break;
            }
            preIndex = next[preIndex - 1];
        }
    }

    printPMT(pattern, next, len);
}

int kmp_match(const char *str, const char *pattern) {
    if (str == NULL || *str == 0 || pattern == NULL || *pattern == 0) {
        return -1;
    }
    
    int lenOfPattern = strlen(pattern);
    int *nextTbl = (int *)malloc(lenOfPattern * sizeof(int));
    if (nextTbl == NULL) {
        return -1;
    }
    memset(nextTbl, 0, lenOfPattern * sizeof(int));
    next(pattern, nextTbl, lenOfPattern);

    int matchIndexOfMainString = 0;
    int matchStartIndexOfMainString = 0;
    int matchIndexOfPatternString = 0;
    int lenOfMainString = strlen(str);
    while (true) {  
        for (; matchIndexOfMainString < lenOfMainString && matchIndexOfPatternString < lenOfPattern; matchIndexOfMainString++, matchIndexOfPatternString++) {
            if (str[matchIndexOfMainString] != pattern[matchIndexOfPatternString]) {
                break;
            }
        }

        if (matchIndexOfPatternString == lenOfPattern) {
            free(nextTbl);
            return matchStartIndexOfMainString;
        }

        if (matchIndexOfMainString == lenOfMainString) {
            free(nextTbl);
            return -1;
        }
        
        if (matchIndexOfPatternString == 0) {
            matchStartIndexOfMainString += 1;
            matchIndexOfMainString = matchStartIndexOfMainString;
        } else {
            matchStartIndexOfMainString += matchIndexOfPatternString - nextTbl[matchIndexOfPatternString - 1];
            matchIndexOfPatternString = nextTbl[matchIndexOfPatternString - 1];
        }

    }
    free(nextTbl);

    return -1;
}

char *myStrstr(char *str, char *pattern) {
    int pos = kmp_match(str, pattern);
    if (pos >= 0) {
        return &str[pos];
    }
    return NULL;
}

int main(int argc, char **argv) {
    const char *str = "aaaaabaa";
    const char *pattern = "aabaa";
    int pos = kmp_match(str, pattern);
    if (pos >= 0) {
        printf("Find %s in pos %d\n", pattern, pos);
    } else {
        printf("Match failed!\n");
    }

    return 0;
}
上一篇下一篇

猜你喜欢

热点阅读