Go: Chinese character length and full-width to half-width conversion

2021-09-03 小ocean

Chinese characters

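In Go, you cannot call len directly to count the characters of a string that contains Chinese characters. Go strings are stored as UTF-8, and calling len on a string returns the number of bytes it contains, not the number of characters.
Each common Chinese character occupies 3 bytes in UTF-8 (rarer ones outside the Basic Multilingual Plane take 4), while each ASCII character occupies 1 byte.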

str1 := "你好啊H"
fmt.Println(len(str1)) // prints: 10 (3 + 3 + 3 + 1 bytes)

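func TestFunc1(t *testing.T) {
    const nihongo = "你好呀"
    // Ranging over a string decodes one character (rune) per iteration;
    // index is the byte offset at which that character starts.
    for index, runeValue := range nihongo {
        fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
    }

    // The %#U verb prints the Unicode code point followed by the character.
    //U+4F60 '你' starts at byte position 0
    //U+597D '好' starts at byte position 3
    //U+5440 '呀' starts at byte position 6
}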
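The byte positions above advance in steps of 3 because each of these characters takes 3 bytes. As a minimal sketch of how to inspect this directly, the helper below (runeWidths is a name made up for this illustration) uses utf8.RuneLen from the standard library to report how many bytes each character needs. It assumes the input is valid UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeWidths is a hypothetical helper: it returns, for each character of s,
// the number of bytes that character occupies in UTF-8.
// It assumes s is valid UTF-8.
func runeWidths(s string) []int {
	widths := make([]int, 0, utf8.RuneCountInString(s))
	for _, r := range s {
		widths = append(widths, utf8.RuneLen(r))
	}
	return widths
}

func main() {
	fmt.Println(runeWidths("你好啊H")) // [3 3 3 1]
}
```

The widths add up to 10, which matches what len returns for the same string.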

So how do we count the characters in a string that contains Chinese?

func TestFunc2(t *testing.T) {
    const nihongo = "你好呀"
    for index, runeValue := range nihongo {
        fmt.Printf("%#U starts at byte position %d\n", runeValue, index)
    }

    // The %#U verb prints the Unicode code point followed by the character.
    //U+4F60 '你' starts at byte position 0
    //U+597D '好' starts at byte position 3
    //U+5440 '呀' starts at byte position 6

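    // Count characters (runes) without allocating.
    fmt.Printf("length = %d\n", utf8.RuneCountInString(nihongo))

    // Or convert to a rune slice first (this allocates a copy).
    fmt.Printf("length = %d\n", len([]rune(nihongo)))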
}

For ASCII-only strings, len() gives the character count.
For strings that may contain non-ASCII (Unicode) characters, use utf8.RuneCountInString().
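The difference between the two counts can be checked with a small program. This is only a sketch: lengths is a helper name invented here, and the sample strings are arbitrary.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengths is a hypothetical helper: it reports the byte count and the
// character (rune) count of s.
func lengths(s string) (nBytes, nRunes int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	for _, s := range []string{"hello", "你好啊H"} {
		b, r := lengths(s)
		fmt.Printf("%q: %d bytes, %d characters\n", s, b, r)
	}
}
```

For the ASCII string both numbers are 5; for the mixed string len reports 10 bytes while the character count is 4.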

Converting a full-width string to a half-width string

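// FullWidthStrToHalfWidthStr converts the full-width characters in str
// to their half-width equivalents.
// Parameters:
//      str: the input string
// Returns:
//      the string with full-width characters converted to half-width
func FullWidthStrToHalfWidthStr(str string) string {
    var b strings.Builder // requires: import "strings"
    for _, r := range str {
        switch {
        case r == 12288: // full-width space (U+3000) maps to ASCII space
            b.WriteRune(32)
        case r >= 65281 && r <= 65374: // full-width forms U+FF01..U+FF5E
            b.WriteRune(r - 65248)
        default: // anything else (e.g. Chinese characters) is kept as-is
            b.WriteRune(r)
        }
    }

    return b.String()
}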

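Full-width characters occupy Unicode code points 65281 to 65374 (U+FF01 to U+FF5E).
Half-width characters occupy code points 33 to 126 (U+0021 to U+007E).
The space is a special case: the full-width space is 12288 (U+3000) and the half-width space is 32.
Apart from the space, full-width and half-width characters appear in the same order, at a constant offset of 65248 (65281 - 33).
Non-space characters can therefore be converted by simple addition or subtraction, with the space handled separately.
Full-width means a character occupies the width of two standard characters (for example, Chinese characters).
Half-width means a character occupies the width of one standard character (for example, the ordinary letter a). Loosely speaking, characters within the ASCII range are called half-width and those outside it are called full-width.
Note:
Not every full-width character can be converted to half-width. Chinese characters, for example, are full-width and occupy two character positions, but they have no half-width form. Only English letters, digits, punctuation and similar symbols can be converted between full-width and half-width.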
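The offset arithmetic described above can be traced one character at a time. The following is a minimal sketch, not a definitive implementation: toHalfWidthRune and the constant names are invented for this illustration, and the sample string holds a full-width letter, digit, exclamation mark and space followed by a Chinese character.

```go
package main

import "fmt"

const (
	fullWidthSpace = 0x3000          // 12288
	fullWidthFirst = 0xFF01          // 65281, full-width '!'
	fullWidthLast  = 0xFF5E          // 65374, full-width '~'
	offset         = 0xFF01 - 0x0021 // 65248
)

// toHalfWidthRune is a hypothetical helper: it maps one full-width rune to
// its half-width counterpart and returns every other rune unchanged.
func toHalfWidthRune(r rune) rune {
	switch {
	case r == fullWidthSpace:
		return ' '
	case r >= fullWidthFirst && r <= fullWidthLast:
		return r - offset
	}
	return r
}

func main() {
	for _, r := range "Ａ１！　汉" {
		h := toHalfWidthRune(r)
		fmt.Printf("%U -> %U %q\n", r, h, h)
	}
}
```

The first four characters land in the ASCII range (U+0041, U+0031, U+0021, U+0020), while 汉 (U+6C49) falls outside the convertible range and is returned unchanged.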
