Primitive Type char

1.0.0


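A character type.

The char type represents a single character. More specifically, since ‘character’ isn't a well-defined concept in Unicode, char is a Unicode scalar value.

This documentation describes a number of methods and trait implementations on the char type. For technical reasons, there is additional, separate documentation in the std::char module as well.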

Validity

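A char is a Unicode scalar value, which is any Unicode code point other than a surrogate code point. This has a fixed numerical definition: code points are in the range 0 to 0x10FFFF, inclusive. Surrogate code points, used by UTF-16, are in the range 0xD800 to 0xDFFF.

No char may be constructed, whether as a literal or at runtime, that is not a Unicode scalar value: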

// Each of these is a compiler error
['\u{D800}', '\u{DFFF}', '\u{110000}'];


// Panics; from_u32 returns None.
char::from_u32(0xDE01).unwrap();


// Undefined behavior
unsafe { char::from_u32_unchecked(0x110000) };


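Unicode scalar values (USVs) are also the exact set of values that may be encoded in UTF-8. Because char values are USVs and str values are valid UTF-8, it is safe to store any char in a str or read any character from a str as a char.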
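The validity rules above can be checked at runtime with the safe char::from_u32 conversion. The following is a minimal sketch; the helper names classify_code_point and round_trip are hypothetical and exist only for illustration.

```rust
/// Hypothetical helper: converts a raw u32 to a char, or says why it cannot.
fn classify_code_point(value: u32) -> Result<char, &'static str> {
    match char::from_u32(value) {
        Some(c) => Ok(c),
        // from_u32 returns None for exactly two kinds of input.
        None if (0xD800..=0xDFFF).contains(&value) => Err("surrogate code point"),
        None => Err("above U+10FFFF"),
    }
}

/// Hypothetical helper: stores a char in a String (always valid UTF-8)
/// and reads it back as a char.
fn round_trip(c: char) -> Option<char> {
    let mut s = String::new();
    s.push(c);
    s.chars().next()
}
```

For any char c, round_trip(c) should give back Some(c), since every Unicode scalar value has a UTF-8 encoding.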

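The gap in valid char values is understood by the compiler, so in the example below the two ranges are understood to cover the whole range of possible char values and there is no error for a non-exhaustive match.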

let c: char = 'a';
match c {
    '\0' ..= '\u{D7FF}' => false,
    '\u{E000}' ..= '\u{10FFFF}' => true,
};


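All Unicode scalar values are valid char values, but not all of them represent a real character. Many Unicode scalar values are not currently assigned to a character, but may be in the future (“reserved”); some will never be a character (“noncharacters”); and some may be given different meanings by different users (“private use”).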
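The standard library does not expose these categories directly, so the sketch below hand-codes two of them from the Unicode definitions. Both helper names are hypothetical, and is_private_use_bmp only covers the private use area in the Basic Multilingual Plane (U+E000 to U+F8FF).

```rust
/// Hypothetical helper: true for the 66 Unicode noncharacters, which are
/// U+FDD0..=U+FDEF plus the last two code points of every plane.
fn is_noncharacter(c: char) -> bool {
    let v = c as u32;
    (0xFDD0..=0xFDEF).contains(&v) || (v & 0xFFFE) == 0xFFFE
}

/// Hypothetical helper: true for the private use area of the
/// Basic Multilingual Plane only.
fn is_private_use_bmp(c: char) -> bool {
    ('\u{E000}'..='\u{F8FF}').contains(&c)
}
```

Note that '\u{FFFF}' and '\u{E000}' are both accepted as char literals, which is the point of the paragraph above: they are valid Unicode scalar values even though they do not represent ordinary characters.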

Representation

char is always four bytes in size. This is a different representation than a given character would have as part of a String. For example:

let v = vec!['h', 'e', 'l', 'l', 'o'];

// five elements times four bytes for each element
assert_eq!(20, v.len() * std::mem::size_of::<char>());

let s = String::from("hello");

// five elements times one byte per element
assert_eq!(5, s.len() * std::mem::size_of::<u8>());

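The difference between the two representations can be made explicit per character with len_utf8. This is a small sketch; the helper name sizes is hypothetical.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns (bytes used by a standalone char,
/// bytes used by the same character when UTF-8 encoded in a str).
fn sizes(c: char) -> (usize, usize) {
    (size_of::<char>(), c.len_utf8())
}
```

The first component is always 4, while the second varies from 1 to 4 depending on the code point: sizes('h') is (4, 1) and sizes('\u{1F600}') is (4, 4).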

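As always, remember that human intuition for ‘character’ might not map to Unicode's definitions. For example, despite looking similar, the ‘é’ character (U+00E9) is one Unicode code point, while ‘é’ (U+0065 followed by U+0301) is two Unicode code points: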

let mut chars = "é".chars();
// U+00e9: 'latin small letter e with acute'
assert_eq!(Some('\u{00e9}'), chars.next());
assert_eq!(None, chars.next());

// Written with an escape so the combining mark survives text normalization
let mut chars = "e\u{0301}".chars();
// U+0065: 'latin small letter e'
assert_eq!(Some('\u{0065}'), chars.next());
// U+0301: 'combining acute accent'
assert_eq!(Some('\u{0301}'), chars.next());
assert_eq!(None, chars.next());


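This means that the contents of the first string above will fit into a char, while the contents of the second string will not.

Trying to create a char literal with the contents of the second string gives an error: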

error: character literal may only contain one codepoint: 'é'
let c = 'é';
        ^^^
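When the input may contain such multi-code-point sequences, one option is to keep it as a &str and convert to char only when exactly one code point is present. A minimal sketch, with the hypothetical helper name single_char:

```rust
/// Hypothetical helper: returns the char if `s` holds exactly one
/// Unicode scalar value, and None for empty or longer input.
fn single_char(s: &str) -> Option<char> {
    let mut chars = s.chars();
    match (chars.next(), chars.next()) {
        (Some(c), None) => Some(c),
        _ => None,
    }
}
```

With this helper, the precomposed "\u{e9}" yields a char, while the decomposed "e\u{301}" yields None and has to stay a string.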

Another implication of the 4-byte fixed size of a char is that per-char processing can end up using a lot more memory:

let s = String::from("love: ❤️");
let v: Vec<char> = s.chars().collect();

assert_eq!(12, std::mem::size_of_val(&s[..]));
assert_eq!(32, std::mem::size_of_val(&v[..]));
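If the characters are only needed one at a time, the extra memory can often be avoided by iterating over chars() lazily instead of collecting into a Vec<char>. The sketch below assumes that a single pass is enough; count_alphabetic is a hypothetical helper name.

```rust
/// Hypothetical helper: counts alphabetic code points while decoding the
/// UTF-8 string on the fly, without allocating a Vec<char>.
fn count_alphabetic(s: &str) -> usize {
    s.chars().filter(|c| c.is_alphabetic()).count()
}
```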


Implementations

impl char

pub const MAX: char = '\u{10ffff}'

pub const REPLACEMENT_CHARACTER: char = '�'

pub const UNICODE_VERSION: (u8, u8, u8) = crate::unicode::UNICODE_VERSION

pub fn decode_utf16<I>(iter: I) -> DecodeUtf16<<I as IntoIterator>::IntoIter>
where
    I: IntoIterator<Item = u16>,
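As an illustration of decode_utf16, the sketch below performs a lossy conversion that substitutes REPLACEMENT_CHARACTER for each unpaired surrogate. The helper name utf16_to_string_lossy is hypothetical.

```rust
/// Hypothetical helper: decodes UTF-16 code units into a String,
/// replacing every unpaired surrogate with U+FFFD.
fn utf16_to_string_lossy(units: &[u16]) -> String {
    char::decode_utf16(units.iter().copied())
        // Each item is Ok(char) or Err(DecodeUtf16Error).
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}
```

Callers that need to detect bad input, rather than paper over it, can inspect the Err values instead of calling unwrap_or.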

pub const fn from_u32(i: u32) -> Option<char>

pub unsafe fn from_u32_unchecked(i: u32) -> char

pub const fn from_digit(num: u32, radix: u32) -> Option<char>

pub fn is_digit(self, radix: u32) -> bool

pub const fn to_digit(self, radix: u32) -> Option<u32>
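The digit conversions can be combined into a small radix parser. This is only a sketch; parse_hex and hex_digit are hypothetical helpers, and the choice to reject empty input is an assumption made here.

```rust
/// Hypothetical helper: parses hexadecimal digits with to_digit,
/// returning None on an invalid digit, empty input, or overflow.
fn parse_hex(s: &str) -> Option<u32> {
    if s.is_empty() {
        return None;
    }
    let mut acc: u32 = 0;
    for c in s.chars() {
        let d = c.to_digit(16)?;
        acc = acc.checked_mul(16)?.checked_add(d)?;
    }
    Some(acc)
}

/// Hypothetical helper: the inverse direction for a single digit.
/// from_digit produces lowercase letters for values 10 and above.
fn hex_digit(n: u32) -> Option<char> {
    char::from_digit(n, 16)
}
```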

pub fn escape_unicode(self) -> EscapeUnicode

pub fn escape_debug(self) -> EscapeDebug

pub fn escape_default(self) -> EscapeDefault
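Each escape method returns an iterator over the characters of the escaped form, so escaping a whole string is a flat_map away. A minimal sketch, with the hypothetical helper name escape_ascii_only:

```rust
/// Hypothetical helper: escapes a string so that the result contains only
/// printable ASCII, using char::escape_default for each code point.
fn escape_ascii_only(s: &str) -> String {
    s.chars().flat_map(char::escape_default).collect()
}
```

Printable ASCII passes through unchanged, a newline becomes the two characters backslash and n, and non-ASCII code points become \u{...} escapes.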

pub const fn len_utf8(self) -> usize

pub const fn len_utf16(self) -> usize

pub fn encode_utf8(self, dst: &mut [u8]) -> &mut str

pub fn encode_utf16(self, dst: &mut [u16]) -> &mut [u16]
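Both encode methods write into a caller-provided buffer and return the filled prefix. Four bytes (or two u16 units) are always enough for one char. The helper names below are hypothetical.

```rust
/// Hypothetical helper: the UTF-8 bytes of a single char.
fn utf8_bytes(c: char) -> Vec<u8> {
    let mut buf = [0u8; 4]; // a char never needs more than 4 bytes
    c.encode_utf8(&mut buf).as_bytes().to_vec()
}

/// Hypothetical helper: the UTF-16 code units of a single char.
fn utf16_units(c: char) -> Vec<u16> {
    let mut buf = [0u16; 2]; // a char never needs more than 2 units
    c.encode_utf16(&mut buf).to_vec()
}
```

The lengths of the returned vectors agree with len_utf8 and len_utf16 for the same char.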

pub fn is_alphabetic(self) -> bool

pub fn is_lowercase(self) -> bool

pub fn is_uppercase(self) -> bool

pub fn is_whitespace(self) -> bool

pub fn is_alphanumeric(self) -> bool

pub fn is_control(self) -> bool

pub fn is_numeric(self) -> bool
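The classification predicates overlap (a newline is both whitespace and a control character), so code that sorts characters into buckets has to pick an order. The sketch below shows one possible order; the Kind enum and the kind function are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Kind {
    Letter,
    Number,
    Space,
    Control,
    Other,
}

/// Hypothetical helper: assigns one bucket per char. Whitespace is tested
/// before control, so '\n' is reported as Space.
fn kind(c: char) -> Kind {
    if c.is_alphabetic() {
        Kind::Letter
    } else if c.is_numeric() {
        Kind::Number
    } else if c.is_whitespace() {
        Kind::Space
    } else if c.is_control() {
        Kind::Control
    } else {
        Kind::Other
    }
}
```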

pub fn to_lowercase(self) -> ToLowercase

pub fn to_uppercase(self) -> ToUppercase
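The case conversions return iterators rather than a single char because one character can map to several. A short sketch with the hypothetical helper name upper:

```rust
/// Hypothetical helper: uppercases a string one char at a time.
/// A single input char may expand to more than one output char.
fn upper(s: &str) -> String {
    s.chars().flat_map(char::to_uppercase).collect()
}
```

For example, 'ß' uppercases to the two characters "SS", so the output can be longer than the input.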

pub const fn is_ascii(&self) -> bool

pub const fn as_ascii(&self) -> Option<AsciiChar>

pub const fn to_ascii_uppercase(&self) -> char

pub const fn to_ascii_lowercase(&self) -> char

pub const fn eq_ignore_ascii_case(&self, other: &char) -> bool

pub fn make_ascii_uppercase(&mut self)

pub fn make_ascii_lowercase(&mut self)

pub const fn is_ascii_alphabetic(&self) -> bool

pub const fn is_ascii_uppercase(&self) -> bool

pub const fn is_ascii_lowercase(&self) -> bool

pub const fn is_ascii_alphanumeric(&self) -> bool

pub const fn is_ascii_digit(&self) -> bool

pub fn is_ascii_octdigit(&self) -> bool

pub const fn is_ascii_hexdigit(&self) -> bool

pub const fn is_ascii_punctuation(&self) -> bool

pub const fn is_ascii_graphic(&self) -> bool

pub const fn is_ascii_whitespace(&self) -> bool

pub const fn is_ascii_control(&self) -> bool
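The ASCII-only predicates are useful when a grammar is deliberately restricted to ASCII. The sketch below validates a simple identifier; is_identifier and its rules (a letter or underscore, then letters, digits, or underscores) are assumptions chosen for illustration.

```rust
/// Hypothetical helper: true if `s` is a non-empty ASCII identifier.
fn is_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(first) if first.is_ascii_alphabetic() || first == '_' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}
```

The ASCII case methods leave non-ASCII characters untouched, so 'é'.to_ascii_uppercase() is still 'é'.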

Trait Implementations

impl AsciiExt for char

type Owned = char

fn is_ascii(&self) -> bool

fn to_ascii_uppercase(&self) -> Self::Owned

fn to_ascii_lowercase(&self) -> Self::Owned

fn eq_ignore_ascii_case(&self, o: &Self) -> bool

fn make_ascii_uppercase(&mut self)

fn make_ascii_lowercase(&mut self)

impl Clone for char

fn clone(&self) -> char

fn clone_from(&mut self, source: &Self)

impl Debug for char

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>

impl Default for char

fn default() -> char

impl Display for char

fn fmt(&self, f: &mut Formatter<'_>) -> Result<(), Error>
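The Debug and Display implementations differ in quoting and escaping, which the following sketch makes visible. The helper name debug_and_display is hypothetical.

```rust
/// Hypothetical helper: formats a char with Debug and with Display.
fn debug_and_display(c: char) -> (String, String) {
    (format!("{:?}", c), format!("{}", c))
}
```

Debug wraps the character in single quotes and escapes characters such as newline, while Display writes the character as is. The Default value of char is '\0'.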

impl<'a> Extend<&'a char> for String

fn extend<I>(&mut self, iter: I)
where
    I: IntoIterator<Item = &'a char>,

fn extend_one(&mut self, _: &'a char)

fn extend_reserve(&mut self, additional: usize)

impl Extend<char> for String

fn extend<I>(&mut self, iter: I)
where
    I: IntoIterator<Item = char>,

fn extend_one(&mut self, c: char)

fn extend_reserve(&mut self, additional: usize)
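The two Extend implementations let a String grow from any iterator of char or &char. A minimal sketch; both helper names are hypothetical.

```rust
/// Hypothetical helper: appends the chars of `src` in reverse order,
/// using Extend<char> for String.
fn with_reversed(prefix: &str, src: &str) -> String {
    let mut out = String::from(prefix);
    out.extend(src.chars().rev());
    out
}

/// Hypothetical helper: appends chars borrowed from a slice,
/// using Extend<&char> for String.
fn with_suffix(prefix: &str, suffix: &[char]) -> String {
    let mut out = String::from(prefix);
    out.extend(suffix.iter());
    out
}
```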

impl From<char> for String

fn from(c: char) -> String

impl From<char> for u128

fn from(c: char) -> u128

impl From<char> for u32

fn from(c: char) -> u32

impl From<char> for u64

fn from(c: char) -> u64
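The From conversions above are lossless, since every char fits in a u32 and therefore in the wider integer types. The sketch below uses u32::from to print a code point in the conventional U+XXXX notation; code_point_label is a hypothetical helper.

```rust
/// Hypothetical helper: formats a char as "U+XXXX" with at least
/// four uppercase hexadecimal digits.
fn code_point_label(c: char) -> String {
    format!("U+{:04X}", u32::from(c))
}
```

The opposite direction, from u32 to char, is fallible and goes through char::from_u32 instead.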