Tokens

[lex.token.intro]

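Tokens are primitive productions in the grammar defined by regular (non-recursive) languages. Rust source input can be broken down into the following kinds of tokens:

Keywords
Identifiers
Literals
Lifetimes
Punctuation
Delimiters

Within this documentation's grammar, "simple" tokens are given in string table production form, and appear in monospace font.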

[lex.token.literal]

Literals

Literals are tokens used in literal expressions.

Examples

Characters and strings

	Example	`#` sets¹	Characters	Escapes
Character	`'H'`	0	All Unicode	Quote & ASCII & Unicode
String	`"hello"`	0	All Unicode	Quote & ASCII & Unicode
Raw string	`r#"hello"#`	<256	All Unicode	`N/A`
Byte	`b'H'`	0	All ASCII	Quote & Byte
Byte string	`b"hello"`	0	All ASCII	Quote & Byte
Raw byte string	`br#"hello"#`	<256	All ASCII	`N/A`
C string	`c"hello"`	0	All Unicode	Quote & Byte & Unicode
Raw C string	`cr#"hello"#`	<256	All Unicode	`N/A`

¹ The number of #s on each side of the same literal must be equivalent.

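Note

Character and string literal tokens never include the sequence of U+000D (CR) immediately followed by U+000A (LF): this pair would have been previously transformed into a single U+000A (LF).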

ASCII escapes

	Name
`\x41`	7-bit character code (exactly 2 hex digits, up to 0x7F)
`\n`	Newline
`\r`	Carriage return
`\t`	Tab
`\\`	Backslash
`\0`	Null

Byte escapes

	Name
`\x7F`	8-bit character code (exactly 2 hex digits)
`\n`	Newline
`\r`	Carriage return
`\t`	Tab
`\\`	Backslash
`\0`	Null

Unicode escapes

	Name
`\u{7FFF}`	24-bit Unicode character code (up to 6 hex digits)

Quote escapes

	Name
`\'`	Single quote
`\"`	Double quote

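Numbers

Number literals²	Example	Exponentiation
Decimal integer	`98_222`	`N/A`
Hex integer	`0xff`	`N/A`
Octal integer	`0o77`	`N/A`
Binary integer	`0b1111_0000`	`N/A`
Floating-point	`123.0E+77`	`Optional`

² All number literals allow _ as a visual separator: 1_234.0E+18f64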

[lex.token.literal.suffix]

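Suffixes

[lex.token.literal.literal.suffix.intro]

A suffix is a sequence of characters following the primary part of a literal (without intervening whitespace), of the same form as a non-raw identifier or keyword.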

[lex.token.literal.suffix.syntax]

^{Lexer}
SUFFIX : IDENTIFIER_OR_KEYWORD
SUFFIX_NO_E : SUFFIX _{not beginning with e or E}

[lex.token.literal.suffix.validity]

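Any kind of literal (string, integer, etc.) with any suffix is valid as a token.

A literal token with any suffix can be passed to a macro without producing an error. The macro itself decides how to interpret such a token and whether to produce an error. In particular, the literal fragment specifier for by-example macros matches literal tokens with arbitrary suffixes.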

#![allow(unused)]
fn main() {
macro_rules! blackhole { ($tt:tt) => () }
macro_rules! blackhole_lit { ($l:literal) => () }

blackhole!("string"suffix); // OK
blackhole_lit!(1suffix); // OK
}

[lex.token.literal.suffix.parse]

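However, suffixes on literal tokens that are interpreted as literal expressions or patterns are restricted. Any suffix is rejected on a non-numeric literal token, and numeric literal tokens are accepted only with suffixes from the list below.

Integer	Floating-point
`u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `u128`, `i128`, `usize`, `isize`	`f32`, `f64`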
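
As an illustration of the accepted suffixes, the following sketch checks the size of a few suffixed literals. The function name `suffixed_sizes` is made up for this example and is not part of the reference.

```rust
use std::mem::size_of_val;

/// Hypothetical helper: sizes in bytes of a few suffixed number literals.
fn suffixed_sizes() -> [usize; 4] {
    [
        size_of_val(&1u8),    // u8 suffix: 1 byte
        size_of_val(&1i64),   // i64 suffix: 8 bytes
        size_of_val(&1u128),  // u128 suffix: 16 bytes
        size_of_val(&1.0f32), // f32 suffix: 4 bytes
    ]
}

fn main() {
    assert_eq!(suffixed_sizes(), [1, 8, 16, 4]);
    // An integer literal token with a float suffix is a floating-point expression.
    let x = 5f32;
    assert_eq!(x, 5.0_f32);
}
```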

Character and string literals

[lex.token.literal.char]

Character literals

[lex.token.literal.char.syntax]

^{Lexer}
CHAR_LITERAL
   ' ( ~[' \ \n \r \t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) ' SUFFIX^?

QUOTE_ESCAPE
   \' | \"

ASCII_ESCAPE
      \x HEX_DIGIT HEX_DIGIT
   | \n | \r | \t | \\ | \0

UNICODE_ESCAPE
   \u{ ( HEX_DIGIT _^* )^1..6 }

[lex.token.literal.char.intro]

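A character literal is a single Unicode character enclosed within two U+0027 (single-quote) characters, with the exception of U+0027 itself, which must be escaped by a preceding U+005C character (\).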

[lex.token.literal.str]

String literals

[lex.token.literal.str.syntax]

^{Lexer}
STRING_LITERAL
   " (
      ~[" \ IsolatedCR]
      | QUOTE_ESCAPE
      | ASCII_ESCAPE
      | UNICODE_ESCAPE
      | STRING_CONTINUE
   )^* " SUFFIX^?

STRING_CONTINUE
   \ followed by \n

[lex.token.literal.str.intro]

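A string literal is a sequence of any Unicode characters enclosed within two U+0022 (double-quote) characters, with the exception of U+0022 itself, which must be escaped by a preceding U+005C character (\).

[lex.token.literal.str.linefeed]

Line-breaks, represented by the character U+000A (LF), are allowed in string literals. When an unescaped U+005C character (\) occurs immediately before a line break, the line break does not appear in the string represented by the token. See String continuation escapes for details. The character U+000D (CR) may not appear in a string literal other than as part of such a string continuation escape.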
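
The effect of a backslash before a line break can be sketched as follows. The function names are hypothetical, and the continued line deliberately starts at column 0 so the example does not depend on how leading whitespace after the break is treated.

```rust
/// A backslash immediately before the line break removes the break.
fn continued() -> &'static str {
    "foo\
bar"
}

/// Without the backslash, the U+000A (LF) is part of the string.
fn with_line_break() -> &'static str {
    "foo
bar"
}

fn main() {
    assert_eq!(continued(), "foobar");
    assert_eq!(with_line_break(), "foo\nbar");
}
```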

[lex.token.literal.char-escape]

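Character escapes

[lex.token.literal.char-escape.intro]

Some additional escapes are available in either character or non-raw string literals. An escape starts with a U+005C (\) and continues with one of the following forms

[lex.token.literal.char-escape.ascii]

A 7-bit code point escape starts with U+0078 (x) and is followed by exactly two hex digits with value up to 0x7F. It denotes the ASCII character with value equal to the provided hex value. Higher values are not permitted because it is ambiguous whether they mean Unicode code points or byte values.

[lex.token.literal.char-escape.unicode]

A 24-bit code point escape starts with U+0075 (u) and is followed by up to six hex digits surrounded by braces U+007B ({) and U+007D (}). It denotes the Unicode code point equal to the provided hex value.

[lex.token.literal.char-escape.whitespace]

A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the Unicode values U+000A (LF), U+000D (CR), or U+0009 (HT) respectively.

[lex.token.literal.char-escape.null]

The null escape is the character U+0030 (0) and denotes the Unicode value U+0000 (NUL).

[lex.token.literal.char-escape.slash]

The backslash escape is the character U+005C (\), which must be escaped in order to denote itself.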
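
A small sketch of the escape forms above, mapping each escaped character literal to its code point. The function name `escapes_as_code_points` is an assumption made for this example.

```rust
/// Hypothetical helper: the code point denoted by each escape form.
fn escapes_as_code_points() -> [u32; 7] {
    [
        '\x41' as u32,   // 7-bit code point escape
        '\u{41}' as u32, // 24-bit code point escape for the same character
        '\n' as u32,     // whitespace escape: U+000A (LF)
        '\r' as u32,     // whitespace escape: U+000D (CR)
        '\t' as u32,     // whitespace escape: U+0009 (HT)
        '\0' as u32,     // null escape: U+0000 (NUL)
        '\\' as u32,     // backslash escape: U+005C
    ]
}

fn main() {
    assert_eq!(
        escapes_as_code_points(),
        [0x41, 0x41, 0x0A, 0x0D, 0x09, 0x00, 0x5C]
    );
    // Both spellings denote the same character.
    assert_eq!('\x41', 'A');
}
```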

[lex.token.literal.str-raw]

Raw string literals

[lex.token.literal.str-raw.syntax]

^{Lexer}
RAW_STRING_LITERAL
   r RAW_STRING_CONTENT SUFFIX^?

RAW_STRING_CONTENT
      " ( ~ IsolatedCR )^{* (non-greedy)} "
   | # RAW_STRING_CONTENT #

[lex.token.literal.str-raw.intro]

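Raw string literals do not process any escapes. They start with the character U+0072 (r), followed by fewer than 256 of the character U+0023 (#) and a U+0022 (double-quote) character.

[lex.token.literal.str-raw.body]

The raw string body can contain any sequence of Unicode characters other than U+000D (CR). It is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character.

[lex.token.literal.str-raw.content]

All Unicode characters contained in the raw string body represent themselves. The characters U+0022 (double-quote) (except when followed by at least as many U+0023 (#) characters as were used to start the raw string literal) or U+005C (\) do not have any special meaning.

Examples for string literals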

#![allow(unused)]
fn main() {
"foo"; r"foo";                     // foo
"\"foo\""; r#""foo""#;             // "foo"

"foo #\"# bar";
r##"foo #"# bar"##;                // foo #"# bar

"\x52"; "R"; r"R";                 // R
"\\x52"; r"\x52";                  // \x52
}
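
The equivalences listed in the comments above can also be checked with assertions. This is a minimal sketch; the function names are invented for the example.

```rust
/// Hypothetical helper: true if each raw string equals its escaped spelling.
fn raw_matches_escaped() -> bool {
    r"foo" == "foo"
        && r#""foo""# == "\"foo\""
        && r##"foo #"# bar"## == "foo #\"# bar"
        && r"\x52" == "\\x52"
}

/// In a raw string, `\n` is two characters: a backslash and the letter n.
fn raw_backslash_n_len() -> usize {
    r"\n".len()
}

fn main() {
    assert!(raw_matches_escaped());
    assert_eq!(raw_backslash_n_len(), 2);
}
```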

Byte and byte string literals

[lex.token.byte]

Byte literals

[lex.token.byte.syntax]

^{Lexer}
BYTE_LITERAL
   b' ( ASCII_FOR_CHAR | BYTE_ESCAPE ) ' SUFFIX^?

ASCII_FOR_CHAR
   any ASCII (i.e. 0x00 to 0x7F), except ', \, \n, \r or \t

BYTE_ESCAPE
      \x HEX_DIGIT HEX_DIGIT
   | \n | \r | \t | \\ | \0 | \' | \"

[lex.token.byte.intro]

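A byte literal is a single ASCII character (in the U+0000 to U+007F range) or a single escape, preceded by the characters U+0062 (b) and U+0027 (single-quote), and followed by the character U+0027. If the character U+0027 is present within the literal, it must be escaped by a preceding U+005C (\) character. It is equivalent to a u8 unsigned 8-bit integer number literal.

[lex.token.str-byte]

Byte string literals

[lex.token.str-byte.syntax]

^{Lexer}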
BYTE_STRING_LITERAL
   b" ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )^* " SUFFIX^?

ASCII_FOR_STRING
   any ASCII (i.e. 0x00 to 0x7F), except ", \ and IsolatedCR

[lex.token.str-byte.intro]

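A non-raw byte string literal is a sequence of ASCII characters and escapes, preceded by the characters U+0062 (b) and U+0022 (double-quote), and followed by the character U+0022. If the character U+0022 is present within the literal, it must be escaped by a preceding U+005C (\) character. Alternatively, a byte string literal can be a raw byte string literal, defined below.

[lex.token.str-byte.linefeed]

Line-breaks, represented by the character U+000A (LF), are allowed in byte string literals. When an unescaped U+005C character (\) occurs immediately before a line break, the line break does not appear in the string represented by the token. See String continuation escapes for details. The character U+000D (CR) may not appear in a byte string literal other than as part of such a string continuation escape.

[lex.token.str-byte.escape]

Some additional escapes are available in either byte or non-raw byte string literals. An escape starts with a U+005C (\) and continues with one of the following forms

[lex.token.str-byte.escape-byte]

A byte escape starts with U+0078 (x) and is followed by exactly two hex digits. It denotes the byte equal to the provided hex value.

[lex.token.str-byte.escape-whitespace]

A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the byte values 0x0A (ASCII LF), 0x0D (ASCII CR), or 0x09 (ASCII HT) respectively.

[lex.token.str-byte.escape-null]

The null escape is the character U+0030 (0) and denotes the byte value 0x00 (ASCII NUL).

[lex.token.str-byte.escape-slash]

The backslash escape is the character U+005C (\), which must be escaped in order to denote its ASCII encoding 0x5C.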
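
To make the byte literal and byte escape rules concrete, the sketch below compares a few literals against plain u8 values. The function names are assumptions for illustration only.

```rust
/// Hypothetical helper: byte literals are u8 values.
fn byte_values() -> (u8, u8, u8) {
    // Plain ASCII character, whitespace escape, and a byte escape above 0x7F.
    (b'H', b'\n', b'\xFF')
}

/// A byte string literal is a reference to a fixed-size array of u8.
fn escaped_byte_string() -> &'static [u8; 3] {
    // Byte escape, backslash escape, and quote escape, in that order.
    b"\x52\\\""
}

fn main() {
    assert_eq!(byte_values(), (72u8, 10u8, 255u8));
    assert_eq!(*escaped_byte_string(), [0x52u8, 0x5C, 0x22]);
}
```

Unlike the 7-bit escape in character and string literals, the byte escape here accepts values up to 0xFF, which is why `b'\xFF'` is valid.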

[lex.token.str-byte-raw]

Raw byte string literals

[lex.token.str-byte-raw.syntax]

^{Lexer}
RAW_BYTE_STRING_LITERAL
   br RAW_BYTE_STRING_CONTENT SUFFIX^?

RAW_BYTE_STRING_CONTENT
      " ASCII_FOR_RAW^{* (non-greedy)} "
   | # RAW_BYTE_STRING_CONTENT #

ASCII_FOR_RAW
   any ASCII (i.e. 0x00 to 0x7F), except IsolatedCR

[lex.token.str-byte-raw.intro]

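Raw byte string literals do not process any escapes. They start with the character U+0062 (b), followed by U+0072 (r), followed by fewer than 256 of the character U+0023 (#), and a U+0022 (double-quote) character.

[lex.token.str-byte-raw.body]

The raw string body can contain any sequence of ASCII characters other than U+000D (CR). It is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character. A raw byte string literal cannot contain any non-ASCII byte.

[lex.token.literal.str-byte-raw.content]

All characters contained in the raw string body represent their ASCII encoding. The characters U+0022 (double-quote) (except when followed by at least as many U+0023 (#) characters as were used to start the raw string literal) or U+005C (\) do not have any special meaning.

Examples for byte string literals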

#![allow(unused)]
fn main() {
b"foo"; br"foo";                     // foo
b"\"foo\""; br#""foo""#;             // "foo"

b"foo #\"# bar";
br##"foo #"# bar"##;                 // foo #"# bar

b"\x52"; b"R"; br"R";                // R
b"\\x52"; br"\x52";                  // \x52
}

C string and raw C string literals

[lex.token.str-c]

C string literals

[lex.token.str-c.syntax]

^{Lexer}
C_STRING_LITERAL
   c" (
      ~[" \ IsolatedCR NUL]
      | BYTE_ESCAPE except \0 or \x00
      | UNICODE_ESCAPE except \u{0}, \u{00}, …, \u{000000}
      | STRING_CONTINUE
   )^* " SUFFIX^?

[lex.token.str-c.intro]

A C string literal is a sequence of Unicode characters and escapes, preceded by the characters U+0063 (c) and U+0022 (double-quote), and followed by the character U+0022. If the character U+0022 is present within the literal, it must be escaped by a preceding U+005C (\) character. Alternatively, a C string literal can be a raw C string literal, defined below.

[lex.token.str-c.null]

C strings are implicitly terminated by byte 0x00, so the C string literal c"" is equivalent to manually constructing a &CStr from the byte string literal b"\x00". Other than the implicit terminator, byte 0x00 is not permitted within a C string.
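
The manual construction mentioned above can be sketched with `std::ffi::CStr::from_bytes_with_nul`. The example avoids the `c""` literal itself so that it does not depend on the edition; the function names are hypothetical.

```rust
use std::ffi::CStr;

/// Hypothetical helper: builds the empty C string from b"\x00" and returns
/// its bytes including the terminator.
fn empty_c_string_bytes() -> Vec<u8> {
    let empty = CStr::from_bytes_with_nul(b"\x00")
        .expect("a single NUL terminator is a valid C string");
    empty.to_bytes_with_nul().to_vec()
}

/// A 0x00 byte anywhere other than the end is rejected.
fn interior_nul_is_rejected() -> bool {
    CStr::from_bytes_with_nul(b"a\x00b\x00").is_err()
}

fn main() {
    assert_eq!(empty_c_string_bytes(), vec![0u8]);
    assert!(interior_nul_is_rejected());
}
```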

[lex.token.str-c.linefeed]

Line-breaks, represented by the character U+000A (LF), are allowed in C string literals. When an unescaped U+005C character (\) occurs immediately before a line break, the line break does not appear in the string represented by the token. See String continuation escapes for details. The character U+000D (CR) may not appear in a C string literal other than as part of such a string continuation escape.

[lex.token.str-c.escape]

Some additional escapes are available in non-raw C string literals. An escape starts with a U+005C (\) and continues with one of the following forms

[lex.token.str-c.escape-byte]

A byte escape starts with U+0078 (x) and is followed by exactly two hex digits. It denotes the byte equal to the provided hex value.

[lex.token.str-c.escape-unicode]

A 24-bit code point escape starts with U+0075 (u) and is followed by up to six hex digits surrounded by braces U+007B ({) and U+007D (}). It denotes the Unicode code point equal to the provided hex value, encoded as UTF-8.

[lex.token.str-c.escape-whitespace]

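A whitespace escape is one of the characters U+006E (n), U+0072 (r), or U+0074 (t), denoting the byte values 0x0A (ASCII LF), 0x0D (ASCII CR), or 0x09 (ASCII HT) respectively.

[lex.token.str-c.escape-slash]

The backslash escape is the character U+005C (\), which must be escaped in order to denote its ASCII encoding 0x5C.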

[lex.token.str-c.char-unicode]

A C string represents bytes with no defined encoding, but a C string literal may contain Unicode characters above U+007F. Such characters will be replaced with the bytes of that character’s UTF-8 representation.

The following C string literals are equivalent

#![allow(unused)]
fn main() {
c"æ";        // LATIN SMALL LETTER AE (U+00E6)
c"\u{00E6}";
c"\xC3\xA6";
}

[lex.token.str-c.edition2021]

Edition differences: C string literals are accepted in the 2021 edition or later. In earlier editions the token c"" is lexed as c "".

[lex.token.str-c-raw]

Raw C string literals

[lex.token.str-c-raw.syntax]

^{Lexer}
RAW_C_STRING_LITERAL
   cr RAW_C_STRING_CONTENT SUFFIX^?

RAW_C_STRING_CONTENT
      " ( ~ IsolatedCR NUL )^{* (non-greedy)} "
   | # RAW_C_STRING_CONTENT #

[lex.token.str-c-raw.intro]

Raw C string literals do not process any escapes. They start with the character U+0063 (c), followed by U+0072 (r), followed by fewer than 256 of the character U+0023 (#), and a U+0022 (double-quote) character.

[lex.token.str-c-raw.body]

The raw C string body can contain any sequence of Unicode characters other than U+0000 (NUL) and U+000D (CR). It is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (#) characters that preceded the opening U+0022 (double-quote) character.

[lex.token.str-c-raw.content]

All characters contained in the raw C string body represent themselves in UTF-8 encoding. The characters U+0022 (double-quote) (except when followed by at least as many U+0023 (#) characters as were used to start the raw C string literal) or U+005C (\) do not have any special meaning.

[lex.token.str-c-raw.edition2021]

Edition differences: Raw C string literals are accepted in the 2021 edition or later. In earlier editions the token cr"" is lexed as cr "", and cr#""# is lexed as cr #""# (which is non-grammatical).

Examples for C string and raw C string literals

#![allow(unused)]
fn main() {
c"foo"; cr"foo";                     // foo
c"\"foo\""; cr#""foo""#;             // "foo"

c"foo #\"# bar";
cr##"foo #"# bar"##;                 // foo #"# bar

c"\x52"; c"R"; cr"R";                // R
c"\\x52"; cr"\x52";                  // \x52
}

[lex.token.literal.num]

Number literals

A number literal is either an integer literal or a floating-point literal. The grammar for recognizing the two kinds of literals is mixed.

[lex.token.literal.int]

Integer literals

[lex.token.literal.int.syntax]

^{Lexer}
INTEGER_LITERAL
   ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) SUFFIX_NO_E^?

DEC_LITERAL
   DEC_DIGIT (DEC_DIGIT|_)^*

BIN_LITERAL
   0b (BIN_DIGIT|_)^* BIN_DIGIT (BIN_DIGIT|_)^*

OCT_LITERAL
   0o (OCT_DIGIT|_)^* OCT_DIGIT (OCT_DIGIT|_)^*

HEX_LITERAL
   0x (HEX_DIGIT|_)^* HEX_DIGIT (HEX_DIGIT|_)^*

BIN_DIGIT : [0-1]

OCT_DIGIT : [0-7]

DEC_DIGIT : [0-9]

HEX_DIGIT : [0-9 a-f A-F]

[lex.token.literal.int.kind]

An integer literal has one of four forms

[lex.token.literal.int.kind-dec]

A decimal literal starts with a decimal digit and continues with any mixture of decimal digits and underscores.

[lex.token.literal.int.kind-hex]

A hex literal starts with the character sequence U+0030 U+0078 (0x) and continues as any mixture (with at least one digit) of hex digits and underscores.

[lex.token.literal.int.kind-oct]

An octal literal starts with the character sequence U+0030 U+006F (0o) and continues as any mixture (with at least one digit) of octal digits and underscores.

[lex.token.literal.int.kind-bin]

A binary literal starts with the character sequence U+0030 U+0062 (0b) and continues as any mixture (with at least one digit) of binary digits and underscores.

[lex.token.literal.int.restriction]

Like any literal, an integer literal may be followed (immediately, without any spaces) by a suffix as described above. The suffix may not begin with e or E, as that would be interpreted as the exponent of a floating-point literal. See Integer literal expressions for the effect of these suffixes.
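
The four forms can be compared by value. The following sketch uses a made-up function name, `literal_values`, and checks that underscores and radix prefixes only affect the spelling, not the value.

```rust
/// Hypothetical helper: one literal per form, plus a hex literal whose
/// trailing `f32` is made of hex digits rather than being a suffix.
fn literal_values() -> [u32; 5] {
    [
        98_222,      // decimal with a visual separator
        0xff,        // hex
        0o77,        // octal
        0b1111_0000, // binary with a visual separator
        0x01_f32,    // hex digits 0, 1, f, 3, 2
    ]
}

fn main() {
    assert_eq!(literal_values(), [98222, 255, 63, 240, 7986]);
    // Leading underscores after the radix prefix are allowed.
    assert_eq!(0b________1, 1);
}
```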

Examples of integer literals which are accepted as literal expressions

#![allow(unused)]
fn main() {
#![allow(overflowing_literals)]
123;
123i32;
123u32;
123_u32;

0xff;
0xff_u8;
0x01_f32; // integer 7986, not floating-point 1.0
0x01_e3;  // integer 483, not floating-point 1000.0

0o70;
0o70_i16;

0b1111_1111_1001_0000;
0b1111_1111_1001_0000i64;
0b________1;

0usize;

// These are too big for their type, but are accepted as literal expressions.
128_i8;
256_u8;

// This is an integer literal, accepted as a floating-point literal expression.
5f32;
}

Note that -1i8, for example, is analyzed as two tokens: - followed by 1i8.
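
One way to observe this is to count the token trees a macro receives. The `count_tts` macro below is a hypothetical helper written for this sketch, not something defined by the reference.

```rust
// Hypothetical helper macro: counts the top-level token trees it is given.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// Returns the token counts for `1i8` and `-1i8`.
fn integer_token_counts() -> (usize, usize) {
    (count_tts!(1i8), count_tts!(-1i8))
}

fn main() {
    // `1i8` is a single literal token; `-1i8` is `-` followed by `1i8`.
    assert_eq!(integer_token_counts(), (1, 2));
}
```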

Examples of integer literals which are not accepted as literal expressions

#![allow(unused)]
fn main() {
#[cfg(FALSE)] {
0invalidSuffix;
123AFB43;
0b010a;
0xAB_CD_EF_GH;
0b1111_f32;
}
}

[lex.token.literal.int.tuple-field]

Tuple index

[lex.token.literal.int.tuple-field.syntax]

^{Lexer}
TUPLE_INDEX
   INTEGER_LITERAL

[lex.token.literal.int.tuple-field.intro]

A tuple index is used to refer to the fields of tuples, tuple structs, and tuple variants.

[lex.token.literal.int.tuple-field.eq]

Tuple indices are compared with the literal token directly. Tuple indices start with 0 and each successive index increments the value by 1 as a decimal value. Thus, only decimal values will match, and the value must not have any extra 0 prefix characters.

#![allow(unused)]
fn main() {
let example = ("dog", "cat", "horse");
let dog = example.0;
let cat = example.1;
// The following examples are invalid.
let cat = example.01;  // ERROR no field named `01`
let horse = example.0b10;  // ERROR no field named `0b10`
}

Note

Tuple indices may include certain suffixes, but this is not intended to be valid, and may be removed in a future version. See https://github.com/rust-lang/rust/issues/60210 for more information.

[lex.token.literal.float]

Floating-point literals

[lex.token.literal.float.syntax]

^{Lexer}
FLOAT_LITERAL
      DEC_LITERAL . (not immediately followed by ., _ or an XID_Start character)
   | DEC_LITERAL . DEC_LITERAL SUFFIX_NO_E^?
   | DEC_LITERAL (. DEC_LITERAL)^? FLOAT_EXPONENT SUFFIX^?

FLOAT_EXPONENT
   (e|E) (+|-)^? (DEC_DIGIT|_)^* DEC_DIGIT (DEC_DIGIT|_)^*

[lex.token.literal.float.form]

A floating-point literal has one of two forms

A decimal literal followed by a period character U+002E (.). This is optionally followed by another decimal literal, with an optional exponent.
A single decimal literal followed by an exponent.

[lex.token.literal.float.suffix]

Like integer literals, a floating-point literal may be followed by a suffix, so long as the pre-suffix part does not end with U+002E (.). The suffix may not begin with e or E if the literal does not include an exponent. See Floating-point literal expressions for the effect of these suffixes.

Examples of floating-point literals which are accepted as literal expressions

#![allow(unused)]
fn main() {
123.0f64;
0.1f64;
0.1f32;
12E+99_f64;
let x: f64 = 2.;
}

This last example is different because it is not possible to use the suffix syntax with a floating-point literal ending in a period. 2.f64 would attempt to call a method named f64 on 2.

Note that -1.0, for example, is analyzed as two tokens: - followed by 1.0.
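
The period rules can be observed by counting the tokens a macro receives. This is a sketch under the assumption that a small helper macro is acceptable here; `count_tts` and `float_token_counts` are names invented for the example.

```rust
// Hypothetical helper macro: counts the top-level token trees it is given.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// Token counts for several inputs that start with a decimal literal.
fn float_token_counts() -> [usize; 5] {
    [
        count_tts!(2.),     // one floating-point literal token
        count_tts!(2.0f64), // one suffixed floating-point literal token
        count_tts!(2.f64),  // three tokens: `2`, `.`, `f64`
        count_tts!(1..3),   // three tokens: `1`, `..`, `3`
        count_tts!(-1.0),   // two tokens: `-`, `1.0`
    ]
}

fn main() {
    assert_eq!(float_token_counts(), [1, 1, 3, 3, 2]);
}
```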

Examples of floating-point literals which are not accepted as literal expressions

#![allow(unused)]
fn main() {
#[cfg(FALSE)] {
2.0f80;
2e5f80;
2e5e6;
2.0e5e6;
1.3e10u64;
}
}

[lex.token.literal.reserved]

Reserved forms similar to number literals

^{Lexer}
RESERVED_NUMBER
      BIN_LITERAL [2-9]
   | OCT_LITERAL [8-9]
   | ( BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) .
         (not immediately followed by ., _ or an XID_Start character)
   | ( BIN_LITERAL | OCT_LITERAL ) (e|E)
   | 0b _^* end of input or not BIN_DIGIT
   | 0o _^* end of input or not OCT_DIGIT
   | 0x _^* end of input or not HEX_DIGIT
   | DEC_LITERAL ( . DEC_LITERAL)^? (e|E) (+|-)^? end of input or not DEC_DIGIT

[lex.token.literal.reserved.intro]

The following lexical forms similar to number literals are reserved forms. Due to the possible ambiguity these raise, they are rejected by the tokenizer instead of being interpreted as separate tokens.

[lex.token.literal.reserved.out-of-range]

An unsuffixed binary or octal literal followed, without intervening whitespace, by a decimal digit out of the range for its radix.

[lex.token.literal.reserved.period]

An unsuffixed binary, octal, or hexadecimal literal followed, without intervening whitespace, by a period character (with the same restrictions on what follows the period as for floating-point literals).

[lex.token.literal.reserved.exp]

An unsuffixed binary or octal literal followed, without intervening whitespace, by the character e or E.

[lex.token.literal.reserved.empty-with-radix]

Input which begins with one of the radix prefixes but is not a valid binary, octal, or hexadecimal literal (because it contains no digits).

[lex.token.literal.reserved.empty-exp]

Input which has the form of a floating-point literal with no digits in the exponent.

Examples of reserved forms

#![allow(unused)]
fn main() {
0b0102;  // this is not `0b010` followed by `2`
0o1279;  // this is not `0o127` followed by `9`
0x80.0;  // this is not `0x80` followed by `.` and `0`
0b101e;  // this is not a suffixed literal, or `0b101` followed by `e`
0b;      // this is not an integer literal, or `0` followed by `b`
0b_;     // this is not an integer literal, or `0` followed by `b_`
2e;      // this is not a floating-point literal, or `2` followed by `e`
2.0e;    // this is not a floating-point literal, or `2.0` followed by `e`
2em;     // this is not a suffixed literal, or `2` followed by `em`
2.0em;   // this is not a suffixed literal, or `2.0` followed by `em`
}

[lex.token.life]

Lifetimes and loop labels

[lex.token.life.syntax]

^{Lexer}
LIFETIME_TOKEN
      ' IDENTIFIER_OR_KEYWORD (not immediately followed by ')
   | '_ (not immediately followed by ')
   | RAW_LIFETIME

LIFETIME_OR_LABEL
      ' NON_KEYWORD_IDENTIFIER (not immediately followed by ')
   | RAW_LIFETIME

RAW_LIFETIME
   'r# IDENTIFIER_OR_KEYWORD _{Except crate, self, super, Self} (not immediately followed by ')

RESERVED_RAW_LIFETIME : 'r#_ (not immediately followed by ')

[lex.token.life.intro]

Lifetime parameters and loop labels use LIFETIME_OR_LABEL tokens. Any LIFETIME_TOKEN will be accepted by the lexer, and for example, can be used in macros.
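
As a brief illustration, the sketch below uses the same token form in both roles: `'a` as a lifetime parameter and `'outer` as a loop label. Both function names are hypothetical.

```rust
/// `'a` is a lifetime parameter shared by both inputs and the result.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

/// `'outer` is a loop label; `break 'outer` leaves both loops at once.
fn first_pair_with_sum(target: u32) -> Option<(u32, u32)> {
    let mut found = None;
    'outer: for a in 0..10 {
        for b in 0..10 {
            if a + b == target {
                found = Some((a, b));
                break 'outer;
            }
        }
    }
    found
}

fn main() {
    assert_eq!(longest("ab", "c"), "ab");
    assert_eq!(first_pair_with_sum(3), Some((0, 3)));
    assert_eq!(first_pair_with_sum(100), None);
}
```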

[lex.token.life.raw.intro]

A raw lifetime is like a normal lifetime, but its identifier is prefixed by r#. (Note that the r# prefix is not included as part of the actual lifetime.)

[lex.token.life.raw.allowed]

Unlike a normal lifetime, a raw lifetime may be any strict or reserved keyword except the ones listed above for RAW_LIFETIME.

[lex.token.life.raw.reserved]

It is an error to use the RESERVED_RAW_LIFETIME token 'r#_ in order to avoid confusion with the placeholder lifetime.

[lex.token.life.raw.edition2021]

Edition differences: Raw lifetimes are accepted in the 2021 edition or later. In earlier editions the token 'r#lt is lexed as 'r # lt.

[lex.token.punct]

Punctuation

[lex.token.punct.intro]

Punctuation symbol tokens are listed here for completeness. Their individual usages and meanings are defined in the linked pages.

Symbol	Name	Usage
`+`	Plus	Addition, Trait Bounds, Macro Kleene Matcher
`-`	Minus	Subtraction, Negation
`*`	Star	Multiplication, Dereference, Raw Pointers, Macro Kleene Matcher, Use wildcards
`/`	Slash	Division
`%`	Percent	Remainder
`^`	Caret	Bitwise and Logical XOR
`!`	Not	Bitwise and Logical NOT, Macro Calls, Inner Attributes, Never Type, Negative impls
`&`	And	Bitwise and Logical AND, Borrow, References, Reference patterns
`|`	Or	Bitwise and Logical OR, Closures, Patterns in match, if let, and while let
`&&`	AndAnd	Lazy AND, Borrow, References, Reference patterns
`||`	OrOr	Lazy OR, Closures
`<<`	Shl	Shift Left, Nested Generics
`>>`	Shr	Shift Right, Nested Generics
`+=`	PlusEq	Addition assignment
`-=`	MinusEq	Subtraction assignment
`*=`	StarEq	Multiplication assignment
`/=`	SlashEq	Division assignment
`%=`	PercentEq	Remainder assignment
`^=`	CaretEq	Bitwise XOR assignment
`&=`	AndEq	Bitwise And assignment
`|=`	OrEq	Bitwise Or assignment
`<<=`	ShlEq	Shift Left assignment
`>>=`	ShrEq	Shift Right assignment, Nested Generics
`=`	Eq	Assignment, Attributes, Various type definitions
`==`	EqEq	Equal
`!=`	Ne	Not Equal
`>`	Gt	Greater than, Generics, Paths
`<`	Lt	Less than, Generics, Paths
`>=`	Ge	Greater than or equal to, Generics
`<=`	Le	Less than or equal to
`@`	At	Subpattern binding
`_`	Underscore	Wildcard patterns, Inferred types, Unnamed items in constants, extern crates, use declarations, and destructuring assignment
`.`	Dot	Field access, Tuple index
`..`	DotDot	Range, Struct expressions, Patterns, Range Patterns
`...`	DotDotDot	Variadic functions, Range patterns
`..=`	DotDotEq	Inclusive Range, Range patterns
`,`	Comma	Various separators
`;`	Semi	Terminator for various items and statements, Array types
`:`	Colon	Various separators
`::`	PathSep	Path separator
`->`	RArrow	Function return type, Closure return type, Function pointer type
`=>`	FatArrow	Match arms, Macros
`<-`	LArrow	The left arrow symbol has been unused since before Rust 1.0, but it is still treated as a single token
`#`	Pound	Attributes
`$`	Dollar	Macros
`?`	Question	Question mark operator, Questionably sized, Macro Kleene Matcher
`~`	Tilde	The tilde operator has been unused since before Rust 1.0, but its token may still be used

[lex.token.delim]

Delimiters

Bracket punctuation is used in various parts of the grammar. An open bracket must always be paired with a close bracket. Brackets and the tokens within them are referred to as “token trees” in macros. The three types of brackets are

Bracket	Type
`{` `}`	Curly braces
`[` `]`	Square brackets
`(` `)`	Parentheses
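
The notion of a token tree can be illustrated by counting what a macro matches with the `tt` fragment specifier: a bracketed group counts as one token tree regardless of its contents. The `count_tts` macro and the function name are assumptions made for this sketch.

```rust
// Hypothetical helper macro: counts the top-level token trees it is given.
macro_rules! count_tts {
    () => { 0usize };
    ($head:tt $($tail:tt)*) => { 1usize + count_tts!($($tail)*) };
}

/// Token tree counts for a few bracketed and unbracketed inputs.
fn token_tree_counts() -> [usize; 3] {
    [
        count_tts!(a b c),            // three single-token trees
        count_tts!((a b c) [d, e] {f}), // three bracketed groups
        count_tts!(((()))),           // one group, however deeply nested
    ]
}

fn main() {
    assert_eq!(token_tree_counts(), [3, 3, 1]);
}
```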

[lex.token.reserved-prefix]

Reserved prefixes

[lex.token.reserved-prefix.syntax]

^{Lexer 2021+}
RESERVED_TOKEN_DOUBLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _{Except b or c or r or br or cr} | _ ) "
RESERVED_TOKEN_SINGLE_QUOTE : ( IDENTIFIER_OR_KEYWORD _{Except b} | _ ) '
RESERVED_TOKEN_POUND : ( IDENTIFIER_OR_KEYWORD _{Except r or br or cr} | _ ) #
RESERVED_TOKEN_LIFETIME : ' (IDENTIFIER_OR_KEYWORD _{Except r} | _) #

[lex.token.reserved-prefix.intro]

Some lexical forms known as reserved prefixes are reserved for future use.

[lex.token.reserved-prefix.id]

Source input which would otherwise be lexically interpreted as a non-raw identifier (or a keyword or _) which is immediately followed by a #, ', or " character (without intervening whitespace) is identified as a reserved prefix.

[lex.token.reserved-prefix.raw-token]

Note that raw identifiers, raw string literals, and raw byte string literals may contain a # character but are not interpreted as containing a reserved prefix.

[lex.token.reserved-prefix.strings]

Similarly the r, b, br, c, and cr prefixes used in raw string literals, byte literals, byte string literals, raw byte string literals, C string literals, and raw C string literals are not interpreted as reserved prefixes.

[lex.token.reserved-prefix.life]

Source input which would otherwise be lexically interpreted as a non-raw lifetime (or a keyword or _) which is immediately followed by a # character (without intervening whitespace) is identified as a reserved lifetime prefix.

[lex.token.reserved-prefix.edition2021]

Edition differences: Starting with the 2021 edition, reserved prefixes are reported as an error by the lexer (in particular, they cannot be passed to macros).

Before the 2021 edition, reserved prefixes are accepted by the lexer and interpreted as multiple tokens (for example, one token for the identifier or keyword, followed by a # token).

Examples accepted in all editions

#![allow(unused)]
fn main() {
macro_rules! lexes {($($_:tt)*) => {}}
lexes!{a #foo}
lexes!{continue 'foo}
lexes!{match "..." {}}
lexes!{r#let#foo}         // three tokens: r#let # foo
lexes!{'prefix #lt}
}

Examples accepted before the 2021 edition but rejected later

#![allow(unused)]
fn main() {
macro_rules! lexes {($($_:tt)*) => {}}
lexes!{a#foo}
lexes!{continue'foo}
lexes!{match"..." {}}
lexes!{'prefix#lt}
}

[lex.token.reserved-guards]

Reserved guards

[lex.token.reserved-guards.syntax]

^{Lexer 2024+}
RESERVED_GUARDED_STRING_LITERAL : #⁺ STRING_LITERAL
RESERVED_POUNDS : #^2..

[lex.token.reserved-guards.intro]

The reserved guards are syntax reserved for future use, and will generate a compile error if used.

[lex.token.reserved-guards.string-literal]

The reserved guarded string literal is a token of one or more U+0023 (#) immediately followed by a STRING_LITERAL.

[lex.token.reserved-guards.pounds]

The reserved pounds is a token of two or more U+0023 (#).

[lex.token.reserved-guards.edition2024]

Edition differences: Before the 2024 edition, reserved guards are accepted by the lexer and interpreted as multiple tokens. For example, the #"foo"# form is interpreted as three tokens. ## is interpreted as two tokens.

The Rust Reference