Strings
The two most commonly used string types in Rust are String and &str.
A String is stored as a vector of bytes (Vec<u8>), but is guaranteed to always be a valid UTF-8 sequence. String is heap allocated, growable, and not null terminated.
&str is a slice (&[u8]) that always points to a valid UTF-8 sequence. It can be used to view into a String, just as &[T] is a view into Vec<T>.
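The relationship between the owned buffer and the borrowed view can be sketched as follows. This is a minimal illustration rather than part of the original example; the helper names byte_len_and_char_count and first_word are hypothetical and chosen only for this sketch.

```rust
/// Returns the length in bytes and the number of `char`s of a string slice.
/// (Hypothetical helper, used only for this illustration.)
fn byte_len_and_char_count(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

/// Borrows the first whitespace-separated word of `s` without allocating.
/// (Hypothetical helper, used only for this illustration.)
fn first_word(s: &str) -> &str {
    s.split_whitespace().next().unwrap_or("")
}

fn main() {
    // A `String` owns a heap-allocated, growable buffer of UTF-8 bytes.
    let mut owned = String::from("héllo");
    owned.push_str(" world");

    // `len()` counts bytes, not characters: 'é' takes two bytes in UTF-8.
    let (bytes, chars) = byte_len_and_char_count(&owned);
    println!("{} bytes, {} chars", bytes, chars);

    // `&String` coerces to `&str`, a borrowed view into the same buffer.
    let view: &str = &owned;
    println!("First word: {}", first_word(view));

    // The underlying bytes are accessible, much like the contents of a `Vec<u8>`.
    println!("Raw bytes: {:?}", owned.as_bytes());

    // Building a `String` from bytes is checked: invalid UTF-8 is rejected.
    let invalid = vec![0xff, 0xfe];
    println!("Invalid bytes accepted? {}", String::from_utf8(invalid).is_ok());
}
```

Because first_word returns a slice borrowed from its argument, no new string is allocated for the result.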
fn main() {
    // (all the type annotations are superfluous)
    // A reference to a string allocated in read only memory
    let pangram: &'static str = "the quick brown fox jumps over the lazy dog";
    println!("Pangram: {}", pangram);

    // Iterate over words in reverse, no new string is allocated
    println!("Words in reverse");
    for word in pangram.split_whitespace().rev() {
        println!("> {}", word);
    }

    // Copy chars into a vector, sort and remove duplicates
    let mut chars: Vec<char> = pangram.chars().collect();
    chars.sort();
    chars.dedup();

    // Create an empty and growable `String`
    let mut string = String::new();
    for c in chars {
        // Insert a char at the end of string
        string.push(c);
        // Insert a string at the end of string
        string.push_str(", ");
    }

    // The trimmed string is a slice to the original string, hence no new
    // allocation is performed
    let chars_to_trim: &[char] = &[' ', ','];
    let trimmed_str: &str = string.trim_matches(chars_to_trim);
    println!("Used characters: {}", trimmed_str);

    // Heap allocate a string
    let alice = String::from("I like dogs");
    // Allocate new memory and store the modified string there
    let bob: String = alice.replace("dog", "cat");

    println!("Alice says: {}", alice);
    println!("Bob says: {}", bob);
}
More str and String methods can be found under the std::str and std::string modules.
Literals and escapes
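There are multiple ways to write string literals that contain special characters. All of them produce a similar &str, so it is best to use whichever form is most convenient to write. Similarly, there are multiple ways to write byte string literals, all of which produce a &[u8; N].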
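As a small sketch of the point above, the following compares three spellings of the same text and shows that a byte string literal carries its length in its type. The helper same_text is hypothetical and exists only for this illustration.

```rust
/// Returns true when all three spellings denote the same text.
/// (Hypothetical helper, used only for this illustration.)
fn same_text(escaped: &str, raw: &str, plain: &str) -> bool {
    escaped == raw && raw == plain
}

fn main() {
    // Three ways to write the same four characters.
    let plain: &str = "Rust";
    let escaped: &str = "\x52\u{75}st"; // 'R' as a byte escape, 'u' as a Unicode escape
    let raw: &str = r"Rust";
    println!("All equal: {}", same_text(escaped, raw, plain));

    // A byte string literal is a reference to a fixed-size byte array,
    // so its length N is part of the type.
    let bytes: &[u8; 4] = b"Rust";
    println!("Same bytes: {}", &bytes[..] == plain.as_bytes());
}
```

The three forms differ only in how they are typed in the source; once compiled, the resulting values are indistinguishable.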
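Generally, special characters are escaped with a backslash character: \. This way you can add any character to your string, even unprintable ones and ones that you don't know how to type. If you want a literal backslash, escape it with another one: \\.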
String or character literal delimiters that occur within a literal must be escaped: "\"", '\''.
fn main() {
    // You can use escapes to write bytes by their hexadecimal values...
    let byte_escape = "I'm writing \x52\x75\x73\x74!";
    println!("What are you doing\x3F (\\x3F means ?) {}", byte_escape);

    // ...or Unicode code points.
    let unicode_codepoint = "\u{211D}";
    let character_name = "\"DOUBLE-STRUCK CAPITAL R\"";

    println!("Unicode character {} (U+211D) is called {}",
             unicode_codepoint, character_name);

    let long_string = "String literals
                        can span multiple lines.
                        The linebreak and indentation here ->\
                        <- can be escaped too!";
    println!("{}", long_string);
}
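Sometimes there are just too many characters that need to be escaped, or it is much more convenient to write a string out as-is. This is where raw string literals come into play.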
fn main() {
    let raw_str = r"Escapes don't work here: \x3F \u{211D}";
    println!("{}", raw_str);

    // If you need quotes in a raw string, add a pair of #s
    let quotes = r#"And then I said: "There is no escape!""#;
    println!("{}", quotes);

    // If you need "# in your string, just use more #s in the delimiter.
    // You can use up to 255 #s.
    let longer_delimiter = r###"A string with "# in it. And even "##!"###;
    println!("{}", longer_delimiter);
}
Want a string that's not UTF-8? (Remember, str and String must be valid UTF-8.) Or maybe you want an array of bytes that's mostly text? Byte strings to the rescue!
use std::str;

fn main() {
    // Note that this is not actually a `&str`
    let bytestring: &[u8; 21] = b"this is a byte string";

    // Byte arrays don't have the `Display` trait, so printing them is a bit limited
    println!("A byte string: {:?}", bytestring);

    // Byte strings can have byte escapes...
    let escaped = b"\x52\x75\x73\x74 as bytes";
    // ...but no unicode escapes
    // let escaped = b"\u{211D} is not allowed";
    println!("Some escaped bytes: {:?}", escaped);

    // Raw byte strings work just like raw strings
    let raw_bytestring = br"\u{211D} is not escaped here";
    println!("{:?}", raw_bytestring);

    // Converting a byte array to `str` can fail
    if let Ok(my_str) = str::from_utf8(raw_bytestring) {
        println!("And the same as text: '{}'", my_str);
    }

    let _quotes = br#"You can also use "fancier" formatting, \
                    like with normal raw strings"#;

    // Byte strings don't have to be UTF-8
    let shift_jis = b"\x82\xe6\x82\xa8\x82\xb1\x82\xbb"; // "ようこそ" in SHIFT-JIS

    // But then they can't always be converted to `str`
    match str::from_utf8(shift_jis) {
        Ok(my_str) => println!("Conversion successful: '{}'", my_str),
        Err(e) => println!("Conversion failed: {:?}", e),
    };
}
For conversions between character encodings, check out the encoding crate.
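The standard library does not transcode between encodings, and the encoding crate's API is not shown here. As a standard-library-only sketch of what can be done with bytes that are not valid UTF-8, the following either replaces the invalid sequences or keeps only the valid prefix. The helper names decode_lossy and valid_prefix are hypothetical and used only for this illustration.

```rust
use std::str;

/// Decodes `bytes` as UTF-8, replacing invalid sequences with U+FFFD.
/// (Hypothetical helper, used only for this illustration.)
fn decode_lossy(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// Returns the longest prefix of `bytes` that is valid UTF-8.
/// (Hypothetical helper, used only for this illustration.)
fn valid_prefix(bytes: &[u8]) -> &str {
    match str::from_utf8(bytes) {
        Ok(s) => s,
        // `valid_up_to` is the index up to which the input was valid UTF-8.
        Err(e) => str::from_utf8(&bytes[..e.valid_up_to()]).unwrap_or(""),
    }
}

fn main() {
    // 0xff can never appear in well-formed UTF-8.
    let mixed = b"abc\xffdef";
    println!("Lossy decoding: {}", decode_lossy(mixed));
    println!("Valid prefix: {}", valid_prefix(mixed));
}
```

Neither helper recovers the original text of a differently encoded input; for that, a real transcoding library is still needed.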
A more detailed listing of the ways to write string literals and escape characters is given in the "Tokens" chapter of the Rust Reference.