C++ Primer ch3 - Strings, Vectors, and Arrays: Reading Notes

Posted on 2024-07-30 in programming.

3.1 Namespace using Declarations

A Separate using Declaration Is Required for Each Name. The important part is that there must be a using declaration for each name we use, and each declaration must end in a semicolon.

The fifth edition of C++ Primer only covers C++11, where each name needs its own using declaration. Since C++17, a single using-declaration can take a comma-separated list of names.

https://en.cppreference.com/w/cpp/language/using_declaration

#include <iostream>
// using declarations for names from the standard library
using std::cin;
using std::cout; using std::endl;    // OK

using std::cout, std::endl;    // Wrong in C++11, but available from C++17

Headers Should Not Include using Declarations

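During preprocessing, a header's contents are copied verbatim to the line that #includes it, so a using declaration in a header behaves as if it were written in every .cpp file that includes that header, which can cause unexpected name conflicts.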

3.2 Library `string` Type

3.2.1 Defining and Initializing `string`s

When a string is initialized from a string literal, the null character at the end of the literal is not copied into the string.

copy initialization

When we initialize a variable using =, we are asking the compiler to copy initialize

string s5 = "hiya"; // copy initialization

direct initialization

when we omit the =, we use direct initialization

string s6("hiya"); // direct initialization
string s7(10, 'c'); // direct initialization; s7 is cccccccccc

copy initialization with multi-value initializer

string s8 = string(10, 'c'); // copy initialization; s8 is cccccccccc

The copy initialization above conceptually creates a temporary object, so it is equivalent to writing:

string temp(10, 'c'); // temp is cccccccccc
string s8 = temp; // copy temp into s8

3.2.2 Operations on `string`s

Reading and Writing `string`s

The string input operator (>>) discards all leading whitespace (spaces, newlines, tabs) and then reads characters until the next whitespace.
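A minimal sketch of this behavior, using a std::istringstream in place of cin so the input is fixed. The helper name first_word is hypothetical and not from the book.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: reads a single word from `input` with operator>>.
std::string first_word(const std::string& input) {
    std::istringstream in(input);  // stands in for std::cin
    std::string word;
    in >> word;  // skips leading whitespace, then stops at the next whitespace
    return word;
}
```

Calling first_word("  \t\n hello world") yields "hello": the leading blanks, tab, and newline are discarded, and reading stops at the space before "world".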

The `string::size_type` Type

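size() returns a value of type string::size_type. The exact type is not specified, but it is guaranteed to be unsigned.

When iterating over a string with a for loop, the loop variable should preferably be a string::size_type as well, written either by name or via decltype and size().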

// EN p94
// process characters in s until we run out of characters or we hit a whitespace
for (decltype(s.size()) index = 0;
    index != s.size() && !isspace(s[index]); ++index)
        s[index] = toupper(s[index]); // capitalize the current character

(That said, I never used to pay attention to size_type and always just looped with int i.)
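One reason size_type matters: mixing the unsigned result of size() with a negative int gives surprising answers, because the int is converted to unsigned first. The sketch below makes that conversion explicit. Both helper names are hypothetical, and the behavior described assumes a typical platform where size_type is at least as wide as int.

```cpp
#include <string>

// Hypothetical helper: compares the unsafe way. The cast spells out the
// conversion that writing `s.size() < n` would perform implicitly.
bool shorter_than_unsigned(const std::string& s, int n) {
    return s.size() < static_cast<std::string::size_type>(n);
}

// Hypothetical safer variant: reject negative values before converting.
bool shorter_than(const std::string& s, int n) {
    if (n < 0) return false;  // no length is shorter than a negative number
    return s.size() < static_cast<std::string::size_type>(n);
}
```

With s holding "abc", shorter_than_unsigned(s, -1) is true because -1 becomes a very large unsigned value, while shorter_than(s, -1) is false.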

3.2.3 Dealing with the Characters in a `string`

ADVICE: USE THE C++ VERSIONS OF C LIBRARY HEADERS

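The C++ library incorporates the C library. C library headers are named name.h, while the C++ versions are named cname, for example ctype.h and cctype.

Names declared in the cname headers (functions, variables, and so on) are defined inside the std namespace, whereas those in the .h versions are not. C++ programs should therefore use the C++ versions to avoid name conflicts.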

Processing Only Some Characters?

subscript operator (the [] operator): The result of using an out-of-range subscript is undefined.

Processing Every Character? Use Range-Based `for`

Range-based for (range for statement):

// EN p91
string str("some string");
// print the characters in str one character to a line
for (auto c : str) // for every char in str
    cout << c << endl; // print the current character followed by a newline
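The loop above copies each character into c. To change the characters of the string, the loop variable has to be a reference. A minimal sketch, where the function name to_upper_copy is hypothetical:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: returns an upper-cased copy of s.
std::string to_upper_copy(std::string s) {
    for (auto &c : s) {  // c is a reference, so assigning to it changes s
        // toupper expects a value representable as unsigned char
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return s;
}
```

Without the & in auto &c, the assignment would only modify a local copy and s would be returned unchanged.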

3.3 Library `vector` Type

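vector is a class template. A template is not itself a type, but supplying an element type generates a new type from it, such as vector<int>.

The process by which the compiler creates classes or functions from a template is called instantiation.

A vector can only hold objects. Because references are not objects, there are no vectors of references.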

3.3.1 Defining and Initializing `vector`s

Copying a vector: each element of the old vector is copied into the new vector.

// EN p97
vector<int> ivec2(ivec); // copy elements of ivec into ivec2
vector<int> ivec3 = ivec; // copy elements of ivec into ivec3

Value Initialization

If only a size is given when a vector is created, its elements are value-initialized:

If the element type is a built-in type, each element is initialized to 0.
If it is a class type, each element is default initialized.

// EN p98
vector<int> ivec(10); // ten elements, each initialized to 0
vector<string> svec(10); // ten elements, each an empty string

List Initializer or Element Count?

When we use curly braces, {...}, we’re saying that, if possible, we want to list initialize the object. That is, if there is a way to use the values inside the curly braces as a list of element initializers, the class will do so. Only if it is not possible to list initialize the object will the other ways to initialize the object be considered.

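In short: when braces {} are used to initialize a vector, the compiler first checks whether list initialization is possible, and considers the other forms of initialization only if it is not.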

// EN p100
vector<string> v5{"hi"}; // list initialization: v5 has one element
vector<string> v6("hi"); // error: can’t construct a vector from a string literal
vector<string> v7{10}; // v7 has ten default-initialized elements
vector<string> v8{10, "hi"}; // v8 has ten elements with value "hi"

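In the example above, the conventional way to write v8 would be v8(10, "hi"). The braces say that we want list initialization, but 10 is an int and cannot initialize a string element, so the braced values cannot serve as a list of element initializers. The compiler therefore tries the other forms of initialization (here, constructing ten elements that each have the value "hi").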
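The difference is easiest to see by checking the resulting sizes. In this sketch the three function names are hypothetical. Each one builds a vector in a different way and returns its size.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Parentheses: element count and value, so ten ints equal to 1.
std::size_t paren_int_count() {
    std::vector<int> v(10, 1);
    return v.size();
}

// Braces with ints: both values are usable as elements, so two elements.
std::size_t brace_int_count() {
    std::vector<int> v{10, 1};
    return v.size();
}

// Braces with an int and a literal: 10 cannot be a string element,
// so this falls back to "ten elements with value hi".
std::size_t brace_string_count() {
    std::vector<std::string> v{10, "hi"};
    return v.size();
}
```

The three functions return 10, 2, and 10 respectively, which is why braces deserve extra care when the element type is itself an integer type.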

3.3.2 Adding Elements to a `vector`

KEY CONCEPT: VECTORS GROW EFFICIENTLY

The standard requires that vector implementations can efficiently add elements at run time. Because vectors grow efficiently, it is often unnecessary—and can result in poorer performance—to define a vector of a specific size.

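In short: the standard requires that vector implementations add elements efficiently at run time, so there is usually no need to specify a size when creating a vector.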
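The usual pattern that follows from this is to start with an empty vector and add elements with push_back as they become known. A minimal sketch, where the function name squares is hypothetical:

```cpp
#include <vector>

// Hypothetical helper: returns the squares 0*0, 1*1, ..., (n-1)*(n-1).
std::vector<int> squares(int n) {
    std::vector<int> result;       // starts empty, no size given up front
    for (int i = 0; i < n; ++i)
        result.push_back(i * i);   // the vector grows as needed
    return result;
}
```

For example, squares(4) holds the four elements 0, 1, 4, 9.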

indirection operator * https://en.cppreference.com/w/cpp/language/operator_member_access

The operand of the built-in indirection operator must be pointer to object or a pointer to function, and the result is the lvalue referring to the object or function to which expr points.

Programming Implications of Adding Elements to a `vector`

The body of a range for must not change the size of the vector it is iterating over.
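When elements do have to be added while walking a vector, one option is an ordinary index loop whose bound is fixed before the loop starts. This is only a sketch of that workaround, and the function name duplicate_all is hypothetical.

```cpp
#include <vector>

// Hypothetical helper: appends a copy of every original element to v.
std::vector<int> duplicate_all(std::vector<int> v) {
    // Record the size once; a range for would be invalid here
    // because push_back changes the size during the loop.
    const std::vector<int>::size_type original = v.size();
    for (std::vector<int>::size_type i = 0; i != original; ++i) {
        int value = v[i];    // copy the element before the vector grows
        v.push_back(value);
    }
    return v;
}
```

Indices stay meaningful after the vector grows, whereas the iterators that a range for relies on may not.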

3.3.3 Other `vector` Operations

Two vectors can be compared only if they have the same element type, and only if that element type itself supports the comparison.

v1 == v2, v1 != v2: v1 and v2 are equal if they have the same number of elements and each element in v1 is equal to the corresponding element in v2.

<, <=, >, >=: Have their normal meanings using dictionary ordering.

3.4 Introducing Iterators

Technically speaking, a string is not a container type, but string supports many of the container operations.

In practice this means that iterators, size(), and subscripting work on a string much as they do on a vector.

Like a pointer, an iterator gives us indirect access to an object.

A valid iterator either:

denotes an element, or
denotes the position one past the last element in a container.

3.4.1 Using Iterators

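Types that have iterators also have member functions that return iterators; two of them are begin and end.

begin returns an iterator that denotes the first element.
end returns an iterator positioned one past the end. It is usually called the off-the-end iterator, or simply the end iterator.

If the container is empty, begin and end return the same iterator, and it is an off-the-end iterator.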

In general, we do not know (or care about) the precise type that an iterator has.

In other words, the exact iterator type rarely matters, which is why iterator variables are usually declared with auto.

Iterator Operations

Dereferencing an invalid iterator or an off-the-end iterator is undefined behavior.

*iter Returns a reference to the element denoted by the iterator iter.

Dereferencing an iterator yields a reference to the element, not a copy!

iter1 == iter2, iter1 != iter2: Compares two iterators for equality (inequality). Two iterators are equal if they denote the same element or if they are the off-the-end iterator for the same container.

That is, two iterators are equal when:

they denote the same element, or
they are both the off-the-end iterator of the same container.

Moving Iterators from One Element to Another

++iter Increments iter to refer to the next element in the container.
--iter Decrements iter to refer to the previous element in the container.

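The iterator returned by end does not denote an element, so it must not be incremented or dereferenced. (It may be decremented, provided the container is not empty.)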
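These operations combine into the usual iterator loop: start at begin, stop at end, move with ++, and read or write through *. The sketch below mirrors the subscript loop from 3.2.2 but uses iterators instead. The function name capitalize_first_word is hypothetical.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: upper-cases characters up to the first whitespace.
std::string capitalize_first_word(std::string s) {
    for (auto it = s.begin();
         it != s.end() && !std::isspace(static_cast<unsigned char>(*it));
         ++it) {
        // *it is a reference to the character, so this writes into s
        *it = static_cast<char>(std::toupper(static_cast<unsigned char>(*it)));
    }
    return s;
}
```

The test it != s.end() comes first in the condition, so *it is never evaluated on the off-the-end iterator.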

Iterator Types

The library containers define their own iterator types: iterator and const_iterator.

// EN p108
vector<int>::iterator it; // it can read and write vector<int> elements
string::iterator it2; // it2 can read and write characters in a string
vector<int>::const_iterator it3; // it3 can read but not write elements
string::const_iterator it4; // it4 can read but not write characters

The `begin` and `end` Operations

For a non-const container, cbegin and cend can be used to obtain a const_iterator:

// EN p109
vector<int> v;
auto it3 = v.cbegin(); // it3 has type vector<int>::const_iterator

Combining Dereference and Member Access

it->mem is a synonym for (*it).mem

Some vector Operations Invalidate Iterators

Any operation that changes the size of a vector, such as push_back, potentially invalidates all iterators into that vector.

3.4.2 Iterator Arithmetic

The iterators for string and vector support some additional operations:

iter + n, iter - n: Adding (subtracting) an integral value n to (from) an iterator yields an iterator that many elements forward (backward) within the container. The resulting iterator must denote elements in, or one past the end of, the same container.

iter1 += n, iter1 -= n: Compound-assignment for iterator addition and subtraction. Assigns to iter1 the value of adding n to, or subtracting n from, iter1.

iter1 - iter2: Subtracting two iterators yields the number that when added to the right-hand iterator yields the left-hand iterator. The iterators must denote elements in, or one past the end of, the same container.

>, >=, <, <=: Relational operators on iterators. One iterator is less than another if it refers to an element that appears in the container before the one referred to by the other iterator. The iterators must denote elements in, or one past the end of, the same container.

To subtract two iterators or compare them with relational operators, both must refer to the same container.

The result of subtracting two iterators has type difference_type, a signed integral type.
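A classic use of iterator arithmetic is binary search over a sorted vector. The following is a sketch under the assumption that the input is already sorted in ascending order. The function name contains_sorted is hypothetical.

```cpp
#include <vector>

// Hypothetical helper: binary search in a vector sorted in ascending order.
bool contains_sorted(const std::vector<int>& v, int sought) {
    auto beg = v.cbegin();
    auto end = v.cend();
    while (beg != end) {                   // the range [beg, end) is not empty
        auto mid = beg + (end - beg) / 2;  // iterator + difference_type
        if (*mid == sought)
            return true;
        if (sought < *mid)
            end = mid;                     // continue in the first half
        else
            beg = mid + 1;                 // continue after mid
    }
    return false;
}
```

The midpoint is computed as beg + (end - beg) / 2 rather than (beg + end) / 2, because adding two iterators is not a defined operation.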

3.5 Arrays

3.5.1 Defining and Initializing Built-in Arrays

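An array is a compound type.

The dimension specifies the number of elements in the array. Because the number of elements is part of the array's type, the dimension must be known at compile time, which means it must be a constant expression.

An array defined inside a function body without an initializer is default-initialized. If the element type is a built-in type, the elements have undefined values.

auto cannot be used to deduce an array type from a list of initializers.

An array can only hold objects, so there are no arrays of references.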

Explicitly Initializing Array Elements

// EN p114
const unsigned sz = 3;
int ia1[sz] = {0,1,2}; // array of three ints with values 0, 1, 2
int a2[] = {0, 1, 2}; // an array of dimension 3
int a3[5] = {0, 1, 2}; // equivalent to a3[] = {0, 1, 2, 0, 0}
string a4[3] = {"hi", "bye"}; // same as a4[] = {"hi", "bye", ""}
int a5[2] = {0,1,2}; // error: too many initializers

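The dimension may be omitted when an array is declared, but then an initializer list must be supplied so that the compiler can infer the size from the number of initializers.

If the dimension is greater than the number of initializers, the remaining elements are value initialized.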

Character Arrays Are Special

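When a character array is initialized from a string literal, a null character (\0) is appended after the characters of the literal, so the array needs room for it.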

// EN p114
char a3[] = "C++"; // null terminator added automatically
const char a4[6] = "Daniel"; // error: no space for the null!

No Copy or Assignment

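An array cannot be initialized as a copy of another array, nor can one array be assigned to another. Some compilers allow array assignment as an extension, but it is not standard C++.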

int a[] = {0, 1, 2}; // array of three ints
int a2[] = a; // error: cannot initialize one array with another
a2 = a; // error: cannot assign one array to another
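Since arrays cannot be copied as a whole, the elements have to be copied one by one. One way to do that, shown here only as a sketch, is std::copy together with the library begin and end functions. The function name copy_three is hypothetical.

```cpp
#include <algorithm>
#include <iterator>

// Hypothetical helper: copies the three elements of src into dst.
void copy_three(const int (&src)[3], int (&dst)[3]) {
    // Element-wise copy; `dst = src;` would not compile for arrays.
    std::copy(std::begin(src), std::end(src), std::begin(dst));
}
```

The parameters are references to arrays of exactly three ints, so the dimension stays part of the type and a mismatched array is rejected at compile time.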

Understanding Complicated Array Declarations

For these, see the Clockwise/Spiral Rule (https://c-faq.com/decl/spiral.anderson.html):

There are three simple steps to follow:

Starting with the unknown element, move in a spiral/clockwise direction; when encountering the following elements replace them with the corresponding english statements:

[X] or []
   => Array X size of... or Array undefined size of...

(type1, type2)
   => function passing type1 and type2 returning...

*
   => pointer(s) to...

Keep doing this in a spiral/clockwise direction until all tokens have been covered.

Always resolve anything in parenthesis first!

Example #1: Simple declaration

char *str[10];

Question we ask ourselves: What is str?

str is an...

We move in a spiral clockwise direction starting with str and the first character we see is a [ so, that means we have an array, so...
   `str` is an array 10 of... 
Continue in a spiral clockwise direction, and the next thing we encounter is the * so, that means we have pointers, so...
   `str` is an array 10 of pointers to... 
Continue in a spiral direction and we see the end of the line (the ;), so keep going and we get to the type char, so...
   `str` is an array 10 of pointers to `char` 
We have now "visited" every token; therefore we are done!

Example #2: Pointer to Function declaration

char *(*fp)( int, float *);

Question we ask ourselves: What is fp?
`fp` is a... 

Moving in a spiral clockwise direction, the first thing we see is a `)`; therefore, fp is inside parenthesis, so we continue the spiral inside the parenthesis and the next character seen is the `*`, so...

    `fp` is a pointer to... 

We are now out of the parenthesis and continuing in a spiral clockwise direction, we see the `(`; therefore, we have a function, so...

    `fp` is a pointer to a function passing an int and a pointer to float returning... 

Continuing in a spiral fashion, we then see the `*` character, so...

    `fp` is a pointer to a function passing an int and a pointer to float returning a pointer to... 

Continuing in a spiral fashion we see the `;`, but we haven't visited all tokens, so we continue and finally get to the type `char`, so...

    `fp` is a pointer to a function passing an int and a pointer to float returning a pointer to a `char` 

What happens if we remove the parentheses around *fp?

char **fp( int, float *);

Now fp is a function taking (int, float *) and returning a pointer to a pointer to char.
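The readings produced by the spiral rule can be checked by the compiler with the type traits from <type_traits>. This is a sketch rather than part of the original rule; the name fp2 is hypothetical and stands for the version of fp without parentheses, so that both declarations can coexist.

```cpp
#include <type_traits>

char *str[10];                        // array 10 of pointers to char
char *(*fp)(int, float *) = nullptr;  // pointer to function returning char*
char **fp2(int, float *);             // function returning char** (declared only)

static_assert(std::is_array<decltype(str)>::value, "str is an array");
static_assert(std::extent<decltype(str)>::value == 10, "with 10 elements");
static_assert(std::is_same<std::remove_extent<decltype(str)>::type, char *>::value,
              "each element is a pointer to char");

static_assert(std::is_pointer<decltype(fp)>::value, "fp is a pointer");
static_assert(std::is_function<std::remove_pointer<decltype(fp)>::type>::value,
              "that points to a function");

static_assert(std::is_function<decltype(fp2)>::value, "fp2 is a function");
static_assert(std::is_same<decltype(fp2(0, nullptr)), char **>::value,
              "that returns a pointer to a pointer to char");
```

If any reading were wrong, the corresponding static_assert would fail at compile time, which makes this a convenient way to test one's understanding of a complicated declaration.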

3.5.2 Accessing the Elements of an Array

Array elements are accessed with a subscript. A variable used as an array subscript is normally given the type size_t, an unsigned type.

3.5.3 Pointers and Arrays

When we use an array, the compiler usually converts it to a pointer to its first element.

when we use an object of array type, we are really using a pointer to the first element in that array.

// EN p117
string nums[] = {"one", "two", "three"}; // array of strings
string *p = &nums[0]; // p points to the first element in nums
string *p2 = nums; // equivalent to p2 = &nums[0]

When an array is used as the initializer for a variable defined with auto, the deduced type is a pointer.

// EN p117
int ia[] = {0,1,2,3,4,5,6,7,8,9}; // ia is an array of ten ints
auto ia2(ia); // ia2 is an int* that points to the first element in ia

// the following is equivalent to the above
auto ia2(&ia[0]); // now it’s clear that ia2 has type int*

With decltype, however, this conversion does not happen, and the deduced type is still an array.

// EN p118
// ia3 is an array of ten ints
decltype(ia) ia3 = {0,1,2,3,4,5,6,7,8,9};
ia3 = p; // error: can’t assign an int* to an array
ia3[4] = i; // ok: assigns the value of i to an element in ia3
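Both deductions can be confirmed with std::is_same. A minimal sketch, where the function name is hypothetical and the static_asserts do the real checking at compile time:

```cpp
#include <type_traits>

// Hypothetical helper: returns true if the run-time checks also hold.
bool auto_decays_but_decltype_does_not() {
    int ia[] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    auto ia2 = ia;          // array converts to a pointer: int*
    decltype(ia) ia3 = {};  // no conversion: int[10], all elements zero

    static_assert(std::is_same<decltype(ia2), int *>::value,
                  "auto deduces a pointer");
    static_assert(std::is_same<decltype(ia3), int[10]>::value,
                  "decltype keeps the array type");

    ia3[4] = 42;            // ia3 is a separate array that can be subscripted
    return ia2 == &ia[0] && ia3[4] == 42;
}
```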

Pointers Are Iterators

A pointer to an array element supports the same operations as vector and string iterators (increment, decrement, and so on).

The Library begin and end Functions

// EN p118
int ia[] = {0,1,2,3,4,5,6,7,8,9}; // ia is an array of ten ints
int *beg = begin(ia); // pointer to the first element in ia
int *last = end(ia); // pointer one past the last element in ia

As with iterators, an off-the-end pointer must not be dereferenced or incremented.

Pointer Arithmetic

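Pointers to array elements can use all the iterator operations described in 3.4.1 and 3.4.2.

The result of subtracting two pointers has type ptrdiff_t, a signed integral type.

Pointer arithmetic is also valid for null pointers and for pointers to an object that is not an array, within narrow limits: a pointer to a single object may only move to one past that object, and a null pointer may only have 0 added to or subtracted from it. None of this looks particularly useful for now.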

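Subscripts and Pointers

Arrays and pointers are closely related. A pointer can be subscripted, as long as it points to an array element, and the subscript can even be negative.

Arrays use the built-in subscript operator, whereas vector and string use operators defined by their classes. The built-in operator accepts a signed index, so it can be negative; the library versions take an unsigned index.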

int ia[] = {0,2,4,6,8}; // array with 5 elements of type int
int i = ia[2]; // ia is converted to a pointer to the first element in ia
            // ia[2] fetches the element to which (ia + 2) points
int *p = ia; // p points to the first element in ia
i = *(p + 2); // equivalent to i = ia[2]

int *p = &ia[2]; // p points to the element indexed by 2
int j = p[1]; // p[1] is equivalent to *(p + 1),
            // p[1] is the same element as ia[3]
int k = p[-2]; // p[-2] is the same element as ia[0]

3.5.4 C-Style Character Strings

Although C++ supports C-style strings, they should not be used by C++ programs. C-style strings are a surprisingly rich source of bugs and are the root cause of many security problems. They’re also harder to use!

In short, avoid C-style strings in C++ programs.

A C-style string is not a type but a convention for representing and using character strings. Strings that follow this convention are stored in character arrays and terminated with a null character (\0).

C Library String Functions

strlen(p) Returns the length of p, not counting the null.

strcmp(p1, p2) Compares p1 and p2 for equality. Returns 0 if p1 == p2, a positive value if p1 > p2, a negative value if p1 < p2.

strcat(p1, p2) Appends p2 to p1. Returns p1.

strcpy(p1, p2) Copies p2 into p1. Returns p1.

If a char array is not null-terminated, the result of calling strlen on it is undefined.

Comparing Strings

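C-style strings must be compared with strcmp.

Applying the ordinary relational or equality operators (>, <, ==, != and so on) to two char arrays compares pointers, not characters. For relational operators on two unrelated pointers the result is undefined.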
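A small sketch of the difference: two distinct arrays with the same contents live at different addresses, yet strcmp reports them as equal. The helper name same_text is hypothetical.

```cpp
#include <cstring>

// Hypothetical helper: true if two C-style strings hold the same characters.
bool same_text(const char *p1, const char *p2) {
    return std::strcmp(p1, p2) == 0;  // strcmp returns 0 when the strings are equal
}
```

Given const char a[] = "hello"; and const char b[] = "hello";, the expression &a[0] == &b[0] is false because the arrays are different objects, while same_text(a, b) is true.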

Exercise 3.37: What does the following program do?

const char ca[] = {'h', 'e', 'l', 'l', 'o'};
const char *cp = ca;
while (*cp) {
    cout << *cp << endl;
    ++cp;
}

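'\0' is the null terminator. It is an escape sequence, introduced by a backslash: \ may be followed by one to three octal digits that give the numeric value of a character.

Some examples (assuming the Latin-1 character set):

\7 (bell) \12 (newline) \40 (blank)

\0 (null) \115 ('M') \x4d ('M')

\0 is the null character, whose numeric value is exactly 0. So when a string ends with \0 and is traversed with a pointer, the loop stops on reaching \0, because the value 0 converts to false.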
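Note that the array ca in Exercise 3.37 has no '\0' at the end, so the while (*cp) loop there runs past the end of the array, which is undefined behavior. The sketch below shows the same loop shape on a properly terminated string. The function name count_chars is hypothetical.

```cpp
#include <cstddef>

// Hypothetical helper: counts characters up to, but not including, '\0'.
// The argument must point to a null-terminated character array.
std::size_t count_chars(const char *cp) {
    std::size_t n = 0;
    while (*cp) {  // '\0' has the value 0, which converts to false
        ++n;
        ++cp;
    }
    return n;
}
```

Adding '\0' as a sixth initializer to the exercise's array, or initializing it from the literal "hello", makes the loop well defined.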

3.5.5 Interfacing to Older Code

Mixing Library strings and C-Style Strings

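A string can be initialized from a string literal, and the c_str member function returns a C-style string. The array returned by c_str is not guaranteed to stay valid, though: once the original string is changed, the previously returned array may be invalidated.

string s("Hello World"); // s holds Hello World
const char *str = s.c_str(); // ok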

Using an Array to Initialize a vector

// EN p125
int int_arr[] = {0, 1, 2, 3, 4, 5};
// ivec has six elements; each is a copy of the corresponding element in int_arr
vector<int> ivec(begin(int_arr), end(int_arr));

// copies three elements: int_arr[1], int_arr[2], int_arr[3]
vector<int> subVec(int_arr + 1, int_arr + 4);

ADVICE: USE LIBRARY TYPES INSTEAD OF ARRAYS

Modern C++ programs should use vectors and iterators instead of built-in arrays and pointers, and use strings rather than C-style array-based character strings.

3.6 Multidimensional Arrays

There is no limit on the number of subscripts:

int arr[10][20][30] = {0}; // initialize all elements to 0
// int arr1[1][2][3][4]...

In a two-dimensional array, the first dimension is usually referred to as the row and the second as the column.

Initializing the Elements of a Multidimensional Array

A multidimensional array can be initialized with nested brace lists, or with a single flat list.

// EN p 126
int ia[3][4] = { // three elements; each element is an array of size 4
    {0, 1, 2, 3}, // initializers for the row indexed by 0
    {4, 5, 6, 7}, // initializers for the row indexed by 1
    {8, 9, 10, 11} // initializers for the row indexed by 2
};

// equivalent initialization without the optional nested braces for each row
int ia[3][4] = {0,1,2,3,4,5,6,7,8,9,10,11};

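To initialize only some of the elements in each row, however, the nested braces are required; a flat list would simply initialize the first few elements of the array in order.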

// explicitly initialize only element 0 in each row
int ia[3][4] = {{ 0 }, { 4 }, { 8 }};

// explicitly initialize row 0; the remaining elements are value initialized
int ix[3][4] = {0, 3, 6, 9};

Subscripting a Multidimensional Array

If fewer subscripts are supplied than the array has dimensions, the result is the inner array at that index.

// EN p127
// assigns the first element of arr to the last element in the last row of ia
ia[2][3] = arr[0][0][0];
int (&row)[4] = ia[1]; // binds row to the second four-element array in ia

Using a Range `for` with Multidimensional Arrays

When a range-based for is used over a multidimensional array, the loop control variable of every loop except the innermost must be a reference.

for (auto &row : ia) // for every element in the outer array
    for (auto col : row) // for every element in the inner array
        cout << col << endl;

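What happens if the outer for does not use a reference? A range for is essentially syntactic sugar that the compiler rewrites as an ordinary for loop:

for (auto row : ia)
// the line above is equivalent to the following, with irrelevant details omitted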
for (auto beg = begin(ia) ...)
{
    auto row = *beg;
    ...
}

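begin(ia) returns a pointer to the first element of ia, so its type is a pointer to an array of four ints: int (*p)[4].

Dereferencing it with *beg yields a reference to that four-element array.

auto ignores references, and using an array is equivalent to using a pointer to its first element, so auto row = *beg; gives row the type pointer to int. The inner range for then fails to compile, because it would be iterating over an int*.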
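For comparison, here is a sketch of the working form wrapped in a function, with the outer loop variable declared as a reference so that row keeps its array type. The function name sum_all is hypothetical.

```cpp
// Hypothetical helper: adds up every element of a 3 x 4 array of ints.
int sum_all(int (&ia)[3][4]) {
    int total = 0;
    for (const auto &row : ia)  // row refers to an array of four ints
        for (auto col : row)    // the innermost loop may copy the ints
            total += col;
    return total;
}
```

Changing const auto &row to auto row turns row into an int*, and the inner loop no longer compiles.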

For more on range-based for, see my separate notes on the topic.

Pointers and Multidimensional Arrays

Because a multidimensional array is really an array of arrays, the pointer type to which the array converts is a pointer to the first inner array:

// EN p129
int ia[3][4]; // array of size 3; each element is an array of ints of size 4
int (*p)[4] = ia; // p points to an array of four ints
p = &ia[2]; // p now points to the last element in ia

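Thoughts

Arrays take up a large part of this chapter, yet I have almost never seen built-in arrays used at work; vector is the minimum there. Perhaps they are more common in embedded code? Arrays are also treated as pointers in many contexts, and multidimensional arrays are more complicated still, so they feel error-prone and hard to read. The book itself recommends vector, iterators, and string for modern C++ programs.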