Chapter 2

2021-03-13 本文已影响0人 Wilbur_

2.1 Information storage

most computers use blocks of 8 bits, or bytes, as the smallest addressable unit of memory. A machine-level program views memory as a very large array of bytes, referred to as virtual memory.

这句话我觉得还是很重要的，每个byte都有他自己的地址，malloc的时候就是根据这个赋予的地址？

Every byte of memory is identified by a unique number, known as its address, and the set of all possible addresses is known as the virtual address space.

C中的pointer就是通过指向一个程序object在virtual memory的第一个字节来实现的

For example, the value of a pointer in C—whether it points to an integer, a structure, or some other program object—is the virtual address of the first byte of some block of storage.

C的compiler是存储type的信息的，但当机器实现的时候实际上并没有type的信息，也就是说int 还是char还是float在它眼里都是一块字节，程序本身就是一系列字节排在一起

The C compiler also associates type information with each pointer, so that it can generate different machine-level code to access the value stored at the location designated by the pointer depending on the type of that value. Although the C compiler maintains this type information, the actual machine-level program it generates has no information about data types. It simply treats each program object as a block of bytes and the program itself as a sequence of bytes.

2.1.1 Hexadecimal Notation

A single byte consists of 8 bits. In binary notation, its value ranges from 000000002 to 111111112. When viewed as a decimal integer, its value ranges from 010 to 25510. Neither notation is very convenient for describing bit patterns. Binary notation is too verbose, while with decimal notation it is tedious to convert to and from bit patterns. Instead, we write bit patterns as base-16, or hexadecimal numbers.

Figure2.2

little endian 和 big endian

两种不同的排序方式，手机上面大部分都是little endian

figure2.6

Representing Strings

0 means null in C??

A string in C is encoded by an array of characters terminated by the null(having value 0) character.

boolean algebra

这个想法很有意思，我之前学propositional logic law 的时候没这么想过……

image.png

2.2 Integer representation

Figure 2.8

Two's complement encoding

为了实现负数，最大一位数被当作负数，然后之后的数都还是当作正数的，这样1111其实就是表达为-1，因为-8 + 7 等于-1.
下面是公式

2.3 and 2.4

2.14

This line was generated by a disassembler, a tool that determines the instruction sequence represented by an executable program file. We will learn more about disassemblers and how to interpret lines such as this in Chapter 3. For now, we simply note that this line states that the hexadecimal byte sequence 01 05 43 0b 20 00 is the byte-level representation of an instruction that adds a word of data to the value stored at an address computed by adding 0x200b43 to the current value of the program counter, the address of the next instruction to be executed. If we take the final 4 bytes of the sequence 43 0b 20 00 and write them in reverse order, we have 00 20 0b 43.

我突然想起来谷歌那道面试题可能还是有实际作用的，因为你在reverse 一个由个别单词组成的string 而不用额外空间的时候好像这里就可以用到，因为它是按byte来读取的，而每台机器的byte order有可能不一样，所以有的机器需要做倒序处理，这样不用而外空间来操作就可以省出很多空间了，特别是读取数据量特别大的时候。

Two's complement

有正负数的时候第一位永远都是表示这个数位是否是负，而那位的二进制表示如果是1，那么就是-2^w
所以unsigned转成signed数字的时候就是加上-2^w，因为第w位（最大位）的数字表示为负数了

Casting

In casting from unsigned to int, the underlying bit representation stays the same. This is a general rule for how most C implementations handle conversions between signed and unsigned numbers with the same word size -- the numeric values might change, but the bit patterns do not.
The bit representation of 32 bit int for unsigned value is same as -1 bit pattern then it is signed int.
That is,

T2U32(−1) = 4,294,967,295, and U2T32(4,294,967,295) = −1. That is, UMax has the same bit representation in un- signed form as does −1 in two’s-complement form. We can also see the relationship between these two numbers: 1 + UMaxw = 2w.

Going the other direction, the most left bit value will represent negative. So the unsigned value will reduce by 2^w where w is the left most bit. When it is 4 bit word, change from unsigned to signed, the number will reduce 2 ^ 3.

2's complement

Integer Alrithmatic

两个w bit 长度的数相加需要w+1bits （因为会进位）而如果我们只保存w bit 长度的结果，那么我们实际上是对两个数的和进行了mod 2^(w)的操作

This can be characterized as a form of modular arithmetic, computing the sum modulo 2w by simply discarding any bits with weight greater than 2w−1 in the bit-level representation of x + y. For example, consider a 4-bit number representation with x = 9 and y = 12, having bit representations [1001] and [1100], respectively. Their sum is 21, having a 5-bit representation [10101]. But if we discard the high-order bit, we get [0101], that is, decimal value 5. This matches the value 21 mod 16 = 5.

Overflow

4 cases. This graph is helpful.
Case 1 Negative overflow
Case 2 Normal negative
Case 3 Normal positive
Case 4 Positive Overflow

Overflow example for 4 bit int

与算法

因为two's complement 的性质，-x 跟 ~x + 1 是一样的，那么x & -x 其实就保留了最后一位1。
&其实就是对两个set取并集。而-x只有最后一位1是一样的。

x&-x

Integer multiplication

Historically, the integer multiply instruction on many machines was fairly slow, requiring 10 or more clock cycles, whereas other integer operations—such as addition, subtraction, bit-level operations, and shifting—required only 1 clock cycle. Even on the Intel Core i7 Haswell we use as our reference machine, integer multiply requires 3 clock cycles. As a consequence, one important optimization used by compilers is to attempt to replace multiplications by constant factors with combinations of shift and addition operations.

float alrithmitic

因为float group不是associative，所以a+b+c != b+c+a，这很重要因为顺序matters。那么之前我们做测试，一个float 乘以很大的一个数的时候可能就把小数位忽略掉了。