COMP9021 Principles of Programmi

2017-07-28 Sisyphus235

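I originally wanted to put each week's course content into a single document, but Martin's lectures are too information-dense, so each week is split along its natural boundary into two posts: the optional lecture and the lecture. Because the material has many levels of hierarchy, headings follow the numbering used in papers (X, X.X, X.X.X, ...) to make searching easier.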

1.Pre-reading

1.1 Introduction to Unix

1.1.1 Unix Commands (with/without options/arguments)

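(1) cal: shows the calendar for the current month; the command "cal" on its own, with no options or arguments
(2) cal 2017: shows the calendar for the year 2017; the command "cal" followed by one argument, "2017"
(3) cal 3 2017: shows the calendar for March 2017; the command "cal" followed by two arguments, "3" and "2017"
(4) date: shows the current date and time; the command "date" on its own
(5) Clear the screen--control + L
(6) . / source + "file name"--runs the commands in "file name" in the current shell, e.g. . test or source test
(7) sudo + "command"--runs "command" as the system administrator; sudo stands for superuser do, e.g. sudo pip3 install bs4 runs pip3 as administrator to install the bs4 package
(8) tar xf + "file name"--extracts the archive "file name"; tar stands for tape archive, x means extract and f names the archive file, which should be a tar archive
(9) esc + b--moves the cursor back one word on the command line; b is for backward
(10) esc + f--moves the cursor forward one word on the command line; f is for forward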

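Course review:
(1) cd stands for change directory
(2) ls stands for list
(3) cat stands for concatenate
(4) python starts Python 2 by default on this setup
(5) python3 starts Python 3
(6) The shortcut to quit Python is control + D
(7) echo prints its arguments
(8) echo "alias python=python3" >> ~/.profile
This appends an alias to the shell start-up file .profile, so that python means python3 in new Terminal sessions. There must be no spaces around =, and >> appends to the file, whereas > would overwrite it.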

1.1.2 Syntax for paths

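(1) pwd--print working directory; shows the current path
(2) mkdir--make directory, roughly the same as creating a folder on Windows, e.g. mkdir test. Several sibling directories can be created at once, e.g. mkdir test1 test2 test3
(3) control + A--moves the cursor to the beginning of the command line
(4) esc + del--deletes the word before the cursor; esc and del are both keys on the keyboard
(5) TAB--auto-completion. For example, if the path is /Users/lecture_1/, type /U + TAB. TAB + TAB lists every file that matches the prefix typed so far
(6) "~" in a path (content in quotation marks is what gets typed) stands for the home directory, whose location is given by the HOME environment variable. By default it is the user's own directory, not the root of the drive; in class Martin set it to the COMP9021 directory
(7) cd ..--goes up one level
(8) mkdir -p XXX/XXX--creates nested directories, including any missing parent directories; -p stands for parents, and XXX/XXX is a directory inside a directory, e.g. mkdir -p home/test
(9) ls "path"--lists the files under "path", e.g. ls home. ls -a lists all files, including hidden ones. ls -l shows file details, including permissions.
(10) absolute path: the full location of a file or directory, starting from the root directory /, e.g. /Users/desktop/XXX
(11) > / touch + "file name"--creates "file name", e.g. > file_1 or touch file_1. > is a redirection rather than a command: it creates an empty file, but it also empties the file if it already exists, whereas touch leaves existing content alone and only updates the timestamp
(12) rm + "file name"--deletes "file name", e.g. rm file_1
(13) mv + "file name" + "path"--moves "file name" to "path", e.g. mv file_1 ../test moves file_1 into the test directory located one level above the current directory
(14) chmod--change mode, the command that sets file permissions. The digits after it give the permissions of the owner, the group and other users; r is read, w is write, x is execute. See https://en.wikipedia.org/wiki/Chmod (Chinese: https://zh.wikipedia.org/wiki/Chmod)
(15) echo $PATH--prints the PATH environment variable, the list of directories the shell searches for commands; note that PATH must be in upper case
(16) cp -r + "path 1" + "path 2"--copies "path 1" to "path 2"; cp is copy, -r is recursive, e.g. cp -r ../test1 ../test2 copies the directory test1 one level up to test2, also one level up
(17) man + "command"--shows the manual of "command", e.g. man cp shows the manual of copy. While reading, the space bar scrolls down, "U" scrolls up and "Q" quits the manual
(18) *--wildcard, e.g. test_* matches every file whose name starts with test_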
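
For comparison, most of the commands above have close counterparts in Python's standard library. The following is a minimal sketch, run inside a temporary directory so that nothing outside it is touched; the directory and file names are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)

    # mkdir -p home/test
    (root / 'home' / 'test').mkdir(parents=True)

    # touch file_1
    (root / 'file_1').touch()

    # mv file_1 home/test
    shutil.move(str(root / 'file_1'), str(root / 'home' / 'test'))

    # cp -r home home_copy
    shutil.copytree(root / 'home', root / 'home_copy')

    # ls home_copy/test
    listing = sorted(p.name for p in (root / 'home_copy' / 'test').iterdir())

    # wildcard home*: every entry under root whose name starts with "home"
    matches = sorted(p.name for p in root.glob('home*'))

    # rm home/test/file_1
    (root / 'home' / 'test' / 'file_1').unlink()
    still_there = (root / 'home' / 'test' / 'file_1').exists()

print(listing)      # ['file_1']
print(matches)      # ['home', 'home_copy']
print(still_there)  # False
```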

1.2 Software Installation and Jupyter

1.2.1 Installing Jupyter

pip3 install jupyter
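(1) pip3 install XXX--installs XXX with pip3, e.g. to install bs4, which is used in class: pip3 install bs4
(2) pip3 list--lists everything installed with pip3
(3) pip3 list --outdated--lists the packages that have updates available
(4) pip3 install -U "package name"--upgrades "package name"; -U is short for --upgrade, e.g. pip3 install -U jupyter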

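How to fix Jupyter not running after installation?
1. Make sure jupyter is installed
2. Run find / -name "jupyter"
3. The output of step 2 shows the corresponding path, e.g. /Library/Frameworks/Python.framework/Versions/3.6/bin/jupyter
4. export PATH=$PATH:/Library/Frameworks/Python.framework/Versions/3.6/bin/ # the part after $PATH: is the result of your own search and may differ from person to person; do not drop the colon, and give only the directory, leaving off the final jupyter
5. Try running jupyter again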
——muyang
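
As an alternative to searching the whole disk with find, Python itself can report whether a command is reachable through PATH. This is only a small sketch using shutil.which from the standard library; whether it finds anything depends on the machine it runs on.

```python
import os
import shutil

# shutil.which searches the directories listed in PATH, as the shell does.
location = shutil.which('jupyter')

if location is None:
    print('jupyter is not on PATH; directories searched:')
    for directory in os.environ.get('PATH', '').split(os.pathsep):
        print('  ', directory)
else:
    print('jupyter found at', location)
```

If the command is not found, the printed list shows which directories were searched, which makes it easier to see whether the export in step 4 took effect.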

1.2.2 Running Jupyter

jupyter notebook
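(1) In Jupyter, the shortcut to run the code in the current cell is control + enter
(2) To clear the results, choose Cell > All Output > Clear in the Jupyter page
(3) In Terminal, the shortcut to stop Jupyter is control + C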

1.3 Running Python code

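(1) To run a Python file in the terminal, type python3 + "file name", e.g. python3 test.py. The file name should end with .py; a file ending with .py is also called a module
(2) To start Python in the terminal, type python3. The ">>>" prompt appears, and from there it works like any other Python shell, e.g. typing 2 ** 3 outputs 8
(3) After starting python3 in the terminal, a module can be imported with import + "module name", that is, the file name without .py
(4) Avoid spaces in file names, otherwise handling them in the terminal causes errors; get into the habit of replacing spaces with _
(5) In vim, the w command moves forward by one word, the b command moves backward by one word, the X command deletes the character before the cursor, the r command replaces the character under the cursor, and the :wq command saves and quits vim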

2.Lecture

The exam is computer-based.

2.1 Jupyter Notebook Sheets

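They can be downloaded from the course materials; running them in Jupyter Notebook and working through them independently is recommended. While studying, first predict the result of a cell, then run it to check; if something is unclear, look it up on Google or in the Python tutorial straight away.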

2.2 Turing machine

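The legendary Turing machine can be understood as a machine for mathematical logic: an idealised logical machine regarded as equivalent to any finite mathematical-logic procedure. It uses a machine to imitate a person doing mathematical calculation, and it has only two kinds of operation: 1. writing or erasing a symbol; 2. moving position. A Turing machine consists of an infinitely long tape, a head that reads and writes the cell it currently points to, a table of control rules, and a state register (which records the current state).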
A Turing machine is a mathematical model of computation that defines an abstract machine which manipulates symbols on a strip of tape according to a table of rules. (https://en.wikipedia.org/wiki/Turing_machine)
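The course materials include a file called turing_machine_simulator.py; run it in the way described above for running Python files to start the Turing machine. For how to use it, open Turing Machine Simulator Help at the top of the graphical interface. The help text is reproduced below.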

Tape:
Control clicking to the right or to the left of the current rightmost or leftmost cell, respectively, adds a new cell.
Control clicking on the current rightmost or leftmost added cell removes it.
Clicking on any cell flips the bit it contains from 1 to 0 or from 0 to 1.
Program:
A program is a set of instructions of the form (state1, bit1, state2, bit2, dir) where state1 and state2 have to be alphanumeric words with at most 8 characters, bit1 and bit2 have to be 0 or 1, and dir has to be L or R.
When the TM machine is in state state1 with its head pointing to a cell containing bit1, then it changes bit1 to bit2 in that cell, modifies its state to state2, and moves its head one cell to the right or to the left as determined by dir.
The TM machine is supposed to be deterministic, hence the program should not contain two instructions starting with the same pair (state1, bit1).
The program can contain comments, namely, lines starting with #.
Execution:
When the leftmost button displays Start, the status indicator is red, the tape can be modified, the program can be edited, the Step and Continue buttons are disabled, and no State or Iteration is displayed.
Once this button has been pressed, it displays Stop, the status indicator is green, the tape cannot be modified, the program cannot be edited, and the current State and Iteration are displayed.
When execution stops, either because no instruction can be executed or because Stop has been pressed, the Step and Continue buttons are disabled and the leftmost button displays Reset; it has to be pressed to restore the tape to its initial configuration, with only the "origin" cell containing 1.
Pressing the Start button prompts the user for an initial state, which has to be an alphanumeric word with at most 8 characters, and commences execution provided at least one cell contains 1, in which case the head initially points to the leftmost cell containing 1.
The Step button executes one instruction, if possible; otherwise execution stops.
The Continue button executes up to 1,000 instructions, if possible; otherwise execution stops.
The Stop button allows one to start a new execution in case it is either not desirable or not possible to terminate execution with a sequence of clicks on the Step or Continue buttons.

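In binary, 1 is 1, 2 is 10 and 3 is 11; on the Turing machine a number is expressed only by how many consecutive 1s there are (unary notation), so 1 is 1, 2 is 11 and 3 is 111.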
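
The difference between the two notations can be shown in a few lines of Python. This is just an illustrative sketch; the helper names to_unary and from_unary are made up here.

```python
def to_unary(n):
    """Return n as a run of n 1s, the tape notation used in these notes."""
    return '1' * n

def from_unary(cells):
    """Recover the number from a run of 1s by counting them."""
    return cells.count('1')

for n in (1, 2, 3):
    # number, binary notation, unary notation
    print(n, format(n, 'b'), to_unary(n))
# 1 1 1
# 2 10 11
# 3 11 111
```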

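2.2.1 Changing bits without end

Moving from left to right, keep every existing 1 and turn every 0 into 1, without ever stopping

stupid 1 stupid 1 R
In state stupid on a 1: keep the state and the 1, and move one cell to the right
stupid 0 stupid 1 R
In state stupid on a 0: keep the state, change the 0 to 1, and move one cell to the right

Since every case is defined, the program never halts by itself; it keeps running until it is stopped or the simulator runs out of resources.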

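2.2.2 Changing bits a finite number of times

Moving from left to right, turn the run of consecutive 1s at the start into 0s and leave every 0 alone; stop once all of those 1s have become 0

stupid 1 stupid 0 R
In state stupid on a 1: keep the state, change the 1 to 0, and move one cell to the right

Since nothing is defined for state stupid on a 0, the program ends when it meets a 0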

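2.2.3 Adding 1 to a number

(1) The most direct solution is to run from left to right and change the first 0 met on the right into 1

work 1 work 1 R
work 0 end 1 R/L
In state work on a 0: change the state to end, change the 0 to 1, and move one cell left or right; the direction does not matter here

The program ends because, once it is in state end, nothing is defined for a 0 or a 1 in that state, so it stops.
For small numbers this program works fine, but for a very large number, say 100 million, checking cell after cell from left to right whether it holds a 1 becomes very slow, so the program should be optimised.
(2) The way to optimise it is to stop looking for the first 0 on the right and ask what the task really is: adding one more 1 to a run of consecutive 1s. Whether that 1 is added on the right or on the left makes no real difference, so the 1 can be added directly on the left, where the head starts.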

work 1 work 1 L
work 0 end 1 L/R
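
The saving can be checked with a small simulator. The sketch below is not the course's turing_machine_simulator.py; the function run and the names rightward and leftward are made up for illustration, and the head is assumed to start on the leftmost 1, as the simulator's help text describes.

```python
from collections import defaultdict

def run(program, tape_string, state):
    """Run a program written as lines 'state1 bit1 state2 bit2 dir'.

    The head starts on the first cell of tape_string.
    Returns the tape with surrounding 0s removed, and the number of steps.
    """
    rules = {}
    for line in program.strip().splitlines():
        s1, b1, s2, b2, d = line.split()
        rules[s1, b1] = s2, b2, d
    tape = defaultdict(lambda: '0')      # cells never written read as 0
    for i, bit in enumerate(tape_string):
        tape[i] = bit
    position, steps = 0, 0
    while (state, tape[position]) in rules:
        state, tape[position], direction = rules[state, tape[position]]
        position += 1 if direction == 'R' else -1
        steps += 1
    cells = ''.join(tape[i] for i in range(min(tape), max(tape) + 1))
    return cells.strip('0'), steps

rightward = 'work 1 work 1 R\nwork 0 end 1 R'
leftward = 'work 1 work 1 L\nwork 0 end 1 L'

for n in (5, 50, 500):
    _, steps_right = run(rightward, '1' * n, 'work')
    _, steps_left = run(leftward, '1' * n, 'work')
    print(n, steps_right, steps_left)
# 5 6 2
# 50 51 2
# 500 501 2
```

The rightward version needs n + 1 steps for a number n, while the leftward version always needs 2.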

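2.2.4 Adding two numbers

How are two numbers stored? Separate them with a single 0. For example, 3 and 5 are stored as 111011111: the three 1s on the left stand for 3, the single 0 is the separator, and the five 1s after it stand for 5.
Addition based on this representation: since two numbers are added to give one number, the separator 0 has to be changed to 1, which gives a number 1 larger than the correct result, so the first 1 has to be deleted. For example, for 3+5 the tape starts as 111011111; changing the separator 0 to 1 gives 111111111, which stands for 9, 1 more than the answer 8, so the first 1 has to be deleted. Why the first 1? As the add-1 example above shows, working at the left end, where the head starts, is fastest.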

del 1 mov 0 R
mov 1 mov 1 R
mov 0 end 1 R
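
The three instructions above can be tried out in Python. This is a minimal sketch rather than the course simulator: the function add is made up for illustration, the rules are typed in as a dictionary, and the first number is assumed to be at least 1 so that the head starts on a 1.

```python
from collections import defaultdict

RULES = {
    ('del', 1): ('mov', 0, 'R'),   # erase the first 1
    ('mov', 1): ('mov', 1, 'R'),   # walk right across the first number
    ('mov', 0): ('end', 1, 'R'),   # turn the separator into 1, then halt
}

def add(a, b):
    """Add two unary numbers laid out as a 1s, one 0, b 1s (a >= 1)."""
    tape = defaultdict(int, enumerate([1] * a + [0] + [1] * b))
    state, position = 'del', 0
    while (state, tape[position]) in RULES:
        state, tape[position], direction = RULES[state, tape[position]]
        position += 1 if direction == 'R' else -1
    return sum(tape.values())      # number of 1s left on the tape

print(add(3, 5))   # 8
```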

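2.2.5 Dividing a number by 2

If the number is even, dividing by 2 means turning 2n 1s into n 1s, e.g. 8/2=4 turns 11111111 into 1111;
if the number is odd, dividing by 2 means turning 2n+1 1s into n 1s, e.g. 7/2=3 turns 1111111 into 111.
So the program first deletes the first two 1s it meets, then moves right all the way to the end of the original number, where a 0 acts as the separator, and appends a 1 to the result that is built after the separator. Then it moves left all the way back to the start of the number and repeats. It terminates when there are no longer two 1s to the left of the separator 0 (a single leftover 1 is the odd case and is simply dropped)

del1 1 del2 0 R
Delete the first 1 encountered and change the state from "delete 1" to "delete 2"; two 1s have to be deleted in total, and the X in "delete X" records which of the two comes next
del2 1 movR1 0 R
Delete the second 1 encountered and change the state from "delete 2" to "move right 1"; after deleting two 1s the head moves right until it can extend the result, and the 1 in "move right 1" means the head is in the first number
movR1 1 movR1 1 R
In the first number, leave state and bit unchanged and keep moving right until the separator 0
movR1 0 movR2 0 R
At the end of the first number the head meets a 0, which is treated as the separator; leave the bit unchanged and change the state to "move right 2", where the 2 means the head has entered the second number (the result)
movR2 1 movR2 1 R
On a 1 while moving right in the second number, keep moving right until the end of the second number
movR2 0 movL2 1 L
On the first 0 met in the second number, change it to 1, which records one halving step, then change the state to "move left 2", where the 2 means the head is in the second number
movL2 1 movL2 1 L
On a 1 while moving left in the second number, keep moving left until the separator 0
movL2 0 movL1 0 L
At the end of the second number the head meets the separator 0; leave the bit unchanged and change the state to "move left 1", where the 1 means the head has entered the first number
movL1 1 movL1 1 L
On a 1 while moving left in the first number, keep moving left until the first number ends
movL1 0 del1 0 R
After the first number ends the head meets a 0; leave the bit unchanged, change the state from "move left 1" to "delete 1", and repeat the steps above until the calculation is finished

The program halts for the following reason: if the number is even, the machine meets a 0 in state del1, an instruction that is not defined, so the program ends; if the number is odd, it meets a 0 in state del2, which is not defined either, so the program ends.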
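
Both the result and the halting state can be checked for a range of inputs. The sketch below uses the ten instructions exactly as listed above; the function halve and the way the tape is stored are made up for illustration.

```python
from collections import defaultdict

PROGRAM = """
del1 1 del2 0 R
del2 1 movR1 0 R
movR1 1 movR1 1 R
movR1 0 movR2 0 R
movR2 1 movR2 1 R
movR2 0 movL2 1 L
movL2 1 movL2 1 L
movL2 0 movL1 0 L
movL1 1 movL1 1 L
movL1 0 del1 0 R
"""

def halve(n):
    """Run the division program on n 1s; return (1s left on tape, halting state)."""
    rules = {}
    for line in PROGRAM.strip().splitlines():
        s1, b1, s2, b2, d = line.split()
        rules[s1, int(b1)] = s2, int(b2), d
    tape = defaultdict(int, enumerate([1] * n))   # unwritten cells read as 0
    state, position = 'del1', 0
    while (state, tape[position]) in rules:
        state, tape[position], direction = rules[state, tape[position]]
        position += 1 if direction == 'R' else -1
    return sum(tape.values()), state

for n in (7, 8):
    print(n, halve(n))
# 7 (3, 'del2')
# 8 (4, 'del1')
```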

2.3 Python3-Introduction to operators, lists, dictionaries, strings and control structures

Use the Jupyter notebook sheets provided by the course as a study aid.

2.3.1 Simulating the Turing machine in Python

f = open('division_by_2.txt')  
for line in f:
    print(line)
f.close()
>>>
del1 1 del2 0 R

del2 1 mov1R 0 R

mov1R 1 mov1R 1 R

mov1R 0 mov2R 0 R

mov2R 1 mov2R 1 R

mov2R 0 mov1L 1 L

mov1L 1 mov1L 1 L

mov1L 0 mov2L 0 L

mov2L 1 mov2L 1 L

mov2L 0 del1 0 R

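This opens division_by_2.txt, reads it line by line and prints each line. The code has two problems. First, it is cumbersome to write, and forgetting to close the file is a latent hazard. Second, the output is ugly, with blank lines in between, because every line read from the file already ends with '\n' and print adds another one by default (end = '\n'). The revised version: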

with open('division_by_2.txt') as f:
    for line in f:
        print(line, end = '')

>>>
del1 1 del2 0 R
del2 1 mov1R 0 R
mov1R 1 mov1R 1 R
mov1R 0 mov2R 0 R
mov2R 1 mov2R 1 R
mov2R 0 mov1L 1 L
mov1L 1 mov1L 1 L
mov1L 0 mov2L 0 L
mov2L 1 mov2L 1 L
mov2L 0 del1 0 R
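
To see where the blank lines in the first version came from, it helps to print the repr of a line. The sketch below uses io.StringIO as a stand-in for the file, so it runs without division_by_2.txt.

```python
import io

# Stand-in for the file: two lines, each ending with a newline character.
f = io.StringIO('del1 1 del2 0 R\ndel2 1 mov1R 0 R\n')

lines = list(f)
print(repr(lines[0]))               # 'del1 1 del2 0 R\n'
print(repr(lines[0].rstrip('\n')))  # 'del1 1 del2 0 R'
```

The trailing \n is part of each line, so print(line) ends up emitting two newlines; print(line, end = '') and print(line.rstrip('\n')) are two ways to avoid that.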

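File handling with with ... as ... is easy to write and hard to get wrong. The next step is to process each line that is read. The line first has to be split into its fields to be meaningful, using the string method split().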

with open('division_by_2.txt') as f:
    for line in f:
        print(line.split())
>>>
['del1', '1', 'del2', '0', 'R']
['del2', '1', 'mov1R', '0', 'R']
['mov1R', '1', 'mov1R', '1', 'R']
['mov1R', '0', 'mov2R', '0', 'R']
['mov2R', '1', 'mov2R', '1', 'R']
['mov2R', '0', 'mov1L', '1', 'L']
['mov1L', '1', 'mov1L', '1', 'L']
['mov1L', '0', 'mov2L', '0', 'L']
['mov2L', '1', 'mov2L', '1', 'L']
['mov2L', '0', 'del1', '0', 'R']

split(X) is a Python string method; it splits the string on X and returns the pieces in a list. If () is empty, the default is to split on any run of whitespace.
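
A short sketch of the difference between the default and an explicit separator; the sample strings are made up for illustration.

```python
line = 'del1  1 del2\t0 R\n'   # two spaces, a tab and a trailing newline

# No argument: split on any run of whitespace and drop empty pieces.
print(line.split())         # ['del1', '1', 'del2', '0', 'R']

# Explicit single space: every space is a boundary, tab and newline are kept.
print(line.split(' '))      # ['del1', '', '1', 'del2\t0', 'R\n']

# Any other separator works the same way.
print('a,b,,c'.split(','))  # ['a', 'b', '', 'c']
```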
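Martin then covered a lot of the basics of dictionary, assignment, list and tuple; they are easy to look up, so they are not repeated here.
Printing whole lists is not very readable, so the next step assigns the items of each line's list to named variables, which are easier to work with: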

with open('division_by_2.txt') as f:
    for line in f:
        state_1, bit_1, state_2, bit_2, direction = line.split()
        print(state_1, state_2)
>>>
del1 del2
del2 mov1R
mov1R mov1R
mov1R mov2R
mov2R mov2R
mov2R mov1L
mov1L mov1L
mov1L mov2L
mov2L mov2L
mov2L del1

Then use a dictionary mapping to represent the changes of state and bit, which gives the Turing machine its transition rules:

instructions = {}

with open('division_by_2.txt') as f:
    for line in f:
        state_1, bit_1, state_2, bit_2, direction = line.split()
        instructions[state_1, bit_1] = state_2, bit_2, direction
instructions
>>>
{('del1', '1'): ('del2', '0', 'R'),
 ('del2', '1'): ('mov1R', '0', 'R'),
 ('mov1L', '0'): ('mov2L', '0', 'L'),
 ('mov1L', '1'): ('mov1L', '1', 'L'),
 ('mov1R', '0'): ('mov2R', '0', 'R'),
 ('mov1R', '1'): ('mov1R', '1', 'R'),
 ('mov2L', '0'): ('del1', '0', 'R'),
 ('mov2L', '1'): ('mov2L', '1', 'L'),
 ('mov2R', '0'): ('mov1L', '1', 'L'),
 ('mov2R', '1'): ('mov2R', '1', 'R')}

The dictionary output is hard to read; use arrows to improve it:

instructions = {}

with open('division_by_2.txt') as f:
    for line in f:
        state_1, bit_1, state_2, bit_2, direction = line.split()
        instructions[state_1, bit_1] = state_2, bit_2, direction
    for key in instructions:
        print(key, "-->", instructions[key])
>>>
('del1', '1') --> ('del2', '0', 'R')
('del2', '1') --> ('mov1R', '0', 'R')
('mov1R', '1') --> ('mov1R', '1', 'R')
('mov1R', '0') --> ('mov2R', '0', 'R')
('mov2R', '1') --> ('mov2R', '1', 'R')
('mov2R', '0') --> ('mov1L', '1', 'L')
('mov1L', '1') --> ('mov1L', '1', 'L')
('mov1L', '0') --> ('mov2L', '0', 'L')
('mov2L', '1') --> ('mov2L', '1', 'L')
('mov2L', '0') --> ('del1', '0', 'R')
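
The simulator's help text says that a program may contain comment lines starting with # and must not contain two instructions with the same (state1, bit1) pair. The loop above would fail on a comment line, because the unpacking expects exactly five fields. A hedged sketch of a parser that handles both points follows; the name parse_program is made up for illustration, and skipping blank lines is an extra assumption.

```python
def parse_program(lines):
    """Build the (state, bit) -> (state, bit, direction) dictionary.

    Blank lines and lines starting with # are skipped; a repeated
    (state, bit) pair raises ValueError, since the machine is deterministic.
    """
    instructions = {}
    for number, line in enumerate(lines, 1):
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        state_1, bit_1, state_2, bit_2, direction = line.split()
        if (state_1, bit_1) in instructions:
            raise ValueError(f'line {number}: duplicate rule for {state_1} {bit_1}')
        instructions[state_1, bit_1] = state_2, bit_2, direction
    return instructions

program = [
    '# delete two 1s\n',
    'del1 1 del2 0 R\n',
    '\n',
    'del2 1 mov1R 0 R\n',
]
print(parse_program(program))
# {('del1', '1'): ('del2', '0', 'R'), ('del2', '1'): ('mov1R', '0', 'R')}
```

Since an open file is also an iterable of lines, parse_program(f) would work inside a with open(...) as f block.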

With the Turing machine's rules in place, try to simulate the "01" tape of the graphical interface:

instructions = {}

with open('division_by_2.txt') as f:
    for line in f:
        state_1, bit_1, state_2, bit_2, direction = line.split()
        instructions[state_1, bit_1] = state_2, bit_2, direction

tape = [0] * 3 + [1] * 7 + [0] * 6

current_state = 'del1'
current_position = 3
current_bit = tape[current_position]

while (current_state, current_bit) in instructions:
    next_state, new_bit, direction = instructions[current_state, current_bit]
    print(next_state, new_bit, direction)

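Create a list called tape to simulate the "01" display, and initialise the current state to del1 and the position to 3; the bit at that position is 1.
Then, following the rules in the instructions dictionary built above, look up and assign the values of the three variables next_state, new_bit and direction.
As written, however, the loop body never runs: the dictionary keys hold the bits as strings ('1'), while the tape holds ints (1), so ('del1', 1) is not found. The next goal is to show the changes of 0 and 1 on the simulated display, that is, to change the 0s and 1s in the tape list. For both reasons, first convert bit_1 and bit_2 from str to int when building the dictionary, which also makes calculation easier: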

instructions = {}

with open('division_by_2.txt') as f:
    for line in f:
        state_1, bit_1, state_2, bit_2, direction = line.split()
        instructions[state_1, int(bit_1)] = state_2, int(bit_2), direction

tape = [0] * 3 + [1] * 7 + [0] * 6

current_state = 'del1'
current_position = 3
current_bit = tape[current_position]

print(tape)
while (current_state, current_bit) in instructions:
    next_state, new_bit, direction = instructions[current_state, current_bit]
    tape[current_position] = new_bit
    current_state = next_state
    if direction == 'R':
        current_position += 1
    else:
        current_position -= 1
    current_bit = tape[current_position]
    print(tape)
>>>
[0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0]
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

The output is hard to read; writing dedicated print functions improves it:

instructions = {}

def print_tape():
    print(' '.join(str(e) for e in tape))
    
def print_state():
    print('  ' * current_position, current_state, sep = '')
    
def print_tape_and_state():
    print_tape()
    print_state()

with open('division_by_2.txt') as f:
    for line in f:
        state_1, bit_1, state_2, bit_2, direction = line.split()
        instructions[state_1, int(bit_1)] = state_2, int(bit_2), direction

tape = [0] * 3 + [1] * 7 + [0] * 6

current_state = 'del1'
current_position = 3
current_bit = tape[current_position]

print_tape_and_state()
while (current_state, current_bit) in instructions:
    next_state, new_bit, direction = instructions[current_state, current_bit]
    tape[current_position] = new_bit
    current_state = next_state
    if direction == 'R':
        current_position += 1
    else:
        current_position -= 1
    current_bit = tape[current_position]
    print_tape_and_state()
>>>
0 0 0 1 1 1 1 1 1 1 0 0 0 0 0 0
      del1
0 0 0 0 1 1 1 1 1 1 0 0 0 0 0 0
        del2
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
          mov1R
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
            mov1R
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
              mov1R
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
                mov1R
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
                  mov1R
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
                    mov1R
0 0 0 0 0 1 1 1 1 1 0 0 0 0 0 0
                      mov2R
0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0
                    mov1L
0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0
                  mov2L
0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0
                mov2L
0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0
              mov2L
0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0
            mov2L
0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0
          mov2L
0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0
        mov2L
0 0 0 0 0 1 1 1 1 1 0 1 0 0 0 0
          del1
0 0 0 0 0 0 1 1 1 1 0 1 0 0 0 0
            del2
0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0
              mov1R
0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0
                mov1R
0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0
                  mov1R
0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0
                    mov1R
0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0
                      mov2R
0 0 0 0 0 0 0 1 1 1 0 1 0 0 0 0
                        mov2R
0 0 0 0 0 0 0 1 1 1 0 1 1 0 0 0
                      mov1L
0 0 0 0 0 0 0 1 1 1 0 1 1 0 0 0
                    mov1L
0 0 0 0 0 0 0 1 1 1 0 1 1 0 0 0
                  mov2L
0 0 0 0 0 0 0 1 1 1 0 1 1 0 0 0
                mov2L
0 0 0 0 0 0 0 1 1 1 0 1 1 0 0 0
              mov2L
0 0 0 0 0 0 0 1 1 1 0 1 1 0 0 0
            mov2L
0 0 0 0 0 0 0 1 1 1 0 1 1 0 0 0
              del1
0 0 0 0 0 0 0 0 1 1 0 1 1 0 0 0
                del2
0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0
                  mov1R
0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0
                    mov1R
0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0
                      mov2R
0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0
                        mov2R
0 0 0 0 0 0 0 0 0 1 0 1 1 0 0 0
                          mov2R
0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0
                        mov1L
0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0
                      mov1L
0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0
                    mov1L
0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0
                  mov2L
0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0
                mov2L
0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0
                  del1
0 0 0 0 0 0 0 0 0 0 0 1 1 1 0 0
                    del2

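The string method join() joins the items of the tape list with spaces on one line, and the state is then printed on the next line, shifted to the head's position.
The if statement that updates the position looks clumsy; improve it further: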

instructions = {}

def print_tape():
    print(' '.join(str(e) for e in tape))
    
def print_state():
    print('  ' * current_position, current_state, sep = '')
    
def print_tape_and_state():
    print_tape()
    print_state()

with open('division_by_2.txt') as f:
    for line in f:
        state_1, bit_1, state_2, bit_2, direction = line.split()
        instructions[state_1, int(bit_1)] = state_2, int(bit_2), direction

tape = [0] * 3 + [1] * 7 + [0] * 6

current_state = 'del1'
current_position = 3
current_bit = tape[current_position]

print_tape_and_state()
while (current_state, current_bit) in instructions:
    next_state, new_bit, direction = instructions[current_state, current_bit]
    tape[current_position] = new_bit
    current_state = next_state
    current_position += (direction == 'R') * 2 - 1
    # If direction is 'R', the comparison is True, which counts as 1, so 1 * 2 - 1 = +1;
    # otherwise it is False, which counts as 0, so 0 * 2 - 1 = -1.
    current_bit = tape[current_position]
    print_tape_and_state()
>>>
The output is the same as above
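
Two possible variations on the loop above, offered as a sketch rather than as the course's solution. First, a small dictionary (called moves here) can replace the boolean arithmetic for the direction. Second, the list-based tape has a fixed length: a head that walks past the right end raises IndexError, and a head that walks left of cell 0 silently wraps around to the end of the list. A defaultdict models an unbounded tape instead. The example uses the leftward add-1 program from section 2.2.3, whose head does step to the left of its starting cell.

```python
from collections import defaultdict

instructions = {
    ('work', 1): ('work', 1, 'L'),
    ('work', 0): ('end', 1, 'L'),
}
moves = {'L': -1, 'R': 1}

tape = defaultdict(int)          # any cell not written yet reads as 0
for position in range(3):        # the number 3 in cells 0, 1 and 2
    tape[position] = 1

current_state, current_position = 'work', 0
while (current_state, tape[current_position]) in instructions:
    current_state, tape[current_position], direction = \
        instructions[current_state, tape[current_position]]
    current_position += moves[direction]

ones = sorted(position for position, bit in tape.items() if bit == 1)
print(ones)   # [-1, 0, 1, 2]
```

The new 1 lands in cell -1, which a plain list would have interpreted as its last element.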

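2.3.2 Scraping and processing World Bank data with Python

Run python3 worldbank.py to start the web scraper in the course materials. If it reports an error, the bs4 and openpyxl packages may be missing; install them with pip3 as described earlier.
The scraped data looks like ' $123.67 million ', so before the data can be processed, such strings have to be converted into real numbers.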

units = {'thousand': 10**3, 'million': 10**6, 'billion': 10**9}

s = '   $123.67 million '
units['million']
>>>
1000000

The idea is the same as in the Turing machine simulation: create a dictionary called units that maps strings (million, billion, etc.) to multipliers (10**6, 10**9).
Then apply units to the scraped data, which is of type string:

units = {'thousand': 10**3, 'million': 10**6, 'billion': 10**9}

s = '   $123.67 million '

for unit in units:
    if unit in s:
        s = s.rstrip(unit)
print(s)
>>>
   $123.67 million

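The string method rstrip(unit) is used to remove the unit text (million, billion and so on, as listed in the units dictionary) from the scraped string. Note that rstrip treats its argument as a set of characters to strip from the right end, not as a suffix.
Running it shows that it did not work: the string ends with a space after the unit, so nothing is stripped. Improve it: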

units = {'thousand': 10**3, 'million': 10**6, 'billion': 10**9}

s = '   $123.67 million '

for unit in units:
    if unit in s:
        s = s.strip().rstrip(unit)
print(s)
>>>
$123.67
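
rstrip(unit) works here only because the unit is preceded by a space, which is not one of the characters being stripped. The sketch below shows the character-set behaviour on a made-up word, together with a suffix-only alternative; the helper strip_suffix is hypothetical and not part of the course code.

```python
def strip_suffix(text, suffix):
    """Remove suffix from the end of text only if text really ends with it."""
    if suffix and text.endswith(suffix):
        return text[:-len(suffix)]
    return text

# rstrip removes any trailing characters found in the set {m, i, l, o, n}.
print(repr('$123.67 million'.rstrip('million')))         # '$123.67 '
print(repr('aluminium'.rstrip('million')))               # 'aluminiu'

# strip_suffix removes the whole word or nothing.
print(repr(strip_suffix('$123.67 million', 'million')))  # '$123.67 '
print(repr(strip_suffix('aluminium', 'million')))        # 'aluminium'
```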

That removes the unit; next remove the $ sign:

units = {'thousand': 10**3, 'million': 10**6, 'billion': 10**9}

s = '   $123.67 million '

for unit in units:
    if unit in s:
        s = s.strip().rstrip(unit).lstrip('$')
print(s)
>>>
123.67

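The string method lstrip('$') removed the $ sign from the left, but a space still remains after 123.67 in the output. float() ignores surrounding whitespace, so converting the string to a number takes care of it: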

units = {'thousand': 10**3, 'million': 10**6, 'billion': 10**9}

# thousand/million/billion
s = '   $123.67 million '

for unit in units:
    if unit in s:
        x = float(s.strip().rstrip(unit).lstrip('$'))
print(x)
>>>
123.67

Then scale the number by its unit to get the real value, e.g. 123.67 should become 123.67*(10**6):

units = {'thousand': 10**3, 'million': 10**6, 'billion': 10**9}

# thousand/million/billion
s = '   $123.67 million '

for unit in units:
    if unit in s:
        x = int(float(s.strip().rstrip(unit).lstrip('$')) * units[unit])
print(x)
>>>
123670000
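
The steps above can be gathered into one function. This is a hedged sketch, not the code in worldbank.py: the name parse_amount is made up, it splits the string into fields instead of stripping characters, and it assumes the input is always a number optionally followed by one of the three units.

```python
UNITS = {'thousand': 10**3, 'million': 10**6, 'billion': 10**9}

def parse_amount(text):
    """Convert a string such as ' $123.67 million ' to a whole number of dollars."""
    fields = text.replace('$', '').split()   # e.g. ['123.67', 'million']
    value = float(fields[0])
    if len(fields) > 1:
        value *= UNITS[fields[1]]
    # round() rather than int(), so a tiny floating-point error cannot
    # push the result down to the previous integer.
    return round(value)

print(parse_amount('   $123.67 million '))   # 123670000
print(parse_amount('$1.5 billion'))          # 1500000000
print(parse_amount('$42'))                   # 42
```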

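To put the scraped numbers into a spreadsheet, the row number is needed as well, so the built-in function enumerate() is used to get the row number together with each value: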

L = [10, 23, 67, 98]
for number, value in enumerate(L, 2):
    print(number, value)
>>>
2 10
3 23
4 67
5 98

The scraped data is stored starting from row 2.
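
The second argument of enumerate() is the number to start counting from. As a small sketch of how the row numbers might then be used, the code below builds spreadsheet-style cell names; the layout (data in column B, row 1 left for a header) is a made-up assumption, not taken from worldbank.py.

```python
values = [10, 23, 67, 98]

# Hypothetical layout: row 1 holds a header, data goes into column B from row 2.
cells = {f'B{row}': value for row, value in enumerate(values, 2)}
print(cells)   # {'B2': 10, 'B3': 23, 'B4': 67, 'B5': 98}

# enumerate(values, 2) pairs each value with 2, 3, 4, ...
print(list(enumerate(values, 2)) == list(zip(range(2, 6), values)))   # True
```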

Previous posts
COMP9021 Principles of Programming WEEK1 Optional