Getting Started with TensorFlow + NLP
Work calls for it, so I'm digging into TF and NLP (Natural Language Processing).
Step 1: Install a local TensorFlow development environment
First things first: install TensorFlow, since there's no fun to be had without it.
Straight to the point!
Google TF official site -> Install (all kinds of install methods; look for the simplest one 🔍) -> Docker install (Docker, the thing everybody loves)
docker pull tensorflow/tensorflow
Once the image is pulled, run:
docker run -it tensorflow/tensorflow bash
root@e7b70c1079df:/# python --version
Python 2.7.12
Installed successfully.
The official command for running local code; I'm saving it here since it gets copied a lot:
docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow python ./script.py
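As a quick smoke test of that command, here is a minimal script.py (my own sketch, not from any repo) that just confirms the volume mount and the interpreter work; once this runs, swapping in an `import tensorflow` line verifies the TF install itself:

```python
# script.py - tiny sanity check for the mounted-volume workflow above.
# Save it in the current directory and run the docker command; the container
# sees it under /tmp thanks to -v $PWD:/tmp.
import platform


def describe_environment():
    """Return a short string describing the interpreter we are running on."""
    return "Python %s on %s" % (platform.python_version(), platform.system())


if __name__ == "__main__":
    print(describe_environment())
```

Inside the tensorflow/tensorflow image this should print the container's Python version, not the host's, which is exactly the point of the `-v $PWD:/tmp -w /tmp` pair.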
I'll dig into the rest when I have time. For now, TF on Python 2.7 is up and running on my Mac.
Step 2: Download the NLP source code from a master
https://github.com/dennybritz/cnn-text-classification-tf
This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post.
It is a slightly simplified implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in Tensorflow.
Requirements
- Python 3
- Tensorflow > 0.12
- Numpy
Shoot... I didn't notice it needs Python 3. Back to the image search - -!
docker pull tensorflow/tensorflow:latest-py3-jupyter
There are two tags, latest-py3 and latest-py3-jupyter; I picked the one with Jupyter, just in case ✌️
After pulling the image, repeat the earlier command:
docker run -it tensorflow/tensorflow:latest-py3-jupyter bash
root@cb55072b95e5:/tf# python --version
Python 3.5.2
Perfect ✌️😜
Step 3: Mule or horse, take it out for a spin
Open the downloaded source in an editor (I use PyCharm), then cd to the project root and run:
docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow:latest-py3-jupyter bash
Step into the container and look around:
root@88ef18eae92c:/tmp# ll
total 48
drwxr-xr-x 11 root root 352 Mar 11 16:34 ./
drwxr-xr-x 1 root root 4096 Mar 11 16:34 ../
-rwxr-xr-x 1 root root 870 Jul 20 2018 .gitignore*
drwxr-xr-x 6 root root 192 Mar 11 16:32 .idea/
-rwxr-xr-x 1 root root 11357 Jul 20 2018 LICENSE*
-rwxr-xr-x 1 root root 2280 Jul 20 2018 README.md*
drwxr-xr-x 3 root root 96 Jul 20 2018 data/
-rwxr-xr-x 1 root root 2472 Jul 20 2018 data_helpers.py*
-rwxr-xr-x 1 root root 3738 Jul 20 2018 eval.py*
-rwxr-xr-x 1 root root 3776 Jul 20 2018 text_cnn.py*
-rwxr-xr-x 1 root root 9073 Jul 20 2018 train.py*
At this point the local code, the Docker image, and the local editor are all hooked up nicely; let the fun begin ✌️
Following the README.md, run the train script and check its help:
root@88ef18eae92c:/tmp# ./train.py --help
USAGE: ./train.py [flags]
flags:
./train.py:
--[no]allow_soft_placement: Allow device soft device placement
(default: 'true')
--batch_size: Batch Size (default: 64)
(default: '64')
(an integer)
--checkpoint_every: Save model after this many steps (default: 100)
(default: '100')
(an integer)
--dev_sample_percentage: Percentage of the training data to use for validation
(default: '0.1')
(a number)
--dropout_keep_prob: Dropout keep probability (default: 0.5)
(default: '0.5')
(a number)
--embedding_dim: Dimensionality of character embedding (default: 128)
(default: '128')
(an integer)
--evaluate_every: Evaluate model on dev set after this many steps (default: 100)
(default: '100')
(an integer)
--filter_sizes: Comma-separated filter sizes (default: '3,4,5')
(default: '3,4,5')
--l2_reg_lambda: L2 regularization lambda (default: 0.0)
(default: '0.0')
(a number)
--[no]log_device_placement: Log placement of ops on devices
(default: 'false')
--negative_data_file: Data source for the negative data.
(default: './data/rt-polaritydata/rt-polarity.neg')
--num_checkpoints: Number of checkpoints to store (default: 5)
(default: '5')
(an integer)
--num_epochs: Number of training epochs (default: 200)
(default: '200')
(an integer)
--num_filters: Number of filters per filter size (default: 128)
(default: '128')
(an integer)
--positive_data_file: Data source for the positive data.
(default: './data/rt-polaritydata/rt-polarity.pos')
Try --helpfull to get a list of all flags.
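The flag I found easiest to reason about is --dev_sample_percentage. As I understand it (a sketch of the idea, not the repo's exact code), the script carves the trailing fraction of the shuffled data off as a dev set:

```python
# Sketch of a dev_sample_percentage-style split: the trailing fraction of the
# (already shuffled) data becomes the dev set. My reading of the idea, not
# the repo's train.py verbatim.
def train_dev_split(samples, dev_sample_percentage=0.1):
    """Split samples into (train, dev); dev is the trailing fraction."""
    dev_index = -1 * int(dev_sample_percentage * float(len(samples)))
    return samples[:dev_index], samples[dev_index:]


data = list(range(100))          # stand-in for 100 labeled sentences
x_train, x_dev = train_dev_split(data, 0.1)
print(len(x_train), len(x_dev))  # 90 10
```

So with the default 0.1, a tenth of the training file never touches the optimizer and is only used for the periodic "Evaluation:" lines.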
A small thrill: the help text printed out beautifully 😄
Skimmed it... and understood little, so let's just try training first:
./train.py
Runs smoothly. One word: solid.
Evaluation:
2019-03-11T16:54:25.745084: step 10300, loss 3.65937, acc 0.711069
Saved model checkpoint to /tmp/runs/1552322335/checkpoints/model-10300
2019-03-11T16:54:26.545635: step 10301, loss 0.000296011, acc 1
2019-03-11T16:54:26.776015: step 10302, loss 5.95611e-05, acc 1
2019-03-11T16:54:26.850238: step 10303, loss 0.00182802, acc 1
2019-03-11T16:54:26.932814: step 10304, loss 1.41934e-05, acc 1
2019-03-11T16:54:27.001730: step 10305, loss 0.00106164, acc 1
2019-03-11T16:54:27.072143: step 10306, loss 0.00159799, acc 1
2019-03-11T16:54:27.141833: step 10307, loss 0.000124719, acc 1
2019-03-11T16:54:27.216436: step 10308, loss 3.5929e-06, acc 1
2019-03-11T16:54:27.289951: step 10309, loss 1.2785e-05, acc 1
2019-03-11T16:54:27.362660: step 10310, loss 0.00844685, acc 1
2019-03-11T16:54:27.430708: step 10311, loss 0.000167686, acc 1
2019-03-11T16:54:27.502281: step 10312, loss 0.000110473, acc 1
2019-03-11T16:54:27.572092: step 10313, loss 0.000771175, acc 1
2019-03-11T16:54:27.644654: step 10314, loss 3.88898e-06, acc 1
2019-03-11T16:54:27.714136: step 10315, loss 0.000124581, acc 1
2019-03-11T16:54:27.781347: step 10316, loss 5.31748e-06, acc 1
2019-03-11T16:54:27.855259: step 10317, loss 0.000178186, acc 1
2019-03-11T16:54:27.925776: step 10318, loss 1.3183e-05, acc 1
2019-03-11T16:54:28.001957: step 10319, loss 0.000173645, acc 1
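Each "step" line above is one mini-batch of --batch_size examples (and the near-zero loss with acc 1 at step 10k+ suggests the model has long since memorized the training set). The batching itself is just a generator; roughly like this (my simplified sketch, inspired by the repo's data_helpers.batch_iter but without the per-epoch shuffling):

```python
# Rough sketch of mini-batch iteration: slice the data into batch_size-sized
# chunks and replay them num_epochs times. Simplified from the idea behind
# data_helpers.batch_iter; the real one also reshuffles each epoch.
def batch_iter(data, batch_size, num_epochs):
    """Yield successive batch_size-sized slices of data, num_epochs times."""
    num_batches = (len(data) + batch_size - 1) // batch_size  # ceil division
    for _ in range(num_epochs):
        for i in range(num_batches):
            yield data[i * batch_size:(i + 1) * batch_size]


batches = list(batch_iter(list(range(10)), batch_size=4, num_epochs=1))
print(batches)  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

Note the last batch can be smaller than batch_size, which matches the slightly uneven step timings in the log.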
Step 4: Read the grandmasters' papers
"Convolutional Neural Networks for Sentence Classification"
Author:
Yoon Kim
New York University
https://arxiv.org/pdf/1408.5882.pdf
"A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification"
Author:
Ye Zhang
Dept. of Computer Science
University of Texas at Austin
Byron C. Wallace
iSchool
University of Texas at Austin
https://arxiv.org/pdf/1510.03820.pdf
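For a taste before the full read, the core of Kim's model in rough strokes: a sentence is the concatenation of its word vectors, a filter of height h slides over every window of h words to produce a feature map, and max-over-time pooling keeps one value per filter:

```latex
% Sentence as concatenated word embeddings, each x_i \in \mathbb{R}^k:
x_{1:n} = x_1 \oplus x_2 \oplus \cdots \oplus x_n
% A filter w \in \mathbb{R}^{hk} applied to each window of h words:
c_i = f(w \cdot x_{i:i+h-1} + b)
% Max-over-time pooling keeps the strongest feature per filter:
\hat{c} = \max\{c_1, c_2, \ldots, c_{n-h+1}\}
```

This is where the --filter_sizes and --num_filters flags from the help output come from: one feature map per (filter size, filter) pair, pooled and fed to a softmax.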
See you next time!