Learning TensorFlow and NLP

Getting Started with TensorFlow + NLP

2019-03-12  Joshua_精东

Work requires it, so I'm digging into TF and NLP (Natural Language Processing).

Step 1: Install a Local TensorFlow Development Environment

First, install TensorFlow; without it there is no fun to be had.

Straight to the point!

Go to the official TensorFlow site -> Install (there are all kinds of installation methods; find the simplest one 🔍) -> Docker install (everybody loves Docker).

docker pull tensorflow/tensorflow

Once the image is pulled, run:

docker run -it tensorflow/tensorflow bash

root@e7b70c1079df:/# python --version
Python 2.7.12

Installation successful.

This is the official command for running local code inside the container. I'm noting it down here so it's easy to copy later; it gets used a lot:

docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow python ./script.py
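
For that command to actually do something, there has to be a script.py in the current directory. script.py is just the placeholder name from the command above; a minimal sanity-check script for this TF 1.x image could look like this:

# script.py - minimal TF 1.x sanity check (hypothetical file matching the placeholder above)
import tensorflow as tf

print("TensorFlow version:", tf.__version__)

# Build and run a trivial graph to confirm the runtime works
a = tf.constant(2)
b = tf.constant(3)
with tf.Session() as sess:
    print("2 + 3 =", sess.run(a + b))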

I'll look at the rest when I have time. At this point, TF with Python 2.7 is running happily on my Mac.

Step 2: Download the Expert's NLP Source Code

https://github.com/dennybritz/cnn-text-classification-tf

This code belongs to the "Implementing a CNN for Text Classification in Tensorflow" blog post.

It is a slightly simplified implementation of Kim's Convolutional Neural Networks for Sentence Classification paper in TensorFlow.

Requirements

Shit... I didn't pay attention: it needs a Python 3 environment. Time to find a different image - -!
docker pull tensorflow/tensorflow:latest-py3-jupyter
There are two tags, latest-py3 and latest-py3-jupyter. I picked the one with Jupyter, just in case ✌️
Once the image is pulled, repeat the command from before:
docker run -it tensorflow/tensorflow:latest-py3-jupyter bash

root@cb55072b95e5:/tf# python --version
Python 3.5.2

Perfect ✌️😜

Step 3: Take It for a Spin

Open the downloaded source code in an editor (I use PyCharm), then cd into the project root and run:
docker run -it --rm -v $PWD:/tmp -w /tmp tensorflow/tensorflow:latest-py3-jupyter bash
Step inside the container and take a look:

root@88ef18eae92c:/tmp# ll
total 48
drwxr-xr-x 11 root root   352 Mar 11 16:34 ./
drwxr-xr-x  1 root root  4096 Mar 11 16:34 ../
-rwxr-xr-x  1 root root   870 Jul 20  2018 .gitignore*
drwxr-xr-x  6 root root   192 Mar 11 16:32 .idea/
-rwxr-xr-x  1 root root 11357 Jul 20  2018 LICENSE*
-rwxr-xr-x  1 root root  2280 Jul 20  2018 README.md*
drwxr-xr-x  3 root root    96 Jul 20  2018 data/
-rwxr-xr-x  1 root root  2472 Jul 20  2018 data_helpers.py*
-rwxr-xr-x  1 root root  3738 Jul 20  2018 eval.py*
-rwxr-xr-x  1 root root  3776 Jul 20  2018 text_cnn.py*
-rwxr-xr-x  1 root root  9073 Jul 20  2018 train.py*

At this point, the local code, the Docker image, and the local editor are all wired together nicely. Time to play ✌️

Following the expert's README.md, run the train script and check the help output:

root@88ef18eae92c:/tmp# ./train.py --help

       USAGE: ./train.py [flags]
flags:

./train.py:
  --[no]allow_soft_placement: Allow device soft device placement
    (default: 'true')
  --batch_size: Batch Size (default: 64)
    (default: '64')
    (an integer)
  --checkpoint_every: Save model after this many steps (default: 100)
    (default: '100')
    (an integer)
  --dev_sample_percentage: Percentage of the training data to use for validation
    (default: '0.1')
    (a number)
  --dropout_keep_prob: Dropout keep probability (default: 0.5)
    (default: '0.5')
    (a number)
  --embedding_dim: Dimensionality of character embedding (default: 128)
    (default: '128')
    (an integer)
  --evaluate_every: Evaluate model on dev set after this many steps (default: 100)
    (default: '100')
    (an integer)
  --filter_sizes: Comma-separated filter sizes (default: '3,4,5')
    (default: '3,4,5')
  --l2_reg_lambda: L2 regularization lambda (default: 0.0)
    (default: '0.0')
    (a number)
  --[no]log_device_placement: Log placement of ops on devices
    (default: 'false')
  --negative_data_file: Data source for the negative data.
    (default: './data/rt-polaritydata/rt-polarity.neg')
  --num_checkpoints: Number of checkpoints to store (default: 5)
    (default: '5')
    (an integer)
  --num_epochs: Number of training epochs (default: 200)
    (default: '200')
    (an integer)
  --num_filters: Number of filters per filter size (default: 128)
    (default: '128')
    (an integer)
  --positive_data_file: Data source for the positive data.
    (default: './data/rt-polaritydata/rt-polarity.pos')

Try --helpfull to get a list of all flags.
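
These flags come from TensorFlow's tf.flags module (in TF 1.x, a thin wrapper around Abseil's flags). Just to demystify the help output, here is a rough sketch of how flags like these are usually declared; it mirrors a few names and defaults from above, but it is not the repo's actual train.py:

# Sketch of declaring command-line flags with tf.flags (TF 1.x).
# Names and defaults mirror the help output above; this is not the repo's exact code.
import sys
import tensorflow as tf

tf.flags.DEFINE_float("dev_sample_percentage", 0.1,
                      "Percentage of the training data to use for validation")
tf.flags.DEFINE_integer("batch_size", 64, "Batch Size (default: 64)")
tf.flags.DEFINE_integer("num_epochs", 200, "Number of training epochs (default: 200)")
tf.flags.DEFINE_string("filter_sizes", "3,4,5", "Comma-separated filter sizes (default: '3,4,5')")
tf.flags.DEFINE_boolean("allow_soft_placement", True, "Allow device soft device placement")

FLAGS = tf.flags.FLAGS
FLAGS(sys.argv)  # parse argv so the flag values are available
print("batch_size =", FLAGS.batch_size)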

A small thrill: the help output printed out beautifully 😄
I skimmed it, but... I don't really understand it yet, so let's just train it and see:
./train.py
It ran smoothly. One word: solid.

Evaluation:
2019-03-11T16:54:25.745084: step 10300, loss 3.65937, acc 0.711069

Saved model checkpoint to /tmp/runs/1552322335/checkpoints/model-10300

2019-03-11T16:54:26.545635: step 10301, loss 0.000296011, acc 1
2019-03-11T16:54:26.776015: step 10302, loss 5.95611e-05, acc 1
2019-03-11T16:54:26.850238: step 10303, loss 0.00182802, acc 1
2019-03-11T16:54:26.932814: step 10304, loss 1.41934e-05, acc 1
2019-03-11T16:54:27.001730: step 10305, loss 0.00106164, acc 1
2019-03-11T16:54:27.072143: step 10306, loss 0.00159799, acc 1
2019-03-11T16:54:27.141833: step 10307, loss 0.000124719, acc 1
2019-03-11T16:54:27.216436: step 10308, loss 3.5929e-06, acc 1
2019-03-11T16:54:27.289951: step 10309, loss 1.2785e-05, acc 1
2019-03-11T16:54:27.362660: step 10310, loss 0.00844685, acc 1
2019-03-11T16:54:27.430708: step 10311, loss 0.000167686, acc 1
2019-03-11T16:54:27.502281: step 10312, loss 0.000110473, acc 1
2019-03-11T16:54:27.572092: step 10313, loss 0.000771175, acc 1
2019-03-11T16:54:27.644654: step 10314, loss 3.88898e-06, acc 1
2019-03-11T16:54:27.714136: step 10315, loss 0.000124581, acc 1
2019-03-11T16:54:27.781347: step 10316, loss 5.31748e-06, acc 1
2019-03-11T16:54:27.855259: step 10317, loss 0.000178186, acc 1
2019-03-11T16:54:27.925776: step 10318, loss 1.3183e-05, acc 1
2019-03-11T16:54:28.001957: step 10319, loss 0.000173645, acc 1
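
Side note: the "Saved model checkpoint to ..." line means the trained model can be restored later (which is presumably what eval.py is for). A minimal sketch of restoring the latest checkpoint with plain TF 1.x APIs; the runs/... directory comes from my log above, and the tensor name in the comment is an assumption about the repo's graph:

# Sketch: restore the latest checkpoint written by train.py (TF 1.x).
# The runs/... directory is taken from the training log above; adjust it to your own run.
import tensorflow as tf

checkpoint_dir = "./runs/1552322335/checkpoints"
checkpoint_file = tf.train.latest_checkpoint(checkpoint_dir)

graph = tf.Graph()
with graph.as_default():
    with tf.Session() as sess:
        # Rebuild the graph from the .meta file and load the trained weights
        saver = tf.train.import_meta_graph("{}.meta".format(checkpoint_file))
        saver.restore(sess, checkpoint_file)
        # Tensors can then be fetched by name, e.g. (the name is an assumption):
        # input_x = graph.get_operation_by_name("input_x").outputs[0]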

Step 4: Read the Works of the Real Masters

"Convolutional Neural Networks for Sentence Classification"
Author:
Yoon Kim
New York University

https://arxiv.org/pdf/1408.5882.pdf

"A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification"
Authors:
Ye Zhang
Dept. of Computer Science
University of Texas at Austin

Byron C. Wallace
iSchool
University of Texas at Austin

https://arxiv.org/pdf/1510.03820.pdf
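
While reading, it helped me to sketch the architecture Kim describes: an embedding layer, parallel convolutions with several filter sizes, max-over-time pooling, dropout, and a softmax classifier. The sketch below is my own tf.keras simplification, not the repo's text_cnn.py; vocab_size, sequence_length, and the other numbers are placeholders:

# Rough tf.keras sketch of Kim's CNN for sentence classification.
# Hyperparameters are placeholders; this is not the repo's text_cnn.py.
import tensorflow as tf
from tensorflow.keras import layers

vocab_size = 20000       # placeholder vocabulary size
sequence_length = 56     # placeholder max sentence length
embedding_dim = 128
filter_sizes = [3, 4, 5]
num_filters = 128
num_classes = 2

inputs = layers.Input(shape=(sequence_length,), dtype="int32")
x = layers.Embedding(vocab_size, embedding_dim)(inputs)

# One convolution + max-over-time pooling branch per filter size
pooled = []
for size in filter_sizes:
    conv = layers.Conv1D(num_filters, size, activation="relu")(x)
    pooled.append(layers.GlobalMaxPooling1D()(conv))

x = layers.concatenate(pooled)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(num_classes, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
model.summary()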

See you next time!
