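Toolkits: Comprehensive NLP Toolkits
by Tsinghua University (C++/Java/Python)
by the Chinese Academy of Sciences (Java)
by Harbin Institute of Technology (C++)
by Fudan University (Java)
by Boson (commercial API service)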
(Python) Python library for processing Chinese text
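(Python) A Chinese natural language processing package written in pure Python, named after the idiom “牙牙学语” (a toddler babbling while learning to speak)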
(Python) Deep learning NLP pipeline implemented in TensorFlow with pretrained Chinese models.
(C++ & Python) Chinese Natural Language Processing tools and examples
(Python) Annotator for Chinese text corpora
Popular NLP Toolkits for English/Multi-Language
by Stanford (Java)
(Python) Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora.
Chinese Word Segmentation
(Python) Aims to be the best Python component for Chinese word segmentation
(Python) Iterated Dilated Convolutions for Chinese Word Segmentation
(Python) Genius is an open-source Python component for Chinese word segmentation, based on the CRF (Conditional Random Field) algorithm.
Information Extraction
(C++) library and tools for information extraction
(Haskell) Language, engine, and tooling for expressing, testing, and evaluating composable language rules on input strings.
(Python) IEPY is an open source tool for Information Extraction focused on Relation Extraction.
Snorkel: A training data creation and management system focused on information extraction
Neural Relation Extraction implemented with LSTM in TensorFlow
Chinese Named Entity Recognition with IDCNN/biLSTM+CRF, and Relation Extraction with biGRU+2ATT
QA & Chatbot
(Python) turn natural language into structured data
(Python) machine learning based dialogue engine for conversational software
(Python) ChatterBot is a machine learning, conversational dialog engine for creating chat bots.
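(Python) A context-aware chatbot based on vector matching
(PHP) An open-source PHP question-and-answer system built on the Laravel framework; easy to extend, with strong load capacity and stability.
(Java) A question answering system implemented in Java that automatically analyzes questions and returns candidate answers.
A Sequence to Sequence chatbot model implemented in TensorFlow
Corpus: Chinese Corpora
Password: neqs. The source is probably 梁斌 (Liang Bin, penny).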
(for training spaCy POS)
A Chinese sentiment dataset that may be useful for sentiment analysis.
Chinese Emergency Corpus
Chinese conversation corpus
Datasets for training chatbot systems
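[52nlp introductory blog post] Open data in the insurance domain for machine learning tasks
Nearly 14,000 classical poets from the Tang and Song dynasties, with close to 55,000 Tang poems and 260,000 Song poems; plus 1,564 ci poets of the Northern and Southern Song periods, with 21,050 ci.
Organizations: Chinese NLP Organizations and Conferences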
Main conferences, journals, workshops and shared tasks in NLP community.
Learning Materials
Stanford CS224n Natural Language Processing with Deep Learning 2017
Speech and Language Processing
by Dan Jurafsky and James H. Martin