RL4LM Notes

2023-01-10  臻甄

Repository: https://github.com/allenai/RL4LMs
Paper: https://arxiv.org/abs/2210.01241
Website: https://rl4lms.apps.allenai.org/

Paper title: Is Reinforcement Learning (Not) for Natural Language Processing?: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization.

Abstract

RL4LM: Reinforcement Learning for Language Models

git clone https://github.com/allenai/RL4LMs.git
cd RL4LMs
pip install -e .
python scripts/training/train_text_generation.py --config_path scripts/training/task_configs/summarization/t5_ppo.yml

GRUE: General Reinforced-language Understanding Evaluation

NLPO: Natural Language Policy Optimization
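
As described in the paper, NLPO extends PPO by masking out less relevant tokens, keeping only the top-p set under a periodically updated mask policy. The following is a minimal sketch of the top-p masking step only, in plain Python. The function names `top_p_mask` and `masked_distribution` are hypothetical and are not taken from the RL4LMs code base; the real implementation operates on batched tensors inside the policy.

```python
import math

def top_p_mask(logits, p=0.9):
    """Mark the smallest set of tokens whose cumulative probability reaches p.

    Hypothetical helper for illustration; not the RL4LMs API.
    """
    # Numerically stable softmax over the vocabulary.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Visit tokens from most to least probable, keeping them until mass >= p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)
    cumulative = 0.0
    for i in order:
        keep[i] = True
        cumulative += probs[i]
        if cumulative >= p:
            break
    return keep

def masked_distribution(logits, keep):
    """Renormalise the softmax over kept tokens; masked tokens get probability 0."""
    m = max(x for x, k in zip(logits, keep) if k)
    exps = [math.exp(x - m) if k else 0.0 for x, k in zip(logits, keep)]
    total = sum(exps)
    return [e / total for e in exps]

# Toy vocabulary of four tokens with probabilities 0.5, 0.3, 0.15, 0.05.
toy_logits = [math.log(0.5), math.log(0.3), math.log(0.15), math.log(0.05)]
toy_keep = top_p_mask(toy_logits, p=0.9)
toy_probs = masked_distribution(toy_logits, toy_keep)
```

In this toy case the least likely token is masked and the remaining probability mass is renormalised over the three kept tokens. In NLPO the mask would come from the mask policy rather than from the policy being trained; that distinction is omitted here for brevity.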

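Appendix (Metrics)

Lexical metrics (e.g., ROUGE, BLEU, SacreBLEU, METEOR)

Semantic metrics (e.g., BERTScore, BLEURT)

Task-specific metrics (e.g., PARENT, CIDEr, SPICE)

Scores from pre-trained classifiers

Some practical observations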
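
To make concrete what a lexical metric measures, here is a minimal sketch of a ROUGE-1-style unigram F1 score in plain Python. It assumes whitespace tokenisation and lowercasing, which is simpler than what the official ROUGE tooling does, and the function name `unigram_f1` is hypothetical rather than part of RL4LMs.

```python
from collections import Counter

def unigram_f1(candidate: str, reference: str) -> float:
    """ROUGE-1-style F1: harmonic mean of unigram precision and recall.

    Simplified illustration (whitespace tokens, no stemming); not the
    official ROUGE implementation.
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()
    if not cand_tokens or not ref_tokens:
        return 0.0

    # Clipped overlap: a token counts at most as often as it appears in both texts.
    overlap = sum((Counter(cand_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(cand_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

score = unigram_f1("the cat sat on the mat", "the cat is on the mat")
```

Here five of the six tokens overlap on each side, so precision, recall, and F1 all equal 5/6. Scores like this depend only on surface token overlap, which is why the semantic metrics listed next are used alongside them.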
