Prompt
Prompt Learning Intro
(TODO)
1. Bigger models, harder fine-tuning
- T5: 11B parameters
- GPT-3: 175B parameters
Fine-tuning trains a new model from a pre-trained model on domain-specific data. When the model is huge, this becomes hard and expensive. How can we avoid directly training the whole model?
2. Introduction to prompt learning
Prompt learning bridges the gap between pre-training and fine-tuning: the prompt acts as a hint that lets the model handle the downstream task in the form of its pre-training objective.
- Add additional context (template) with a [MASK] position
- Project labels to label words (verbalizer)
In this way, the problem above is addressed: to use a PLM in practice, one only needs to design a template and a verbalizer, as sketched below.
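A minimal sketch of this reformulation for sentiment classification (the template text and label words below are illustrative assumptions, not taken from any particular paper):

```python
# Wrap the input in a template with a [MASK] slot, and map labels to label words.
def apply_template(text: str) -> str:
    return f"{text} It was [MASK]."

verbalizer = {          # verbalizer: label -> label word
    "positive": "great",
    "negative": "terrible",
}

prompt = apply_template("The plot kept me on the edge of my seat.")
# A PLM predicts a word at the [MASK] position; the predicted label is whichever
# label's word ("great" vs. "terrible") receives the higher probability.
print(prompt)
```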
Three key elements of prompt learning:
- Pre-trained Model
- Auto-regressive (GPT, GPT-2, GPT-3, OPT…)
- Masked Language Modeling (BERT, RoBERTa, DeBERTa…)
- Encoder-Decoder (T5, BART…)
- Template
- Manually Design
- Auto Generation
- Textual or Continuous…
- Verbalizer
- Manually Design
- Expanding by external knowledge…
3. Pre-trained Model
Auto-regressive
GPT, GPT-2, GPT-3, OPT…
- suitable for super-large pre-trained models (almost all models with over 100B parameters)
- Auto-regressive Prompt
- [MASK] is at the end of the sentence.
- good at generating
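A minimal sketch of an auto-regressive prompt, assuming the Hugging Face transformers library and the public gpt2 checkpoint (the prompt text is illustrative): the prediction slot sits at the end and the model simply continues the text.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The answer slot is at the end of the sequence; the model continues the text.
prompt = "The movie was absolutely wonderful. In one word, the movie is"
out = generator(prompt, max_new_tokens=5)
print(out[0]["generated_text"])
```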
Masked Language Modeling
BERT, RoBERTa, DeBERTa…
- suitable for natural language understanding (NLU)
- Cloze-style Prompt
- the position of [MASK] is arbitrary
- good at understanding
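A cloze-style prompt sketch, assuming the transformers fill-mask pipeline and the bert-base-uncased checkpoint (template and label words are illustrative); here the [MASK] sits in the middle of the sentence:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# The [MASK] position is arbitrary; here it is placed mid-sentence.
template = "This is a [MASK] piece of news: the team released its new model today."
for candidate in fill(template, targets=["great", "terrible"]):
    print(candidate["token_str"], candidate["score"])
```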
Encoder-Decoder
T5, BART…
- Bidirectional attention for encoder
- Auto-regressive for decoder
- good at generating and understanding
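A small encoder-decoder sketch, assuming transformers and the public t5-small checkpoint: the encoder reads the whole input bidirectionally, and the decoder fills the sentinel span auto-regressively.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# <extra_id_0> marks the span to be filled by the decoder.
inputs = tokenizer("The movie was <extra_id_0> and I would watch it again.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```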
4. Template
- Manually designed based on the characteristics of the task
- Auto Generation with search or optimization
- Textual or Continuous
- Structured, incorporating rules
Templates have various forms:
TL;DR ("too long; didn't read")
Examples
Ensembling Templates
Use multiple prompts and ensemble their results to find a better answer.
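A sketch of template ensembling: score the same example with several prompts and average the label-word probabilities. The templates, label words, and uniform averaging below are illustrative assumptions, using the transformers fill-mask pipeline.

```python
from collections import defaultdict
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

text = "The plot was predictable and the acting felt flat."
templates = [
    f"{text} It was [MASK].",
    f"{text} A really [MASK] movie.",
    f"All in all, it was a [MASK] film. {text}",
]
label_words = {"positive": "good", "negative": "bad"}

scores = defaultdict(float)
for t in templates:
    for label, word in label_words.items():
        # Probability of the label word at the [MASK] position for this template.
        scores[label] += fill(t, targets=[word])[0]["score"] / len(templates)

print(max(scores, key=scores.get), dict(scores))
```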
Some strategies:
Automatic Search
- Trigger Token
Training finds the embeddings at certain positions in the template; these are compared against trigger tokens, and the closest tokens are filled in to generate the template. The generated templates are meaningless sentences, yet they work very well.
Does an optimal prompt exist?
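A rough sketch of the gradient-based trigger-token search (a HotFlip-style first-order approximation in the spirit of AutoPrompt; the helper below and its arguments are hypothetical simplifications):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
embedding_matrix = model.get_input_embeddings().weight  # (vocab_size, hidden)

def candidate_triggers(input_ids, trigger_pos, mask_pos, label_word_id, k=10):
    # Embed the input and track gradients at the current trigger position.
    embeds = model.get_input_embeddings()(input_ids).detach()
    embeds.requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits
    # Loss: negative log-probability of the label word at the [MASK] position.
    loss = -torch.log_softmax(logits[0, mask_pos], dim=-1)[label_word_id]
    loss.backward()
    grad = embeds.grad[0, trigger_pos]                   # (hidden,)
    # First-order estimate of the loss change when swapping in each vocab token;
    # the most negative dot products are the most promising trigger candidates.
    scores = embedding_matrix.detach() @ grad            # (vocab_size,)
    return torch.topk(-scores, k).indices                # top-k candidate token ids
```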
- Encoder-Decoder
Use an encoder-decoder model (e.g. T5) to generate templates.
The generated templates conform to human language logic.
Optimization of Continuous Prompts
P-tuning
Create some new embeddings, initialize them with existing word embeddings, and train them; their meanings become fuzzy (no longer tied to any concrete words).
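A minimal sketch in the spirit of P-tuning (the initialization words and setup below are illustrative assumptions; the original method also uses a prompt encoder, omitted here): new prompt embeddings are initialized from existing word embeddings, prepended to the input, and trained while the PLM stays frozen.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
for p in model.parameters():
    p.requires_grad = False                      # freeze the PLM

word_embeddings = model.get_input_embeddings()
init_ids = tokenizer("it was", add_special_tokens=False, return_tensors="pt").input_ids[0]
# Trainable continuous prompt, initialized from real word embeddings; after
# training, its vectors no longer correspond to any concrete words.
prompt_embeds = nn.Parameter(word_embeddings(init_ids).detach().clone())

def forward_with_prompt(input_ids):
    token_embeds = word_embeddings(input_ids)        # (1, seq_len, hidden)
    prompt = prompt_embeds.unsqueeze(0)              # (1, n_prompt, hidden)
    return model(inputs_embeds=torch.cat([prompt, token_embeds], dim=1)).logits

# Only the continuous prompt is optimized.
optimizer = torch.optim.Adam([prompt_embeds], lr=1e-3)
```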