Prompt
Prompt Learning Intro
(TODO)
1. Bigger models, harder fine-tuning
- T5: 11B parameters
- GPT-3: 175B parameters
Fine-tuning trains a new model from a pre-trained model on domain-specific data. When the model is huge, this becomes hard and expensive. How can we avoid directly training the whole model?
2. Introduction to prompt learning
Prompt learning bridges the gap between pre-training and fine-tuning: the prompt acts as a hint that lets the model handle the downstream task in the form of its pre-training objective.
- Add additional context (template) with a [MASK] position
- Project labels to label words (verbalizer)
In this way, the problem above is addressed: to use a PLM in practice, one only needs to design a template and a verbalizer, as sketched below.
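A minimal sketch of this reformulation for sentiment classification (the template text and label words below are illustrative assumptions, not taken from any particular paper):

```python
# Wrap the input in a template with a [MASK] slot, and map labels to label words.
def apply_template(text: str) -> str:
    return f"{text} It was [MASK]."

verbalizer = {          # verbalizer: label -> label word
    "positive": "great",
    "negative": "terrible",
}

prompt = apply_template("The plot kept me on the edge of my seat.")
# A PLM predicts a word at the [MASK] position; the predicted label is whichever
# label's word ("great" vs. "terrible") receives the higher probability.
print(prompt)
```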
Three key elements of prompt learning:
- Pre-trained Model
- Auto-regressive (GPT, GPT-2, GPT-3, OPT…)
- Masked Language Modeling (BERT, RoBERTa, DeBERTa…)
- Encoder-Decoder (T5, BART…)
- Template
- Manually Design
- Auto Generation
- Textual or Continuous…
- Verbalizer
- Manually Design
- Expanding by external knowledge…
3. Pre-trained Model
Auto-regressive
GPT, GPT-2, GPT-3, OPT…
- suitable for super-large pre-trained models (almost all models with over 100B parameters)
- Auto-regressive Prompt
- [MASK] is at the end of the sentence.
- good at generating
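A minimal sketch of an auto-regressive prompt, assuming the Hugging Face transformers library and the public gpt2 checkpoint (the prompt text is illustrative): the prediction slot sits at the end and the model simply continues the text.

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

# The answer slot is at the end of the sequence; the model continues the text.
prompt = "The movie was absolutely wonderful. In one word, the movie is"
out = generator(prompt, max_new_tokens=5)
print(out[0]["generated_text"])
```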
Masked Language Modeling
BERT, RoBERTa, DeBERTa…
- suitable for natural language understanding (NLU)
- Cloze-style Prompt
- the position of [MASK] is arbitrary
- good at understanding
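A cloze-style prompt sketch, assuming the transformers fill-mask pipeline and the bert-base-uncased checkpoint (template and label words are illustrative); here the [MASK] sits in the middle of the sentence:

```python
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

# The [MASK] position is arbitrary; here it is placed mid-sentence.
template = "This is a [MASK] piece of news: the team released its new model today."
for candidate in fill(template, targets=["great", "terrible"]):
    print(candidate["token_str"], candidate["score"])
```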
Encoder-Decoder
T5, BART…
- Bidirectional attention for encoder
- Auto-regressive for decoder
- good at generating and understanding
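A small encoder-decoder sketch, assuming transformers and the public t5-small checkpoint: the encoder reads the whole input bidirectionally, and the decoder fills the sentinel span auto-regressively.

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# <extra_id_0> marks the span to be filled by the decoder.
inputs = tokenizer("The movie was <extra_id_0> and I would watch it again.",
                   return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0], skip_special_tokens=False))
```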
4. Template
- Manually designed based on the characteristics of the task
- Auto Generation with search or optimization
- Textual or Continuous
- Structured, incorporating rules
Templates have various forms:
TL;DR ("too long; didn't read")
Examples
Ensembling Templates
Use multiple prompts and ensemble their results to find a better answer.
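A sketch of template ensembling: score the same example with several prompts and average the label-word probabilities. The templates, label words, and uniform averaging below are illustrative assumptions, using the transformers fill-mask pipeline.

```python
from collections import defaultdict
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")

text = "The plot was predictable and the acting felt flat."
templates = [
    f"{text} It was [MASK].",
    f"{text} A really [MASK] movie.",
    f"All in all, it was a [MASK] film. {text}",
]
label_words = {"positive": "good", "negative": "bad"}

scores = defaultdict(float)
for t in templates:
    for label, word in label_words.items():
        # Probability of the label word at the [MASK] position for this template.
        scores[label] += fill(t, targets=[word])[0]["score"] / len(templates)

print(max(scores, key=scores.get), dict(scores))
```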
Some strategies:
Automatic Search
- Trigger Token
Training finds the embeddings at certain positions in the template; these are compared against trigger tokens, and the closest tokens are filled in to generate the template. The generated templates are meaningless sentences, yet they work very well.
Does an optimal prompt exist?
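A rough sketch of the gradient-based trigger-token search (a HotFlip-style first-order approximation in the spirit of AutoPrompt; the helper below and its arguments are hypothetical simplifications):

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
embedding_matrix = model.get_input_embeddings().weight  # (vocab_size, hidden)

def candidate_triggers(input_ids, trigger_pos, mask_pos, label_word_id, k=10):
    # Embed the input and track gradients at the current trigger position.
    embeds = model.get_input_embeddings()(input_ids).detach()
    embeds.requires_grad_(True)
    logits = model(inputs_embeds=embeds).logits
    # Loss: negative log-probability of the label word at the [MASK] position.
    loss = -torch.log_softmax(logits[0, mask_pos], dim=-1)[label_word_id]
    loss.backward()
    grad = embeds.grad[0, trigger_pos]                   # (hidden,)
    # First-order estimate of the loss change when swapping in each vocab token;
    # the most negative dot products are the most promising trigger candidates.
    scores = embedding_matrix.detach() @ grad            # (vocab_size,)
    return torch.topk(-scores, k).indices                # top-k candidate token ids
```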
- Encoder-Decoder
Use an encoder-decoder model (e.g. T5) to generate templates.
The generated templates conform to human language logic.
Optimization of Continuous Prompts
P-tuning
Create some new embeddings, initialize them with existing word embeddings, and train them; their meanings become fuzzy (no longer tied to any concrete words).
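A minimal sketch in the spirit of P-tuning (the initialization words and setup below are illustrative assumptions; the original method also uses a prompt encoder, omitted here): new prompt embeddings are initialized from existing word embeddings, prepended to the input, and trained while the PLM stays frozen.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
for p in model.parameters():
    p.requires_grad = False                      # freeze the PLM

word_embeddings = model.get_input_embeddings()
init_ids = tokenizer("it was", add_special_tokens=False, return_tensors="pt").input_ids[0]
# Trainable continuous prompt, initialized from real word embeddings; after
# training, its vectors no longer correspond to any concrete words.
prompt_embeds = nn.Parameter(word_embeddings(init_ids).detach().clone())

def forward_with_prompt(input_ids):
    token_embeds = word_embeddings(input_ids)        # (1, seq_len, hidden)
    prompt = prompt_embeds.unsqueeze(0)              # (1, n_prompt, hidden)
    return model(inputs_embeds=torch.cat([prompt, token_embeds], dim=1)).logits

# Only the continuous prompt is optimized.
optimizer = torch.optim.Adam([prompt_embeds], lr=1e-3)
```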