Self-Refine: Iterative Refinement with Self-Feedback

With Self-Refine, LLMs can generate feedback on their work, use it to improve the output, and repeat this process.
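The loop can be sketched in a few lines of Python. This is illustrative only and not the repo's actual API: `generate`, `feedback`, and `refine` stand in for LLM calls driven by the Init, Feedback, and Iterate prompts, and here they are toy functions that iteratively shorten an over-long acronym.

```python
# Toy sketch of the Self-Refine loop (not the repo's actual classes or prompts).

def generate(task: str) -> str:
    """Initial draft (stands in for the Init prompt)."""
    return task.upper().replace(" ", "")

def feedback(output: str) -> str:
    """Critique the current draft (stands in for the Feedback prompt)."""
    return "too long" if len(output) > 6 else "ok"

def refine(output: str, fb: str) -> str:
    """Improve the draft given feedback (stands in for the Iterate prompt)."""
    return output[: len(output) // 2] if fb == "too long" else output

def self_refine(task: str, max_iters: int = 5) -> str:
    output = generate(task)
    for _ in range(max_iters):
        fb = feedback(output)
        if fb == "ok":  # stopping criterion: the critic is satisfied
            break
        output = refine(output, fb)
    return output

print(self_refine("language models of code"))  # draft shrinks until the critic says "ok"
```

With a real LLM, each of the three functions would be a model call with its own few-shot prompt, and the stopping criterion would be a fixed iteration budget or the model signaling that no further improvement is needed.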


Table of Contents

- Updates
  - Stokes' Theorem Example (Visual Self-Refine Example 1)
  - Unicorn Example (Visual Self-Refine Example 2)

Setup

  • We use prompt-lib for querying LLMs. You can install prompt-lib by running the following commands:

```
git clone https://github.com/reasoning-machines/prompt-lib
pip install prompt-lib/
```

Depending on your default settings, you may want to set PYTHONPATH as follows:

```
export PYTHONPATH=".:../:.:src:../:../../:.:prompt-lib"
```

Getting Started with Acronym Generation

```
python -u src/acronym/run.py "Using language models of code for few-shot commonsense"
```
  • Since we sample with temperature 0.7, you'll get different results each time you run the above command.
```
0 INIT> Using language models of code for few-shot commonsense

0 GEN> CLoCK
0 SCORES> * Ease of pronunciation: CLoCK is pronounced "clah-k". This is easy acronym to pronounce. 4/5
* Ease of spelling: CLoCK is easy to spell. 5/5
* Relation to title: CLoCK stands for "Code Language Models of Commonsense Knowledge" which is related to the title. 5/5
* Positive connotation: CLoCK is a positive acronym. It implies accuracy and efficiency. 5/5
* Well-known: CLoCK is not a well-known acronym. 2/5

* Total score: 21/25

1 GEN> CMNSC
1 SCORES> * Ease of pronunciation: CMNSC is pronounced "see-min-es-see". This is a difficult acronym to pronounce. 1/5
* Ease of spelling: CMNSC is difficult to spell. 1/5
* Relation to title: CMNSC stands for "Commonsense" which is related to the title. 4/5
* Positive connotation: CMNSC is a neutral acronym. 3/5
* Well-known: CMNSC is not a well-known acronym. 1/5

* Total score: 10/25

2 GEN> COMMIT
2 SCORES> * Ease of pronunciation: COMMIT is pronounced "com-mit". This is an easy acronym to pronounce. 4/5
* Ease of spelling: COMMIT is easy to spell. 5/5
* Relation to title: COMMIT stands for "Commonsense Model of Code for Intuitive Training" which is related to the title. 5/5
* Positive connotation: COMMIT is a positive acronym. It implies commitment and dedication. 5/5
* Well-known: COMMIT is not a well-known acronym. 2/5

* Total score: 21/25
```

Dialogue Response Generation

```
PYTHONPATH="." python -u src/responsegen/run.py --output <OUTPUT FILE> --size <DATA SIZE>
```

  • Use size 0 for running on all test instances.

Code Readability Improvement

  • Note: Please unzip data/tasks/codeclean/code_readability/codenet-python-train.jsonl.zip before running the following commands!
  • Running:

```
PYTHONPATH="." python -u src/readability/readability.py --output <OUTPUT FILE>
```

  • Evaluation:

```
PYTHONPATH="." python -u src/readability/{count_comment|count_function|count_meaningful_var}.py --file <INPUT FILE>
```

Commongen

  • We use a hard version of commongen. The data is located in data/prompt/commongen. You can run the task with the following command:

```
python -u src/commongen/run.py cmd stair bubble team dryer puppy aliens cat
```

GSM-8k

  • To run the GSM-8k task:

```
python -u src/gsm/run.py
```

  • The outputs will be saved in data/tasks/gsm/gsm_outputs.jsonl.

  • To evaluate the outputs:

```
python src/gsm/gsm_selfref_eval.py --path data/tasks/gsm/gsm_outputs.jsonl
```

  • The evaluation script will also generate a report (data/tasks/gsm/gsm_outputs.jsonl.reports.txt) showing examples of wrong generations, feedback, and refined generations.
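For a quick look at how refinement helps across iterations, one might tally solve rates from an outputs file like gsm_outputs.jsonl. The field names below ("iteration", "correct") are assumptions for illustration; check the actual file for its schema.

```python
# Hedged sketch: per-iteration solve rate from a JSONL outputs file.
# The record schema ("iteration", "correct") is assumed, not taken from the repo.
import json
from collections import Counter

def solve_rate_by_iteration(path: str) -> dict:
    totals, correct = Counter(), Counter()
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            it = rec["iteration"]
            totals[it] += 1                 # how many problems reached this iteration
            correct[it] += int(rec["correct"])  # how many were solved at it
    return {it: correct[it] / totals[it] for it in sorted(totals)}
```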

Yelp

  • To run the Yelp task:

```
python -u src/sentiment_transfer_sr/run.py data/tasks/yelp/yelp-extreme.jsonl 4 none
```

  • The outputs will be saved in data/tasks/yelp/.

PIE

  • To run the PIE task:

```
python -u src/pie/run.py --slow_programs_file data/tasks/pie/codenet-python-test-1k.jsonl --max_attempts 4 --outfile data/tasks/pie/output --feedback_type rich
```

General setup

  • Each task has three different types of prompts:

  1. Init: used to initialize the task. This is how the initial output is generated.
  2. Feedback: used to get feedback from the model on the intermediate results.
  3. Iterate: used to get the next iteration from the model, based on the feedback.

  • Every task has a run.py that initializes the prompts and runs the task.

  • As an example, the prompts for commongen are as follows:

  1. Init prompt:

```
python src/commongen/task_init.py
```

  2. Feedback prompt:

```
python src/commongen/feedback.py
```

  3. Iterate prompt:

```
python src/commongen/task_iterate.py
```

You can also see these prompts on our website.
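The division into three prompt types can be sketched as a small template dispatcher. The template strings below are made up for illustration (the real few-shot prompts live in each task's task_init.py, feedback.py, and task_iterate.py); the point is only that each stage formats a different prompt from the same task state.

```python
# Illustrative only: hypothetical templates for a commongen-style task.
# The actual prompts in the repo are few-shot and task-specific.
INIT_TEMPLATE = "Concepts: {concepts}\nSentence:"
FEEDBACK_TEMPLATE = "Sentence: {sentence}\nWhich of the concepts ({concepts}) are missing?\nFeedback:"
ITERATE_TEMPLATE = "Sentence: {sentence}\nFeedback: {feedback}\nImproved sentence:"

def build_prompt(concepts, sentence=None, feedback=None):
    joined = ", ".join(concepts)
    if sentence is None:                       # stage 1: Init
        return INIT_TEMPLATE.format(concepts=joined)
    if feedback is None:                       # stage 2: Feedback
        return FEEDBACK_TEMPLATE.format(sentence=sentence, concepts=joined)
    # stage 3: Iterate
    return ITERATE_TEMPLATE.format(sentence=sentence, feedback=feedback)
```

A task's run.py then alternates between the Feedback and Iterate prompts until the stopping criterion is met.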

Citation

```
@misc{madaan2023selfrefine,
      title={Self-Refine: Iterative Refinement with Self-Feedback},
      author={Aman Madaan and Niket Tandon and Prakhar Gupta and Skyler Hallinan and Luyu Gao and Sarah Wiegreffe and Uri Alon and Nouha Dziri and Shrimai Prabhumoye and Yiming Yang and Sean Welleck and Bodhisattwa Prasad Majumder and Shashank Gupta and Amir Yazdanbakhsh and Peter Clark},
      year={2023},
      eprint={2303.17651},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
```mermaid
flowchart LR
    Generator -->|Initializes| Unrefined
    Critic_1 --> Critique_fb
    ... --> Critique_fb
    Critic_k --> Critique_fb
    Critique_fb --> Unrefined{Output to Refine}
    Unrefined --> Refiner
    Refiner --> |R: y_t, x, fb| Refined_Output{Refined Output}
    Refined_Output --> |Stopping Criteria Not Met| Unrefined
```