Prompting to reproduce MBPP test results

#25
by juliuscheng - opened

Hi, I'm trying to reproduce the SantaCoder test results on MBPP from the paper, and I'm wondering what the recommended way to prompt the model is.

MBPP provides text instructions, e.g. "Write a function to reverse words in a given string.", which the SantaCoder model card explicitly advises against using. Nevertheless, I have tried prompting the model in one of two ways (in Python):

  1. Function signature, followed by docstring:
def reverse_words(s):
    """Write a function to reverse words in a given string."""
  2. Comment, followed by function signature:
# Write a function to reverse words in a given string.
def reverse_words(s):
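
For context, this is roughly how I run generation (a minimal sketch; the checkpoint name and generation settings here are my own assumptions, not necessarily the paper's setup):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint; SantaCoder needs trust_remote_code=True for its custom model code.
checkpoint = "bigcode/santacoder"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

prompt = 'def reverse_words(s):\n    """Write a function to reverse words in a given string."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding by default; pass do_sample=True, temperature=0.2 to sample instead.
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0]))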

In both cases I get reasonable output, except that after defining the function, generation does not terminate: it keeps repeating the function until max_length, like this:

def reverse_words(s):
    """Write a function to reverse words in a given string."""
    return ' '.join(s.split()[::-1])


def reverse_words_2(s):
    """Write a function to reverse words in a given string."""
    return ' '.join(s.split()[::-1])


def reverse_words_3(s):
    """Write a function to reverse words in a given string."""
    return ' '.join(s.split()[::-1])

Should I change the prompting method, or is this output acceptable as long as I truncate it manually? I am trying to reproduce the eval results from the paper as closely as possible. Thanks for your help.

BigCode org

Hi, we evaluated using the MultiPL-E version of MBPP, which already provides function signatures, so evaluation is very similar to HumanEval.
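
(For readers: a MultiPL-E-style MBPP prompt is, roughly, a function signature plus a docstring holding the instruction, so the model only completes the body. This is an approximate illustration, not the exact MultiPL-E output:)

def reverse_words(s):
    """
    Write a function to reverse words in a given string.
    """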

Thank you! And regarding the other part of my question: generation with greedy search, or sampling with temperature=0.2, does not terminate, as shown above. Should I manually truncate the output?

BigCode org

How are you doing the generations? If you use model.generate(), it should stop at the EOS token if it comes up; if it doesn't come up often, you can add a stopping criterion, as is done here. Note that you then need to post-process the output to keep only the first function, as is done here. You can also find more examples in our evaluation harness.
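
(For readers following along, here is a minimal self-contained sketch of both steps, a stop-sequence stopping criterion plus first-function truncation; the stop strings, helper names, and generation settings are illustrative assumptions, not the harness's exact code:)

from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    StoppingCriteria,
    StoppingCriteriaList,
)

# Assumed stop sequences marking the start of a new top-level block.
STOP_SEQUENCES = ["\ndef ", "\nclass ", "\nprint(", "\n#"]

class StopOnSequences(StoppingCriteria):
    """Stop generation once any stop sequence appears after the prompt (assumes batch size 1)."""
    def __init__(self, prompt_len, tokenizer, stop_sequences):
        self.prompt_len = prompt_len
        self.tokenizer = tokenizer
        self.stop_sequences = stop_sequences

    def __call__(self, input_ids, scores, **kwargs):
        generated = self.tokenizer.decode(input_ids[0][self.prompt_len:])
        return any(s in generated for s in self.stop_sequences)

def keep_first_function(completion, stop_sequences=STOP_SEQUENCES):
    """Post-process: truncate the completion at the first stop sequence, if any."""
    cuts = [completion.find(s) for s in stop_sequences if s in completion]
    return completion[:min(cuts)] if cuts else completion

checkpoint = "bigcode/santacoder"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForCausalLM.from_pretrained(checkpoint, trust_remote_code=True)

prompt = 'def reverse_words(s):\n    """Write a function to reverse words in a given string."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
prompt_len = inputs["input_ids"].shape[1]
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    stopping_criteria=StoppingCriteriaList(
        [StopOnSequences(prompt_len, tokenizer, STOP_SEQUENCES)]
    ),
)
completion = tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)
print(prompt + keep_first_function(completion))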

This answers my question; thank you for the great and prompt responses!

juliuscheng changed discussion status to closed
