How to Infer
🫖 Prepare the inference
Inference means asking your model to "guess" or "generate" something new based on a prompt.
Reupload your files
⚠️ ONLY IF YOU RESTARTED YOUR SESSION ⚠️
If you exited your previous session, you must reupload the model and reinstall the packages. You can do so by running the hidden cell below.
If you are still in the same session and your distilgpt2-finetuned folder is still there, you can skip this part.
Hidden cell
## --- ONLY RUN THIS IF YOU RESTARTED YOUR SESSION ---

## reinstall transformers and torch
!pip install transformers[torch]

## re-import what the fresh session has forgotten
from google.colab import files
from transformers import AutoTokenizer, AutoModelForCausalLM

## UPLOAD YOUR DISTILGPT2-FINETUNED.ZIP
model_zip = files.upload()

## then unzip (change "distilgpt2-finetuned" to your model name if it's different)
!unzip distilgpt2-finetuned.zip

## redeclare model_path, tokenizer, and model
model_path = "./distilgpt2-finetuned" ## change name if different
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path)
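Optionally, you can check that the unzipped folder is really there before loading. A quick sanity-check sketch (assuming the folder name distilgpt2-finetuned from above):

import os

## confirm the unzipped model folder exists before loading it
if os.path.isdir("./distilgpt2-finetuned"):
    print("Model folder found, safe to load.")
else:
    print("Model folder missing: double-check the zip name and the unzip step.")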
Prepare your prompt
Remember how every entry in your dataset started with I heard that...? Now we use it as a sentence starter, so the model is more likely to imitate our dataset.
Also, the model needs numbers, not words, so we need to tokenize the prompt.
This creates input_ids (our text turned into numbers) and an attention_mask (a map indicating what is real text and what is padding).
prompt = "I heard that"
inputs = tokenizer(prompt, return_tensors="pt", padding=True)
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]
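If you want to see what the tokenizer actually produced, you can print the tensors. An optional peek (the exact IDs depend on the tokenizer's vocabulary):

## peek at the tensors the tokenizer produced
print(input_ids)       ## the prompt turned into token IDs
print(attention_mask)  ## all 1s here: everything is real text, nothing is padding
print(tokenizer.convert_ids_to_tokens(input_ids[0].tolist()))  ## the IDs back as text pieces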
✒️ Generate Output!
Set the instructions for your generation:
max_length → maximum number of tokens (roughly, words) generated, prompt included. Keep it between 25 and 50.
temperature → the higher the value, the more random the output. Keep it between 0.7 and 0.9.
top_p → sample only from the most likely words whose probabilities add up to this value (nucleus sampling). 0.95 is a common default.
num_return_sequences → total number of generations. Increase it if you want to produce more outputs at once.
Decode your result from numbers back to words using tokenizer.decode(), then print it with print().
output = model.generate(
input_ids,
attention_mask=attention_mask,
max_length=25,
do_sample=True,
top_p=0.95,
temperature=0.8,
num_return_sequences=1,
pad_token_id=tokenizer.eos_token_id
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
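If you bump num_return_sequences above 1, model.generate() returns several sequences, so decode each one in a loop. A minimal sketch, reusing the same model, tokenizer, and inputs as above:

## generate three variations of the same prompt in one call
outputs = model.generate(
    input_ids,
    attention_mask=attention_mask,
    max_length=25,
    do_sample=True,
    top_p=0.95,
    temperature=0.8,
    num_return_sequences=3,  ## three outputs at once
    pad_token_id=tokenizer.eos_token_id
)

## decode and print each sequence separately
for i, seq in enumerate(outputs):
    print(f"--- Output {i + 1} ---")
    print(tokenizer.decode(seq, skip_special_tokens=True))

If you made it this far, congratulations!! Now make your model speak. Check out ❓ How to TTS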