Generating Structured Output with LLMs (Part 1)
Large Language Models (LLMs) excel at generating human-like text, but what if you need structured output like JSON, XML, HTML, or Markdown? Structured text is essential because computers can efficiently parse and utilize it. Fortunately, LLMs can generate structured output out-of-the-box, thanks to the vast amount of structured data in their training sets.
Prompting for Structured Output
Here’s how I used the gpt-3.5-turbo model with a tailored system prompt to produce structured output:
SYSTEM_PROMPT = "You are a Programmer. Whenever asked a question, return the answer in the requested programming language or markup language. Your output will be directly used as input by other downstream computer programs. So no string delimiters wrapping it, no yapping, no markdown, no fenced code blocks."
Example Generations
>>> print(generate_text("Hello in html with h1 tag"))
# Output: <h1>Hello</h1>
>>> print(generate_text("Hello in markdown with h1 tag"))
# Output: # Hello
>>> print(generate_text("Hello in json"))
# Output: {"message": "Hello"}
>>> print(generate_text("print Hello in SQL"))
# Output: SELECT 'Hello';
>>> print(generate_text("write code to print Hello in Python"))
# Output: print("Hello")
Instructions Aren’t Enough!
Even though LLMs are great at following instructions, they sometimes generate incorrect tokens. While natural languages can handle small mistakes, structured languages can’t. One wrong comma can break your JSON object. This is where Finite State Machines (FSM) and Grammars come into play.
For decades, we’ve used these concepts in computer science. Your IDE or compiler uses them to catch syntax errors. We can leverage the same principles to ensure the LLM-generated output adheres to the correct grammar.
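To make the failure mode concrete, here is a small illustrative example: a single stray comma breaks parsing entirely, while the same slip in prose would go unnoticed.
import json

good = '{"name": "Ada", "age": 36}'
bad = '{"name": "Ada",, "age": 36}'   # one stray comma, as a model might emit

print(json.loads(good))               # {'name': 'Ada', 'age': 36}
try:
    json.loads(bad)
except json.JSONDecodeError as err:
    print(f"Invalid JSON: {err}")     # parsing fails on the extra comma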
Here are the important notes about JSON mode from OpenAI:
- When using JSON mode, always instruct the model to produce JSON via some message in the conversation, for example via your system message. If you don't include an explicit instruction to generate JSON, the model may generate an unending stream of whitespace and the request may run continually until it reaches the token limit. To help ensure you don't forget, the API will throw an error if the string "JSON" does not appear somewhere in the context.
- The JSON in the message the model returns may be partial (i.e. cut off) if finish_reason is length, which indicates the generation exceeded max_tokens or the conversation exceeded the token limit. To guard against this, check finish_reason before parsing the response.
- JSON mode will not guarantee the output matches any specific schema, only that it is valid and parses without errors.
Do the first two points make sense now? If we don’t instruct the model, it will generate human-like text, which will often be rejected because it doesn’t conform to JSON grammar.
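For reference, here is a hedged sketch of turning JSON mode on with the openai client and checking finish_reason before parsing, as the notes recommend (client is the same OpenAI client as in the earlier sketch; the prompts are illustrative):
import json

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_format={"type": "json_object"},  # turn on JSON mode
    messages=[
        {"role": "system", "content": "Answer with a JSON object."},
        {"role": "user", "content": "Hello in json"},
    ],
)

choice = completion.choices[0]
if choice.finish_reason == "length":
    raise ValueError("Response was truncated; refusing to parse partial JSON")

data = json.loads(choice.message.content)  # valid JSON whenever it wasn't cut off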
Note: JSON is a widely used format, and all major LLM vendors and runtimes support it. While other formats are not supported out of the box, you can use packages like outlines or lm-format-enforcer to enforce a grammar.
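As a rough sketch of what grammar enforcement with such a package looks like, here is an example based on the outlines 0.x API (the model id is just an example, and names or signatures may differ between versions):
import outlines
from pydantic import BaseModel

class Greeting(BaseModel):
    message: str

# Load an open-weights model through the transformers backend
model = outlines.models.transformers("microsoft/Phi-3-mini-4k-instruct")

# Constrain generation so the output always matches the Greeting schema
generator = outlines.generate.json(model, Greeting)
greeting = generator("Say hello in JSON")
print(greeting)  # a validated Greeting instance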
Specifying Schema
Now, let’s talk about the third point in the OpenAI JSON Mode notes. In most cases, you want your JSON object to have a certain schema. LLM + Grammar can reliably generate a valid JSON string but can’t enforce the schema for you. The solution to this is twofold:
- Schema Definition: Make sure you provide the required schema definition to the LLM during prompting. The model will try its best to stick to the provided schema.
- Schema Validation: Next, you need a way to validate the JSON object against the schema.
Pydantic is the most popular package in Python for schema validation, and it can help you with both steps. Once you define your Pydantic model, call the model_json_schema() method to get the model's JSON schema. Here is the code:
import json
from pydantic import BaseModel
# Model definition
class Person(BaseModel):
    name: str
    age: int
    occupation: str
SYSTEM_PROMPT = """You will be given a name of a celebrity. Please return the response in the following JSON format:
{schema}
Remember, the response will be fed directly to the next downstream program. So no string delimiters wrapping it, no yapping, no markdown, no fenced code blocks.
"""
# Injecting schema definition
SYSTEM_PROMPT = SYSTEM_PROMPT.format(
    schema=json.dumps(Person.model_json_schema(), indent=2)
)
# Calling LLM with a celebrity name, per the system prompt
response = generate_text("Albert Einstein")  # any celebrity name works here
# Validating `response` string against Model Schema
response_json = json.loads(response)
person = Person(**response_json)
print(person)
As you can see, this approach, though effective, involves repetitive steps and is prone to human error. Hence, I personally like to use instructor when working with LLMs to generate JSON output. The instructor package streamlines this process: it patches the client packages of major LLM vendors to automatically handle schema injection and validation. It's a very light wrapper that does the following:
- Updates the system prompt to produce JSON output and also adds the JSON schema for your Pydantic model.
- Turns on JSON mode.
- Validates the JSON output to make sure it follows the specified JSON schema.
This drastically reduces boilerplate code and minimizes the risk of errors. You just have to patch your client object and pass response_model during inference. Here is the updated code after introducing instructor:
import instructor
from openai import OpenAI

# Patching the OpenAI client
openai_client = OpenAI()
instructor_client = instructor.from_openai(openai_client)

SYSTEM_PROMPT = """You will be given the name of a celebrity, return me the details."""
PROMPT = "Albert Einstein"  # any celebrity name works here

response = instructor_client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=Person,  # Extra argument to specify the JSON schema
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": PROMPT},
    ],
)
type(response) # Output: Person
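Because response is already a validated Person instance, you can use it like any other Pydantic object:
print(response.name, response.age, response.occupation)
print(response.model_dump_json())  # serialize back to a JSON string if needed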
If it helps, here is the code snippet for the system prompt and the code snippet for the injection logic, which show how instructor injects the JSON schema into the system prompt.
Summary
To summarize, here are the steps to ensure LLMs generate valid structured output:
- Explicit Instruction: Instruct your LLM to generate structured output.
- Grammar Enforcement: Use a grammar to constrain generation to valid output. JSON mode is widely supported and robust, but other formats may need tools like outlines or lm-format-enforcer.
- Schema Specification: Provide the schema during prompting and validate the output against it. Tools like Pydantic and instructor simplify this process.
Resources
- JSON mode in Ollama and vLLM (search for the guided-decoding-backend option).
- Have not tried it, but it looks interesting: Super-json-mode.
- Read Efficient Guided Generation for Large Language Models by the authors of the outlines package.
- Grammar support in Llama.cpp.
Now, go forth and generate structured outputs like a pro! Happy coding!