Agent Foundry

#63. Output Length Limiter

Easy · Guardrails · Cost Optimization

The Problem

Your assistant agent was built to give "thorough, detailed answers" — and it took that instruction to heart. Ask it a simple question and it returns a 2,000-word essay, burning through tokens and overwhelming users who just wanted a quick answer. The LLM is capable of being concise; the problem is that nothing constrains its output length. Your job is to add a length limit so the agent's responses stay within a reasonable word or character count while still being helpful.

Examples

Example 1

User input: Explain how neural networks work

Current (bad) output: A 2,000-word essay covering every aspect of neural networks from perceptrons to transformers, with detailed mathematical notation and historical context.

Expected (good) output: A concise 150–200 word summary that covers the key concepts (layers, weights, activation functions, backpropagation) without going overboard.

Example 2

User input: What is Python?

Current (bad) output: A 1,500-word treatise on Python's history, design philosophy, syntax, standard library, ecosystem, and comparison with other languages.

Expected (good) output: A brief 100–150 word answer explaining that Python is a high-level programming language known for readability and versatility.

Example 3

User input: How do I reverse a list in Python?

Current (bad) output: An 800-word explanation covering five different methods, with background on list data structures and Big-O analysis.

Expected (good) output: A quick answer with one or two methods (list.reverse(), list[::-1]) in under 100 words.

Your Task

Add output length constraints so the agent:

  • Keeps responses within a reasonable limit (e.g. 200 words or 800 characters).
  • Truncates gracefully at sentence boundaries if a hard limit is needed.
  • Still provides a useful, complete answer within the shorter format.
  • Reduces token usage compared to the unconstrained version.
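
The sentence-boundary bullet above could be handled with a small post-processing step. The following is a minimal sketch, not a reference solution: the function name `truncate_at_sentence`, the 200-word default, and the regex-based sentence splitter are all assumptions made for illustration. The splitter is a heuristic and will also break after abbreviations such as "e.g.".

```python
import re

# Heuristic sentence boundary: ., ! or ? followed by whitespace.
# Assumption: good enough for plain prose; it also splits after abbreviations.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def truncate_at_sentence(text: str, max_words: int = 200) -> str:
    """Return the longest run of whole leading sentences that fits in max_words."""
    text = text.strip()
    if len(text.split()) <= max_words:
        return text  # already within the cap, leave untouched

    kept = []
    used = 0
    for sentence in _SENTENCE_END.split(text):
        n = len(sentence.split())
        if used + n > max_words:
            break
        kept.append(sentence)
        used += n

    if kept:
        # Note: joining with a space collapses any paragraph breaks.
        return " ".join(kept)

    # Fallback: the first sentence alone exceeds the cap, so cut at the
    # word limit and mark the cut with an ellipsis.
    return " ".join(text.split()[:max_words]) + "..."
```

For example, `truncate_at_sentence("One two three. Four five six seven. Eight nine.", 5)` keeps only the first sentence, because adding the second would exceed five words. Truncation like this is best treated as a safety net; the prompt should do most of the work so that the shortened answer is still complete.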

Evaluation

Submissions are checked for the following:

  • Output is within length limit: The response stays within the configured word or character cap.
  • No mid-sentence cutoff: Truncation, if needed, ends cleanly at a sentence boundary.
  • Response is still useful: The shorter answer still meaningfully addresses the user's question.
  • Token usage reduced: The output uses significantly fewer tokens than the unconstrained version.
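
Three of these four checks can be approximated locally before submitting. The sketch below is an assumption about how one might self-test, not the grader's actual logic: `check_output` is a hypothetical helper, word count stands in for token count, and the 0.5 ratio is an arbitrary stand-in for "significantly fewer". Usefulness is not checked here because it needs human or model judgment.

```python
def count_words(text: str) -> int:
    """Whitespace-delimited word count, used as a rough proxy for tokens."""
    return len(text.split())


def check_output(
    constrained: str,
    unconstrained: str,
    max_words: int = 200,    # assumed cap, taken from the task's example
    max_ratio: float = 0.5,  # assumed threshold for "significantly fewer"
) -> dict:
    """Approximate the length, clean-ending, and usage-reduction checks."""
    words = count_words(constrained)
    return {
        "within_limit": words <= max_words,
        # Heuristic: a clean ending is terminal punctuation. Answers that end
        # in a code snippet or a closing quote would need a looser rule.
        "ends_cleanly": constrained.rstrip().endswith((".", "!", "?")),
        "usage_reduced": words <= max_ratio * count_words(unconstrained),
    }
```

Calling `check_output(short_answer, long_answer)` returns a dictionary of booleans, so a failing check is easy to spot by name.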

Constraints

  • You must cap the agent's output to a reasonable length (e.g. 200 words or 800 characters).
  • The truncation must not cut mid-sentence if possible.
  • The agent should still provide a complete, useful answer within the limit.

Starter Code

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o-mini")

# BUG: No length constraints — the agent writes 2000-word essays
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Provide thorough, detailed answers."),
    ("human", "{input}"),
])

chain = prompt | llm

result = chain.invoke({"input": "Explain how neural networks work"})
print(result.content)
print(f"\n--- Output length: {len(result.content.split())} words ---")
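
One possible direction for the fix, shown as a hedged sketch rather than the expected solution, is to state the limit in the system prompt and add `max_tokens` as a hard backstop. The helper names (`build_system_prompt`, `token_budget`, `build_chain`), the prompt wording, and the 1.5 tokens-per-word ratio are assumptions. Because `max_tokens` cuts generation wherever the budget runs out, it is set above the word limit here so that the prompt does the shaping and the cap only catches runaway output.

```python
MAX_WORDS = 200  # example cap from the task description


def build_system_prompt(max_words: int) -> str:
    """System prompt that states the length limit explicitly."""
    return (
        "You are a helpful assistant. Answer concisely and completely "
        f"in at most {max_words} words. Lead with the direct answer and "
        "skip background the user did not ask for."
    )


def token_budget(max_words: int, tokens_per_word: float = 1.5) -> int:
    """Rough hard cap in tokens. The tokens-per-word ratio is an assumption."""
    return int(max_words * tokens_per_word)


def build_chain(max_words: int = MAX_WORDS):
    """Build the prompt | llm chain with both a soft and a hard length limit."""
    # Imported here so the helpers above work without LangChain installed.
    from langchain_core.prompts import ChatPromptTemplate
    from langchain_openai import ChatOpenAI

    llm = ChatOpenAI(model="gpt-4o-mini", max_tokens=token_budget(max_words))
    prompt = ChatPromptTemplate.from_messages([
        ("system", build_system_prompt(max_words)),
        ("human", "{input}"),
    ])
    return prompt | llm


# Usage (needs OPENAI_API_KEY set and the langchain-openai package installed):
# result = build_chain().invoke({"input": "Explain how neural networks work"})
# print(result.content)
# print(f"\n--- Output length: {len(result.content.split())} words ---")
```

If the model still overshoots, a sentence-boundary truncation step applied to `result.content` can enforce the cap without leaving a half-finished sentence.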