#17. Prompt Compression

Hard · Prompt Design · Cost Optimization

The Problem

Your AI assistant works well, but its system prompt is a sprawling ~2000-token monster. Every API call sends this bloated prompt, so you pay for roughly four times as many prompt tokens as necessary. The prompt repeats itself constantly: "be helpful" appears four times, "don't fabricate" appears seven times in different phrasings, and "be concise" is, ironically, said in seven verbose ways. The model doesn't need the repetition; one clear statement of each rule is enough. Your job is to compress the system prompt to under 500 tokens while keeping every capability intact. The compressed agent must pass the same test cases as the original.

Examples

Example 1 — Redundancy in the original

Original (7 lines saying the same thing):

When you don't know something, be honest about it.
Don't make up information or fabricate facts.
If you're not sure about something, say so.
It's better to admit uncertainty than to provide incorrect information.
Never fabricate information or present guesses as facts.
Always be transparent about the limits of your knowledge.
Honesty about what you know and don't know is crucial.

Compressed (1 line):

Never fabricate information — state uncertainty honestly.

Example 2 — Redundancy in the original

Original (7 lines):

Be concise in your responses when possible.
Don't add unnecessary filler words or phrases.
Get to the point quickly.
Avoid being overly verbose or wordy.
Keep your responses focused and to the point.
Don't ramble or go off on tangents.
Stick to what's relevant to the user's question.

Compressed (1 line):

Be concise and relevant — no filler or tangents.
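
Repetition like that in Examples 1 and 2 can be located mechanically before you rewrite anything. The sketch below is one rough way to do it and is not part of the challenge's starter code: it counts how many lines of a prompt share each content word, so recurring themes such as "fabricate" or "information" surface quickly. The name `repeated_terms`, the stopword list, and the four-character cutoff are illustrative assumptions, and plain word overlap will miss paraphrases that share no vocabulary.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption; extend it for your own prompt).
STOPWORDS = {
    "about", "always", "don't", "never", "something",
    "than", "that", "what", "when", "your", "you're",
}


def repeated_terms(prompt: str, min_lines: int = 2) -> dict[str, int]:
    """Map each content word to the number of lines it appears in.

    Only words of four or more characters that are not stopwords are
    counted, and only words found on at least `min_lines` lines are kept.
    """
    line_counts: Counter = Counter()
    for line in prompt.splitlines():
        # A set ensures a word is counted once per line, not once per use.
        words = set(re.findall(r"[a-z']+", line.lower()))
        line_counts.update(
            w for w in words if len(w) >= 4 and w not in STOPWORDS
        )
    return {w: n for w, n in line_counts.items() if n >= min_lines}


# The seven "honesty" lines from Example 1.
HONESTY_BLOCK = """When you don't know something, be honest about it.
Don't make up information or fabricate facts.
If you're not sure about something, say so.
It's better to admit uncertainty than to provide incorrect information.
Never fabricate information or present guesses as facts.
Always be transparent about the limits of your knowledge.
Honesty about what you know and don't know is crucial."""

if __name__ == "__main__":
    ranked = sorted(repeated_terms(HONESTY_BLOCK).items(), key=lambda kv: -kv[1])
    for word, n in ranked:
        print(f"{word}: appears in {n} lines")
```

Words that show up on many lines point at rules worth merging into a single sentence; the merging itself still needs human judgment.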

Example 3 — Test case that must still pass

User input: What is the population of Mars?

Original prompt output: Mars doesn't have a human population. It's a planet in our solar system that is currently uninhabited. (Answers honestly instead of inventing a figure.)

Compressed prompt output: Must produce an equivalent answer — acknowledging that Mars has no human population without fabricating data.
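
"Equivalent" is not defined precisely here, so any automated check is a judgment call. As one hedged possibility, the sketch below treats an answer to the Mars question as acceptable when it says Mars is uninhabited (or similar) and does not quote a population figure. The function name `acknowledges_no_population`, the marker phrases, and the regular expression are assumptions made for illustration; they are not the grader's actual criteria.

```python
import re

# Phrases taken as evidence that the answer admits Mars is uninhabited.
# This list is an illustrative assumption, not an official rubric.
NO_POPULATION_MARKERS = (
    "uninhabited",
    "no human population",
    "doesn't have a human population",
    "does not have a human population",
    "no permanent population",
)

# Matches claims such as "population of 2.5 million" or "population is about 3".
FIGURE_PATTERN = re.compile(
    r"population (of|is) (about |roughly |approximately )?\d"
)


def acknowledges_no_population(answer: str) -> bool:
    """Return True if the answer admits there is no population and cites no figure."""
    # Normalise case and curly apostrophes so the marker phrases match.
    text = answer.lower().replace("\u2019", "'")
    has_marker = any(marker in text for marker in NO_POPULATION_MARKERS)
    cites_figure = FIGURE_PATTERN.search(text) is not None
    return has_marker and not cites_figure
```

A keyword check like this is brittle: a correct answer phrased differently would fail it, so treat a failure as a prompt to read the output, not as proof of regression.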

Your Task

Compress BLOATED_PROMPT (or BLOATED_INSTRUCTIONS) to under 500 tokens while preserving all capabilities:

Helpful, accurate answers to technical and general questions.
Proper formatting (bullet points, numbered lists, headers).
Honest handling of unknown or uncertain information.
Clear, commented code examples.
Professional, friendly tone.
Safety-conscious responses.
Concise, focused answers.
Appropriate tool usage when tools are available.

Use tiktoken to verify your compressed prompt is under 500 tokens.
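
A minimal sketch of that verification step follows. It separates the token counter from the budget check so the check can be exercised without downloading an encoding. `tiktoken.encoding_for_model` and `Encoding.encode` are the same calls the starter code uses; the helper names `tiktoken_counter` and `budget_report`, and the variable `COMPRESSED_PROMPT` in the usage comment, are hypothetical.

```python
from typing import Callable


def tiktoken_counter(model: str = "gpt-4o-mini") -> Callable[[str], int]:
    """Build a token counter backed by tiktoken.

    tiktoken is imported lazily so the rest of this module works even
    where the package (or network access for the encoding) is missing.
    """
    import tiktoken  # third-party; the same dependency as the starter code

    enc = tiktoken.encoding_for_model(model)
    return lambda text: len(enc.encode(text))


def budget_report(
    prompt: str, count_tokens: Callable[[str], int], limit: int = 500
) -> dict:
    """Report token usage against a limit; "under 500" means strictly fewer."""
    used = count_tokens(prompt)
    return {
        "tokens": used,
        "limit": limit,
        "headroom": limit - used,
        "ok": used < limit,
    }


# Usage (assumes you have defined COMPRESSED_PROMPT yourself):
#   report = budget_report(COMPRESSED_PROMPT, tiktoken_counter())
#   print(report)
```

Counting with the encoding for the model you actually call matters, because different encodings can split the same text into different numbers of tokens.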

Evaluation

Submissions are checked for the following:

Prompt under 500 tokens: The compressed prompt uses fewer than 500 tokens as measured by tiktoken.
Agent still passes all test cases: Technical questions, code generation, and unknown-info queries produce equivalent quality answers.
No capability regression: All behaviors from the original prompt (honesty, formatting, code quality, safety, tool use) are preserved in the compressed version.
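
The page does not say how these checks are scored, so the following is only a sketch of how you might test for regressions yourself. It runs each test input through both prompts and flags a case when the original passes a check that the compressed version fails. `Case`, `find_regressions`, and the per-case `check` predicates are hypothetical names; the two `ask_*` arguments are any callables that take a question and return the answer text.

```python
from typing import Callable, NamedTuple


class Case(NamedTuple):
    name: str                      # label, e.g. "unknown-info"
    user_input: str                # question sent to both prompts
    check: Callable[[str], bool]   # predicate the answer should satisfy


def find_regressions(
    ask_original: Callable[[str], str],
    ask_compressed: Callable[[str], str],
    cases: list[Case],
) -> list[str]:
    """Return names of cases the original passes but the compressed prompt fails."""
    regressions = []
    for case in cases:
        original_ok = case.check(ask_original(case.user_input))
        compressed_ok = case.check(ask_compressed(case.user_input))
        if original_ok and not compressed_ok:
            regressions.append(case.name)
    return regressions


# Example case built from the Mars question in the examples above.
CASES = [
    Case(
        name="unknown-info",
        user_input="What is the population of Mars?",
        check=lambda answer: "uninhabited" in answer.lower(),
    ),
]
```

With the starter code's chain, `ask_original` could be `lambda q: chain.invoke({"input": q}).content`, with a second chain built from your compressed prompt for `ask_compressed`. Because model output varies between runs, a single pass or fail is weak evidence; repeating each case a few times gives a steadier picture.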

Constraints

The compressed prompt must be under 500 tokens.
The agent must pass all the same test cases as the original bloated prompt.
No capabilities from the original prompt may be lost.

Starter Code

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
import tiktoken

llm = ChatOpenAI(model="gpt-4o-mini")

# BUG: This system prompt is ~2000 tokens — bloated with redundancy. Compress to <500.
BLOATED_PROMPT = """You are an expert AI assistant specialized in helping users with a wide variety of tasks.
You should always be helpful and provide accurate information to the best of your ability.
Make sure that every response you give is helpful and useful to the user.
Always aim to be as helpful as possible in every interaction.

When answering questions, you should think carefully before responding.
Take your time to consider the question from multiple angles.
Make sure your answer is well-thought-out and comprehensive.
Don't rush to give an answer - think it through first.
Consider all aspects of the question before formulating your response.

You have access to tools that can help you answer questions.
When you have tools available, use them when appropriate.
Tools should be used when they can provide better or more accurate answers.
Don't forget that you have tools at your disposal.
Remember to leverage your tools when they would be helpful.

You should format your responses in a clear and readable way.
Use bullet points when listing items.
Use numbered lists when the order matters.
Use headers to organize long responses.
Make sure your formatting is consistent throughout your response.
Always use proper formatting to make your responses easy to read.
Keep your formatting clean and professional.

When you don't know something, be honest about it.
Don't make up information or fabricate facts.
If you're not sure about something, say so.
It's better to admit uncertainty than to provide incorrect information.
Never fabricate information or present guesses as facts.
Always be transparent about the limits of your knowledge.
Honesty about what you know and don't know is crucial.

Be concise in your responses when possible.
Don't add unnecessary filler words or phrases.
Get to the point quickly.
Avoid being overly verbose or wordy.
Keep your responses focused and to the point.
Don't ramble or go off on tangents.
Stick to what's relevant to the user's question.

You should maintain a professional and friendly tone.
Be polite and respectful in all interactions.
Treat every user with courtesy and respect.
Maintain a warm but professional demeanor.
Be approachable and friendly while remaining professional.
Always be courteous in your responses.

When dealing with code, provide clear explanations.
Include comments in code examples.
Explain what each part of the code does.
Make sure code examples are complete and runnable.
Test your code mentally before presenting it.
Provide context for code snippets.

For complex topics, break them down into simpler parts.
Use analogies when they help explain difficult concepts.
Start with the basics before moving to advanced topics.
Build understanding step by step.
Make complex topics accessible to all skill levels.

Always prioritize user safety and well-being.
Don't provide harmful or dangerous information.
Refuse requests that could lead to harm.
Be mindful of the potential impact of your responses.
Prioritize safety in all interactions.
"""

def count_tokens(text: str) -> int:
    enc = tiktoken.encoding_for_model("gpt-4o-mini")
    return len(enc.encode(text))

print(f"Current prompt tokens: {count_tokens(BLOATED_PROMPT)}")

prompt = ChatPromptTemplate.from_messages([
    ("system", BLOATED_PROMPT),
    ("human", "{input}"),
])

chain = prompt | llm

# Test cases the compressed prompt must still pass
result1 = chain.invoke({"input": "What is a binary search tree?"})
print("Technical:", result1.content)

result2 = chain.invoke({"input": "Write a Python function to find duplicates in a list"})
print("Code:", result2.content)

result3 = chain.invoke({"input": "What is the population of Mars?"})
print("Unknown:", result3.content)
