Agent Foundry

#84. Competitive Agent Selection

Hard · Multi-Agent · Evaluation

The Problem

You have a single agent that solves coding challenges. Its output quality varies from run to run, and there is no way to know if a particular solution is good without manual review. Your task is to implement a competitive selection pattern: three solver agents independently tackle the same problem, and a separate judge agent evaluates all three solutions and selects the best one with a clear rationale.

Examples

Example 1

User input: Find the longest palindromic substring in a given string.

Current (bad) output: A single solution with no comparison — it might be brute-force O(n³) when a better approach exists.

Expected (good) output: Three agents produce different solutions (e.g., brute force, dynamic programming, expand-around-center). The judge evaluates correctness, time complexity, and code clarity, then selects the best solution with an explanation like: "Solution B (expand-around-center) is preferred for its O(n²) time complexity, O(1) extra space, and readable implementation."
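
To make the comparison concrete, here is a minimal sketch of two of the approaches named above, written by hand rather than produced by any agent. The function names are illustrative and not part of the starter code. When several palindromes share the maximum length, both versions return the first one found.

```python
def longest_palindrome_brute_force(s: str) -> str:
    """Check every substring and keep the longest palindrome: O(n^3) time."""
    best = ""
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            candidate = s[i:j]
            # Only reverse-compare substrings that would beat the current best.
            if len(candidate) > len(best) and candidate == candidate[::-1]:
                best = candidate
    return best


def longest_palindrome_expand(s: str) -> str:
    """Expand around each possible center: O(n^2) time, O(1) extra space."""
    if not s:
        return ""
    start, end = 0, 0  # inclusive bounds of the best palindrome so far

    def expand(left, right):
        # Grow outward while the characters match, then step back in by one.
        while left >= 0 and right < len(s) and s[left] == s[right]:
            left -= 1
            right += 1
        return left + 1, right - 1

    for center in range(len(s)):
        # Try an odd-length center (one character) and an even-length center (a gap).
        for lo, hi in (expand(center, center), expand(center, center + 1)):
            if hi - lo > end - start:
                start, end = lo, hi
    return s[start:end + 1]
```

A judge comparing these two would be expected to notice that both are correct but that the second avoids the cubic cost of re-checking every substring.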

Example 2

User input: Implement a thread-safe LRU cache.

Current (bad) output: One solution that may or may not handle concurrency correctly.

Expected (good) output: Three competing implementations. The judge identifies which one correctly handles thread safety, has the best API design, and is most performant, then selects the winner.
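
For reference, one candidate a solver might plausibly submit for this problem is sketched below. It is an assumption about what a reasonable entry looks like, not an expected answer: a single lock guards an `OrderedDict`, which keeps the code simple at the cost of serializing all access. The class and method names are illustrative.

```python
import threading
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache guarded by a single lock."""

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._capacity = capacity
        self._items = OrderedDict()  # oldest entry first, newest last
        self._lock = threading.Lock()

    def get(self, key, default=None):
        with self._lock:
            if key not in self._items:
                return default
            # A read counts as a use, so move the key to the "newest" end.
            self._items.move_to_end(key)
            return self._items[key]

    def put(self, key, value) -> None:
        with self._lock:
            self._items[key] = value
            self._items.move_to_end(key)
            if len(self._items) > self._capacity:
                # Evict the least recently used entry (the oldest one).
                self._items.popitem(last=False)
```

A judge reviewing competing entries could check, for instance, whether `get` updates recency under the same lock as `put`, which is an easy detail to get wrong.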

Your Task

Refactor the starter code so that:

  • Three solver agents with different approaches independently solve the same problem.
  • A judge agent (separate from the solvers) evaluates all three solutions.
  • The judge selects the best solution and provides a rationale for its choice.
  • The final output is the winning solution with the judge's explanation.
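
The control flow these requirements describe can be sketched without any agent framework. The version below uses plain Python callables as stand-ins for the solver and judge agents. Every name in it (`Candidate`, `Verdict`, `run_competition`, the stub helpers) is hypothetical and is not part of CrewAI or the starter code. In a real submission each callable would wrap an LLM-backed agent.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Candidate:
    label: str      # e.g. "A", "B", "C"
    approach: str   # short description of the strategy used
    solution: str   # the solver's full answer


@dataclass(frozen=True)
class Verdict:
    winner: Candidate
    rationale: str


Solver = Callable[[str], Candidate]
Judge = Callable[[str, Sequence[Candidate]], Verdict]


def run_competition(problem: str, solvers: Sequence[Solver], judge: Judge) -> Verdict:
    """Run every solver on the same problem, then let the judge choose."""
    if len(solvers) < 3:
        raise ValueError("at least three solvers are required")
    # Each solver receives only the problem, never another solver's output.
    candidates = [solve(problem) for solve in solvers]
    verdict = judge(problem, candidates)
    if verdict.winner not in candidates:
        raise ValueError("judge must pick one of the submitted candidates")
    if not verdict.rationale.strip():
        raise ValueError("judge must explain its choice")
    return verdict


def make_stub_solver(label: str, approach: str) -> Solver:
    """Hypothetical stand-in for an LLM-backed solver agent."""
    def solve(problem: str) -> Candidate:
        return Candidate(label, approach, f"# {approach} solution to: {problem}")
    return solve


def stub_judge(problem: str, candidates: Sequence[Candidate]) -> Verdict:
    """Hypothetical stand-in for a judge; a real one would compare the solutions."""
    preferred = "expand-around-center"
    winner = next((c for c in candidates if c.approach == preferred), candidates[0])
    return Verdict(winner, f"Solution {winner.label} ({winner.approach}) matched the placeholder rule.")
```

Calling `run_competition` with three stub solvers and `stub_judge` returns a `Verdict` whose `winner.solution` and `rationale` together form the final output. The checks inside `run_competition` are one way to make sure the result reflects the judge's decision rather than a default pick.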

Evaluation

Submissions are checked for the following:

  • Multiple solver agents: At least three agents independently solve the same problem.
  • Judge agent evaluates solutions: A separate judge agent reviews all solutions and selects the best one.
  • Judge provides rationale: The judge explains why the selected solution is the best.
  • Best solution is selected: The final output is the solution the judge deemed best, not an arbitrary pick.
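
One way to keep the final output tied to the judge's decision is to ask the judge for a structured reply and validate it before printing anything. The sketch below assumes the judge has been prompted to answer with a JSON object containing `winner` and `rationale` keys. That reply format and the `select_winner` helper are assumptions for illustration, not something the problem or CrewAI prescribes.

```python
import json


def select_winner(judge_reply: str, solutions: dict[str, str]) -> tuple[str, str]:
    """Return (winning solution text, rationale) from a JSON judge reply.

    `solutions` maps a label such as "A" to that solver's full answer.
    """
    verdict = json.loads(judge_reply)
    if not isinstance(verdict, dict):
        raise ValueError("judge reply must be a JSON object")
    winner = verdict.get("winner")
    rationale = str(verdict.get("rationale", "")).strip()
    # Reject labels the judge invented or mistyped instead of guessing.
    if not isinstance(winner, str) or winner not in solutions:
        raise ValueError(f"unknown winner label: {winner!r}")
    if not rationale:
        raise ValueError("judge reply has no rationale")
    return solutions[winner], rationale
```

Raising an error on an unknown label or empty rationale surfaces a malformed verdict immediately, instead of silently falling back to the first solution.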

Constraints

  • At least three agents must independently solve the same problem.
  • A separate judge agent must evaluate all solutions and pick the best.
  • The judge must not be one of the competing agents.
  • The judge must provide a rationale for its selection.

Starter Code

from crewai import LLM, Agent, Crew, Process, Task

llm = LLM(model="gpt-4o-mini")

# BUG: Only one agent solves the problem — no competition or comparison
# TODO: Add two more solver agents and a judge to pick the best solution
solver = Agent(
    role="Problem Solver",
    goal="Solve coding challenges with clean, efficient solutions",
    backstory="You are an experienced software engineer.",
    llm=llm,
)

solve_task = Task(
    description="Solve the following coding problem in Python: {problem}",
    expected_output="A working Python function with explanation",
    agent=solver,
)

crew = Crew(
    agents=[solver],
    tasks=[solve_task],
    process=Process.sequential,
)

result = crew.kickoff(inputs={"problem": "Find the longest palindromic substring in a given string"})
print(result)