Map-Reduce over Documents - Problems

The Problem

Your document-processing pipeline summarizes 20 reports one at a time, then combines the summaries into an executive report. At roughly 1 second per LLM call, the sequential approach takes over 20 seconds, which is far too slow for a product that must process hundreds of documents. The map step (summarizing each document) is embarrassingly parallel because the documents are independent of one another. Your job is to implement a map-reduce pattern: summarize the documents in parallel (map), then run a single reduce step that synthesizes all summaries into one coherent report.
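The latency figures above follow from simple arithmetic. As a rough sketch, the model below assumes every call takes the same time and ignores network and scheduling overhead, so real runs will come in somewhat above these numbers. The function name `estimate_latency` and its `concurrency` parameter are illustrative and not part of the starter code.

```python
import math

def estimate_latency(n_docs: int, call_seconds: float = 1.0,
                     concurrency: int = 1, reduce_seconds: float = 1.0) -> float:
    """Idealized wall-clock time for map-reduce summarization.

    Assumes uniform call latency and zero overhead (an assumption of this sketch).
    """
    # Map calls run in waves of `concurrency`; each wave costs one call's latency.
    waves = math.ceil(n_docs / concurrency)
    # The reduce step is a single call that runs after the last wave finishes.
    return waves * call_seconds + reduce_seconds

sequential = estimate_latency(20, concurrency=1)   # 20 waves + 1 reduce call
parallel = estimate_latency(20, concurrency=20)    # 1 wave + 1 reduce call
print(sequential, parallel)  # prints: 21.0 2.0
```

With a concurrency cap below the document count, the map phase takes several waves; for example, 20 documents at a cap of 8 need three waves under this model.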

Examples

Example 1

Input: 20 quarterly reports covering topics A through T

Current (bad) output: The final report is correct, but processing takes ~22 seconds because each document is summarized sequentially before the reduce step runs.

Expected (good) output: The map phase processes all 20 documents in parallel (~2s), then the reduce step synthesizes a report (~1s), for a total of ~3 seconds. The report identifies cross-cutting themes: "Across 20 reports, recurring themes include Q1 budget overruns, Q3 product launches, and organizational restructuring in Q4."

Example 2

Input: 5 research papers on different subtopics

Current (bad) output: The pipeline takes ~7 seconds to summarize the 5 papers sequentially and then combine the summaries.

Expected (good) output: The map phase completes in ~1.5s and the reduce step in ~1s, for a total of ~2.5 seconds. The report synthesizes across papers: "The five papers converge on two themes: improved efficiency through automation and the need for human oversight in critical decision-making."

Your Task

Refactor the starter code so that:

The map phase summarizes each document individually and in parallel.
The reduce phase waits for all summaries, then synthesizes them into a cohesive executive report.
The report identifies themes and patterns across documents rather than just listing summaries.
The pipeline handles at least 20 documents efficiently.
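The requirements above can be sketched with `asyncio`. The starter code is not shown here, so this is only one possible shape under stated assumptions: `call_llm` is a hypothetical stand-in that sleeps instead of calling a real model, and `summarize`, `map_reduce`, and `max_concurrency` are illustrative names. The concurrency cap is an added precaution for rate-limited APIs, not something the task requires.

```python
import asyncio

CALL_SECONDS = 0.05  # shortened stand-in for the ~1 s LLM latency

async def call_llm(prompt: str) -> str:
    """Hypothetical async LLM call; replace with your real client."""
    await asyncio.sleep(CALL_SECONDS)  # simulate network + inference time
    return f"<response to {len(prompt)}-char prompt>"

async def summarize(doc: str, limit: asyncio.Semaphore) -> str:
    # The semaphore caps in-flight calls so large batches respect rate limits.
    async with limit:
        return await call_llm(f"Summarize this document:\n\n{doc}")

async def map_reduce(docs: list[str], max_concurrency: int = 20) -> str:
    limit = asyncio.Semaphore(max_concurrency)
    # Map: start one summary coroutine per document and wait for all of them.
    # gather() returns results in the same order as the inputs.
    summaries = await asyncio.gather(*(summarize(d, limit) for d in docs))
    # Reduce: a single call that sees every summary and is asked for
    # cross-document themes rather than a restatement of each summary.
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(summaries, 1))
    reduce_prompt = (
        "Write an executive report that identifies themes and patterns "
        "across the following document summaries. Do not list them "
        "one by one.\n\n" + numbered
    )
    return await call_llm(reduce_prompt)

if __name__ == "__main__":
    docs = [f"Quarterly report {c}" for c in "ABCDEFGHIJKLMNOPQRST"]
    print(asyncio.run(map_reduce(docs)))
```

Because the reduce call is awaited only after `gather()` returns, it cannot start until every summary exists. If the real client is synchronous, a thread pool (for example `concurrent.futures.ThreadPoolExecutor`) is a reasonable alternative to `asyncio` for the map phase.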

Evaluation

Submissions are checked for the following:

Parallel map phase: Documents are summarized in parallel, not sequentially.
Individual summaries: Each document receives its own summary before aggregation.
Coherent reduce output: The final report synthesizes themes across summaries rather than concatenating them.
Handles 20 documents: The pipeline successfully processes at least 20 documents.
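One way to check the first two criteria locally is to swap the LLM call for an instrumented stub that counts calls and tracks how many are in flight at once. The sketch below is an assumption about how such a check could look, not the grader's actual method. `ConcurrencyProbe`, `run_parallel`, and `run_sequential` are hypothetical names.

```python
import asyncio

class ConcurrencyProbe:
    """Stub LLM that records the call count and peak in-flight calls."""

    def __init__(self) -> None:
        self.calls = 0
        self.in_flight = 0
        self.peak = 0

    async def call(self, doc: str) -> str:
        self.calls += 1
        self.in_flight += 1
        self.peak = max(self.peak, self.in_flight)
        try:
            await asyncio.sleep(0.01)  # simulated latency
            return f"summary of {doc}"
        finally:
            self.in_flight -= 1

async def run_parallel(probe: ConcurrencyProbe, docs: list[str]) -> list[str]:
    # All calls are started before any of them finishes.
    return await asyncio.gather(*(probe.call(d) for d in docs))

async def run_sequential(probe: ConcurrencyProbe, docs: list[str]) -> list[str]:
    # Each call finishes before the next one starts.
    return [await probe.call(d) for d in docs]

docs = [f"doc-{i}" for i in range(20)]
par, seq = ConcurrencyProbe(), ConcurrencyProbe()
par_out = asyncio.run(run_parallel(par, docs))
seq_out = asyncio.run(run_sequential(seq, docs))
print(par.calls, par.peak, seq.calls, seq.peak)  # prints: 20 20 20 1
```

A peak of 1 indicates a sequential map phase, and a call count equal to the number of documents indicates that each document received its own summary.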

#93. Map-Reduce over Documents