Can I Create an AI Agent to Create Crossword Puzzles? Part 2: What Is an Agent, Anyway?


In Part 1, we saw that with a bit of prompt engineering, an LLM could put together a decent puzzle.

But that was just one puzzle, and a small one at that. It still needed manual QA. We had to check the clues and solutions, make sure the grid worked, and confirm that the words and clues were fresh and not repetitive. If we’re talking about generating a meaningful number of high-quality puzzles, all of that becomes a real bottleneck. At that point, you might as well just make the puzzle by hand.

So in this article, we’re going to explore what our “agent architecture” might look like and clarify what exactly we’re trying to build.

We’ve still got a human-in-the-loop problem. The question is: can we get rid of that?

What Are Our Goals?

Before we go building, let’s think about the workflow of creating a crossword puzzle by hand. That workflow is essentially what we want our system to control. A typical puzzle creation process could look like this:

  • Grid design: Choose a grid template with symmetry and a reasonable balance of open and blocked squares.
  • Theming: If it’s a themed puzzle, identify anchor words or phrases around which the puzzle will revolve.
  • Filling: Populate the rest of the grid with intersecting words while respecting constraints like word length and symmetry.
  • Clue writing: Draft clues that balance difficulty, originality, and fairness for the solver.
  • Quality checks: Eliminate duplicate clues, niche references, or nonsensical definitions. Ensure diversity across puzzles and alignment with the target audience.
  • Playtesting: Have someone (or something) solve the puzzle to verify difficulty, solvability, and overall flow.

Professional crossword editors follow similar steps to ensure quality and originality. The New York Times Crossword submission guidelines outline some of these standards.

What we want to do is perform each of these steps automatically. That means producing a continuous stream of high-quality, solvable, non-repetitive puzzles without human intervention.
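The steps above can be sketched as an ordered pipeline. Everything here (the step names, the dict fields) is a hypothetical placeholder for illustration, not a real API:

```python
# Hypothetical sketch: the manual crossword workflow as an ordered pipeline.
# Each step is a stub; in a real system, some steps would be LLM calls and
# others deterministic checks.

def design_grid(puzzle):
    puzzle["grid"] = "15x15, rotationally symmetric"
    return puzzle

def pick_theme(puzzle):
    puzzle["theme"] = ["anchor entry 1", "anchor entry 2"]
    return puzzle

def fill_grid(puzzle):
    puzzle["fill"] = "crossing words, respecting length and symmetry"
    return puzzle

def write_clues(puzzle):
    puzzle["clues"] = "balanced for difficulty, originality, fairness"
    return puzzle

def quality_check(puzzle):
    puzzle["qa_passed"] = True  # today, a human does this part
    return puzzle

def playtest(puzzle):
    puzzle["solvable"] = True  # ...and this part
    return puzzle

PIPELINE = [design_grid, pick_theme, fill_grid, write_clues, quality_check, playtest]

def make_puzzle():
    puzzle = {}
    for step in PIPELINE:
        puzzle = step(puzzle)
    return puzzle
```

The last two steps are exactly where the human-in-the-loop bottleneck sits, and exactly what we want the system to take over.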

Agentic Systems

While there isn't a single definition of an "AI agent," Anthropic speaks instead of "agentic systems," and draws a notable architectural distinction within them: "workflows" orchestrate LLMs and tools through predefined code paths, while "agents" dynamically direct their own processes and tool usage.

Agentic systems build on the basic capabilities of foundational LLMs by adding memory, reasoning, tool use, and other augmentations. Where a bare LLM can only generate text, an agentic system can remember past interactions, evaluate its own outputs, and take actions through external systems like databases, APIs, or MCP servers. How does it do this? Frameworks like LangChain provide ready-to-use scaffolding for these features. Or you could build that stuff yourself, if you're into that kind of thing.

And you might be thinking, “Didn’t we generate a puzzle successfully in the last article?” Well, yes, but not really. Last time, we just prompted an LLM to generate a single puzzle. That’s not the same as working with an agentic system. The one-shot approach has clear limitations: it relies on carefully crafted prompts at every step, it cannot reliably assess puzzle quality, and it has no memory with which to track the clues we’ve already used or compare a new puzzle against past ones.
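That clue-tracking gap is worth making concrete. Here is a minimal sketch of the kind of memory a one-shot prompt lacks; the `ClueMemory` class and its methods are hypothetical names, not a real library:

```python
# Minimal sketch of the "memory" augmentation a one-shot prompt lacks:
# a store of previously used answer/clue pairs that the system can
# consult before accepting a new puzzle.

class ClueMemory:
    def __init__(self):
        self._seen = {}  # answer -> set of clue texts already used for it

    def is_fresh(self, answer, clue):
        """True if this clue hasn't already been used for this answer."""
        used = self._seen.get(answer.upper(), set())
        return clue not in used

    def remember(self, answer, clue):
        """Record an answer/clue pair so future puzzles avoid repeating it."""
        self._seen.setdefault(answer.upper(), set()).add(clue)
```

In a fuller system this store would be backed by a database and consulted during quality checks, but even this toy version does something a single prompt fundamentally cannot.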

Since we already have a well-defined publishing process, it seems that our puzzle-generation system is actually a better fit for a workflow model.

Evaluator-Optimizer Workflow

The evaluator-optimizer loop is a workflow design pattern where one model proposes a solution and another critiques it until defined acceptance criteria are met.

In our case, the optimizer will generate a grid and a set of clues based on requirements like theme and puzzle size. The evaluator checks for symmetry, solvability, fairness, and freshness, then returns feedback to the optimizer. The cycle continues until the evaluator accepts the puzzle.
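As a rough sketch of that control flow, with stub functions standing in for the actual LLM calls and every name hypothetical:

```python
# Sketch of the evaluator-optimizer loop. generate_puzzle and evaluate_puzzle
# stand in for LLM calls; they are stubbed here so the control flow runs.

def generate_puzzle(requirements, feedback):
    # Real version: an LLM call that incorporates the evaluator's feedback.
    # Stub: just record which revision notes have been addressed.
    return {"theme": requirements["theme"], "revisions": list(feedback)}

def evaluate_puzzle(puzzle):
    # Real version: an LLM (plus deterministic checks) scoring symmetry,
    # solvability, fairness, freshness. Stub: accept after two rounds of notes.
    if len(puzzle["revisions"]) < 2:
        return False, "tighten the fill"
    return True, None

def evaluator_optimizer_loop(requirements, max_rounds=5):
    feedback = []
    for _ in range(max_rounds):
        puzzle = generate_puzzle(requirements, feedback)
        accepted, note = evaluate_puzzle(puzzle)
        if accepted:
            return puzzle
        feedback.append(note)
    raise RuntimeError("no acceptable puzzle within max_rounds")
```

Note the `max_rounds` cap: without it, a disagreeable evaluator and a stubborn optimizer could loop forever, which is the workflow equivalent of an editor and constructor emailing drafts back and forth indefinitely.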

This mirrors the constructor–editor dynamic found in professional puzzle editing, where iteration is what separates a publishable crossword from a rough draft. It is a simplified abstraction since real editing often involves multiple reviewers, but it gives us a starting architecture we can expand later with additional or specialized agents.

Conclusion

Well, we still haven't solved anything, but we have a clearer picture of what we are up against. Our use case appears better suited to an agentic workflow than to a fully autonomous agent, because our task is structured and well-defined.

In Part 3, we will move from theory to practice by sketching the workflow architecture and beginning implementation.


The team at /dev/null digest is dedicated to offering lighthearted commentary and insights into the world of software development. Have opinions to share? Want to write your own articles? We’re always accepting new submissions, so feel free to contact us.
