
I Built a Pokémon Game. Here’s What I Learned About LangChain and LangGraph.

I wanted to learn LangChain and LangGraph properly — not through dry tutorials, but by building something fun. So I built a text-based Pokémon RPG where an LLM narrates your adventure, generates wild encounters, and drives the story, while Python handles the actual game mechanics.

The full source code is a single main.py file. In this post, I’ll walk through the key concepts and point to exactly where they show up in the code.

📦 Full source on GitHub

I also have a YouTube video about this


The Big Idea: LLM for Creativity, Code for Logic

The most important design decision was the split of responsibilities. The LLM handles things it’s good at — narration, personality, generating Pokémon names and descriptions. Python handles things that need to be deterministic — damage formulas, catch rates, HP tracking. LangGraph ties them together into a state machine that is the game loop.
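To make the "code for logic" half concrete, here is a minimal sketch of the kind of deterministic mechanics Python can own. The formulas, the function names (calc_damage, catch_chance, try_catch) and the constants are illustrative assumptions, not the ones in main.py:

```python
import random

def calc_damage(level: int, attack: int, defense: int, power: int = 40) -> int:
    """Simplified damage formula (hypothetical, not the game's actual one)."""
    base = ((2 * level / 5 + 2) * power * attack / defense) / 50 + 2
    return max(1, int(base))  # always deal at least 1 damage

def catch_chance(hp: int, max_hp: int, base_rate: float = 0.3) -> float:
    """Chance to catch rises as the wild Pokémon loses HP; capped at 1.0."""
    missing = 1 - hp / max_hp
    return min(1.0, base_rate + 0.6 * missing)

def try_catch(hp: int, max_hp: int, rng: random.Random, base_rate: float = 0.3) -> bool:
    """Roll against the catch chance; passing in the RNG keeps tests repeatable."""
    return rng.random() < catch_chance(hp, max_hp, base_rate)

print(calc_damage(level=10, attack=30, defense=20))
print(catch_chance(hp=10, max_hp=50))
```

The LLM never sees these numbers being computed. It only narrates the outcome, so the same inputs always produce the same damage.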


1. Connecting to the LLM

LangChain abstracts LLM providers behind a unified interface. Whether you use OpenAI, Anthropic, or a self-hosted Ollama server, the API is the same. I’m running Qwen 3.5 on a remote Ollama instance:

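from langchain_ollama import ChatOllama  # assumes the langchain-ollama package

llm = ChatOllama(
    model="qwen3.5:35b-a3b",
    base_url="http://127.0.0.1:11434",
    num_predict=4096,  # ChatOllama's name for the max-token limit
    temperature=0.7,
)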

This single object gets reused everywhere: for narration, Pokémon generation, and Professor Oak’s dialogue. Swap the model or URL here, and the entire game runs on a different LLM with no other code changes.


2. Prompt Templates: Giving the LLM a Role

Raw strings work, but templates are reusable. The narrator chain uses a SystemMessage to set the persona, a MessagesPlaceholder for conversation history, and variables for dynamic context:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

narrator = (
    ChatPromptTemplate.from_messages([
        ("system", """You are the narrator of a Pokémon text adventure.
Player: {player_name} | Location: {location} | Badges: {badge_count}
Team: {team_str} ..."""),
        MessagesPlaceholder("history"),
        ("human", "{input}"),
    ])
    | llm
)

The | pipe is LCEL (LangChain Expression Language) — it composes the template and the LLM into a single callable chain. One .invoke() fills the template, sends it to the model, and returns the response.
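If the pipe looks like magic, a toy version may help. This is a dependency-free sketch of how a | operator can compose steps. The Step class and the fake_llm stand-in are hypothetical and are not how LangChain implements its runnables:

```python
class Step:
    """Toy composable step; a stand-in for the idea, not LangChain's Runnable."""

    def __init__(self, fn):
        self.fn = fn

    def __or__(self, other: "Step") -> "Step":
        # a | b builds a new step that runs a, then feeds its output into b
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.fn(value)

# Hypothetical stages: one fills a template, one pretends to be the model
fill_template = Step(lambda v: f"Location: {v['location']}\nPlayer says: {v['input']}")
fake_llm = Step(lambda prompt: f"[narration for a {len(prompt)}-character prompt]")

chain = fill_template | fake_llm
print(chain.invoke({"location": "Viridian Forest", "input": "look around"}))
```

The point is only that chaining with | returns another object with the same .invoke() method, which is why a whole chain can be passed around like a single callable.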


3. Structured Output: Pokémon as Data, Not Prose

This was the moment it clicked for me. Instead of parsing free text with regex, you define a Pydantic model and LangChain forces the LLM to return valid, typed data:

from pydantic import BaseModel, Field

class WildPokemonSchema(BaseModel):
    name: str
    type: str
    level: int = Field(ge=2, le=50)
    hp: int = Field(ge=20, le=120)
    attack: int = Field(ge=10, le=60)
    defense: int = Field(ge=10, le=50)

encounter_generator = llm.with_structured_output(WildPokemonSchema)

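Now, when I call encounter_generator.invoke("Generate a wild Pokémon for Viridian Forest"), I get back an actual WildPokemonSchema object whose fields and value ranges have been validated by Pydantic, not a blob of text I have to hope is parseable. If the model returns something out of range, validation raises an error instead of passing bad data into the game.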
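To see what that validation buys you, here is a dependency-free approximation of the check, using only the standard library. The ranges are copied from the schema above. The WildPokemon dataclass and parse_wild_pokemon helper are hypothetical stand-ins, and Pydantic's real validation does considerably more:

```python
import json
from dataclasses import dataclass

# Same bounds as the Field(ge=..., le=...) constraints in the schema
RANGES = {"level": (2, 50), "hp": (20, 120), "attack": (10, 60), "defense": (10, 50)}

@dataclass
class WildPokemon:
    name: str
    type: str
    level: int
    hp: int
    attack: int
    defense: int

def parse_wild_pokemon(raw_json: str) -> WildPokemon:
    """Turn model output into a typed object, rejecting out-of-range stats."""
    data = json.loads(raw_json)
    mon = WildPokemon(**data)  # TypeError if a field is missing or unexpected
    for field, (low, high) in RANGES.items():
        value = getattr(mon, field)
        if not isinstance(value, int) or not low <= value <= high:
            raise ValueError(f"{field}={value!r} is outside {low}..{high}")
    return mon

good = '{"name": "Pidgey", "type": "Flying", "level": 5, "hp": 30, "attack": 12, "defense": 11}'
print(parse_wild_pokemon(good))
```

A response with "hp": 500 would raise here rather than quietly producing an unbeatable Pidgey.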


4. LangGraph: The Game Is a State Machine

This is where things get interesting. A Pokémon game isn’t a linear prompt → response flow. It’s a loop with branches: explore → maybe encounter → fight or catch or run → check outcome → loop back. That’s a state machine, and that’s exactly what LangGraph gives you.

First, you define the state — everything the game needs to track:

from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages

class GameState(TypedDict):
    messages: Annotated[list, add_messages]
    player_name: str
    location: str
    pokemon_team: list[dict]
    wild_pokemon: dict | None
    badge_count: int
    game_phase: str
    turn_count: int

The Annotated[list, add_messages] part is a reducer — it tells LangGraph to append new messages to the list instead of replacing it. This is how conversation history accumulates automatically.

Then you write nodes — plain functions that receive the state and return partial updates:

def explore_node(state: GameState) -> dict:
    # ... call the narrator LLM, return new messages
    return {"messages": [...], "game_phase": "exploration"}

def battle_node(state: GameState) -> dict:
    # ... handle fight/catch/run logic
    return {"messages": [...], "wild_pokemon": updated, "game_phase": "battle"}

You only return the keys that changed. LangGraph handles merging.
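Here is a rough sketch of what that merge amounts to, in plain Python. The apply_update helper and REDUCERS table are hypothetical. They mimic the idea of a per-key reducer, while the real add_messages does more (for example, it also handles message IDs):

```python
def append_messages(old: list, new: list) -> list:
    """Reducer: combine instead of overwrite."""
    return old + new

# Keys listed here are merged with a reducer; every other key is replaced
REDUCERS = {"messages": append_messages}

def apply_update(state: dict, update: dict) -> dict:
    """Merge a node's partial update into the full state."""
    merged = dict(state)  # untouched keys carry over as-is
    for key, value in update.items():
        reducer = REDUCERS.get(key)
        merged[key] = reducer(state.get(key, []), value) if reducer else value
    return merged

state = {"messages": ["Oak: Welcome!"], "location": "Pallet Town", "turn_count": 0}
state = apply_update(state, {"messages": ["You head north."], "location": "Route 1"})
print(state)
```

After the update, messages holds both entries, location is overwritten, and turn_count is unchanged because the node never mentioned it.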


5. Conditional Edges: Branching Paths

The real power of the graph is dynamic routing. After exploring, should the player encounter a wild Pokémon or keep walking? After a battle turn, did they win, lose, or is the fight still going?

def route_after_battle(state: GameState) -> str:
    phase = state.get("game_phase", "")
    if phase == "exploration":
        return "explore"    # won the fight
    if phase == "game_over":
        return "game_over"  # your Pokémon fainted
    return "battle"         # fight continues

graph.add_conditional_edges("battle", route_after_battle,
    {"explore": "explore", "game_over": "game_over", "battle": "battle"})

The routing function reads the state and returns a string key. The mapping dict sends the graph to the right node. No if/else spaghetti — the graph structure is the game logic.
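To show the routing idea in isolation, here is a tiny hand-rolled loop that follows the same pattern without LangGraph. The battle_turn node, its fixed 10 damage per turn, and the run_battle driver are made up for illustration:

```python
def route_after_battle(state: dict) -> str:
    phase = state.get("game_phase", "")
    if phase == "exploration":
        return "explore"
    if phase == "game_over":
        return "game_over"
    return "battle"

# Same shape as the mapping passed to add_conditional_edges
ROUTES = {"explore": "explore", "game_over": "game_over", "battle": "battle"}

def battle_turn(state: dict) -> dict:
    """Hypothetical node: every turn deals 10 damage to the wild Pokémon."""
    hp = state["wild_hp"] - 10
    phase = "exploration" if hp <= 0 else "battle"
    return {**state, "wild_hp": hp, "game_phase": phase}

def run_battle(state: dict) -> list[str]:
    """Keep running the battle node until the router sends us elsewhere."""
    visited = []
    node = "battle"
    while node == "battle":
        state = battle_turn(state)
        visited.append(node)
        node = ROUTES[route_after_battle(state)]
    visited.append(node)
    return visited

print(run_battle({"wild_hp": 25, "game_phase": "battle"}))
```

The node only updates state. Deciding where to go next lives entirely in the router and the mapping, which is the separation the graph gives you.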


6. interrupt(): Waiting for the Player

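The most game-changing feature (pun intended). interrupt() pauses the graph and surfaces a prompt to the player. When they respond, the graph picks up at the node that paused: LangGraph re-runs that node from the top, and this time interrupt() returns the player’s input instead of pausing: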

# Inside battle_node:
action = interrupt(
f"⚔️ BATTLE — Turn {state.get('turn_count', 0) + 1}\n"
f" {p['name']}: {p['hp']}/{p['max_hp']} HP\n"
f" Wild {w['name']}: {w['hp']}/{w['max_hp']} HP\n"
f" Your moves: [{moves_str}]\n"
f" Or: [catch] / [run]"
)
# 'action' now contains whatever the player typed

For this to work, you need a checkpointer — it saves the graph’s state between pauses:

from langgraph.checkpoint.memory import MemorySaver

checkpointer = MemorySaver()
game = graph.compile(checkpointer=checkpointer)

# Each session gets a thread_id (like a save file)
config = {"configurable": {"thread_id": f"game-{name}"}}

The game loop then checks for interrupts and resumes with the player’s input:

from langgraph.types import Command

snapshot = game.get_state(config)
if snapshot.tasks and snapshot.tasks[0].interrupts:
    prompt = snapshot.tasks[0].interrupts[0].value
    print(prompt)  # show the battle menu before asking for input
    player_input = input("> ")
    result = game.invoke(Command(resume=player_input), config)
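For intuition, here is a toy model of the pause/resume cycle in plain Python. As far as I understand LangGraph's documented behavior, resuming re-runs the interrupted node from its first line and interrupt() then returns the resume value, so this sketch mirrors that. The Paused exception, run_node helper and simplified battle_node are all hypothetical:

```python
class Paused(Exception):
    """Raised by the toy interrupt() to hand a prompt back to the caller."""

    def __init__(self, prompt: str):
        super().__init__(prompt)
        self.prompt = prompt

def run_node(node, state: dict, resume=None) -> dict:
    """Run a node once; `resume` is the player's answer, if we have one."""
    def interrupt(prompt: str):
        if resume is None:
            raise Paused(prompt)  # first pass: stop and surface the prompt
        return resume             # second pass: hand back the answer
    return node(state, interrupt)

def battle_node(state: dict, interrupt) -> dict:
    action = interrupt(f"Turn {state['turn_count'] + 1}: [tackle] / [catch] / [run]")
    return {"last_action": action, "turn_count": state["turn_count"] + 1}

saved_state = {"turn_count": 0}  # plays the role of the checkpointer

try:
    run_node(battle_node, saved_state)
except Paused as pause:
    prompt = pause.prompt        # what the player would see

update = run_node(battle_node, saved_state, resume="tackle")
print(prompt)
print(update)
```

One practical consequence of the re-run: anything a node does before calling interrupt() happens twice, so it is safest to keep that part free of side effects.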

The Final Graph

Here’s the complete game flow:

        ┌──────────┐
        │  START    │
        └────┬─────┘
             │
        ┌────▼─────┐
        │  intro    │  ← Professor Oak
        └────┬─────┘
             │
        ┌────▼─────┐ ◄──────────────────────────┐
        │ explore   │  ← waits for player input   │
        └────┬─────┘                              │
             │                                    │
      ┌──────┴──────┐                             │
      ▼             ▼                             │
 ┌────────┐  ┌──────────────┐                     │
 │  heal  │  │encounter_chk │                     │
 └───┬────┘  └──────┬───────┘                     │
     │          ┌───┴────┐                        │
     │        none    encounter                   │
     │          │        │                        │
     │          │ ┌──────▼──────┐                  │
     │          │ │   battle    │◄──┐             │
     │          │ │  (interrupt)│   │ ongoing     │
     │          │ └──────┬──────┘   │             │
     │          │   ┌────┼────┐    │             │
     │          │  win  loss  loop─┘             │
     │          │   │    │                        │
     └──────────┴───┴────┼────────────────────────┘
                         │
                  ┌──────▼──────┐
                  │  game_over  │ → END
                  └─────────────┘

Key Takeaways

Split responsibilities wisely. LLMs are great at generating creative text and structured data. They’re terrible at math and consistent state tracking. Let each do what it’s good at.

Structured output is underrated. .with_structured_output() turned the LLM from a chatbot into a game asset generator. No parsing, no praying — just typed Python objects.

LangGraph thinks in graphs, not chains. Once I stopped thinking “prompt → response” and started thinking “state → node → conditional edge → next state,” the game architecture fell into place naturally.

interrupt() makes real interactivity possible. Without it, you’re stuck building hacky input loops around the LLM. With it, the graph itself manages the pause/resume cycle.


The full game is a single main.py — about 300 lines of Python. Clone it, point it at any Ollama-compatible server, and start catching Pokémon.

📦 Source code on GitHub

Is coding over? My prediction…

Here’s a summary of the related video I uploaded to my YouTube channel:


We Are About to Let AI Write 90% of Our Code

Hi friends 👋

In the last two months, something has changed.

And I don’t mean incrementally. I mean, fundamentally.

If you’ve tried using Claude Code with Opus — or accessed the Opus model through another provider — you can feel it. This is no longer autocomplete on steroids. This is something different.

This is real.
And it’s starting to work really well.

My Prediction

I’m not sure you’ll agree with me, but here it goes:

Within the next 2–3 years, 90% of the code we ship will be AI-generated.

Our job as developers will shift dramatically.

Instead of writing most of the code ourselves, we’ll focus on:

  • Providing high-quality context
  • Managing complexity and moving pieces
  • Handling edge cases AI can’t infer
  • Connecting systems
  • Making architectural decisions
  • Ensuring business value is delivered

In short, we’ll move from being writers of code to being managers of AI agents.

Almost like engineering managers — but for agents.

From Autocomplete to Agents

The early days of AI in development were about better tab-complete.

That era is over.

It’s time to “leave the seat” to AI agents — or even multiple agents working together — and step into a different role:

  • Making sure priorities are correct
  • Deciding which models to use and when
  • Managing cost (because yes, this can get expensive)
  • Ensuring output quality
  • Validating real-world impact

This year, I think we’ll learn a lot about how to be efficient in this new paradigm.

If You Don’t Believe It…

Try Claude Code with Opus.

That’s my honest recommendation. It’s what I’ve been using over the past two weeks, and it genuinely opened my eyes.

Other models can work too. The latest Codex versions are solid, but not all models feel the same. Some are useful but don’t yet deliver that “this changes everything” moment.

Opus does.

New Challenges Ahead

Of course, this shift brings new problems:

What happens to pull requests?

If most of the code is AI-generated, what exactly are we reviewing?

What about knowledge depth?

If you’re not writing the code, are you really understanding it?

This is critical.

You don’t want to be on call at 3AM, debugging production, and only knowing how to “prompt better.”

We are not at the point where programming becomes assembly and English becomes the new C.

We are far from that.

You still need to understand what’s happening. Deeply.

The 90/10 Rule

I think we’ll see something like a Pareto distribution:

  • 90% of code: AI-generated
  • 10% of code: Human-crafted

That 10% will matter a lot.

It will involve:

  • Complex context
  • Architectural glue
  • Edge cases
  • Critical logic
  • Irreducible human judgment

Development isn’t disappearing.

But it is transforming.

Exciting Times (Depending on Why You’re Here)

If you love building, solving problems, designing systems — this is an incredibly exciting time.

If what you loved most was physically typing every line of code yourself…

That part is changing.


I’m optimistic.

I think software development is evolving, not dying.

But the role of the developer?
That’s definitely being rewritten.

Let me know what you think.

See you 👋


HOWTO transcribe from MP4 to TXT with Whisper AI

In an era where information is constantly flowing through various forms of media, the need to extract and transcribe audio content has become increasingly important. Whether you’re a journalist, a content creator, or simply someone looking to convert spoken words into written text, the process of transcribing audio can be a game-changer. In this guide, we’ll explore how to transcribe audio from an MP4 file to text using Whisper AI, a powerful automatic speech recognition (ASR) system developed by OpenAI.

Related video from my YouTube channel:

What is Whisper AI?

Whisper AI is an advanced ASR system designed to convert spoken language into written text. It has been trained on an extensive dataset, making it capable of handling various languages and accents. Whisper AI has numerous applications, including transcription services, voice assistants, and more. In this guide, we will focus on using it for transcribing audio from MP4 files to text.

Prerequisites

Before you can start transcribing MP4 files with Whisper AI, make sure you have the following prerequisites in place:

  1. Docker: Docker is a platform for developing, shipping, and running applications in containers. You’ll need Docker installed on your system. If you don’t have it, you can download and install Docker.
  2. MP4 to MP3 Conversion: This guide feeds Whisper an MP3 file. Whisper itself decodes audio through FFmpeg, so other formats generally work too, but extracting the audio track first gives you a smaller file to copy around. If your recording is in MP4 format, you can convert it with FFmpeg:

ffmpeg -i 20230523_111106-Meeting\ Recording.mp4 20230523_111106-Meeting\ Recording.mp3
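If you have several recordings to convert, the same conversion can be scripted. This is a minimal Python sketch, assuming ffmpeg is on your PATH. The helper names build_ffmpeg_command and extract_audio are made up for this example, and the -vn flag (drop the video stream) is an addition that the one-liner above does not use:

```python
import subprocess
from pathlib import Path

def build_ffmpeg_command(video: Path) -> list[str]:
    """Build the ffmpeg argument list for extracting audio to an .mp3 file."""
    audio = video.with_suffix(".mp3")
    # Passing arguments as a list means spaces in file names need no escaping
    return ["ffmpeg", "-i", str(video), "-vn", str(audio)]

def extract_audio(video: Path) -> Path:
    """Run ffmpeg and return the path of the resulting audio file."""
    subprocess.run(build_ffmpeg_command(video), check=True)
    return video.with_suffix(".mp3")

print(build_ffmpeg_command(Path("20230523_111106-Meeting Recording.mp4")))
```

Calling extract_audio(Path("my recording.mp4")) would then produce my recording.mp3 next to the original file.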

Transcribing MP4 to TXT with Whisper AI

Now, let’s walk through the steps to transcribe an MP4 file to text using Whisper AI. We’ll assume you already have your MP4 file converted to MP3.

Step 1: Clone the Whisper AI Docker Repository

First, clone the Whisper AI Docker repository to your local machine. Open a terminal and run the following command:

git clone https://github.com/hisano/openai-whisper-on-docker.git

Step 2: Navigate to the Repository

Change your current directory to the cloned repository:

cd openai-whisper-on-docker

Step 3: Build the Docker Image

Build the Docker image for Whisper AI with the following command:

docker image build --tag whisper:latest .

Step 4: Set Up Volume and File Name

Set the VOLUME_DIRECTORY to your current directory and specify the name of your MP3 file. In this example, we’ll use “hello.mp3”:

VOLUME_DIRECTORY=$(pwd)

FILE_NAME=hello.mp3

Step 5: Copy Your MP3 File

Copy your MP3 file (the one you want to transcribe) to the current directory.

cp ../20230503_094932-Meeting\ Recording.mp3 ./$FILE_NAME

Step 6: Transcribe the MP3 File

Finally, use the following command to transcribe the MP3 file to text using Whisper AI. In this example, we’re specifying the model as “small” and the language as “Spanish.” Adjust these parameters according to your needs:

docker container run --rm --volume ${VOLUME_DIRECTORY}:/data whisper --model small --language Spanish /data/$FILE_NAME

Once you execute this command, Whisper AI will process the audio file and provide you with the transcribed text output.

The transcription is written to stdout, so consider redirecting the output of docker run to a file (in bash, &> captures both stdout and stderr):

docker container run --rm --volume ${VOLUME_DIRECTORY}:/data whisper --model small --language Spanish /data/$FILE_NAME &> result.txt

From a second terminal, you can follow its progress with:

tail -f result.txt

If you see a warning like:

/usr/local/lib/python3.9/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead

This means no CUDA-capable GPU setup was detected, so Whisper is running on your CPU.

Also note that we’re using the small model here, which is good enough but slow on a CPU. On my machine, it takes about 2.5 hours to transcribe 3 hours of audio.
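For planning purposes, that single measurement can be turned into a rough estimate. The sketch below assumes processing time scales linearly with audio length, which is an assumption rather than something I measured, and the estimate_hours helper is hypothetical:

```python
def estimate_hours(audio_hours: float,
                   measured_audio: float = 3.0,
                   measured_wall: float = 2.5) -> float:
    """Scale the one observed run (3 h of audio in 2.5 h) to another length."""
    return audio_hours * (measured_wall / measured_audio)

for length in (0.5, 1.0, 3.0):
    print(f"{length} h of audio: roughly {estimate_hours(length):.2f} h on CPU")
```

Treat the result as a ballpark only. A different CPU, model size or language will shift the ratio.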

Conclusion

Transcribing audio from MP4 to text has never been easier, thanks to Whisper AI and the power of Docker. With this guide, you can efficiently convert spoken content into written text, opening up a world of possibilities for content creation, research, and more. Experiment with different Whisper AI models and languages to tailor your transcription experience to your specific needs. Happy transcribing!

Note: I’ve written this blog post with the help of ChatGPT based on my own experiments with Whisper AI. I’m just too lazy to write something coherent in English. Sorry for that, I hope you liked it anyway.


Prompt: “Write a blog post whose title is HOWTO transcribe from mp4 to txt with Whisper AI. It should explain what Whisper AI is but also explain how to extract mp3 from mp4, and the following commands, ignore first column: 10054 git clone https://github.com/hisano/openai-whisper-on-docker.git 10055 cd openai-whisper-on-docker 10056 docker image build --tag whisper:latest . 10057 VOLUME_DIRECTORY=$(pwd) 10058 FILE_NAME=hello.mp3 10059 cp ../20230503_094932-Meeting\ Recording.mp3 ./hello.mp3 10060 docker container run --rm --volume ${VOLUME_DIRECTORY}:/data whisper --model small --language Spanish /data/hello.mp3”. After that, I added some extra useful information about performance.