The previous benchmarks in this series (Haskell, OCaml, Python, ReScript, Ruby, Elixir, Java, Elm) all used cloud API models — big, frontier-class LLMs served by Anthropic, OpenAI, and others. But what about a local model? Can a 14-billion-parameter model running on a single machine solve the same puzzles?
This post answers that question using qwen2.5-coder:14b via Ollama, tested on AoC 2025 Day 1 in both Python and Haskell.
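For readers who want to reproduce the setup, the model can be fetched and run locally with Ollama's standard CLI (assuming Ollama is already installed; the exact prompt used in the benchmark is not shown here):

```shell
# Download the 14B coder model (roughly 9 GB quantized)
ollama pull qwen2.5-coder:14b

# One-shot prompt from the command line; output prints to stdout
ollama run qwen2.5-coder:14b "Write a Haskell function that sums a list of Ints."
```

The same model can also be queried over Ollama's local HTTP API, which is handy for scripting a benchmark harness.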