Benchmarking a Local LLM on Advent of Code 2025 (Ollama)

The previous benchmarks in this series (Haskell, OCaml, Python, ReScript, Ruby, Elixir, Java, Elm) all used cloud API models — big, frontier-class LLMs served by Anthropic, OpenAI, and others. But what about a local model? Can a 14-billion-parameter model running on a single machine solve the same puzzles?

This post answers that question using qwen2.5-coder:14b via Ollama, tested on AoC 2025 Day 1 in both Python and Haskell.
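The post's actual benchmarking harness isn't shown here, but as a rough sketch, sending a puzzle prompt to a locally running Ollama instance looks something like the following, using Ollama's HTTP `/api/generate` endpoint on its default port (the `build_payload` helper and the prompt text are illustrative, not from the original harness):

```python
import json
import urllib.request

# Ollama's default local endpoint for one-shot text generation.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(model: str, prompt: str) -> dict:
    # stream=False returns the whole completion as a single JSON object
    # instead of a stream of partial chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """Send a prompt to the local Ollama server and return the completion text."""
    data = json.dumps(build_payload(model, prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example usage (requires `ollama serve` running and the model pulled):
# print(generate("qwen2.5-coder:14b", "Solve Advent of Code 2025 Day 1 in Python."))
```

Running this against a 14B model on a single machine is exactly the scenario under test: no API keys, no network dependency beyond localhost, just whatever the local hardware can serve.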