Developers, developers, developers!

A blog about programming, programming, and, ah, more programming!

Benchmarking a Local LLM on Advent of Code 2025 (Ollama)

The previous benchmarks in this series (Haskell, OCaml, Python, ReScript, Ruby, Elixir, Java, Elm) all used cloud API models — big, frontier-class LLMs served by Anthropic, OpenAI, and others. But what about a local model? Can a 14-billion-parameter model running on a single machine solve the same puzzles?

This post answers that question using qwen2.5-coder:14b via Ollama, tested on AoC 2025 Day 1 in both Python and Haskell.
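As a rough sketch of how a run like this can be driven, here is a minimal Python client for Ollama's local HTTP API. The endpoint, payload fields, and `response` key are standard Ollama; the actual benchmark harness in the post is more involved than this.

```python
import json
import urllib.request

# Ollama's default local endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

def build_request(model: str, prompt: str) -> dict:
    # stream=False asks Ollama to return the full completion as one JSON object
    # instead of a stream of partial chunks.
    return {"model": model, "prompt": prompt, "stream": False}

def ask(model: str, prompt: str) -> str:
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires `ollama serve` running with the model pulled):
# ask("qwen2.5-coder:14b", "Solve AoC 2025 Day 1 Part 1 in Python: ...")
```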

Benchmarking LLMs on Advent of Code 2025 (Elm)

Following up on the Haskell benchmark, the OCaml benchmark, the Python benchmark, the ReScript benchmark, the Ruby benchmark, the Elixir benchmark, and the Java benchmark, I ran the same AoC 2025 Days 1–5 setup in Elm.

Elm is the most niche language in this series. It's a pure functional language that compiles to JavaScript, has no native CLI story, and sees relatively little use outside its frontend niche. Each model received a pre-built scaffold — run.mjs, elm.json, and a Day00.elm template — that compiles and runs Elm modules via Node.js. The question was whether models would handle Elm's strict type system, lack of escape hatches, and unfamiliar idioms (e.g. Debug.log for output, Platform.worker for headless programs).

The answer: every single one of them did.

Benchmarking LLMs on Advent of Code 2025 (ReScript)

Following up on the Haskell benchmark, the OCaml benchmark, and the Python benchmark, I ran AoC 2025 Days 1–5 in ReScript — a typed functional language that compiles to JavaScript with a lean standard library, a distinct syntax, and very limited LLM training data.

This post covers three runs of the same benchmark, each adding a different intervention to see what helps models cope with an unfamiliar language:

  1. Run 1 — no help at all. 3-minute timeout. 1 completer out of 10.
  2. Run 2 — overflow warning + longer timeout. 2 completers.
  3. Run 3 — a ReScript system prompt teaching syntax, stdlib, and types. 7 completers.

Benchmarking LLMs on Advent of Code 2025 (OCaml)

Following up on the Haskell benchmark, I ran the same orchestration setup on the same AoC 2025 Days 1–5 puzzles — this time requiring solutions in OCaml. The methodology is identical: each model gets an isolated directory, a puzzle description, and must write its final answer to ANSWER.txt. Wrong answer or no answer = ejection.
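The pass/fail rule described above can be sketched in a few lines of Python. The `ANSWER.txt` file name comes from the post; the directory layout and how the expected answer is supplied are assumptions for illustration.

```python
from pathlib import Path

def judge(model_dir: Path, expected: str) -> bool:
    """Return True iff the model wrote the correct answer to ANSWER.txt.

    Per the benchmark rules, a wrong answer or a missing answer both
    count as a failure (ejection).
    """
    answer_file = model_dir / "ANSWER.txt"
    if not answer_file.exists():
        return False  # no answer = ejection
    return answer_file.read_text().strip() == expected
```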

Generate Passwords

Quick way to generate passwords on the command line
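The post's actual command isn't reproduced here, but one common approach uses Python's `secrets` module, which draws from a cryptographically secure source (unlike the `random` module):

```python
import secrets
import string

def generate_password(length: int = 20) -> str:
    # Pick each character independently from letters, digits, and
    # punctuation using a CSPRNG.
    alphabet = string.ascii_letters + string.digits + string.punctuation
    return "".join(secrets.choice(alphabet) for _ in range(length))
```

From the command line this is a one-liner: `python3 -c "import secrets,string; print(''.join(secrets.choice(string.ascii_letters+string.digits) for _ in range(20)))"`.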

Closures

Examples in Ruby and JavaScript
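The post's examples are in Ruby and JavaScript; the same idea in Python, for reference: a closure is a function that captures and can update variables from its enclosing scope.

```python
def make_counter():
    count = 0

    def increment():
        nonlocal count  # rebind the captured variable, not a new local
        count += 1
        return count

    return increment  # the returned function closes over `count`
```

Each call to `make_counter()` produces an independent counter with its own captured `count`.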

Currying

Examples in Ruby and JavaScript
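The post covers Ruby and JavaScript; as a reference point, the same idea in Python: currying turns a multi-argument function into a chain of single-argument functions, and `functools.partial` offers the closely related partial application.

```python
from functools import partial

def add(a, b):
    return a + b

def curried_add(a):
    # Manual currying: take one argument, return a function awaiting the next.
    return lambda b: a + b

add5 = curried_add(5)   # a function still waiting for b
add7 = partial(add, 7)  # partial application: a is fixed, b is supplied later
```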