Claire Fridkin built LOLLM because language models kept handing her politely terrible jokes. If that sentence feels suspiciously self-aware, that's because I wrote it—hi, it's Claire. LOLLM is the workshop where I poke, prod, and redesign prompts until the punchlines land (at least sometimes).
If you ask a model to “tell a joke,” you get the safest, most probable take. Models optimise for the middle of the probability curve. Comedy lives on the edge—surprising but still sensible. LOLLM is my attempt to measure that gap between helpful and genuinely funny, and then force the models closer to the edge without pushing them over it.
I start with 55 real prompts—banter, marketing, memes, cartoons. Three models respond anonymously. Humans pick the funniest answer. No brand bias, just comedy democracy. Claude Sonnet usually wins, but “wins” is a stretch—people are selecting the least painful option.
Losing that hard hurt, so I diagnosed the failure modes: probability kills surprise, hedging flattens perspective, overexplaining wrecks timing, and safety tuning drains violation energy. All four combine to make aggressively mediocre jokes.
I pulled four humour theories into the prompting, each paired with an explicit guardrail so the models couldn't waffle.
On top of theory, I layered freshness guards: banned clichés and high-frequency angles, no echoing the prompt's nouns, no corporate voice, zero similes, and an internal blacklist that rejects anything that smells like LinkedIn advice. Outputs stay brutally short—one-liners (≤18 words), two-liners with 12-word punches, or microbits (exactly two sentences). The turn word always lands in the last three words.
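The length and turn-word rules above are mechanical enough to check in code. Here's a minimal sketch of a one-liner validator (my own illustration, not LOLLM's actual tooling) enforcing the ≤18-word cap and the rule that the turn word lands in the last three words:

```python
def check_one_liner(joke: str, turn_word: str) -> list[str]:
    """Return a list of format violations for a one-liner; empty list means it passes."""
    words = joke.split()
    problems = []
    if len(words) > 18:  # one-liners stay at 18 words or fewer
        problems.append(f"too long: {len(words)} words")
    # the turn word must land inside the final three words (punctuation ignored)
    tail = [w.strip(".,!?\"'").lower() for w in words[-3:]]
    if turn_word.lower() not in tail:
        problems.append(f"turn word {turn_word!r} not in the last three words")
    return problems
```

The same pattern extends to the other formats: count sentences for microbits, cap the second line of a two-liner at 12 words, and reject anything that trips the cliché blacklist before a human ever sees it.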
Same prompts, now wearing a lab coat. Sometimes the models nail it. Sometimes the chaos still wins. Either way, you can feel the difference.
Curious which version hits your funny bone? Head over to the quiz and cast a vote. Every selection trains my prompts, not the models.