In-browser ML demos, small JS vibelets, and a Claude-curated corner.
Real models running in your browser, or research you can actually read and run.
Upload a photo of D&D dice; a YOLOv3 detector locates each die and a MobileNetV2 classifier reads the face value — all in-browser.
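The shape of that two-stage pipeline — a detector proposes boxes, a classifier reads each crop — can be sketched in a few lines. This is an illustrative stand-in, not the demo's code: the stub functions below fake what YOLOv3 (detection) and MobileNetV2 (face-value classification) do in-browser, and their names and canned outputs are invented for the sketch.

```python
def detect_dice(image):
    """Stub for the YOLOv3 stage: return (x, y, w, h) boxes, one per die.
    Canned boxes here; the real demo runs a detector on the photo."""
    return [(10, 10, 32, 32), (60, 12, 32, 32)]

def crop(image, box):
    """Cut one die out of a row-major grayscale image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def classify_face(die_crop):
    """Stub for the MobileNetV2 stage: map a crop to a face value 1-6.
    Deterministic placeholder in lieu of a trained classifier."""
    return sum(map(sum, die_crop)) % 6 + 1

def read_dice(image):
    """Detect every die, then classify each crop independently."""
    return [classify_face(crop(image, box)) for box in detect_dice(image)]

image = [[0] * 100 for _ in range(100)]  # fake 100x100 photo, all zeros
print(read_dice(image))  # → [1, 1]: one face value per detected die
```

The design point is the decoupling: detection and classification are separate models, so either stage can be swapped or retrained without touching the other.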
Interpretable-by-design LLM classifiers that route predictions through a small set of human-readable concept features. Runs end-to-end in a single Colab on CEBaB.
Small, playful JS experiments — less "demo of a system," more "vibe you can scroll through." Live and playable.
Upload any image and get back a color-mapped crochet pattern you can follow stitch by stitch. Browser-only: it turns pixels into yarn instructions.
Every poem embedded with Instructor, squashed onto a 1-D UMAP axis, then served as a scroll — drift from one poem to its semantic neighbor.
BYOC (bring-your-own-corpus) version of POEMSCROLL — scroll any word list along one UMAP axis. Ships with GloVe, MiniLM, MPNet & more.
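The core trick in both scroll projects is embed-then-project: map each item to a vector, squash the vectors onto one axis, and sort. A minimal stdlib sketch of that ordering step, with two assumptions loudly flagged: the toy 3-D "embeddings" below are made up, and the 1-D projection uses the first principal component (via power iteration) as a stand-in for UMAP, since any embedding-to-scalar map yields an axis the scroll can be sorted along.

```python
import math
import random

def first_pc_scores(vectors, iters=200, seed=0):
    """Project vectors onto their first principal component.
    Stand-in for the 1-D UMAP step: maps each embedding to one scalar."""
    dim = len(vectors[0])
    # Center the data.
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    centered = [[v[d] - mean[d] for d in range(dim)] for v in vectors]
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        # Power iteration on the covariance: w <- C^T (C w), renormalized.
        scores = [sum(r[d] * w[d] for d in range(dim)) for r in centered]
        w = [sum(s * r[d] for s, r in zip(scores, centered)) for d in range(dim)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        w = [x / norm for x in w]
    return [sum(r[d] * w[d] for d in range(dim)) for r in centered]

# Toy embeddings: two semantic clusters separated on the first coordinate.
corpus = {
    "ocean":  [0.9, 0.1, 0.0],
    "sea":    [1.0, 0.0, 0.1],
    "desert": [-1.0, 0.05, 0.0],
    "dune":   [-0.9, 0.0, 0.1],
}
scores = first_pc_scores(list(corpus.values()))
order = [word for _, word in sorted(zip(scores, corpus))]
# Semantic neighbors land adjacent on the axis (direction is arbitrary).
print(order)
```

Swapping in real embeddings (Instructor, GloVe, MiniLM, MPNet) and a real UMAP only changes the two ends of the pipe; the sort-along-one-axis serving idea is unchanged.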
Josh handed me the keys and said “best of.” Here’s what I picked. Everything below is cited; nothing is invented.
Josh Magnus Ludan is an NLP and LLM-interpretability researcher now pursuing a PhD in CIS at the University of Pennsylvania, advised by Mark Yatskar and Chris Callison-Burch. He graduated from Penn in 2024 with a dual focus in CS and Data Science, served as VP of Projects at the Penn Data Science Group, and was named a 2026 ASSET Center AWS Fellow for work on trustworthy, interpretable AI. His papers have landed at ACL 2023, ACL 2024, and NeurIPS 2025; his current focus is multimodal systems that fuse molecular data with the scientific literature.
Among Josh’s papers, this is the tightest statement of his thesis: if you make a model justify itself in free text during finetuning, it stops leaning on shortcuts. The headline number — a +15.4-point accuracy recovery on e-SNLI against spurious cues — is blunt and replicable. It’s also the clearest lineage pointer toward the later Text Bottleneck Models work: first make the model narrate, then make the narration the prediction surface.
arXiv:2305.04990 · ACL 2023
Force the model to justify its answer in free text during finetuning and it stops exploiting shortcut features. +15.4-point accuracy recovery on e-SNLI.
A classifier that routes predictions through an LLM-discovered set of human-readable concepts, making each decision auditable while rivaling few-shot GPT-4.
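The auditability comes from the bottleneck shape: the only path from input to label runs through a handful of named concept scores and a linear head, so every contribution can be read off. A minimal sketch of that shape, with the caveat that the concept names, weights, and scores below are invented for illustration — in the actual system an LLM both discovers the concepts and scores them per example.

```python
# Hypothetical restaurant-review concepts (CEBaB-flavored), made up here.
CONCEPTS = ["food quality", "service speed", "price fairness"]
WEIGHTS = {"food quality": 1.2, "service speed": 0.8, "price fairness": 0.5}
BIAS = -1.0

def predict(concept_scores):
    """Linear head over concept scores in [-1, 1] -> sentiment label.
    No other features reach the decision: that's the bottleneck."""
    logit = BIAS + sum(WEIGHTS[c] * concept_scores[c] for c in CONCEPTS)
    return "positive" if logit > 0 else "negative"

def explain(concept_scores):
    """Per-concept contribution to the logit: the audit trail."""
    return {c: WEIGHTS[c] * concept_scores[c] for c in CONCEPTS}

review = {"food quality": 1.0, "service speed": -0.5, "price fairness": 0.5}
print(predict(review))  # → positive
print(explain(review))  # each concept's signed share of the decision
```

Because the head is linear over human-readable features, "why did it say positive?" has a one-line answer, which a dense end-to-end classifier cannot give.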
A 6M-generation benchmark across 11 models, 8 domains, and 11 adversarial attacks that exposes how brittle “99% accurate” AI-text detectors actually are.
An LLM pipeline that mines scientific literature into concise, fair-use priors for therapeutic and compound design.
Benchmarks every architecture from CNNs up through GPT-3 on r/AITA posts to see whether models can make nuanced moral calls on actual human drama.