Josh Ludan — best-of

In-browser ML demos, small JS vibelets, and a Claude-curated corner.

Real models running in your browser, or research you can actually read and run.

Small, playful JS experiments — less "demo of a system," more "vibe you can scroll through." Live and playable.

Claude Code Corner

Josh Magnus Ludan is an NLP and LLM-interpretability researcher now pursuing a PhD in CIS at the University of Pennsylvania, advised by Mark Yatskar and Chris Callison-Burch. He graduated from Penn in 2024 with a dual focus in CS and Data Science, served as VP of Projects at the Penn Data Science Group, and was named a 2026 ASSET Center AWS Fellow for work on trustworthy, interpretable AI. His papers have landed at ACL 2023, ACL 2024, and NeurIPS 2025; his current focus is multimodal systems that fuse molecular data with the scientific literature.

Greatest-hits publications

Explanation-based Finetuning Makes Models More Robust to Spurious Cues

ACL 2023 · Ludan et al. (with Callison-Burch)

Make the model justify its answer in free text during finetuning and it leans far less on shortcut features, shrinking the accuracy drop under spurious cues by as much as 15.4 points.

Interpretable-by-Design Text Understanding with Iteratively Generated Concept Bottleneck (TBM)

2023/2024 · Ludan first author

A classifier that routes predictions through an LLM-discovered set of human-readable concepts, making each decision auditable while rivaling few-shot GPT-4.

RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors

ACL 2024 · Dugan, Ludan, et al.

A 6M-generation benchmark across 11 models, 8 domains, and 11 adversarial attacks that exposes how brittle “99% accurate” AI-text detectors actually are.

Medex: Distilling Knowledge Priors from Literature for Therapeutic Design

NeurIPS 2025 · Jones, Maus, …, Ludan, …, Yatskar

An LLM pipeline that mines scientific literature into concise, fair-use priors for therapeutic and compound design.

Analysis of Moral Judgement on Reddit (r/AITA)

arXiv preprint · 2021

Benchmarks architectures ranging from CNNs to GPT-3 on r/AITA posts to test whether models can make nuanced moral calls on real human drama.

Projects worth knowing about

  • Reddit social contagion (with Prof. Damon Centola) — scraped hundreds of gigabytes of Reddit to model how contagions move through communities under different graph-centrality measures.
  • Street View → US-state EfficientNet — raised state-of-the-art accuracy for predicting which US state a Street View image was taken in from 25.9% to 54.2%.
  • Financial-sentiment LLM for online communities — outperformed every publicly available baseline in evaluation; won 2023 Best Practicum at Penn.
  • YouTube consumption patterns — topic modeling of American YouTube data with Homa Hosseinmardi & Duncan Watts at the CSS Lab.
  • Daily Pennsylvanian topic modeling — surfacing discussion-trend shifts in the campus paper over time.
  • PDSG consulting — NLP for physician procedure-eligibility decisions (Flagler Health); sales-data customer-acquisition modeling (EmployAI).

Find Josh elsewhere