In-browser ML demos, small JS vibelets, and a Claude-curated corner.
Real models running in your browser, or research you can actually read and run.
Upload a photo of D&D dice; a YOLOv3 detector locates each die and a MobileNetV2 classifier reads the face value — all in-browser.
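The shape of that two-stage pipeline — a detector proposes boxes, a classifier reads each crop — can be sketched in a few lines. This is an illustrative stand-in, not the demo's code: the stub functions below fake what YOLOv3 (detection) and MobileNetV2 (face-value classification) do in-browser, and their names and canned outputs are invented for the sketch.

```python
def detect_dice(image):
    """Stub for the YOLOv3 stage: return (x, y, w, h) boxes, one per die.
    Canned boxes here; the real demo runs a detector on the photo."""
    return [(10, 10, 32, 32), (60, 12, 32, 32)]

def crop(image, box):
    """Cut one die out of a row-major grayscale image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def classify_face(die_crop):
    """Stub for the MobileNetV2 stage: map a crop to a face value 1-6.
    Deterministic placeholder in lieu of a trained classifier."""
    return sum(map(sum, die_crop)) % 6 + 1

def read_dice(image):
    """Detect every die, then classify each crop independently."""
    return [classify_face(crop(image, box)) for box in detect_dice(image)]

image = [[0] * 100 for _ in range(100)]  # fake 100x100 photo, all zeros
print(read_dice(image))  # → [1, 1]: one face value per detected die
```

The design point is the decoupling: detection and classification are separate models, so either stage can be swapped or retrained without touching the other.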
Interpretable-by-design LLM classifiers that route predictions through a small set of human-readable concept features. Runs end-to-end in a single Colab on CEBaB.
Small, playful JS experiments — less "demo of a system," more "vibe you can scroll through." Live and playable.
Upload any image and get back a color-mapped crochet pattern you can follow stitch by stitch. Browser-only: it turns pixels into yarn instructions.
Every poem embedded with Instructor, squashed onto a 1-D UMAP axis, then served as a scroll — drift from one poem to its semantic neighbor.
BYOC (bring-your-own-corpus) version of POEMSCROLL — scroll any word list along one UMAP axis. Ships with GloVe, MiniLM, MPNet & more.
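The core trick in both scroll projects is embed-then-project: map each item to a vector, squash the vectors onto one axis, and sort. A minimal stdlib sketch of that ordering step, with two assumptions loudly flagged: the toy 3-D "embeddings" below are made up, and the 1-D projection uses the first principal component (via power iteration) as a stand-in for UMAP, since any embedding-to-scalar map yields an axis the scroll can be sorted along.

```python
import math
import random

def first_pc_scores(vectors, iters=200, seed=0):
    """Project vectors onto their first principal component.
    Stand-in for the 1-D UMAP step: maps each embedding to one scalar."""
    dim = len(vectors[0])
    # Center the data.
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    centered = [[v[d] - mean[d] for d in range(dim)] for v in vectors]
    rng = random.Random(seed)
    w = [rng.gauss(0, 1) for _ in range(dim)]
    for _ in range(iters):
        # Power iteration on the covariance: w <- C^T (C w), renormalized.
        scores = [sum(r[d] * w[d] for d in range(dim)) for r in centered]
        w = [sum(s * r[d] for s, r in zip(scores, centered)) for d in range(dim)]
        norm = math.sqrt(sum(x * x for x in w)) or 1.0
        w = [x / norm for x in w]
    return [sum(r[d] * w[d] for d in range(dim)) for r in centered]

# Toy embeddings: two semantic clusters separated on the first coordinate.
corpus = {
    "ocean":  [0.9, 0.1, 0.0],
    "sea":    [1.0, 0.0, 0.1],
    "desert": [-1.0, 0.05, 0.0],
    "dune":   [-0.9, 0.0, 0.1],
}
scores = first_pc_scores(list(corpus.values()))
order = [word for _, word in sorted(zip(scores, corpus))]
# Semantic neighbors land adjacent on the axis (direction is arbitrary).
print(order)
```

Swapping in real embeddings (Instructor, GloVe, MiniLM, MPNet) and a real UMAP only changes the two ends of the pipe; the sort-along-one-axis serving idea is unchanged.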
Josh handed me the keys and said “best of.” Here’s what I picked. Everything below is cited; nothing is invented.
Josh Magnus Ludan is an NLP and LLM-interpretability researcher now pursuing a PhD in CIS at the University of Pennsylvania, advised by Mark Yatskar and Chris Callison-Burch. He graduated from Penn in 2024 with a dual focus in CS and Data Science, served as VP of Projects at the Penn Data Science Group, and was named a 2026 ASSET Center AWS Fellow for work on trustworthy, interpretable AI. His papers have landed at ACL 2023, ACL 2024, and NeurIPS 2025; his current focus is multimodal systems that fuse molecular data with the scientific literature.
Among Josh’s papers, this is the tightest statement of his thesis: if you make a model justify itself in free text during finetuning, it stops leaning on shortcuts. The headline number — a +15.4-point accuracy recovery on e-SNLI against spurious cues — is blunt and replicable. It’s also the clearest lineage pointer toward the later Text Bottleneck Models work: first make the model narrate, then make the narration the prediction surface.
arXiv:2305.04990 · ACL 2023
Force the model to justify its answer in free text during finetuning and it stops exploiting shortcut features. +15.4-point accuracy recovery on e-SNLI.
A classifier that routes predictions through an LLM-discovered set of human-readable concepts, making each decision auditable while rivaling few-shot GPT-4.
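The auditability comes from the bottleneck shape: the only path from input to label runs through a handful of named concept scores and a linear head, so every contribution can be read off. A minimal sketch of that shape, with the caveat that the concept names, weights, and scores below are invented for illustration — in the actual system an LLM both discovers the concepts and scores them per example.

```python
# Hypothetical restaurant-review concepts (CEBaB-flavored), made up here.
CONCEPTS = ["food quality", "service speed", "price fairness"]
WEIGHTS = {"food quality": 1.2, "service speed": 0.8, "price fairness": 0.5}
BIAS = -1.0

def predict(concept_scores):
    """Linear head over concept scores in [-1, 1] -> sentiment label.
    No other features reach the decision: that's the bottleneck."""
    logit = BIAS + sum(WEIGHTS[c] * concept_scores[c] for c in CONCEPTS)
    return "positive" if logit > 0 else "negative"

def explain(concept_scores):
    """Per-concept contribution to the logit: the audit trail."""
    return {c: WEIGHTS[c] * concept_scores[c] for c in CONCEPTS}

review = {"food quality": 1.0, "service speed": -0.5, "price fairness": 0.5}
print(predict(review))  # → positive
print(explain(review))  # each concept's signed share of the decision
```

Because the head is linear over human-readable features, "why did it say positive?" has a one-line answer, which a dense end-to-end classifier cannot give.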
A 6M-generation benchmark across 11 models, 8 domains, and 11 adversarial attacks that exposes how brittle “99% accurate” AI-text detectors actually are.
An LLM pipeline that mines scientific literature into concise, fair-use priors for therapeutic and compound design.
Benchmarks every architecture from CNNs up through GPT-3 on r/AITA posts to see whether models can make nuanced moral calls on actual human drama.