In-browser ML demos, small JS vibelets, and a Claude-curated corner.
Real models running in your browser, or research you can actually read and run.
Upload a photo of D&D dice; a YOLOv3 detector locates each die and a MobileNetV2 classifier reads the face value — all in-browser.
Interpretable-by-design LLM classifiers that route predictions through a small set of human-readable concept features. The whole pipeline runs end-to-end in a single Colab notebook on CEBaB.
Small, playful JS experiments — less "demo of a system," more "vibe you can scroll through." Live and playable.
Upload any image and get back a color-mapped crochet pattern you can follow stitch-by-stitch. Runs entirely in the browser, turning pixels into yarn instructions.
Every poem embedded with Instructor, squashed onto a 1-D UMAP axis, then served as a scroll — drift from one poem to its semantic neighbor.
BYOC (bring-your-own-corpus) version of POEMSCROLL — scroll any word list along one UMAP axis. Ships with GloVe, MiniLM, MPNet & more.
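For the curious, the scroll idea behind the two projects above can be sketched in a few lines. This is a rough illustration, not the code behind either demo: it assumes each item already has a 1-D coordinate (for example from a UMAP projection computed elsewhere), and the words, coordinates, and function names `build_scroll` and `neighbors` are all made up for the example.

```python
def build_scroll(items, coords):
    """Sort items by their 1-D coordinate so adjacent entries sit next to each other on the axis."""
    if len(items) != len(coords):
        raise ValueError("items and coords must have the same length")
    return sorted(zip(coords, items))


def neighbors(scroll, item):
    """Return the (previous, next) items around `item`; None marks either end of the scroll."""
    names = [name for _, name in scroll]
    i = names.index(item)
    prev_item = names[i - 1] if i > 0 else None
    next_item = names[i + 1] if i < len(names) - 1 else None
    return prev_item, next_item


# Hypothetical word list with made-up 1-D coordinates (stand-ins for a real projection).
words = ["river", "ocean", "ledger", "stream", "invoice"]
coords = [0.12, 0.31, 2.40, 0.18, 2.55]

scroll = build_scroll(words, coords)
order = [name for _, name in scroll]
print(order)
print(neighbors(scroll, "stream"))
```

Sorting once up front keeps the scroll itself trivial: moving up or down is just stepping to the adjacent index, and all the semantic work lives in whatever produced the coordinates.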
Josh Magnus Ludan is an NLP and LLM-interpretability researcher now pursuing a PhD in CIS at the University of Pennsylvania, advised by Mark Yatskar and Chris Callison-Burch. He graduated from Penn in 2024 with a dual focus in CS and Data Science, served as VP of Projects at the Penn Data Science Group, and was named a 2026 ASSET Center AWS Fellow for work on trustworthy, interpretable AI. His papers have landed at ACL 2023, ACL 2024, and NeurIPS 2025; his current focus is multimodal systems that fuse molecular data with the scientific literature.
Force the model to justify its answer in free text during finetuning and it stops exploiting shortcut features. +15.4 accuracy recovery on e-SNLI.
A classifier that routes predictions through an LLM-discovered set of human-readable concepts, making each decision auditable while rivaling few-shot GPT-4.
A 6M-generation benchmark across 11 models, 8 domains, and 11 adversarial attacks that exposes how brittle “99% accurate” AI-text detectors actually are.
An LLM pipeline that mines scientific literature into concise, fair-use priors for therapeutic and compound design.
Benchmarks every architecture from CNNs up through GPT-3 on r/AITA posts to see whether models can make nuanced moral calls on actual human drama.