Doing The Thing: From Mechanical Turk to Mech Interp

_{For motivation/momentum purposes: Started AI Safety journey 160 days ago with current daily streak of "doing the thing" for 1 consecutive days!}

Updates

Extending and reproducing Anthropic's MOLTs (sparse Mixtures of Linear Transforms) as part of Georg Lange's SPAR stream on Automating Circuit Interpretability with Agents (writeup, github)
Reproducing Transluce's Predictive Concept Decoders (PCD) to "read the mind" of an AI model through decoding activations of a Subject Model into natural language via a Sparse Encoder and LLM Decoder Model [github]
- Started Karpathy-inspired video tutorials PCD reproduction line-by-line to make AutoInterp research more accessible [video walkthrough]
Playing with Fire: Proposal for a Sharing AI Safety Research with At-Risk Youth through Games [video, documentation]
- 3-step pipeline to making AIS research accessible to broader community (e.g. at-risk teenagers using AI for mental health counseling); “AIS researchers are only one piece of the puzzle” ethos
Reflections from first 3 weeks of 2026 abroad at LISA and attending Oxford AI Safety Initiative's ARBOx including work in progress Theory of Change

TLDR of Things I've Done Since Starting My AI Safety Journey 160 Days Ago

Projects

Cross-Linguistic Alignment: Does LoRA Fine Tuning a model on a task (e.g. respond in all CAPS) translate cross-linguistically? (Summary && Github)
Reproducing Neo et al. 2024 Interpreting Context Look Ups
Multilingual Semantics Probe: Looking for Steering Vectors for semantically ambiguous sentences in English but not Mandarin
Syntactic Dependencies in Transformers: Attention Patterns for Balanced Parentheses (Dyck) Language (Github)

Programs

ARBOx: 2 weeks of compressed ARENA curriculum; project on Cross-linguistic generalization of fine-tuning
Attended NeurIPS Mech Interp 2025 Workshop and found some cool takeaways!
Started being mentored by Sudhanshu Kasewa from 80,000 Hours

What is this?

Here it goes! I'm Kyle, a 4th year Computational Linguistics student at USC. This is the start of my AI Safety journey! I am very greatful to be mentored by Prof. Khalil Iskarous at USC.

I learned about AI Safety from the Seattle Llama4 Hackathon on June 21st, 2025 where I learned of AI 2027. After finishing an awesome summer of engineering, I realized the problems which excite me the most lie at the crossroads of engineering and science (computation and linguistics).

The urgent need for understanding increasingly capable AI models coupled with a burning passion for working at the interdisciplinary intersection of NLP, linguistics, and engineering at scale has sharpened my goal: to become an AI Safety researcher in Mechanistic Interpretability.

Working Backwards

Sometimes (often) I get analysis paralysis or want to wait for the perfect {time, situation, background, preparation} to start which makes it difficult to get into pursuing my goals (and dreams). So this time around, I know my goal to become an AI Safety/Mech Interp researcher! After finding David Quarel's do the thing I decided that this site is a place where I will keep myself acountable for doing the thing.

Doing --> Working through math problems, reading papers, writing down lists of possible intersections of linguistics and NLP
The Thing --> Any of the above for at least 1 hour every day, with consistent (though not perfect) progress.

Mechanical Turk to Mech Interp

What is dlog?

I am starting this daily log or dlog where every day I will document my progress. I hope that the daily act of documenting will make me more resilient and help prove to myself how badly I want to be an interpetability researcher. With the help of AI, a macro pulls the daily logs into the summary you see below:

Extra Stuff (click me)

Wait What's Computational Linguistics?

As a Computational Linguistics student, I see Computational Linguistics as three parts:

Linguistics = study of human language processing / cognition
Mechanistic Interpretability = study of LLM language processing / cognition
Computational Linguistics = Interdisciplinary approach to studying LLM language processing

The Computational Linguistics topics that pull me at 9.8 m / s^2 are concepts like Information Theory and Probabilistic Phonology in addition to Theoretical Machine Learning and NLP.

What does AI Safety and Mechanistic Interpretability mean to me?

I hope that having deep knowledge in both the fields of linguistics and ML/NLP can help me build a more holistic understanding of LLM cognition and language processing.
I see Mechanistic Interpretability as a sort of psycholinguistics (the study of real-time processing of language) for LLMs.
Furthermore, I see Mechanistic Interpretability as a foundational basis for understanding AI systems. Perhaps understanding models (such as like biological organisms) can support the other branches of AI Safety (alignment, control, governance, and more).

dlog: 160 Days and Counting

Total time focused so far: 505 hrs 22 mins throughout 160 days of learning

Below are the latest updates (auto-generated).

Latest entries

2026-05-26 | 0 hr 0 min | Goal:
Writeup official proposal and research questions for investigating the syntax of LLMs

_Maxwell
2026-05-22 | 2 hr 0 min | Goal: Writeup and Proposal of Syntax of LLMs Proposal [see doc]
Writeup official proposal and research questions for investigating the syntax of LLMs

_Maxwell
2026-05-21 | 2 hr 0 min | Goal: Train Toy Seq2Seq Transformer (Vaswani et al.) on translation from English to Zapanese
Toy transformer can generalize to unseen examples of same depth but not unseen recursive depths (using complementizer phrases)

_Maxwell
2026-05-20 | 2 hr 0 min | Goal: Develop Zapanese Context Free Grammar [see PR]
Design Grammar and script to translate from Head Initial (SVO) to Head Final (SOV) languages

_Maxwell
2026-05-19 | 2 hr 50 min | Goal: MOLT Transfer of Knowledge
Wrote Transfer of Knowledge for future MOLT research

_MOLTs
2026-05-09 | 4 hr 0 min | Goal: Writeup documentation on follow up steps for Instruction Tuned MOLTs training
Also tried to evaluate GPT-2 MOLTs which had high MSE, will investigate

_MOLTs
2026-05-08 | 9 hr 0 min | Goal: Realized Fatal Flaw in MOLTs Training
Trained MOLTs for instruction tuned model without chat data and chat template

_MOLTs
2026-05-07 | 9 hr 0 min | Goal: Continue Training Gemma3-4B-IT MOLTs
Use Multi-GPU training to train Gemma3-4B-IT MOLTs

_MOLTs
2026-05-06 | 9 hr 0 min | Goal: Train Gemma3-4B-IT MOLTs
Use Multi-GPU training to train Gemma3-4B-IT MOLTs

_MOLTs
2026-04-28 | 4 hr 50 min | Goal: Steering with MOLT Features and Top Activating prompts dashboard; see also PR for MOLTs to master and Wandb Runs
Feedback from Georg on using a translation task to verify necessary and sufficient conditions

_MOLTs

Todo List

The todo list started getting too beefy and has been moved to its own todo page!