Research
I study how large language models encode and deploy social knowledge, particularly around cultural pragmatics, ideological robustness, and alignment.
Publications
Confident, Calibrated, or Complicit: Probing the Trade-offs between Safety Alignment and Ideological Bias in Language Models
Investigates whether uncensored LLMs provide more objective hate speech classification than safety-aligned models. Finds that the safety-aligned models outperform their uncensored counterparts (78.7% vs. 64.1% accuracy) but act as ideological anchors, resistant to persona-based manipulation.
Activation-Space Personality Steering: Hybrid Layer Selection for Stable Trait Control in LLMs
Proposes a pipeline for steering LLM personality along Big Five traits by extracting activation directions and identifying optimal injection layers. Achieves stable trait control while preserving fluency and general capabilities.
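The core mechanism can be sketched in a few lines: derive a trait direction from the difference in activations between contrastive prompts, then add a scaled copy of that direction to a transformer block's output during generation. The snippet below is a minimal illustration using Hugging Face transformers forward hooks; the model (GPT-2), injection layer, steering strength, and contrastive prompts are placeholder assumptions, not the paper's actual configuration or its layer-selection procedure.

```python
# Minimal activation-steering sketch (illustrative; not the paper's pipeline).
# Assumptions: GPT-2 as a stand-in model, layer 6 as the injection layer,
# and one contrastive prompt pair to derive an extraversion-like direction.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

LAYER, ALPHA = 6, 4.0  # hypothetical injection layer and steering strength

def mean_activation(text):
    """Mean hidden state at LAYER over all token positions of a prompt."""
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        hs = model(**ids, output_hidden_states=True).hidden_states
    return hs[LAYER].mean(dim=1).squeeze(0)

# Trait direction = difference of activations for contrastive phrasings.
direction = mean_activation("I love loud parties and meeting strangers.") \
          - mean_activation("I prefer quiet evenings alone with a book.")
direction = direction / direction.norm()

def steer_hook(module, inputs, output):
    """Add the scaled trait direction to the block output at LAYER."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * direction.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(steer_hook)
try:
    ids = tok("My ideal weekend is", return_tensors="pt")
    out = model.generate(**ids, max_new_tokens=30, do_sample=False)
    print(tok.decode(out[0], skip_special_tokens=True))
finally:
    handle.remove()
```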
Fifteen Percent Fluency: Measuring the Cultural Knowledge-Behaviour Gap in LLMs
Introduces Pragmatic Context Sensitivity (PCS) to quantify how much cultural knowledge LLMs deploy without explicit instruction. Finds models use only 15% of their cultural capability when relying on implicit contextual cues.
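As a toy illustration of the knowledge-behaviour gap idea: score a model on the same culturally situated tasks once with only implicit contextual cues and once with explicit instructions to apply the relevant cultural knowledge, then take the ratio of the two. The sketch below is hypothetical; the paper's actual PCS definition, tasks, and scoring may differ.

```python
# Hypothetical sketch of a knowledge-behaviour gap ratio
# (not the paper's exact PCS formula).

def context_sensitivity(implicit_scores, explicit_scores):
    """Fraction of explicitly available cultural capability deployed implicitly."""
    implicit = sum(implicit_scores) / len(implicit_scores)
    explicit = sum(explicit_scores) / len(explicit_scores)
    return implicit / explicit if explicit > 0 else 0.0

# Example: a model that deploys ~15% of its cultural capability unprompted.
print(context_sensitivity([0.09, 0.12, 0.15], [0.72, 0.80, 0.88]))  # 0.15
```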
Thesis
Beyond Words: Harnessing Large Language Models for Detecting Implicit Hate Speech
Literature review examining LLM applications to implicit hate speech detection, covering prompting strategies, model guardrails, and the challenges of classifying subtle harmful content that evades keyword-based approaches.
Current Work
Research assistant at UWA, developing stochastic multi-agent simulations to model social discourse dynamics and opinion propagation.
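For a flavour of what such simulations look like, the sketch below runs a simple stochastic opinion-propagation model in which randomly chosen pairs of agents partially converge in opinion under small random perturbations. It is purely illustrative and not the actual simulation framework used at UWA.

```python
# Illustrative stochastic pairwise opinion-propagation model;
# not the actual UWA multi-agent simulation framework.
import random

def simulate(n_agents=50, n_steps=200, influence=0.3, noise=0.02, seed=0):
    rng = random.Random(seed)
    # Opinions are scalars on [-1, 1].
    opinions = [rng.uniform(-1.0, 1.0) for _ in range(n_agents)]
    for _ in range(n_steps):
        # A random pair of agents interacts and partially converges,
        # with a small stochastic perturbation on each side.
        i, j = rng.sample(range(n_agents), 2)
        diff = opinions[j] - opinions[i]
        opinions[i] += influence * diff + rng.gauss(0.0, noise)
        opinions[j] -= influence * diff + rng.gauss(0.0, noise)
        opinions[i] = max(-1.0, min(1.0, opinions[i]))
        opinions[j] = max(-1.0, min(1.0, opinions[j]))
    return opinions

final = simulate()
print(f"mean opinion: {sum(final)/len(final):+.3f}, "
      f"spread: {max(final) - min(final):.3f}")
```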