March 2026
This monthly newsletter covers recent developments and upcoming events in AI safety, ethics, and governance in Montréal.
Events
When Is a Human Actually “Overseeing” an AI System?
Tuesday, March 3, 7–9 PM, Maison du développement durable
Shalaleh Rismani (McGill/Mila, Open Roboethics Institute) presents research showing that greater understanding of AI systems doesn’t necessarily improve human oversight, and may worsen it through overtrust.
What should Montréal’s role be in AI safety?
Tuesday, March 10, 7–9 PM, Montréal
Structured discussion mapping Montréal’s AI safety ecosystem (Mila, LawZero, HΩ, PauseAI, CAISI, OBVIA, and others) and identifying gaps and possible collaborations.
Social Reasoning and the Ecology of Thought
March 10–13, IVADO, Montréal, $40–$240
Workshop in IVADO’s thematic semester on computational reasoning. Twenty-two speakers from AI, neuroscience, and philosophy examine reasoning in multi-agent systems: theory of mind, argumentation, and distributed reasoning.
Building Safer AI for Youth Mental Health
March 16–23, Mila + online
Week-long hackathon organized by Mila. Three tracks: adversarial stress-testing, logic hardening, and synthetic data augmentation for safer conversational AI. Prizes include $10K and a Mila AI Safety Studio internship.
AI Control Hackathon
Friday–Sunday, March 20–22, Montréal
Three-day hackathon co-organized with Apart and Redwood Research on AI control: maintaining safety against potentially misaligned systems. Tracks: ControlArena challenges, control protocol design, and red teaming.
Policy and Governance
International AI Safety Report 2026. The second edition of the global consensus report, chaired by Yoshua Bengio (Mila/UdeM), authored by over 100 experts, backed by 30+ countries and international organizations including the EU, OECD, and UN. Key findings: general-purpose AI capabilities continue to advance while most risk management remains voluntary; models are beginning to distinguish between test and deployment settings, undermining safety evaluations; and evidence on risks emerges structurally slower than capabilities develop — a core dilemma for governance.
Canada: Tumbler Ridge tragedy exposes AI governance vacuum. A shooter killed eight people in Tumbler Ridge, BC on February 10. OpenAI had flagged and banned the shooter’s account months earlier for conversations about gun violence. AI Minister Evan Solomon summoned OpenAI to Ottawa. Canada currently has no legislation requiring AI companies to report such cases to law enforcement.
Canada: “If we don’t do anything, I think we’re about five years away from superintelligent AI,” David Krueger (Mila/UdeM) told the House of Commons ETHI committee on February 2. The committee’s study on “Challenges Posed by Artificial Intelligence and its Regulation” has heard from 20 expert witnesses since November. Krueger argued that current alignment methods are technically inadequate: “We don’t know how to build superintelligent AI safely.” He identified chip manufacturing as the key leverage point, calling for limits on production.
Canada: AI Strategy consultation results released. ISED published its “Summary of Inputs” from October’s 30-day “national sprint”, with 11,300+ submissions and 32 Task Force reports. The consultation’s compressed timeline and industry-heavy Task Force drew criticism: 160+ civil society organizations and individuals signed an open letter, and Michael Geist ran the Task Force reports through LLM analysis and found the government’s summary softens expert warnings into an “illusion of consensus”. The parallel People’s Consultation on AI is accepting submissions until March 15, 2026.
Canada: CIFAR commits $1M to AI Alignment Project. The funding supports four new Canadian research projects (up to $165K/year each) under the UK AISI-led international alignment coalition, bringing the total number of CAISI-supported projects to sixteen.
Research
The Scientist AI: Safe by Design, by Not Desiring. Fornasiere, Richardson, Gendron, Serban, and Bengio (LawZero) lay out the design principles behind the Scientist AI: a system that maximizes understanding while limiting affordances and goal-directedness. Two mechanisms: contextualization (transforming training data to distinguish facts from claims about facts) and consequence invariance (severing training signals about downstream effects of predictions). A free-form generator produces hypotheses and reasoning, held accountable by a neutral estimator — separating creative reasoning from safety-critical judgment.
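As a rough illustration of the contextualization idea only (a toy sketch under my own assumptions about the data format, not LawZero’s pipeline), a raw training statement can be rewritten as an attributed claim so the model learns who asserted what rather than treating the assertion as fact:

```python
# Toy sketch of "contextualization": rewrite a raw training statement as an
# attributed claim. The record format and wrapper text are illustrative
# assumptions, not the paper's implementation.
def contextualize(record: dict) -> str:
    source = record.get("source", "an unknown source")
    return f'According to {source}: "{record["text"]}"'

raw = {"text": "Compound X cures disease Y.", "source": "a 2024 press release"}
print(contextualize(raw))
# According to a 2024 press release: "Compound X cures disease Y."
```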
Detoxifying LLMs via Representation Erasure-Based Preference Optimization. Mohammadi Sepahvand, Triantafillou, Larochelle, Precup, Roy, and Dziugaite propose REPO, which works in representation space rather than output space — forcing toxic representations to converge toward benign ones via token-level preference optimization. Prior methods leave harmful directions intact internally (detectable by linear probes); REPO erases them. State-of-the-art robustness against relearning attacks and jailbreak attempts.
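To make the representation-space idea concrete, here is a minimal, hypothetical penalty term (my sketch, not the authors’ REPO objective) that pulls the hidden states of a toxic continuation toward those of a benign counterpart, so the harmful direction is removed internally rather than only down-weighted at the output layer:

```python
# Illustrative sketch only: a token-level representation-alignment penalty.
# h_toxic / h_benign stand for same-layer hidden states of paired continuations.
import torch
import torch.nn.functional as F

def representation_erasure_term(h_toxic: torch.Tensor, h_benign: torch.Tensor) -> torch.Tensor:
    """Mean-squared distance between toxic hidden states and detached benign ones.

    Minimizing this pushes the internal representation of the toxic continuation
    toward the benign one, instead of only reshaping output probabilities.
    """
    return F.mse_loss(h_toxic, h_benign.detach())

# Toy usage with random activations standing in for model hidden states.
h_tox = torch.randn(8, 768, requires_grad=True)   # [tokens, hidden]
h_ben = torch.randn(8, 768)
loss = representation_erasure_term(h_tox, h_ben)
loss.backward()
```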
Position: Causality is Key for Interpretability Claims to Generalise. Joshi (Mila/UdeM), Mueller, Klindt, Brendel, Reizinger, and Sridhar (Mila/UdeM) use Pearl’s causal hierarchy to distinguish what interpretability studies can actually justify — separating observational, interventional, and counterfactual claims. They argue that terms like “mechanism,” “feature,” and “circuit” refer to different estimands across papers, and that matching claims to appropriate evidence levels is necessary for findings to generalize and results to be comparable.
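The observational-versus-interventional distinction can be seen in a toy example (mine, not the paper’s): a probe-style correlation supports only an association-level claim, while ablating the same unit and measuring the change in output supports an intervention-level claim:

```python
# Toy contrast: observational claim (a unit correlates with the output) vs.
# interventional claim (zero-ablating the unit changes the output).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
x = torch.randn(256, 4)

hidden = model[1](model[0](x))   # post-ReLU activations
out = model[2](hidden)

# Observational (probe-like): correlation between hidden unit 3 and logit 0.
corr = torch.corrcoef(torch.stack([hidden[:, 3], out[:, 0]]))[0, 1]

# Interventional: zero-ablate unit 3 and measure the effect on the output.
ablated = hidden.clone()
ablated[:, 3] = 0.0
effect = (out - model[2](ablated)).abs().mean()

print(f"correlation: {corr.item():.3f}; effect of ablation: {effect.item():.3f}")
```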
Rethinking Hallucinations. Ganesh, Shokri, and Farnadi (Mila/McGill) introduce “prompt multiplicity” — showing that on benchmarks like Med-HALT, over 50% of questions get different answers under different prompt structures, even while aggregate accuracy barely changes. Hallucination detectors, they argue, capture consistency rather than correctness. RAG helps but can introduce its own inconsistencies.
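A minimal way to probe this effect (an illustrative sketch; `ask_model` is a hypothetical stand-in for a real LLM call, not the authors’ setup) is to render one question under several prompt structures and check whether the answers agree:

```python
# Sketch of measuring "prompt multiplicity" for a multiple-choice question.
PROMPT_TEMPLATES = [
    "Question: {q}\nAnswer with A, B, C, or D.",
    "{q}\nChoose exactly one option (A/B/C/D):",
    "You are a careful clinician. {q}\nFinal answer:",
]

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real model call. This stub fakes
    # prompt sensitivity so the example is self-contained.
    return "A" if "clinician" in prompt else "B"

def has_multiplicity(question: str) -> bool:
    """True if prompt structure alone changes the model's answer."""
    answers = {ask_model(t.format(q=question)) for t in PROMPT_TEMPLATES}
    return len(answers) > 1

print(has_multiplicity("Which option best describes a warfarin interaction?"))
# True with this stub: the answer flips depending on the prompt framing.
```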
Operationalising the Superficial Alignment Hypothesis. Vergara-Browne, Patil, Titov, Reddy (Mila/McGill), Pimentel, and Mosbach ground the superficial alignment hypothesis in algorithmic information theory, proposing “task complexity” — the minimum program length needed to achieve a target performance on a task — as a metric derived from Kolmogorov complexity. They find program length drops from gigabytes to ~151 kilobytes when conditioned on a pre-trained model, providing evidence that post-training surfaces knowledge already present rather than adding substantial new capabilities.
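In rough notation (mine, not necessarily the paper’s), the proposed quantity is the length of the shortest program that reaches a target performance on the task, optionally given access to a pre-trained model M:

```latex
% Rough rendering of "task complexity"; the notation is an assumption on my part.
% C_tau(T | M): shortest program p that, with access to model M, reaches
% performance at least tau on task T. The unconditional C_tau(T) omits M.
\[
  C_{\tau}(T \mid M) \;=\; \min_{p} \bigl\{\, |p| \;:\; \operatorname{perf}\bigl(p(M),\, T\bigr) \ge \tau \,\bigr\}
\]
% The reported gap (gigabytes down to ~151 KB) compares C_tau(T) with C_tau(T | M).
```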
Opportunities
OBVIA Emerging Scholar Scholarships. Funding for research on AI’s societal impacts, from the collegial to the doctoral level.
This newsletter is by HΩ, assisted by AI. Feedback welcome!