The Unjournal · Team Discussion · April 2026

AI Safety & Governance Evaluation

Coordinating a live (Zoom) + async discussion on evaluating AI scaling research and the Unjournal’s role in timely, high-quality review of AI governance work

Annotate this page via Hypothes.is — select any text to comment. Free account to post; anyone with this link can read annotations.
8 responses received — thank you. No live call confirmed; async discussion is the primary mode for now. Responses still welcome and incorporated on an ongoing basis. See running synthesis →

Why we’re convening this now

There are two related questions we want to think through together, across one or more Zoom calls and a shared document. This page collects async input and scheduling availability — all responses will feed into a shared discussion doc (mark anything sensitive as such).

Urgency: The Unjournal is navigating active grant and funding timelines, and moving forward meaningfully on AI-relevant evaluation work is both a strategic and operational priority. We also want to take advantage of the current moment in which this area is highly relevant to funders and policymakers alike.

Topic 1 — Where should the Unjournal focus? More than a year into tracking AI safety, governance, and social/economic impacts as a priority area, we haven’t committed to a clear strategy. What is our comparative advantage, what research should we evaluate, what pivotal questions should we foreground, and how do we stay timely?

Topic 2 — The Ord RL Scaling evaluation. In October 2025, Toby Ord published “How Well Does RL Scale?” (EA Forum), making high-profile quantitative claims resting on a chain of visual inference from redacted-axis charts — a kind of argument we’re well-placed to audit. The rapid-response window has passed, but no independent quantitative audit of his chain exists yet. Do we still pursue this?

What Ord claimed, and what’s changed since

Ord inferred roughly 10⁶× inefficiency for RL post-training vs. pre-training from slope comparisons on redacted-axis charts, concluding we may be near an “effective limit.” The EA Forum Scaling Series (Feb 2026) raised its profile but produced no independent reconstruction of his numerical chain. Since then: BroRL introduced rollout scaling that appears to break RL plateaus; ScaleRL showed sigmoidal rather than power-law behavior with movable asymptotes; the Qwen team confirmed recipe-dependent scaling; David Manheim published a critique arguing the “Scaling Paradox” dissolves mathematically. Ord has since framed his predictions as being about “appreciable slowing in 2027+.”
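To make concrete what an independent reconstruction of that numerical chain could involve, here is a minimal sketch of one way a slope-comparison audit might be set up, assuming a simple log-linear score-versus-compute form. The data points, slopes, and resulting ratio are invented for illustration; they are not Ord's data, charts, or method.

```python
# Illustrative sketch only: the functional form and every number below are
# assumptions made up for this example -- not Ord's data or method.
import numpy as np

def fit_log_slope(compute, score):
    """Fit score ~ a + b*log10(compute); return slope b (score gained per 10x compute)."""
    b, _a = np.polyfit(np.log10(compute), score, 1)
    return b

# Hypothetical digitized chart points (compute in arbitrary relative units,
# since the published axes are redacted).
pretrain_compute = np.array([1e0, 1e1, 1e2, 1e3])
pretrain_score   = np.array([40.0, 48.0, 56.0, 64.0])   # assumed ~8 points per 10x compute
rl_compute       = np.array([1e0, 1e1, 1e2, 1e3])
rl_score         = np.array([40.0, 41.0, 42.0, 43.0])   # assumed ~1 point per 10x compute

b_pre = fit_log_slope(pretrain_compute, pretrain_score)
b_rl = fit_log_slope(rl_compute, rl_score)

# Compute multiplier each regime would need for the same score gain,
# and the implied relative inefficiency of RL post-training.
target_gain = 8.0  # points
mult_pre = 10 ** (target_gain / b_pre)
mult_rl = 10 ** (target_gain / b_rl)
print(f"pre-training slope: {b_pre:.2f} pts per 10x compute")
print(f"RL post-training slope: {b_rl:.2f} pts per 10x compute")
print(f"implied inefficiency for a {target_gain:.0f}-point gain: {mult_rl / mult_pre:,.0f}x")
```

An actual audit would digitize the published charts, test this log-linear assumption against the sigmoidal fits reported by ScaleRL, and propagate the uncertainty introduced by the redacted axes.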

As responses come in: running synthesis → (auto-generated from submissions, updated on refresh).

See also: our research prioritization prototype — filter for cause area: AI or Catastrophic risks to see the landscape of AI-relevant work we’ve been tracking. This internal discussion document also contains prior thinking. We’ll be building a shared discussion doc shortly; input from this form will be incorporated there.

Our most relevant prior evaluations in this space

Read these on the Global Catastrophic Risks & Consequences of AI hub:

  • Towards best practices in AGI safety and governance: A survey of expert opinion (Applied stream)
  • The Returns to Science In the Presence of Technological Risks (Applied stream)
  • Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament (Applied stream)
  • Existential Risk and Growth
  • Artificial Intelligence and Economic Growth

Only name and email are required — all discussion questions are optional, and will be incorporated into the shared discussion space. Skip anything that’s not relevant to you.

Who are you, and when can you meet?


A live call may be organised in May. All input — live or async — gets incorporated. Async responses are very welcome.

Where can the Unjournal add value in AI safety, governance, and impact?

All questions in this section are optional — answer as many or as few as are relevant to you.

What types of research in this space should we evaluate — and what pivotal questions should we focus on? Who should we engage with?

How do we stay timely enough to matter? Reviewer time is scarce and the relevant window for any given piece of AI research can be short.

AI-assisted evaluation tools: Our LLM evaluation explorer demonstrates automated claim extraction, citation lookup, and evidence mapping. The prioritization prototype supports triage across our paper pipeline. Could these form part of a faster, lighter AI-governance evaluation track?

Who should we bring into this project — as evaluators, field specialists, or discussants?

Feel free to suggest names, institutions, or types of expertise. We have strong applicants in the AI governance / safety space and want to be intentional about onboarding.

Should we consider expanding or pivoting to cover more technical AI safety research — without duplicating what the Alignment Journal and similar venues already do?

Should we still evaluate “How Well Does RL Scale?”

Both questions in this section are optional.

Quick context (recap from background above)

Toby Ord’s October 2025 post made a chain of quantitative inferences (slope inference from redacted charts → ~10⁶× inefficiency → “near effective limit” for RL post-training). The EA Forum Scaling Series week in Feb 2026 raised its profile but produced no independent audit of that chain. Since then the empirical picture has shifted significantly (BroRL, ScaleRL, Qwen, Manheim critique). Ord has since framed his predictions as being about 2027+.

A1 — Should we still pursue an evaluation of this work — given six months have passed and the field has moved substantially? Or has the moment passed?

If yes — what form should it take, and who’s in the loop?

A2–A3 — What form should it take, and who’s involved?

On form: narrow audit of Ord’s specific numerical chain vs. broader synthesis of the RL scaling evidence base (Ord + ScaleRL + BroRL + Qwen + Manheim)? Full evaluation with stamped ratings, or a rapid-response EA Forum / PubPub post? Shared public doc with incremental feedback? On who: should Ord be in the loop as a discussant/respondent, or kept at arm’s length for independence?

Suggest researchers, evaluators, forecasters, or policymakers with relevant expertise; include an email address if you have it.

Responses will be incorporated into the shared discussion document unless you check the box above.

Who’s been invited

Core team with AI governance/safety remit, plus evaluator candidates for the RL scaling work:

Lorenzo Pacchiardi, David Manheim, Tristan Williams, Andrei Potlogea, Gavin Taylor, Andrew Kao, Seth Benzell, Valentin Klotzbücher, Alex Foster, Florian Habermacher, Uma Kalkar, Anirudh Tagat, Elise Racine, Zhuoran Du, Arturo Macias, Anca Hanea

If you received this link and aren’t on the list, you’re still very welcome — note your name and affiliation in the form above.