Why we’re convening this now
We want to think through two related questions together, across one or more Zoom calls plus a shared document. This page collects async input and scheduling; all responses will feed into a shared discussion doc (mark anything sensitive as such).
Urgency: The Unjournal is navigating active grant and funding timelines, and moving forward meaningfully on AI-relevant evaluation work is both a strategic and an operational priority. We also want to take advantage of the current moment, in which this area is highly relevant to funders and policymakers alike.
Topic 1 — Where should the Unjournal focus? More than a year into tracking AI safety, governance, and social/economic impacts as a priority area, we haven’t committed to a clear strategy. What is our comparative advantage, what research should we evaluate, what pivotal questions should we foreground, and how do we stay timely?
Topic 2 — The Ord RL Scaling evaluation. In October 2025, Toby Ord published “How Well Does RL Scale?” (EA Forum), making high-profile quantitative claims resting on a chain of visual inference from redacted-axis charts — a kind of argument we’re well-placed to audit. The rapid-response window has passed, but no independent quantitative audit of his chain exists yet. Do we still pursue this?
What Ord claimed, and what’s changed since
Ord inferred roughly 10⁶× inefficiency for RL post-training vs. pre-training from slope comparisons on redacted-axis charts, concluding we may be near an “effective limit.” The EA Forum Scaling Series (Feb 2026) raised its profile but produced no independent reconstruction of his numerical chain. Since then:
- BroRL introduced rollout scaling that appears to break RL plateaus;
- ScaleRL showed sigmoidal rather than power-law behavior with movable asymptotes;
- the Qwen team confirmed recipe-dependent scaling;
- David Manheim published a critique arguing the “Scaling Paradox” dissolves mathematically.
Ord has since framed his predictions as being about “appreciable slowing in 2027+.”
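For orientation, the core inference an audit would re-derive is mechanically simple: on charts plotting score against log-compute, a ratio of visually estimated slopes converts into a compute-efficiency multiplier. The sketch below is a toy version with made-up slope values, not Ord's actual method or numbers; it only shows how a slope ratio of about 6 on log10 axes yields a factor of about 10⁶×.

```python
# Toy version of a slope-ratio inference on log-compute axes.
# All numbers are hypothetical, chosen only to illustrate the arithmetic.

slope_pretrain = 12.0  # assumed: score points gained per 10x pre-training compute
slope_rl = 2.0         # assumed: score points gained per 10x RL post-training compute

# Compute multiplier m that RL needs to match the gain pre-training gets from 10x:
# solve slope_rl * log10(m) = slope_pretrain * 1  =>  m = 10^(slope_pretrain / slope_rl)
m = 10 ** (slope_pretrain / slope_rl)
print(f"RL needs ~{m:.0e}x the compute for the same gain")  # -> 1e+06x
```

An actual evaluation would replace the assumed slopes with values reconstructed from the published charts and report how sensitive the 10⁶× figure is to those readings, which is exactly where the redacted axes make the chain fragile.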
As responses come in, a running synthesis will appear here (auto-generated from submissions, updated on refresh).
See also our research prioritization prototype (filter for cause area: AI or Catastrophic risks) for the landscape of AI-relevant work we’ve been tracking, and this internal discussion document for prior thinking. We’ll set up the shared discussion doc shortly; input from this form will be incorporated.
Our most relevant prior evaluations in this space
Read these on the Global Catastrophic Risks & Consequences of AI hub:
- Towards best practices in AGI safety and governance: A survey of expert opinion (Applied stream)
- The Returns to Science In the Presence of Technological Risks (Applied stream)
- Forecasting Existential Risks: Evidence from a Long-Run Forecasting Tournament (Applied stream)
- Existential Risk and Growth
- Artificial Intelligence and Economic Growth
Who’s been invited
Core team with AI governance/safety remit, plus evaluator candidates for the RL scaling work:
If you received this link and aren’t on the list, you’re still very welcome — note your name and affiliation in the form above.