The Unjournal · Internal · April 2026

Response Synthesis

AI Safety & Governance Evaluation — aggregated from team responses

9 responses received

Josephine Schwab (2026-04-25) · David Manheim (2026-04-21) · Lorenzo Pacchiardi (2026-04-17) · Zhuoran Du (2026-04-17) · Uma Kalkar (2026-04-17) · Pía Garavaglia (2026-04-16) · Anirudh Tagat (2026-04-16) · Florian Habermacher (2026-04-16) · Andrew Kao (2026-04-15)

Synthesis generated 2026-06-07 08:18 UTC

Strategic Direction

Research focus and pivotal questions

Responses cluster around two broad areas, with a clear tension between fit and impact.

The AI × economics/labor cluster draws the strongest convergence. Pacchiardi, Garavaglia, Du, Habermacher, Kao, and Tagat all point here, with framings ranging from macro-trend tracking à la Ord to workplace retraining, privacy, and AI literacy/inequality. Habermacher pushes for politically grounded ("realpolitik") regulatory frameworks for labor impacts, though DR notes this needs concrete examples: what methods, what producers, what decision-relevance? Tagat observes that there is already substantial labor-impacts work in the pipeline, which from UJ's curation standpoint is a feature, not a bug. DR: "Why is it a 'warning'? That would seem to be a good thing. We're not the ones doing the research; we're the ones prioritizing it, curating it, commissioning its evaluation." Kao additionally suggests evaluating high-effort informal pieces such as Trammell and Patel's post on capital in the 22nd century and Citrini Research's 2028 global intelligence crisis. The open question DR flags: does this cluster reach the highest-GCR-impact issues, or stay comfortably mid-impact?

The governance/risk-parameters cluster (Schwab, Kalkar) is more GCR-adjacent but stretches UJ's comfort zone. Kalkar proposes evaluable questions on which governance mechanisms actually change frontier lab behavior, and under what conditions, as well as quantitative risk-parameter estimation (DR: "This seems quantitative, which is more comfortable to us"). Schwab pushes toward middle-power regulatory strategies and lessons from AI-adjacent risk domains such as arms control, including how China has cooperated in those domains and which treaties it has ratified. DR flags this as a possible move away from UJ's quantitative/empirical comfort zone toward IR-style analysis, and wants concrete examples before committing.

Staying timely

Convergent process proposals:

Fast-track with one evaluator. Manheim, Garavaglia, and Kao agree that failing to secure more than one evaluator should not block publication. Manheim proposes inviting multiple reviewers and moving forward as soon as one evaluation is submitted.

Pre-booking evaluator time. Pacchiardi suggests reserving reviewer time before paper selection, paying more so evaluators can drop other priorities, and targeting specific cruxes so that reviewers can be useful with only ~1 hour of work. DR: "This seems like a great idea to me. Talked about it in the past, as well as giving evaluators some limited choice over which ones they would like to evaluate."

Commentary roundups. Kao proposes ACX/Zvi-style roundups that cluster others' online comments with light discussion, as a complement to formal evaluation. Manheim adds very fast review of forum posts and of normally non-peer-reviewable documents such as model cards and think-tank technical reports. DR notes this last proposal sits awkwardly with the consensus against technical-safety expansion below.

AI assistance with a human in the loop. Garavaglia endorses AI-assisted briefings and Kao calls AI assistance obviously helpful; Kalkar cautions against deeper automation for quality reasons. Tagat suggests an NBER-style pipeline with quicker turnaround.

Who to bring in

Specific names: Jonathan Prunty (CFI Cambridge) and Marko Tesic (DSIT) for labor markets (Pacchiardi). Networks: RAND TASP Fellows and GovAI Fellows (Kalkar; Manheim also suggests GovAI), BlueDot (Manheim), the AI Slack channel for crowdsourcing (Tagat), and the ILO and AI developers themselves (Du). Tagat also recommends approaching AI/x-risk funders (e.g., Schmidt) to offer open evaluations of the work they commission or fund.

The Ord RL Scaling Evaluation

Should we still pursue it?

The picture has changed materially. Before Ord's April 29 reply, views ranged from "moment passed" (Schwab) and "not top priority" (Habermacher) to genuinely on the fence (Pacchiardi, Manheim) to an enthusiastic first-mover framing (Kalkar, who sees a window of opportunity given Ord's reframing of his predictions to 2027+).

Ord's April 29 reply substantially reopens the question. Three updates matter:

  1. Ord endorses formal analysis: "I'd be very happy for there to be more formal analysis of these questions (mine were very quick analyses)." He won't participate but doesn't object.
  2. There is an unwritten wrap-up essay Ord intended for the EA Forum Scaling Series but won't write — a gap a UJ synthesis could directly fill.
  3. He clarified his quantitative position more precisely than anywhere public: speed was ~60% compute / 40% other in the GPT-2/3/4 era; ~half the compute boost is going away, leaving ~30% + 40% = 70% of prior speed absent new compensating factors. This is not "RL has hit a fundamental limit" — it is a claim about declining relative contribution, with explicit room for algorithms, data, RL environments, or recursive self-improvement to compensate.
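As a rough check on the arithmetic in point 3, here is a minimal Python sketch of the decomposition. It assumes progress speed is a simple additive sum of a compute share and an "other" share; the additive model and the function name `remaining_speed` are illustrative assumptions, not Ord's stated method.

```python
def remaining_speed(compute_share: float = 0.6, compute_loss: float = 0.5) -> float:
    """Fraction of prior progress speed left after part of the compute boost disappears.

    Hypothetical additive model (an assumption for illustration):
    speed = compute_share + other_share, where other_share = 1 - compute_share,
    and no new compensating factors (algorithms, data, RL environments) are added.
    """
    other_share = 1.0 - compute_share
    # Shrink only the compute contribution; the "other" contribution is unchanged.
    return compute_share * (1.0 - compute_loss) + other_share


# Default inputs mirror the ~60% / 40% split with ~half the compute boost gone.
print(f"{remaining_speed():.0%}")
```

Varying `compute_loss` shows how sensitive the ~70% figure is to the "~half" assumption, which is the kind of narrow crux a short targeted review could probe.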

This gap between Ord's actual modest position and the "near effective limit" headline reading is itself worth documenting. The "moment passed" view should be updated accordingly.

If yes — what form?

Kalkar's proposal is the most concrete: a long-form EA Forum/PubPub post separating empirically established, contested, and open claims, with a draft circulated to the GovAI community (including Ord). Given Ord's endorsement, this now looks tractable. A crux-targeted, ~1-hour review in the style Pacchiardi proposes, focused on the specific 60/40 → 30/40 claim, could be a lightweight complement.

Points of apparent consensus

  • AI × economics/labor is squarely in UJ's wheelhouse and well-supplied with evaluable work.
  • Process should accept single-evaluator publication when needed; pre-booking and crux-targeting are promising.
  • Broad reluctance to expand into technical AI safety — though DR flags that rapid, credible evaluation of non-alignment technical safety work may genuinely be uncovered.
  • GovAI, RAND TASP, and the existing AI Slack are the right first-call networks.

Key tensions worth discussing

  • Wheelhouse vs. GCR-relevance: labor-market work fits comfortably but may not reach the highest-impact questions; governance and risk-parameter work is more GCR-relevant but stretches expertise.
  • Model cards/technical reports vs. anti-expansion consensus: Manheim's proposal arguably is technical-safety territory.
  • Speed vs. rigor: single-evaluator fast-track and AI assistance vs. Kalkar's quality concerns.
  • The Ord evaluation specifically: respondents leaned skeptical, but Ord's own endorsement and newly precise quantitative framing materially update the calculus — worth revisiting now.
Individual responses (9)
Josephine Schwab — ai_researcher 2026-04-25
Research focus and pivotal questions
The limits of middle power regulatory strategies and regulatory export in AI – effective middle power strategies for AI regulation and risk management. International cooperation on cross-border AI incidents. Global AI red lines convergence. Regulatory gaps and overlapping architectures. Learnings from AI-adjacent risk domains (arms control, biological and chemical prohibitions), especially in relation to AI superpowers – i.e. how has China cooperated in other risk domains? What treaties has it ratified? How can we leverage Chinese state approaches to transparency in AI governance?
Technical AI safety expansion
I would say stick to AI governance, yes.
Ord evaluation — pursue?
Moment passed.
David Manheim — field_specialist 2026-04-21
Research focus and pivotal questions
Very fast turnaround on forum posts and possibly peer review of non-peer reviewable documents (Model cards, technical reports from think tanks, etc)
Staying timely
Fast track, 1+ evaluators, invites to more than 1 reviewer and move forward as soon as 1 is submitted.
Who to bring in
Unsure
Technical AI safety expansion
No, very much opposed to trying to compete on this.
Ord evaluation — pursue?
Meh, still not sure we have a good place for a take on this.
Others to involve
GovAI folks, maybe BlueDot?
Availability
Prefer async; at ISO conference the full week of Apr 21
Lorenzo Pacchiardi — ai_researcher 2026-04-17
Research focus and pivotal questions
I agree that AI-economics interface (eg impacts on labour market [Anthropic's work] or macro-trends à la Ord) seems the most relevant area for the Unjournal to focus on.
Staying timely
What about "pre-booking" evaluator's time before deciding what paper to review so that you can choose a paper and are sure that someone will be able to look at that in a timely manner (of course depends on having reviewers who are flexible enough in topics they can look at but maybe this is possible for macro-impact tracking works?)

Otherwise, simply paying more allows people to drop other priorities.

Also agree that targeting very specific cruxes in a paper (eg highlighted from the evaluation explorer) could be more efficient, for instance requiring reviewers to be useful with 1 hour of work only
Who to bring in
The following two people are knowledgeable about AI's impact on labour market
- Jonathan Prunty (Leverhulme Centre for the Future of Intelligence, University of Cambridge)
- Marko Tesic (DSIT, UK government)
Technical AI safety expansion
I think it's hard to find a useful niche considering Alignment Journal and traditional AI conferences
Ord evaluation — pursue?
I have given my take on this before and I am still on the fence on the utility of this.
Availability
Quite busy the next couple of weeks; would hold off on this unless super useful
Zhuoran Du 2026-04-17
Research focus and pivotal questions
Perhaps the effect of AI on labour welfare, AI on privacy and data security, AI on organisational reform, and AI literacy and inequality, etc.
Who to bring in
International Labour Organisation (for AI with labour welfare)
AI developers are important
Uma Kalkar — ai_researcher 2026-04-17
Research focus and pivotal questions
There's a gap in regulatory interventions, comparative international governance analysis, and estimates of risk parameters for AI safety and governance. More research into strong external scrutiny could help give Unjournal a competitive edge.

Examples of possible RQs:
-- What governance mechanisms actually change frontier lab behavior, and under what conditions?
-- How do capability thresholds translate into tractable regulatory triggers?
-- What predicts adoption vs. resistance for cross-national diffusion of AI governance frameworks?
Staying timely
The LLM eval + prioritization tool sound like good ways to (a) decipher what pieces are relevant/urgent and (b) verify/check their claims with humans in the loop across the process. That tension of rigor vs. speed will always be there; I don't think I would suggest automating the process more for fear of possibly impacting standards/quality.
Who to bring in
-- Reach out to the RAND TASP Fellows
-- Consider the current/former GovAI Fellows
(These cohorts will already be specializing/focusing on elements of AI safety and governance research so it may be easier to get them to help review)
Technical AI safety expansion
I would focus on nailing down the safety/governance angle first, then look at expansion (but I'm wary of that because there is already good work by ARC Evals and METR/Epoch/etc).
Ord evaluation — pursue?
Given that the empirical picture has become messier and more contested, it may actually make sense to be a "first-mover" and evaluate it now. Given that Toby Ord has reframed his predictions to 2027+, there's a window of opportunity to assess in light of the other RL papers.
Ord evaluation — what form
Strongly suggest a long-form EA Forum/PubPub post. It should cover the full evidence base listed above with clear separation between what's empirically established, what's contested, and where further research is needed.

Would definitely circulate a first draft for edits across the GovAI community (Toby Ord included).
Availability
async only in mid-May
Pía Garavaglia — economist 2026-04-16
Research focus and pivotal questions
Impact in the workplace and labour markets, specifically addressing how to shift and train workers' skills and how to develop strategic regulatory frameworks
Staying timely
One evaluator + AI assisted briefing
Technical AI safety expansion
I think it's important to analyze theoretical frameworks without missing out on the actual application/testing. Dedicated evaluators could work better towards that aim.
Anirudh Tagat — uj_team 2026-04-16
Research focus and pivotal questions
The research should ideally be AI-related or impacts of AI in economic and social domains. There is already a lot of ongoing work related to labor market impacts.

I think we should engage with funders of AI and catastrophic risk, alignment, etc. (I think Schmidt is interested, but they are only funding research right now). It might be useful to reach out to them to provide an open evaluation of the work that they are commissioning / providing grants to. This way we also get to engage with AI researchers working at the forefront (at least in economics, broadly).
Staying timely
Look for papers coming out of top research centres around AI and economic/social impacts. Thinking of an NBER-track type evaluation pipeline, but with much quicker turnarounds.
Who to bring in
Sending out the mailer to everyone in the AI slack channel is a good idea, especially for crowdsourcing ideas on this. I think if we can locate stakeholders through these existing networks, that would be helpful.
Technical AI safety expansion
I am not well versed in this domain, but I would recommend sticking to our domain expertise (in terms of work we have already evaluated).
Florian Habermacher — economist 2026-04-16
Research focus and pivotal questions
Neglected ones! Comes to mind:
- politically grounded (I mean 'realpolitik' type not 'purely ivory tower abstract') Regulatory Frameworks for dealing with labor impacts (and maybe with social impacts more broadly)
Ord evaluation — pursue?
I'm not super informed, but it sounds like from a general perspective it might not be top priority.
Availability
Every evening from 6 PM CEST until midnight CEST (most days)
Every Sat and Sun from 8 AM CEST until midnight CEST (most days)
Mon-Fri CET office hours (8 AM CEST until 6 PM CEST): variable availability (50% available 50% unavailable)
Andrew Kao — field_specialist 2026-04-15
Research focus and pivotal questions
The typical things in this space that Unjournal is already evaluating (e.g., new working papers in social science) are great.
But also worth considering evaluations of slightly less formal (but still high effort) pieces: as two examples, Phil Trammell and Dwarkesh Patel's post on Capital in the 22nd Century https://substack.com/@philiptrammell/p-182789127 and Citrini Research's 2028 global intelligence crisis https://www.citriniresearch.com/p/2028gic
Staying timely
AI assistance is obviously helpful. I think having multiple evaluators is still valuable, but if it seems difficult to secure >1 opinion then it should not be an obstacle to publishing an evaluation.

Separately, I wonder if something along the lines of ACX/Zvi Mowshowitz-style 'commentary roundups' that present clusters of comments made by others online + light discussion could be useful. This would be a complement to, not a substitute for, existing eval effort. For this, I think the relevant question is whether the typical reader of an evaluation is plugged into the discourse enough to already know the prevailing sentiment/feedback towards a piece.
Availability
Likely async only, next few weeks very busy.