🟩 Anthropic and OpenAI release new models, US-Iran talks continue, illegal biolab in the US, nuclear arms treaty expires | Global Risks Weekly Roundup #6/2026
Executive summary
Geopolitics: American and Iranian negotiators met for nuclear talks in Oman. No agreement was reached, but both sides said the talks will continue. The New START nuclear arms control treaty between the US and Russia expired.
Will Iran’s current regime and the US reach any agreement on Iran’s nuclear program before July 1, 2026? Forecasters estimate a 17% (5.0% to 32%) probability.
What is the chance that any of the US, China or Russia will start any negotiations towards a new nuclear arms treaty by the end of 2026? Forecasters estimate around even odds, 52% (40% to 65%).
What is the chance that a new country will acquire nuclear weapons by 2030? Forecasters give a 5.3% (1.0% to 15%) chance to this outcome.
Technology and AI: Anthropic and OpenAI released their most powerful models to date. Both companies said that the models were mostly built by AIs, and that some of their benchmarks designed to detect dangerous capabilities have been saturated. METR assessed GPT-5.2 (high) to be state of the art on its time horizon task suite.
Will METR assess any AI to have a time horizon of 100 hours by the end of 2026? Forecasters give an aggregate probability of 7.6% (3% to 20%).
Economics: The US and India reached a trade deal. Both countries will slash tariffs on each other’s exports, and India agreed to stop buying Russian oil.
Technology and artificial intelligence
Anthropic released its latest model, Claude Opus 4.6. OpenAI, meanwhile, released its latest model, GPT-5.3-Codex. Both companies say that these new models were mostly built by their own AI models.
Anthropic said that some of its benchmarks for determining dangerous capabilities have now been saturated. For AI R&D capabilities, this meant that it instead relied on a survey of 16 employees to determine whether Claude Opus 4.6 could automate the work of an entry-level Anthropic researcher, given three months of scaffolding and tooling improvements. OpenAI similarly said that “we cannot rule out” the possibility that GPT-5.3-Codex poses a “High” cyber risk.
The UK Government’s AI Security Institute also red-teamed both models and found no instances of Claude Opus 4.6 engaging in “research sabotage”. Apollo Research, on the other hand, said it observed high levels of evaluation awareness in Claude Opus 4.6 (the model often recognized that it was being evaluated), and therefore declined to provide a formal assessment of the model.
METR’s Chris Painter said:
My bio says I work on AGI preparedness, so I want to clarify:
We are not prepared.
Over the last year, dangerous capability evaluations have moved into a state where it’s difficult to find any Q&A benchmark that models don’t saturate.
METR evaluated GPT-5.2 (high) on multi-step software and reasoning tasks and found that the model is state of the art on both its 50% and 80% time horizon suites. However, forecasters debate the degree to which this will translate into real-world outcomes, including in the economic and catastrophic risk domains. They believe there’s a 7.6% (3% to 20%) chance that METR will assess any model to have a 50% time horizon of at least 100 hours by the end of 2026. This might happen if time horizon growth becomes nonlinear once models reach time horizons of a full working day, or through better-understood scaffolding.
The 2026 International AI Safety Report was published. Backed by over 30 countries and international organizations, the report finds that real-world evidence for several different risk categories is growing, including cyberthreats, biological and chemical threats, and loss of control risks. Notably, the US withheld support from this report.
Spain hosted a summit on the responsible use of AI in the military, but only 35 of the 85 participating countries signed the joint declaration. Neither the US nor China signed.
Large reasoning models are good at finding jailbreaks for other such models. This is interesting because, much as defensive measures might improve with model capability, so might the ability to defeat those measures. A more casual treatment of the topic shows Gemini jailbreaking Opus.
And: The UK and Microsoft will work together to build a deepfake detection system. Banks are struggling to cope with AI-generated fake KYC documents. OpenAI is reportedly unsatisfied with some of Nvidia’s latest AI chips and has been seeking alternatives. AI models can now rent a human, and a shady ecosystem of services for AI models is emerging more broadly. Baidu rolled out fully driverless taxis in Dubai.
Geopolitics
Middle East
Negotiators from the US and Iran met in Oman for talks over Iran’s nuclear program. Nothing conclusive was agreed, but Trump said that the talks were “very good”, and both sides said that they would keep talking. The US military buildup in the region continues, with a focus on moving Patriot and THAAD air defense systems into the region. Forecasters believe there’s a 17% (5.0% to 32%) probability that the US and Iran’s current regime will reach a new agreement on Iran’s nuclear program before July 1, 2026.
Europe
The US-Russia New START Treaty expired. Negotiated by the Obama and Medvedev administrations in 2010 and extended by the Biden and Putin administrations in 2021, it capped each nation’s strategic nuclear arsenal.
The US called for negotiations with Russia and China on a new treaty, while accusing China of carrying out a secret nuclear test in 2020.
Forecasters think there’s a 52% (40% to 65%) chance that negotiations on a new nuclear arms control treaty involving the US, Russia or China will begin before 2027, and a 5.3% (1.0% to 15%) chance that at least one country that currently doesn’t have nuclear weapons will possess them by 2030.
Ukraine’s President Zelensky says that Trump gave Ukraine and Russia a June deadline to reach an agreement to end the war. Forecasters are sceptical that an agreement will be reached by then.
Asia
China warned that US arms sales to Taiwan could jeopardise Trump’s visit to Beijing in April, though forecasters generally think the meeting will go ahead as planned. President Xi also said that Taiwan is the “most important” issue in a phone call with Donald Trump.
A series of coordinated attacks across Balochistan in Pakistan, claimed by the Balochistan Liberation Army (BLA), killed dozens after assailants struck hospitals, schools, banks and markets in nine cities. The violence comes amid rising ethnic unrest in the border regions of Pakistan, Iran and Afghanistan. Taliban officials have expressed support for Baloch separatists.
Biorisk
Investigators found a biological lab at a home in Las Vegas, and said that it could be linked to a similar illegal facility found in California. The California lab was allegedly run by a Chinese citizen who is currently in federal custody, and contained “pathogen-labeled containers” with labels such as “dengue fever”, “HIV” and “malaria,” along with hundreds of mice.
Antibodies to H5N1 bird flu have been detected in a dairy herd in the Netherlands.
In Bangladesh, a fruit bat virus, Pteropine orthoreovirus, was found in 5 human throat swabs collected between December 2022 and March 2023 from patients with severe respiratory illness or encephalitis. The patients were initially thought to be infected with Nipah virus; one patient died. This adds the virus to the list of zoonotic fruit bat viruses that can cause serious disease in humans.
Economy
The US and India struck a trade deal. Total US tariffs on imports from India will be cut from 50% to 18%, while India agreed to stop buying Russian oil and to slash its own tariffs on imports from the United States. India has long been viewed as highly protectionist, but has in the last year reached trade agreements with Britain, the European Union and the United States.
US software and services stocks lost $1T last week before partially rebounding, as investors worried about how AI could disrupt these industries.
Nature and Climate
Solar active region 4366 threw off another X-class flare but has calmed down over the past week. Additional X-class and smaller solar flares are possible from the region, but the risk that the region will produce a solar flare that triggers a dangerous coronal mass ejection (CME) is now almost zero.

Wikipedia tells me that the last official nuclear test (not including North Korea) occurred in the 90s. Based on the single-sentence mention of the U.S. accusing China of testing in 2020, Sentinel doesn't seem to think it's a big deal. That being said, non-U.S. orgs have said they didn't detect any sort of test, so who knows.
To my inexperienced self, it seems like a not-small deal that a major nation is carrying out nuclear tests in secret given the Comprehensive Nuclear-Test-Ban Treaty of 1996. I recognize that neither the U.S. nor China has ratified it, so perhaps it's just a "hey look at us we're committed to peace on paper kind of but not actually because we're continuing development".
Is that a fair assessment? Do you place any probability on the China test being legitimate or the U.S. making the accusation for some other strategic reason?