Published December 28, 2023 | Version v1
Publication

Incremental AI Risks from Proxy-Simulations

Description

Numerical simulations are versatile predictive tools that permit exploration of complex systems. The ability of LLM agents to run simulations of real-world scenarios will expand the AI risk landscape. In the proxy-simulation threat model, a user (or a deceptively aligned AI) can obfuscate the goal behind simulation-based predictions by exploiting the generalizability of simulation tools. Three highly idealized proxy-simulation examples are presented that illustrate how damage, casualties, and the concealment of illegal activities can be planned for under obfuscation. This approach bypasses existing alignment and safety filters (GPT-4, Claude 2, and Llama 2). AI-enabled simulations facilitate access to prediction-based planning that is not otherwise readily available. To the extent that goal obfuscation is possible, this increases AI risk.

Additional details

Identifiers

URL
https://hal.science/hal-04365629
URN
urn:oai:HAL:hal-04365629v1

Origin repository

UNICA