AI Agent Learns to Weigh the Cost of Rule Violations Before Deciding

A new approach to designing autonomous AI agents teaches machines to consider in advance what follows from breaking rules, and when a violation might still be justified. The work, published by Vineel Tummala and Daniela Inclezan on arXiv, offers a framework in which an AI agent not only tries to adhere to rules but also weighs the penalties attached to violating them.

The work centers on so-called policy-aware agents. They operate in an environment with various regulations, obligations, and rights, much as humans operate within a framework of laws and rules. Traditionally, research has focused on keeping AI strictly within the boundaries of those rules. Tummala and Inclezan's framework, however, considers situations where breaking a rule might be necessary to achieve an important goal, such as in high-stakes decisions.

The research extends the Authorization and Obligation Policy Language (AOPL), developed by Gelfond and Lobo, by adding explicit penalties for rule violations. For decision-making, it uses Answer Set Programming, a form of declarative logic programming, which allows the agent to compute the consequences of different courses of action.

According to the authors, the approach ensures that policies (sets of rules) are well-formed, accounts for priorities between different rules, and improves explainability. The agent can justify its actions by showing which penalties it anticipated and why breaking a particular rule was still worthwhile.
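
To make the idea of weighing penalties and then justifying the choice more concrete, here is a minimal sketch in Python. It is not the authors' implementation and uses neither AOPL nor Answer Set Programming: the `Rule` and `Plan` classes, the delivery scenario, the deadline, and the penalty values are all hypothetical and invented for illustration. The sketch assumes the agent must meet a deadline and, among the plans that do, prefers the one with the lowest total penalty.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """A hypothetical policy rule: performing `forbidden_action` costs `penalty`."""
    name: str
    forbidden_action: str
    penalty: int


@dataclass(frozen=True)
class Plan:
    """A hypothetical candidate plan: a sequence of actions and how long it takes."""
    name: str
    actions: tuple
    duration: int


def broken_rules(plan, rules):
    """Return the rules that the plan violates."""
    return [r for r in rules if r.forbidden_action in plan.actions]


def choose_plan(plans, rules, deadline):
    """Among plans that meet the deadline, pick the lowest total penalty.

    Ties are broken in favor of the shorter plan. Returns None if no
    plan meets the deadline.
    """
    feasible = [p for p in plans if p.duration <= deadline]
    if not feasible:
        return None

    def cost(p):
        total_penalty = sum(r.penalty for r in broken_rules(p, rules))
        return (total_penalty, p.duration)

    return min(feasible, key=cost)


def explain(plan, rules):
    """Produce a short justification listing the anticipated penalties."""
    broken = broken_rules(plan, rules)
    if not broken:
        return f"{plan.name}: no rules violated."
    parts = ", ".join(f"{r.name} (penalty {r.penalty})" for r in broken)
    total = sum(r.penalty for r in broken)
    return f"{plan.name}: violates {parts}; total penalty {total}."


# Illustrative data only; none of these values come from the paper.
RULES = [
    Rule("no_restricted_zone", "cross_restricted_zone", 3),
    Rule("speed_limit", "speed", 5),
]
PLANS = [
    Plan("compliant", ("drive_main_road",), 12),
    Plan("shortcut", ("cross_restricted_zone",), 6),
    Plan("fast", ("drive_main_road", "speed"), 5),
]

best = choose_plan(PLANS, RULES, deadline=8)
print(explain(best, RULES))
# prints "shortcut: violates no_restricted_zone (penalty 3); total penalty 3."
```

With a deadline of 8, the fully compliant plan is too slow, so the sketch picks the violation with the smaller penalty and reports it. With a generous deadline such as 20, the same function returns the compliant plan, since its total penalty is zero.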

According to the authors, such modeling of not just perfect obedience but also realistic rule-breaking behavior can help policymakers simulate human-like decision-making before the introduction of new rules.

Source: Autonomous Agents and Policy Compliance: A Framework for Reasoning About Penalties, arXiv (AI).