A New Method to Distribute Credit Fairly Among AI Search Sources

The new MaxShapley method addresses how credit for answers produced by generative search engines can be distributed fairly among the data sources behind them. When a search engine no longer just lists links, and a large language model instead compiles an answer from multiple documents, the question arises of who should be paid and how much.

The concept is based on the Shapley value, familiar from economics, which describes how much each player (in this case, an individual document) contributes to the collective outcome. Calculating Shapley values exactly is very expensive in practice, however, because every possible subset of sources has to be evaluated, so computation time grows exponentially with the number of sources.
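To make the cost concrete, here is a minimal Python sketch of the exact Shapley computation. It is only an illustration: the function names, the three documents, and the toy utility (answer quality equals the best relevance score among the documents used) are assumptions made for this example, not details taken from the paper.

```python
from itertools import combinations
from math import factorial


def exact_shapley(players, utility):
    """Exact Shapley values by enumerating every coalition (exponential cost)."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability that exactly this coalition precedes p in a random ordering.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of p when it joins coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values


# Hypothetical relevance scores for three retrieved documents.
relevance = {"doc_a": 0.9, "doc_b": 0.6, "doc_c": 0.3}


def toy_utility(docs):
    # Assumed toy utility: the best score among the documents used, 0 if none.
    return max((relevance[d] for d in docs), default=0.0)


shapley = exact_shapley(list(relevance), toy_utility)
# Approximately doc_a: 0.55, doc_b: 0.25, doc_c: 0.10 (sums to 0.9)
```

With n documents, the inner loops visit 2^(n-1) coalitions per document, which is why this direct approach stops being practical after a few dozen sources.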

MaxShapley is a special case of the Shapley value that uses a so-called decomposable max-sum utility function. Because of this structure, the contribution of each source can be calculated in time linear in the number of documents, which is dramatically faster than the traditional Shapley approach. The method is designed specifically for AI search systems that use retrieval-augmented generation: first, multiple documents are retrieved, then a language model combines them into an answer.
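The following Python sketch suggests why a max-sum utility avoids the exponential blow-up. It assumes, purely for illustration, that the utility of a document set S is the sum over answer "key points" j of the best score any document in S achieves for that point, with non-negative scores. The names `max_game_shapley`, `max_sum_attribution`, and the score matrix are hypothetical, and this is not claimed to be the paper's exact algorithm. Since Shapley values are additive, each key point can be handled separately, and a pure max game has a well-known closed form (the same one used in the classic airport cost-sharing game).

```python
def max_game_shapley(scores):
    """Shapley values for the game v(S) = max(scores[i] for i in S), v(empty) = 0.

    Assumes non-negative scores.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # document indices, ascending score
    values = [0.0] * n
    running = 0.0
    previous = 0.0
    for rank, i in enumerate(order):
        # The step from the previous score level to this one is shared equally
        # by the n - rank documents that score at least this much.
        running += (scores[i] - previous) / (n - rank)
        values[i] = running
        previous = scores[i]
    return values


def max_sum_attribution(relevance):
    """relevance[j][i]: assumed score of document i for key point j.

    Utility: v(S) = sum over j of max over i in S of relevance[j][i].
    """
    n_docs = len(relevance[0])
    totals = [0.0] * n_docs
    for row in relevance:
        # Additivity of the Shapley value: sum the per-key-point attributions.
        for i, v in enumerate(max_game_shapley(row)):
            totals[i] += v
    return totals


# Hypothetical scores: 2 key points (rows) x 3 documents (columns).
relevance = [
    [0.9, 0.6, 0.3],
    [0.0, 0.8, 0.8],
]
attribution = max_sum_attribution(relevance)
# Approximately [0.55, 0.65, 0.50], summing to the full utility 0.9 + 0.8 = 1.7
```

In this sketch each document is scored once per key point, so the number of score evaluations grows linearly with the number of documents; the only extra work is one sort per key point, with no enumeration of subsets.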

MaxShapley was tested on three multi-hop question-answering datasets: HotPotQA, MuSiQUE, and MS MARCO. In these tasks, finding an answer requires combining multiple sources, which makes them a natural testbed for measuring source contribution.

The research shows that it is possible to develop a computationally efficient method that takes the contribution of content creators in generative search engines seriously. Such mechanisms could be central if a sustainable economic model for AI-based search is to be built.

Source: MaxShapley: Towards Incentive-compatible Generative Search with Fair Context Attribution, ArXiv (AI).