Algorithmic pricing is not new, but newer software programs are much more “autonomous” than their precursors. Powered by Artificial Intelligence (AI), pricing algorithms can develop their pricing strategies from scratch, engaging in active experimentation and adapting to the evolving environment. In this learning process, they require little or no external guidance. Taken together with the diffusion and evolution of pricing algorithms, these developments raise various issues for competition policy, particularly as regards tacit collusion.
While so far no one has brought an antitrust case against autonomously colluding algorithms, antitrust agencies are discussing the problem seriously. In addition to the OECD, competition authorities in the US, Canada and the UK have held roundtables or issued papers on the topic.
This paper, available here, tries to understand whether tacit collusion arising from AI should be a real concern by looking, for the first time, at the emergence of collusive strategies among autonomous pricing algorithms. It takes an experimental approach, by constructing AI pricing agents and letting them interact repeatedly in computer-simulated marketplaces.
While any conclusions are necessarily tentative at this stage, the paper’s findings suggest that algorithmic collusion is not just a remote theoretical possibility. The results indicate that even relatively simple pricing algorithms systematically learn to play collusive strategies – without being designed or instructed to collude, without communicating with one another, and without prior knowledge of the environment in which they operate.
What is more, the paper not only shows that pricing algorithms learn to collude, but further suggests they may be better than humans at colluding tacitly. The experimental literature has consistently found that human subjects find it hard to coordinate without explicit communication. These algorithms, by contrast, display a stubborn propensity to collude without needing to communicate. These results raise the issue of whether current law and policy on tacit collusion is still adequate in the age of AI.
The paper is structured as follows:
Section 2 describes the class of Q-learning algorithms used in this paper.
All reinforcement learning algorithms share a common structure. They adapt their behaviour to past experience by adopting actions that have proven successful more frequently and unsuccessful ones less frequently. In this way, the algorithms learn to pursue optimal strategies with little or no prior knowledge of the particular problem at hand.
Beyond this basic structure, reinforcement learning comes in many different varieties. For the purposes of collusion, algorithms must have memory and be able to keep track of rivals’ past actions, since memoryless agents are unable to punish rivals’ past defections and thus cannot genuinely collude. Within the set of algorithms with memory, the authors selected Q-learning algorithms for their model for a number of reasons. First, Q-learning algorithms are designed expressly to maximise the present value of a flow of rewards in repeated choice contexts. Secondly, they are highly popular among computer scientists. Thirdly, they are simple and can be fully characterised by few parameters, the economic interpretation of which is clear. Lastly, Q-learning algorithms share the same architecture as the more sophisticated programs that have recently obtained spectacular successes, achieving superhuman performances in tasks such as playing Go, Atari video-games or chess.
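The basic mechanics can be sketched in a few lines. The following is a minimal illustration, not the authors' actual code: the parameter values are assumed, the price grid is discretised, and the state is taken to be the pair of last-period prices (so the agent has the one-period memory needed to react to a rival's defection).

```python
import numpy as np

n_prices = 15          # size of the discretised price grid (assumed)
alpha = 0.15           # learning rate (assumed)
delta = 0.95           # discount factor (assumed)

# One Q-value per (own last price, rival's last price, candidate price).
Q = np.zeros((n_prices, n_prices, n_prices))

def update(state, action, reward, next_state):
    """Standard tabular Q-learning update: nudge the old estimate toward
    the observed reward plus the discounted value of the best action
    available in the next state."""
    target = reward + delta * Q[next_state].max()
    Q[state][action] += alpha * (target - Q[state][action])
```

Actions that earn high rewards see their Q-values rise and are chosen more often, which is precisely the "successful actions more frequently" dynamic described above.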
The authors provide a history of the development of Q-learning algorithms and a technical overview of how they operate. I am not going to review this here, but if you are interested you know where to find this discussion. What matters here is that the authors chose to use simple, “tabular” Q-learning algorithms. While deep learning approaches that rely on iterative methods and neural network structures seem promising, their methodology is not yet settled; the simplicity of tabular Q-learning keeps possibly arbitrary modelling choices to a minimum.
Section 3 reviews the relevant economics and computer science literature.
A few papers have analysed how Q-learning algorithms perform in oligopoly settings, competing either against fixed-strategy opponents or against other Q-learners. Since computer scientists dominate the literature, the primary focus has been on algorithms’ performance in terms of ability to converge, speed of convergence and payoff. Relatively few studies have addressed the issue of whether Q-learning algorithms can learn to collude, and none provides clear evidence on this point.
One group of studies analyses pricing games specifically. All the papers in this group consider a model of staggered pricing where two firms alternate in moving, each committing to a price level for two periods. These papers further assume that a firm can condition its price only on its rival’s current price, and not on past history (including the firm’s own previous prices). In this setting, several works have found that algorithms adopt supra-competitive prices, at least to some extent.
Another group of papers has studied repeated Cournot games, where actions are strategic substitutes. Here, all players act simultaneously and have bounded recall (i.e. limited memory). These studies, too, find that outcomes are far from fully competitive: the average profit gain in their experiments was around 60%. When the action space is reduced to two feasible actions, both Bertrand and Cournot games collapse to a sort of prisoner’s dilemma, which is the focus of the third group of studies discussed in this paper.
There is a substantial literature on reinforcement learning in the context of repeated prisoner’s dilemma, which sometimes finds a significant amount of cooperation. However, prisoner’s dilemma is a special type of game, in that there is only one way to cooperate. It is often argued that tacit collusion is hard to achieve in practice because there are many supra-competitive prices, and players may find it difficult to coordinate in the absence of explicit communication; analyses based on the prisoner’s dilemma cannot address this concern.
Section 4 provides a detailed description of the economic environments where the simulations are performed.
The authors constructed a number of Q-learning algorithms and let them interact in a repeated Bertrand oligopoly setting (i.e. price competition at similar quality levels) where the algorithms choose prices. They selected a model of price competition with logit demand and constant marginal costs. This model has several parameters that can be modified to analyse factors such as vertical and horizontal product differentiation, or the level of demand. The authors also assume perfect monitoring – that is, the algorithms observe rivals’ past prices – which they regard as reasonable for markets where algorithmic pricing is used.
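The stage game can be sketched as follows. Parameter values and names here are illustrative assumptions, not the paper's calibration: `a` captures product quality (vertical differentiation), `mu` the degree of horizontal differentiation, `a0` the value of the outside good, and `c` the constant marginal cost.

```python
import numpy as np

def logit_demand(prices, a=2.0, a0=0.0, mu=0.25):
    """Market share of each firm under multinomial logit demand:
    a firm's share rises with its quality a and falls with its price,
    and mu governs how substitutable the products are."""
    prices = np.asarray(prices, dtype=float)
    u = np.exp((a - prices) / mu)
    return u / (u.sum() + np.exp(a0 / mu))

def profits(prices, c=1.0, **kw):
    """Per-firm stage profit: price-cost margin times market share."""
    prices = np.asarray(prices, dtype=float)
    return (prices - c) * logit_demand(prices, **kw)
```

Raising `mu` makes products less substitutable (more horizontal differentiation), which is one of the comparative statics the authors can vary across experiments.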
The model begins with the algorithms choosing prices in purely random fashion. As time passes, the algorithms take the profit-maximising option given the available information more and more frequently. For each set of parameters, an experiment consists of 1,000 sessions. In each session, agents play against the same opponents for a large number of periods: up to a billion plays, or until convergent behaviour – i.e. stable strategies – is achieved.
A challenge in analysing the algorithms’ behaviour is that, for strategic problems like this, computer science offers no general guarantee that the algorithms will converge or, if they do converge, that they will converge to a Nash equilibrium. Convergence can, however, be verified in practice. In the light of this, the authors use the following criterion: convergence is deemed to occur if each player’s optimal strategy does not change for 100,000 consecutive periods.
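The criterion is straightforward to implement. A sketch, assuming each player's strategy is summarised by the argmax of its Q-table in every state:

```python
import numpy as np

CONVERGENCE_WINDOW = 100_000  # periods of unchanged strategies required

def greedy_strategy(Q):
    """Map each state to the currently best action (the argmax over prices)."""
    return Q.argmax(axis=-1)

class ConvergenceChecker:
    """Declare convergence once every player's greedy strategy has been
    identical for CONVERGENCE_WINDOW consecutive periods."""

    def __init__(self):
        self.last = None
        self.stable_for = 0

    def step(self, Qs):
        """Qs: list of per-player Q-tables for the current period."""
        strategy = tuple(greedy_strategy(Q).tobytes() for Q in Qs)
        if strategy == self.last:
            self.stable_for += 1
        else:
            self.last, self.stable_for = strategy, 0
        return self.stable_for >= CONVERGENCE_WINDOW
```

Note that this tests stability of behaviour only; whether the stable point is a Nash equilibrium is a separate question, which the authors examine ex post.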
Section 5 reports the main results, while Section 6 performs a number of robustness checks.
In more than 99.9% of the sessions the authors obtained convergence, i.e. the algorithms settled on stable strategies. Furthermore, prices are quite stable upon convergence. In most cases – and the more so, the more the algorithms are allowed to experiment – the algorithms converged to an optimal strategy, i.e. a Nash equilibrium of the repeated game with bounded recall. This suggests that the algorithms learn to play Nash equilibria, provided that they are given a fair chance to experiment and learn.
A key implication is that once the learning process is complete, the algorithms cannot be exploited, no matter how smart the opponent is. Once they are trained, the algorithms systematically learn to raise prices, charging supra-competitive prices and earning supra-competitive profits that reflect partial collusion. Prices are rarely as high as monopoly prices, but are almost always higher than in a Bertrand-Nash equilibrium. Price dispersion is low, and firms tend to price symmetrically.
The authors then test whether this result was achieved through collusion, understood as “a situation in which firms use a reward/punishment scheme to coordinate their behaviour for the purpose of producing a supracompetitive outcome.” In other words, what is crucial is not the level of profits as such, but the way the supra-competitive result is achieved. Even extremely high profit levels may be regarded as collusive only insofar as deviations that are profitable in the short run would trigger a punishment.
In short, the authors find not only that algorithmic competition leads to supra-competitive profits, but also that such profits are achieved in collusion-like ways. First, the fact that the profit gain tends to increase with the number of plays suggests that Q-learning algorithms learn to collude. Secondly, in economic environments where collusion cannot arise in equilibrium, the algorithms learn instead to set competitive prices. Lastly, going back to settings where collusion is possible, defections from the collusive price are punished, and the punishment makes defections unprofitable (‘off-path punishments that have a finite duration, with a slow return to the pre-deviation price’). Furthermore, the algorithms learn to achieve tacitly collusive outcomes with no prior knowledge of the environment in which they operate. In addition, they leave no trace whatsoever of concerted action: they do not communicate with one another, nor have they been designed or instructed to collude.
From the standpoint of competition policy, these findings should clearly ring a bell. They suggest that with the advent of Artificial Intelligence and algorithmic pricing, tacit collusion may become more prevalent, heightening the risk that tolerant antitrust policy may produce too many false negatives and so possibly calling for policy adjustments.
The relevance of this paper’s results is obvious, assuming they are robust enough. The question then turns on what to do to address the challenges created by algorithmic tacit collusion, particularly given that tacit collusion is typically not prohibited under competition rules, since there are no communications, agreements or concerted practices to sanction.
This is the topic discussed in the paper below.