Arbitrage Trading with the Floyd-Warshall Algorithm

Introduction

In the following post, I want to showcase the possibilty of identifying arbitrage opportunities in foreign exchange markets using the Floyd-Warshall algorithm. The basic idea is to represent all possible trading pairs of different currencies in a weighted directed graph and then (after performing an additional transformation of arc weights) use the algorithm to identify whether the graph contains any negative cycles. If this is the case, we managed to find a trading sequence that yields a net positive.

I really like this little programming exercise for two reasons:
Firstly, the implementation of the algorithm is straight forward and very close to the pseudo code provided below, especially when using Python. There is only little programming knowledge needed to follow along.
Secondly, students at our business school tend to gravitate towards the more glamorous areas of a business degree, e.g. finance, consulting, entrepreneurship, and show only little love for operations research. This little thought experiment however highlights the fact that operations research approaches are not only limited to problems stemming from supply chain management, scheduling, or routing but can be applied to a wider range of problems.
And yes, even though I do not explicitly use that phrase in my lectures, the idea of a “free money glitch” is enticing and catches the audience’s attention accordingly.

Floyd-Warshall Algorithm

The Floyd-Warshall algorithm solves the “all-pairs shortest path problem” of a (directed or undirected) weighted graph. Given a set of vertices \(N\) and a set of arcs \(A\), the algorithm terminates with an \(\vert N \vert \times \vert N \vert\) matrix \(d\) in which entries \(d[i,j]\) indicate the shortest distance from vertex \(i\) to vertex \(j\). We will also have to keep track of the immediate predecessors in another \(\vert N \vert \times \vert N \vert\) matrix \(p\): Here, the entry \(p[i,j]\) represents the immediate predecessor of \(j\) on our shortest path from \(i\) to \(j\). After the algorithm terminates, the actual shortest path from \(i\) to \(j\) can be retrieved recursively. Going into all the intricate details of the algorithm is beyond the scope of this post, a more in-depth explanation can be found in the book “Network Flows – Theory, Algorithms, and Applications” by Ahuja, Magnanti and Orlin. The following pseudo code however will help us to derive an appropriate data structure and guide the implementation step.

# Initialization
for all nodes i in N do:
    d[i,i] = 0
    p[i,i] = i

    for all nodes j in V\{i} do:
        if (i,j) in A:
            d[i,j] = c[i,j]
            p[i,j] = i
        else:
            d[i,j] = infinity
            p[i,j] = 0

# Iterations
for all nodes k in N do:
    for all nodes i in N do:
        for all nodes j in N do:
            if d[i,j] > d[i,k] + d[k,j]:
                d[i,j] = d[i,k] + d[k,j]
                p[i,j] = p[k,j]

In the initialization block, we simply set up the data as can be found in our initial graph: Firstly, distances from a vertex to itself (\(d[i,i]\)) are initialized with a distance of 0 with their immediate predecessor being the vertex itself. Secondly, if an arc \((i,j)\) exists between two vertices \(i\) and \(j\), we store the associated arc weight (let’s call it \(c[i,j]\)) in our distance matrix and set the predecessor \(p[i,j]\) to \(i\). Thirdly, if no arc exists between two vertices, then the associated distance will be infinitely high and the associated predecessor will be initialized with the placeholder “0”.

In the iterations block, we simply iterate over the different nodes in a (triply) nested structure and repeatedly answer the question: Can we reduce the currently best-known distance \(d[i,j]\) from \(i\) to \(j\) if we instead take the currently best-known shortest path from \(i\) to \(k\) and then the currently best-known shortest path from \(k\) to \(j\)? If this is the case, then the best-known shortest distance from \(i\) to \(j\) will be updated to \(d[i,k] + d[k,j]\). The immediate predecessor of \(j\) will be set to \(p[k,j]\) accordingly.

One prerequisite for the Floyd-Warshall algorithm is that no negative cycles (cycles that when traversed net a total distance of smaller 0) exist in the graph. In the following example, we will purposely violate this prerequiste, because the algorithm can also be used to detect the existence of any of these negative cycles. How do we know if this is the case? By checking the diagonal entries of our distance matrix after the algorithm terminates, namely if the distance “from a vertex to itself” (i.e.: \(d[i,i]\)) is strictly smaller than 0.

Remember that we initialized our algorithm with the distances \(d[i,i]\) of 0 for each of the vertices. Accordingly, if at any point in the algorithm we manage to reduce the distance from a vertex to itself to less than 0 via the traversal of an intermediate vertex, we have detected a negative cycle. It is exactly these negative cycles that enable us to identify trading sequences of currencies that result in a net gain. Our goal is therefore to identify whether the graph at hand contains any of those negative cycles.

Currency Pairs as Weighted Directed Graph

So how and where does the idea of trading currency pairs come into play and how does this relate to the idea of negative cylces? Assume that we are given the following table of potential trades between currencies and their associated exchange rates:

	GBP	EUR	YEN	CHF	USD	GOLD
GBP	1			2.1904		0.004816
EUR		1			1.0752
YEN			1		0.008309
CHF		0.6677		1
USD			120	1.3941	1	0.003065
Gold	205			455.2	322	1

Here, the table is to be read as follows: We can trade 1 Euro for 1.0752 USD. Note however that there is no possibility to trade USD back to Euro. The table can also be represented in a weighted directed graph:

For the given graph, consider the example of the following trading sequence: \(EUR \rightarrow USD \rightarrow CHF \rightarrow EUR\). Starting off with 1 Euro and performing these trades, we end up with \(1 \cdot 1.0752 \cdot 1.3941 \cdot 0.6677 = 1.0008397… > 1\), effectively gaining a net profit of \(0.0008397…\) Euro. Not much, but in our hypothetical example, this can be repeated infinitely many times! Consequently, what we are now interested in is finding cycles in our graph where the product over all arcs is greater than 1. Note how this clashes with the initial idea of using the Floyd-Warshall algorithm to detect negative cycles (where the sum over all arcs is smaller than 0). Therefore, we need to employ an additional mathematical trick.

Transformation of Arc Weights

If you think back to high school math and the properties of the logarithm, you might remember something along the lines of \(ln(a \cdot b) = ln(a) + ln(b)\), or in a more general sense \(ln(\prod\limits_{x \in X} x) = \sum\limits_{x \in X} ln(x)\).
Let us define \(d_c\) to be the weights of arcs in a cycle \(C\).
Our requirement for an arbitrage opportunity is that the product over all arc weights of a cycle is greater than 1, or more formally \(\prod\limits_{c \in C} d_c > 1\).
With the introduced property of the logarithm, we can rewrite this condition as:

\[ \begin{aligned}
\prod\limits_{c \in C} d_c &> 1 \\
ln(\prod\limits_{c \in C} d_c) &> ln(1) \\
ln(\prod\limits_{c \in C} d_c) &> 0 \\
\sum\limits_{c \in C} ln(d_c) &> 0 \\
\sum\limits_{c \in C} -ln(d_c) &< 0
\end{aligned}
\]

Notice how by rewriting the inequality, we managed to transform the initial problem of finding cycles where the product is greater than 1 into one of finding negative cycles: In particular one of finding cycles where the sum of the negative logarithmic arc weights is smaller than 0. Accordingly, we simply replace all arc weights of the graph with their negative natural logarithm:

[Values of arc weights are rounded to six digits to keep the image small]

Implementation

The implementation is straight forward. As an introductory example to programming in the Management Science exercise course, I want to highlight the following aspects:

How versatile and convenient a (general purpose) programming language such as Python can be.
The file handling of the initial input data (an Excel-workbook of exchange rates) as well as the rendering of the output pdf-files is handled by the third party libraries openpyxl and graphviz, respectively.
How close the implementation of the actual algorithm is to the pseudo code.
Notice how the algorithm is only a few lines of code, while the majority of the code is related to reading in data and generating output files.
How a suitable data structure can be derived from the mathematical notation.
Here, I purposely refrained from using established graph libraries such as networkx to show how we can create custom classes to represent the graph at hand by using the introduced mathematical notation: Given a graph \(G = (N, A) \) we define a graph class that has a set of nodes and set of arcs as attributes. In the same fashion, we create classes for arcs and nodes with the necessary attributes for each, e.g., an arc weight for each instance of an arc object.

Github and Installation

The source code can be found on my github: Arbitrage Trading
Feel free to clone the repository and play around with the code!
Ideally, set up a Python virtual environment and install the necessary dependencies via
pip install -r requirements.txt