I took on the MESS 2024 Computational Optimization Challenge

What is this?

I participated in the MESS 2024 Computational Optimization Competition, and I want to share it.

Basically, I was bored during the summer break and I wanted to do something. I didn't have any uni stuff at the time, I was only working for my company, which meant some free time!

I like OR, so I thought this would be a good opportunity to learn something new. For instance, my original plan was to try out the Hexaly solver because I thought it would be excellent at this problem, but they denied me the student access because I work for Quantagonia.

The Problem at Hand

The competition focused on a single problem, the Trigger-Arc Traveling Salesman Problem. I think they made it up themselves, but it's a variant of the TSP. The organizers provided some half-convincing real-world application, but I think it's just a fun academic problem.

How does the competition work?

The teams are given a set of instances, and they need to find good solutions for each of them. You can submit as many solutions as you want, and the best one for each instance will be used to evaluate your performance. So no time or computational limits, and every solver or solution approach is allowed. Teams of up to three people were allowed, but I participated

Problem definition

The Trigger Arc Travelling Salesman Problem (TATSP) is defined as the problem of identifying in a graph G a Hamiltonian cycle of minimum cost considering relations defined between arcs. For each arc a in the graph, relations with other arcs can be defined, which act as triggers. If the trigger arc is traversed, the cost of the arc a is reset. Only the last trigger arc encountered before reaching the arc a is considered. The key concept in this problem is the definition of a trigger arc. If the relationship is defined for arcs and , then arc will act as a trigger for arc , meaning that traversing arc will result in the assignment of a new cost to arc if has not yet been traversed.

More formally (skippable)

More formally, consider a directed graph with weights on the arcs. The node is designated as the starting node (depot). Let be defined as the cost of traversing the arc . For each arc , a set of relations is associated. Finally, let , be the traversal cost of the arc if the relation is active.

Let be the ordered sequence of arcs starting at node representing a Hamiltonian cycle in . The relation is active if and only if the arcs , and the arc precedes the arc in , and there is no other active relation such that precedes in .

It follows that for each arc , at most one relation can be active. As a result, the traversal cost of the arc will be equal to if there are no active relations in or if is the only active relation in .

My Solution Approach

We could say that my solution method was the result of a series of improvements.

A small reformulation trick

A first Mixed Integer Programming model

Build a meaningful TSP instance, and solve it

A GRASP for the TATSP

Small reformulation trick

This is not even a trick. But for the formulation, and later for the implementation, it's useful to think of the costs of the edges as the base costs, and the costs of the relations as the additional costs.

So we simply preprocess every instance by doing for each relation .

If we don’t do this, we are forced to model this “if else” on the cost of an edge, which is not nice for MIP models or in general for combinatorial optimization. After applying this trick, the cost of an edge is . To be fair, we are just passing the “if else” responsbility to the decision variable that tells whether an edge is active or not. But it’s a useful trick for MIP modeling anyway.

A Mixed Integer Programming Model for the TATSP

I first developed a MIP model of the problem, I think it's a good starting point for any optimization problem. Modeling the arc-trigger relations was challenging, but it can be done. This is my model:

Problem Parameters

Let be a directed graph with:

representing the set of nodes (vertices).
representing the set of edges (arcs) with base costs for .
and representing the set of incoming and outgoing edges of node .

is composed of pairs . We call the trigger of the target arc . If a relation is active, the cost of traversing the target arc is , the latter being the cost associated with the relation .

Decision Variables

: binary variable, equal to 1 if edge is part of the tour, 0 otherwise.

: continuous variable representing the position of node in the tour. Note: in the model, we abuse the notation to write for . This corresponds to where , i.e., the position of the first node of the arc in the tour.

: binary variable, equal to 1 if the relation is active, 0 otherwise.

: binary variable, used to model precedence constraints between arcs and .

Objective function

We aim to minimize the total cost of the tour, considering the base costs of the arcs and the costs associated with the active relations.

Constraints (it’s long and verbose)

1. Flow Conservation:

Only one incoming and one outgoing edge is allowed for each node.

2. Subtour Elimination:

Miller-Tucker-Zemlin constraints to avoid subtours.

3. At most one relation is active:

Only one relation can be active for each target arc, and no relation can be active if the target arc is not part of the tour.

4. Strengthen relation deactivation constraint:

Strengthen the previous constraint to ensure that if the target or the trigger is not part of the tour, the relation cannot be active.

5. Relation is inactive if follows in the tour:

A relation cannot be active if follows in the tour.

6. At least one relation is active:

For a given relation , if and are part of the tour (), and precedes in the tour (), then at least one relation targeting must be active.

7. Precedence constraints on variables:

If precedes in the tour, then . If the end node of the edge is shared, can not precede or vice versa.

8. Relation precedence constraints:

Given two relations and , if precedes which precedes in the tour (, ), then the relation cannot be active if is active (recall that only one relation can be active for each target arc).

9. Set the starting node:

Set the position of the starting node in the tour to be the first node.

Performance

About the performance of this model, it's not great. I passed it to Gurobi, and it could only solve to optimality 5/21 instances in the first round, and 1/34 instances in the second round. But it felt right to have it. You can find the model implementation in this file.

Good reasons for starting with a MIP:

Modeling makes you think deep about the problem and improves your understanding.

It’s fun.

What if the model could solve to optimality instances that you will spend hours trying to solve? What if your best solution in an instance is already optimal? This was not the case 🤡.

Build a meaningful TSP instance, and solve it

A key observation is that feasibility for the TATSP problem is trivial: any permutation of is a feasible solution. In other words, TSP solutions are also feasible solutions for the Trigger-Arc TSP. Moreover, the graph sizes were rather small: number of nodes ranging between 20 and 100, and edges between 100 and 1000.

So my question was:

Is there a TSP instance on the same graph, such that its optimal solution is the optimal solution to the TATSP?

The answer is probably no, but this question hints the direction of my next improvement.

As you probably guessed, what makes the TATSP hard is the relations between arcs. There were instances with massive number of relations, completely intractable by the MIP model.

💡

So I came up with a method that heuristically constructs a TSP instance (sets weights on the same instance graph) in a way that we push the solution of the TSP to trigger the TATSP relations that decrease the cost of the objective the most.

This heuristic is not trivial, and I had the problem that you actually need "prior knowledge" of the solution tour in order to define the TSP instance. Therefore, the edge costs of the TSP instance are defined for a given "prior" tour. More than a tour, it's an order of the nodes that is used to estimate the probability that a node comes before another one. More intuition on this later.

We construct the instance in the following way:

Estimate the probability that an edge is used. We model this probability to be inversely proportional to the distance between the nodes in the node_priorities list. This means , where is the distance between nodes and in the node_priorities list. Example: if prior_tour = , then .

Estimate the probability that a relation is active. We do this by taking into account the probability that the trigger edge is used, the probability that the target edge is used, and the closeness between the trigger and the target. Closer makes it more likely that the relation is active because only the last trigger edge in the tour will activate the relation. This means that for relation and , , the probability that the relation is active is , where is a parameter that we can tune.

Define the TSP instance with costs equal to the expected cost of the edge, considering the relations and their probabilities. To do this, we iterate over all the relations, and for each pair of edges we add the expected cost increase or decrease to the edge cost.

Again, why the need for the prior tour? Well, I can't think of any other way to estimate the probability of an edge being used without having some priorities on the nodes. Of course, one could assume that all edges happen with the same probability, but this would lower the expressiveness of the heuristic.

And finally, how do we use this to solve the TATSP?

We just sample a prior tour, build the TSP instance, and solve it. Then evaluate the resulting tour with the TATSP objective.

This method alone is not great, but allowed me to get decent solutions for all instances, something that the MIP could not do. It's also very fast, because the TSP is fast to solve, or we can set a time limit for the solver because TSP optimality brings no theoretical benefit.

Final step: A GRASP for the TATSP

GRASP is a simple metaheuristic that stands for Greedy Randomized Adaptive Search Procedure. It's works as follows:

Sample a randomized solution (usually a Randomized Greedy Construction Heuristic)

Apply local search to improve the solution, stop if no improvement is found

Repeat until you get a good solution

We already have a way to generate randomized solutions for the TATSP. Given a prior tour (think of it as a seed), we build a TSP instance, and we solve it to get a TATSP tour.

What we need is the Local Search improvement procedure that improves a given solution. I considered the following local search operators, which are quite standard in TSP literature.

2-opt: Swap two edges, neighborhood of size

3-opt: Swap three edges, neighborhood of size

delay-1: Take an arc an insert it later in the tour, neighborhood of size

So we are all set! Some GRASP implementation details:

As I mentioned before, is not too large, so exploring the complete neighborhood can be done is less than seconds for 2-opt and delay-1, and second for 3-opt. Therefore, we consider that our solution converges with respect to the three neighborhoods when no improvement is found on the three neighborhoods.

We skip the local search if the random solution is not promising. We define this as being worse than the best solution found so far. We make the assumption that the local search will not be able to improve this solution enough to overtake the current best.

For sampling the initial random solution, we select the best integer feasible solution found during the solving of the TSP instances according to the TATSP objective function (and not just the optimal one). We also evaluate the prior itself, and if it's better than the TSP solutions, we use it as the initial solution. The reason for this is that the TSP solutions could have a strong bias, and using only them could load to the exploration of only a small part of the search space.

Implementation

You can find the implementation of this project in this repository: github-trigger_arc_tsp

I used Python. It could have been a good opportunity to use C++ for speedups, but I code much more slowly in C++. Still, I think Python performed well in this case. C++ would have been more necessary if instance evaluation were more expensive. However, since evaluating an instance took only milliseconds, it wasn’t a problem. I used the Gurobi Python API for the MIP model and Hatch for Python project management.

Using unit tests for MIP modeling

I found it incredibly helpful to write unit tests for the MIP model. I had around 10 small instances that I could compute the solution by hand, that lead to 10 tests checking that the model produced the correct optimal solution. After a change in the model, I could run the tests to make sure the model was working as expected. I would do the same for every other MIP model I write.

Results and Prize

I got some nice results and finished 3rd in the competition. I want to congratulate the rest of the participants for their great work, and the organizers for the opportunity.