Heuristic vs Exact Optimization

Subtitle: I'll tell you about two of my favorite projects from my Master's. It's nice to present them together because they tackle the same field (combinatorial optimization) with different techniques: heuristics vs exact methods.
Date: Jul 20, 2024
Tags: Optimization, University

Some context

My Master's is in Data Science, and we are allowed to specialize in different areas of the field. I basically specialized in (1) general Machine Learning and (2) Optimization.
The two most important courses I took related to Optimization were:
  1. Mathematical Programming (186.835) → MIP modeling, solving, decomposition, uncertainty, computing…
  2. Heuristic Optimization (192.137) → an overview of the most common and general heuristic optimization techniques.
In each course, we worked on a large project focused on a specific problem, which we tried to solve by applying exact or heuristic optimization, respectively.
 

My take on heuristic vs exact optimization algorithms

If you had asked me some time ago, I would have said that exact optimization is the only right way, and that providing solutions without optimality guarantees is for losers.
Naturally, I still find more mathematical beauty in exact approaches. But my opinion on this topic has changed because of two ideas:
  1. Heuristic and exact optimization are not really two separate worlds. Exact optimization solvers rely critically on heuristic methods to find good feasible solutions early in the search. Moreover, it is common practice for an OR practitioner to first find a feasible solution heuristically, and then feed it to an exact solver as a warm start to improve it and prove optimality. Conversely, heuristic optimization should never forget the strengths of exact solvers. I would argue that even if you plan to solve a problem heuristically, writing a MIP formulation and solving the small instances with a solver is the right first step. Heuristic methods should then tackle the instances that are intractable for the solver. Finally, the root-node heuristics of exact solvers are incredibly strong, and you can take the solution they provide and improve it further with hand-crafted heuristics.
  2. Optimality, if it is not easy to reach, is useless in practical applications. I attended a talk given by Professor Ruben Ruiz (AWS) at the EURO Conference 2024. ChatGPT summarizes his talk like this:
    1. In OR research and academia, there is a strong focus on developing complex, problem-specific algorithms to achieve near-optimal solutions, often adding complexity for marginal gains in optimality gaps. However, these intricate methods can be impractical for industrial problems that require flexibility, maintainability, and quick adaptability. In real-world scenarios with large datasets and approximated inputs, a slight loss in optimality is acceptable in exchange for more robust and pragmatic solutions.
      For example, assume the two fictional papers:
    2. "A QUBO–Benders Decomposition with Branch-and-Cut and a neural QUBO solver for the Bin Packing problem" → the algorithm only provides decent optimality proofs for instances of size < 100 and simply doesn't work for instances of size > 1000.
    3. "A simple GRASP heuristic for large Vector Bin Packing with Heterogeneous Bins problems" → it can quickly provide solutions for instances of size 1M.
    4. The point of the talk is that paper (1) is regarded more highly than paper (2). However, its practical applicability is close to zero, because there are no plain Bin Packing problems in the real world, and industry problems are much larger. So basically, academia is optimally solving the problems we don't have, instead of heuristically solving the problems we do have.
      Complete abstract of Ruben Ruiz’s talk at EURO2024
       
      Title: Optimal is the enemy of good: Solving very large scale virtual machine placement problems at Amazon Elastic Compute Cloud
       
      Abstract: A significant portion of the effort within the field of Operations Research is dedicated to the development of intricate ad-hoc algorithms and models tailored for addressing diverse optimization problems. Frequently, the research community exhibits a preference for complexity in the proposed methodologies. Notably, an esteemed characteristic sought in these methods is the incorporation of problem-specific knowledge, enabling the attainment of results that get as close as possible to optimality.
      In academic publishing, the pursuit of marginal gains in optimality gaps is a common goal, even at the expense of introducing additional complexity to models or algorithms. While there is consensus that such endeavors contribute to advancements in algorithms and propel the field of Operations Research forward, a frequently underestimated dimension is the practical applicability of these advancements in industrial settings. Real industrial problems are often messy with imprecise (or numerous) objectives, soft constraints, preferences and a myriad of situations that demand pragmatic approaches. Additionally, these problems undergo rapid evolution with frequent model refinements, occurring often fortnightly. Here, the ongoing painful adaptation of intricate code bases stemming from ad-hoc methods is not the favored choice. In contrast, solvers offer greater flexibility, allowing faster implementation of new constraints through their APIs than altering tailored algorithms. In industrial contexts, the preference lies in flexibility, maintainability, robustness, and reduced operational load, where a modest optimality gap is deemed a minor trade-off. Moreover, in the realm of large-scale real-world problems, mathematical solvers often prove impractical. It is commonplace to resort to simplifying the problem for solvability. This practice raises a critical question: Is it logical to insist on chasing optimality in solving a simplified problem with a reduced real-world fidelity? Furthermore, real problems entail vast datasets, often including forecasted or approximated input data. Does it make sense to go great lengths to optimally solve a problem when a portion of the input data involves approximations? In this presentation, I advocate for relinquishing the pursuit of optimality and, instead, embracing heuristic solvers and simplified modeling approaches. Such strategies yield expedient, adaptable, maintainable, and easily implementable models. 
      The discourse will draw upon various examples, spanning classical routing and scheduling problems, culminating in intricate real-world virtual machine placement bin packing problems encountered at Amazon Elastic Compute Cloud (EC2).
      Bio: Ruben Ruiz is Principal Applied Scientist at Amazon Web Services (AWS) and Full Professor of Statistics and Operations Research on leave of absence at the Polytechnic University of Valencia, Spain. He is co-author of more than 100 papers in International Journals and has participated in presentations of more than two hundred papers in national and international conferences. He is editor of the Elsevier’s JCR-listed journal Operations Research Perspectives (ORP) and co-editor of the JCR-listed journal European Journal of Industrial Engineering (EJIE). He is also associate editor of other journals like TOP and member of the editorial boards of several journals most notably European Journal of Operational Research and Computers and Operations Research. His research interests include scheduling and routing in real life scenarios as well as cloud computing scheduling.
A question I have:
💡
What is the best framework to build a fully heuristic solver around? In particular, would Branch and Bound / Branch and Cut (without much focus on the dual side) be the right backbone?
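To make the question concrete: a branch-and-bound skeleton is easy to reuse as a heuristic framework if you let the bound and the incumbent come from heuristics. Here is a minimal, illustrative 0/1 knapsack sketch (my own toy example, not from either project); the fractional greedy bound could be swapped for any cheap heuristic bound.

```python
def knapsack_bb(values, weights, capacity, incumbent=0):
    """Depth-first branch and bound for 0/1 knapsack.

    `incumbent` can seed the search with a heuristically found objective
    value, mirroring the heuristic-then-exact warm-start pattern.
    """
    n = len(values)
    # Branch on items in decreasing value/weight ratio
    order = sorted(range(n), key=lambda i: values[i] / weights[i], reverse=True)

    def bound(i, value, room):
        # Greedy fractional relaxation over the remaining items
        for j in order[i:]:
            if weights[j] <= room:
                room -= weights[j]
                value += values[j]
            else:
                return value + values[j] * room / weights[j]
        return value

    best = incumbent
    stack = [(0, 0, capacity)]  # (position in `order`, value so far, room left)
    while stack:
        i, value, room = stack.pop()
        best = max(best, value)
        if i == n or bound(i, value, room) <= best:
            continue  # prune: this subtree cannot beat the incumbent
        j = order[i]
        stack.append((i + 1, value, room))  # branch: skip item j
        if weights[j] <= room:              # branch: take item j
            stack.append((i + 1, value + values[j], room - weights[j]))
    return best
```

Seeding `incumbent` with a heuristic solution strengthens pruning from the first node, which is exactly the warm-start practice mentioned earlier.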
 

MIP optimization for the k-Minimum Spanning Tree problem

You can take a look at the repository for this project here: jsalvasoler/K-MST-with-MILP-optimization (github.com). In the end, this is just a classic modeling exercise: come up with different MIP models and compare how a solver performs on them. But I enjoyed it. 🙂
In this project, I provided different MILP formulations for the k-Minimum Spanning Tree Problem. In particular, I define:
  • a sequential formulation (inspired by Miller, Tucker and Zemlin, MTZ)
  • a Single-Commodity Flow formulation, SCF
  • a Multi-Commodity Flow formulation, MCF
  • a Cycle Elimination Constraints formulation, CEC
  • a Directed Cutset Constraints formulation, DCC
I used ortools and gurobipy for the implementation, and I compared the formulations on a set of 10 graph instances. I compared the licensed solver Gurobi 10.0.1 against the open-source SCIP 8.0.3 for the compact models; only Gurobi was used for the exponential models.
Here is an example of the CEC formulation, which, together with MTZ, turned out to perform best.
This model uses binary variables $x_e$ and $y_v$ that select the edges and nodes that are part of the MST. We use the notation $\delta(v)$ for the edges incident to node $v$ and $N(v)$ for the neighboring nodes of $v$.
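The model itself is easiest to read in math. As a rough sketch in standard notation (a generic reconstruction over the variables $x_e$ and $y_v$ above, not necessarily the exact constraints from my report), a CEC-style k-MST formulation looks like:

```latex
\begin{aligned}
\min\quad & \sum_{e \in E} w_e x_e \\
\text{s.t.}\quad & \sum_{e \in E} x_e = k - 1, \qquad \sum_{v \in V} y_v = k, \\
& x_{uv} \le y_u, \quad x_{uv} \le y_v && \forall \{u, v\} \in E, \\
& \sum_{e \in C} x_e \le |C| - 1 && \text{for every cycle } C, \\
& x_e, y_v \in \{0, 1\}.
\end{aligned}
```

With $k$ selected nodes, $k-1$ selected edges, and no cycles, the selected subgraph is forced to be a tree. The exponentially many cycle constraints are added lazily: solve the relaxation, search for a violated cycle, add it, and repeat.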
I wanted to briefly go through the results I got on 18 benchmark instances. The table contains the time to optimality in seconds, for each formulation (columns) and each instance (rows); instance size and difficulty increase with the instance ID. Our finding was that MTZ and CEC were the best-performing formulations. A faster implementation of the separation routine of CEC (for instance, using C++ instead of Python to compute the max flow between nodes) would probably have made this model the best.
Instance   SCF      MCF      MTZ     CEC     DCC
1.0        0.01     0.02     0.00    0.00    0.01
1.1        0.01     0.02     0.01    0.00    0.01
2.0        0.05     0.10     0.03    0.01    0.15
2.1        0.21     0.61     0.04    0.02    0.12
3.0        0.37     6.90     0.07    0.03    0.47
3.1        21.2     2749.5   0.29    0.11    5.37
4.0        3.66     76.19    0.14    0.09    0.39
4.1        7.57     736.9    0.51    0.09    4.27
5.0        14.6     –        0.38    0.16    4.85
5.1        294.1    –        0.65    0.44    25.7
6.0        –        –        8.56    3.02    440.7
6.1        –        –        7.56    2.17    1462.6
7.0        –        –        2.93    1.08    552.8
7.1        –        –        3.36    4.31    –
8.0        –        –        5.40    1.49    1071.8
8.1        –        –        21.3    68.3    –
9.0        –        –        76.6    121.2   –
9.1        –        –        –       –       –
(– indicates no reported time.)
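The CEC separation mentioned above boils down to repeated max-flow computations. As an illustrative sketch (my own minimal version, not the project's code), here is an Edmonds–Karp implementation in pure Python, the kind of inner routine a C++ port would speed up:

```python
from collections import defaultdict, deque

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths.

    `capacity` maps directed arcs (u, v) to capacities; `res` holds the
    residual capacities, including the reverse arcs created by pushed flow.
    """
    res = defaultdict(int)
    adj = defaultdict(set)
    for (u, v), c in capacity.items():
        res[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # reverse arc lives in the residual graph
    flow = 0
    while True:
        # BFS for a shortest augmenting path in the residual graph
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: flow is maximal
        # Recover the path and its bottleneck capacity
        path = []
        v = sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(res[e] for e in path)
        for u, v in path:
            res[(u, v)] -= bottleneck
            res[(v, u)] += bottleneck
        flow += bottleneck
```

In the separation routine, a max flow below the required threshold between a node pair certifies a violated cut, whose constraint is then added to the model.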

Heuristic methods for the Weighted S-Plex Editing problem

In this project I worked with Grégoire de Lambertye, and the goal was to implement several heuristics and metaheuristics for the s-Plex Editing Problem. Here is the repository for the project: GregoireLamb/heuristic-optimization-splex (github.com)
Formal problem description (a bit too long).
We consider the Weighted s-Plex Editing Problem (WsPEP), a generalization of the s-Plex Editing Problem, which is in turn a generalization of the Graph Editing Problem. We are given an undirected graph $G = (V, E)$, a positive integer value $s$, and a symmetric weight matrix $W$ which assigns every (unordered) vertex pair $\{u, v\}$ a non-negative integer weight $w_{uv}$. An s-Plex of a graph is a subset of vertices $S \subseteq V$ such that each vertex has degree at least $|S| - s$ within $S$ and there exist no edges to vertices outside the s-Plex, i.e., $\deg_S(v) \ge |S| - s$ for all $v \in S$. Note that a clique (complete graph on a vertex subset) is a 1-plex. The goal is to edit the edges of the graph by deleting existing edges and/or inserting new edges such that the edited graph consists only of non-overlapping s-Plexes and such that the sum over all weights of the edited edges is minimal. Let a candidate solution be represented by variables $x_{uv} \in \{0, 1\}$, where a value 1 indicates that edge $\{u, v\}$ is either inserted (if $\{u, v\} \notin E$) or deleted (if $\{u, v\} \in E$) and a value 0 indicates that the edge is not edited. The objective function is then given by $\min \sum_{u < v} w_{uv} x_{uv}$.
In essence, the problem is about adding or removing edges from a graph such that the resulting graph has all its connected components having the s-Plex property. The edges are weighted, so you want to add or remove them with the minimum cost.
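To make the degree condition concrete, here is a small illustrative checker (the function name and signature are mine, not the project's) for whether a set of vertices forms an s-plex:

```python
def is_s_plex(component, edges, s):
    """Check that every vertex of `component` has degree >= |component| - s
    counting only edges inside the component. A clique is a 1-plex."""
    nodes = set(component)
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            degree[u] += 1
            degree[v] += 1
    return all(d >= len(nodes) - s for d in degree.values())
```

For example, a triangle is a 1-plex, while a path on three vertices is only a 2-plex: its endpoints have degree 1, which is below the 1-plex requirement of 2.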
 
We implemented the following methods:
  • Construction Heuristics: Greedy and Randomized Greedy construction heuristics
  • GRASP: Greedy Randomized Adaptive Search Procedure on top of the Randomized Greedy construction heuristic
  • Local Search: Iterated Local Search with the construction heuristic as the initial solution. The neighborhood structures considered are:
    • Swap nodes: Swap two nodes in the solution
    • k-flips: Remove k edges from the solution
    • Move nodes: Move nodes from one s-Plex to another
  • VND: Variable Neighborhood Descent
  • Simulated Annealing: Simulated Annealing with the construction heuristic as the initial solution.
  • BRKGA: Biased Random Key Genetic Algorithm
  • ACO: Ant Colony Optimization
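As an illustration of what these metaheuristic skeletons share, here is a generic simulated annealing loop on a toy problem (a sketch with textbook cooling and acceptance rules, not the project's implementation):

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, t0=1.0, alpha=0.95, iters=2000, seed=0):
    """Generic SA loop: always accept improving moves, accept worsening
    moves with probability exp(-delta / T), and cool T geometrically."""
    rng = random.Random(seed)
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    t = t0
    for _ in range(iters):
        y = neighbor(x, rng)
        fy = cost(y)
        delta = fy - fx
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
        t *= alpha  # geometric cooling schedule
    return best, fbest

# Toy usage: minimize the number of 1-bits via single-bit flips
def flip(bits, rng):
    i = rng.randrange(len(bits))
    return bits[:i] + (1 - bits[i],) + bits[i + 1:]

solution, value = simulated_annealing(sum, flip, (1,) * 20)
```

With enough iterations this toy run typically drives the cost all the way down; for WsPEP the state, cost, and neighbor functions would be the solution representation and neighborhoods listed above.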
 
Moreover, we performed:
  • Statistical testing to fairly identify the best methods on a set of instances
  • Hyperparameter Tuning of some of the methods using SMAC.
 
 

Our conclusions and lessons learned

Here are the conclusions that we wrote in the report. We sound a bit know-it-all and pretentious.
  • Construction Heuristics: We developed a very fast construction heuristic that provided strong baseline solutions, which were challenging for other methods to improve. This emphasizes the importance of investing effort in creating an effective initial heuristic, as it can be crucial for the success of a heuristic procedure.
  • Good Coding Practices: Adhering to good coding practices such as comprehensive OOP, configuration files, and proper git usage made our project more manageable. It allowed us to extend our initial assignment easily and reuse code efficiently.
  • Metaheuristics Challenges: Implementing efficient metaheuristics like evolutionary algorithms (GA and ACO) proved difficult. We expected these to outperform Simulated Annealing but found that they might require larger populations, more iterations, or simply more time to show their potential.
  • SMAC's Power: SMAC is a powerful framework for optimizing black-box functions and will be a valuable tool in our future work. However, optimizing over varied instance distributions was challenging and less effective than anticipated, possibly due to limited computing power and time.
  • s-Plex Problem Difficulty: The Weighted Editing s-Plex problem is complex, with large instances that demand efficient algorithms and low-resource usage. While the large instances were challenging, smaller or simpler instances might have provided better learning opportunities and allowed for more effective algorithm testing and development.
But here are the real lessons and conclusions:
  • We had a lot of fun, working on well-defined optimization problems is so nice.
  • Gregoire and I work very well together.
  • There isn’t a “unified” heuristic algorithmic framework.
    • There isn’t a language to model problems that are going to be solved heuristically.
    • There isn’t a very well established package or library for heuristic methods.
    • In the end, heuristics are very problem-specific, and researchers are okay with implementing them from scratch.
    • I find this a bit annoying.
  • SMAC is powerful, but it is so badly supported on Windows that it is unusable (as of July 2024).
  • Heuristic methods are heavily dependent on algorithmic design choices. We made some questionable choices at the beginning of the project (how to represent solutions, the neighborhood structures), and this limited the performance of all the subsequent methods. Our methods performed well, but not perfectly. We still got the best grade.