Fine-tuning has recently enabled Large Language Models (LLMs) to achieve milestones in natural language processing applications. The emergence of ever larger LLMs has created the need for more efficient fine-tuning methods. Among these, the Low-Rank Adaptation (LoRA) method keeps most of the weights of the pre-trained LLM frozen and introduces a low-rank decomposition of the weight updates, so that only a very small proportion of the network needs to be tuned. The downstream-task performance of models fine-tuned with LoRA relies heavily on a set of hyperparameters, including the rank of the decomposition. In this work, we investigate the choice of these hyperparameters through two blackbox optimization (BBO) techniques. We treat the whole pipeline of fine-tuning and validating a pre-trained LLM as a blackbox and efficiently explore the space of hyperparameters with the NOMAD algorithm, achieving a boost in the performance and human alignment of the tuned model.
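As a concrete illustration of the blackbox view described above, here is a minimal Python sketch. The surrogate objective, the hyperparameter ranges, and the random-search driver are all assumptions for illustration: in the actual study the blackbox wraps real LoRA fine-tuning plus validation, and the search is driven by NOMAD rather than random sampling.

```python
import random

def blackbox(rank, alpha, dropout, lr):
    """Stand-in for the true blackbox: fine-tune the pre-trained LLM with
    LoRA(rank, alpha, dropout, lr) and return a validation loss. A synthetic
    surrogate is used here so the sketch runs as-is."""
    return ((rank - 16) ** 2 * 1e-3 + (alpha - 32.0) ** 2 * 1e-4
            + (lr - 2e-4) ** 2 * 1e4 + dropout + random.gauss(0.0, 0.01))

# In the paper this space is explored by NOMAD, a MADS-based derivative-free
# solver; plain random search stands in for it below.
best_x, best_f = None, float("inf")
for _ in range(100):
    x = (random.choice([2, 4, 8, 16, 32, 64]),  # LoRA rank
         random.uniform(8.0, 64.0),             # scaling factor alpha
         random.uniform(0.0, 0.2),              # LoRA dropout
         10 ** random.uniform(-5.0, -3.0))      # learning rate
    f = blackbox(*x)
    if f < best_f:
        best_x, best_f = x, f
print("best hyperparameters:", best_x)
```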
Traditional neural networks achieve impressive classification performance, but what they learn cannot be inspected, verified, or extracted. Neural Logic Networks, on the other hand, have an interpretable structure that enables them to learn a logical mechanism relating the inputs and outputs with AND and OR operations. We generalize these networks with NOT operations and with biases that account for unobserved data, and we develop a rigorous logical and probabilistic modeling in terms of concept combinations to motivate their use. We also propose a novel factorized IF-THEN rule structure for the model, as well as a modified learning algorithm. Our method improves the state of the art in Boolean network discovery and learns relevant, interpretable rules in tabular classification, notably on examples from the medical and industrial fields where interpretability has tangible value.
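To make the AND/OR/NOT mechanism concrete, below is a small Python sketch of one common differentiable relaxation of logic neurons; the parameterization is illustrative and not necessarily the one used in this work.

```python
import torch

# Membership weights w in [0, 1] select which literals a gate depends on.
def soft_and(x, w):
    # AND over selected inputs: prod_i (1 - w_i * (1 - x_i))
    return torch.prod(1.0 - w * (1.0 - x), dim=-1)

def soft_or(x, w):
    # OR over selected inputs: 1 - prod_i (1 - w_i * x_i)
    return 1.0 - torch.prod(1.0 - w * x, dim=-1)

# NOT is handled by feeding both x and its negation 1 - x to the gates,
# so a rule such as IF (a AND NOT b) THEN y is learnable end to end.
x = torch.tensor([1.0, 0.0, 1.0])        # truth values of concepts a, b, c
literals = torch.cat([x, 1.0 - x])       # a, b, c, NOT a, NOT b, NOT c
rule1 = soft_and(literals, torch.sigmoid(torch.randn(6)))
rule2 = soft_and(literals, torch.sigmoid(torch.randn(6)))
y = soft_or(torch.stack([rule1, rule2]), torch.ones(2))  # OR of two rules
```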
In this work, we propose Wasserstein distributionally robust shallow convex neural networks (WaDiRo-SCNNs) to provide reliable nonlinear predictions when subject to adverse and corrupted datasets. Our approach is based on the reformulation of a new convex training program for ReLU-based shallow neural networks, which allows us to cast the problem into the order-1 Wasserstein distributionally robust optimization framework. Our training procedure is conservative, has low stochasticity, is solvable with open-source solvers, and is scalable to large industrial deployments. We provide out-of-sample performance guarantees, show that hard convex physical constraints can be enforced in the training program, and propose a mixed-integer convex post-training verification program to evaluate model stability. WaDiRo-SCNN aims to make neural networks safer for critical applications, such as in the energy sector. Finally, we numerically demonstrate our model's performance through both a synthetic experiment and a real-world power system application, viz., the prediction of hourly energy consumption in non-residential buildings within the context of virtual power plants, and evaluate its stability across standard regression benchmark datasets. The experimental results are convincing and showcase the strengths of the proposed model.
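For reference, the order-1 Wasserstein distributionally robust training problem referred to above has the standard generic form below, where $\hat{\mathbb{P}}_N$ is the empirical distribution of the $N$ training samples and $\varepsilon$ the ball radius; the paper's contribution is a convex reformulation of this problem for ReLU-based shallow networks.

```latex
\min_{\theta}\;\sup_{\mathbb{Q}\,:\,W_1(\mathbb{Q},\hat{\mathbb{P}}_N)\,\le\,\varepsilon}\;
\mathbb{E}_{\xi\sim\mathbb{Q}}\bigl[\ell(\theta;\xi)\bigr],
\qquad
W_1(\mathbb{Q},\mathbb{P}) \;=\; \inf_{\pi\in\Pi(\mathbb{Q},\mathbb{P})}
\mathbb{E}_{(\xi,\xi')\sim\pi}\bigl[\|\xi-\xi'\|\bigr].
```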
In this work, we present a new unsupervised anomaly (outlier) detection (AD) method based on the sliced-Wasserstein metric. This filtering technique is conceptually interesting for MLOps pipelines that deploy machine learning models in critical sectors, e.g., energy, as it offers conservative data selection. Additionally, we release the first open dataset showcasing localized critical peak rebate demand response in a northern climate. We demonstrate the capabilities of our method on synthetic datasets as well as standard AD datasets, and we use it to build a first benchmark for our open-source localized critical peak rebate dataset.
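A minimal Python sketch of the sliced-Wasserstein distance underlying the method, using the standard random-projection construction (the paper's exact estimator and thresholding rule may differ):

```python
import numpy as np

def sliced_wasserstein(X, Y, n_proj=100, seed=0):
    """Monte-Carlo estimate of the sliced-Wasserstein-2 distance between two
    equal-size samples X and Y (standard construction, for illustration)."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_proj):
        theta = rng.normal(size=X.shape[1])
        theta /= np.linalg.norm(theta)        # random direction on the sphere
        px, py = np.sort(X @ theta), np.sort(Y @ theta)
        total += np.mean((px - py) ** 2)      # 1D W2^2 via sorted projections
    return np.sqrt(total / n_proj)

# Outlier filtering idea: score candidate samples against a reference batch
# and flag those whose distance exceeds a conservative threshold.
X_ref = np.random.default_rng(1).normal(size=(256, 8))
X_new = np.random.default_rng(2).normal(loc=3.0, size=(256, 8))
print(sliced_wasserstein(X_ref, X_new))      # large value -> anomalous batch
```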
Efficiently solving a vehicle routing problem (VRP) within a practical runtime is a critical challenge for delivery management companies. This paper explores both a theoretical and an experimental connection between the Capacitated Vehicle Routing Problem (CVRP) and Constrained Centroid-Based Clustering (CCBC). Reducing a CVRP to a CCBC amounts to a transition from exponential to polynomial complexity using commonly known clustering algorithms, e.g., K-means. We first conduct an exploratory analysis to highlight the existence of such a relationship between the two problems through illustrative small-size examples, and we simultaneously deduce some mathematically related formulations and properties. Second, the paper proposes a CCBC-based approach endowed with some enhancements. The proposed framework consists of three stages. In the first step, a constrained centroid-based clustering algorithm generates feasible clusters of customers. This step incorporates three enhancement tools to achieve near-optimal clusters, namely a multi-start procedure for the initial centroids, a customer assignment metric, and a self-adjustment mechanism for choosing the number of clusters. In the second step, a traveling salesman problem (TSP) solver is used to optimize the order of customers within each cluster. Finally, we introduce a process relying on a route cutting and relinking procedure, which calls upon an integer linear programming model to further improve the obtained routes. This step is inspired by the ruin-and-recreate algorithm. Our approach extends the classical cluster-first, route-second method and provides near-optimal solutions on well-known benchmark instances in terms of solution quality and computational runtime, offering a milestone toward efficiently solving the VRP.
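The cluster-first, route-second idea can be sketched in a few lines of Python. Everything below is illustrative: a greedy capacity-aware K-means-style assignment plus a nearest-neighbour TSP heuristic, not the enhanced three-stage pipeline (multi-start, assignment metric, self-adjustment, route cutting and relinking) proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
depot = np.zeros(2)
customers = rng.uniform(-10, 10, size=(30, 2))
demand = rng.integers(1, 5, size=30)
capacity, k = 25, 5

# Stage 1: constrained centroid-based clustering -- K-means-style iterations
# with a capacity check during customer assignment.
centroids = customers[rng.choice(30, k, replace=False)].copy()
for _ in range(20):
    load = np.zeros(k)
    assign = -np.ones(30, dtype=int)
    for i in np.argsort(-demand):            # place large demands first
        for c in np.argsort(np.linalg.norm(customers[i] - centroids, axis=1)):
            if load[c] + demand[i] <= capacity:
                assign[i] = c
                load[c] += demand[i]
                break
    for c in range(k):                       # recenter each cluster
        if (assign == c).any():
            centroids[c] = customers[assign == c].mean(axis=0)

# Stage 2: route each cluster with a nearest-neighbour TSP heuristic.
for c in range(k):
    left = list(np.where(assign == c)[0])
    route, cur = [], depot
    while left:
        nxt = min(left, key=lambda i: np.linalg.norm(customers[i] - cur))
        route.append(int(nxt)); left.remove(nxt); cur = customers[nxt]
    print(f"vehicle {c}: depot -> {route} -> depot")
```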
In Markov decision processes (MDPs), quantile risk measures such as Value-at-Risk are a standard metric for modeling RL agents' preferences for certain outcomes. This paper proposes a new Q-learning algorithm for quantile optimization in MDPs with strong convergence and performance guarantees. The algorithm leverages a new, simple dynamic program (DP) decomposition for quantile MDPs. Compared with prior work, our DP decomposition requires neither known transition probabilities nor solving complex saddle point equations and serves as a suitable foundation for other model-free RL algorithms. Our numerical results in tabular domains show that our Q-learning algorithm converges to its DP variant and outperforms earlier algorithms.
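For reference, Value-at-Risk at level $\tau$ and the quantile-optimization objective it induces over policies $\pi$ are (standard definitions; the paper's specific DP decomposition is not reproduced here):

```latex
\operatorname{VaR}_{\tau}(Z) \;=\; \inf\{\, z \in \mathbb{R} : \mathbb{P}(Z \le z) \ge \tau \,\},
\qquad
\max_{\pi}\; \operatorname{VaR}_{\tau}\!\Bigl(\textstyle\sum_{t} \gamma^{t} r_{t}^{\pi}\Bigr).
```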
Multiobjective blackbox optimization deals with problems where the objective and constraint functions are the outputs of a numerical simulation. In this context, no derivatives are available, nor can they be approximated by finite differences, which precludes the use of classical gradient-based techniques. The DMulti-MADS algorithm implements a state-of-the-art direct search procedure for multiobjective blackbox optimization based on the mesh adaptive direct search (MADS) algorithm. Since its conception, many search strategies have been proposed to improve the practical efficiency of the single-objective MADS algorithm. Inspired by this previous research, this work proposes the integration of two search heuristics into the DMulti-MADS algorithm. The first uses quadratic models, built from previously evaluated points, which act as surrogates for the true objectives and constraints, to suggest new promising candidates. The second exploits the sampling strategy of the Nelder-Mead algorithm to explore the decision space for new non-dominated points. Computational experiments on analytical problems and three engineering applications show that the use of such search steps considerably improves the performance of the DMulti-MADS algorithm.
Decision trees are highly interpretable models for solving classification problems in machine learning (ML). The standard ML algorithms for training decision trees are fast but generate suboptimal trees in terms of accuracy. Other discrete optimization models in the literature address the optimality problem but only work well on relatively small datasets. Firat et al. (2020) proposed a column-generation-based heuristic approach for learning decision trees. This approach improves scalability and can work with large datasets. In this paper, we describe improvements to this column generation approach. First, we modify the subproblem model to significantly reduce the number of subproblems in multiclass classification instances. Next, we show that the data-dependent constraints in the master problem are implied, and use them as cutting planes. Furthermore, we describe a separation model to generate data points for which the linear programming relaxation solution violates their corresponding constraints. We conclude by presenting computational results that show that these modifications result in better scalability.
NOMAD is software for optimizing blackbox problems. In continuous development since 2001, it has constantly evolved through the integration of new algorithmic features described in scientific publications. These features are motivated by real applications encountered by industrial partners. The latest major release of NOMAD, version 3, dates from 2008; minor releases are produced as new features are incorporated. The present work describes NOMAD 4, a complete redesign of the previous version, with a new architecture that provides more flexible and reusable code and added functionalities. We introduce algorithmic components, which are building blocks for more complex algorithms and can initiate other components, launch nested algorithms, or perform specialized tasks. They facilitate the implementation of new ideas, including the MegaSearchPoll component, warm and hot restarts, and a revised version of the PSD-MADS algorithm. Another main improvement of NOMAD 4 is its use of parallelism to compute multiple blackbox evaluations simultaneously and to maximize the usage of available cores. Running different algorithms, tuning their parameters, and comparing their performance is simpler than before, while overall optimization performance is maintained between versions 3 and 4. NOMAD is freely available at www.gerad.ca/nomad and the whole project is visible at github.com/bbopt/nomad.
Motivated by the needs of an airline crew scheduling application, we introduce structured convolutional kernel networks (Struct-CKN), which integrate the CKNs of Mairal et al. (2014) into a structured prediction framework that supports constraints on the outputs. CKNs are a particular kind of convolutional neural network that approximates a kernel feature map on training data, thus combining properties of deep learning with the non-parametric flexibility of kernel methods. Extending CKNs to structured outputs allows us to obtain useful initial solutions on a flight-connection dataset that can be further refined by an airline crew scheduling solver. More specifically, we use a flight-based network modeled as a general conditional random field capable of incorporating local constraints in the learning process. Our experiments demonstrate that this approach yields significant improvements for the large-scale crew pairing problem (50,000 flights per month) over standard approaches, reducing the solution cost by 17% (a gain of millions of dollars) and the cost of global constraints by 97%.
The growing reliance on power electronics introduces new challenges requiring detailed time-domain analyses with fast and accurate circuit simulation tools. Currently, commercial time-domain simulation software mainly relies on physics-based methods to simulate power electronics. Recent work showed that data-driven and physics-informed learning methods can increase simulation speed with limited compromise on accuracy, but many challenges remain before deployment in commercial tools is possible. In this paper, we propose a physics-informed bidirectional long short-term memory neural network (BiLSTM-PINN) model to simulate the time-domain response of a closed-loop dc-dc boost converter for various operating points, parameters, and perturbations. A physics-informed fully connected neural network (FCNN) and a BiLSTM are also trained to establish a comparison. The three methods are then compared using step-response tests to assess their performance and limitations in terms of accuracy. The results show that the BiLSTM-PINN and BiLSTM models outperform the FCNN model by more than 9 and 4.5 times, respectively, in terms of median RMSE. Their standard deviations are more than 2.6 and 1.7 times smaller than the FCNN's, also making them more consistent. These results illustrate that the proposed BiLSTM-PINN is a potential alternative to other physics-based or data-driven methods for power electronics simulations.
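A minimal PyTorch sketch of the BiLSTM-PINN idea described above. The architecture sizes, the averaged boost-converter relations in the physics residual, and all constants are assumptions chosen for illustration, not the paper's model.

```python
import torch
import torch.nn as nn

class BiLSTMPINN(nn.Module):
    """Bidirectional LSTM mapping an input sequence (e.g., duty cycle and
    input voltage) to the converter state trajectory (e.g., i_L, v_C)."""
    def __init__(self, n_in=2, n_hidden=64, n_out=2):
        super().__init__()
        self.lstm = nn.LSTM(n_in, n_hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * n_hidden, n_out)

    def forward(self, u):                       # u: (batch, time, n_in)
        h, _ = self.lstm(u)
        return self.head(h)                     # states: (batch, time, n_out)

def physics_residual(x, u, dt=1e-6, L=1e-4, C=1e-4, R=10.0):
    # Illustrative averaged boost-converter relations used to penalize
    # violations of circuit equations (an assumption for this sketch):
    #   L di/dt ~ v_in - (1 - d) v_C,   C dv/dt ~ (1 - d) i_L - v_C / R.
    i_L, v_C = x[..., 0], x[..., 1]
    d, v_in = u[..., 0], u[..., 1]
    di = (i_L[:, 1:] - i_L[:, :-1]) / dt
    dv = (v_C[:, 1:] - v_C[:, :-1]) / dt
    r1 = L * di - (v_in[:, :-1] - (1 - d[:, :-1]) * v_C[:, :-1])
    r2 = C * dv - ((1 - d[:, :-1]) * i_L[:, :-1] - v_C[:, :-1] / R)
    return (r1 ** 2).mean() + (r2 ** 2).mean()

model = BiLSTMPINN()
u = torch.randn(8, 100, 2)                      # synthetic input sequences
x_true = torch.randn(8, 100, 2)                 # synthetic target states
x_pred = model(u)
# Total loss = data-fit term + weighted physics-residual term.
loss = nn.functional.mse_loss(x_pred, x_true) + 0.1 * physics_residual(x_pred, u)
loss.backward()
```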
A recent study on the classical Capacitated Vehicle Routing Problem (CVRP) introduced an adaptive version of the widely used Iterated Local Search (ILS) paradigm, hybridized with a path-relinking strategy (PR). The solution method, called AILS-PR, outperformed existing meta-heuristics for the CVRP on benchmark instances. However, tests on large-scale instances of the CVRP suggested that PR was too slow, making AILS-PR less advantageous in this case. To overcome this challenge, this paper presents an Adaptive Iterated Local Search (AILS) with two phases in its search process. Both phases include the perturbation and local search steps of ILS. The main difference between them is that the reference solution in the first phase is found by the acceptance criterion, while in the second phase it is selected from a pool of the best solutions found in the search process, the so-called elite set. This algorithm, called AILS-II, is very competitive on smaller instances, outperforming the other methods from the literature with respect to the average gap to the best known solutions. Moreover, AILS-II consistently outperformed the state of the art on larger instances with up to 30,000 vertices.
We consider a class of dynamic collective choice models with social interactions, whereby a large number of non-uniform agents have to individually settle on one of multiple discrete alternative choices, with the relevance of their would-be choices continuously impacted by noise and the unfolding group behavior. This class of problems is modeled here as a so-called Min-LQG game, i.e., a linear quadratic Gaussian dynamic and non-cooperative game, with an additional combinatorial aspect in that it includes a final choice-related minimization in its terminal cost. The presence of this minimization term is key to enforcing some specific discrete choice by each individual agent. The theory of mean field games is invoked to generate a class of decentralized agent feedback control strategies which are then shown to converge to an exact Nash equilibrium of the game as the number of players increases to infinity. A key building block in our approach is an explicit solution to the problem of computing the best response of a generic agent to some arbitrarily posited smooth mean field trajectory. Ultimately, an agent is shown to face a continuously revised discrete choice problem, where greedy choices dictated by current conditions must be constantly balanced against the risk of the future process noise upsetting the wisdom of such decisions. Even though an agent's ultimately chosen alternative is random and dictated by its entire noise history and initial state, the limiting infinite population macroscopic behavior can still be predicted. It is shown that any Nash equilibrium of the game is defined by an a priori computable probability matrix characterizing the manner in which the agent population ultimately splits among the available alternatives.
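Schematically, agent $i$'s Min-LQG cost couples a quadratic running cost with a terminal minimization over the $\ell$ alternatives $p_1,\dots,p_\ell$; the symbols below are illustrative notation for the structure described above, not the paper's exact formulation.

```latex
J_i(u_i) \;=\; \mathbb{E}\!\left[\int_0^T \Bigl( q\,\|x_i(t)-\bar{x}(t)\|^2
+ r\,\|u_i(t)\|^2 \Bigr)\,dt
\;+\; \min_{1\le j\le \ell} M\,\|x_i(T)-p_j\|^2 \right].
```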
Robots performing tasks in warehouses provide the first example of widespread adoption of autonomous vehicles in transportation and logistics. The efficiency of these operations, which can vary widely in practice, is a key factor in the success of supply chains. In this work, we consider the problem of coordinating a fleet of robots performing picking operations in a warehouse so as to maximize the net profit achieved within a time period while respecting problem- and robot-specific constraints. We formulate the problem as a weighted set packing problem where the elements in consideration are items on the warehouse floor that can be picked up and delivered within specified time windows. We enforce the constraints that robots must not collide, that each item is picked up and delivered by at most one robot, and that the number of robots active at any time does not exceed the total number available. Since the set of routes is exponential in the size of the input, we attack the optimization of the resulting integer linear program using column generation, where pricing amounts to solving an elementary resource-constrained shortest-path problem. We propose an efficient optimization scheme that avoids consideration of every increment within the time windows. We also propose a heuristic pricing algorithm that can efficiently solve the pricing subproblem. While this is an important problem in itself, the insights gained from solving it effectively can lead to new advances in other time-window-constrained vehicle routing problems.
This work introduces the StoMADS-PB algorithm for constrained stochastic blackbox optimization, an extension of the mesh adaptive direct-search (MADS) method originally developed for deterministic blackbox optimization under general constraints. The values of the objective and constraint functions are provided by a noisy blackbox, i.e., they can only be computed with random noise whose distribution is unknown. As in MADS, constraint violations are aggregated into a single constraint violation function. Since exact function values are numerically unavailable, StoMADS-PB uses estimates and introduces so-called probabilistic bounds for the violation. Such estimates and bounds obtained from stochastic observations are required to be accurate and reliable with high but fixed probabilities. The proposed method, which allows intermediate infeasible iterates, accepts new points using sufficient decrease conditions and a threshold on the probabilistic bounds. Using Clarke nonsmooth calculus and martingale theory, Clarke stationarity convergence results for the objective and the violation function are derived with probability one.
Penalty methods are a well-known class of algorithms for constrained optimization. They transform a constrained problem into a sequence of unconstrained penalized problems in the hope that approximate solutions of the latter converge to a solution of the former. If Lagrange multipliers exist, exact penalty methods ensure that the penalty parameter need only increase a finite number of times, but they are typically scorned in smooth optimization because the penalized problems are not smooth, which has led researchers to consider their implementation inconvenient. Recently, advances in proximal methods have led to increasingly efficient solvers for nonsmooth optimization. We show that the exact $\ell_2$-penalty method for equality-constrained optimization can in fact be implemented efficiently by solving the penalized problem with a proximal-type algorithm. We study the convergence of our algorithm and establish a worst-case complexity bound of $O(\epsilon^{-2})$ to bring a stationarity measure below $\epsilon > 0$ under the Mangasarian-Fromovitz constraint qualification and Lipschitz continuity of the objective gradient and constraint Jacobian. In a degenerate scenario where the penalty parameter grows unbounded, the complexity becomes $O(\epsilon^{-8})$, which is worse than another bound found in the literature. We justify the difference by arguing that our feasibility measure is properly scaled. Finally, we report numerical experience on small-scale problems from a standard collection and compare our solver with an augmented-Lagrangian and an SQP method. Our preliminary implementation is on par with the augmented-Lagrangian method in terms of robustness and efficiency. It is on par with the SQP method in terms of robustness, though the former remains ahead in terms of the number of problem function evaluations.
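For reference, the exact $\ell_2$-penalty reformulation for equality constraints takes the standard form below; the unsquared norm is what makes the penalty exact for a finite $\sigma$ (when multipliers exist) and nonsmooth at feasible points, which is precisely the structure proximal methods handle.

```latex
\min_{x}\; f(x) \quad \text{subject to} \quad c(x) = 0
\qquad\Longrightarrow\qquad
\min_{x}\; \varphi_{\sigma}(x) := f(x) + \sigma\,\|c(x)\|_2 .
```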
We propose a new unsupervised anomaly detection method based on the sliced-Wasserstein distance for training data selection in machine learning approaches. Our filtering technique is interesting for decision-making pipelines deploying machine learning models in critical sectors, e.g., power systems, as it offers conservative data selection and an optimal transport interpretation. To ensure the scalability of our method, we provide two efficient approximations. The first approximation processes reduced-cardinality representations of the datasets concurrently. The second makes use of a computationally light Euclidean distance approximation. Additionally, we release the first open dataset showcasing localized critical peak rebate demand response in a northern climate. We present the filtering patterns of our method on synthetic datasets and numerically benchmark it for training data selection. Finally, we employ our method as part of a first forecasting benchmark for our open-source dataset.
We introduce iterative methods named TriCG and TriMR for solving symmetric quasi-definite systems, based on the orthogonal tridiagonalization process proposed by Saunders, Simon, and Yip in 1988. TriCG and TriMR are tantamount to preconditioned Block-CG and Block-MINRES with two right-hand sides in which the two approximate solutions are summed at each iteration, but they require less storage and work per iteration. We evaluate the performance of TriCG and TriMR on linear systems generated from the SuiteSparse Matrix Collection and from discretized and stabilized Stokes equations. We compare TriCG and TriMR with SYMMLQ and MINRES, the recommended Krylov methods for symmetric and indefinite systems. In all our experiments, TriCG and TriMR terminate earlier than SYMMLQ and MINRES on a residual-based stopping condition, with an improvement of up to 50% in terms of number of iterations. They also terminate more reliably than Block-CG and Block-MINRES. Experiments in quadruple and octuple precision suggest that loss of orthogonality in the basis vectors is significantly less pronounced in TriCG and TriMR than in Block-CG and Block-MINRES.
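For reference, a symmetric quasi-definite (SQD) system has the 2-by-2 block structure below, with $M$ and $N$ symmetric positive definite; this is the structure that TriCG and TriMR exploit through the orthogonal tridiagonalization process.

```latex
\begin{bmatrix} M & A \\ A^{T} & -N \end{bmatrix}
\begin{bmatrix} x \\ y \end{bmatrix}
=
\begin{bmatrix} b \\ c \end{bmatrix},
\qquad M \succ 0,\;\; N \succ 0 .
```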
This paper presents an invariant Rauch-Tung-Striebel (IRTS) smoother applicable to systems with states that are an element of a matrix Lie group. In particular, the extended Rauch-Tung-Striebel (RTS) smoother is adapted to work within a matrix Lie group framework. The main advantage of the IRTS smoother is that the linearization of the process and measurement models is independent of the state estimate, resulting in state-estimate-independent Jacobians when certain technical requirements are met. A sample problem is considered that involves estimation of the three-dimensional pose of a rigid body on SE(3), along with sensor biases. The multiplicative RTS (MRTS) smoother is also reviewed and used as a direct comparison to the proposed IRTS smoother on experimental data. Both smoothing methods are also compared to invariant and multiplicative versions of the Gauss-Newton approach to solving the batch state estimation problem.
Inspired by successful biological collective decision mechanisms, such as honey bees searching for a new colony or the collective navigation of fish schools, we consider a mean field games (MFG)-like scenario where a large number of agents must choose among a set of different potential target destinations. Each individual both influences and is influenced by the group's decision, as well as by the mean trajectory of all the agents. The model can be interpreted as a stylized version of opinion crystallization in an election, for example. The agents' biases are dictated first by their initial spatial position and, in a subsequent generalization of the model, by a combination of initial position and a priori individual preference. The agents have linear dynamics and are coupled through a modified form of quadratic cost. Fixed-point-based finite-population equilibrium conditions are identified and associated existence conditions are established. In general, multiple equilibria may exist and the agents need to know all initial conditions to compute them precisely. However, as the number of agents increases sufficiently, we show that (1) the computed fixed-point equilibria qualify as epsilon-Nash equilibria, and (2) agents no longer require all initial conditions to compute the equilibria, but can instead do so based on a representative probability distribution of these conditions, now viewed as random variables. Numerical results are reported.