Accurate prediction of continuous properties is essential to many scientific and engineering tasks. Although deep-learning regressors excel with abundant labels, their accuracy deteriorates in data-scarce regimes. We introduce RankRefine, a model-agnostic, plug-and-play post hoc method that refines regression with expert knowledge coming from pairwise rankings. Given a query item and a small reference set with known properties, RankRefine combines the base regressor's output with a rank-based estimate via inverse variance weighting, requiring no retraining. In molecular property prediction task, RankRefine achieves up to 10% relative reduction in mean absolute error using only 20 pairwise comparisons obtained through a general-purpose large language model (LLM) with no finetuning. As rankings provided by human experts or general-purpose LLMs are sufficient for improving regression across diverse domains, RankRefine offers practicality and broad applicability, especially in low-data settings.
We study the problem of computing a minimum ss--tt cut in an unweighted, undirected graph via \emph{cut queries}. In this model, the input graph is accessed through an oracle that, given a subset of vertices SVS \subseteq V, returns the size of the cut (S,VS)(S, V \setminus S). This line of work was initiated by Rubinstein, Schramm, and Weinberg (ITCS 2018), who gave a randomized algorithm that computes a minimum ss--tt cut using O~(n5/3)\widetilde{O}(n^{5/3}) queries, thereby showing that one can avoid spending Θ~(n2)\widetilde{\Theta}(n^2) queries required to learn the entire graph. A recent result by Anand, Saranurak, and Wang (SODA 2025) also matched this upper bound via a deterministic algorithm based on blocking flows. In this work, we present a new randomized algorithm that improves the cut-query complexity to O~(n8/5)\widetilde{O}(n^{8/5}). At the heart of our approach is a query-efficient subroutine that incrementally reveals the graph edge-by-edge while increasing the maximum ss--tt flow in the learned subgraph at a rate faster than classical augmenting-path methods. Notably, our algorithm is simple, purely combinatorial, and can be naturally interpreted as a recursive greedy procedure. As a further consequence, we obtain a \emph{deterministic} and \emph{combinatorial} two-party communication protocol for computing a minimum ss--tt cut using O~(n11/7)\widetilde{O}(n^{11/7}) bits of communication. This improves upon the previous best bound of O~(n5/3)\widetilde{O}(n^{5/3}), which was obtained via reductions from the aforementioned cut-query algorithms. In parallel, it has been observed that an O~(n3/2)\widetilde{O}(n^{3/2})-bit randomized protocol can be achieved via continuous optimization techniques; however, these methods are fundamentally different from our combinatorial approach.
We introduce a blackbox framework that simplifies all known parallel algorithms with near-linear work for single-source reachability and shortest paths in directed graphs. Specifically, existing reachability algorithms rely on constructing shortcuts; our blackbox allows these algorithms that construct shortcuts with hopbound hh to assume the input graph GG is ``shallow'', meaning if vertex ss can reach vertex tt, it can do so in approximately hh hops. This assumption significantly simplifies shortcut construction [Fin18, JLS19], resulting in simpler parallel reachability algorithms. Furthermore, our blackbox extends naturally to simplify parallel algorithms for constructing hopsets and, consequently, for computing shortest paths [CFR20 , CF23 , RHM+23 ].
Large Language Models (LLMs) are increasingly relying on web crawling to stay up to date and accurately answer user queries. These crawlers are expected to honor this http URL files, which govern automated access. In this study, for the first time, we investigate whether reputable news websites and misinformation sites differ in how they configure these files, particularly in relation to AI crawlers. Analyzing a curated dataset, we find a stark contrast: 60.0% of reputable sites disallow at least one AI crawler, compared to just 9.1% of misinformation sites in their this http URL files. Reputable sites forbid an average of 15.5 AI user agents, while misinformation sites prohibit fewer than one. We then measure active blocking behavior, where websites refuse to return content when HTTP requests include AI crawler user agents, and reveal that both categories of websites utilize it. Notably, the behavior of reputable news websites in this regard aligns more closely with their declared this http URL directive than that of misinformation websites. Finally, our longitudinal analysis reveals that this gap has widened over time, with AI-blocking by reputable sites rising from 23% in September 2023 to nearly 60% by May 2025. Our findings highlight a growing asymmetry in content accessibility that may shape the training data available to LLMs, raising essential questions for web transparency, data ethics, and the future of AI training practices.
In this paper, we discuss the maximum flow problem in the two-party communication model, where two parties, each holding a subset of edges on a common vertex set, aim to compute the maximum flow of the union graph with minimal communication. We show that this can be solved with O~(n1.5)\tilde{O}(n^{1.5}) bits of communication, improving upon the trivial O~(n2)\tilde{O}(n^2) bound. To achieve this, we derive two additional, more general results: 1. We present a randomized algorithm for linear programs with two-sided constraints that requires O~(n1.5k)\tilde{O}(n^{1.5}k) bits of communication when each constraint has at most kk non-zeros. This result improves upon the prior work by [Ghadiri, Lee, Padmanabhan, Swartworth, Woodruff, Ye, STOC'24], which achieves a complexity of O~(n2)\tilde{O}(n^2) bits for LPs with one-sided constraints. Upon more precise analysis, their algorithm can reach a bit complexity of O~(n1.5+nk)\tilde{O}(n^{1.5} + nk) for one-sided constraint LPs. Nevertheless, for sparse matrices, our approach matches this complexity while extending the scope to two-sided constraints. 2. Leveraging this result, we demonstrate that the minimum cost flow problem, as a special case of solving linear programs with two-sided constraints and as a general case of maximum flow problem, can also be solved with a communication complexity of O~(n1.5)\tilde{O}(n^{1.5}) bits. These results are achieved by adapting an interior-point method (IPM)-based algorithm for solving LPs with two-sided constraints in the sequential setting by [van den Brand, Lee, Liu, Saranurak, Sidford, Song, Wang, STOC'21] to the two-party communication model. This adaptation utilizes techniques developed by [Ghadiri, Lee, Padmanabhan, Swartworth, Woodruff, Ye, STOC'24] for distributed convex optimization.
This paper presents parallel, distributed and quantum algorithms for single-source shortest paths when edges can have negative weights (negative-weight SSSP). We show a framework that reduces negative-weight SSSP in all these setting to no(1)n^{o(1)} calls to any SSSP algorithm that works with a virtual source. More specifically, for a graph with mm edges, nn vertices, undirected hop-diameter DD, and polynomially bounded integer edge weights, we show randomized algorithms for negative-weight SSSP with (i) WSSSP(m,n)no(1)W_{SSSP}(m,n)n^{o(1)} work and SSSSP(m,n)no(1)S_{SSSP}(m,n)n^{o(1)} span, given access to an SSSP algorithm with WSSSP(m,n)W_{SSSP}(m,n) work and SSSSP(m,n)S_{SSSP}(m,n) span in the parallel model, (ii) TSSSP(n,D)no(1)T_{SSSP}(n,D)n^{o(1)}, given access to an SSSP algorithm that takes TSSSP(n,D)T_{SSSP}(n,D) rounds in CONGEST\mathsf{CONGEST}, (iii) QSSSP(m,n)no(1)Q_{SSSP}(m,n)n^{o(1)} quantum edge queries, given access to a non-negative-weight SSSP algorithm that takes QSSSP(m,n)Q_{SSSP}(m,n) queries in the quantum edge query model. This work builds off the recent result of [Bernstein, Nanongkai, Wulff-Nilsen, FOCS'22], which gives a near-linear time algorithm for negative-weight SSSP in the sequential setting. Using current state-of-the-art SSSP algorithms yields randomized algorithms for negative-weight SSSP with (i) m1+o(1)m^{1+o(1)} work and n1/2+o(1)n^{1/2+o(1)} span in the parallel model, (ii) (n2/5D2/5+n+D)no(1)(n^{2/5}D^{2/5} + \sqrt{n} + D)n^{o(1)} rounds in CONGEST\mathsf{CONGEST}, (iii) m1/2n1/2+o(1)m^{1/2}n^{1/2+o(1)} quantum queries to the adjacency list or n1.5+o(1)n^{1.5+o(1)} quantum queries to the adjacency matrix. Our main technical contribution is an efficient reduction for computing a low-diameter decomposition (LDD) of directed graphs to computations of SSSP with a virtual source. Efficiently computing an LDD has heretofore only been known for undirected graphs in both the parallel and distributed models.
We study polynomial-time approximation algorithms for two closely-related problems, namely computing shortcuts and transitive-closure spanners (TC spanners). For a directed unweighted graph G=(V,E)G=(V, E) and an integer dd, a set of edges EV×VE'\subseteq V\times V is called a dd-TC spanner of GG if the graph H:=(V,E)H:=(V, E') has (i) the same transitive-closure as GG and (ii) diameter at most d.d. The set EV×VE''\subseteq V\times V is a dd-shortcut of GG if EEE\cup E'' is a dd-TC spanner of GG. Our focus is on the following (αD,αS)(\alpha_D, \alpha_S)-approximation algorithm: given a directed graph GG and integers dd and ss such that GG admits a dd-shortcut (respectively dd-TC spanner) of size ss, find a (dαD)(d\alpha_D)-shortcut (resp. (dαD)(d\alpha_D)-TC spanner) with sαSs\alpha_S edges, for as small αS\alpha_S and αD\alpha_D as possible. As our main result, we show that, under the Projection Game Conjecture (PGC), there exists a small constant ϵ>0\epsilon>0, such that no polynomial-time (nϵ,nϵ)(n^{\epsilon},n^{\epsilon})-approximation algorithm exists for finding dd-shortcuts as well as dd-TC spanners of size ss. Previously, super-constant lower bounds were known only for dd-TC spanners with constant dd and αD=1{\alpha_D}=1 [Bhattacharyya, Grigorescu, Jung, Raskhodnikova, Woodruff 2009]. Similar lower bounds for super-constant dd were previously known only for a more general case of directed spanners [Elkin, Peleg 2000]. No hardness of approximation result was known for shortcuts prior to our result. As a side contribution, we complement the above with an upper bound of the form (nγD,nγS)(n^{\gamma_D}, n^{\gamma_S})-approximation which holds for 3γD+2γS>13\gamma_D + 2\gamma_S > 1 (e.g., (n1/5+o(1),n1/5+o(1))(n^{1/5+o(1)}, n^{1/5+o(1)})-approximation).
Vertex connectivity and its variants are among the most fundamental problems in graph theory, with decades of extensive study and numerous algorithmic advances. The directed variants of vertex connectivity are usually solved by manually extending fast algorithms for undirected graphs, which has required considerable effort. In this paper, we present a simple, black-box randomized reduction from directed to undirected vertex connectivity for dense graphs. As immediate corollaries, we largely simplify the proof for directed vertex connectivity in n2+o(1)n^{2+o(1)} time [LNP+25], and obtain a parallel vertex connectivity algorithm for directed graphs with nω+o(1)n^{\omega+o(1)} work and no(1)n^{o(1)} depth, via the undirected vertex connectivity algorithm of [BJMY25]. Our reduction further extends to the weighted version of the problem. By combining our reduction with the recent subcubic-time algorithm for undirected weighted vertex cuts [CT25], we obtain the first subcubic-time algorithm for weighted directed vertex connectivity, improving upon a three-decade-old bound [HRG00] for dense graphs.
This research introduces a formal framework for active learning with contrastive examples, where the learner is aware of the oracle's strategy for generating these examples. It quantifies the sample complexity improvements for various concept classes, illustrating how specific contrastive mechanisms and metric choices influence learning efficiency.
In this paper, we discuss the maximum flow problem in the two-party communication model, where two parties, each holding a subset of edges on a common vertex set, aim to compute the maximum flow of the union graph with minimal communication. We show that this can be solved with O~(n1.5)\tilde{O}(n^{1.5}) bits of communication, improving upon the trivial O~(n2)\tilde{O}(n^2) bound. To achieve this, we derive two additional, more general results: 1. We present a randomized algorithm for linear programs with two-sided constraints that requires O~(n1.5k)\tilde{O}(n^{1.5}k) bits of communication when each constraint has at most kk non-zeros. This result improves upon the prior work by [Ghadiri, Lee, Padmanabhan, Swartworth, Woodruff, Ye, STOC'24], which achieves a complexity of O~(n2)\tilde{O}(n^2) bits for LPs with one-sided constraints. Upon more precise analysis, their algorithm can reach a bit complexity of O~(n1.5+nk)\tilde{O}(n^{1.5} + nk) for one-sided constraint LPs. Nevertheless, for sparse matrices, our approach matches this complexity while extending the scope to two-sided constraints. 2. Leveraging this result, we demonstrate that the minimum cost flow problem, as a special case of solving linear programs with two-sided constraints and as a general case of maximum flow problem, can also be solved with a communication complexity of O~(n1.5)\tilde{O}(n^{1.5}) bits. These results are achieved by adapting an interior-point method (IPM)-based algorithm for solving LPs with two-sided constraints in the sequential setting by [van den Brand, Lee, Liu, Saranurak, Sidford, Song, Wang, STOC'21] to the two-party communication model. This adaptation utilizes techniques developed by [Ghadiri, Lee, Padmanabhan, Swartworth, Woodruff, Ye, STOC'24] for distributed convex optimization.
A number of recent works have studied algorithms for entrywise p\ell_p-low rank approximation, namely, algorithms which given an n×dn \times d matrix AA (with ndn \geq d), output a rank-kk matrix BB minimizing ABpp=i,jAi,jBi,jp\|A-B\|_p^p=\sum_{i,j}|A_{i,j}-B_{i,j}|^p when p>0p > 0; and AB0=i,j[Ai,jBi,j]\|A-B\|_0=\sum_{i,j}[A_{i,j}\neq B_{i,j}] for p=0p=0. On the algorithmic side, for p(0,2)p \in (0,2), we give the first (1+ϵ)(1+\epsilon)-approximation algorithm running in time npoly(k/ϵ)n^{\text{poly}(k/\epsilon)}. Further, for p=0p = 0, we give the first almost-linear time approximation scheme for what we call the Generalized Binary 0\ell_0-Rank-kk problem. Our algorithm computes (1+ϵ)(1+\epsilon)-approximation in time (1/ϵ)2O(k)/ϵ2nd1+o(1)(1/\epsilon)^{2^{O(k)}/\epsilon^{2}} \cdot nd^{1+o(1)}. On the hardness of approximation side, for p(1,2)p \in (1,2), assuming the Small Set Expansion Hypothesis and the Exponential Time Hypothesis (ETH), we show that there exists δ:=δ(α)>0\delta := \delta(\alpha) > 0 such that the entrywise p\ell_p-Rank-kk problem has no α\alpha-approximation algorithm running in time 2kδ2^{k^{\delta}}.
A recent breakthrough by [LNPSY STOC'21] showed that solving s-t vertex connectivity is sufficient (up to polylogarithmic factors) to solve (global) vertex connectivity in the sequential model. This raises a natural question: What is the relationship between s-t and global vertex connectivity in other computational models? In this paper, we demonstrate that the connection between global and s-t variants behaves very differently across computational models: 1.In parallel and distributed models, we obtain almost tight reductions from global to s-t vertex connectivity. In PRAM, this leads to a nω+o(1)n^{\omega+o(1)}-work and no(1)n^{o(1)}-depth algorithm for vertex connectivity, improving over the 35-year-old O~(nω+1)\tilde O(n^{\omega+1})-work O(log2n)O(\log^2n)-depth algorithm by [LLW FOCS'86], where ω\omega is the matrix multiplication exponent and nn is the number of vertices. In CONGEST, the reduction implies the first sublinear-round (when the diameter is moderately small) vertex connectivity algorithm. This answers an open question in [JM STOC'23]. 2. In contrast, we show that global vertex connectivity is strictly harder than s-t vertex connectivity in the two-party communication setting, requiring Θ~(n1.5)\tilde \Theta (n^{1.5}) bits of communication. The s-t variant was known to be solvable in O~(n)\tilde O(n) communication [BvdBEMN FOCS'22]. Our results resolve open problems raised by [MN STOC'20, BvdBEMN FOCS'22, AS SOSA'23]. At the heart of our results is a new graph decomposition framework we call \emph{common-neighborhood clustering}, which can be applied in multiple models. Finally, we observe that global vertex connectivity cannot be solved without using s-t vertex connectivity, by proving an s-t to global reduction in dense graphs, in the PRAM and communication models.
We study succinct representations of vertex cuts by centralized oracles and labeling schemes. For an undirected nn-vertex graph G=(V,E)G = (V,E) and integer parameter f1f \geq 1, the goal is supporting vertex cut queries: Given FVF \subseteq V with Ff|F| \leq f, determine if FF is a vertex cut in GG. In the centralized data structure setting, it is required to preprocess GG into an ff-vertex cut oracle that can answer such queries quickly, while occupying only small space. In the labeling setting, one should assign a short label to each vertex in GG, so that a cut query FF can be answered by merely inspecting the labels assigned to the vertices in FF. While the ``stst cut variants'' of the above problems have been extensively researched and are known to admit very efficient solutions, the basic ``cut query'' setting is essentially open (particularly for f>3f > 3). This work achieves the first significant progress on these problems: [ff-Vertex Cut Oracles:] Every nn-vertex graph GG admits ff-vertex cut oracle with O~(22fn)\tilde{O}(2^{2f} n) space and O~(22f)\tilde{O}(2^{2f}) query time, hence almost optimal for f=o(logn)f=o(\log n). In case GG is ff-connected (namely, when interested in minimum vertex cut queries), the space and query time improve to O~(fn)\tilde{O}(f n) and O~(f2)\tilde{O}(f^2), respectively. [ff-Vertex Cut Labels:] Every nn-vertex graph admits an ff-vertex cut labeling scheme, where the labels have length of O~(n11/f)\tilde{O}(n^{1-1/f}) bits (when ff is polylogarithmic in nn). This nearly matches the recent lower bound given by Long, Pettie and Saranurak (SODA 2025).
Molecular deep learning models have achieved remarkable success in property prediction, but they often require large amounts of labeled data. The challenge is that, in real-world applications, labels are extremely scarce, as obtaining them through laboratory experimentation is both expensive and time-consuming. In this work, we introduce MoleVers, a versatile pretrained molecular model designed for various types of molecular property prediction in the wild, i.e., where experimentally-validated labels are scarce. MoleVers employs a two-stage pretraining strategy. In the first stage, it learns molecular representations from unlabeled data through masked atom prediction and extreme denoising, a novel task enabled by our newly introduced branching encoder architecture and dynamic noise scale sampling. In the second stage, the model refines these representations through predictions of auxiliary properties derived from computational methods, such as the density functional theory or large language models. Evaluation on 22 small, experimentally-validated datasets demonstrates that MoleVers achieves state-of-the-art performance, highlighting the effectiveness of its two-stage framework in producing generalizable molecular representations for diverse downstream properties.
The real type of a finite family of univariate polynomials characterizes the combined sign behavior of the polynomials over the real line. We derive an explicit formula for the number of real types subject to given degree bounds. For the special case of a single polynomial we present a closed-form expression involving Fibonacci numbers. This allows us to precisely describe the asymptotic growth of the number of real types as the degree increases, in terms of the golden ratio.
Segment routing is a modern form of source-based routing, i.e., a routing technique where all or part of the routing decision is predetermined by the source or a hop on the path. Since initial standardization efforts in 2013, segment routing seems to have garnered substantial industry and operator support. Especially segment routing over IPv6 (SRv6) is advertised as having several advantages for easy deployment and flexibility in operations in networks. Many people, however, argue that the deployment of segment routing and SRv6 in particular poses a significant security threat if not done with the utmost care. In this paper we conduct a first empirical analysis of SRv6 deployment in the Internet. First, we analyze SRv6 behavior in an emulation environment and find that different SRv6 implementations have the potential to leak information to the outside. Second, we search for signs of SRv6 deployment in publicly available route collector data, but could not find any traces. Third, we run large-scale traceroute campaigns to investigate possible SRv6 deployments. In this first empirical study on SRv6 we are unable to find traces of SRv6 deployment even for companies that claim to have it deployed in their networks. This lack of leakage might be an indication of good security practices being followed by network operators when deploying SRv6.
We present a randomized parallel algorithm in the {\sf PRAM} model for kk-vertex connectivity. Given an undirected simple graph, our algorithm either finds a set of fewer than kk vertices whose removal disconnects the graph or reports that no such set exists. The algorithm runs in $O(m \cdot \text{poly}(k, \log n))workand work and O(\text{poly}(k, \log n))$ depth, which is nearly optimal for any k=poly(logn)k = \text{poly}(\log n). Prior to our work, algorithms with near-linear work and polylogarithmic depth were known only for k=3k=3 [Miller, Ramachandran, STOC'87]; for k=4k=4, sequential algorithms achieving near-linear time were known [Forster, Nanongkai, Yang, Saranurak, Yingchareonthawornchai, SODA'20], but no algorithm with near-linear work could achieve even sublinear (on nn) depth.
We give a deterministic algorithm for computing a global minimum vertex cut in a vertex-weighted graph nn vertices and mm edges in O^(mn)\widehat O(mn) time. This breaks the long-standing Ω^(n4)\widehat \Omega(n^{4})-time barrier in dense graphs, achievable by trivially computing all-pairs maximum flows. Up to subpolynomial factors, we match the fastest randomized O~(mn)\tilde O(mn)-time algorithm by [Henzinger, Rao, and Gabow'00], and affirmatively answer the question by [Gabow'06] whether deterministic O(mn)O(mn)-time algorithms exist even for unweighted graphs. Our algorithm works in directed graphs, too. In unweighted undirected graphs, we present a faster deterministic $\widehat O(m\kappa)timealgorithmwhere-time algorithm where \kappa\le n$ is the size of the global minimum vertex cut. For a moderate value of κ\kappa, this strictly improves upon all previous deterministic algorithms in unweighted graphs with running time $\widehat O(m(n+\kappa^{2}))[Even75], [Even'75], \widehat O(m(n+\kappa\sqrt{n}))$ [Gabow'06], and O^(m2O(κ2))\widehat O(m2^{O(\kappa^{2})}) [Saranurak and Yingchareonthawornchai'22]. Recently, a linear-time algorithm has been shown by [Korhonen'24] for very small κ\kappa. Our approach applies the common-neighborhood clustering, recently introduced by [Blikstad, Jiang, Mukhopadhyay, Yingchareonthawornchai'25], in novel ways, e.g., on top of weighted graphs and on top of vertex-expander decomposition. We also exploit pseudorandom objects often used in computational complexity communities, including crossing families based on dispersers from [Wigderson and Zuckerman'99; TaShma, Umans and Zuckerman'01] and selectors based on linear lossless condensers [Guruswwami, Umans and Vadhan'09; Cheraghchi'11]. To our knowledge, this is the first application of selectors in graph algorithms.
There are no more papers matching your filters at the moment.