logic-in-computer-science
Researchers at Rice University developed a dynamic-programming framework for symbolic Boolean synthesis using Zero-suppressed Decision Diagrams (ZDDs). Their tool, DPZynth, achieves improved scalability and performance, notably demonstrating exponentially better results on the 'mutex' benchmark family, and introduces a practical 'magic number' heuristic to balance planning and execution in complex solvers.
Craig interpolation and uniform interpolation have many applications in knowledge representation, including explainability, forgetting, modularization and reuse, and even learning. At the same time, many relevant knowledge representation formalisms do not, in general, enjoy Craig or uniform interpolation, and computing interpolants in practice is challenging. We take a closer look at two prominent knowledge representation formalisms, description logics and logic programming, and discuss theoretical results and practical methods for computing interpolants.
We report on the Equational Theories Project (ETP), an online pilot project exploring new ways to collaborate in mathematics with machine assistance. The project successfully determined all 22 028 942 edges of the implication graph between the 4694 simplest equational laws on magmas, by a combination of human-generated and automated proofs, all validated in the Lean proof assistant. As a result of this project, several new constructions of magmas satisfying specific laws were discovered, and several auxiliary questions were also addressed, such as the effect of restricting attention to finite magmas.
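The core computational task behind the implication graph can be illustrated at toy scale with a brute-force check of equational laws on finite magmas (a plain-Python sketch, not the project's Lean infrastructure; all function names are illustrative):

```python
from itertools import product

def commutative(op, n):
    # The law x*y = y*x, checked over every pair of elements.
    return all(op[x][y] == op[y][x] for x, y in product(range(n), repeat=2))

def associative(op, n):
    # The law (x*y)*z = x*(y*z), checked over every triple.
    return all(op[op[x][y]][z] == op[x][op[y][z]]
               for x, y, z in product(range(n), repeat=3))

def commutative_not_associative(n):
    # Count n-element magmas (Cayley tables) witnessing that
    # commutativity does not imply associativity.
    count = 0
    for cells in product(range(n), repeat=n * n):
        op = [list(cells[i * n:(i + 1) * n]) for i in range(n)]
        if commutative(op, n) and not associative(op, n):
            count += 1
    return count

# Among the 16 two-element magmas, exactly two (NAND and NOR) are
# commutative but not associative.
assert commutative_not_associative(2) == 2
```

Finding such counterexample magmas, or proving none exist, is what settles each of the 22 028 942 edges; the project did this at a vastly larger scale with automated and human proofs.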
Neural networks are widely used, yet their analysis and verification remain challenging. In this work, we present a Lean 4 formalization of neural networks, covering both deterministic and stochastic models. We first formalize Hopfield networks, recurrent networks that store patterns as stable states. We prove convergence and the correctness of Hebbian learning, a training rule that updates network parameters to encode patterns, here limited to the case of pairwise-orthogonal patterns. We then consider stochastic networks, where updates are probabilistic and convergence is to a stationary distribution. As a canonical example, we formalize the dynamics of Boltzmann machines and prove their ergodicity, showing convergence to a unique stationary distribution using a new formalization of the Perron-Frobenius theorem.
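The Hebbian-storage result for pairwise-orthogonal patterns can be checked numerically (a plain-Python sketch of the classical construction, not the paper's Lean formalization; names are illustrative):

```python
def hebbian_weights(patterns, n):
    # Hebbian rule: W[i][j] = sum over patterns of p[i]*p[j],
    # with zero self-coupling on the diagonal.
    W = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    W[i][j] += p[i] * p[j]
    return W

def step(W, s):
    # One synchronous sign update of all units.
    return [1 if sum(W[i][j] * s[j] for j in range(len(s))) >= 0 else -1
            for i in range(len(s))]

# Two pairwise-orthogonal +/-1 patterns on 4 units.
p1 = [1, 1, -1, -1]
p2 = [1, -1, 1, -1]
W = hebbian_weights([p1, p2], 4)
assert step(W, p1) == p1 and step(W, p2) == p2  # stored patterns are fixed points
```

For orthogonal patterns the local field at unit i under pattern p is proportional to p[i] itself, which is why each stored pattern is a stable state; this is the pairwise-orthogonality restriction the formalization works under.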
We demonstrate that the checkable/synthesisable split in bidirectional typechecking coincides with existing dualities in polarised System L, also known as the polarised $\mu\tilde{\mu}$-calculus. Specifically, positive terms and negative coterms are checkable, and negative terms and positive coterms are synthesisable. This combines a standard formulation of bidirectional typechecking with Zeilberger's `cocontextual' variant. We extend this to ordinary `cartesian' System L using McBride's co-de Bruijn formulation of scopes, and show that both can be combined in a linear-nonlinear style, where linear types are positive and cartesian types are negative. This yields a remarkable three-way coincidence between the shifts of polarised System L, LNL calculi, and bidirectional calculi.
We present a proof system that extends action logic by omega iteration, which is viewed as infinitary multiplicative conjunction. We prove cut admissibility and establish complexity bounds for the provability predicate.
Large language models (LLMs) excel across many natural language tasks, yet their generalisation to structural perturbations in logical contexts remains poorly understood. We introduce a controlled evaluation framework that probes reasoning reliability through four targeted stress tests: (1) rule deletion, removing either redundant or essential rules from a multi-step inference chain; (2) contradictory evidence injection; (3) logic-preserving rewrites generated through several families of equivalence laws (contrapositive, double negation, implication, De Morgan, identity, and commutativity); and (4) multi-law equivalence stacking that introduces 2-5 simultaneous logical transformations. Across three representative model families (BERT, Qwen2, and LLaMA-like models), our experiments reveal a strikingly consistent pattern: all models achieve perfect accuracy on the base tasks and generalise fully under redundant rule deletion and all equivalence-based rewrites (single or multi-law), but fail sharply under essential rule deletion (dropping to 25% accuracy) and collapse completely in the presence of explicit contradictions (0% accuracy). These results demonstrate that LLMs possess stable invariance to semantics-preserving logical transformations, yet remain fundamentally brittle to missing or conflicting evidence. Our framework provides a clean diagnostic tool for isolating such reasoning failure modes and highlights persistent gaps in the logical generalisation abilities of current LLMs.
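The logic-preserving rewrites in stress test (3) are truth-functional identities, so each can be verified exhaustively; a minimal sketch for two of the laws (illustrative code, not the paper's framework):

```python
from itertools import product

def implies(a, b):
    # Material implication: a -> b.
    return (not a) or b

def contrapositive(a, b):
    # The contrapositive rewrite: (not b) -> (not a).
    return implies(not b, not a)

def de_morgan_holds(a, b):
    # De Morgan: not(a and b) equals (not a) or (not b).
    return (not (a and b)) == ((not a) or (not b))

# Exhaustive check over all truth assignments: both rewrites preserve truth.
for a, b in product([False, True], repeat=2):
    assert implies(a, b) == contrapositive(a, b)
    assert de_morgan_holds(a, b)
```

Because such rewrites never change the entailed conclusion, a model that fails under them would be keying on surface form; the paper's finding is that current models pass this test but break when evidence is removed or contradicted.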
Researchers from Shanghai Jiao Tong University developed an "Information Physics of Intelligence" framework, unifying logical depth, Shannon entropy, and thermodynamic principles to establish fundamental limits and optimal strategies for balancing information storage and computation. Their work introduces a critical frequency threshold and an Energy-Time-Space Triality Bound, demonstrating optimal strategies can yield up to 54.7% latency reduction and 53% energy savings in hybrid knowledge systems.
A study demonstrated that Quantum Picturalism (QPic) enables high school students to learn and excel in advanced quantum theory, with 82% passing an exam derived from Oxford University postgraduate questions and 48% achieving distinction, even with limited prior quantum knowledge. This suggests a pathway to democratize access to complex quantum concepts.
Current AI paradigms, as "architects of experience," face fundamental challenges in explainability and value alignment. This paper introduces "Weight-Calculatism," a novel cognitive architecture grounded in first principles, and demonstrates its potential as a viable pathway toward Artificial General Intelligence (AGI). The architecture deconstructs cognition into indivisible Logical Atoms and two fundamental operations: Pointing and Comparison. Decision-making is formalized through an interpretable Weight-Calculation model (Weight = Benefit * Probability), where all values are traceable to an auditable set of Initial Weights. This atomic decomposition enables radical explainability, intrinsic generality for novel situations, and traceable value alignment. We detail its implementation via a graph-algorithm-based computational engine and a global workspace workflow, supported by a preliminary code implementation and scenario validation. Results indicate that the architecture achieves transparent, human-like reasoning and robust learning in unprecedented scenarios, establishing a practical and theoretical foundation for building trustworthy and aligned AGI.
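The Weight-Calculation rule stated above can be illustrated with a tiny decision between hypothetical options (an illustrative sketch only; the option names and numbers are invented, and the actual architecture involves Logical Atoms and a graph engine):

```python
def weight(benefit, probability):
    # The paper's decision rule: Weight = Benefit * Probability.
    return benefit * probability

# A hypothetical two-option decision: a modest but likely payoff beats
# a large but improbable one under the weight calculation.
options = {"safe": (10, 0.9), "risky": (100, 0.05)}
best = max(options, key=lambda name: weight(*options[name]))

assert weight(10, 0.9) == 9.0
assert weight(100, 0.05) == 5.0
assert best == "safe"
```

The traceability claim amounts to every such benefit and probability being derivable from an auditable set of Initial Weights rather than from opaque learned parameters.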
Orbit-finite models of computation generalise the standard models of computation to allow computation over infinite objects that are finite up to symmetries on atoms, denoted by $\mathbb{A}$. Set theory with atoms is used to reason about these objects. Recent work assumes that $\mathbb{A}$ is countable and that the symmetries are the automorphisms of a structure on $\mathbb{A}$. We study this set theory to understand generalisations of this approach. We show that this construction is well-defined and sufficiently expressive, and that automorphism groups are adequate. Certain uncountable structures appear similar to countable structures, suggesting that the theory of orbit-finite constructions may apply to these uncountable structures. We prove results guaranteeing that the theories of symmetries of two structures are equal. Let $PM(\mathcal{A})$ be the universe of symmetries induced by adding atoms in bijection with $\mathcal{A}$ and considering the symmetric universe; let $\underline{\mathcal{A}}$ be the image of $\mathcal{A}$ on the atoms; and let $\phi^{PM(\mathcal{A})}$ be the relativisation of $\phi$ to $PM(\mathcal{A})$. We prove that all symmetric universes of equality atoms have theory $Th(PM(\langle\mathbb{N}\rangle))$. We prove that for structures $\mathcal{A}$ `nicely' covered by a set of cardinality $\kappa$, there is a structure $\mathcal{B}\equiv\mathcal{A}$ of size $\kappa$ such that for all formulae $\phi(x)$ in one variable, \begin{equation*} ZFC\vdash \phi(\underline{\mathcal{A}})^{PM(\mathcal{A})}\leftrightarrow\phi(\underline{\mathcal{B}})^{PM(\mathcal{B})} \end{equation*}
Researchers at the University of Oxford present a complete classification of long-refinement graphs where the maximum vertex degree is at most 4, thereby resolving previous gaps for specific graph sizes. The work also definitively proves the non-existence of "long-refinement pairs" for any graph order n ≥ 3, which were previously an open problem.
Autodiff refers to the core of the automatic differentiation systems developed in projects like JAX and Dex. Autodiff has recently been formalised in a linear typed calculus by Radul et al. in arXiv:2204.10923. Although this formalisation suffices to express the main program transformations of Autodiff, the calculus is very specific to this task, and it is not clear whether the type system yields a substructural logic that has interest on its own. We propose an encoding of Autodiff into a linear $\lambda$-calculus that enjoys a Curry-Howard correspondence with Girard's linear logic. We prove that the encoding is sound both qualitatively (the encoded terms are extensionally equivalent to the original ones) and quantitatively (the encoding preserves the original work cost as described in arXiv:2204.10923). As a byproduct, we show that unzipping, one of the transformations used to implement backpropagation in Autodiff, is, in fact, optional.
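For readers unfamiliar with the underlying technique, a minimal forward-mode sketch with dual numbers conveys the flavour of automatic differentiation (this is a generic textbook illustration, not the linear calculus or the encoding studied in the paper):

```python
class Dual:
    # Forward-mode AD: carry a value together with its derivative,
    # propagated by the sum and product rules.
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val + other.val, self.der + other.der)
    __radd__ = __add__

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)
    __rmul__ = __mul__

def derivative(f, x):
    # Seed the input with derivative 1 and read the derivative off the output.
    return f(Dual(x, 1.0)).der

# d/dx (x*x + 3x) at x = 2 is 2*2 + 3 = 7.
assert derivative(lambda x: x * x + 3 * x, 2.0) == 7.0
```

Reverse mode, the regime where transformations such as unzipping matter, instead records the computation and propagates derivatives backwards; the paper's contribution concerns the logical status of the calculus in which such transformations are expressed.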
Researchers at Duke University developed a geometric framework to model Large Language Model reasoning as continuous flows within their representation space. Their work empirically demonstrates that formal logical structures govern the velocity and curvature of these reasoning flows, while surface-level semantic content primarily determines the absolute position of representations.
Research from Purdue University demonstrates that transformer models can learn to infer graph connectivity from training examples, achieving near 100% test accuracy on grid-like graphs, but they struggle with disconnected chain graphs where a simple low-dimensional embedding heuristic is not applicable. For challenging graph structures, increasing training data size proves more effective than scaling model size.
We verify the correctness of a variety of mutual exclusion algorithms through model checking. We look at algorithms where communication is via shared read/write registers, where those registers can be atomic or non-atomic. For the verification of liveness properties, it is necessary to assume a completeness criterion to eliminate spurious counterexamples. We use justness as completeness criterion. Justness depends on a concurrency relation; we consider several such relations, modelling different assumptions on the working of the shared registers. We present executions demonstrating the violation of correctness properties by several algorithms, and in some cases suggest improvements.
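The model-checking approach can be conveyed at toy scale by explicit-state exploration of Peterson's algorithm under atomic registers (a hypothetical sketch; the paper's models, concurrency relations, and justness criterion are considerably richer, and non-atomic registers require a finer treatment):

```python
from collections import deque

def peterson_states():
    # Explicit-state exploration of Peterson's algorithm for two processes
    # with atomic shared registers. Program counters per process:
    # 0 = raise flag, 1 = set turn, 2 = busy-wait, 3 = critical section.
    def moves(state):
        pc, flag, turn = state
        for i in (0, 1):
            p, f, t = list(pc), list(flag), turn
            if pc[i] == 0:
                f[i] = 1; p[i] = 1
            elif pc[i] == 1:
                t = 1 - i; p[i] = 2
            elif pc[i] == 2:
                if flag[1 - i] == 0 or turn == i:
                    p[i] = 3          # enter the critical section
                else:
                    continue          # still waiting: no new state
            else:
                f[i] = 0; p[i] = 0    # leave the critical section
            yield (tuple(p), tuple(f), t)

    init = ((0, 0), (0, 0), 0)
    seen, frontier = {init}, deque([init])
    while frontier:
        for nxt in moves(frontier.popleft()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen

# Safety: no reachable state has both processes in the critical section.
assert all(s[0] != (3, 3) for s in peterson_states())
```

Liveness properties are where completeness criteria such as justness enter: a naive exploration admits executions in which a waiting process is never scheduled, and those spurious counterexamples must be ruled out by assumptions on the concurrency relation.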
Assertion-Based Verification (ABV) is critical for ensuring functional correctness in modern hardware systems. However, manually writing high-quality SystemVerilog Assertions (SVAs) remains labor-intensive and error-prone. To bridge this gap, we propose AssertCoder, a novel unified framework that automatically generates high-quality SVAs directly from multimodal hardware design specifications. AssertCoder employs modality-sensitive preprocessing to parse heterogeneous specification formats (text, tables, diagrams, and formulas), followed by a set of dedicated semantic analyzers that extract structured representations aligned with signal-level semantics. These representations are used to drive assertion synthesis via multi-step chain-of-thought (CoT) prompting. The framework incorporates a mutation-based evaluation approach to assess assertion quality via model checking and to further refine the generated assertions. Experimental evaluation across three real-world Register-Transfer Level (RTL) designs demonstrates AssertCoder's superior performance, achieving an average increase of 8.4% in functional correctness and 5.8% in mutation detection compared to existing state-of-the-art approaches.
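The idea behind mutation-based evaluation of assertions can be conveyed with a software analogue (a toy Python sketch; AssertCoder itself targets SVAs checked by a model checker, and these names are invented for illustration):

```python
# Toy mutation analysis: a good assertion should pass on the golden design
# and fail on designs with injected faults (mutants).
def design(a, b):
    # Reference design: a simple adder.
    return a + b

def mutant(a, b):
    # Injected fault: the operator is flipped.
    return a - b

def assertion(dut):
    # The "assertion" checks the adder property over a small input space.
    return all(dut(a, b) == a + b for a in range(4) for b in range(4))

assert assertion(design)        # golden design satisfies the assertion
assert not assertion(mutant)    # a high-quality assertion detects the mutant
```

An assertion that also passed on the mutant would be too weak; the fraction of mutants detected is the mutation-detection score the paper reports improvements on.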
Logical reasoning is a core capability for large language models (LLMs), yet existing benchmarks that rely solely on final-answer accuracy fail to capture the quality of the reasoning process. To address this, we introduce FineLogic, a fine-grained evaluation framework that assesses logical reasoning across three dimensions: overall accuracy, stepwise soundness, and representation-level probing. Leveraging this framework, we conduct a comprehensive study on how different supervision formats in fine-tuning shape reasoning abilities. We fine-tune LLMs on four supervision styles: one in natural language and three symbolic variants. We find a key trade-off: natural language supervision excels at generalization to out-of-distribution and long-chain problems, whereas symbolic supervision is superior at instilling structurally sound, atomic reasoning steps. Furthermore, our probing analysis indicates that fine-tuning primarily refines the model's step-by-step generation process, rather than improving its ability to converge on an answer early. Together, our framework and analysis provide a more rigorous lens for evaluating and improving logical reasoning in LLMs. The code is available at this https URL.
Large language models (LLMs) are increasingly integrated in software development, but ensuring correctness in LLM-generated code remains challenging and often requires costly manual review. Verifiable code generation -- jointly generating code, specifications, and proofs of code-specification alignment -- offers a promising path to address this limitation and further unleash LLMs' benefits in coding. Yet, there exists a significant gap in evaluation: current benchmarks often focus only on individual components rather than providing a holistic evaluation framework of all tasks. In this paper, we introduce Verina (Verifiable Code Generation Arena), a high-quality benchmark enabling a comprehensive and modular evaluation of code, specification, and proof generation as well as their compositions. Verina consists of 189 manually curated coding tasks in Lean, with detailed problem descriptions, reference implementations, formal specifications, and extensive test suites. Our extensive evaluation of state-of-the-art LLMs reveals significant challenges in verifiable code generation, especially in proof generation, underscoring the need for improving LLM-based theorem provers in verification domains. The best model, OpenAI o4-mini, achieves a 61.4% code correctness rate, 51.0% for specification soundness and completeness, and a mere 3.6% proof success rate (based on one trial per task). We hope Verina will catalyze progress in verifiable code generation by providing a rigorous and comprehensive benchmark. We release our dataset on this https URL and our evaluation code on this https URL.
A new benchmark, SATBench, assesses LLMs' search-based logical reasoning through automatically generated natural language puzzles derived from Boolean Satisfiability problems. The evaluation shows LLMs achieve 89.3% accuracy on easier problems but drop to 53.0% on hard instances, particularly struggling with unsatisfiable scenarios and exhibiting unreliable reasoning traces.
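The decision problem underlying the puzzles can be sketched with a brute-force satisfiability check over DIMACS-style clauses (an illustrative toy, unrelated to SATBench's actual puzzle generator or evaluation harness):

```python
from itertools import product

def brute_force_sat(clauses, n_vars):
    # A clause is a list of nonzero ints (DIMACS style): literal v means
    # variable v is true, -v means it is false. Try every assignment.
    for bits in product([False, True], repeat=n_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            return True
    return False

# (x1 or x2) and (not x1 or x2) and (not x2) is unsatisfiable:
assert not brute_force_sat([[1, 2], [-1, 2], [-2]], 2)
# Dropping the last clause makes it satisfiable:
assert brute_force_sat([[1, 2], [-1, 2]], 2)
```

Unsatisfiable instances are exactly those where no assignment works, which requires implicitly ruling out the whole search space; the benchmark's finding is that this is where LLMs struggle most.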