alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Browser Extension

Ask or search anything...

Events

Watch Recordings

AI for Law01/09 · Joel Niklaus · Hugging Face

Papers Benchmarks

ETH Zurich

Small-scale dynamics and structure of free-surface turbulence

05 Dec 2024

ETH Zurich Zhejiang University logo

Zhejiang University

The dynamics of small-scale structures in free-surface turbulence is crucial to large-scale phenomena in natural and industrial environments. Here we conduct experiments on the quasi-flat free surface of a zero-mean-flow turbulent water tank over the Reynolds number range

Re_{\lambda} = 207\textrm{--}312

. By seeding microscopic floating particles at high concentrations, the fine scales of the flow and the velocity gradient tensor are resolved. A kinematic relation is derived expressing the contribution of surface divergence and vorticity to the dissipation rate. The probability density functions of divergence, vorticity and strain-rate collapse once normalized by the Kolmogorov scales. Their magnitude displays strong intermittency and follows chi-square distributions with power-law tails at small values. The topology of high-intensity events and two-point statistics indicate that the surface divergence is characterized by dissipative spatial and temporal scales, while the high-vorticity and high-strain-rate regions are larger, long-lived, concurrent, and elongated. The second-order velocity structure functions obey the classic Kolmogorov scaling in the inertial range when the dissipation rate on the surface is considered, with a different numerical constant than in 3D turbulence. The cross-correlation among divergence, vorticity and strain-rate indicates that the surface-attached vortices are strengthened during downwellings and diffuse when those dissipate. Sources (sinks) in the surface velocity fields are associated with strong (weak) surface-parallel stretching and compression along perpendicular directions. The floating particles cluster over spatial and temporal scales larger than those of the sinks. These results demonstrate that, compared to 3D turbulence, in free-surface turbulence the energetic scales leave a stronger imprint on the small-scale quantities.

#physics #fluid-dynamics

Paper thumbnail

Efficient Tabular Data Preprocessing of ML Pipelines

23 Sep 2024

ETH Zurich researchers developed Piper, an FPGA-based hardware accelerator to address the CPU-GPU performance mismatch in machine learning pipelines by efficiently offloading stateful tabular data preprocessing. Piper achieved up to a 71.3x speedup over a 128-core CPU server and 20.3x over an Nvidia V100 GPU for binary input, significantly improving GPU utilization and reducing resource consumption.

#computer-science #hardware-architecture #machine-learning

Paper thumbnail

Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples

08 Oct 2025

ETH Zurich Anthropic logo

Research from institutions including the UK AI Security Institute and Anthropic demonstrates that poisoning attacks on Large Language Models are determined by a near-constant absolute number of malicious samples, rather than a percentage of the total training data. As few as 250 poisoned documents were sufficient to backdoor models ranging from 600 million to 13 billion parameters, though subsequent alignment training significantly reduced attack success.

#adversarial-attacks #adversarial-robustness #computer-science

Paper thumbnail

AnyUp: Universal Feature Upsampling

14 Oct 2025

ETH Zurich Google logo

AnyUp introduces a universal method for generating high-resolution feature maps from diverse low-resolution vision encoders without requiring model-specific retraining. The approach achieves state-of-the-art performance across various dense prediction tasks and generalizes robustly to unseen feature types and resolutions.

#computer-science #computer-vision-and-pattern-recognition #machine-learning

Paper thumbnail

Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models

12 Jun 2023

allen-nie

Allen Nie

jos-rozen

Jos Rozen

sgdgp

Sayan Ghosh

ETH Zurich KAIST logo

A large-scale and diverse benchmark, BIG-bench, was introduced to rigorously evaluate the capabilities and limitations of large language models across 204 tasks. The evaluation revealed that even state-of-the-art models currently achieve aggregate scores below 20 (on a 0-100 normalized scale), indicating significantly lower performance compared to human experts.

#computer-science #artificial-intelligence #computation-and-language

Paper thumbnail

Apertus: Democratizing Open and Compliant LLMs for Global Language Environments

01 Dec 2025

ETH Zurich University of Zurich

We present Apertus, a fully open suite of large language models (LLMs) designed to address two systemic shortcomings in today's open model ecosystem: data compliance and multilingual representation. Unlike many prior models that release weights without reproducible data pipelines or regard for content-owner rights, Apertus models are pretrained exclusively on openly available data, retroactively respecting `this http URL` exclusions and filtering for non-permissive, toxic, and personally identifiable content. To mitigate risks of memorization, we adopt the Goldfish objective during pretraining, strongly suppressing verbatim recall of data while retaining downstream task performance. The Apertus models also expand multilingual coverage, training on 15T tokens from over 1800 languages, with ~40% of pretraining data allocated to non-English content. Released at 8B and 70B scales, Apertus approaches state-of-the-art results among fully open models on multilingual benchmarks, rivalling or surpassing open-weight counterparts. Beyond model weights, we release all scientific artifacts from our development cycle with a permissive license, including data preparation scripts, checkpoints, evaluation suites, and training code, enabling transparent audit and extension.

#computer-science #artificial-intelligence #computation-and-language

Paper thumbnail

Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling

20 Oct 2025

michal-balcerak

Michał Balcerak

ETH Zurich Harvard University logo

Harvard University

Energy Matching presents a generative framework that unifies optimal transport flow matching with Energy-Based Models by learning a single, time-independent scalar potential. The method achieves state-of-the-art EBM performance with an FID of 3.34 on CIFAR-10, demonstrating competitive generation quality with leading diffusion models and enhanced capabilities for conditional generation and inverse problems.

#computer-science #artificial-intelligence #machine-learning

Paper thumbnail

Graph of Thoughts: Solving Elaborate Problems with Large Language Models

07 Feb 2024

ETH Zurich Warsaw University of Technology

Graph of Thoughts (GoT) introduces a novel prompting framework that models Large Language Model (LLM) reasoning as an arbitrary graph structure. This approach enables more flexible thought transformations like aggregation and refinement, leading to superior solution quality (e.g., 62% median error reduction in sorting) and improved cost-efficiency (e.g., >31% cost reduction) compared to previous state-of-the-art methods like Tree of Thoughts on elaborate tasks.

#computer-science #artificial-intelligence #computation-and-language

Resources 2,280

Paper thumbnail

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

29 Oct 2024

pashmina-cameron

Pashmina Cameron

ETH Zurich EPFL logo

QuaRot introduces a method for end-to-end 4-bit quantization of Large Language Models, including weights, activations, and the KV cache, by implicitly removing outliers from activations through orthogonal transformations of the model's weights. This approach enabled LLAMA2-70B to achieve a perplexity of 3.79 with a 3.33x prefill speedup and 3.89x memory savings compared to the FP16 baseline.

#computer-science #machine-learning #efficient-transformers

Paper thumbnail

Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics

24 Nov 2025

ETH Zurich ETH AI Center

Researchers from ETH Zurich developed the Robotic World Model (RWM), a framework that learns robust world models for complex robotic environments without domain-specific biases. This approach enables policies trained solely in imagination to be deployed onto physical quadrupedal and humanoid robots with zero-shot transfer, effectively bridging the sim-to-real gap for complex low-level control tasks.

#computer-science #artificial-intelligence #machine-learning

Paper thumbnail

Defeating Prompt Injections by Design

24 Jun 2025

ETH Zurich Google DeepMind logo

Google DeepMind

Researchers from Google, Google DeepMind, and ETH Zurich introduced CaMeL, a system-level defense that secures Large Language Model (LLM) agents against prompt injection attacks by integrating traditional software security principles like control and data flow integrity and capabilities. This approach achieved 0 successful prompt injection attacks on the AgentDojo benchmark, significantly outperforming heuristic methods, while maintaining 77% task success.

#adversarial-attacks #agentic-frameworks #agents

Paper thumbnail

SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining

03 Jun 2025

ETH Zurich University of Amsterdam logo

University of Amsterdam

SceneSplat introduces a framework for open-vocabulary 3D scene understanding that natively operates on 3D Gaussian Splats, supported by the new large-scale SceneSplat-7K dataset. This approach achieves state-of-the-art zero-shot semantic segmentation, boosting f-mIoU by up to 10.4% on ScanNet++ benchmarks, while being 445.8 times faster for inference compared to prior methods.

#computer-science #computer-vision-and-pattern-recognition #geometric-deep-learning

Paper thumbnail

TreeRPO: Tree Relative Policy Optimization

27 Sep 2025

zhicheng-yang

Zhicheng Yang

ETH Zurich Sun Yat-Sen University logo

Sun Yat-Sen University

TREERPO enhances Large Language Model reasoning by employing a novel tree sampling mechanism to generate fine-grained, step-level reward signals without requiring a separate process reward model. This method improves Pass@1 accuracy by up to 16.5% for Qwen2.5-Math-1.5B and reduces average response length by 18.1% compared to GRPO.

#agents #chain-of-thought #computer-science

Paper thumbnail

AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents

24 Nov 2024

ETH Zurich Invariant Labs

ETH Zurich researchers developed AgentDojo, a dynamic and extensible evaluation framework to measure the adversarial robustness of LLM agents against prompt injection attacks in realistic, tool-calling environments. The framework revealed that even highly capable LLMs struggle with complex benign tasks and are susceptible to prompt injection attacks, with more capable models often being easier to attack. While existing defenses show mixed results, simple tool isolation mechanisms proved most effective at mitigating attacks.

#adversarial-attacks #adversarial-robustness #agent-based-systems

Paper thumbnail

Attention-Based Map Encoding for Learning Generalized Legged Locomotion

11 Jun 2025

ETH Zurich Disney Research Zurich

An end-to-end learning framework integrates attention mechanisms into deep reinforcement learning to enable precise foothold selection and robust locomotion for legged robots on sparse terrains. The system allowed quadrupedal and humanoid robots to successfully traverse complex obstacle courses, showing higher success rates and better velocity tracking than previous methods.

#computer-science #robotics

Paper thumbnail

Developing Generalist Foundation Models from a Multimodal Dataset for 3D Computed Tomography

30 Sep 2025

Enis Simsar

ETH Zurich University of Zurich

Advancements in medical imaging AI, particularly in 3D imaging, have been limited due to the scarcity of comprehensive datasets. We introduce CT-RATE, a public dataset that pairs 3D medical images with corresponding textual reports. CT-RATE comprises 25,692 non-contrast 3D chest CT scans from 21,304 unique patients. Each scan is accompanied by its corresponding radiology report. Leveraging CT-RATE, we develop CT-CLIP, a CT-focused contrastive language-image pretraining framework designed for broad applications without the need for task-specific training. We demonstrate how CT-CLIP can be used in multi-abnormality detection and case retrieval, and outperforms state-of-the-art fully supervised models across all key metrics. By combining CT-CLIP's vision encoder with a pretrained large language model, we create CT-CHAT, a vision-language foundational chat model for 3D chest CT volumes. Finetuned on over 2.7 million question-answer pairs derived from the CT-RATE dataset, CT-CHAT underscores the necessity for specialized methods in 3D medical imaging. Collectively, the open-source release of CT-RATE, CT-CLIP, and CT-CHAT not only addresses critical challenges in 3D medical imaging but also lays the groundwork for future innovations in medical AI and improved patient care.

#ai-for-health #computer-science #computer-vision-security

Paper thumbnail

Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLMs

23 Sep 2025

ETH Zurich University of Tübingen

Large language model (LLM) developers aim for their models to be honest, helpful, and harmless. However, when faced with malicious requests, models are trained to refuse, sacrificing helpfulness. We show that frontier LLMs can develop a preference for dishonesty as a new strategy, even when other options are available. Affected models respond to harmful requests with outputs that sound harmful but are crafted to be subtly incorrect or otherwise harmless in practice. This behavior emerges with hard-to-predict variations even within models from the same model family. We find no apparent cause for the propensity to deceive, but show that more capable models are better at executing this strategy. Strategic dishonesty already has a practical impact on safety evaluations, as we show that dishonest responses fool all output-based monitors used to detect jailbreaks that we test, rendering benchmark scores unreliable. Further, strategic dishonesty can act like a honeypot against malicious users, which noticeably obfuscates prior jailbreak attacks. While output monitors fail, we show that linear probes on internal activations can be used to reliably detect strategic dishonesty. We validate probes on datasets with verifiable outcomes and by using them as steering vectors. Overall, we consider strategic dishonesty as a concrete example of a broader concern that alignment of LLMs is hard to control, especially when helpfulness and harmlessness conflict.

#adversarial-attacks #adversarial-robustness #agents

Paper thumbnail

Object-Centric Learning with Slot Attention

14 Oct 2020

ETH Zurich Max Planck Institute for Intelligent Systems

The paper introduces Slot Attention, an architectural module designed to extract object-centric representations from raw perceptual inputs. This module enables efficient unsupervised object discovery and supervised set prediction, demonstrating strong generalization to varying object counts and achieving competitive performance with significant computational efficiency improvements over prior methods.

#attention-mechanisms #computer-science #computer-vision-and-pattern-recognition

Paper thumbnail

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad

04 Sep 2025

jdekoninck

Jasper Dekoninck

kristian-minchev

Kristian Minchev

maria-drencheva

Maria Drencheva

ETH Zurich INSAIT, Sofia University ",St. Kliment Ohridski"

Researchers from ETH Zurich and INSAIT conducted the first evaluation of large language models on generating rigorous natural language proofs for problems from the 2025 USA Mathematical Olympiad. The study found that most state-of-the-art models scored below 5% of the maximum points, with the highest-performing model, GEMINI-2.5-PRO, achieving only 24.4%, demonstrating fundamental shortcomings in advanced mathematical reasoning capabilities.

#chain-of-thought #computer-science #computation-and-language

Paper thumbnail

Generalized Interpolating Discrete Diffusion

09 Jun 2025

ETH Zurich Max Planck Institute for Intelligent Systems

The Generalized Interpolating Discrete Diffusion (GIDD) framework introduces a flexible theoretical foundation for discrete diffusion models, demonstrating that hybrid noise (masking and uniform) enables self-correction abilities in text generation. Models trained with this approach achieve superior generative sample quality, with a BASE model (p_u=0.2) improving generative perplexity from 214 to 93.3, and also reaching state-of-the-art compute-matched perplexity for mask-only diffusion language models (22.29 PPL).

#computer-science #artificial-intelligence #computation-and-language

Paper thumbnail

There are no more papers matching your filters at the moment.