alphaXiv


LuxiTech


01 Dec 2025

computer-science artificial-intelligence computation-and-language

SpikingBrain: Spiking Brain-inspired Large Models

Chinese Academy of Sciences; Beihang University; The Hong Kong Polytechnic University; Beijing Academy of Artificial Intelligence; Zhongguancun Academy; LuxiTech; Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology; Beijing Key Laboratory of Brain-Inspired General Intelligence Large Model; MetaX Integrated Circuit Co., Ltd.

Mainstream Transformer-based large language models face major efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly, limiting long-context processing. Building large models on non-NVIDIA platforms also poses challenges for stable and efficient training. To address this, we introduce SpikingBrain, a family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU cluster and focuses on three aspects: (1) Model Architecture: linear and hybrid-linear attention architectures with adaptive spiking neurons; (2) Algorithmic Optimizations: an efficient, conversion-based training pipeline and a dedicated spike coding framework; (3) System Engineering: customized training frameworks, operator libraries, and parallelism strategies tailored to MetaX hardware. Using these techniques, we develop two models: SpikingBrain-7B, a linear LLM, and SpikingBrain-76B, a hybrid-linear MoE LLM. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms, and training remains stable for weeks on hundreds of MetaX GPUs with Model FLOPs Utilization at expected levels. SpikingBrain achieves performance comparable to open-source Transformer baselines while using only about 150B tokens for continual pre-training. Our models also significantly improve long-context efficiency and deliver inference with (partially) constant memory and event-driven spiking behavior. For example, SpikingBrain-7B attains over 100x speedup in Time to First Token for 4M-token sequences. Furthermore, the proposed spiking scheme achieves 69.15 percent sparsity, enabling low-power operation. Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.
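The abstract attributes part of the low-power claim to a spike coding scheme that leaves most activations silent. The sketch below is not SpikingBrain's spike coding framework. It assumes a simple rule in which the firing threshold adapts to the mean absolute activation, scaled by a hypothetical factor `k > 0`. Each activation then becomes a signed integer spike count, and sparsity is measured as the fraction of zero counts. The function names `spike_encode`, `spike_decode`, and `sparsity` are hypothetical.

```python
from typing import List, Tuple


def spike_encode(activations: List[float], k: float = 1.0) -> Tuple[List[int], float]:
    """Map real activations to signed integer spike counts (illustrative only).

    Assumption: the firing threshold adapts to the input as k * mean(|a|), with k > 0.
    """
    mean_abs = sum(abs(a) for a in activations) / len(activations)
    # Fall back to a unit threshold when every activation is zero.
    theta = k * mean_abs if mean_abs > 0 else 1.0
    # round() on a float returns an int: the spike count, with sign as polarity.
    counts = [round(a / theta) for a in activations]
    return counts, theta


def spike_decode(counts: List[int], theta: float) -> List[float]:
    """Approximate reconstruction: each spike carries one threshold's worth of value."""
    return [c * theta for c in counts]


def sparsity(counts: List[int]) -> float:
    """Fraction of positions that emit no spikes at all."""
    return sum(1 for c in counts if c == 0) / len(counts)


acts = [0.1, -0.05, 2.0, 0.0, -1.5, 0.2]
counts, theta = spike_encode(acts)
print(counts, f"sparsity={sparsity(counts):.2f}")  # [0, 0, 3, 0, -2, 0] sparsity=0.67
```

Raising `k` raises the threshold. More counts then round to zero, which increases sparsity at the cost of a coarser reconstruction. The 69.15 percent figure reported above comes from the paper's own scheme and is not reproduced by this toy rule.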


31 Oct 2024

attention-mechanisms computer-science computation-and-language

Gated Slot Attention for Efficient Linear-Time Sequence Modeling

University of Waterloo; Tencent AI Lab; Soochow University; University of California, Santa Cruz; LuxiTech

Freda Shi

Gated Slot Attention (GSA) presents a linear-time recurrent architecture that incorporates a gating mechanism to enhance memory capacity and adaptively forget information. It improves performance on in-context recall-intensive tasks and enables more effective finetuning of pretrained Transformers into recurrent models, often surpassing larger models trained from scratch on various benchmarks.
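Gated Slot Attention keeps a fixed number of memory slots instead of a key-value cache that grows with the sequence. The following is a simplified sketch, not the paper's exact parameterization. It assumes `m` slots whose keys and values decay by per-slot forget gates `alpha` in [0, 1) and absorb the new token in proportion `1 - alpha`, followed by a softmax read over the slots. The gates are passed in rather than computed from the input. The names `gsa_step` and `gsa_run` are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gsa_step(K: Matrix, V: Matrix, q, k, v, alpha) -> Tuple[Matrix, Matrix, List[float]]:
    """One step of a simplified gated slot memory with m = len(alpha) slots."""
    m = len(alpha)
    # Write: slot i keeps alpha[i] of its old content and takes (1 - alpha[i]) of the new token.
    K_new = [[alpha[i] * K[i][j] + (1 - alpha[i]) * k[j] for j in range(len(k))] for i in range(m)]
    V_new = [[alpha[i] * V[i][j] + (1 - alpha[i]) * v[j] for j in range(len(v))] for i in range(m)]
    # Read: score each slot against the query, normalize, and mix the slot values.
    scores = [sum(K_new[i][j] * q[j] for j in range(len(q))) for i in range(m)]
    w = softmax(scores)
    out = [sum(w[i] * V_new[i][j] for i in range(m)) for j in range(len(v))]
    return K_new, V_new, out


def gsa_run(qs, ks, vs, alphas, m: int, d_k: int, d_v: int):
    """Process a whole sequence; the state stays m x d_k and m x d_v throughout."""
    K = [[0.0] * d_k for _ in range(m)]
    V = [[0.0] * d_v for _ in range(m)]
    outputs = []
    for q, k, v, a in zip(qs, ks, vs, alphas):
        K, V, o = gsa_step(K, V, q, k, v, a)
        outputs.append(o)
    return K, V, outputs


T = 5
qs = [[1.0, 0.0, -1.0]] * T
ks = [[0.5, 0.5, 0.0]] * T
vs = [[1.0, 2.0, 3.0]] * T
alphas = [[0.9, 0.5, 0.1, 0.0]] * T
K, V, outs = gsa_run(qs, ks, vs, alphas, m=4, d_k=3, d_v=3)
```

The state is always `m × d_k` plus `m × d_v`, whatever the value of `T`. Each step therefore costs the same amount of memory and compute, which is what makes this kind of recurrence linear-time in sequence length.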


25 Jul 2025

computer-science computation-and-language efficient-transformers

Scalable MatMul-free Language Modeling

University of California, Davis; Soochow University; University of California, Santa Cruz; LuxiTech

Researchers from the University of California, Santa Cruz and Intel Labs developed the first scalable language model architecture that completely eliminates matrix multiplication (MatMul) operations. This MatMul-free model achieves competitive performance with Transformer++ models up to 2.7 billion parameters while drastically improving computational and energy efficiency on both GPUs and neuromorphic hardware.
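The key idea behind MatMul-free layers is to constrain weights to {-1, 0, +1}, so that a matrix-vector product reduces to additions and subtractions. The sketch below assumes absmean ternary quantization: scale by the mean absolute weight, round, and clip, a scheme popularized by BitNet b1.58. It does not model whether or how the paper rescales outputs. The function names are hypothetical, and the model's token-mixing layers are not covered.

```python
from typing import List, Tuple


def ternarize(W: List[List[float]], eps: float = 1e-8) -> Tuple[List[List[int]], float]:
    """Absmean ternary quantization (assumed): scale by mean |w|, round, clip to [-1, 1]."""
    n = sum(len(row) for row in W)
    gamma = sum(abs(w) for row in W for w in row) / n
    Wt = [[max(-1, min(1, round(w / (gamma + eps)))) for w in row] for row in W]
    return Wt, gamma


def ternary_matvec(Wt: List[List[int]], x: List[float]) -> List[float]:
    """Compute Wt @ x with additions and subtractions only (no multiplications)."""
    out = []
    for row in Wt:
        acc = 0.0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0: the input is skipped entirely
        out.append(acc)
    return out


W = [[0.9, -0.05, -1.2], [0.3, 0.0, 0.6]]
Wt, gamma = ternarize(W)
print(Wt)                                   # [[1, 0, -1], [1, 0, 1]]
print(ternary_matvec(Wt, [2.0, 5.0, 1.0]))  # [1.0, 3.0]
```

In hardware, `ternary_matvec` needs only an accumulator and a sign select per weight, and zero weights cost nothing. This is the kind of saving the summary refers to.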


10 Jan 2025

computer-science machine-learning edge-computing

Threshold Neuron: A Brain-inspired Artificial Neuron for Efficient On-device Inference

Tsinghua University; Peking University; LuxiTech

Enhancing the computational efficiency of on-device Deep Neural Networks (DNNs) remains a significant challenge in mobile and edge computing. As we aim to execute increasingly complex tasks with constrained computational resources, much of the research has focused on compressing neural network structures and parameters or optimizing underlying systems, while limited attention has been paid to the fundamental building blocks of neural networks: the neurons. In this study, we deliberate on a simple but important research question: Can we design artificial neurons that offer greater efficiency than the traditional neuron paradigm? Inspired by the threshold mechanisms and the excitation-inhibition balance observed in biological neurons, we propose a novel artificial neuron model, Threshold Neurons. Using Threshold Neurons, we can construct neural networks similar to those built with traditional artificial neurons while significantly reducing hardware implementation complexity. Our extensive experiments validate the effectiveness of neural networks utilizing Threshold Neurons, achieving substantial power savings of 7.51x to 8.19x and area savings of 3.89x to 4.33x at the kernel level, with minimal loss in precision. Furthermore, FPGA-based implementations of these networks demonstrate 2.52x power savings and 1.75x speed enhancements at the system level. The source code will be made available upon publication.
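The abstract does not define the Threshold Neuron, so the sketch below is a hypothetical unit loosely inspired by the excitation-inhibition idea it describes, not the paper's model. Each neuron is assumed to have a set of excitatory input indices, a set of inhibitory input indices, and a threshold `theta`. It fires when excitation minus inhibition reaches `theta`, which needs only additions and a comparison. All names here are hypothetical.

```python
from typing import List, Sequence


def threshold_neuron(x: Sequence[float], excitatory: Sequence[int],
                     inhibitory: Sequence[int], theta: float) -> int:
    """Hypothetical threshold unit: fire when excitation minus inhibition reaches theta.

    Uses only additions, one subtraction, and one comparison; there are no weight multiplies.
    """
    excitation = sum(x[i] for i in excitatory)
    inhibition = sum(x[i] for i in inhibitory)
    return 1 if excitation - inhibition >= theta else 0


def threshold_layer(x: Sequence[float], neurons) -> List[int]:
    """Apply a list of (excitatory, inhibitory, theta) neuron specs to the same input."""
    return [threshold_neuron(x, e, i, t) for e, i, t in neurons]


x = [1, 0, 1, 1]
layer = [([0, 2], [3], 1), ([0, 2], [3], 2), ([1], [], 0)]
print(threshold_layer(x, layer))  # [1, 0, 1]
```

Replacing weight multiplications with index selection and addition is one way to see why such neurons can cut area and power. The paper's actual neuron and its hardware mapping may differ.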

15 Dec 2024

computer-science artificial-intelligence neural-and-evolutionary-computing

Deployment Pipeline from Rockpool to Xylo for Edge Computing

LuxiTech; SynSense

Deploying Spiking Neural Networks (SNNs) on the Xylo neuromorphic chip via the Rockpool framework represents a significant advancement in achieving ultra-low-power consumption and high computational efficiency for edge applications. This paper details a novel deployment pipeline, emphasizing the integration of Rockpool's capabilities with Xylo's architecture, and evaluates the system's performance in terms of energy efficiency and accuracy. The unique advantages of the Xylo chip, including its digital spiking architecture and event-driven processing model, are highlighted to demonstrate its suitability for real-time, power-sensitive applications.
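Digital spiking chips such as Xylo run integer-state neurons that update only when events arrive. The sketch below is a plain-Python leaky integrate-and-fire (LIF) neuron with integer state and a bit-shift leak, a common choice on digital neuromorphic hardware. It does not use Rockpool's API and does not reproduce Xylo's exact neuron model. The function name and the `threshold` and `decay_shift` values are hypothetical.

```python
from typing import List


def lif_integer(inputs: List[int], threshold: int = 64, decay_shift: int = 4) -> List[int]:
    """Integer leaky integrate-and-fire neuron (illustrative; parameters are hypothetical).

    The leak v -= v >> decay_shift approximates v *= (1 - 2**-decay_shift)
    without a multiplier.
    """
    v = 0
    spikes = []
    for i in inputs:
        v -= v >> decay_shift  # leak toward zero using only a shift and a subtraction
        v += i                 # integrate the input current
        if v >= threshold:
            spikes.append(1)
            v -= threshold     # subtractive reset keeps the residual charge
        else:
            spikes.append(0)
    return spikes


print(lif_integer([20] * 8))  # [0, 0, 0, 1, 0, 0, 1, 0]
```

The output is a sparse spike train. An event-driven processor only does work at the time steps that carry a 1, which is the source of the power savings this kind of deployment targets.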
