alphaXiv

Unicom Data Intelligence, China Unicom

HumorReject: Decoupling LLM Safety from Refusal Prefix via A Little Humor

10 Nov 2025

zihui wu

Xidian University; China Unicom

Large Language Models (LLMs) commonly rely on explicit refusal prefixes for safety, making them vulnerable to prefix injection attacks. We introduce HumorReject, a novel data-driven approach that reimagines LLM safety by decoupling it from refusal prefixes through humor as an indirect refusal strategy. Rather than explicitly rejecting harmful instructions, HumorReject responds with contextually appropriate humor that naturally defuses potentially dangerous requests. Our approach effectively addresses common "over-defense" issues while demonstrating superior robustness against various attack vectors. Our findings suggest that improvements in training data design can be as important as the alignment algorithm itself in achieving effective LLM safety. The code and dataset are available at this https URL.

#adversarial-attacks #adversarial-robustness #computer-science

GlitchMiner: Mining Glitch Tokens in Large Language Models via Gradient-based Discrete Optimization

10 Nov 2025

zihui wu

Xidian University; China Unicom

Glitch tokens, inputs that trigger unpredictable or anomalous behavior in Large Language Models (LLMs), pose significant challenges to model reliability and safety. Existing detection methods primarily rely on heuristic embedding patterns or statistical anomalies within internal representations, limiting their generalizability across different model architectures and potentially missing anomalies that deviate from observed patterns. We introduce GlitchMiner, a behavior-driven framework designed to identify glitch tokens by maximizing predictive entropy. Leveraging a gradient-guided local search strategy, GlitchMiner efficiently explores the discrete token space without relying on model-specific heuristics or large-batch sampling. Extensive experiments across ten LLMs from five major model families demonstrate that GlitchMiner consistently outperforms existing approaches in detection accuracy and query efficiency, providing a generalizable and scalable solution for effective glitch token discovery. Code is available at this https URL.
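
The scoring signal named in the abstract, predictive entropy, can be illustrated with a small sketch. It assumes a hypothetical `next_token_logits(token_id)` callable standing in for a model forward pass on a prompt built from the candidate token. It ranks a fixed candidate pool exhaustively rather than using GlitchMiner's gradient-guided local search, so it shows only the objective, not the search strategy.

```python
import math
from typing import Callable, Dict, List, Sequence


def predictive_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (nats) of softmax(logits), using max-subtraction for stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return -sum((e / z) * math.log(e / z) for e in exps if e > 0.0)


def rank_by_entropy(
    candidates: List[int],
    next_token_logits: Callable[[int], Sequence[float]],
    top_k: int = 3,
) -> List[int]:
    """Return the top_k candidate ids whose model output has the highest entropy.

    `next_token_logits` is a hypothetical stand-in for a model forward pass.
    """
    scores: Dict[int, float] = {t: predictive_entropy(next_token_logits(t)) for t in candidates}
    return sorted(candidates, key=lambda t: scores[t], reverse=True)[:top_k]


# Toy logits: token 7 yields a uniform (maximally uncertain) distribution.
fake_logits = {1: [10.0, 0.0, 0.0], 2: [5.0, 4.0, 0.0], 7: [1.0, 1.0, 1.0]}
print(rank_by_entropy([1, 2, 7], lambda t: fake_logits[t], top_k=1))  # [7]
```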

#adversarial-attacks #adversarial-robustness #computer-science

Large-Scale AI in Telecom: Charting the Roadmap for Innovation, Scalability, and Enhanced Digital Experiences

06 Mar 2025

Rongpeng Li

Northwestern Polytechnical University; Northeastern University

A comprehensive white paper from the GenAINet Initiative introduces Large Telecom Models (LTMs) as a novel framework for integrating AI into telecommunications infrastructure, providing a detailed roadmap for innovation while addressing critical challenges in scalability, hardware requirements, and regulatory compliance through insights from a diverse coalition of academic, industry and regulatory experts.

#computer-science #artificial-intelligence #computation-and-language

CHiSafetyBench: A Chinese Hierarchical Safety Benchmark for Large Language Models

02 Sep 2024

China Unicom; Unicom Digital Technology

Researchers from China Unicom developed CHiSafetyBench, a hierarchical safety benchmark for Chinese Large Language Models, which features a culturally relevant taxonomy, multi-turn conversational scenarios, and an LLM-based automatic evaluation method. Evaluations on mainstream Chinese LLMs showed varying safety capabilities across models and significant performance drops in multi-turn risky dialogues.

#computer-science #conversational-ai #artificial-intelligence

LeMiCa: Lexicographic Minimax Path Caching for Efficient Diffusion-Based Video Generation

17 Nov 2025

China Unicom; Unicom Data Intelligence, China Unicom

We present LeMiCa, a training-free and efficient acceleration framework for diffusion-based video generation. While existing caching strategies primarily focus on reducing local heuristic errors, they often overlook the accumulation of global errors, leading to noticeable content degradation between accelerated and original videos. To address this issue, we formulate cache scheduling as a directed graph with error-weighted edges and introduce a Lexicographic Minimax Path Optimization strategy that explicitly bounds the worst-case path error. This approach substantially improves the consistency of global content and style across generated frames. Extensive experiments on multiple text-to-video benchmarks demonstrate that LeMiCa improves both inference speed and generation quality. Notably, our method achieves a 2.9x speedup on the Latte model and reaches an LPIPS score of 0.05 on Open-Sora, outperforming prior caching techniques. Importantly, these gains come with minimal perceptual quality degradation, making LeMiCa a robust and generalizable paradigm for accelerating diffusion-based video generation. We believe this approach can serve as a strong foundation for future research on efficient and reliable video synthesis. Our code is available at this https URL.
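
The abstract's objective, a lexicographic minimax path over an error-weighted directed graph, can be sketched by brute force on a toy graph. Assumptions not stated in the abstract: nodes are denoising timesteps, an edge `(i, j)` means computing fully at step `i` and reusing the cache until step `j`, its weight is the resulting error, and the path must use exactly `budget` edges (full computations). The exhaustive search is exponential and is not the paper's solver.

```python
from typing import Dict, List, Optional, Tuple

Edge = Tuple[int, int]


def lexicographic_minimax_path(
    num_nodes: int, errors: Dict[Edge, float], budget: int
) -> Optional[List[int]]:
    """Brute-force the 0 -> num_nodes-1 path with exactly `budget` edges whose
    edge errors, sorted in descending order, are lexicographically smallest.

    Minimizing that sorted tuple first minimizes the worst edge, then the
    second-worst, and so on (lexicographic minimax). Returns None if no path fits.
    """
    target = num_nodes - 1
    best_key: Optional[Tuple[float, ...]] = None
    best_path: Optional[List[int]] = None

    def dfs(node: int, path: List[int], errs: List[float]) -> None:
        nonlocal best_key, best_path
        if len(errs) == budget:
            if node == target:
                key = tuple(sorted(errs, reverse=True))
                if best_key is None or key < best_key:
                    best_key, best_path = key, list(path)
            return
        for nxt in range(node + 1, num_nodes):  # edges only go forward in time
            if (node, nxt) in errors:
                path.append(nxt)
                errs.append(errors[(node, nxt)])
                dfs(nxt, path, errs)
                path.pop()
                errs.pop()

    dfs(0, [0], [])
    return best_path


toy_errors = {(0, 1): 0.1, (1, 3): 0.5, (0, 2): 0.3, (2, 3): 0.3}
print(lexicographic_minimax_path(4, toy_errors, budget=2))  # [0, 2, 3]
```

Both toy paths have the same total error (0.6), but the minimax criterion prefers `0 -> 2 -> 3` because its worst edge (0.3) is smaller than the other path's (0.5).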

#computer-science #artificial-intelligence #computer-vision-and-pattern-recognition

Safety Evaluation of DeepSeek Models in Chinese Contexts

08 May 2025

Unicom Data Intelligence; Data Science & Artificial Intelligence Research Institute, China Unicom

This paper from Unicom Data Intelligence and China Unicom conducted the first comprehensive safety evaluation of DeepSeek-R1 and DeepSeek-V3 models in Chinese contexts using CHiSafetyBench. The study revealed that DeepSeek models exhibit weaker performance in identifying risky content and refusing harmful queries compared to other Chinese LLMs, particularly in the "discrimination" category.

#ai-for-cybersecurity #computer-science #artificial-intelligence

Optimizing for the Shortest Path in Denoising Diffusion Model

13 Mar 2025

Southwest Petroleum University; Unicom Data Intelligence

In this research, we propose a novel denoising diffusion model based on shortest-path modeling that optimizes residual propagation to enhance both denoising efficiency and quality. Drawing on Denoising Diffusion Implicit Models (DDIM) and insights from graph theory, our model, termed the Shortest Path Diffusion Model (ShortDF), treats the denoising process as a shortest-path problem aimed at minimizing reconstruction error. By optimizing the initial residuals, we improve the efficiency of the reverse diffusion process and the quality of the generated samples. Extensive experiments on multiple standard benchmarks demonstrate that ShortDF significantly reduces diffusion time (or steps) while enhancing the visual fidelity of generated samples compared to prior art. We believe this work paves the way for interactive diffusion-based applications and establishes a foundation for rapid data generation. Code is available at this https URL.

#computer-science #computer-vision-and-pattern-recognition

Joint Deblurring and 3D Reconstruction for Macrophotography

02 Oct 2025

University of Science and Technology of China; China Unicom

Macro lenses offer high resolution and large magnification, and 3D modeling of small, detailed objects can provide richer information. However, defocus blur in macrophotography is a long-standing problem that severely hinders both clear imaging of the captured objects and their high-quality 3D reconstruction. Traditional image deblurring methods require a large number of images and annotations, and there is currently no multi-view 3D reconstruction method for macrophotography. In this work, we propose a joint deblurring and 3D reconstruction method for macrophotography. Starting from captured multi-view blurry images, we jointly optimize the clear 3D model of the object and the defocus blur kernel of each pixel. The entire framework adopts a differentiable rendering method to self-supervise the optimization of the 3D model and the defocus blur kernel. Extensive experiments show that, from a small number of multi-view images, our method not only achieves high-quality image deblurring but also recovers high-fidelity 3D appearance.

#computer-science #computer-vision-and-pattern-recognition #generative-models

TP3M: Transformer-based Pseudo 3D Image Matching with Reference Image

12 Aug 2024

China Unicom; Unicom Digital Technology

Image matching remains challenging in scenes with large viewpoint or illumination changes, or with low texture. In this paper, we propose a Transformer-based pseudo 3D image matching method. It upgrades the 2D features extracted from the source image to 3D features with the help of a reference image, then matches them to the 2D features extracted from the destination image through coarse-to-fine 3D matching. Our key finding is that introducing the reference image lets the source image's fine points be screened and their feature descriptors be further enriched from 2D to 3D, which improves matching performance with the destination image. Experimental results on multiple datasets show that the proposed method achieves state-of-the-art results on homography estimation, pose estimation, and visual localization, especially in challenging scenes.

#attention-mechanisms #computer-science #computer-vision-security

Data-Driven Deepfake Image Detection Method -- The 2024 Global Deepfake Image Detection Challenge

15 Aug 2025

China Unicom AI Innovation Center

With the rapid development of AI technology, deepfake technology has emerged as a double-edged sword. It has not only created a large amount of AI-generated content but also posed unprecedented challenges to digital security. The task of the competition is to determine whether a face image is a Deepfake image and to output its probability score of being one. In the image track, our approach is based on the Swin Transformer V2-B classification network, with online data augmentation and offline sample generation used to enrich the diversity of training samples and improve the model's generalization ability. Our approach received the award of excellence in Deepfake image detection.

#computer-science #computer-vision-security #computer-vision-and-pattern-recognition

Matching-Free Depth Recovery from Structured Light

25 Jun 2025

University of Science and Technology of China; China Unicom

We introduce a novel approach for depth estimation using images obtained from monocular structured light systems. In contrast to many existing methods that depend on image matching, our technique employs a density voxel grid to represent scene geometry. This grid is trained through self-supervised differentiable volume rendering. Our method leverages color fields derived from the projected patterns in structured light systems during the rendering process, facilitating the isolated optimization of the geometry field. This innovative approach leads to faster convergence and high-quality results. Additionally, we integrate normalized device coordinates (NDC), a distortion loss, and a distinctive surface-based color loss to enhance geometric fidelity. Experimental results demonstrate that our method outperforms current matching-based techniques in terms of geometric performance in few-shot scenarios, achieving an approximately 30% reduction in average estimated depth errors for both synthetic scenes and real-world captured scenes. Moreover, our approach allows for rapid training, being approximately three times faster than previous matching-free methods that utilize implicit representations.
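
The abstract does not spell out its distortion loss. A minimal sketch, assuming it follows the form introduced in Mip-NeRF 360, applied to standard volume-rendering weights along a single ray: the loss is small when a ray's weight mass is concentrated near one surface and larger when it is spread out. Interval boundaries, densities, and the toy values below are illustrative assumptions.

```python
import math
from typing import List


def render_weights(sigmas: List[float], deltas: List[float]) -> List[float]:
    """Volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights, transmittance = [], 1.0
    for sigma, delta in zip(sigmas, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha
    return weights


def distortion_loss(weights: List[float], edges: List[float]) -> float:
    """Mip-NeRF 360-style distortion loss for one ray.

    `edges` holds the len(weights) + 1 interval boundaries s_0 < s_1 < ... .
    """
    mids = [(edges[i] + edges[i + 1]) / 2 for i in range(len(weights))]
    # Pairwise term: penalizes weight spread across distant intervals.
    cross = sum(
        wi * wj * abs(mi - mj)
        for wi, mi in zip(weights, mids)
        for wj, mj in zip(weights, mids)
    )
    # Self term: penalizes weight on wide intervals.
    self_term = sum(w * w * (edges[i + 1] - edges[i]) for i, w in enumerate(weights)) / 3.0
    return cross + self_term


edges = [0.0, 1.0, 2.0, 3.0, 4.0]
sharp = render_weights([0.0, 50.0, 0.0, 0.0], [1.0] * 4)  # nearly all weight in bin 1
print(distortion_loss(sharp, edges))                  # about 0.333
print(distortion_loss([0.5, 0.0, 0.0, 0.5], edges))   # about 1.667 (spread out)
```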

#computer-science #computer-vision-security #computer-vision-and-pattern-recognition

Beyond Real: Imaginary Extension of Rotary Position Embeddings for Long-Context LLMs

08 Dec 2025

Fudan University; Shanghai AI Lab

Rotary Position Embeddings (RoPE) have become a standard for encoding sequence order in Large Language Models (LLMs) by applying rotations to query and key vectors in the complex plane. Standard implementations, however, utilize only the real component of the complex-valued dot product for attention score calculation. This simplification discards the imaginary component, which contains valuable phase information, leading to a potential loss of relational details crucial for modeling long-context dependencies. In this paper, we propose an extension that re-incorporates this discarded imaginary component. Our method leverages the full complex-valued representation to create a dual-component attention score. We theoretically and empirically demonstrate that this approach enhances the modeling of long-context dependencies by preserving more positional information. Furthermore, evaluations on a suite of long-context language modeling benchmarks show that our method consistently improves performance over the standard RoPE, with the benefits becoming more significant as context length increases. The code is available at this https URL.
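
The real/imaginary split described above can be checked numerically. This sketch pairs dimensions `(2i, 2i+1)` into complex numbers, rotates them by position with the usual RoPE frequencies (`base = 10000` is an assumption), and returns both parts of the complex score. The real part equals the standard RoPE dot product; how the paper combines the two parts into its dual-component score is not given in the abstract, so the sketch leaves them separate.

```python
import cmath
from typing import List, Tuple


def to_complex(x: List[float]) -> List[complex]:
    """Pair dimensions (2i, 2i+1) into complex numbers x[2i] + j*x[2i+1]."""
    return [complex(x[2 * i], x[2 * i + 1]) for i in range(len(x) // 2)]


def rope_complex_score(
    q: List[float], k: List[float], m: int, n: int, base: float = 10000.0
) -> Tuple[float, float]:
    """Return (real, imag) of sum_i rot(q, m)_i * conj(rot(k, n)_i).

    The real part is the standard (unscaled) RoPE attention logit;
    the imaginary part is the component standard RoPE discards.
    """
    d = len(q)
    total = 0j
    for i, (a, b) in enumerate(zip(to_complex(q), to_complex(k))):
        theta = base ** (-2.0 * i / d)          # rotation frequency of pair i
        a_rot = a * cmath.exp(1j * m * theta)   # rotate query by position m
        b_rot = b * cmath.exp(1j * n * theta)   # rotate key by position n
        total += a_rot * b_rot.conjugate()
    return total.real, total.imag


q = [0.3, -1.2, 0.8, 0.5]
k = [1.0, 0.4, -0.7, 0.2]
print(rope_complex_score(q, k, m=10, n=3))
print(rope_complex_score(q, k, m=107, n=100))  # same up to rounding: only m - n matters
```

Both parts depend only on the relative offset `m - n`, so the imaginary part adds phase information while keeping RoPE's relative-position property.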

#attention-mechanisms #computer-science #computation-and-language

Application-Driven AI Paradigm for Person Counting in Various Scenarios

24 Mar 2023

Person counting is considered a fundamental task in video surveillance. However, the diversity of scenarios in practical applications makes it difficult to use a single person counting model for general purposes. Consequently, engineers must preview the video stream and manually specify an appropriate person counting model based on the camera shot's scenario, which is time-consuming, especially for large-scale deployments. In this paper, we propose a person counting paradigm that uses a scenario classifier to automatically select a suitable person counting model for each captured frame. First, the input image is passed through the scenario classifier to obtain a scenario label, which is then used to allocate the frame to one of five fine-tuned models for person counting. Additionally, we present five augmentation datasets collected from different scenarios, including side-view, long-shot, top-view, customized, and crowd, which are also integrated to form a scenario classification dataset containing 26,323 samples. In our comparative experiments, the proposed paradigm achieves a better balance than any single model on the integrated dataset, demonstrating its generalization across various scenarios.
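
The routing step can be sketched as a dispatch table keyed by scenario label. The classifier and the counters below are hypothetical stubs (only two of the five scenarios are registered); in the paper they are a trained scenario classifier and five fine-tuned counting models.

```python
from typing import Any, Callable, Dict

# The five scenario labels named in the abstract.
SCENARIOS = ("side-view", "long-shot", "top-view", "customized", "crowd")


def count_people(
    frame: Any,
    classify_scenario: Callable[[Any], str],
    counters: Dict[str, Callable[[Any], int]],
) -> int:
    """Classify the frame's scenario, then run the counter fine-tuned for it."""
    label = classify_scenario(frame)
    if label not in SCENARIOS:
        raise ValueError(f"unknown scenario label {label!r}")
    return counters[label](frame)  # KeyError if no model is registered for label


# Hypothetical stand-ins for fine-tuned counting models.
stub_counters = {
    "top-view": lambda f: len(f["heads"]),
    "crowd": lambda f: round(sum(f["density"])),
}
frame = {"scene": "crowd", "heads": [], "density": [10.2, 20.5, 11.3]}
print(count_people(frame, lambda f: f["scene"], stub_counters))  # 42
```

Keeping the counters in a table means adding a scenario only requires registering another model, without touching the routing logic.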

#computer-science #computer-vision-and-pattern-recognition #domain-adaptation

Linear Model of RIS-Aided High-Mobility Communication System

28 Feb 2025

South China University of Technology; University College London

Reconfigurable intelligent surface (RIS)-aided vehicle-to-everything (V2X) communication has emerged as a crucial solution for providing reliable data services to vehicles on the road. However, in delay-sensitive or high-mobility communications, the rapid movement of vehicles can lead to random scattering in the environment and time-selective fading in the channel. In view of this, in this paper we investigate an innovative linear model with low-complexity transmitter signal design and receiver detection methods, which boost stability in fast-fading environments and reduce channel training overhead. Specifically, because hardware design and receiver-side signal processing differ between uplink and downlink communication systems, distinct solutions are proposed for each. We first integrate the Rician channel introduced by the RIS with the corresponding signal processing algorithms to model the RIS-aided downlink communication system as a Doppler-robust linear model. Building on this property, we design a precoding scheme based on the linear model to reduce precoding complexity. Then, by leveraging the linear model and the large-scale antenna array at the base station (BS) side, we improve the linear model for the uplink communication system and derive its asymptotic performance in closed form. Simulation results demonstrate the performance advantages of the proposed RIS-aided high-mobility communication system compared to other benchmark schemes.

#signal-processing #electrical-engineering

What is the best model? Application-driven Evaluation for Large Language Models

14 Jun 2024

China Unicom; Unicom Digital Technology

General large language models enhanced with supervised fine-tuning and reinforcement learning from human feedback are increasingly popular in academia and industry, as they generalize foundation models to various practical tasks through prompting. To help users select the best model for practical application scenarios, i.e., the model that meets the application requirements while minimizing cost, we introduce A-Eval, an application-driven evaluation benchmark for general large language models. First, we categorize evaluation tasks into five main categories and 27 sub-categories from a practical application perspective. Next, we construct a dataset of 678 question-and-answer pairs through a process of collecting, annotating, and reviewing. Then, we design an objective and effective evaluation method and evaluate a series of LLMs of different scales on A-Eval. Finally, we reveal interesting laws regarding model scale and task difficulty level and propose a feasible method for selecting the best model. Through A-Eval, we provide clear empirical and engineering guidance for selecting the best model, reducing barriers to selecting and using LLMs and promoting their application and development. Our benchmark is publicly available at this https URL.
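
The selection criterion stated above (meet the application requirement at minimal cost) can be sketched as a filter-then-minimize rule. This is not A-Eval's published procedure: the cost proxy (parameter count), the score threshold, the category name, and the model entries are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Candidate:
    name: str
    params_billion: float  # cost proxy (assumption)
    scores: Dict[str, float] = field(default_factory=dict)  # score per task category


def select_model(
    candidates: List[Candidate], category: str, required: float
) -> Optional[Candidate]:
    """Return the cheapest candidate whose score on `category` meets `required`."""
    eligible = [c for c in candidates if c.scores.get(category, 0.0) >= required]
    return min(eligible, key=lambda c: c.params_billion) if eligible else None


# Hypothetical models and scores, for illustration only.
pool = [
    Candidate("small", 7, {"category-A": 61.0}),
    Candidate("medium", 14, {"category-A": 74.0}),
    Candidate("large", 72, {"category-A": 80.0}),
]
best = select_model(pool, "category-A", required=70.0)
print(best.name if best else None)  # medium
```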

#computer-science #artificial-intelligence #computation-and-language

6G Network AI Architecture for Everyone-Centric Customized Services

07 Dec 2023

University of Oslo; University of Waterloo

Mobile communication standards were developed to enhance transmission and network performance by using more radio resources and improving spectrum and energy efficiency. How to effectively address diverse user requirements and guarantee everyone's Quality of Experience (QoE) remains an open problem. The Sixth Generation (6G) mobile systems will solve this problem by utilizing heterogeneous network resources and pervasive intelligence to support everyone-centric customized services anywhere and anytime. In this article, we first coin the concept of Service Requirement Zone (SRZ) on the user side to characterize and visualize the integrated service requirements and preferences of specific tasks of individual users. On the system side, we further introduce the concept of User Satisfaction Ratio (USR) to evaluate the system's overall ability to satisfy a variety of tasks with different SRZs. Then, we propose a network Artificial Intelligence (AI) architecture with integrated network resources and pervasive AI capabilities for supporting customized services with guaranteed QoEs. Finally, extensive simulations show that the proposed network AI architecture consistently offers higher USR performance than the cloud AI and edge AI architectures across different task scheduling algorithms, random service requirements, and dynamic network conditions.
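
The abstract does not give formulas for SRZ or USR. One simple reading, used here purely as an assumption: an SRZ is an acceptable range per QoE metric, a task is satisfied when every delivered metric lies inside its range, and USR is the fraction of satisfied tasks. Metric names and values are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Task:
    srz: Dict[str, Tuple[float, float]]  # hypothetical SRZ: acceptable (low, high) per metric
    achieved: Dict[str, float]           # QoE metrics the network actually delivered


def satisfied(task: Task) -> bool:
    """A task is satisfied if every metric in its SRZ is delivered within range."""
    return all(
        metric in task.achieved and low <= task.achieved[metric] <= high
        for metric, (low, high) in task.srz.items()
    )


def user_satisfaction_ratio(tasks: List[Task]) -> float:
    """Fraction of tasks whose SRZ is satisfied (0.0 for an empty task list)."""
    return sum(satisfied(t) for t in tasks) / len(tasks) if tasks else 0.0


tasks = [
    Task({"latency_ms": (0.0, 20.0)}, {"latency_ms": 15.0}),     # requirement met
    Task({"rate_mbps": (50.0, math.inf)}, {"rate_mbps": 30.0}),  # requirement missed
]
print(user_satisfaction_ratio(tasks))  # 0.5
```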

#computer-science #networking-and-internet-architecture

iLearnRobot: An Interactive Learning-Based Multi-Modal Robot with Continuous Improvement

25 Jun 2025

Unicom Data Intelligence; China United Network Communications Group Corporation Limited

It is crucial that robots' performance can be improved after deployment, as they are inherently likely to encounter novel scenarios never seen before. This paper presents an innovative solution: an interactive learning-based robot system powered by a Multi-modal Large Language Model (MLLM). A key feature of our system is its ability to learn from natural dialogues with non-expert users. We also propose chain of question, which clarifies the exact intent of a question before providing an answer, and dual-modality retrieval modules that leverage these interaction events to avoid repeating the same mistakes, ensuring a seamless user experience before model updates, in contrast to current mainstream MLLM-based robotic systems. Our system marks a novel approach in robotics by integrating interactive learning, paving the way for superior adaptability and performance in diverse environments. We demonstrate the effectiveness and improvement of our method through experiments, both quantitatively and qualitatively.

#computer-science #continual-learning #conversational-ai

Internet of Intelligence: A Survey on the Enabling Technologies, Applications, and Challenges

18 May 2022

Purple Mountain Laboratories; Beijing University of Posts and Telecommunications

The Internet of intelligence is conceived as an emerging networking paradigm that will make intelligence as easy to obtain as information. This paper provides an overview of the Internet of intelligence, focusing on motivations, architecture, enabling technologies, applications, and existing challenges. It can provide a good foundation for those interested in gaining insight into the concept of the Internet of intelligence and the key enablers of this emerging networking paradigm. Specifically, this paper starts by investigating the evolution of networking paradigms and artificial intelligence (AI), based on which we present the motivations of the Internet of intelligence by demonstrating that networking needs intelligence and intelligence needs networking. We then present the layered architecture to characterize Internet of intelligence systems and discuss the enabling technologies of each layer. Moreover, we discuss the critical applications and their integration with the Internet of intelligence paradigm. Finally, we summarize technical challenges and open issues that must be addressed to fully exploit the benefits of the Internet of intelligence.

#computer-science #networking-and-internet-architecture

A Novel Speech-Driven Lip-Sync Model with CNN and LSTM

02 May 2022

Generating natural lip movement synchronized with speech is one of the most important tasks in creating realistic virtual characters. In this paper, we present a combined deep neural network of one-dimensional convolutions and LSTM to generate vertex displacements of a 3D template face model from variable-length speech input. The motion of the lower part of the face, represented by the vertex movement of 3D lip shapes, is consistent with the input speech. To enhance the robustness of the network to different sound signals, we adapt a trained speech recognition model to extract speech features, and a velocity loss term is adopted to reduce jitter in the generated facial animation. We recorded a series of videos of a Chinese adult speaking Mandarin and created a new speech-animation dataset to compensate for the lack of such public data. Qualitative and quantitative evaluations indicate that our model is able to generate smooth and natural lip movements synchronized with speech.
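
A velocity loss compares predicted and ground-truth frame-to-frame motion rather than per-frame positions. The abstract does not give its exact form; this sketch assumes a mean squared error on per-vertex velocities, with each frame flattened to a list of vertex displacements.

```python
from typing import List

Frame = List[float]  # flattened vertex displacements for one frame


def velocity_loss(pred: List[Frame], target: List[Frame]) -> float:
    """Mean squared error between predicted and ground-truth frame-to-frame velocities."""
    if len(pred) != len(target) or len(pred) < 2:
        raise ValueError("need two aligned sequences of at least 2 frames")
    total, count = 0.0, 0
    for t in range(1, len(pred)):
        for p_now, p_prev, g_now, g_prev in zip(pred[t], pred[t - 1], target[t], target[t - 1]):
            total += ((p_now - p_prev) - (g_now - g_prev)) ** 2
            count += 1
    return total / count


target = [[0.0], [1.0], [2.0]]
print(velocity_loss([[0.5], [1.5], [2.5]], target))  # 0.0: constant offset, same motion
print(velocity_loss([[0.0], [2.0], [2.0]], target))  # 1.0: jittery motion
```

A constant offset yields zero velocity loss, which is why a term like this complements, rather than replaces, a per-frame reconstruction loss; the jittery sequence is penalized even though its endpoints match the target.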

#computer-science #computer-vision-security #artificial-intelligence

Diversified and Compatible Web APIs Recommendation in IoT

11 Aug 2021

China Agricultural University; Macquarie University

With the ever-increasing popularity of Service-oriented Architecture (SoA) and the Internet of Things (IoT), a considerable number of enterprises and organizations are attempting to encapsulate their complex business services into various lightweight and accessible web APIs (application programming interfaces) with diverse functions. In this situation, a software developer can select a group of preferred web APIs from a massive number of candidates, based on the keywords the developer types, to create a complex mashup economically and quickly. However, traditional keyword-based web API search approaches often face the following challenges. First, they often focus on functional matching between the candidate web APIs and the mashup to be developed while neglecting compatibility among different APIs, which may return a group of incompatible web APIs and lead to mashup development failure. Second, existing approaches often return a single web API composition solution to the mashup developer for reference, which considerably narrows the developer's API selection scope and may substantially reduce developer satisfaction. In view of these challenges and the successful application of game theory in the IoT, we propose a game-theory-based approach for recommending compatible and diverse web APIs for mashup creation, named MCCOMP+DIV, which returns multiple sets of diverse and compatible web APIs with a higher success rate. Finally, we validate the effectiveness and efficiency of MCCOMP+DIV through a set of experiments on a real-world web API dataset, i.e., the PW dataset crawled from ProgrammableWeb.com.

#computer-science #social-and-information-networks
