Tianjin Artificial Intelligence Innovation Center (TAIIC)
Despite the rapid development of neural vocoders in recent years, they still suffer from intrinsic challenges such as opaque modeling and the parameter-performance trade-off. In this study, we propose a time-frequency (T-F) domain neural vocoder to address these challenges. Specifically, we connect the classical signal range-null decomposition (RND) theory to the vocoder task: the reconstruction of the target spectrogram is decomposed into the superposition of a range-space component and a null-space component, where the former is obtained by a linear domain shift from the original mel-scale domain to the target linear-scale domain, and the latter is instantiated by a learnable network that generates further spectral detail. Accordingly, we propose a dual-path framework in which the spectrum is hierarchically encoded and decoded, and cross-band and narrow-band modules are designed for efficient sub-band and sequential modeling. Comprehensive experiments are conducted on the LJSpeech and LibriTTS benchmarks. Quantitative and qualitative results show that, with a lightweight parameter count, the proposed approach yields state-of-the-art performance among existing advanced methods. Our code and the pretrained model weights are available at this https URL.
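The range-null decomposition described above can be illustrated with a small numerical sketch. This is not the authors' implementation: the matrix `A` is a random stand-in for a mel filterbank, `detail` is random noise standing in for the output of the learnable network, and the function name `range_null_reconstruct` is hypothetical. The sketch only shows the general identity x = A⁺y + (I − A⁺A)d, whose result always maps back to the observed mel spectrogram y.

```python
import numpy as np


def range_null_reconstruct(A, y, detail):
    """Combine a range-space estimate with a null-space detail term.

    A      : (n_mel, n_linear) linear map from linear-scale to mel-scale
    y      : (n_mel, n_frames) observed mel-scale spectrogram
    detail : (n_linear, n_frames) arbitrary detail (stand-in for a network output)
    """
    A_pinv = np.linalg.pinv(A)                   # (n_linear, n_mel)
    range_part = A_pinv @ y                      # linear shift: mel -> linear scale
    null_part = detail - A_pinv @ (A @ detail)   # (I - A^+ A) @ detail
    return range_part + null_part


# Toy data (all values are assumptions made for illustration only).
rng = np.random.default_rng(0)
n_mel, n_linear, n_frames = 8, 32, 5
A = rng.random((n_mel, n_linear))                # stand-in for a mel filterbank
x_true = rng.random((n_linear, n_frames))        # stand-in linear-scale spectrogram
y = A @ x_true                                   # corresponding mel-scale observation
detail = rng.standard_normal((n_linear, n_frames))

x_hat = range_null_reconstruct(A, y, detail)
consistent = np.allclose(A @ x_hat, y)           # the null-space term never changes A @ x
```

Whatever values `detail` takes, projecting `x_hat` back through `A` recovers `y`, which is why the null-space branch can add spectral detail without violating the mel-scale observation.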
The advance of direct satellite-to-device communication has positioned mega-satellite constellations as a cornerstone of 6G wireless communication, enabling seamless global connectivity even in remote and underserved areas. However, spectrum scarcity and the capacity constraints imposed by Shannon's classical information theory remain significant challenges for supporting the massive data demands of multimedia-rich wireless applications. Generative Semantic Communication (GSC), powered by artificial intelligence-based generative foundation models, represents a paradigm shift from transmitting raw data to exchanging semantic meaning. GSC not only reduces bandwidth consumption but also enhances key semantic features in multimedia content, thereby offering a promising way to overcome the limitations of traditional satellite communication systems. This article investigates the integration of GSC into mega-satellite constellations from a networking perspective. We propose a GSC-empowered satellite networking architecture and identify key enabling technologies, focusing on GSC-empowered network modeling and GSC-aware networking strategies. We construct a discrete temporal graph to model semantic encoders and decoders, distinct knowledge bases, and resource variations in mega-satellite networks. Based on this framework, we develop model deployment for semantic encoders and decoders and GSC-compatible routing schemes, and then present performance evaluations. Finally, we outline future research directions for advancing GSC-empowered satellite networks.
Visual speech recognition (VSR), commonly known as lip reading, has garnered significant attention due to its wide-ranging practical applications. Deep learning techniques and advances in hardware have significantly improved the performance of lip reading models. Despite this progress, existing datasets predominantly feature stable video recordings with limited variability in lip movements. As a result, models trained on them are highly sensitive to the variations encountered in real-world scenarios. To address this issue, we propose a novel framework, LipGen, which improves model robustness by leveraging speech-driven synthetic visual data, thereby mitigating the constraints of current datasets. Additionally, we introduce an auxiliary task that combines viseme classification with attention mechanisms. This approach facilitates the efficient integration of temporal information and directs the model's focus toward the relevant segments of speech, thereby enhancing its discriminative capability. Our method outperforms the current state of the art on the Lip Reading in the Wild (LRW) dataset and shows even more pronounced advantages under challenging conditions.
In Reconfigurable Intelligent Surfaces (RIS), reflective elements (REs) are typically configured as a single array, but as RE numbers increase, this approach incurs high overhead for optimal configuration. Subarray grouping provides an effective tradeoff between performance and overhead. This paper studies RIS-aided massive random access (RA) at the Medium Access Control (MAC) layer in cellular networks to enhance throughput. We introduce an opportunistic scheduling scheme that integrates multi-round access requests, subarray grouping for efficient RIS link acquisition, and multi-user data transmission. To optimize access request timing, RIS estimation overhead and throughput, we propose a multi-user RA strategy using sequential decision optimization to maximize average system throughput. A low-complexity algorithm is also developed for practical implementation. Both theoretical analysis and numerical simulations demonstrate that the proposed strategy significantly outperforms the extremes of full-array grouping and element-wise grouping.
This paper addresses the challenges of throughput optimization in wireless cache-aided cooperative networks. We propose an opportunistic cooperative probing and scheduling strategy for efficient content delivery. The strategy involves the base station probing the relaying channels and cache states of multiple cooperative nodes, thereby enabling opportunistic user scheduling for content delivery. Leveraging the theory of Sequentially Planned Decision (SPD) optimization, we dynamically formulate decisions on cooperative probing and stopping time. Our proposed Reward Expected Thresholds (RET)-based strategy optimizes opportunistic probing and scheduling. This approach significantly enhances system throughput by exploiting gains from local caching, cooperative transmission and time diversity. Simulations confirm the effectiveness and practicality of the proposed Media Access Control (MAC) strategy.
In this work, we investigate a resource allocation strategy for real-time communication (RTC) over satellite networks with virtual network functions. Enhanced by inter-satellite links (ISLs), in-orbit computing and network virtualization technologies, large-scale satellite networks promise global coverage with low latency and high bandwidth for RTC applications with diversified functions. However, realizing RTC with specific function requirements over intermittent ISLs requires efficient routing methods with fast response times. We show that such a routing problem over a time-varying graph can be formulated as an integer linear programming problem. The branch and bound method incurs $\mathcal{O}(|\mathcal{L}^{\tau}| \cdot (3 |\mathcal{V}^{\tau}| + |\mathcal{L}^{\tau}|)^{|\mathcal{L}^{\tau}|})$ time complexity, where $|\mathcal{V}^{\tau}|$ is the number of nodes and $|\mathcal{L}^{\tau}|$ is the number of links during time interval $\tau$. By adopting a k-shortest path-based algorithm, the theoretical worst-case complexity becomes $\mathcal{O}(|\mathcal{V}^{\tau}|! \cdot |\mathcal{V}^{\tau}|^3)$. Although this algorithm runs fast in most cases, it may return a sub-optimal solution or fail to find one, which compromises the acceptance ratio in practice. To overcome this, we further design a graph-based algorithm that exploits the special structure of the solution space and obtains the optimal solution in polynomial time, with a computational complexity of $\mathcal{O}(3|\mathcal{L}^{\tau}| + (2\log{|\mathcal{V}^{\tau}|}+1) |\mathcal{V}^{\tau}|)$. Simulations conducted on the Starlink constellation with thousands of satellites corroborate the effectiveness of the proposed algorithm.
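To make the routing problem concrete, the following is a minimal sketch of one standard way to route through an ordered list of functions: Dijkstra's algorithm on a layered state space `(node, functions_done)`. It is not the graph-based algorithm proposed in the abstract, whose details are not given here. The function name `route_with_functions`, the single-snapshot directed graph, the zero processing delay, and the toy topology are all assumptions made for illustration.

```python
import heapq


def route_with_functions(links, hosts, chain, src, dst):
    """Minimum-delay path from src to dst that applies `chain` in order.

    links : dict node -> list of (neighbor, delay), directed links in one snapshot
    hosts : dict node -> set of function names available at that node
    chain : ordered list of required function names
    Returns (total_delay, [(node, functions_done), ...]) or None if infeasible.
    """
    start, goal = (src, 0), (dst, len(chain))
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, state = heapq.heappop(heap)
        if d > dist.get(state, float("inf")):
            continue  # stale heap entry
        if state == goal:
            break
        node, k = state
        # Option 1: traverse a link, keeping the number of applied functions.
        moves = [((nbr, k), delay) for nbr, delay in links.get(node, [])]
        # Option 2: apply the next function here (assumed zero processing delay).
        if k < len(chain) and chain[k] in hosts.get(node, set()):
            moves.append(((node, k + 1), 0.0))
        for nxt, w in moves:
            nd = d + w
            if nd < dist.get(nxt, float("inf")):
                dist[nxt] = nd
                prev[nxt] = state
                heapq.heappush(heap, (nd, nxt))
    if goal not in dist:
        return None
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return dist[goal], path[::-1]


# Hypothetical four-node snapshot: "a" hosts f1, "b" hosts f2.
links = {
    "s": [("a", 1.0), ("b", 4.0)],
    "a": [("b", 1.0), ("t", 5.0)],
    "b": [("t", 1.0)],
    "t": [],
}
hosts = {"a": {"f1"}, "b": {"f2"}}

feasible = route_with_functions(links, hosts, ["f1", "f2"], "s", "t")
infeasible = route_with_functions(links, hosts, ["f2", "f1"], "s", "t")
```

In this toy snapshot the chain `f1` then `f2` is served along s, a, b, t, while the reversed chain has no feasible route because no link leads from `b` back to `a`; a request of that kind would be rejected, which is the situation the acceptance ratio measures.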