Northeast Electric Power University
Although Reinforcement Learning (RL) agents are effective in well-defined environments, they often struggle to generalize their learned policies to dynamic settings because they rely on trial-and-error interactions. Recent work has explored applying Large Language Models (LLMs) or Vision Language Models (VLMs) to boost the generalization of RL agents through policy optimization guidance or prior knowledge. However, these approaches often lack seamless coordination between the RL agent and the foundation model, leading to unreasonable decision-making in unfamiliar environments and to efficiency bottlenecks. How to make full use of the inferential capabilities of foundation models and the rapid response capabilities of RL agents, and how to enhance the interaction between the two to form a dual system, remains an open scientific question. To address this problem, we draw inspiration from Kahneman's theory of fast thinking (System 1) and slow thinking (System 2), which suggests that balancing intuition and deep reasoning can achieve nimble decision-making in a complex world. In this study, we propose a Dual-System Adaptive Decision Framework (DSADF) that integrates two complementary modules: System 1, comprising an RL agent and a memory space for fast and intuitive decision-making, and System 2, driven by a VLM for deep and analytical reasoning. DSADF facilitates efficient and adaptive decision-making by combining the strengths of both systems. Empirical studies in the video game environments Crafter and Housekeep demonstrate the effectiveness of the proposed method, showing significant improvements in decision abilities for both unseen and known tasks.
Temporal action localization (TAL) involves two tasks: classifying and localizing actions within untrimmed videos. However, the two tasks often have conflicting requirements for features. Existing methods typically employ separate heads for the classification and localization tasks but share the same input feature, leading to suboptimal performance. To address this issue, we propose a novel TAL method with Cross Layer Task Decoupling and Refinement (CLTDR). Based on the feature pyramid of the video, the CLTDR strategy integrates semantically strong features from higher pyramid layers and detailed boundary-aware features from lower pyramid layers to effectively disentangle the action classification and localization tasks. Moreover, the multiple cross-layer features are also employed to refine and align the disentangled classification and regression results. Finally, a lightweight Gated Multi-Granularity (GMG) module is proposed to comprehensively extract and aggregate video features at instant, local, and global temporal granularities. Benefiting from the CLTDR and GMG modules, our method achieves state-of-the-art performance on five challenging benchmarks: THUMOS14, MultiTHUMOS, EPIC-KITCHENS-100, ActivityNet-1.3, and HACS. Our code and pre-trained models are publicly available at: this https URL.
"You only look once v4" (YOLOv4) is a deep-learning object detection method. YOLOv4-tiny was proposed on the basis of YOLOv4 to simplify the network structure and reduce the number of parameters, which makes it suitable for deployment on mobile and embedded devices. To improve the real-time performance of object detection, a fast object detection method based on YOLOv4-tiny is proposed. First, it uses two ResBlock-D modules from the ResNet-D network in place of two CSPBlock modules in YOLOv4-tiny, which reduces the computational complexity. Second, it designs an auxiliary residual network block that extracts more feature information about the object in order to reduce detection error. In the auxiliary network, two consecutive 3x3 convolutions are used to obtain a 5x5 receptive field for extracting global features, and channel attention and spatial attention are used to extract more effective information. Finally, it merges the auxiliary network with the backbone network to construct the complete network structure of the improved YOLOv4-tiny. Simulation results show that the proposed method detects objects faster than YOLOv4-tiny and YOLOv3-tiny, with almost the same mean average precision as YOLOv4-tiny, making it more suitable for real-time object detection.
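The receptive-field claim in the abstract above (two stacked 3x3 convolutions see a 5x5 input window) can be checked with the standard receptive-field recurrence. The sketch below is only illustrative: the function name `receptive_field` and the `(kernel_size, stride)` layer encoding are assumptions made for this example and are not taken from the paper.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv layers.

    `layers` is a list of (kernel_size, stride) tuples, first layer first.
    This encoding is a hypothetical convenience for the example.
    """
    rf = 1      # a single output unit initially "sees" one input pixel
    jump = 1    # distance in input pixels between adjacent units of the current layer
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


# Two consecutive 3x3 convolutions with stride 1, as described in the abstract.
print(receptive_field([(3, 1), (3, 1)]))  # 5
```

With stride 1 the two 3x3 layers cover the same window as a single 5x5 convolution while using fewer weights per channel pair (2 x 9 = 18 instead of 25).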
The emerging cryptocurrency market has lately received great attention for asset allocation due to its unique decentralization. However, its volatility and brand-new trading mode have made it challenging to devise an acceptable automatically generated strategy. This study proposes a framework for automatic high-frequency bitcoin transactions based on a deep reinforcement learning algorithm, proximal policy optimization (PPO). The framework treats the transaction process as actions, returns as rewards, and prices as states, in line with the reinforcement learning formulation. It compares advanced machine learning-based models for static price prediction, including support vector machine (SVM), multi-layer perceptron (MLP), long short-term memory (LSTM), temporal convolutional network (TCN), and Transformer, by applying them to real-time bitcoin prices; the experimental results demonstrate that LSTM outperforms the others. An automatically generated transaction strategy is then built on PPO, with LSTM as the basis for constructing the policy. Extensive empirical studies validate that the proposed method outperforms various common trading strategy benchmarks for a single financial product. The approach is able to trade bitcoins in a simulated environment with synchronous data and obtains a 31.67% higher return than that of the best benchmark, improving on the benchmark by 12.75%. The proposed framework can earn excess returns through periods of both volatility and surge, which opens the door to research on building a single-cryptocurrency trading strategy based on deep learning. Visualizations of the trading process show how the model handles high-frequency transactions, provide inspiration, and demonstrate that the approach can be extended to other financial products.
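The mapping described in the abstract above (prices as states, transactions as actions, returns as rewards) can be sketched as a tiny environment. Everything here is a hypothetical illustration: the class `ToyTradingEnv`, the three-action set, the all-in/all-out position sizing, and the absence of fees are assumptions for the example, not details from the paper, and no PPO or LSTM policy is included.

```python
class ToyTradingEnv:
    """Minimal single-asset trading environment (illustrative assumptions only)."""

    HOLD, BUY, SELL = 0, 1, 2  # hypothetical discrete action set

    def __init__(self, prices, cash=1000.0):
        self.prices = list(prices)
        self.initial_cash = cash
        self.reset()

    def reset(self):
        self.t = 0
        self.cash = self.initial_cash
        self.coins = 0.0
        return self.prices[0]  # state: the current price

    def _portfolio_value(self, price):
        return self.cash + self.coins * price

    def step(self, action):
        price = self.prices[self.t]
        value_before = self._portfolio_value(price)
        if action == self.BUY and self.cash > 0:
            # Assumption: spend all cash at the current price, no fees.
            self.coins = self.cash / price
            self.cash = 0.0
        elif action == self.SELL and self.coins > 0:
            self.cash = self.coins * price
            self.coins = 0.0
        self.t += 1
        next_price = self.prices[self.t]
        # Reward: one-step portfolio return after the price moves.
        reward = self._portfolio_value(next_price) / value_before - 1.0
        done = self.t == len(self.prices) - 1
        return next_price, reward, done


env = ToyTradingEnv([100.0, 110.0, 99.0])
state = env.reset()
state, reward, done = env.step(ToyTradingEnv.BUY)   # price rises 10% while holding coins
state, reward, done = env.step(ToyTradingEnv.SELL)  # exit before the price falls
```

A policy trained on such an interface would receive the price (or a window of prices) as input and emit one of the actions at every step.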
In the era of Industry 4.0, ensuring the resilience of cyber-physical systems against sophisticated cyber threats is increasingly critical. This study proposes a pioneering AI-based control framework that enhances short-term voltage stability assessments (STVSA) in power systems under complex composite cyber-attacks. First, by incorporating white-box and black-box adversarial attacks with Denial-of-Service (DoS) perturbations during training, composite adversarial attacks are implemented. Second, the application of Spectral Normalized Conditional Wasserstein Generative Adversarial Network with Gradient Penalty (SNCWGAN-GP) and Fast Gradient Sign Method (FGSM) strengthens the model's resistance to adversarial disturbances, improving data quality and training stability. Third, an assessment model based on Long Short-Term Memory (LSTM)-enhanced Graph Attention Network (L-GAT) is developed to capture dynamic relationships between the post-fault dynamic trajectories and electrical grid topology. Experimental results on the IEEE 39-bus test system demonstrate the efficacy and superiority of the proposed method in composite cyber-attack scenarios. This contribution is pivotal to advancing AI-based resilient control strategies for nonlinear dynamical systems, marking a substantial enhancement in the security of cyber-physical systems.
This paper presents GCNs-Net, a graph convolutional neural network framework designed for decoding time-resolved EEG motor imagery signals by leveraging functional topological relationships among electrodes. The system achieved an average accuracy of 93.06% on the PhysioNet Dataset (10 subjects) and demonstrated state-of-the-art performance at the 100-subject group level with 88.14% accuracy.
Hydrogen energy plays an important role in the low-carbon energy transition, and electricity-hydrogen coupling will become a typical energy scenario. Targeting the operational flexibility of low-carbon electricity-hydrogen coupling systems with a high proportion of wind power and photovoltaics, this paper studies the flexibility margin of an electricity-hydrogen coupling energy block based on model predictive control (MPC). By analyzing the power exchange characteristics of heterogeneous energy, homogenization models of the various heterogeneous energy sources are established. Following an analysis of the power system flexibility margin, flexibility margin evaluation indexes in three dimensions are defined from the perspective of system operation, and a scheduling model for the electricity-hydrogen coupling energy block is established. The MPC algorithm is used to optimize the power balance operation of the energy block, and the flexibility margin of the energy block is quantitatively analyzed and calculated. A case study verifies that the proposed calculation method can not only realize online power balance optimization of the electricity-hydrogen coupling energy block, but also effectively quantify its operational flexibility margin.
The Transformer has emerged as a powerful deep-learning technique for two-dimensional (2D) seismic data interpolation, owing to its global modeling ability. However, its core operation introduces a heavy computational burden due to its quadratic complexity, hindering its application to higher-dimensional data. To achieve Transformer-based three-dimensional (3D) seismic interpolation, we propose a 2.5-dimensional Transformer network (T-2.5D) that adopts a cross-dimensional transfer learning (TL) strategy to adapt 2D Transformer encoders to 3D seismic data. The proposed T-2.5D is mainly composed of 2D Transformer encoders and 3D seismic dimension adapters (SDAs). Each 3D SDA is placed before a Transformer encoder to learn spatial correlation information across seismic lines. The proposed cross-dimensional TL strategy comprises two stages: 2D pre-training and 3D fine-tuning. In the first stage, we optimize the 2D Transformer encoders using a large number of 2D data patches. In the second stage, we freeze the 2D Transformer encoders and fine-tune the 3D SDAs using limited 3D data volumes. Extensive experiments on multiple datasets are conducted to assess the effectiveness and efficiency of T-2.5D. Experimental results demonstrate that the proposed method achieves performance comparable to that of a full 3D Transformer at a significantly lower cost.
Macro photography (MP) is a specialized field of photography that captures objects at an extremely close range, revealing tiny details. Although an accurate macro photography image quality assessment (MPIQA) metric can benefit macro photograph capturing, which is vital in some domains such as scientific research and medical applications, the lack of MPIQA data limits the development of MPIQA metrics. To address this limitation, we conducted a large-scale MPIQA study. Specifically, to ensure diversity both in content and quality, we sampled 2,000 MP images from 15,700 MP images, collected from three public image websites. For each MP image, 17 (out of 21 after outlier removal) quality ratings and a detailed quality report of distortion magnitudes, types, and positions are gathered by a lab study. The images, quality ratings, and quality reports form our novel multi-labeled MPIQA database, MMP-2k. Experimental results showed that the state-of-the-art generic IQA metrics underperform on MP images. The database and supplementary materials are available at this https URL
In a modern power system with an increasing proportion of renewable energy, wind power prediction is crucial to power grid dispatch planning because of the volatility of wind power. However, traditional centralized forecasting methods raise concerns about data privacy and the data island problem. To balance data privacy and openness, we propose a forecasting scheme that combines federated learning and deep reinforcement learning (DRL) for ultra-short-term wind power forecasting, called federated deep reinforcement learning (FedDRL). First, this paper uses the deep deterministic policy gradient (DDPG) algorithm as the basic forecasting model to improve prediction accuracy. Second, we integrate the DDPG forecasting model into the federated learning framework. The designed FedDRL can obtain an accurate prediction model in a decentralized way by sharing model parameters instead of private data, which avoids sensitive privacy issues. The simulation results show that the proposed FedDRL outperforms traditional prediction methods in terms of forecasting accuracy. More importantly, while maintaining forecasting performance, FedDRL can effectively protect data privacy and relieve communication pressure compared with the traditional centralized forecasting method. In addition, a simulation with different federated learning parameters is conducted to confirm the robustness of the proposed scheme.
Global warming presents an unprecedented challenge to our planet; however, comprehensive understanding remains hindered by geographical biases, temporal limitations, and a lack of standardization in existing research. An end-to-end visual analysis of global warming using three distinct temperature datasets is presented. A baseline adjusted from the Paris Agreement's 1.5 °C benchmark, based on data analysis, is employed. A closed-loop design from visualization to prediction and clustering is created using classic models tailored to the characteristics of the data. This approach reduces complexity and eliminates the need for advanced feature engineering. A lightweight convolutional neural network and long short-term memory model specifically designed for global temperature change is proposed, achieving exceptional accuracy in long-term forecasting with a mean squared error of 3 × 10^-6 and an R² value of 0.9999. Dynamic time warping and K-means clustering elucidate national-level temperature anomalies and carbon emission patterns. This comprehensive method reveals intricate spatiotemporal characteristics of global temperature variations and provides warming trend attribution. The findings offer new insights into climate change dynamics, demonstrating that simplicity and precision can coexist in environmental analysis.
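Dynamic time warping, which the abstract above uses to compare national-level series, can be sketched with the textbook dynamic-programming recurrence. This is a generic illustration, not the authors' pipeline: the function name `dtw_distance`, the absolute-difference local cost, and the sample series are assumptions made for the example.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) dynamic time warping distance.

    Uses |x - y| as the local cost (an assumption for this sketch).
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned with an already-used b[j-1]
                cost[i][j - 1],      # b[j-1] aligned with an already-used a[i-1]
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


# Hypothetical anomaly series: the second is the first with one value held longer.
print(dtw_distance([1, 2, 3], [1, 1, 2, 3]))  # 0.0
```

Because DTW tolerates such local stretching, the pairwise distances it produces can feed a clustering step even when series are misaligned in time.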
Post-stack seismic profiles are images that reflect geological structures and provide a critical foundation for understanding the distribution of oil and gas resources. However, due to the limitations of seismic acquisition equipment and data collection geometry, post-stack profiles suffer from low resolution and strong noise, which severely affect subsequent seismic interpretation. To better enhance the spatial resolution and signal-to-noise ratio of post-stack profiles, a multi-scale attention encoder-decoder network based on a generative adversarial network (MAE-GAN) is proposed. This method improves the resolution of post-stack profiles while effectively suppressing noise and recovering weak signals. A multi-scale residual module is proposed to extract geological features under different receptive fields. At the same time, an attention module is designed to further guide the network to focus on important feature information. Additionally, to better recover the global and local information of post-stack profiles, an adversarial network based on a Markov discriminator is proposed. Finally, by introducing an edge information preservation loss function, the conventional loss function of the generative adversarial network is improved, which enables better recovery of the edge information of the original post-stack profiles. Experimental results on simulated and field post-stack profiles demonstrate that the proposed MAE-GAN method outperforms two advanced convolutional neural network-based methods in noise suppression and weak signal recovery. Furthermore, the profiles reconstructed by the MAE-GAN method preserve more geological structures.
For humans, taste is essential for perceiving food's nutrient content or harmful components. The current sensory evaluation of taste mainly relies on artificial sensory evaluation and electronic tongue, but the former has strong subjectivity and poor repeatability, and the latter is not flexible enough. This work proposed a strategy for acquiring and recognizing taste electroencephalogram (EEG), aiming to decode people's objective perception of taste through taste EEG. Firstly, according to the proposed experimental paradigm, the taste EEG of subjects under different taste stimulation was collected. Secondly, to avoid insufficient training of the model due to the small number of taste EEG samples, a Temporal and Spatial Reconstruction Data Augmentation (TSRDA) method was proposed, which effectively augmented the taste EEG by reconstructing the taste EEG's important features in temporal and spatial dimensions. Thirdly, a multi-view channel attention module was introduced into a designed convolutional neural network to extract the important features of the augmented taste EEG. The proposed method has accuracy of 99.56%, F1-score of 99.48%, and kappa of 99.38%, proving the method's ability to distinguish the taste EEG evoked by different taste stimuli successfully. In summary, combining TSRDA with taste EEG technology provides an objective and effective method for sensory evaluation of food taste.
The utilization of large-scale distributed renewable energy promotes the development of the multi-microgrid (MMG), which raises the need for an effective energy management method that minimizes economic costs and maintains energy self-sufficiency. Multi-agent deep reinforcement learning (MADRL) has been widely used for the energy management problem because of its real-time scheduling ability. However, its training requires massive energy operation data from microgrids (MGs), and gathering these data from different MGs would threaten their privacy and data security. Therefore, this paper tackles this practical yet challenging issue by proposing a federated multi-agent deep reinforcement learning (F-MADRL) algorithm with a physics-informed reward. In this algorithm, the federated learning (FL) mechanism is introduced to train the F-MADRL algorithm, thereby ensuring the privacy and security of data. In addition, a decentralized MMG model is built, and the energy of each participating MG is managed by an agent, which aims to minimize economic costs and maintain energy self-sufficiency according to the physics-informed reward. First, the MGs individually perform self-training on local energy operation data to train their local agent models. Then, these local models are periodically uploaded to a server, and their parameters are aggregated to build a global agent, which is broadcast to the MGs and replaces their local agents. In this way, the experience of each MG agent can be shared without explicitly transmitting the energy operation data, thus protecting privacy and ensuring data security. Finally, experiments are conducted on the Oak Ridge National Laboratory distributed energy control communication lab microgrid (ORNL-MG) test system, and comparisons are carried out to verify the effectiveness of introducing the FL mechanism and the superior performance of the proposed F-MADRL.
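The upload, aggregate, and broadcast cycle described in the abstract above can be sketched as follows. The abstract says only that local parameters "are aggregated"; the (weighted) averaging used here is the common FedAvg-style rule and is an assumption, as are the function names `federated_average` and `broadcast` and the representation of a model as a dict of parameter lists.

```python
def federated_average(local_models, weights=None):
    """Aggregate local agent parameters into a global model.

    Assumption: aggregation is a weighted average (FedAvg-style); the
    abstract does not specify the rule. Each model is a dict mapping a
    parameter name to a flat list of floats (hypothetical layout).
    """
    if not local_models:
        raise ValueError("need at least one local model")
    if weights is None:
        weights = [1.0] * len(local_models)
    total = sum(weights)
    global_model = {}
    for name, values in local_models[0].items():
        global_model[name] = [
            sum(w * model[name][i] for w, model in zip(weights, local_models)) / total
            for i in range(len(values))
        ]
    return global_model


def broadcast(global_model, num_microgrids):
    """Give every microgrid its own copy of the global parameters."""
    return [
        {name: list(values) for name, values in global_model.items()}
        for _ in range(num_microgrids)
    ]


# Two hypothetical microgrid agents, each with one parameter vector.
local_models = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
global_model = federated_average(local_models)
print(global_model)  # {'w': [2.0, 3.0]}
local_models = broadcast(global_model, num_microgrids=2)
```

Only parameter values cross the network in this loop; the energy operation data used for local training never leaves each microgrid.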
A community integrated energy system (CIES) is an important carrier of the energy internet and smart city in geographical and functional terms. Its emergence provides a new solution to the problems of energy utilization and environmental pollution. To coordinate the integrated demand response and uncertainty of renewable energy generation (RGs), a data-driven two-stage distributionally robust optimization (DRO) model is constructed. A comprehensive norm consisting of the 1-norm and infinity-norm is used as the uncertainty probability distribution information set, thereby avoiding complex probability density information. To address multiple uncertainties of RGs, a generative adversarial network based on the Wasserstein distance with gradient penalty is proposed to generate RG scenarios, which has wide applicability. To further tap the potential of the demand response, we take into account the ambiguity of human thermal comfort and the thermal inertia of buildings. Thus, an integrated demand response mechanism is developed that effectively promotes the consumption of renewable energy. The proposed method is simulated in an actual CIES in North China. In comparison with traditional stochastic programming and robust optimization, it is verified that the proposed DRO model properly balances the relationship between economical operation and robustness while exhibiting stronger adaptability. Furthermore, our approach outperforms other commonly used DRO methods with better operational economy, lower renewable power curtailment rate, and higher computational efficiency.
Battery Energy Storage Systems (BESS) are essential for peak shaving, balancing power supply and demand while enhancing grid efficiency. This study proposes a cycle-based control strategy for charging and discharging, which optimizes capture rate (CR), release rate (RR), and capacity utilization rate (CUR), improving BESS performance. Compared to traditional day-ahead methods, the cycle-based approach enhances operational accuracy and reduces capacity waste, achieving a CUR increase from 75.1% to 79.9%. An innovative cluster-level power allocation method, leveraging an improved Particle Swarm Optimization (PSO) algorithm, is introduced. This strategy reduces daily energy loss by 174.21 kWh (3.7%) and increases BESS efficiency by 0.4%. Transient and steady-state energy loss components are analyzed, revealing that transient loss proportion decreases significantly as power depth increases, from 27.2% at 1 MW to 1.3% at 10 MW. Simulations based on a detailed Simulink/Simscape model validate these methods, demonstrating enhanced peak shaving effectiveness and prolonged BESS lifespan by reducing equivalent cycles. The study provides a robust framework for optimizing BESS performance and efficiency in real-world applications.
An improved Multi-Agent Soft Actor-Critic (MASAC) algorithm, augmented with Automated Machine Learning (AutoML), facilitates collaborative optimization scheduling for multi-microgrid (MMG) systems, leading to a 7.36% reduction in MMG operating costs and outperforming other DRL methods by 12.9% to 17.30% in economic efficiency and convergence speed.
This paper examines the Galileo Open Service Navigation Message Authentication (OSNMA) and, for the first time, discovers two critical vulnerabilities, namely artificially-manipulated time synchronization (ATS) and interruptible message authentication (IMA). ATS allows attackers to falsify a receiver's signals and/or local reference time (LRT) while still fulfilling the time synchronization (TS) requirement. IMA allows the navigation data authentication process to be temporarily interrupted by the reception of a broken message (probably caused by spoofing attacks) and to be restored later. By exploiting the ATS vulnerability, we propose a TS-comply replay (TSR) attack with two variants (real-time and non-real-time), in which attackers replay signals to a victim receiver while strictly complying with the TS rule. We further propose a TS-comply forgery (TSF) attack, in which attackers first use a previously disclosed key to forge a message based on the OSNMA protocol, then tamper with the victim receiver's LRT accordingly to comply with the TS rule, and finally transmit the forged message to the receiver. Finally, we propose a concatenating replay (CR) attack based on the IMA vulnerability, in which attackers concatenate replayed signals to the victim receiver's signals in a way that still enables correct verification of the navigation data in the replayed signals. To validate the effectiveness of the proposed attacks, we conduct real-world experiments with a commercial Galileo receiver manufactured by Septentrio, two software-defined radio (SDR) devices, and the open-source Galileo-SDR-SIM and OSNMAlib software. The results show that all the attacks can successfully pass the OSNMA scheme and that the TSF attack can spoof receivers to arbitrary locations.
The taste electroencephalogram (EEG) evoked by the taste stimulation can reflect different brain patterns and be used in applications such as sensory evaluation of food. However, considering the computational cost and efficiency, EEG data with many channels has to face the critical issue of channel selection. This paper proposed a channel selection method called class activation mapping with attention (CAM-Attention). The CAM-Attention method combined a convolutional neural network with channel and spatial attention (CNN-CSA) model with a gradient-weighted class activation mapping (Grad-CAM) model. The CNN-CSA model exploited key features in EEG data by attention mechanism, and the Grad-CAM model effectively realized the visualization of feature regions. Then, channel selection was effectively implemented based on feature regions. Finally, the CAM-Attention method reduced the computational burden of taste EEG recognition and effectively distinguished the four tastes. In short, it has excellent recognition performance and provides effective technical support for taste sensory evaluation.
Electric vehicles (EVs) and electric vehicle charging stations (EVCSs) have been widely deployed with the development of large-scale transportation electrification. However, since the charging behaviors of EVs show large uncertainties, forecasting EVCS charging power is non-trivial. This paper tackles this issue by proposing a reinforcement learning assisted deep learning framework for probabilistic EVCS charging power forecasting that captures these uncertainties. Since EVCS charging power data are not standard time-series data like electricity load, they are first converted to a time-series format. On this basis, one of the most popular deep learning models, the long short-term memory (LSTM), is trained to obtain the point forecast of EVCS charging power. To further capture the forecast uncertainty, a Markov decision process (MDP) is employed to model the change of LSTM cell states, which is solved by our proposed adaptive exploration proximal policy optimization (AePPO) algorithm based on reinforcement learning. Finally, experiments are carried out on real EVCS charging data from Caltech and the Jet Propulsion Laboratory, USA. The results and comparative analysis verify the effectiveness and superior performance of the proposed framework.