Centre for Research and Technology Hellas
As the digital landscape becomes more interconnected, the frequency and severity of zero-day attacks, have significantly increased, leading to an urgent need for innovative Intrusion Detection Systems (IDS). Machine Learning-based IDS that learn from the network traffic characteristics and can discern attack patterns from benign traffic offer an advanced solution to traditional signature-based IDS. However, they heavily rely on labeled datasets, and their ability to generalize when encountering unseen traffic patterns remains a challenge. This paper proposes a novel self-supervised contrastive learning approach based on transformer encoders, specifically tailored for generalizable intrusion detection on raw packet sequences. Our proposed learning scheme employs a packet-level data augmentation strategy combined with a transformer-based architecture to extract and generate meaningful representations of traffic flows. Unlike traditional methods reliant on handcrafted statistical features (NetFlow), our approach automatically learns comprehensive packet sequence representations, significantly enhancing performance in anomaly identification tasks and supervised learning for intrusion detection. Our transformer-based framework exhibits better performance in comparison to existing NetFlow self-supervised methods. Specifically, we achieve up to a 3% higher AUC in anomaly detection for intra-dataset evaluation and up to 20% higher AUC scores in inter-dataset evaluation. Moreover, our model provides a strong baseline for supervised intrusion detection with limited labeled data, exhibiting an improvement over self-supervised NetFlow models of up to 1.5% AUC when pretrained and evaluated on the same dataset. Additionally, we show the adaptability of our pretrained model when fine-tuned across different datasets, demonstrating strong performance even when lacking benign data from the target domain.
In this paper, a novel multi-view methodology for graph-based neural networks is proposed. A systematic and methodological adaptation of the key concepts of classical deep learning methods such as convolution, pooling and multi-view architectures is developed for the context of non-Euclidean manifolds. The aim of the proposed work is to present a novel multi-view graph convolution layer, as well as a new view pooling layer making use of: a) a new hybrid Laplacian that is adjusted based on feature distance metric learning, b) multiple trainable representations of a feature matrix of a graph, using trainable distance matrices, adapting the notion of views to graphs and c) a multi-view graph aggregation scheme called graph view pooling, in order to synthesise information from the multiple generated views. The aforementioned layers are used in an end-to-end graph neural network architecture for graph classification and show competitive results to other state-of-the-art methods.
The rising availability of commercial 360360^\circ cameras that democratize indoor scanning, has increased the interest for novel applications, such as interior space re-design. Diminished Reality (DR) fulfills the requirement of such applications, to remove existing objects in the scene, essentially translating this to a counterfactual inpainting task. While recent advances in data-driven inpainting have shown significant progress in generating realistic samples, they are not constrained to produce results with reality mapped structures. To preserve the `reality' in indoor (re-)planning applications, the scene's structure preservation is crucial. To ensure structure-aware counterfactual inpainting, we propose a model that initially predicts the structure of an indoor scene and then uses it to guide the reconstruction of an empty -- background only -- representation of the same scene. We train and compare against other state-of-the-art methods on a version of the Structured3D dataset modified for DR, showing superior results in both quantitative metrics and qualitative results, but more interestingly, our approach exhibits a much faster convergence rate. Code and models are available at this https URL .
37
Data augmentation (DA) is ubiquitously used in training of Automatic Speech Recognition (ASR) models. DA offers increased data variability, robustness and generalization against different acoustic distortions. Recently, personalization of ASR models on mobile devices has been shown to improve Word Error Rate (WER). This paper evaluates data augmentation in this context and proposes persoDA; a DA method driven by user's data utilized to personalize ASR. persoDA aims to augment training with data specifically tuned towards acoustic characteristics of the end-user, as opposed to standard augmentation based on Multi-Condition Training (MCT) that applies random reverberation and noises. Our evaluation with an ASR conformer-based baseline trained on Librispeech and personalized for VOICES shows that persoDA achieves a 13.9% relative WER reduction over using standard data augmentation (using random noise & reverberation). Furthermore, persoDA shows 16% to 20% faster convergence over MCT.
In this paper, we introduce InDistill, a method that serves as a warmup stage for enhancing Knowledge Distillation (KD) effectiveness. InDistill focuses on transferring critical information flow paths from a heavyweight teacher to a lightweight student. This is achieved via a training scheme based on curriculum learning that considers the distillation difficulty of each layer and the critical learning periods when the information flow paths are established. This procedure can lead to a student model that is better prepared to learn from the teacher. To ensure the applicability of InDistill across a wide range of teacher-student pairs, we also incorporate a pruning operation when there is a discrepancy in the width of the teacher and student layers. This pruning operation reduces the width of the teacher's intermediate layers to match those of the student, allowing direct distillation without the need for an encoding stage. The proposed method is extensively evaluated using various pairs of teacher-student architectures on CIFAR-10, CIFAR-100, and ImageNet datasets demonstrating that preserving the information flow paths consistently increases the performance of the baseline KD approaches on both classification and retrieval settings. The code is available at this https URL.
Nowadays, the construction of a complex robotic system requires a high level of specialization in a large number of diverse scientific areas. It is reasonable that a single researcher cannot create from scratch the entirety of this system, as it is impossible for him to have the necessary skills in the necessary fields. This obstacle is being surpassed with the existent robotic frameworks. This paper tries to give an extensive review of the most famous robotic frameworks and middleware, as well as to provide the means to effortlessly compare them. Additionally, we try to investigate the differences between the definitions of a robotic framework, a robotic middleware and a robotic architecture.
The advent of accessible Generative AI tools enables anyone to create and spread synthetic images on social media, often with the intention to mislead, thus posing a significant threat to online information integrity. Most existing Synthetic Image Detection (SID) solutions struggle on generated images sourced from the Internet, as these are often altered by compression and other operations. To address this, our research enhances SID by exploring data augmentation combinations, leveraging a genetic algorithm for optimal augmentation selection, and introducing a dual-criteria optimization approach. These methods significantly improve model performance under real-world perturbations. Our findings provide valuable insights for developing detection models capable of identifying synthetic images across varying qualities and transformations, with the best-performing model achieving a mean average precision increase of +22.53% compared to models without augmentations. The implementation is available at github.com/efthimia145/sid-composite-data-augmentation.
A systematic empirical comparison demonstrates that L2 regularization effectively mitigates Membership Inference Attacks by reducing model overfitting, often achieving privacy levels comparable to Differential Privacy while preserving higher model utility. The research reveals a strong correlation between overfitting and vulnerability to such attacks.
Modern technologies have led illicit firearms trafficking to partially merge with cybercrime, while simultaneously permitting its off-line aspects to become more sophisticated. Law enforcement officers face difficult challenges that require hi-tech solutions. This article presents a real-world system, powered by advanced Artificial Intelligence, for facilitating them in their everyday work.
The Mixture Cure (MC) models constitute an appropriate and easily interpretable method when studying a time-to-event variable in a population comprised of both susceptible and cured individuals. In literature, those models usually assume that the latter are unobservable. However, there are cases in which a cured individual may be identified. For example, when studying the distant metastasis during the lifetime or the miscarriage during pregnancy, individuals that have died without a metastasis or have given birth are certainly non-susceptible. The same also holds when studying the x-year overall survival or the death during hospital stay. Common MC models ignore this information and consider them all censored, thus yielding in risk of assigning low immune probabilities to cured individuals. In this study, we consider a MC model that incorporates known information on cured individuals, with the time to cure identification being either deterministic or stochastic. We use the expectation-maximization algorithm to derive the maximum likelihood estimators. Furthermore, we compare different strategies that account for cure information such as (1) assigning infinite times to event for known cured cases and adjusting the traditional model and (2) considering only the probability of cure identification but ignoring the time until that happens. Theoretical results and simulations demonstrate the value of the proposed model especially when the time to cure identification is stochastic, increasing precision and decreasing the mean squared error. On the other hand, the traditional models that ignore the known cured information perform well when the curation is achieved after a known cutoff point. Moreover, through simulations the comparisons of the different strategies are examined, as possible alternatives to the complete-information model.
Learning based approaches for depth perception are limited by the availability of clean training data. This has led to the utilization of view synthesis as an indirect objective for learning depth estimation using efficient data acquisition procedures. Nonetheless, most research focuses on pinhole based monocular vision, with scarce works presenting results for omnidirectional input. In this work, we explore spherical view synthesis for learning monocular 360 depth in a self-supervised manner and demonstrate its feasibility. Under a purely geometrically derived formulation we present results for horizontal and vertical baselines, as well as for the trinocular case. Further, we show how to better exploit the expressiveness of traditional CNNs when applied to the equirectangular domain in an efficient manner. Finally, given the availability of ground truth depth data, our work is uniquely positioned to compare view synthesis against direct supervision in a consistent and fair manner. The results indicate that alternative research directions might be better suited to enable higher quality depth perception. Our data, models and code are publicly available at this https URL
113
The knowledge of potentially druggable binding sites on proteins is an important preliminary step towards the discovery of novel drugs. The computational prediction of such areas can be boosted by following the recent major advances in the deep learning field and by exploiting the increasing availability of proper data. In this paper, a novel computational method for the prediction of potential binding sites is proposed, called DeepSurf. DeepSurf combines a surface-based representation, where a number of 3D voxelized grids are placed on the protein's surface, with state-of-the-art deep learning architectures. After being trained on the large database of scPDB, DeepSurf demonstrates superior results on three diverse testing datasets, by surpassing all its main deep learning-based competitors, while attaining competitive performance to a set of traditional non-data-driven approaches.
The task of transforming a furnished room image into a background-only is extremely challenging since it requires making large changes regarding the scene context while still preserving the overall layout and style. In order to acquire photo-realistic and structural consistent background, existing deep learning methods either employ image inpainting approaches or incorporate the learning of the scene layout as an individual task and leverage it later in a not fully differentiable semantic region-adaptive normalization module. To tackle these drawbacks, we treat scene layout generation as a feature linear transformation problem and propose a simple yet effective adjusted fully differentiable soft semantic region-adaptive normalization module (softSEAN) block. We showcase the applicability in diminished reality and depth estimation tasks, where our approach besides the advantages of mitigating training complexity and non-differentiability issues, surpasses the compared methods both quantitatively and qualitatively. Our softSEAN block can be used as a drop-in module for existing discriminative and generative models. Implementation is available on this http URL.
In this work we contribute a distribution shift benchmark for a computer vision task; monocular depth estimation. Our differentiation is the decomposition of the wider distribution shift of uncontrolled testing on in-the-wild data, to three distinct distribution shifts. Specifically, we generate data via synthesis and analyze them to produce covariate (color input), prior (depth output) and concept (their relationship) distribution shifts. We also synthesize combinations and show how each one is indeed a different challenge to address, as stacking them produces increased performance drops and cannot be addressed horizontally using standard approaches.
In this paper, a detailed study on crime classification and prediction using deep learning architectures is presented. We examine the effectiveness of deep learning algorithms on this domain and provide recommendations for designing and training deep learning systems for predicting crime areas, using open data from police reports. Having as training data time-series of crime types per location, a comparative study of 10 state-of-the-art methods against 3 different deep learning configurations is conducted. In our experiments with five publicly available datasets, we demonstrate that the deep learning-based methods consistently outperform the existing best-performing methods. Moreover, we evaluate the effectiveness of different parameters in the deep learning architectures and give insights for configuring them in order to achieve improved performance in crime classification and finally crime prediction.
The growing popularity of smart mobile devices such as smartphones and tablets has made them an attractive target for cyber-criminals, resulting in a rapidly growing and evolving mobile threat as attackers experiment with new business models by targeting mobile users. With the emergence of the first large-scale mobile botnets, the core network has also become vulnerable to distributed denial-of-service attacks such as the signaling attack. Furthermore, complementary access methods such as Wi-Fi and femtocells introduce additional vulnerabilities for the mobile users as well as the core network. In this paper, we present the NEMESYS approach to smart mobile network security. The goal of the NEMESYS project is to develop novel security technologies for seamless service provisioning in the smart mobile ecosystem, and to improve mobile network security through a better understanding of the threat landscape. To this purpose, NEMESYS will collect and analyze information about the nature of cyber-attacks targeting smart mobile devices and the core network so that appropriate counter-measures can be taken. We are developing a data collection infrastructure that incorporates virtualized mobile honeypots and honeyclients in order to gather, detect and provide early warning of mobile attacks and understand the modus operandi of cyber-criminals that target mobile devices. By correlating the extracted information with known attack patterns from wireline networks, we plan to reveal and identify the possible shift in the way that cyber-criminals launch attacks against smart mobile devices.
Bias in computer vision systems can perpetuate or even amplify discrimination against certain populations. Considering that bias is often introduced by biased visual datasets, many recent research efforts focus on training fair models using such data. However, most of them heavily rely on the availability of protected attribute labels in the dataset, which limits their applicability, while label-unaware approaches, i.e., approaches operating without such labels, exhibit considerably lower performance. To overcome these limitations, this work introduces FLAC, a methodology that minimizes mutual information between the features extracted by the model and a protected attribute, without the use of attribute labels. To do that, FLAC proposes a sampling strategy that highlights underrepresented samples in the dataset, and casts the problem of learning fair representations as a probability matching problem that leverages representations extracted by a bias-capturing classifier. It is theoretically shown that FLAC can indeed lead to fair representations, that are independent of the protected attributes. FLAC surpasses the current state-of-the-art on Biased-MNIST, CelebA, and UTKFace, by 29.1%, 18.1%, and 21.9%, respectively. Additionally, FLAC exhibits 2.2% increased accuracy on ImageNet-A and up to 4.2% increased accuracy on Corrupted-Cifar10. Finally, in most experiments, FLAC even outperforms the bias label-aware state-of-the-art methods.
A foundational set of findable, accessible, interoperable, and reusable (FAIR) principles were proposed in 2016 as prerequisites for proper data management and stewardship, with the goal of enabling the reusability of scholarly data. The principles were also meant to apply to other digital assets, at a high level, and over time, the FAIR guiding principles have been re-interpreted or extended to include the software, tools, algorithms, and workflows that produce data. FAIR principles are now being adapted in the context of AI models and datasets. Here, we present the perspectives, vision, and experiences of researchers from different countries, disciplines, and backgrounds who are leading the definition and adoption of FAIR principles in their communities of practice, and discuss outcomes that may result from pursuing and incentivizing FAIR AI research. The material for this report builds on the FAIR for AI Workshop held at Argonne National Laboratory on June 7, 2022.
6D pose estimation is crucial for augmented reality, virtual reality, robotic manipulation and visual navigation. However, the problem is challenging due to the variety of objects in the real world. They have varying 3D shape and their appearances in captured images are affected by sensor noise, changing lighting conditions and occlusions between objects. Different pose estimation methods have different strengths and weaknesses, depending on feature representations and scene contents. At the same time, existing 3D datasets that are used for data-driven methods to estimate 6D poses have limited view angles and low resolution. To address these issues, we organize the Shape Retrieval Challenge benchmark on 6D pose estimation and create a physically accurate simulator that is able to generate photo-realistic color-and-depth image pairs with corresponding ground truth 6D poses. From captured color and depth images, we use this simulator to generate a 3D dataset which has 400 photo-realistic synthesized color-and-depth image pairs with various view angles for training, and another 100 captured and synthetic images for testing. Five research groups register in this track and two of them submitted their results. Data-driven methods are the current trend in 6D object pose estimation and our evaluation results show that approaches which fully exploit the color and geometric features are more robust for 6D pose estimation of reflective and texture-less objects and occlusion. This benchmark and comparative evaluation results have the potential to further enrich and boost the research of 6D object pose estimation and its applications.
There are no more papers matching your filters at the moment.