Assiut University
Multilingual OCR and information extraction from receipts remains challenging, particularly for complex scripts like Arabic. We introduce \dataset, a comprehensive dataset designed for Arabic-English receipt understanding comprising 20,000 annotated receipts from diverse retail settings, 30,000 OCR-annotated images, and 10,000 item-level annotations, and a new Receipt QA subset with 1265 receipt images paired with 40 question-answer pairs each to support LLM evaluation for receipt understanding. The dataset captures merchant names, item descriptions, prices, receipt numbers, and dates to support object detection, OCR, and information extraction tasks. We establish baseline performance using traditional methods (Tesseract OCR) and advanced neural networks, demonstrating the dataset's effectiveness for processing complex, noisy real-world receipt layouts. Our publicly accessible dataset advances automated multilingual document processing research (see this https URL ).
Large Language Models (LLMs) continue to advance natural language processing with their ability to generate human-like text across a range of tasks. Despite the remarkable success of LLMs in Natural Language Processing (NLP), their performance in text summarization across various domains and datasets has not been comprehensively evaluated. At the same time, the ability to summarize text effectively without relying on extensive training data has become a crucial bottleneck. To address these issues, we present a systematic evaluation of six LLMs across four datasets: CNN/Daily Mail and NewsRoom (news), SAMSum (dialog), and ArXiv (scientific). Using prompt engineering techniques, including zero-shot and in-context learning, our study evaluates performance with the ROUGE and BERTScore metrics. In addition, a detailed analysis of inference times is conducted to better understand the trade-off between summarization quality and computational efficiency. For long documents, we introduce a sentence-based chunking strategy that enables LLMs with shorter context windows to summarize extended inputs in multiple stages. The findings reveal that while LLMs perform competitively on news and dialog tasks, their performance on long scientific documents improves significantly when aided by chunking strategies. In addition, notable performance variations were observed based on model parameters, dataset properties, and prompt design. These results offer actionable insights into how different LLMs behave across task types, contributing to ongoing research in efficient, instruction-based NLP systems.
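The sentence-based chunking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the regex sentence splitter, the word-count budget `max_words`, and the `summarize` callable (a stand-in for any LLM call) are all assumptions made for illustration.

```python
import re

def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def chunk_by_sentences(text, max_words):
    """Pack whole sentences into chunks of at most max_words words.

    A single sentence longer than the budget becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for sentence in split_sentences(text):
        n = len(sentence.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks

def summarize_long(text, summarize, max_words):
    """Two-stage summarization: summarize each chunk, then the joined partial summaries."""
    partial = [summarize(chunk) for chunk in chunk_by_sentences(text, max_words)]
    if len(partial) == 1:
        return partial[0]
    return summarize(" ".join(partial))
```

Because chunks never cut a sentence in half, each stage receives grammatical input; the budget would in practice be measured in model tokens rather than words.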
This research addresses the challenge of limited data in tabular data classification, particularly prevalent in domains with constraints like healthcare. We propose Tab2Visual, a novel approach that transforms heterogeneous tabular data into visual representations, enabling the application of powerful deep learning models. Tab2Visual effectively addresses data scarcity by incorporating novel image augmentation techniques and facilitating transfer learning. We extensively evaluate the proposed approach on diverse tabular datasets, comparing its performance against a wide range of machine learning algorithms, including classical methods, tree-based ensembles, and state-of-the-art deep learning models specifically designed for tabular data. We also perform an in-depth analysis of factors influencing Tab2Visual's performance. Our experimental results demonstrate that Tab2Visual outperforms other methods in classification problems with limited tabular data.
Unconstrained text recognition is an important computer vision task, featuring a wide variety of different sub-tasks, each with its own set of challenges. One of the biggest promises of deep neural networks has been the convergence and automation of feature extractors from input raw signals, allowing for the highest possible performance with minimum required domain knowledge. To this end, we propose a data-efficient, end-to-end neural network model for generic, unconstrained text recognition. In our proposed architecture we strive for simplicity and efficiency without sacrificing recognition accuracy. Our proposed architecture is a fully convolutional network without any recurrent connections trained with the CTC loss function. Thus it operates on arbitrary input sizes and produces strings of arbitrary length in a very efficient and parallelizable manner. We show the generality and superiority of our proposed text recognition architecture by achieving state of the art results on seven public benchmark datasets, covering a wide spectrum of text recognition tasks, namely: Handwriting Recognition, CAPTCHA recognition, OCR, License Plate Recognition, and Scene Text Recognition. Our proposed architecture has won the ICFHR2018 Competition on Automated Text Recognition on a READ Dataset.
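As background on how a CTC-trained network can emit strings of arbitrary length, the following is a minimal sketch of greedy CTC decoding: take the best label per frame, collapse consecutive repeats, and drop blanks. The abstract does not say which decoding rule the authors use, so this is an assumption; the function name, the `alphabet` list, and the blank index are hypothetical.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding (illustrative; not necessarily the authors' decoder).

    frame_scores: list of per-frame score lists, one score per label.
    alphabet: list mapping label index to character; alphabet[blank] is unused.
    """
    # Best label index for every frame.
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    decoded, previous = [], None
    for label in best:
        # Emit a character only when the label changes and is not the blank.
        if label != previous and label != blank:
            decoded.append(alphabet[label])
        previous = label
    return "".join(decoded)
```

A blank between two identical labels is what allows doubled characters such as "aa" to survive the collapse step.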
Answering questions related to the legal domain is a complex task, primarily due to the intricate nature and diverse range of legal document systems. Providing an accurate answer to a legal query typically necessitates specialized knowledge in the relevant domain, which makes this task all the more challenging, even for human experts. Question answering (QA) systems are designed to generate answers to questions asked in human languages. QA uses natural language processing to understand questions and search through information to find relevant answers. QA has various practical applications, including customer service, education, research, and cross-lingual communication. However, QA faces challenges such as improving natural language understanding and handling complex and ambiguous questions. At this time, there is a lack of surveys that discuss legal question answering. To address this problem, we provide a comprehensive survey that reviews 14 benchmark datasets for question-answering in the legal field as well as presents a comprehensive review of the state-of-the-art Legal Question Answering deep learning models. We cover the different architectures and techniques used in these studies and the performance and limitations of these models. Moreover, we have established a public GitHub repository where we regularly upload the most recent articles, open data, and source code. The repository is available at: \url{this https URL}.
The safety and accuracy of robotic navigation hold paramount importance, especially in the realm of soft continuum robotics, where the limitations of traditional rigid sensors become evident. Encoders, piezoresistive, and potentiometer sensors often fail to integrate well with the flexible nature of these robots, adding unwanted bulk and rigidity. To overcome these hurdles, our study presents a new approach to shape sensing in soft continuum robots through the use of soft e-textile resistive sensors. This sensor, designed to flawlessly integrate with the robot's structure, utilizes a resistive material that adjusts its resistance in response to the robot's movements and deformations. This adjustment facilitates the capture of multidimensional force measurements across the soft sensor layers. A deep Convolutional Neural Network (CNN) is employed to decode the sensor signals, enabling precise estimation of the robot's shape configuration based on the detailed data from the e-textile sensor. Our research investigates the efficacy of this e-textile sensor in determining the curvature parameters of soft continuum robots. The findings are encouraging, showing that the soft e-textile sensor not only matches but potentially exceeds the capabilities of traditional rigid sensors in terms of shape sensing and estimation. This advancement significantly boosts the safety and efficiency of robotic navigation systems.
This work proposes a novel ensemble deep learning-based model to accurately classify, detect, and localize different defect categories for aggressive pitches and thin resists (High NA applications). In particular, we train RetinaNet models using different ResNet and VGGNet architectures as backbones and compare the accuracies of these models and their performance on SEM images with different types of defect patterns such as bridges, breaks, and line collapses. Finally, we propose a preference-based ensemble strategy to combine the output predictions from different models in order to achieve better performance on classification and detection of defects. As CDSEM images inherently contain a significant level of noise, detailed feature information is often shadowed by noise. For certain resist profiles, the challenge is also to differentiate between a microbridge, footing, break, and zones of probable breaks. Therefore, we have applied an unsupervised machine learning model to denoise the SEM images to remove the false-positive defects and optimize the effect of stochastic noise on structured pixels for better metrology and enhanced defect inspection. We repeated the defect inspection step with the same trained model and performed a comparative analysis of the "robustness" and "accuracy" metrics against the conventional approach for both noisy and denoised image pairs. The proposed ensemble method demonstrates an improvement in the mean average precision (mAP) of the most difficult defect classes. In this work we have developed a novel, robust supervised deep learning training scheme to accurately classify as well as localize different defect types in SEM images with a high degree of accuracy. Our proposed approach demonstrates its effectiveness both quantitatively and qualitatively.
An AI-driven framework for customer profiling, segmentation, and sales prediction in direct marketing is presented, integrating RFM analysis with machine learning techniques like boosting trees and Radial Basis Function neural networks, achieving an overall predictive model accuracy of 0.877 for non-purchasing customers.
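For readers unfamiliar with RFM analysis, the sketch below computes the three features (recency, frequency, monetary value) per customer from a transaction log. It only illustrates the feature step, not the paper's pipeline; the tuple layout and the function name `rfm_table` are assumptions.

```python
from datetime import date

def rfm_table(transactions, today):
    """Compute RFM features per customer.

    transactions: iterable of (customer_id, purchase_date, amount) tuples
    (hypothetical layout). Returns {customer_id: {...features...}}.
    """
    summary = {}
    for customer, day, amount in transactions:
        last, frequency, monetary = summary.get(customer, (None, 0, 0.0))
        # Track the most recent purchase date seen so far.
        if last is None or day > last:
            last = day
        summary[customer] = (last, frequency + 1, monetary + amount)
    return {
        customer: {
            "recency_days": (today - last).days,  # days since last purchase
            "frequency": frequency,               # number of purchases
            "monetary": monetary,                 # total amount spent
        }
        for customer, (last, frequency, monetary) in summary.items()
    }
```

These three columns would then serve as inputs to whichever segmentation or prediction model is used downstream.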
Aspect-based sentiment analysis (ABSA) performs a fine-grained analysis that identifies the aspects of a given document or sentence and the sentiments conveyed regarding each aspect. This is the most detailed level of analysis and is capable of exploring the nuanced viewpoints of reviews. The bulk of ABSA research focuses on English, with very little work available in Arabic. Most previous work in Arabic has been based on conventional machine learning methods that depend mainly on a set of scarce resources and tools for analyzing and processing Arabic content, such as lexicons, and the lack of those resources presents another challenge. To address these challenges, Deep Learning (DL)-based methods are proposed using two models based on Gated Recurrent Unit (GRU) neural networks for ABSA. The first is a DL model that takes advantage of word and character representations by combining a bidirectional GRU, a Convolutional Neural Network (CNN), and a Conditional Random Field (CRF), making up the BGRU-CNN-CRF model, to extract the main opinionated aspects (OTE). The second is an interactive attention network based on a bidirectional GRU (IAN-BGRU) to identify sentiment polarity toward the extracted aspects. We evaluated our models using the benchmark Arabic hotel reviews dataset. The results indicate that the proposed methods outperform the baseline research on both tasks, with a 39.7% improvement in F1-score for opinion target extraction (T2) and 7.58% in accuracy for aspect-based sentiment polarity classification (T3), achieving an F1-score of 70.67% for T2 and an accuracy of 83.98% for T3.
The ultra-dense deployment of interconnected satellites will characterize future low Earth orbit (LEO) mega-constellations. Exploiting this towards a more efficient satellite network (SatNet), this paper proposes a novel LEO SatNet architecture based on distributed massive multiple-input multiple-output (DM-MIMO) technology allowing ground user terminals to be connected to a cluster of satellites. To this end, we investigate various aspects of DM-MIMO-based satellite network design, the benefits of using this architecture, the associated challenges, and the potential solutions. In addition, we propose a distributed joint power allocation and handover management (D-JPAHM) technique that jointly optimizes the power allocation and handover management processes in a cross-layer manner. This framework aims to maximize the network throughput and minimize the handover rate while considering the quality-of-service (QoS) demands of user terminals and the power capabilities of the satellites. Moreover, we devise an artificial intelligence (AI)-based solution to efficiently implement the proposed D-JPAHM framework in a manner suitable for real-time operation and the dynamic SatNet environment. To the best of our knowledge, this is the first work to introduce and study DM-MIMO technology in LEO SatNets. Extensive simulation results reveal the superiority of the proposed architecture and solutions compared to conventional approaches in the literature.
In this paper, an automatic seeded region growing algorithm is proposed for cellular image segmentation. First, the regions of interest (ROIs) are extracted from the preprocessed image. Second, the initial seeds are automatically selected based on the ROIs extracted from the image. Third, the most representative seeds are selected using a machine learning algorithm. Finally, the cellular image is segmented into regions where each region corresponds to a seed. The aim of the proposed algorithm is to automatically extract the ROIs from cellular images while overcoming the explosion, under-segmentation, and over-segmentation problems. Experimental results show that the proposed algorithm can improve the segmented image and that the segmentation results are less noisy than those of some existing algorithms.
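The final growing step can be sketched as follows. This shows plain seeded region growing only: the seeds are passed in by hand, whereas the paper selects them automatically, and the intensity tolerance `tol` and 4-connectivity are assumptions made for illustration.

```python
from collections import deque

def region_grow(image, seeds, tol):
    """Grow one region per seed on a 2D grid of intensities.

    image: 2D list of numbers; seeds: list of (row, col); tol: maximum absolute
    difference from the seed intensity (assumed criterion).
    Returns a label map: 0 = unassigned, i + 1 = region grown from seeds[i].
    """
    height, width = len(image), len(image[0])
    labels = [[0] * width for _ in range(height)]
    queue = deque()
    for index, (r, c) in enumerate(seeds):
        labels[r][c] = index + 1
        queue.append((r, c, image[r][c]))  # carry the seed intensity as reference
    while queue:
        r, c, reference = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):  # 4-connected neighbours
            nr, nc = r + dr, c + dc
            if (0 <= nr < height and 0 <= nc < width
                    and labels[nr][nc] == 0
                    and abs(image[nr][nc] - reference) <= tol):
                labels[nr][nc] = labels[r][c]
                queue.append((nr, nc, reference))
    return labels
```

Pixels that match no seed within the tolerance stay at label 0, which makes under-segmented areas easy to spot.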
Multi-Agent Systems (MAS) have been adopted and tested in many complex and critical industrial applications, which are required to be adaptive, scalable, and context-aware, and to satisfy real-time constraints. Industrial Control Networks (ICN) are examples of these applications. An ICN is a system that contains a variety of interconnected industrial equipment, such as physical control processes, control systems, computers, and communication networks. It is built to supervise and control industrial processes. This paper presents a development case study on building a multi-layered agent-based ICN in which agents cooperate to provide effective supervision and control of a set of control processes that are primarily controlled by a set of legacy control systems with limited computing capabilities. The proposed ICN is designed to add an intelligent layer on top of legacy control systems to compensate for their limited capabilities using a cost-effective agent-based approach, and also to provide global synchronization and safety plans. It is tested and evaluated within a simulation environment. The main conclusion of this research is that agents and MAS can provide an effective, flexible, and cost-effective solution to the limitations of legacy control systems if they are properly integrated with these systems.
Clustering analysis plays an important role in scientific research and commercial applications. The K-means algorithm is a widely used partitioning method in clustering. However, it is known that the K-means algorithm may get stuck at suboptimal solutions, depending on the choice of the initial cluster centers. In this article, we propose a technique for handling large-scale data that purposefully selects initial cluster centers using genetic algorithms (GAs), reduces sensitivity to isolated points, avoids splitting large clusters, and to some degree overcomes the data skew caused by disproportionate data partitioning under multi-sampling. We applied our method to several public datasets, which show the advantages of the proposed approach, for example the Hepatitis C dataset taken from the machine learning repository of the University of California. Our aim is to evaluate the hepatitis dataset. To do so, we performed some preprocessing operations in order to summarize the data in the way best suited to our algorithm. Missing values of the instances are adjusted using the local mean method.
The outlier detection problem is in some cases similar to the classification problem. For example, the main concern of clustering-based outlier detection algorithms is to find clusters and outliers, which are often regarded as noise that should be removed in order to make clustering more reliable. In this article, we present an algorithm that performs outlier detection and data clustering simultaneously. The algorithm improves the estimation of the centroids of the generative distribution during the process of clustering and outlier discovery. The proposed algorithm consists of two stages. The first stage consists of an improved genetic k-means algorithm (IGK) process, while the second stage iteratively removes the vectors that are far from their cluster centroids.
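The two-stage structure can be sketched as below. Plain k-means stands in for the improved genetic k-means (IGK) stage, and the removal rule (drop the single farthest point while its distance exceeds a fixed `max_dist`) is an assumption; the abstract does not give the actual criterion.

```python
import math

def nearest(point, centroids):
    # Index of the centroid closest to the point.
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))

def mean(points):
    return tuple(sum(axis) / len(points) for axis in zip(*points))

def kmeans(points, centroids, iters=20):
    # Stage 1 stand-in: plain k-means, NOT the genetic variant from the paper.
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids

def gap(point, centroids):
    return math.dist(point, centroids[nearest(point, centroids)])

def cluster_with_outliers(points, init_centroids, max_dist):
    """Stage 2: repeatedly remove the farthest point, then re-estimate centroids."""
    inliers, outliers = list(points), []
    centroids = kmeans(inliers, list(init_centroids))
    while inliers:
        worst = max(inliers, key=lambda p: gap(p, centroids))
        if gap(worst, centroids) <= max_dist:
            break
        inliers.remove(worst)
        outliers.append(worst)
        centroids = kmeans(inliers, centroids)
    return centroids, inliers, outliers
```

Removing one point at a time matters: an outlier drags its centroid away from the true cluster, so a batch threshold test against the distorted centroid could discard legitimate members along with it.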
SCADA (Supervisory Control and Data Acquisition) is concerned with gathering process information from industrial control processes found in utilities such as power grids, water networks, transportation, and manufacturing, in order to give human operators the real-time access they need to monitor and control industrial processes either locally (on-site) or remotely (i.e., through the Internet). Conventional solutions such as custom SCADA packages, custom communication protocols, and centralized architectures are no longer appropriate for engineering this type of system because of their highly distributed nature and their uncertain, continuously changing working environments. Multi-agent systems (MAS) have emerged as a new architectural style for engineering complex and highly dynamic applications such as SCADA systems. In this paper, we propose an approach for simply developing flexible and interoperable SCADA systems based on the integration of MAS and the OPC process protocol. The proposed SCADA system has the following advantages: 1) simple (easier to implement); 2) flexible (able to adapt to dynamic changes in its environment); and 3) interoperable (with respect to the underlying control systems, which belong to diverse vendors). The applicability of the proposed approach is demonstrated by a real case study carried out in a paper mill.
Image feature classification is a challenging problem in many computer vision applications, specifically in the fields of remote sensing, image analysis, and pattern recognition. In this paper, a novel Self Organizing Map, termed improved SOM (iSOM), is proposed with the aim of effectively classifying mammographic images based on their texture feature representation. The main contribution of the iSOM is to introduce a new node structure for the map representation and to adapt a learning technique based on the Kohonen SOM accordingly. The main idea is to control, in an unsupervised fashion, the weight updating procedure depending on the class reliability of the node at weight update time. Experiments were conducted on real mammographic images. The results showed high accuracy compared to the classical SOM and other state-of-the-art classifiers.
In spite of the high accuracy of existing optical mark reading (OMR) systems and devices, a few restrictions remain. In this work, we aim to reduce the restrictions on multiple choice questions (MCQ) within tests. We use an image registration technique to extract the answer boxes from answer sheets. Unlike other systems that rely on simple image processing steps to recognize the extracted answer boxes, we address the problem from another perspective by training a machine learning classifier to recognize the class of each answer box (i.e., confirmed, crossed out, or blank answer). This gives us the ability to deal with a variety of shading and mark patterns, and to distinguish between chosen (i.e., confirmed) and canceled (i.e., crossed out) answers. All existing machine learning techniques require a large number of examples in order to train a model for classification; therefore, we present a dataset including six real MCQ assessments with different answer sheet templates. We evaluate two classification strategies: a straightforward approach and a two-stage classifier approach. We test two handcrafted feature methods and a convolutional neural network. Finally, we present an easy-to-use graphical user interface for the proposed system. Compared with existing OMR systems, the proposed system has the fewest constraints and achieves high accuracy. We believe that the presented work will further direct the development of OMR systems towards reducing the restrictions of MCQ tests.
Let $0<\alpha<2$, $\beta>0$ and $\alpha/2<|s|\leq 1$. In a previous work, we obtained all possible values of the Lebesgue exponent $p=p(\gamma)$ for which the Fourier transform of $E_{\alpha,\beta}(e^{i\pi s}|\cdot|^{\gamma})$ is an $L^{p}(\mathbb{R}^d)$ function, when $\gamma>(d-1)/2$. We recover the more interesting lower regularity case $0<\gamma\leq (d-1)/2$, using tools from the Littlewood-Paley theory. This question arises in the analysis of certain space-time fractional diffusion and Schrödinger problems and has been solved for the particular cases $\alpha\in (0,1)$, $\beta=\alpha,1$, and $s=-1/2,1$ via asymptotic analysis of Fox $H$-functions. The Littlewood-Paley theory provides a simpler proof that allows considering all values of $\beta,\gamma>0$ and $s\in (-1,1]\setminus [-\alpha/2,\alpha/2]$. This enabled us to prove various key estimates for a general class of nonlocal space-time problems.
Data management applications are growing and require more attention, especially in the "big data" era. Thus, supporting such applications with novel and efficient algorithms that achieve higher performance is critical. Array database management systems are one way to support these applications by dealing with data represented in n-dimensional data structures. For instance, software like SciDB and RasDaMan can be powerful tools to achieve the required performance on large-scale problems with multidimensional data. Like their relational counterparts, these management systems support specific array query languages as the user interface. As a popular programming model, MapReduce allows large-scale data analysis, facilitates query processing, and is used as a DB engine. Nevertheless, one major obstacle is the low productivity of developing MapReduce applications. Unlike high-level declarative languages such as SQL, MapReduce jobs are written in a low-level descriptive language, often requiring massive programming efforts and complicated debugging processes. This work presents a system that translates array queries expressed in the Array Query Language (AQL) in SciDB into MapReduce jobs. We focus on translating some unique structural aggregations, including circular, grid, hierarchical, and sliding aggregations. Unlike traditional aggregations in relational DBs, these structural aggregations are designed explicitly for array manipulation. Thus, our work can be considered an array-view counterpart of existing SQL to MapReduce translators like HiveQL and YSmart. Our translator supports structural aggregations over arrays to meet various array manipulation needs. The translator also supports user-defined aggregation functions with minimal user effort. We show that our translator can generate optimized MapReduce code, which performs better than the short handwritten code by up to 10.84x.
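To make the idea of a structural aggregation expressed as map and reduce steps concrete, the sketch below treats a grid aggregation as aggregation over non-overlapping square tiles of a 2D array. That reading of "grid aggregation" is an assumption, the code runs in a single process rather than on a MapReduce cluster, and all names are hypothetical; it does not reproduce the translator's output or any AQL syntax.

```python
from collections import defaultdict

def map_cell(i, j, value, tile):
    # Map step: key each cell by the tile it falls into.
    return (i // tile, j // tile), value

def grid_aggregate(array, tile, reduce_fn=sum):
    """Aggregate a 2D list over non-overlapping tile x tile blocks.

    reduce_fn plays the role of a built-in or user-defined aggregation function.
    """
    groups = defaultdict(list)  # stands in for the shuffle phase
    for i, row in enumerate(array):
        for j, value in enumerate(row):
            key, emitted = map_cell(i, j, value, tile)
            groups[key].append(emitted)
    # Reduce step: one aggregate per tile key.
    return {key: reduce_fn(values) for key, values in groups.items()}
```

Passing `max` or any other callable as `reduce_fn` changes the aggregate without touching the keying logic, which is the separation a translator can exploit when generating jobs.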
We investigate the dynamics of non-classical correlations and quantum coherence in open quantum systems by employing metrics like local quantum Fisher information, local quantum uncertainty, and quantum Jensen-Shannon divergence. Our focus here is on a system of two qubits in two distinct physical situations: in the first, the two qubits are coupled to a single-mode cavity, while in the second, the two qubits are immersed in dephasing reservoirs. Our study places significant emphasis on how the evolution of these quantum criteria is influenced by the initial state's purity (whether pure or mixed) and the nature of the environment (whether Markovian or non-Markovian). We observe that a decrease in the initial state's purity corresponds to a reduction in both quantum correlations and quantum coherence, whereas higher purity enhances these quantum features. Furthermore, we establish a quantum teleportation strategy based on the two different physical scenarios. In this approach, the resulting state of the two qubits functions as a quantum channel integrated into a quantum teleportation protocol. We also analyze how the purity of the initial state and the Markovian or non-Markovian regimes impact the quantum teleportation process.