Daffodil International University
This systematic review comprehensively maps the current state of fact-checking and factuality evaluation in Large Language Models (LLMs), identifying key issues, current solutions, and future research directions. It categorizes evaluation metrics, analyzes the impact and mitigation of hallucinations, and details the role of datasets, prompting, fine-tuning, and Retrieval-Augmented Generation (RAG) in enhancing LLM factual accuracy.
The pursuit of human-level artificial intelligence (AI) has significantly advanced the development of autonomous agents and Large Language Models (LLMs). LLMs are now widely utilized as decision-making agents for their ability to interpret instructions, manage sequential tasks, and adapt through feedback. This review examines recent developments in employing LLMs as autonomous agents and tool users and comprises seven research questions. We only used the papers published between 2023 and 2025 in conferences of the A* and A rank and Q1 journals. A structured analysis of the LLM agents' architectural design principles, dividing their applications into single-agent and multi-agent systems, and strategies for integrating external tools is presented. In addition, the cognitive mechanisms of LLM, including reasoning, planning, and memory, and the impact of prompting methods and fine-tuning procedures on agent performance are also investigated. Furthermore, we evaluated current benchmarks and assessment protocols and have provided an analysis of 68 publicly available datasets to assess the performance of LLM-based agents in various tasks. In conducting this review, we have identified critical findings on verifiable reasoning of LLMs, the capacity for self-improvement, and the personalization of LLM-based agents. Finally, we have discussed ten future research directions to overcome these gaps.
102
This survey provides a structured overview of Parameter-Efficient Fine-Tuning (PEFT) techniques, examining their mechanisms, trade-offs, and applications across language, vision, and multimodal domains. It details how PEFT methods, such as LoRA and various adapter designs, enable the efficient adaptation of large models by updating only a small fraction of parameters, thereby addressing the resource and storage challenges of full fine-tuning.
30
The combination of LLM agents with external tools enables models to solve complex tasks beyond their knowledge base. Human-designed tools are inflexible and restricted to solutions within the scope of pre-existing tools created by experts. To address this problem, we propose ATLASS, an advanced tool learning and selection system designed as a closed-loop framework. It enables the LLM to solve problems by dynamically generating external tools on demand. In this framework, agents play a crucial role in orchestrating tool selection, execution, and refinement, ensuring adaptive problem-solving capabilities. The operation of ATLASS follows three phases: The first phase, Understanding Tool Requirements, involves the Agents determining whether tools are required and specifying their functionality; the second phase, Tool Retrieval/Generation, involves the Agents retrieving or generating tools based on their availability; and the third phase, Task Solving, involves combining all the component tools necessary to complete the initial task. The Tool Dataset stores the generated tools, ensuring reusability and minimizing inference cost. Current LLM-based tool generation systems have difficulty creating complex tools that need APIs or external packages. In ATLASS, we solve the problem by automatically setting up the environment, fetching relevant API documentation online, and using a Python interpreter to create a reliable, versatile tool that works in a wider range of situations. OpenAI GPT-4.0 is used as the LLM agent, and safety and ethical concerns are handled through human feedback before executing generated code. By addressing the limitations of predefined toolsets and enhancing adaptability, ATLASS serves as a real-world solution that empowers users with dynamically generated tools for complex problem-solving.
Researchers from the University of Central Florida introduce two open-source benchmark datasets for user profile construction and updating, alongside an LLM-based methodology. The approach, leveraging fine-tuned models like Mistral-7b, achieves high F1 scores of 93.84 for construction and 95.08 for updating, demonstrating robust capabilities in dynamic user profile management.
Current state of the art measures like BLEU, CIDEr, VQA score, SigLIP-2 and CLIPScore are often unable to capture semantic or structural accuracy, especially for domain-specific or context-dependent scenarios. For this, this paper proposes a Physics-Constrained Multimodal Data Evaluation (PCMDE) metric combining large language models with reasoning, knowledge based mapping and vision-language models to overcome these limitations. The architecture is comprised of three main stages: (1) feature extraction of spatial and semantic information with multimodal features through object detection and VLMs; (2) Confidence-Weighted Component Fusion for adaptive component-level validation; and (3) physics-guided reasoning using large language models for structural and relational constraints (e.g., alignment, position, consistency) enforcement.
Leaf disease is a common fatal disease for plants. Early diagnosis and detection is necessary in order to improve the prognosis of leaf diseases affecting plant. For predicting leaf disease, several automated systems have already been developed using different plant pathology imaging modalities. This paper provides a systematic review of the literature on leaf disease-based models for the diagnosis of various plant leaf diseases via deep learning. The advantages and limitations of different deep learning models including Vision Transformer (ViT), Deep convolutional neural network (DCNN), Convolutional neural network (CNN), Residual Skip Network-based Super-Resolution for Leaf Disease Detection (RSNSR-LDD), Disease Detection Network (DDN), and YOLO (You only look once) are described in this review. The review also shows that the studies related to leaf disease detection applied different deep learning models to a number of publicly available datasets. For comparing the performance of the models, different metrics such as accuracy, precision, recall, etc. were used in the existing studies.
This dataset represents almost all the harmful diseases for rice in Bangladesh. This dataset consists of 1106 image of five harmful diseases called Brown Spot, Leaf Scaled, Rice Blast, Rice Turngo, Steath Blight in two different background variation named field background picture and white background picture. Two different background variation helps the dataset to perform more accurately so that the user can use this data for field use as well as white background for decision making. The data is collected from rice field of Dhaka Division. This dataset can use for rice leaf diseases classification, diseases detection using Computer Vision and Pattern Recognition for different rice leaf disease.
The rapid evolution of Artificial Intelligence (AI) and Machine Learning (ML) has significantly transformed logistics and supply chain management, particularly in the pursuit of sustainability and eco-efficiency. This study explores AI-based methodologies for optimizing logistics operations in the USA, focusing on reducing environmental impact, improving fuel efficiency, and minimizing costs. Key AI applications include predictive analytics for demand forecasting, route optimization through machine learning, and AI-powered fuel efficiency strategies. Various models, such as Linear Regression, XGBoost, Support Vector Machine, and Neural Networks, are applied to real-world logistics datasets to reduce carbon emissions based on logistics operations, optimize travel routes to minimize distance and travel time, and predict future deliveries to plan optimal routes. Other models such as K-Means and DBSCAN are also used to optimize travel routes to minimize distance and travel time for logistics operations. This study utilizes datasets from logistics companies' databases. The study also assesses model performance using metrics such as mean absolute error (MAE), mean squared error (MSE), and R2 score. This study also explores how these models can be deployed to various platforms for real-time logistics and supply chain use. The models are also examined through a thorough case study, highlighting best practices and regulatory frameworks that promote sustainability. The findings demonstrate AI's potential to enhance logistics efficiency, reduce carbon footprints, and contribute to a more resilient and adaptive supply chain ecosystem.
Mango is an important fruit crop in South Asia, but its cultivation is frequently hampered by leaf diseases that greatly impact yield and quality. This research examines the performance of five pre-trained convolutional neural networks, DenseNet201, InceptionV3, ResNet152V2, SeResNet152, and Xception, for multi-class identification of mango leaf diseases across eight classes using a transfer learning strategy with fine-tuning. The models were assessed through standard evaluation metrics, such as accuracy, precision, recall, F1-score, and confusion matrices. Among the architectures tested, DenseNet201 delivered the best results, achieving 99.33% accuracy with consistently strong metrics for individual classes, particularly excelling in identifying Cutting Weevil and Bacterial Canker. Moreover, ResNet152V2 and SeResNet152 provided strong outcomes, whereas InceptionV3 and Xception exhibited lower performance in visually similar categories like Sooty Mould and Powdery Mildew. The training and validation plots demonstrated stable convergence for the highest-performing models. The capability of fine-tuned transfer learning models, for precise and dependable multi-class mango leaf disease detection in intelligent agricultural applications.
Large Language Models (LLMs) are trained on massive amounts of data, enabling their application across diverse domains and tasks. Despite their remarkable performance, most LLMs are developed and evaluated primarily in English. Recently, a few multi-lingual LLMs have emerged, but their performance in low-resource languages, especially the most spoken languages in South Asia, is less explored. To address this gap, in this study, we evaluate LLMs such as GPT-4, Llama 2, and Gemini to analyze their effectiveness in English compared to other low-resource languages from South Asia (e.g., Bangla, Hindi, and Urdu). Specifically, we utilized zero-shot prompting and five different prompt settings to extensively investigate the effectiveness of the LLMs in cross-lingual translated prompts. The findings of the study suggest that GPT-4 outperformed Llama 2 and Gemini in all five prompt settings and across all languages. Moreover, all three LLMs performed better for English language prompts than other low-resource language prompts. This study extensively investigates LLMs in low-resource language contexts to highlight the improvements required in LLMs and language-specific resources to develop more generally purposed NLP applications.
Disease-symptom datasets are significant and in demand for medical research, disease diagnosis, clinical decision-making, and AI-driven health management applications. These datasets help identify symptom patterns associated with specific diseases, thus improving diagnostic accuracy and enabling early detection. The dataset presented in this study systematically compiles disease-symptom relationships from various online sources, medical literature, and publicly available health databases. The data was gathered through analyzing peer-reviewed medical articles, clinical case studies, and disease-symptom association reports. Only the verified medical sources were included in the dataset, while those from non-peer-reviewed and anecdotal sources were excluded. The dataset is structured in a tabular format, where the first column represents diseases, and the remaining columns represent symptoms. Each symptom cell contains a binary value, indicating whether a symptom is associated with a disease. Thereby, this structured representation makes the dataset very useful for a wide range of applications, including machine learning-based disease prediction, clinical decision support systems, and epidemiological studies. Although there are some advancements in the field of disease-symptom datasets, there is a significant gap in structured datasets for the Bangla language. This dataset aims to bridge that gap by facilitating the development of multilingual medical informatics tools and improving disease prediction models for underrepresented linguistic communities. Further developments should include region-specific diseases and further fine-tuning of symptom associations for better diagnostic performance
Immediate damage assessment is essential after natural catastrophes; yet, conventional hand evaluation techniques are sluggish and perilous. Although satellite and unmanned aerial vehicle (UAV) photos offer extensive perspectives of impacted regions, current computer vision methodologies generally yield just classification labels or segmentation masks, so constraining their capacity to deliver a thorough situational comprehension. We introduce the Vision Language Caption Enhancer (VLCE), a multimodal system designed to produce comprehensive, contextually-informed explanations of disaster imagery. VLCE employs a dual-architecture approach: a CNN-LSTM model with a ResNet50 backbone pretrained on EuroSat satellite imagery for the xBD dataset, and a Vision Transformer (ViT) model pretrained on UAV pictures for the RescueNet dataset. Both systems utilize external semantic knowledge from ConceptNet and WordNet to expand vocabulary coverage and improve description accuracy. We assess VLCE in comparison to leading vision-language models (LLaVA and QwenVL) utilizing CLIPScore for semantic alignment and InfoMetIC for caption informativeness. Experimental findings indicate that VLCE markedly surpasses baseline models, attaining a maximum of 95.33% on InfoMetIC while preserving competitive semantic alignment. Our dual-architecture system demonstrates significant potential for improving disaster damage assessment by automating the production of actionable, information-dense descriptions from satellite and drone photos.
Due to the massive explanation of artificial intelligence, machine learning technology is being used in various areas of our day-to-day life. In the world, there are a lot of scenarios where a simple crime can be prevented before it may even happen or find the person responsible for it. A face is one distinctive feature that we have and can differentiate easily among many other species. But not just different species, it also plays a significant role in determining someone from the same species as us, humans. Regarding this critical feature, a single problem occurs most often nowadays. When the camera is pointed, it cannot detect a person's face, and it becomes a poor image. On the other hand, where there was a robbery and a security camera installed, the robber's identity is almost indistinguishable due to the low-quality camera. But just making an excellent algorithm to work and detecting a face reduces the cost of hardware, and it doesn't cost that much to focus on that area. Facial recognition, widget control, and such can be done by detecting the face correctly. This study aims to create and enhance a machine learning model that correctly recognizes faces. Total 627 Data have been collected from different Bangladeshi people's faces on four angels. In this work, CNN, Harr Cascade, Cascaded CNN, Deep CNN & MTCNN are these five machine learning approaches implemented to get the best accuracy of our dataset. After creating and running the model, Multi-Task Convolutional Neural Network (MTCNN) achieved 96.2% best model accuracy with training data rather than other machine learning models.
Large language models (LLMs) have garnered significant interest in natural language processing (NLP), particularly their remarkable performance in various downstream tasks in resource-rich languages. Recent studies have highlighted the limitations of LLMs in low-resource languages, primarily focusing on binary classification tasks and giving minimal attention to South Asian languages. These limitations are primarily attributed to constraints such as dataset scarcity, computational costs, and research gaps specific to low-resource languages. To address this gap, we present datasets for sentiment and hate speech tasks by translating from English to Bangla, Hindi, and Urdu, facilitating research in low-resource language processing. Further, we comprehensively examine zero-shot learning using multiple LLMs in English and widely spoken South Asian languages. Our findings indicate that GPT-4 consistently outperforms Llama 2 and Gemini, with English consistently demonstrating superior performance across diverse tasks compared to low-resource languages. Furthermore, our analysis reveals that natural language inference (NLI) exhibits the highest performance among the evaluated tasks, with GPT-4 demonstrating superior capabilities.
In light of the expanding population, an automated framework of disease detection can assist doctors in the diagnosis of ocular diseases, yields accurate, stable, rapid outcomes, and improves the success rate of early detection. The work initially intended the enhancing the quality of fundus images by employing an adaptive contrast enhancement algorithm (CLAHE) and Gamma correction. In the preprocessing techniques, CLAHE elevates the local contrast of the fundus image and gamma correction increases the intensity of relevant features. This study operates on a AMDNet23 system of deep learning that combined the neural networks made up of convolutions (CNN) and short-term and long-term memory (LSTM) to automatically detect aged macular degeneration (AMD) disease from fundus ophthalmology. In this mechanism, CNN is utilized for extracting features and LSTM is utilized to detect the extracted features. The dataset of this research is collected from multiple sources and afterward applied quality assessment techniques, 2000 experimental fundus images encompass four distinct classes equitably. The proposed hybrid deep AMDNet23 model demonstrates to detection of AMD ocular disease and the experimental result achieved an accuracy 96.50%, specificity 99.32%, sensitivity 96.5%, and F1-score 96.49.0%. The system achieves state-of-the-art findings on fundus imagery datasets to diagnose AMD ocular disease and findings effectively potential of our method.
This paper from Daffodil International University applies various machine learning algorithms to predict bank loan approvals, utilizing a large Kaggle dataset of 148,670 applications. It finds that ensemble methods like AdaBoosting and Random Forest achieve very high prediction accuracies, with AdaBoosting reaching 99.99%.
7
False news has received attention from both the general public and the scholarly world. Such false information has the ability to affect public perception, giving nefarious groups the chance to influence the results of public events like elections. Anyone can share fake news or facts about anyone or anything for their personal gain or to cause someone trouble. Also, information varies depending on the part of the world it is shared on. Thus, in this paper, we have trained a model to classify fake and true news by utilizing the 1876 news data from our collected dataset. We have preprocessed the data to get clean and filtered texts by following the Natural Language Processing approaches. Our research conducts 3 popular Machine Learning (Stochastic gradient descent, Naïve Bayes, Logistic Regression,) and 2 Deep Learning (Long-Short Term Memory, ASGD Weight-Dropped LSTM, or AWD-LSTM) algorithms. After we have found our best Naive Bayes classifier with 56% accuracy and an F1-macro score of an average of 32%.
4
Visual hallucinations in Large Language Models (LLMs), where the model generates responses that are inconsistent with the visual input, pose a significant challenge to their reliability, particularly in contexts where precise and trustworthy outputs are critical. Current research largely emphasizes post-hoc correction or model-specific fine-tuning strategies, with limited exploration of preprocessing techniques to address hallucination issues at the input stage. This study presents a novel ensemble-based preprocessing framework that adaptively selects the most appropriate filtering approach -- noise reduced (NR), edge enhanced (EE), or unaltered input (org) based on the type of question posed, resulting into reduced hallucination without requiring any modifications to the underlying model architecture or training pipeline. Evaluated on the `HaloQuest' dataset -- a benchmark designed to test multimodal reasoning on visually complex inputs, our method achieves a 44.3% reduction in hallucination rates, as measured by Natural Language Inference (NLI) scores using SelfCheckGPT. This demonstrates that intelligent input conditioning alone can significantly enhance factual grounding in LLM responses. The findings highlight the importance of adaptive preprocessing techniques in mitigating hallucinations, paving the way for more reliable multimodal systems capable of addressing real-world challenges.
Cattle lameness is often caused by hoof injuries or interdigital dermatitis, leads to pain and significantly impacts essential physiological activities such as walking, feeding, and drinking. This study presents a deep learning-based model for detecting cattle lameness, sickness, or gait abnormalities using publicly available video data. The dataset consists of 50 unique videos from 40 individual cattle, recorded from various angles in both indoor and outdoor environments. Half of the dataset represents naturally walking (normal/non-lame) cattle, while the other half consists of cattle exhibiting gait abnormalities (lame). To enhance model robustness and generalizability, data augmentation was applied to the training data. The pre-processed videos were then classified using two deep learning models: ConvLSTM2D and 3D CNN. A comparative analysis of the results demonstrates strong classification performance. Specifically, the 3D CNN model achieved a video-level classification accuracy of 90%, with precision, recall, and f1-score of 90.9%, 90.9%, and 90.91% respectively. The ConvLSTM2D model exhibited a slightly lower accuracy of 85%. This study highlights the effectiveness of directly applying classification models to learn spatiotemporal features from video data, offering an alternative to traditional multi-stage approaches that typically involve object detection, pose estimation, and feature extraction. Besides, the findings demonstrate that the proposed deep learning models, particularly the 3D CNN, effectively classify and detect lameness in cattle while simplifying the processing pipeline.
There are no more papers matching your filters at the moment.