K. J. Somaiya College of Engineering
Cryo-electron tomography (cryo-ET) enables in situ visualization of macromolecular structures, where subtomogram analysis tasks such as classification, alignment, and averaging are critical for structural determination. However, effective analysis is hindered by scarce annotations, severe noise, and poor generalization. To address these challenges, we take the first step towards foundation models for cryo-ET subtomograms. First, we introduce CryoEngine, a large-scale synthetic data generator that produces over 904k subtomograms from 452 particle classes for pretraining. Second, we design an Adaptive Phase Tokenization-enhanced Vision Transformer (APT-ViT), which incorporates adaptive phase tokenization as an equivariance-enhancing module that improves robustness to both geometric and semantic variations. Third, we introduce a Noise-Resilient Contrastive Learning (NRCL) strategy to stabilize representation learning under severe noise conditions. Evaluations across 24 synthetic and real datasets demonstrate state-of-the-art (SOTA) performance on all three major subtomogram tasks and strong generalization to unseen datasets, advancing scalable and robust subtomogram analysis in cryo-ET.
Advancements in technologies for working with omics data require novel computational methods to fully leverage the information available and to help develop a better understanding of human diseases. This paper studies the effect of introducing graph contrastive learning to leverage graph structure and produce better representations for downstream classification tasks on multi-omics datasets. We present a learning framework named Multi-Omics Graph Contrastive Learner (MOGCL), which outperforms several approaches for integrating multi-omics data in supervised learning tasks. We show that pre-training graph models with a contrastive methodology and then fine-tuning them in a supervised manner is an efficient strategy for multi-omics data classification.
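As a sketch of the kind of objective such contrastive pre-training typically optimizes, the following NT-Xent-style loss treats two augmented views of the same node as a positive pair and all other nodes as negatives. This is illustrative only, under assumed shapes and names; it is not the MOGCL implementation.

```python
import numpy as np

def ntxent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss over two views of the same nodes.

    z1, z2: (n, d) arrays of node embeddings from two augmented graph views.
    Positive pair: row i of z1 with row i of z2; all other rows are negatives.
    """
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)               # (2n, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarities
    sim = z @ z.T / temperature                        # (2n, 2n)
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    # index of each sample's positive partner
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = logsumexp - sim[np.arange(2 * n), pos]
    return loss.mean()
```

The loss is lower when the two views of each node agree, which is what drives the pre-trained encoder toward augmentation-invariant representations before supervised fine-tuning.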
This paper deals with the software and electronics units of an autonomous underwater vehicle. The electronics unit implements the connections and communication between the single-board computer (SBC), the Pixhawk controller, and the other sensors and actuators. The main component of the software unit is an object detection algorithm based on Convolutional Neural Network (CNN) models, whose hyperparameters were tuned for the Odroid XU4 across the various models. The maneuvering algorithm uses the MAVLink protocol of the ArduSub project for movement and its simulation.
In image-assisted minimally invasive surgeries (MIS), understanding surgical scenes is vital for real-time feedback to surgeons, skill evaluation, and improving outcomes through collaborative human-robot procedures. Within this context, the challenge lies in accurately detecting, segmenting, and estimating the depth of surgical scenes depicted in high-resolution images, while simultaneously reconstructing the scene in 3D and providing segmentation of surgical instruments along with detection labels for each instrument. To address this challenge, a novel Multi-Task Learning (MTL) network is proposed for performing these tasks concurrently. A key aspect of this approach is overcoming the optimization hurdles of handling multiple tasks at once by integrating an Adversarial Weight Update into the MTL framework. The proposed MTL model achieves 3D reconstruction through the integration of segmentation, depth estimation, and object detection, thereby enhancing the understanding of surgical scenes, a significant advancement over existing studies that lack 3D capabilities. Comprehensive experiments on the EndoVis2018 benchmark dataset underscore the model's adeptness at efficiently addressing all three tasks, demonstrating the efficacy of the proposed techniques.
Segregation of garbage is a primary concern in many nations across the world. Even in the modern era, many people still do not know how to distinguish between organic and recyclable waste, and it is because of this that the world faces a major waste-disposal crisis. In this paper, we use deep learning algorithms to help solve this problem of waste classification. The waste is classified into two categories: organic and recyclable. Our proposed model achieves an accuracy of 94.9%. Although the two other models evaluated also show promising results, the proposed model stands out with the greatest accuracy. With the help of deep learning, one of the greatest obstacles to efficient waste management can finally be removed.
We present a first systematic time-series study of a sample of blazars observed by the Transiting Exoplanet Survey Satellite (\textit{TESS}). By cross-matching the positions of sources in the TESS observations with those in Roma-BZCAT, 29 blazars, including both BL Lacertae objects and flat-spectrum radio quasars, were identified. The observation lengths of the 79 light curves of these sources, across all sectors in which the targets of interest were observed by \textit{TESS}, range between 21.25 and 28.2 days. The light curves were analyzed using various methods of time-series analysis. The results show that the sources exhibit significant variability, with fractional variability spanning 1.41% to 53.84%. The blazar flux distributions were studied by applying normal and lognormal probability density function models. The results indicate that the optical flux histograms of the sources are consistent with a normal probability density function, with most of them following a bi-modal rather than a uni-modal distribution. This suggests that the days-timescale optical variability arises either from two different emission zones or from two distinct states of short-term activity in blazars. Power spectral density analysis was performed using the power spectral response method, and the true power spectra of the unevenly sampled light curves were estimated. The power spectral slopes of the light curves ranged from 1.7 to 3.2.
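The fractional variability quoted in such studies is conventionally the excess-variance amplitude $F_{\mathrm{var}} = \sqrt{(S^2 - \langle\sigma_{\mathrm{err}}^2\rangle)/\langle x\rangle^2}$, where $S^2$ is the sample variance of the light curve and $\langle\sigma_{\mathrm{err}}^2\rangle$ the mean squared measurement error. A minimal sketch of that standard estimator (not the authors' code):

```python
import numpy as np

def fractional_variability(flux, flux_err):
    """Fractional variability amplitude F_var of a light curve:
    sqrt((S^2 - <sigma_err^2>) / <x>^2), i.e. the noise-corrected
    scatter expressed as a fraction of the mean flux."""
    mean = flux.mean()
    s2 = flux.var(ddof=1)             # sample variance of the light curve
    mse = np.mean(flux_err ** 2)      # mean squared measurement error
    excess = s2 - mse                 # excess variance
    if excess <= 0:
        return 0.0                    # variability not detected above noise
    return np.sqrt(excess) / mean
```

Multiplying the result by 100 gives the percentage values reported above.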
We present a YOLOv3-CNN pipeline for detecting vehicles, segregating number plates, and locally storing the final recognized characters. Vehicle identification is performed under various image-correction schemes to determine the effect of environmental factors (angle of perception, luminosity, motion blurring, multi-line custom fonts, etc.). A YOLOv3 object detection model was trained to identify vehicles from a dataset of traffic images. A second YOLOv3 layer was trained to identify number plates within vehicle images. Based on these correction schemes, individual characters were segregated and verified against real-time data to calculate the accuracy of this approach. While characters under direct view were recognized accurately, some number plates affected by environmental factors were recognized with reduced accuracy. We summarize the results under various environmental factors against real-time data and report the overall accuracy of the pipeline model.
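Scoring detections from such a cascaded detector against ground-truth boxes typically relies on box overlap. A standard intersection-over-union helper of the kind used for that evaluation (illustrative, not the authors' code):

```python
def box_iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2).
    A predicted vehicle or plate box is usually counted as correct when
    its IoU with a ground-truth box exceeds a threshold such as 0.5."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    iw, ih = max(0.0, ix2 - ix1), max(0.0, iy2 - iy1)
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

In a two-stage setup like the one described, the plate detector's boxes are expressed in the coordinates of the cropped vehicle image, so they must be offset back to full-image coordinates before scoring.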
Social media platforms such as Twitter and Facebook can be utilised as important sources of information during disaster events. This information can be used for disaster response and crisis management if processed accurately and quickly. However, the data in such situations is ever-changing, and using considerable resources during a crisis is not feasible. We therefore need a low-resource, continually learning system that incorporates text classification models robust to noisy and unordered data. We utilised distributed learning, which enabled us to train on resource-constrained devices, and regularization to alleviate catastrophic forgetting in our target neural networks. We then applied federated averaging to aggregate the central model, combining distributed and continual learning.
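Federated averaging itself reduces to a sample-weighted mean of the clients' model parameters. A minimal sketch, assuming each client reports its layer weights and local sample count (not the authors' implementation):

```python
import numpy as np

def federated_average(client_weights, client_sizes):
    """FedAvg aggregation: average each model layer across clients,
    weighted by the number of local training samples per client.

    client_weights: list of per-client parameter lists (same shapes
                    across clients, one ndarray per layer).
    client_sizes:   local training-set size of each client.
    """
    total = float(sum(client_sizes))
    n_layers = len(client_weights[0])
    averaged = []
    for layer in range(n_layers):
        acc = sum(w[layer] * (n / total)
                  for w, n in zip(client_weights, client_sizes))
        averaged.append(acc)
    return averaged
```

The aggregated parameters form the central model, which is then redistributed to the devices for the next round of local training.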
This research work presents a comparative study of Proportional-Integral (PI) and Proportional-Integral-Derivative (PID) controllers applied to level and flow control in coupled tank systems. The coupled tank system, characterized by its nonlinear behavior, was selected due to its relevance in chemical processing industries where precision in liquid level control is crucial. The study evaluates the performance of both controllers under varying conditions, focusing on their ability to handle disturbances and maintain stability. Through experimental data and graphical analysis, it was observed that PID controllers, with their derivative action, provide faster response times and higher accuracy but are more sensitive to noise and harder to tune. In contrast, PI controllers, though slower, offer more stability and are easier to configure for systems where precise control is less critical. These findings highlight the trade-offs between the two control strategies, providing insights into their application depending on system requirements.
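The two controllers differ only in the derivative term. A minimal discrete-time sketch of the textbook PID law (the gains and time step below are illustrative, not the tuned values from the study; setting `kd=0` recovers the PI controller):

```python
class PID:
    """Discrete PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = None

    def update(self, setpoint, measurement):
        error = setpoint - measurement
        self.integral += error * self.dt
        if self.prev_error is None:
            derivative = 0.0                      # no history on first call
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)
```

The derivative term reacts to the rate of change of the error, which is what gives PID its faster response and, equally, its sensitivity to measurement noise.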
A self-driving car is a vehicle that guides itself without human conduction. The first truly autonomous cars appeared in the 1980s with projects funded by DARPA (the Defense Advanced Research Projects Agency). Since then, much has changed with improvements in the fields of Computer Vision and Machine Learning. We used the concept of behavioral cloning to convert a normal RC model car into an autonomous car using Deep Learning technology.
This paper proposes a voice morphing system for people who have undergone laryngectomy, the surgical removal of all or part of the larynx or voice box, particularly performed in cases of laryngeal cancer. A primitive method of achieving voice morphing is to extract the source speaker's vocal coefficients and then convert them into the target speaker's vocal parameters. In this paper, we deploy Gaussian Mixture Models (GMMs) to map the coefficients from source to target. However, the traditional GMM-based mapping approach results in over-smoothing of the converted voice. We therefore propose a unique GMM-based method for efficient voice morphing and conversion that overcomes the over-smoothing of the traditional method. It uses a technique of glottal waveform separation and prediction of excitations, and the results show that not only is over-smoothing eliminated but the transformed vocal tract parameters also match the target. Moreover, the synthesized speech thus obtained is of sufficiently high quality. Thus, voice morphing based on a unique GMM approach is proposed and critically evaluated using various subjective and objective evaluation parameters. Further, an application of this approach to voice morphing for laryngectomees is recommended.
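The core of GMM-based mapping is a soft assignment of each source frame to mixture components, followed by a component-wise transform toward the target speaker. The sketch below simplifies this to a per-component mean shift with diagonal covariances; the classic mapping additionally uses cross-covariance terms, and the paper's method adds glottal-excitation prediction on top. All names and parameters here are illustrative.

```python
import numpy as np

def responsibilities(x, weights, means, variances):
    """Posterior p(component | x) under diagonal-covariance Gaussians."""
    log_p = []
    for w, mu, var in zip(weights, means, variances):
        ll = -0.5 * np.sum((x - mu) ** 2 / var + np.log(2 * np.pi * var))
        log_p.append(np.log(w) + ll)
    log_p = np.array(log_p)
    p = np.exp(log_p - log_p.max())      # stable softmax over components
    return p / p.sum()

def gmm_convert(x, weights, src_means, tgt_means, src_vars):
    """Map a source feature vector toward the target speaker's space:
    y_hat = sum_m p(m | x) * (tgt_mean_m + (x - src_mean_m))."""
    r = responsibilities(x, weights, src_means, src_vars)
    shifted = [mu_y + (x - mu_x) for mu_x, mu_y in zip(src_means, tgt_means)]
    return sum(ri * s for ri, s in zip(r, shifted))
```

Because every frame is a weighted average over components, frame-to-frame detail is averaged away, which is exactly the over-smoothing the paper's method aims to eliminate.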
Facial Emotion Recognition is an inherently difficult problem, due to vast differences in the facial structures of individuals and ambiguity in the emotion displayed by a person. Although much recent work addresses Facial Emotion Recognition, the performance of CNNs on this task has been inferior to the results CNNs achieve in other fields such as object detection and facial recognition. In this paper, we propose a multi-task learning algorithm in which a single CNN detects the gender, age, and race of the subject along with their emotion. We validate the proposed methodology on two datasets of real-world images. The results show that this approach is significantly better than the current state-of-the-art algorithms for this task.