IITB-Monash Research Academy
Recent advancements in deep learning have demonstrated remarkable performance comparable to human capabilities across various supervised computer vision tasks. However, the prevalent assumption of having an extensive pool of training data encompassing all classes prior to model training often diverges from real-world scenarios, where limited data availability for novel classes is the norm. The challenge emerges in seamlessly integrating new classes with few samples into the training data, demanding the model to adeptly accommodate these additions without compromising its performance on base classes. To address this exigency, the research community has introduced several solutions under the realm of few-shot class incremental learning (FSCIL). In this study, we introduce an innovative FSCIL framework that utilizes language regularizer and subspace regularizer. During base training, the language regularizer helps incorporate semantic information extracted from a Vision-Language model. The subspace regularizer helps in facilitating the model's acquisition of nuanced connections between image and text semantics inherent to base classes during incremental training. Our proposed framework not only empowers the model to embrace novel classes with limited data, but also ensures the preservation of performance on base classes. To substantiate the efficacy of our approach, we conduct comprehensive experiments on three distinct FSCIL benchmarks, where our framework attains state-of-the-art performance.
Generalized Class Discovery (GCD) clusters base and novel classes in a target domain using supervision from a source domain with only base classes. Current methods often falter with distribution shifts and typically require access to target data during training, which can sometimes be impractical. To address this issue, we introduce the novel paradigm of Domain Generalization in GCD (DG-GCD), where only source data is available for training, while the target domain, with a distinct data distribution, remains unseen until inference. To this end, our solution, DG2CD-Net, aims to construct a domain-independent, discriminative embedding space for GCD. The core innovation is an episodic training strategy that enhances cross-domain generalization by adapting a base model on tasks derived from source and synthetic domains generated by a foundation model. Each episode focuses on a cross-domain GCD task, diversifying task setups over episodes and combining open-set domain adaptation with a novel margin loss and representation learning for optimizing the feature space progressively. To capture the effects of fine-tuning on the base model, we extend task arithmetic by adaptively weighting the local task vectors concerning the fine-tuned models based on their GCD performance on a validation distribution. This episodic update mechanism boosts the adaptability of the base model to unseen targets. Experiments across three datasets confirm that DG2CD-Net outperforms existing GCD methods customized for DG-GCD.
Neural network-based methods represent the state-of-the-art in question generation from text. Existing work focuses on generating only questions from text without concerning itself with answer generation. Moreover, our analysis shows that handling rare words and generating the most appropriate question given a candidate answer are still challenges facing existing approaches. We present a novel two-stage process to generate question-answer pairs from the text. For the first stage, we present alternatives for encoding the span of the pivotal answer in the sentence using Pointer Networks. In our second stage, we employ sequence to sequence models for question generation, enhanced with rich linguistic features. Finally, global attention and answer encoding are used for generating the question most relevant to the answer. We motivate and linguistically analyze the role of each component in our framework and consider compositions of these. This analysis is supported by extensive experimental evaluations. Using standard evaluation metrics as well as human evaluations, our experimental results validate the significant improvement in the quality of questions generated by our framework over the state-of-the-art. The technique presented here represents another step towards more automated reading comprehension assessment. We also present a live system \footnote{Demo of the system is available at \url{this https URL}.} to demonstrate the effectiveness of our approach.
A study comparing in vitro biological neural networks with deep reinforcement learning algorithms demonstrated that biological cultures achieved superior sample efficiency in learning to play a simplified simulated Pong game. The biological systems learned effectively within approximately 70 game episodes, a performance level not matched by the deep RL algorithms given the same limited training exposure.
In stock trading, feature extraction and trading strategy design are the two important tasks to achieve long-term benefits using machine learning techniques. Several methods have been proposed to design trading strategy by acquiring trading signals to maximize the rewards. In the present paper the theory of deep reinforcement learning is applied for stock trading strategy and investment decisions to Indian markets. The experiments are performed systematically with three classical Deep Reinforcement Learning models Deep Q-Network, Double Deep Q-Network and Dueling Double Deep Q-Network on ten Indian stock datasets. The performance of the models are evaluated and comparison is made.
Image segmentation beyond predefined categories is a key challenge in remote sensing, where novel and unseen classes often emerge during inference. Open-vocabulary image Segmentation addresses these generalization issues in traditional supervised segmentation models while reducing reliance on extensive per-pixel annotations, which are both expensive and labor-intensive to obtain. Most Open-Vocabulary Segmentation (OVS) methods are designed for natural images but struggle with remote sensing data due to scale variations, orientation changes, and complex scene compositions. This necessitates the development of OVS approaches specifically tailored for remote sensing. In this context, we propose AerOSeg, a novel OVS approach for remote sensing data. First, we compute robust image-text correlation features using multiple rotated versions of the input image and domain-specific prompts. These features are then refined through spatial and class refinement blocks. Inspired by the success of the Segment Anything Model (SAM) in diverse domains, we leverage SAM features to guide the spatial refinement of correlation features. Additionally, we introduce a semantic back-projection module and loss to ensure the seamless propagation of SAM's semantic information throughout the segmentation pipeline. Finally, we enhance the refined correlation features using a multi-scale attention-aware decoder to produce the final segmentation map. We validate our SAM-guided Open-Vocabulary Remote Sensing Segmentation model on three benchmark remote sensing datasets: iSAID, DLRSD, and OpenEarthMap. Our model outperforms state-of-the-art open-vocabulary segmentation methods, achieving an average improvement of 2.54 h-mIoU.
Fisher proved in 1940 that any 22-(v,k,λ)(v,k,\lambda) design with v>kv>k has at least vv blocks. In 1975 Ray-Chaudhuri and Wilson generalised this result by showing that every tt-(v,k,λ)(v,k,\lambda) design with vk+t/2v \geq k+\lfloor t/2 \rfloor has at least (vt/2)\binom{v}{\lfloor t/2 \rfloor} blocks. By combining methods used by Bose and Wilson in proofs of these results, we obtain new lower bounds on the size of tt-(v,k,λ)(v,k,\lambda) coverings. Our results generalise lower bounds on the size of 22-(v,k,λ)(v,k,\lambda) coverings recently obtained by the first author.
We present the Civique system for emergency detection in urban areas by monitoring micro blogs like Tweets. The system detects emergency related events, and classifies them into appropriate categories like "fire", "accident", "earthquake", etc. We demonstrate our ideas by classifying Twitter posts in real time, visualizing the ongoing event on a map interface and alerting users with options to contact relevant authorities, both online and offline. We evaluate our classifiers for both the steps, i.e., emergency detection and categorization, and obtain F-scores exceeding 70% and 90%, respectively. We demonstrate Civique using a web interface and on an Android application, in realtime, and show its use for both tweet detection and visualization.
The dynamics of adhesion of a spherical micro-particle to a ligand-coated wall, in shear flow, is studied using a Langevin equation that accounts for thermal fluctuations, hydrodynamic interactions and adhesive interactions. Contrary to the conventional assumption that thermal fluctuations play a negligible role at high Peˊ\acute{e}clet numbers, we find that for particles with low surface densities of receptors, rotational diffusion caused by fluctuations about the flow and gradient directions aids in bond formation, leading to significantly greater adhesion on average, compared to simulations where thermal fluctuations are completely ignored. The role of wall hydrodynamic interactions on the steady state motion of a particle, when the particle is close to the wall, has also been explored. At high Peˊ\acute{e}clet numbers, the shear induced force that arises due to the stresslet part of the Stokes dipole, plays a dominant role, reducing the particle velocity significantly, and affecting the states of motion of the particle. The coupling between the translational and rotational degrees of freedom of the particle, brought about by the presence of hydrodynamic interactions, is found to have no influence on the binding dynamics. On the other hand, the drag coefficient, which depends on the distance of the particle from the wall, plays a crucial role at low rates of bond formation. A significant difference in the effect of both the shear force and the position dependent drag force, on the states of motion of the particle, is observed when the Peˊ\acute{e}let number is small.
Alcohol abuse may lead to unsociable behavior such as crime, drunk driving, or privacy leaks. We introduce automatic drunk-texting prediction as the task of identifying whether a text was written when under the influence of alcohol. We experiment with tweets labeled using hashtags as distant supervision. Our classifiers use a set of N-gram and stylistic features to detect drunk tweets. Our observations present the first quantitative evidence that text contains signals that can be exploited to detect drunk-texting.
This paper makes a simple increment to state-of-the-art in sarcasm detection research. Existing approaches are unable to capture subtle forms of context incongruity which lies at the heart of sarcasm. We explore if prior work can be enhanced using semantic similarity/discordance between word embeddings. We augment word embedding-based features to four feature sets reported in the past. We also experiment with four types of word embeddings. We observe an improvement in sarcasm detection, irrespective of the word embedding used or the original feature set to which our features are augmented. For example, this augmentation results in an improvement in F-score of around 4\% for three out of these four feature sets, and a minor degradation in case of the fourth, when Word2Vec embeddings are used. Finally, a comparison of the four embeddings shows that Word2Vec and dependency weight-based features outperform LSA and GloVe, in terms of their benefit to sarcasm detection.
The generation of electrical power from Vortex-Induced Vibration (VIV) of a cylinder is investigated numerically. The cylinder is free to oscillate in the direction transverse to the incoming flow. The cylinder is attached to a magnet that can move along the axis of a coil made from conducting wire. The magnet and the coil together constitute a basic electrical generator. When the cylinder undergoes VIV, the motion of the magnet creates a voltage across the coil, which is connected to a resistive load. By Lenz's law, induced current in the coil applies a retarding force to the magnet. Effectively, the electrical generator applies a damping force on the cylinder with a spatially varying damping coefficient. For the initial investigation reported here, the Reynolds number is restricted to Re < 200, so that the flow is laminar and two-dimensional (2D). The incompressible 2D Navier-Stokes equations are solved using an extensively validated spectral-element based solver. The effects of the electromagnetic (EM) damping constant xi_m, coil dimensions (radius a, length L), and mass ratio on the electrical power extracted are quantified. It is found that there is an optimal value of xi_m (xi_opt) at which maximum electrical power is generated. As the radius or length of the coil is increased, the value of xi_opt is observed to increase. Although the maximum average power remains the same, a larger coil radius or length results in a more robust system in the sense that a relatively large amount of power can be extracted when xi_m is far from xi_opt, unlike the constant damping ratio case. The average power output is also a function of Reynolds number, primarily through the increased maximum oscillation amplitude that occurs with increased Reynolds number at least within the laminar range, although the general qualitative findings seem likely to carry across to high Reynolds number VIV.
The concept of a `persistent worm' is introduced, representing the smallest possible length of a wormlike micelle, and modelled by a bead-spring chain with sticky beads at the ends. Persistent worms are allowed to combine with each other at their sticky ends to form wormlike micelles with a distribution of lengths, and the semiflexibility of a wormlike micelle is captured with a bending potential between springs, both within and across persistent worms that stick to each other. Multi-particle Brownian dynamics simulations of such polydisperse and `polyflexible' wormlike micelles, with hydrodynamic interactions included and coupled with reversible scission/fusion of persistent worms, are used to investigate the static and dynamic properties of wormlike micellar solutions in the dilute and unentangled semidilute concentration regimes. The influence of the sticker energy and persistent worm concentration are examined and simulations are shown to validate theoretical mean-field predictions of the universal scaling with concentration of the chain length distribution of linear wormlike micelles, independent of the sticker energy. The presence of wormlike micelles that form rings is shown not to affect the static properties of linear wormlike micelles, and mean-field predictions of ring length distributions are validated. Linear viscoelastic storage and loss moduli are computed and the unique features in the intermediate frequency regime compared to those of homopolymer solutions are highlighted. The distinction between Rouse and Zimm dynamics in wormlike micelle solutions is elucidated, with a clear identification of the onset of the screening of hydrodynamic interactions with increasing concentration.
In this paper, we propose a novel mechanism for enriching the feature vector, for the task of sarcasm detection, with cognitive features extracted from eye-movement patterns of human readers. Sarcasm detection has been a challenging research problem, and its importance for NLP applications such as review summarization, dialog systems and sentiment analysis is well recognized. Sarcasm can often be traced to incongruity that becomes apparent as the full sentence unfolds. This presence of incongruity- implicit or explicit- affects the way readers eyes move through the text. We observe the difference in the behaviour of the eye, while reading sarcastic and non sarcastic sentences. Motivated by his observation, we augment traditional linguistic and stylistic features for sarcasm detection with the cognitive features obtained from readers eye movement data. We perform statistical classification using the enhanced feature set so obtained. The augmented cognitive features improve sarcasm detection by 3.7% (in terms of F-score), over the performance of the best reported system.
Wordnets are rich lexico-semantic resources. Linked wordnets are extensions of wordnets, which link similar concepts in wordnets of different languages. Such resources are extremely useful in many Natural Language Processing (NLP) applications, primarily those based on knowledge-based approaches. In such approaches, these resources are considered as gold standard/oracle. Thus, it is crucial that these resources hold correct information. Thereby, they are created by human experts. However, human experts in multiple languages are hard to come by. Thus, the community would benefit from sharing of such manually created resources. In this paper, we release mappings of 18 Indian language wordnets linked with Princeton WordNet. We believe that availability of such resources will have a direct impact on the progress in NLP for these languages.
This paper presents a hybrid active inference model that dynamically balances computation-heavy planning with experience-based reactions. The model leverages a state-dependent bias parameter to switch between planning (DPEFE) and learning (CL) strategies, demonstrating improved adaptability and efficiency in diverse environments like the Cart Pole and mutating Grid World tasks.
Dense word vectors or 'word embeddings' which encode semantic properties of words, have now become integral to NLP tasks like Machine Translation (MT), Question Answering (QA), Word Sense Disambiguation (WSD), and Information Retrieval (IR). In this paper, we use various existing approaches to create multiple word embeddings for 14 Indian languages. We place these embeddings for all these languages, viz., Assamese, Bengali, Gujarati, Hindi, Kannada, Konkani, Malayalam, Marathi, Nepali, Odiya, Punjabi, Sanskrit, Tamil, and Telugu in a single repository. Relatively newer approaches that emphasize catering to context (BERT, ELMo, etc.) have shown significant improvements, but require a large amount of resources to generate usable models. We release pre-trained embeddings generated using both contextual and non-contextual approaches. We also use MUSE and XLM to train cross-lingual embeddings for all pairs of the aforementioned languages. To show the efficacy of our embeddings, we evaluate our embedding models on XPOS, UPOS and NER tasks for all these languages. We release a total of 436 models using 8 different approaches. We hope they are useful for the resource-constrained Indian language NLP. The title of this paper refers to the famous novel 'A Passage to India' by E.M. Forster, published initially in 1924.
Cognates are variants of the same lexical form across different languages; for example 'fonema' in Spanish and 'phoneme' in English are cognates, both of which mean 'a unit of sound'. The task of automatic detection of cognates among any two languages can help downstream NLP tasks such as Cross-lingual Information Retrieval, Computational Phylogenetics, and Machine Translation. In this paper, we demonstrate the use of cross-lingual word embeddings for detecting cognates among fourteen Indian Languages. Our approach introduces the use of context from a knowledge graph to generate improved feature representations for cognate detection. We, then, evaluate the impact of our cognate detection mechanism on neural machine translation (NMT), as a downstream task. We evaluate our methods to detect cognates on a challenging dataset of twelve Indian languages, namely, Sanskrit, Hindi, Assamese, Oriya, Kannada, Gujarati, Tamil, Telugu, Punjabi, Bengali, Marathi, and Malayalam. Additionally, we create evaluation datasets for two more Indian languages, Konkani and Nepali. We observe an improvement of up to 18% points, in terms of F-score, for cognate detection. Furthermore, we observe that cognates extracted using our method help improve NMT quality by up to 2.76 BLEU. We also release our code, newly constructed datasets and cross-lingual models publicly.
Whispering gallery mode (WGM) microdisk resonators are promising optical devices that confine light efficiently and enable enhanced nonlinear optical effects. This work presents a novel approach to reduce sidewall roughness in SiO\textsubscript{2} microdisk resonators using focused ion beam (FIB) polishing. The microdisks, with varying diameter ranging from 5 to 20 μ\mum are fabricated using a multi-step fabrication scheme. However, the etching process introduces significant sidewall roughness, which increases with decreasing microdisk radius, degrading the resonators' quality. To address this issue, a FIB system is employed to polish the sidewalls, using optimized process parameters to minimize Ga ion implantation. White light interferometry measurements reveal a significant reduction in surface roughness from 7 nm to 20 nm for a 5 μ\mum diameter microdisk, leading to a substantial enhancement in the scattering quality factor (Qss) from 3×1023\times 10^2 to 2×1062\times 10^6. These findings demonstrate the effectiveness of FIB polishing in improving the quality of microdisk resonators and open up new possibilities for the fabrication of advanced photonic devices.
Active inference has emerged as an alternative approach to control problems given its intuitive (probabilistic) formalism. However, despite its theoretical utility, computational implementations have largely been restricted to low-dimensional, deterministic settings. This paper highlights that this is a consequence of the inability to adequately model stochastic transition dynamics, particularly when an extensive policy (i.e., action trajectory) space must be evaluated during planning. Fortunately, recent advancements propose a modified planning algorithm for finite temporal horizons. We build upon this work to assess the utility of active inference for a stochastic control setting. For this, we simulate the classic windy grid-world task with additional complexities, namely: 1) environment stochasticity; 2) learning of transition dynamics; and 3) partial observability. Our results demonstrate the advantage of using active inference, compared to reinforcement learning, in both deterministic and stochastic settings.
There are no more papers matching your filters at the moment.