Dhirubhai Ambani Institute of Information and Communication Technology
Unlike the courts in Western countries, public records of the Indian judiciary are completely unstructured and noisy. No large-scale, publicly available annotated dataset of Indian legal documents exists to date. This limits the scope for legal analytics research. In this work, we propose a new dataset consisting of over 10,000 judgements delivered by the Supreme Court of India and their corresponding handwritten summaries. The proposed dataset is pre-processed by normalising common legal abbreviations, handling spelling variations in named entities, correcting bad punctuation and performing accurate sentence tokenization. Each sentence is tagged with its rhetorical role. We also annotate each judgement with several attributes such as the date, the names of the plaintiffs, the defendants and the people representing them, the judges who delivered the judgement, the acts/statutes that are cited, and the most common citations used to refer to the judgement. Further, we propose an automatic labelling technique for identifying sentences that carry summary-worthy information. We demonstrate that this auto-labelled data can be used effectively to train a weakly supervised sentence extractor with high accuracy. Beyond legal document summarization, possible applications of this dataset include retrieval, citation analysis and prediction of decisions by a particular judge.
In recent years, federated learning (FL) has emerged as a promising technique for training machine learning models in a decentralized manner while preserving data privacy. The non-independent and identically distributed (non-i.i.d.) nature of client data, coupled with constraints on client or edge devices, presents significant challenges in FL. Furthermore, learning across a large number of communication rounds can be risky and leave models open to exploitation. Traditional FL approaches may suffer from these challenges. Therefore, we introduce FedSiKD, which incorporates knowledge distillation (KD) within a similarity-based federated learning framework. As clients join the system, they securely share relevant statistics about their data distribution, promoting intra-cluster homogeneity. This enhances optimization efficiency and accelerates the learning process, effectively transferring knowledge between teacher and student models and addressing device constraints. FedSiKD outperforms state-of-the-art algorithms, exceeding their accuracy by 25% and 18% for highly skewed data at α = 0.1 and 0.5 on the HAR and MNIST datasets, respectively. Its faster convergence is illustrated by a 17% and 20% increase in accuracy within the first five rounds on the HAR and MNIST datasets, respectively, highlighting its early-stage learning proficiency. Code is publicly available and hosted on GitHub (this https URL)
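A minimal sketch of the knowledge-distillation step that FedSiKD builds on (the clustering and FL orchestration are omitted; the temperature and loss form here are the standard KD recipe, not necessarily the paper's exact hyper-parameters):

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    e = np.exp((z - z.max(axis=-1, keepdims=True)) / T)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on temperature-softened outputs,
    # scaled by T^2 as in standard knowledge distillation.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = np.sum(p_t * (np.log(p_t + 1e-12) - np.log(p_s + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()

t = np.array([[2.0, 0.5, -1.0]])
print(distillation_loss(t, t))                      # ~0: identical logits
print(distillation_loss(np.zeros((1, 3)), t) > 0)   # True: divergent logits
```

In a clustered setting, each cluster's teacher would supply `teacher_logits` to its resource-constrained student clients.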
The unprecedented growth in the easy availability of photo-editing tools has endangered the power of digital images. An image was supposed to be worth more than a thousand words, but now this can be said only if it can be authenticated or the integrity of the image can be proved to be intact. In this paper, we propose a digital image forensic technique for JPEG images. It can detect any forgery in the image if the forged portion, called a ghost image, has a compression quality different from that of the cover image. It is based on resaving the JPEG image at different JPEG qualities; the detection of the forged portion is maximal when the image is saved at the same JPEG quality as the cover image. Also, we can precisely predict the JPEG quality of the cover image by analyzing the similarity using the Structural Similarity Index Measure (SSIM) or the energy of the images. The first maximum in SSIM, or the first minimum in energy, corresponds to the cover-image JPEG quality. We created a dataset with varying JPEG compression qualities of the ghost and cover images and validated the scalability of the experimental results. We also experimented with varied attack scenarios, e.g. a high-quality ghost image embedded in a low-quality cover image, a low-quality ghost image embedded in a high-quality cover image, and the ghost and cover images both at the same quality. The proposed method is able to localize the tampered portions accurately even for forgeries as small as 10x10 pixels. The technique is also robust against other attack scenarios like copy-move forgery, inserting text into an image, and rescaling (zoom-out/zoom-in) a ghost image before pasting it on a cover image.
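The quality-estimation step can be sketched as follows. To stay self-contained, the sketch computes a simplified global SSIM between two arrays and locates the first local maximum in a similarity-versus-quality curve; the real pipeline would obtain that curve by resaving the suspect JPEG at each candidate quality (file handling omitted, and the curve below is synthetic):

```python
import numpy as np

def global_ssim(x, y, L=255.0):
    # Simplified single-window SSIM over whole images (no sliding window).
    c1, c2 = (0.01 * L) ** 2, (0.03 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def first_local_maximum(curve):
    # Index of the first point strictly above both neighbours.
    for i in range(1, len(curve) - 1):
        if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]:
            return i
    return int(np.argmax(curve))

# SSIM of an image with itself is 1 by construction.
img = np.random.default_rng(0).uniform(0, 255, (32, 32))
assert abs(global_ssim(img, img) - 1.0) < 1e-9

# Synthetic SSIM curve over candidate resave qualities 50..95:
qualities = list(range(50, 100, 5))
sims = [0.80, 0.84, 0.91, 0.88, 0.86, 0.90, 0.93, 0.95, 0.97, 0.96]
print(qualities[first_local_maximum(sims)])  # 60: first peak marks the cover quality
```

Note that global SSIM rises again as the resave quality approaches 100; this is why the *first* maximum, not the global one, identifies the cover quality.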
We propose a framework for achieving perfect synchronization in complex networks of Sakaguchi-Kuramoto oscillators in the presence of higher-order interactions (simplicial complexes) at a targeted point in the parameter space. This is achieved by using a frequency set derived analytically from the governing equations. The frequency set not only provides stable perfect synchronization in the network at the desired point, but also proves to be very effective in achieving a high level of synchronization around it compared to any other choice of frequency set (uniform, normal, etc.). The proposed framework has been verified using scale-free, random and small-world networks. In all cases, stable perfect synchronization is achieved at the targeted point for wide ranges of the coupling parameters and phase-frustration. Both first- and second-order transitions to synchronization are observed in the system, depending on the type of network and the phase frustration. The stability of the perfect synchronization state is checked using the low-dimensional reduction approach. The robustness of the perfect synchronization state obtained using the derived frequency set is checked by introducing Gaussian noise around it.
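For readers unfamiliar with the model, a minimal pairwise-only sketch of Sakaguchi-Kuramoto dynamics and the order parameter used to quantify synchronization (all-to-all coupling, Euler integration; the paper's simplicial higher-order terms and derived frequency sets are omitted):

```python
import numpy as np

def simulate_sakaguchi_kuramoto(omega, K=2.0, alpha=0.3, dt=0.01, steps=2000, seed=1):
    # Euler integration of the all-to-all Sakaguchi-Kuramoto model:
    #   dtheta_i/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i - alpha)
    rng = np.random.default_rng(seed)
    n = len(omega)
    theta = rng.uniform(0, 2 * np.pi, n)
    for _ in range(steps):
        diff = theta[None, :] - theta[:, None]          # theta_j - theta_i
        theta = theta + dt * (omega + (K / n) * np.sin(diff - alpha).sum(axis=1))
    # Kuramoto order parameter r in [0, 1]; r = 1 means perfect synchronization.
    return abs(np.exp(1j * theta).mean())

# Identical frequencies with strong coupling synchronize almost perfectly.
r = simulate_sakaguchi_kuramoto(np.zeros(50), K=4.0, alpha=0.2)
print(round(r, 3))
```

The phase-frustration parameter `alpha` shifts the locked state; for heterogeneous `omega`, the choice of frequency set determines how close `r` can get to 1, which is the question the paper's derived frequency set addresses.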
The Internet has been weaponized to carry out cybercriminal activities at an unprecedented pace. The rising concern for preserving the privacy of personal data while availing of modern tools and technologies is alarming. End-to-end encrypted solutions are in demand for almost all commercial platforms. On one side, it seems imperative to provide such solutions and give people trust to reliably use these platforms. On the other side, this creates a huge opportunity to carry out unchecked cybercrimes. This paper proposes a robust video hashing technique, scalable and efficient at chalking out matches from the enormous bulk of videos floating on these commercial platforms. The video hash is validated to be robust to common manipulations like scaling, corruption by noise, compression, and contrast changes that are most likely to happen during transmission. It can also be transformed into the encrypted domain and work on top of encrypted videos without deciphering them. Thus, it can serve as a potential forensic tool that can trace the illegal sharing of videos without knowing the underlying content. Hence, it can help preserve privacy and combat cybercrimes such as revenge porn, hateful content, child abuse, or illegal material propagated in a video.
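One common construction for this kind of robust perceptual hash, used here purely as an illustrative stand-in for the paper's scheme, is a block-mean binary signature per frame compared via Hamming distance: mild noise barely changes block means, so the hash survives transmission, while unrelated content differs in roughly half its bits.

```python
import numpy as np

def block_mean_hash(frame, grid=8):
    # Downsample the frame to grid x grid block means, then threshold each
    # block against the global median -> 64-bit-style binary signature.
    h, w = frame.shape
    bh, bw = h // grid, w // grid
    blocks = frame[:bh * grid, :bw * grid].reshape(grid, bh, grid, bw).mean(axis=(1, 3))
    return (blocks > np.median(blocks)).astype(np.uint8).ravel()

def hamming(a, b):
    return int(np.count_nonzero(a != b))

rng = np.random.default_rng(0)
frame = rng.uniform(0, 255, (64, 64))
noisy = frame + rng.normal(0, 2.0, frame.shape)      # mild transmission noise
other = rng.uniform(0, 255, (64, 64))                # unrelated content

print(hamming(block_mean_hash(frame), block_mean_hash(noisy)))   # small
print(hamming(block_mean_hash(frame), block_mean_hash(other)))   # large
```

Because the signature depends only on coarse intensity statistics, a homomorphic or order-preserving encryption of those statistics is what makes matching in the encrypted domain conceivable, as the abstract claims for the proposed hash.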
Short-term rainfall forecasting, also known as precipitation nowcasting, has become a potentially fundamental technology impacting significant real-world applications ranging from flight safety and rainstorm alerts to farm irrigation timings. Since weather forecasting involves identifying the underlying structure in a huge amount of data, deep learning-based precipitation nowcasting has, intuitively, outperformed traditional linear extrapolation methods. Our research work intends to apply recent advances in deep learning to nowcasting, a multi-variable time series forecasting problem. Specifically, we leverage a bidirectional LSTM (Long Short-Term Memory) neural network architecture, which remarkably captures the temporal features and long-term dependencies in historical data. To further our studies, we compare the bidirectional LSTM network with a 1D CNN model to demonstrate the capabilities of sequence models over feed-forward neural architectures in forecasting-related problems.
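A bidirectional LSTM reads the input sequence in both directions and concatenates the two hidden states at each time step. A from-scratch sketch of the forward pass (NumPy only; the weight shapes, initialization and feature names are illustrative, and a real nowcasting model would be trained with a deep learning framework):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(seq, W, U, b, hidden):
    # Standard LSTM cell unrolled over time; gates stacked as [i, f, o, g].
    h = np.zeros(hidden)
    c = np.zeros(hidden)
    outputs = []
    for x in seq:
        z = W @ x + U @ h + b
        i, f, o = (sigmoid(z[k * hidden:(k + 1) * hidden]) for k in range(3))
        g = np.tanh(z[3 * hidden:])
        c = f * c + i * g                     # cell state carries long-term memory
        h = o * np.tanh(c)
        outputs.append(h)
    return np.array(outputs)                  # (timesteps, hidden)

def bilstm_forward(seq, params_fwd, params_bwd, hidden):
    fwd = lstm_forward(seq, *params_fwd, hidden)
    bwd = lstm_forward(seq[::-1], *params_bwd, hidden)[::-1]
    return np.concatenate([fwd, bwd], axis=1)  # (timesteps, 2 * hidden)

rng = np.random.default_rng(0)
features, hidden, timesteps = 4, 8, 10        # e.g. rainfall, humidity, wind, pressure
make = lambda: (rng.normal(0, 0.1, (4 * hidden, features)),
                rng.normal(0, 0.1, (4 * hidden, hidden)),
                np.zeros(4 * hidden))
seq = rng.normal(size=(timesteps, features))
out = bilstm_forward(seq, make(), make(), hidden)
print(out.shape)                              # (10, 16)
```

The backward pass over the reversed sequence is what gives each time step access to both past and future context, which a 1D CNN with a fixed receptive field cannot match for long-term dependencies.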
RNA secondary structure prediction and classification are two important problems in the field of RNA biology. Here, we propose a new permutation-based approach to create logical non-disjoint clusters of different secondary structures of a single class or type. Many different techniques exist to classify RNA secondary structure data, but none of them has ever used a permutation-based approach, which is very simple and yet powerful. We have written a small Java program to generate permutations, apply our algorithm to those permutations, analyze the data and create different logical clusters. We believe that these clusters can be utilized to untangle the mystery of RNA secondary structure and analyze the development patterns of unknown RNA.
DNA self-assembly is a robust and programmable approach for building structures at the nanoscale. Researchers around the world have proposed and implemented different techniques to build two-dimensional and three-dimensional nanostructures. One such technique involves the DNA bricks proposed by Ke et al. (2012) to create complex three-dimensional (3D) structures. Modeling these DNA nanostructures can prove to be a cumbersome and tedious task. Exploiting the programmability of base-pairing to produce self-assembling custom shapes, we present a software suite, 3DNA, which can be used for modeling, editing and visualizing such complex structures. 3DNA is open-source software that builds on the simple and modular self-assembly of DNA bricks, offering a more intuitive approach to constructing 3D shapes. Apart from modeling and envisaging shapes through a simple graphical user interface, 3DNA also supports an integrated random sequence generator that generates DNA sequences corresponding to the designed model. The software is available at www.guptalab.org/3dna
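The core of any such sequence generator, sketched below as a simple illustration rather than 3DNA's actual algorithm, is emitting random strands together with their Watson-Crick complements so that paired brick domains hybridize:

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def random_strand(length, rng):
    # One random single-stranded DNA domain.
    return "".join(rng.choice("ATGC") for _ in range(length))

def wc_complement(strand):
    # Watson-Crick complement, reversed to give the antiparallel partner strand.
    return "".join(COMPLEMENT[b] for b in reversed(strand))

rng = random.Random(42)
s = random_strand(32, rng)        # e.g. one 32-nt brick domain
print(s)
print(wc_complement(s))
assert wc_complement(wc_complement(s)) == s   # complementation is an involution
```

A production generator would additionally screen candidate strands for GC content, repeats and unintended cross-hybridization between bricks.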
Music accounts for a significant chunk of interest among various online activities. This is reflected by the wide array of alternatives offered in music-related web/mobile apps and information portals, featuring millions of artists, songs and events and attracting user activity at a similar scale. The availability of large-scale structured and unstructured data has attracted a similar level of attention from the data science community. This paper attempts to present the current state of the art in music-related analysis. Various approaches involving machine learning, information theory, social network analysis, the semantic web and linked open data are represented in the form of a taxonomy, along with the data sources and use cases addressed by the research community.
From the stock markets of six countries with high GDP, we study the stock indices, S&P 500 (NYSE, USA), SSE Composite (SSE, China), Nikkei (TSE, Japan), DAX (FSE, Germany), FTSE 100 (LSE, Britain) and NIFTY (NSE, India). The daily mean growth of the stock values is exponential. The daily price fluctuations about the mean growth are Gaussian, but with a non-zero asymptotic convergence. The growth of the monthly average of stock values is statistically self-similar to their daily growth. The monthly fluctuations of the price follow a Wiener process, with a decline of the volatility. The mean growth of the daily volume of trade is exponential. These observations are globally applicable and underline regularities across global stock markets.
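The two daily-scale observations above (exponential mean growth; Gaussian fluctuations about it) can be checked on any index series with a few lines of NumPy. Here a synthetic geometric-Brownian-motion price series stands in for real index data, with purely illustrative drift and volatility values:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, days = 0.001, 0.01, 20000           # illustrative daily drift/volatility

# Synthetic price series: exponential mean growth with Gaussian log-returns.
log_returns = rng.normal(mu, sigma, days)
prices = 100.0 * np.exp(np.cumsum(log_returns))

# Exponential growth <=> log(price) is, on average, linear in time.
t = np.arange(days)
slope, _ = np.polyfit(t, np.log(prices), 1)
print(round(slope / mu, 2))                    # close to 1: drift recovered

# Gaussian daily fluctuations: sample skewness of log-returns near 0.
r = np.diff(np.log(prices))
skew = np.mean((r - r.mean()) ** 3) / r.std() ** 3
print(abs(skew) < 0.2)                         # True
```

Applied to real index data, departures from these two checks (fat tails, volatility clustering) are exactly where the market-specific structure the paper studies shows up.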
In this paper, we propose Zero Aware Configurable Data Encoding by Skipping Transfer (ZAC-DEST), a data encoding scheme to reduce the energy consumption of DRAM channels, specifically targeted towards approximate computing and error-resilient applications. ZAC-DEST exploits the similarity between recent data transfers across channels, together with information about the error-resilience behavior of applications, to reduce on-die termination and switching energy by reducing the number of 1's transmitted over the channels. ZAC-DEST also provides a number of knobs for trading off the application's accuracy for energy savings, and vice versa, and can be applied to both training and inference. We apply ZAC-DEST to five machine learning applications. On average, across all applications and configurations, we observed a reduction of 40% in termination energy and 37% in switching energy compared to the state-of-the-art data encoding technique BD-Coder, with an average output quality loss of 10%. We show that if both training and testing are done assuming the presence of ZAC-DEST, the output quality of the applications can be improved up to 9 times compared to when ZAC-DEST is applied only during testing, leading to energy savings during training and inference with increased output quality.
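The core idea of cutting transmitted 1's by exploiting similarity with the previous transfer can be illustrated with classic bus-invert-style coding, used here as a simplified stand-in rather than the actual ZAC-DEST scheme: XOR each word with the previous one, and invert the result when that lowers the number of 1's, at the cost of one flag bit per word.

```python
def popcount(x):
    return bin(x).count("1")

def encode_stream(words, width=32):
    # For each word, transmit either w XOR prev (exploiting similarity with the
    # previous transfer) or its bitwise inverse, whichever has fewer 1's.
    # One flag bit per word records the choice. Illustrative only.
    mask = (1 << width) - 1
    prev, out = 0, []
    for w in words:
        delta = (w ^ prev) & mask
        if popcount(delta) > width // 2:
            out.append((delta ^ mask, 1))     # inverted: fewer 1's on the bus
        else:
            out.append((delta, 0))
        prev = w
    return out

def ones_on_bus(encoded, width=32):
    # Total 1's actually driven onto the channel, flag bits included.
    return sum(popcount(d) + flag for d, flag in encoded)

words = [0xFFFF0000, 0xFFFF00FF, 0x0000FFFF, 0xFFFFFFFF]
raw = sum(popcount(w) for w in words)
enc = ones_on_bus(encode_stream(words))
print(raw, enc)   # the encoded stream carries far fewer 1's
```

ZAC-DEST's additional accuracy/energy knobs would then decide, per application, when a transfer can be skipped or approximated entirely.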
Digital technology has made unimaginable applications come true. It seems exciting to have a handful of tools for easy editing and manipulation, but this raises alarming concerns, as manipulations can propagate as speech clones, duplicates, or even deep fakes. Validating the authenticity of a speech signal is one of the primary problems of digital audio forensics. We propose an approach to distinguish human speech from AI-synthesized speech exploiting bispectral and cepstral analysis. Higher-order statistics show less correlation for human speech than for synthesized speech. Also, cepstral analysis revealed a durable power component in human speech that is missing in synthesized speech. We integrate both these analyses and propose a machine learning model to detect AI-synthesized speech.
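The cepstral half of the analysis can be sketched in NumPy: the real cepstrum is the inverse FFT of the log magnitude spectrum, and periodic structure in a signal shows up as a sharp cepstral peak at the corresponding lag. The demo below uses a synthetic echo rather than real speech, so the signal and lag are illustrative assumptions, not the paper's features:

```python
import numpy as np

def real_cepstrum(x, eps=1e-12):
    # Real cepstrum: inverse FFT of the log magnitude spectrum.
    return np.fft.ifft(np.log(np.abs(np.fft.fft(x)) + eps)).real

rng = np.random.default_rng(0)
n, delay = 8192, 100
w = rng.normal(size=n)
x = w + 0.5 * np.roll(w, delay)        # signal plus a scaled (circular) echo

# An echo at lag d produces a sharp cepstral peak at quefrency d.
c = real_cepstrum(x)
print(np.argmax(c[20:n // 2]) + 20)    # 100
```

For forensics, features of this kind (strength and persistence of such cepstral components) feed the downstream machine learning classifier; the bispectral features capture the complementary higher-order phase correlations.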
This study aims to develop a generalised concept that enables double explosive transitions in the forward and backward directions, or a combination thereof. We found two essential factors for generating such phase transitions: the use of higher-order (triadic) interactions and the partial adaptation of a global order parameter acting on the triadic coupling. A compromise between the two factors may result in a double explosive transition. To reinforce the numerical observations, we employed the Ott-Antonsen ansatz. We observed that for a wide class of hypergraphs, combining the two elements can result in a double explosive transition.
In this work, we apply topic modeling using Non-Negative Matrix Factorization (NMF) on the COVID-19 Open Research Dataset (CORD-19) to uncover the underlying thematic structure and its evolution within the extensive body of COVID-19 research literature. NMF factorizes the document-term matrix into two non-negative matrices, effectively representing the topics and their distribution across the documents. This helps us see how strongly documents relate to topics and how topics relate to words. We describe the complete methodology which involves a series of rigorous pre-processing steps to standardize the available text data while preserving the context of phrases, and subsequently feature extraction using the term frequency-inverse document frequency (tf-idf), which assigns weights to words based on their frequency and rarity in the dataset. To ensure the robustness of our topic model, we conduct a stability analysis. This process assesses the stability scores of the NMF topic model for different numbers of topics, enabling us to select the optimal number of topics for our analysis. Through our analysis, we track the evolution of topics over time within the CORD-19 dataset. Our findings contribute to the understanding of the knowledge structure of the COVID-19 research landscape, providing a valuable resource for future research in this field.
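The factorization at the heart of the method can be sketched without any ML library: classic Lee-Seung multiplicative-update NMF on a tiny document-term matrix. In the real pipeline one would feed a tf-idf matrix of CORD-19 abstracts and typically use a library implementation; the block-structured toy matrix below is a stand-in:

```python
import numpy as np

def nmf(V, k, iters=1000, seed=0):
    # Multiplicative updates minimizing the Frobenius norm ||V - W H||.
    # W: document-topic weights, H: topic-term weights, both non-negative.
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.uniform(0.1, 1.0, (n, k))
    H = rng.uniform(0.1, 1.0, (k, m))
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-12)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-12)
    return W, H

# Toy corpus: docs 0-2 use words 0-2, docs 3-5 use words 3-5 (two "topics").
V = np.zeros((6, 6))
V[:3, :3] = 1.0
V[3:, 3:] = 1.0
W, H = nmf(V, k=2)

err = np.linalg.norm(V - W @ H)
print(err < 0.1)                            # True: 2 topics fit the block structure
print((W >= 0).all() and (H >= 0).all())    # True: factors stay non-negative
```

The stability analysis described above amounts to re-running such a factorization for several values of `k` and random seeds and scoring how consistently the same topics reappear.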
Cluster synchronization in multilayer networks of phase oscillators with phase-lag poses significant challenges due to the destabilizing effects of delayed interactions. Leveraging the Sakaguchi-Kuramoto model, this study addresses these challenges by systematically exploring the role of natural frequency distributions in sustaining cluster synchronization under high phase-lag conditions. We focus on four distributions: uniform (uni-uni), partially degree-correlated (deg-uni, uni-deg), and fully degree-correlated (deg-deg), where oscillators' intrinsic frequencies align with their network connectivity. Through numerical and analytical investigations, we demonstrate that the deg-deg distribution, where both layers employ degree-matched frequencies, remarkably enhances synchronization stability, outperforming other configurations. We analyze two distinct network architectures: one composed entirely of nontrivial clusters and another combining trivial and nontrivial clusters. Results reveal that structural heterogeneity encoded in the deg-deg coupling counteracts phase-lag-induced desynchronization, enabling robust cluster synchronization even at large phase-lag values. Stability is rigorously validated via transverse Lyapunov exponents (TLEs), which confirm that deg-deg networks exhibit broader synchronization regimes compared to uniform or partially correlated systems. These findings provide critical insights into the interplay between topological heterogeneity and dynamical resilience, offering a framework for designing robust multilayer systems from delay-tolerant power grids to adaptive biological networks, where synchronization under phase-lag is paramount.
Use of formal techniques for verifying the security features of electronic commerce protocols would facilitate the enhancement of the reliability of such protocols, thereby increasing their usability. This paper presents the application of logic programming techniques to the formal verification of a widely referenced security and transactions protocol, NetBill. The paper uses ALSP (Action Language for Security Protocols) as an efficient formal specification language, and SMODELS, a model generator, to formally analyze and plan attacks on the protocol.
Brain research has been driven by the enquiry into principles of brain structure organization and its control mechanisms. The neuronal wiring map of C. elegans, the only complete connectome available to date, presents an incredible opportunity to learn the basic governing principles that drive the structure and function of its neuronal architecture. Despite its apparently simple nervous system, C. elegans is known to possess complex functions. The neuronal architecture forms an important underlying framework which specifies phenotypic features associated with sensation, movement, conditioning and memory. In this study, with the help of graph-theoretical models, we investigated the C. elegans neuronal network to identify network features that are critical for its control. The 'driver neurons' are associated with important biological functions such as reproduction, signalling processes and anatomical structural development. We created 1D and 2D network models of the C. elegans neuronal system to probe the role of features that confer controllability and small-world nature. The simple 1D ring model is critically poised for the number of feed-forward motifs, neuronal clustering and characteristic path length in response to synaptic rewiring, indicating optimal rewiring. Using the empirically observed distance constraint in the neuronal network as a guiding principle, we created a distance-constrained synaptic plasticity model that simultaneously explains the small-world nature, the saturation of feed-forward motifs and the observed number of driver neurons. The distance-constrained model suggests optimal long-distance synaptic connections as a key feature specifying control of the network.
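The number of driver nodes in such controllability studies is typically obtained from structural controllability: the minimum driver set equals N minus the size of a maximum matching in the bipartite out-copy/in-copy graph of the directed network (the Liu-Slotine-Barabási result). A minimal sketch on a toy directed network, not the actual C. elegans connectome:

```python
def max_matching(edges, n):
    # Maximum bipartite matching via augmenting paths, on the bipartite
    # graph whose left/right sides are out-copies and in-copies of nodes.
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    match = [-1] * n                      # match[v] = node currently driving v

    def augment(u, seen):
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                if match[v] == -1 or augment(match[v], seen):
                    match[v] = u
                    return True
        return False

    return sum(augment(u, set()) for u in range(n))

# Toy directed network: a chain 0->1->2->3 plus a branch 1->4.
edges = [(0, 1), (1, 2), (2, 3), (1, 4)]
n = 5
drivers = n - max_matching(edges, n)
print(drivers)   # 2: one driver at the chain's head, one for the unmatched branch
```

Unmatched nodes (here node 0, which nothing drives, and node 4, whose only potential driver is already matched) are exactly where external control signals must be injected.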
The recent advancements in generative artificial speech models have made possible the generation of highly realistic speech signals. At first, it seems exciting to obtain these artificially synthesized signals, such as speech clones or deep fakes, but if left unchecked, they may lead us to a digital dystopia. One of the primary focuses in audio forensics is validating the authenticity of speech. Though some solutions have been proposed for English speech, the detection of synthetic Hindi speech has not gained much attention. Here, we propose an approach for discriminating AI-synthesized Hindi speech from actual human speech. We exploit the bicoherence phase, bicoherence magnitude, Mel Frequency Cepstral Coefficients (MFCC), delta cepstral, and delta square cepstral features for machine learning models. We also extend the study to deep neural networks for extensive experiments, specifically using VGG16 and a homemade CNN as the architecture models. We obtained an accuracy of 99.83% with VGG16 and 99.99% with the homemade CNN model.
An algorithm to compute theory prime implicates, a generalization of prime implicates, in propositional logic was suggested in \cite{Marquis}. In this paper we extend that algorithm to compute the theory prime implicates of a knowledge base X with respect to another knowledge base □Y using \cite{Bienvenu}, where Y is a propositional knowledge base and X ⊨ Y, in the modal system T, and we prove its correctness. We also prove that it is an equivalence-preserving knowledge compilation, and that the size of the theory prime implicates of X with respect to □Y is less than the size of the prime implicates of X ∪ □Y.