Fidelity Investments
The TEXT2ZINC dataset was created to evaluate Large Language Models (LLMs) in translating natural language problem descriptions into MiniZinc models for a range of optimization and satisfaction problems. Experiments using GPT-4 with various prompting strategies demonstrated that LLMs are not yet able to consistently generate correct and optimal combinatorial models, achieving a maximum solution accuracy of 25.39%.
FORGE introduces a foundational model for combinatorial optimization that learns general-purpose Mixed-Integer Programming (MIP) instance embeddings through unsupervised pre-training with a vector-quantized graph autoencoder. This framework enhances commercial solvers and existing machine learning methods, reducing primal gaps by up to 85% and accelerating convergence across diverse optimization tasks and problem types.
This work builds upon the long-standing conjecture that linear diffusion models are inadequate for complex market dynamics. Specifically, it provides experimental validation for the author's prior arguments that realistic market dynamics are governed by higher-order (cubic and higher) non-linearities in the drift. Because the diffusion drift is given by the negative gradient of a potential function, a non-linear drift translates into a non-quadratic potential. These arguments were based both on general theoretical grounds and on a structured approach to modeling price dynamics that incorporates money flows and their impact on market prices. Here, we find direct confirmation of this view by analyzing high-frequency cryptocurrency data at different time scales ranging from minutes to months. We find that markets can be characterized by either a single-well or a double-well potential, depending on the time period and sampling frequency, where a double-well potential may signal market uncertainty or stress.
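To make the single-well versus double-well distinction concrete, a minimal schematic in our own notation (not the paper's calibrated model): since the drift is the negative gradient of the potential, a quadratic potential gives a linear drift, while a quartic potential with a negative quadratic term gives a cubic drift and two wells.

$$dx_t = -V'(x_t)\,dt + \sigma\,dW_t, \qquad V_{\mathrm{single}}(x) = \tfrac{a}{2}x^2 \;\Rightarrow\; \text{drift} = -a x, \qquad V_{\mathrm{double}}(x) = -\tfrac{a}{2}x^2 + \tfrac{b}{4}x^4 \;\Rightarrow\; \text{drift} = a x - b x^3, \quad a, b > 0.$$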
Data catalogs serve as repositories for organizing and accessing diverse collections of data assets, but their effectiveness hinges on the ease with which business users can look up relevant content. Unfortunately, many data catalogs within organizations suffer from limited searchability due to inadequate metadata such as asset descriptions. Hence, there is a need for a content generation solution to enrich and curate metadata in a scalable way. This paper explores the challenges associated with metadata creation and proposes a unique prompt enrichment idea that leverages existing metadata content through a retrieval-based few-shot technique combined with generative large language models (LLMs). The paper also considers fine-tuning an LLM on existing content and studies the behavior of few-shot pretrained LLMs (Llama, GPT-3.5) vis-à-vis a few-shot fine-tuned LLM (Llama2-7b) by evaluating their performance on accuracy, factual grounding, and toxicity. Our preliminary results exhibit more than 80% Rouge-1 F1 for the generated content, which implies that 87%-88% of instances were accepted as is or curated with minor edits by data stewards. By automatically generating accurate descriptions for tables and columns, the research provides an overall framework for enterprises to scale metadata curation and enrich their data catalogs, thereby vastly improving data catalog searchability and overall usability.
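A minimal sketch of the retrieval-based few-shot idea, under assumptions: embed() and generate() below are hypothetical stand-ins for an embedding model and an LLM endpoint, and curated descriptions already in the catalog are retrieved as in-context examples for the new asset.

```python
import numpy as np

def retrieve_examples(query_vec, example_vecs, examples, k=3):
    """Return the k curated (asset, description) pairs closest to the query asset."""
    sims = example_vecs @ query_vec / (
        np.linalg.norm(example_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [examples[i] for i in np.argsort(-sims)[:k]]

def build_prompt(asset_schema, few_shot_pairs):
    """Assemble a few-shot prompt from retrieved metadata examples."""
    shots = "\n\n".join(f"Table: {a}\nDescription: {d}" for a, d in few_shot_pairs)
    return (f"You write concise, factual descriptions for data catalog assets.\n\n"
            f"{shots}\n\nTable: {asset_schema}\nDescription:")

# Usage (embed() and generate() are hypothetical stand-ins for an embedding model and an LLM):
# vec = embed(new_table_schema)
# prompt = build_prompt(new_table_schema, retrieve_examples(vec, example_vecs, examples))
# description = generate(prompt)
```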
Modern computer systems store vast amounts of personal data, enabling advances in AI and ML but risking user privacy and trust. For privacy reasons, it is sometimes desirable for an ML model to forget part of the data it was trained on. This paper presents a new approach to machine unlearning using forgetting neural networks (FNNs). FNNs are neural networks with specific forgetting layers that take inspiration from the processes involved when a human brain forgets. While FNNs have been proposed as a theoretical construct, they have not previously been used as a machine unlearning method. We describe four different types of forgetting layers and study their properties. In our experimental evaluation, we report results on the MNIST handwritten digit and Fashion-MNIST datasets. The effectiveness of the unlearned models was tested using Membership Inference Attacks (MIA). Successful experimental results demonstrate the great potential of our proposed method for dealing with the machine unlearning problem.
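Since the four forgetting-layer types are only summarized here, the following is one purely illustrative, hypothetical realization of a forgetting layer: activations are attenuated by an exponential, Ebbinghaus-style decay controlled by an elapsed-time parameter, so increasing the decay progressively erases the layer's contribution.

```python
import torch

class ExponentialForgettingLayer(torch.nn.Module):
    """Hypothetical forgetting layer: scales activations by exp(-t / tau).
    Increasing t after training attenuates this layer's contribution,
    mimicking an Ebbinghaus-style forgetting curve. Illustrative only;
    not necessarily one of the paper's four layer types."""
    def __init__(self, tau: float = 1.0):
        super().__init__()
        self.tau = tau
        self.register_buffer("t", torch.tensor(0.0))

    def set_elapsed(self, t: float) -> None:
        """Set the 'time since training' that controls how much is forgotten."""
        self.t.fill_(float(t))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * torch.exp(-self.t / self.tau)
```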
Traditional information extraction systems face challenges with text-only language models, as these do not consider infographics (visual elements of information) such as tables, charts, and images that are often used to convey complex information to readers. Multimodal LLMs (MLLMs) face a needle-in-the-haystack problem, i.e., either a long context length or a substantial number of documents as the search space. The late interaction mechanism over visual language models has shown state-of-the-art performance in retrieval-based vision-augmented Q&A tasks. However, a few challenges remain in using it for RAG-based multi-modal Q&A. Firstly, many popular and widely adopted vector databases do not support native multi-vector retrieval. Secondly, late interaction requires computation that inflates the space footprint and can hinder enterprise adoption. Lastly, the current late interaction mechanism does not leverage approximate nearest-neighbor indexing methods for large speed-ups in the retrieval process. This paper explores a pragmatic approach to making the vision retrieval process scalable and efficient without compromising performance quality. We propose a multi-step custom implementation utilizing widely adopted hybrid search (metadata & embedding) and a state-of-the-art late interaction re-ranker to retrieve the best-matching pages. Finally, an MLLM is prompted as a reader to generate answers from the contextualized best-matching pages. Through experiments, we observe that the proposed design is scalable (significant speed-up) and stable (without degrading performance quality), and hence can be used in production systems at enterprises.
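For context, a late-interaction (ColBERT-style MaxSim) re-ranker scores a page by summing, for each query token embedding, its maximum similarity over the page's token/patch embeddings; a schematic version over candidates already narrowed by hybrid search (not the paper's production implementation) looks like this:

```python
import numpy as np

def maxsim_score(query_emb: np.ndarray, page_emb: np.ndarray) -> float:
    """Late-interaction score: for each query token embedding, take the max
    similarity against all page token/patch embeddings, then sum.
    Rows are assumed L2-normalized so dot products are cosine similarities."""
    return float((query_emb @ page_emb.T).max(axis=1).sum())

def rerank(query_emb, candidate_pages, top_k=5):
    """Re-rank pages returned by the first-stage hybrid (metadata + embedding) search."""
    scored = [(page_id, maxsim_score(query_emb, emb)) for page_id, emb in candidate_pages]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_k]
```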
Many existing end-to-end systems for hybrid question answering tasks can often be boiled down to a "prompt-and-pray" paradigm, where the user has limited control over and insight into the intermediate reasoning steps used to achieve the final result. Additionally, due to the context size limitation of many transformer-based LLMs, it is often not reasonable to expect that the full structured and unstructured context will fit into a given prompt in a zero-shot setting, let alone a few-shot setting. We introduce BlendSQL, a superset of SQLite that acts as a unified dialect for orchestrating reasoning across both unstructured and structured data. For hybrid question answering tasks involving multi-hop reasoning, we encode the full decomposed reasoning roadmap into a single interpretable BlendSQL query. Notably, we show that BlendSQL can scale to massive datasets and improve the performance of end-to-end systems while using 35% fewer tokens. Our code is available and installable as a package at this https URL
We suggest a simple, practical method that combines human and artificial intelligence to both learn the best investment practices of fund managers and provide recommendations to improve them. Our approach is based on a combination of Inverse Reinforcement Learning (IRL) and RL. First, the IRL component learns the intent of fund managers from their trading history and recovers their implied reward function. In the second step, this reward function is used by a direct RL algorithm to optimize asset allocation decisions. We show that our method is able to improve over the performance of individual fund managers.
Financial dialogue transcripts pose a unique challenge for sentence-level information extraction due to their informal structure, domain-specific vocabulary, and variable intent density. We introduce Fin-ExBERT, a lightweight and modular framework for extracting user intent-relevant sentences from annotated financial service calls. Our approach builds on a domain-adapted BERT (Bidirectional Encoder Representations from Transformers) backbone enhanced with LoRA (Low-Rank Adaptation) adapters, enabling efficient fine-tuning using limited labeled data. We propose a two-stage training strategy with progressive unfreezing: initially training a classifier head while freezing the backbone, followed by gradual fine-tuning of the entire model with differential learning rates. To ensure robust extraction under uncertainty, we adopt a dynamic thresholding strategy based on probability curvature (elbow detection), avoiding fixed cutoff heuristics. Empirical results show strong precision and F1 performance on real-world transcripts, with interpretable output suitable for downstream auditing and question-answering workflows. The full framework supports batched evaluation, visualization, and calibrated export, offering a deployable solution for financial dialogue mining.
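A small sketch of the elbow-style dynamic threshold, assuming sentence-level relevance probabilities from the classifier (the exact curvature criterion used in the paper may differ): sort scores in descending order and cut at the point farthest from the straight line joining the endpoints of the sorted curve.

```python
import numpy as np

def elbow_threshold(probs: np.ndarray) -> float:
    """Pick a cutoff at the 'elbow' of the sorted probability curve:
    the point with maximum distance to the line joining the first and last points."""
    p = np.sort(probs)[::-1]
    n = len(p)
    if n < 3:
        return 0.5
    x = np.arange(n, dtype=float)
    # Line from (0, p[0]) to (n-1, p[-1]); perpendicular distance of each point to it.
    dx, dy = n - 1.0, p[-1] - p[0]
    dist = np.abs(dy * x - dx * (p - p[0])) / np.hypot(dx, dy)
    return float(p[int(dist.argmax())])

# Sentences with probability >= elbow_threshold(probs) would be extracted as intent-relevant.
```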
Twenty-seven years ago, E. Freuder highlighted that "Constraint programming represents one of the closest approaches computer science has yet made to the Holy Grail of programming: the user states the problem, the computer solves it". Nowadays, CP users have great modeling tools available (like MiniZinc and CPMpy), allowing them to formulate the problem and then let a solver do the rest of the job, getting closer to the stated goal. However, this still requires the CP user to know the formalism and respect it. Another significant challenge lies in the expertise required to effectively model combinatorial problems. All this limits the wider adoption of CP. In this position paper, we investigate a possible approach to leverage pre-trained Large Language Models to extract models from textual problem descriptions. More specifically, we take inspiration from the Natural Language Processing for Optimization (NL4OPT) challenge and present early results with a decomposition-based prompting approach to GPT models.
The global sustainable fund universe encompasses open-end funds and exchange-traded funds (ETFs) that, by prospectus or other regulatory filings, claim to focus on Environmental, Social and Governance (ESG) factors. Challengingly, such claims can only be confirmed by examining the textual disclosures to check whether intentionality and an ESG focus are present in the investment strategy. Currently, there is no regulation to enforce sustainability in the ESG products space. This paper proposes a unique method and system to classify and score fund prospectuses in the sustainable universe with regard to the specificity and transparency of their language. We aim to employ few-shot learners to identify specific, ambiguous, and generic sustainable investment-related language. Additionally, we construct a ratio metric to determine a language score and rating to rank products and quantify sustainability claims for the US sustainable universe. As a by-product, we publish a manually annotated, quality training dataset on Hugging Face (ESG-Prospectus-Clarity-Category under cc-by-nc-sa-4.0) of more than 1K ESG textual statements. The performance of the few-shot fine-tuning approach is compared with zero-shot models, e.g., Llama-13B, GPT-3.5 Turbo, etc. We found that prompting large language models is not accurate for domain-specific tasks due to misalignment issues. The few-shot fine-tuning techniques outperform zero-shot models by large margins of more than an absolute ~30% in precision, recall, and F1 metrics on completely unseen ESG language (test set). Overall, the paper attempts to establish a systematic and scalable approach to quantitatively measure and rate sustainability intention for sustainable funds using prospectus texts. Regulatory bodies, investors, and advisors may utilize the findings of this research to reduce the cognitive load of investigating or screening ESG funds for whether they accurately reflect their stated ESG intention.
Prompting is a flexible and adaptable way of providing instructions to a Large Language Model (LLM). Contextual prompts include context, in the form of a document or dialogue, along with natural language instructions to the LLM, often constraining the LLM to restrict facts to the given context while complying with the instructions. With the context masked, such a prompt acts as a template. In this paper, we present an iterative method to generate better templates using an LLM from an existing set of prompt templates, without revealing the context to the LLM. Multiple methods of optimising prompts using the LLM itself are explored to check the effect of few-shot sampling methods on iterative propagation while maintaining linguistic style and syntax, yielding a 103.87% improvement with the best-performing method. A comparison of the results across multiple contextual tasks demonstrates the ability of LLMs to maintain syntax while learning to replicate linguistic styles. Additionally, the effect of different methods of prompt template generation on the output is shown.
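A rough sketch of one such iterative loop, generic and hypothetical (propose_templates wraps whatever LLM call rewrites templates and score is the task-specific evaluation; neither is specified in the abstract): keep the best-scoring templates, sample a few as exemplars, and ask the LLM for new candidates without ever revealing the masked context.

```python
import random

def optimise_templates(seed_templates, propose_templates, score,
                       iterations=5, keep=4, shots=2):
    """Iteratively refine prompt templates: score them, keep the best,
    sample a few of the best as exemplars, and ask the LLM for new variants."""
    pool = list(seed_templates)
    for _ in range(iterations):
        ranked = sorted(pool, key=score, reverse=True)[:keep]
        exemplars = random.sample(ranked, min(shots, len(ranked)))
        pool = ranked + list(propose_templates(exemplars))  # LLM-proposed candidates
    return max(pool, key=score)
```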
Large Language Models (LLMs) are prone to critical failure modes, including \textit{intrinsic faithfulness hallucinations} (also known as confabulations), where a response deviates semantically from the provided context. Frameworks designed to detect this, such as Semantic Divergence Metrics (SDM), rely on identifying latent topics shared between prompts and responses, typically by applying geometric clustering to their sentence embeddings. This creates a disconnect, as the topics are optimized for spatial proximity, not for the downstream information-theoretic analysis. In this paper, we bridge this gap by developing a principled topic identification method grounded in the Deterministic Information Bottleneck (DIB) for geometric clustering. Our key contribution is to transform the DIB method into a practical algorithm for high-dimensional data by substituting its intractable KL divergence term with a computationally efficient upper bound. The resulting method, which we dub UDIB, can be interpreted as an entropy-regularized and robustified version of K-means that inherently favors a parsimonious number of informative clusters. By applying UDIB to the joint clustering of LLM prompt and response embeddings, we generate a shared topic representation that is not merely spatially coherent but is fundamentally structured to be maximally informative about the prompt-response relationship. This provides a superior foundation for the SDM framework and offers a novel, more sensitive tool for detecting confabulations.
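To convey the flavor of an entropy-regularized, parsimony-favoring K-means, here is a schematic hard-clustering loop (a simplification in our notation; it is not the paper's UDIB objective nor its KL upper bound): each point is assigned by trading off squared distance against the log of the cluster's current mass, so sparsely used clusters tend to empty out and disappear.

```python
import numpy as np

def entropy_regularized_kmeans(X, k=10, beta=5.0, iters=50, seed=0):
    """Schematic DIB-flavored K-means: assignment cost = beta * ||x - mu_c||^2 - log p_c,
    where p_c is the fraction of points currently in cluster c. Empty clusters are
    dropped, so the effective number of clusters can shrink below k."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]
    p = np.full(k, 1.0 / k)
    for _ in range(iters):
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)       # (n, k) squared distances
        labels = (beta * d2 - np.log(p + 1e-12)).argmin(axis=1)     # entropy-regularized assignment
        keep = np.unique(labels)
        mu = np.stack([X[labels == c].mean(axis=0) for c in keep])  # update surviving centroids
        p = np.array([(labels == c).mean() for c in keep])          # current cluster masses
        labels = np.searchsorted(keep, labels)                      # relabel to 0..k'-1
    return labels, mu
```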
Lead recommendations for financial products such as funds or ETFs are potentially challenging in the investment space due to changing market scenarios and the difficulty of capturing a financial holder's mindset and philosophy. Current methods surface leads based on certain product categorizations and attributes like returns, fees, category, etc., suggesting similar products to investors, which may not capture the holder's investment behavior holistically. Other reported works perform subjective analysis of institutional holders' ideology. This paper proposes a comprehensive data-driven framework for developing a lead recommendation system in the holder space for financial products like funds, using transactional history, asset flows, and product-specific attributes. The system infers a holder's interest implicitly by considering all investment transactions made and collects available meta-information to detect the holder's investment profile/persona, such as investment anticipation and investment behavior. This paper focuses on the holder recommendation component of the framework, which employs a bipartite graph representation of financial holders and funds built from a variety of attributes, a GraphSAGE model for learning representations, and a link prediction model for ranking recommendations for a future period. The performance of the proposed approach is compared with a baseline model, i.e., a content-based filtering approach, on the hits-at-top-k metric (k = 50, 100, 200). We found that the proposed graph ML solution outperforms the baseline by an absolute 42%, 22%, and 14% with a look-ahead bias, and by an absolute 18%, 19%, and 18% on completely unseen holders, in terms of hit rate for top-k recommendations of 50, 100, and 200 respectively.
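A compact sketch of this recipe under simplifying assumptions: PyTorch Geometric's homogeneous SAGEConv is used instead of a full heterogeneous holder/fund graph, holder and fund nodes are assumed to share one feature space, and the dot-product scoring head is illustrative rather than the paper's exact link predictor.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class HolderFundSAGE(torch.nn.Module):
    """Two-layer GraphSAGE encoder over a holder-fund bipartite graph,
    with dot-product link prediction for (holder, fund) pairs."""
    def __init__(self, in_dim, hidden_dim=64):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, hidden_dim)

    def encode(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

    def score(self, z, holder_idx, fund_idx):
        # Higher score -> more likely the holder invests in the fund next period.
        return (z[holder_idx] * z[fund_idx]).sum(dim=-1)

# Training would minimize binary cross-entropy over observed transactions (positive
# edges) and sampled non-edges (negatives), then rank funds per holder by `score`
# to produce the top-k recommendations for the future period.
```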
The Cameras for Allsky Meteor Surveillance (CAMS) project, funded by NASA starting in 2010, aims to map our meteor showers by triangulating meteor trajectories detected in low-light video cameras from multiple locations across 16 countries in both the northern and southern hemispheres. Its mission is to validate, discover, and predict the upcoming returns of meteor showers. Our research aimed to streamline the data processing by implementing an automated, cloud-based, AI-enabled pipeline and to improve the data visualization, increasing the rate of discoveries by involving the public in monitoring the meteor detections. This article describes the process of automating the data ingestion, processing, and insight generation using an interpretable Active Learning and AI pipeline. It also describes the development of an interactive web portal (the NASA Meteor Shower portal) to facilitate the visualization of meteor radiant maps. To date, CAMS has discovered over 200 new meteor showers and has validated dozens of previously reported showers.
We present a model of price formation in an inelastic market whose dynamics are partially driven by both money flows and their impact on asset prices. The money flow to the market is viewed as an investment policy of outside investors. For the price impact effect, we use an impact function that incorporates the phenomena of market inelasticity and saturation from new money (the "dumb money" effect). Due to the dependence of market investors' flows on market performance, the model implies a feedback mechanism that gives rise to nonlinear dynamics. Consequently, the market price dynamics are seen as a nonlinear diffusion of a particle (the "marketron") in a two-dimensional space formed by the log-price x and a memory variable y. The latter stores information about past money flows, so that the dynamics are non-Markovian in the log-price x alone, but Markovian in the pair (x, y), bearing a strong resemblance to spiking neuron models in neuroscience. In addition to market flows, the model dynamics are partially driven by return predictors, modeled as unobservable Ornstein-Uhlenbeck processes. By using a new interpretation of predictive signals as self-propulsion components of the price dynamics, we treat the marketron as an active particle, amenable to methods developed in the physics of active matter. We show that, depending on the choice of parameters, our model can produce a rich variety of interesting dynamic scenarios. In particular, it predicts three distinct regimes of the market, which we call the Good, the Bad, and the Ugly markets. The latter regime describes a scenario of a total market collapse or, alternatively, a corporate default event, depending on whether our model is applied to the whole market or an individual stock.
Personalized recommender systems play a crucial role in direct marketing, particularly in financial services, where delivering relevant content can enhance customer engagement and promote informed decision-making. This study explores interpretable knowledge graph (KG)-based recommender systems by proposing two distinct approaches for personalized article recommendations within a multinational financial services firm. The first approach leverages Reinforcement Learning (RL) to traverse a KG constructed from both structured (tabular) and unstructured (textual) data, enabling interpretability through Path Directed Reasoning (PDR). The second approach employs the XGBoost algorithm, with post-hoc explainability techniques such as SHAP and ELI5 to enhance transparency. By integrating machine learning with automatically generated KGs, our methods not only improve recommendation accuracy but also provide interpretable insights, facilitating more informed decision-making in customer relationship management.
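As a minimal sketch of the second (XGBoost) approach with post-hoc explainability, assuming synthetic stand-in features and engagement labels rather than the firm's data:

```python
import shap
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in for tabular customer/article features and engagement labels.
X, y = make_classification(n_samples=2000, n_features=12, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

# Post-hoc explainability: per-prediction feature attributions for each recommendation.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_test)
print(shap_values.shape)  # (n_test, n_features) attribution matrix
```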
Crowding is widely regarded as one of the most important risk factors in designing portfolio strategies. In this paper, we analyze stock crowding using network analysis of fund holdings, which is used to compute crowding scores for stocks. These scores are used to construct costless long-short portfolios, computed in a distribution-free (model-free) way and without using any numerical optimization, with desirable properties of hedge portfolios. More specifically, these long-short portfolios provide protection for both small and large market price fluctuations, due to their negative correlation with the market and positive convexity as a function of market returns. By adding our long-short portfolio to a baseline portfolio such as a traditional 60/40 portfolio, our method provides an alternative way to hedge portfolio risk including tail risk, which does not require costly option-based strategies or complex numerical optimization. The total cost of such hedging amounts to the total cost of rebalancing the hedge portfolio.
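To illustrate how holdings-based crowding scores can feed a costless long-short overlay, here is a toy sketch (the crowding proxy below is a simple co-holding mass, not the paper's network-analysis score, and the equal-weighted legs are illustrative):

```python
import numpy as np
import pandas as pd

def crowding_scores(holdings: pd.DataFrame) -> pd.Series:
    """Toy crowding proxy from a fund-by-stock matrix of portfolio weights:
    for each stock, sum the pairwise products of weights held by distinct funds,
    i.e. its co-holding mass in the fund-holdings network, then rank to [0, 1]."""
    W = holdings.to_numpy()                                  # rows: funds, columns: stocks
    pairwise = W.sum(axis=0) ** 2 - (W ** 2).sum(axis=0)     # sum_{i != j} w_i * w_j per stock
    return pd.Series(pairwise, index=holdings.columns).rank(pct=True)

def long_short_weights(scores: pd.Series, q: float = 0.2) -> pd.Series:
    """Costless overlay: long the least crowded quantile, short the most crowded,
    equal-weighted within each leg so the portfolio has zero net weight."""
    longs, shorts = scores <= q, scores >= 1.0 - q
    w = pd.Series(0.0, index=scores.index)
    w[longs] = 1.0 / max(longs.sum(), 1)
    w[shorts] = -1.0 / max(shorts.sum(), 1)
    return w
```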
In this paper, we describe the different approaches explored by the Jetsons team for the Multi-Lingual ESG Impact Duration Inference (ML-ESG-3) shared task. The shared task focuses on predicting the duration and type of the ESG impact of a news article. The shared task dataset consists of 2,059 news titles and articles in English, French, Korean, and Japanese. For the impact duration classification task, we fine-tuned XLM-RoBERTa with a custom fine-tuning strategy and self-training, and fine-tuned DeBERTa-v3 using only English translations. These models ranked first on the leaderboard individually for Korean and Japanese and as part of an ensemble for English, respectively. For the impact type classification task, our XLM-RoBERTa model fine-tuned using a custom fine-tuning strategy ranked first for the English language.
Over the years, there has been a paradigm shift in how users access financial services. With the advancement of digitalization, more users have come to prefer the online mode of performing financial activities, which has led to the generation of a huge volume of financial content. Most investors prefer to go through this content before making decisions. Every industry has terms that are specific to the domain it operates in, and banking and financial services are no exception. In order to fully comprehend this content, one needs a thorough understanding of the financial terms. Getting a basic idea about a term becomes easy when it is explained with the help of the broad category to which it belongs. This broad category is referred to as a hypernym. For example, "bond" is a hypernym of the financial term "alternative debenture". In this paper, we propose a system capable of extracting and ranking hypernyms for a given financial term. The system has been trained with financial text corpora obtained from various sources such as DBpedia [4], Investopedia, the Financial Industry Business Ontology (FIBO), and prospectuses. Embeddings of these terms have been extracted using FinBERT [3] and FinISH [1], and fine-tuned using SentenceBERT [54]. A novel approach that uses the hierarchy present in FIBO has been employed to augment the training set with negative samples. Finally, we benchmark the system's performance against existing systems and establish that it performs better and is also scalable.
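A minimal sketch of the ranking step, assuming a generic sentence-embedding checkpoint as a placeholder for the paper's fine-tuned FinBERT/FinISH + SentenceBERT model: embed the query term and candidate hypernyms, then rank candidates by cosine similarity.

```python
from sentence_transformers import SentenceTransformer, util

# Placeholder checkpoint; the paper fine-tunes FinBERT/FinISH embeddings with SentenceBERT.
model = SentenceTransformer("all-MiniLM-L6-v2")

term = "alternative debenture"
candidates = ["bond", "equity", "derivative", "deposit", "fund"]

term_emb = model.encode(term, convert_to_tensor=True)
cand_emb = model.encode(candidates, convert_to_tensor=True)

scores = util.cos_sim(term_emb, cand_emb)[0]
ranked = sorted(zip(candidates, scores.tolist()), key=lambda x: x[1], reverse=True)
print(ranked)   # candidate hypernyms ordered by similarity to the query term
```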