IBM Research – Ireland
Question answering (QA) is a fundamental means to facilitate assessment and training of narrative comprehension skills for both machines and young children, yet there is scarcity of high-quality QA datasets carefully designed to serve this purpose. In particular, existing datasets rarely distinguish fine-grained reading skills, such as the understanding of varying narrative elements. Drawing on the reading education research, we introduce FairytaleQA, a dataset focusing on narrative comprehension of kindergarten to eighth-grade students. Generated by educational experts based on an evidence-based theoretical framework, FairytaleQA consists of 10,580 explicit and implicit questions derived from 278 children-friendly stories, covering seven types of narrative elements or relations. Our dataset is valuable in two folds: First, we ran existing QA models on our dataset and confirmed that this annotation helps assess models' fine-grained learning skills. Second, the dataset supports question generation (QG) task in the education domain. Through benchmarking with QG models, we show that the QG model trained on FairytaleQA is capable of asking high-quality and more diverse questions.
44
Explainability of Reinforcement Learning (RL) policies remains a challenging research problem, particularly when considering RL in a safety context. Understanding the decisions and intentions of an RL policy offer avenues to incorporate safety into the policy by limiting undesirable actions. We propose the use of a Boolean Decision Rules model to create a post-hoc rule-based summary of an agent's policy. We evaluate our proposed approach using a DQN agent trained on an implementation of a lava gridworld and show that, given a hand-crafted feature representation of this gridworld, simple generalised rules can be created, giving a post-hoc explainable summary of the agent's policy. We discuss possible avenues to introduce safety into a RL agent's policy by using rules generated by this rule-based model as constraints imposed on the agent's policy, as well as discuss how creating simple rule summaries of an agent's policy may help in the debugging process of RL agents.
Adversarial training shows promise as an approach for training models that are robust towards adversarial perturbation. In this paper, we explore some of the practical challenges of adversarial training. We present a sensitivity analysis that illustrates that the effectiveness of adversarial training hinges on the settings of a few salient hyperparameters. We show that the robustness surface that emerges across these salient parameters can be surprisingly complex and that therefore no effective one-size-fits-all parameter settings exist. We then demonstrate that we can use the same salient hyperparameters as tuning knob to navigate the tension that can arise between robustness and accuracy. Based on these findings, we present a practical approach that leverages hyperparameter optimization techniques for tuning adversarial training to maximize robustness while keeping the loss in accuracy within a defined budget.
We demonstrate Castor, a cloud-based system for contextual IoT time series data and model management at scale. Castor is designed to assist Data Scientists in (a) exploring and retrieving all relevant time series and contextual information that is required for their predictive modelling tasks; (b) seamlessly storing and deploying their predictive models in a cloud production environment; (c) monitoring the performance of all predictive models in production and (semi-)automatically retraining them in case of performance deterioration. The main features of Castor are: (1) an efficient pipeline for ingesting IoT time series data in real time; (2) a scalable, hybrid data management service for both time series and contextual data; (3) a versatile semantic model for contextual information which can be easily adopted to different application domains; (4) an abstract framework for developing and storing predictive models in R or Python; (5) deployment services which automatically train and/or score predictive models upon user-defined conditions. We demonstrate Castor for a real-world Smart Grid use case and discuss how it can be adopted to other application domains such as Smart Buildings, Telecommunication, Retail or Manufacturing.
Across numerous applications, forecasting relies on numerical solvers for partial differential equations (PDEs). Although the use of deep-learning techniques has been proposed, actual applications have been restricted by the fact the training data are obtained using traditional PDE solvers. Thereby, the uses of deep-learning techniques were limited to domains, where the PDE solver was applicable. We demonstrate a deep-learning framework for air-pollution monitoring and forecasting that provides the ability to train across different model domains, as well as a reduction in the run-time by two orders of magnitude. It presents a first-of-a-kind implementation that combines deep-learning and domain-decomposition techniques to allow model deployments extend beyond the domain(s) on which the it has been trained.
This paper introduces a dual-regularized ADMM approach to distributed, time-varying optimization. The proposed algorithm is designed in a prediction-correction framework, in which the computing nodes predict the future local costs based on past observations, and exploit this information to solve the time-varying problem more effectively. In order to guarantee linear convergence of the algorithm, a regularization is applied to the dual, yielding a dual-regularized ADMM. We analyze the convergence properties of the time-varying algorithm, as well as the regularization error of the dual-regularized ADMM. Numerical results show that in time-varying settings, despite the regularization error, the performance of the dual-regularized ADMM can outperform inexact gradient-based methods, as well as exact dual decomposition techniques, in terms of asymptotical error and consensus constraint violation.
There has been much recent interest in hierarchies of progressively stronger convexifications of polynomial optimisation problems (POP). These often converge to the global optimum of the POP, asymptotically, but prove challenging to solve beyond the first level in the hierarchy for modest instances. We present a finer-grained variant of the Lasserre hierarchy, together with first-order methods for solving the convexifications, which allow for efficient warm-starting with solutions from lower levels in the hierarchy.
Most current models of word representations(e.g.,GloVe) have successfully captured fine-grained semantics. However, semantic similarity exhibited in these word embeddings is not suitable for resolving bridging anaphora, which requires the knowledge of associative similarity (i.e., relatedness) instead of semantic similarity information between synonyms or hypernyms. We create word embeddings (embeddings_PP) to capture such relatedness by exploring the syntactic structure of noun phrases. We demonstrate that using embeddings_PP alone achieves around 30% of accuracy for bridging anaphora resolution on the ISNotes corpus. Furthermore, we achieve a substantial gain over the state-of-the-art system (Hou et al., 2013) for bridging antecedent selection.
Previous work on bridging anaphora recognition (Hou et al., 2013a) casts the problem as a subtask of learning fine-grained information status (IS). However, these systems heavily depend on many hand-crafted linguistic features. In this paper, we propose a discourse context-aware self-attention neural network model for fine-grained IS classification. On the ISNotes corpus (Markert et al., 2012), our model with the contextually-encoded word representations (BERT) (Devlin et al., 2018) achieves new state-of-the-art performances on fine-grained IS classification, obtaining a 4.1% absolute overall accuracy improvement compared to Hou et al. (2013a). More importantly, we also show an improvement of 3.9% F1 for bridging anaphora recognition without using any complex hand-crafted semantic features designed for capturing the bridging phenomenon.
Solving combinatorial optimization problems on current noisy quantum devices is currently being advocated for (and restricted to) binary polynomial optimization with equality constraints via quantum heuristic approaches. This is achieved using, e.g., the variational quantum eigensolver (VQE) and the quantum approximate optimization algorithm (QAOA). We present a decomposition-based approach to extend the applicability of current approaches to "quadratic plus convex" mixed binary optimization (MBO) problems, so as to solve a broad class of real-world optimization problems. In the MBO framework, we show that the alternating direction method of multipliers (ADMM) can split the MBO into a binary unconstrained problem (that can be solved with quantum algorithms), and continuous constrained convex subproblems (that can be solved cheaply with classical optimization solvers). The validity of the approach is then showcased by numerical results obtained on several optimization problems via simulations with VQE and QAOA on the quantum circuits implemented in Qiskit, an open-source quantum computing software development framework.
This study investigated an approach to improve the accuracy of computationally lightweight surrogate models by updating forecasts based on historical accuracy relative to sparse observation data. Using a lightweight, ocean-wave forecasting model, we created a large number of model ensembles, with perturbed inputs, for a two-year study period. Forecasts were aggregated using a machine-learning algorithm that combined forecasts from multiple, independent models into a single "best-estimate" prediction of the true state. The framework was applied to a case-study site in Monterey Bay, California. A~learning-aggregation technique used historical observations and model forecasts to calculate a weight for each ensemble member. Weighted ensemble predictions were compared to measured wave conditions to evaluate performance against present state-of-the-art. Finally, we discussed how this framework, which integrates ensemble aggregations and surrogate models, can be used to improve forecasting systems and further enable scientific process studies.
Convex regression (CR) is the problem of fitting a convex function to a finite number of noisy observations of an underlying convex function. CR is important in many domains and one of its workhorses is the non-parametric least square estimator (LSE). Currently, LSE delivers only non-smooth non-strongly convex function estimates. In this paper, leveraging recent results in convex interpolation, we generalize LSE to smooth strongly convex regression problems. The resulting algorithm relies on a convex quadratically constrained quadratic program. We also propose a parallel implementation, which leverages ADMM, that lessens the overall computational complexity to a tight O(n2)O(n^2) for nn observations. Numerical results support our findings.
The explosion in volume and variety of data offers enormous potential for research and commercial use. Increased availability of personal data is of particular interest in enabling highly customised services tuned to individual needs. Preserving the privacy of individuals against reidentification attacks in this fast-moving ecosystem poses significant challenges for a one-size fits all approach to anonymisation. In this paper we present (kk,ϵ\epsilon)-anonymisation, an approach that combines the kk-anonymisation and ϵ\epsilon-differential privacy models into a single coherent framework, providing privacy guarantees at least as strong as those offered by the individual models. Linking risks of less than 5\% are observed in experimental results, even with modest values of kk and ϵ\epsilon. Our approach is shown to address well-known limitations of kk-anonymity and ϵ\epsilon-differential privacy and is validated in an extensive experimental campaign using openly available datasets.
Bioinformatics pipelines depend on shared POSIX filesystems for its input, output and intermediate data storage. Containerization makes it more difficult for the workloads to access the shared file systems. In our previous study, we were able to run both ML and non-ML pipelines on Kubeflow successfully. However, the storage solutions were complex and less optimal. This is because there are no established resource types to represent the concept of data source on Kubernetes. More and more applications are running on Kubernetes for batch processing. End users are burdened with configuring and optimising the data access, which is what we have experienced before. In this article, we are introducing a new concept of Dataset and its corresponding resource as a native Kubernetes object. We have leveraged the Dataset Lifecycle Framework which takes care of all the low-level details about data access in Kubernetes pods. Its pluggable architecture is designed for the development of caching, scheduling and governance plugins. Together, they manage the entire lifecycle of the custom resource Dataset. We use Dataset Lifecycle Framework to serve data from object stores to both ML and non-ML pipelines running on Kubeflow. With DLF, we make training data fed into ML models directly without being downloaded to the local disks, which makes the input scalable. We have enhanced the durability of training metadata by storing it into a dataset, which also simplifies the set up of the Tensorboard, separated from the notebook server. For the non-ML pipeline, we have simplified the 1000 Genome Project pipeline with datasets injected into the pipeline dynamically. In addition, our preliminary results indicate that the pluggable caching mechanism can improve the performance significantly.
This paper surveys the machine learning literature and presents in an optimization framework several commonly used machine learning approaches. Particularly, mathematical optimization models are presented for regression, classification, clustering, deep learning, and adversarial learning, as well as new emerging applications in machine teaching, empirical model learning, and Bayesian network structure learning. Such models can benefit from the advancement of numerical optimization techniques which have already played a distinctive role in several machine learning settings. The strengths and the shortcomings of these models are discussed and potential research directions and open problems are highlighted.
14
In this paper, we propose a novel, computational efficient, dynamic ridesharing algorithm. The beneficial computational properties of the algorithm arise from casting the ridesharing problem as a linear assignment problem between fleet vehicles and customer trip requests within a federated optimization architecture. The resulting algorithm is up to four times faster than the state-of-the-art, even if it is implemented on a less dedicated hardware, and achieves similar service quality. Current literature showcases the ability of state-of-the-art ridesharing algorithms to tackle very large fleets and customer requests in almost near real-time, but the benefits of ridesharing seem limited to centralized systems. Our algorithm suggests that this does not need to be the case. The algorithm that we propose is fully distributable among multiple ridesharing companies. By leveraging two datasets, the New York city taxi dataset and the Melbourne Metropolitan Area dataset, we show that with our algorithm, real-time ridesharing offers clear benefits with respect to more traditional taxi fleets in terms of level of service, even if one considers partial adoption of the system. In fact, e.g., the quality of the solutions obtained in the state-of-the-art works that tackle the whole customer set of the New York city taxi dataset is achieved, even if one considers only a proportion of the fleet size and customer requests. This could make real-time urban-scale ridesharing very attractive to small enterprises and city authorities alike. However, in some cases, e.g., in multi-company scenarios where companies have predefined market shares, we show that the number of vehicles needed to achieve a comparable performance to the monopolistic setting increases, and this raises concerns on the possible negative effects of multi-company ridesharing.
This paper considers an online proximal-gradient method to track the minimizers of a composite convex function that may continuously evolve over time. The online proximal-gradient method is inexact, in the sense that: (i) it relies on an approximate first-order information of the smooth component of the cost; and, (ii) the proximal operator (with respect to the non-smooth term) may be computed only up to a certain precision. Under suitable assumptions, convergence of the error iterates is established for strongly convex cost functions. On the other hand, the dynamic regret is investigated when the cost is not strongly convex, under the additional assumption that the problem includes feasibility sets that are compact. Bounds are expressed in terms of the cumulative error and the path length of the optimal solutions. This suggests how to allocate resources to strike a balance between performance and precision in the gradient computation and in the proximal operator.
Understanding prospective clients becomes increasingly important as companies aim to enlarge their market bases. Traditional approaches typically treat each client in isolation, either studying its interactions or similarities with existing clients. We propose the Client Network, which considers the entire client ecosystem to predict the success of sale pitches for targeted clients by complex network analysis. It combines a novel ranking algorithm with data visualization and navigation. Based on historical interaction data between companies and clients, the Client Network leverages organizational connectivity to locate the optimal paths to prospective clients. The user interface supports exploring the client ecosystem and performing sales-essential tasks. Our experiments and user interviews demonstrate the effectiveness of the Client Network and its success in supporting sellers' day-to-day tasks.
Graph filters play a key role in processing the graph spectra of signals supported on the vertices of a graph. However, despite their widespread use, graph filters have been analyzed only in the deterministic setting, ignoring the impact of stochastic- ity in both the graph topology as well as the signal itself. To bridge this gap, we examine the statistical behavior of the two key filter types, finite impulse response (FIR) and autoregressive moving average (ARMA) graph filters, when operating on random time- varying graph signals (or random graph processes) over random time-varying graphs. Our analysis shows that (i) in expectation, the filters behave as the same deterministic filters operating on a deterministic graph, being the expected graph, having as input signal a deterministic signal, being the expected signal, and (ii) there are meaningful upper bounds for the variance of the filter output. We conclude the paper by proposing two novel ways of exploiting randomness to improve (joint graph-time) noise cancellation, as well as to reduce the computational complexity of graph filtering. As demonstrated by numerical results, these methods outperform the disjoint average and denoise algorithm, and yield a (up to) four times complexity redution, with very little difference from the optimal solution.
AMR parsing has experienced an unprecendented increase in performance in the last three years, due to a mixture of effects including architecture improvements and transfer learning. Self-learning techniques have also played a role in pushing performance forward. However, for most recent high performant parsers, the effect of self-learning and silver data augmentation seems to be fading. In this paper we propose to overcome this diminishing returns of silver data by combining Smatch-based ensembling techniques with ensemble distillation. In an extensive experimental setup, we push single model English parser performance to a new state-of-the-art, 85.9 (AMR2.0) and 84.3 (AMR3.0), and return to substantial gains from silver data augmentation. We also attain a new state-of-the-art for cross-lingual AMR parsing for Chinese, German, Italian and Spanish. Finally we explore the impact of the proposed technique on domain adaptation, and show that it can produce gains rivaling those of human annotated data for QALD-9 and achieve a new state-of-the-art for BioAMR.
There are no more papers matching your filters at the moment.