alphaXiv

566

24 Oct 2022

computer-science artificial-intelligence computation-and-language

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

UC Berkeley

Microsoft

Columbia University

Allen Institute for AI National Univ. of Singapore TCS Research Tehran Polytechnic Univ. of Washington Stanford Univ.Factored AI Arizona State Univ.Univ of Amsterdam Sharif Univ. of Tech.PSG College of Tech.Indian Institute of Tech.Government Polytechnic College Zycus Infotech National Institute of Tech. Karnataka Univ. of Massachusetts, Amherst

Mirali Purohit

Researchers at the Allen Institute for AI and the University of Washington created SUPER-NATURALINSTRUCTIONS, a public meta-dataset of 1,616 diverse NLP tasks, to advance instruction-following capabilities. Their Tk-INSTRUCT model, trained on this benchmark, outperformed the 175B-parameter InstructGPT by 9.9 ROUGE-L points on unseen English tasks and by 13.3 ROUGE-L points on unseen non-English tasks.

5

12 Jan 2022

computer-science digital-libraries

A simple model for citation curve

Institute for Research in Fundamental Sciences (IPM)National Univ. of Singapore Sharif Univ. of Technology

There is considerable interest in the citation count for an author's publications. This has led to many proposals for citation indices for characterizing citation distributions. However, there is so far no tractable model to facilitate the analysis of these distributions and the design of these indices. This paper presents a simple equation for such design and analysis. The equation has three parameters that are calibrated by three geometrical characteristics of a citation distribution. Its simple form makes it tractable. To demonstrate, the equation is used to derive closed-form expressions for various citation indices, analyze the effect of time and identify individual contribution to the Hirsch index for a group.

17

01 Apr 2025

computer-science databases machine-learning

FeatInsight: An Online ML Feature Management System on 4Paradigm Sage-Studio Platform

Tsinghua University 4Paradigm Inc.Shanghai Jiao Tong Univ.National Univ. of Singapore

Feature management is essential for many online machine learning applications and can often become the performance bottleneck (e.g., taking up to 70% of the overall latency in sales prediction service). Improper feature configurations (e.g., introducing too many irrelevant features) can severely undermine the model's generalization capabilities. However, managing online ML features is challenging due to (1) large-scale, complex raw data (e.g., the 2018 PHM dataset contains 17 tables and dozens to hundreds of columns), (2) the need for high-performance, consistent computation of interdependent features with complex patterns, and (3) the requirement for rapid updates and deployments to accommodate real-time data changes. In this demo, we present FeatInsight, a system that supports the entire feature lifecycle, including feature design, storage, visualization, computation, verification, and lineage management. FeatInsight (with OpenMLDB as the execution engine) has been deployed in over 100 real-world scenarios on 4Paradigm's Sage Studio platform, handling up to a trillion-dimensional feature space and enabling millisecond-level feature updates. We demonstrate how FeatInsight enhances feature design efficiency (e.g., for online product recommendation) and improve feature computation performance (e.g., for online fraud detection). The code is available at this https URL

464

15 Jan 2025

computer-science artificial-intelligence databases

OpenMLDB: A Real-Time Relational Data Feature Computation System for Online ML

Tsinghua University 4Paradigm Inc.SF Express Inc.Shanghai Jiao Tong Univ.National Univ. of Singapore

OpenMLDB offers a unified system for consistent, high-performance feature computation across both offline training and online serving environments. This approach addresses the common problem of offline-online inconsistency by generating consistent execution plans from a single SQL-based feature script, demonstrating significantly improved throughput and reduced latency compared to existing systems.

1,668

13

21 Jan 2025

computer-science information-theory

Hierarchy of Pseudo-Random Array Codes

National Univ. of Singapore Technion Israel Institute of Technology

Pseudo-random arrays are the two-dimensional analog of M-sequences. Pseudo-random array codes are the two-dimensional analog of sequences generated by a product of irreducible polynomials with the same exponent. The union of the arrays in such a code has the window property and the shift-and-add property, implying that these codes are linear. The folding technique is the most basic one for forming such arrays and codes. A new criterion for generating pseudo-random arrays based on folding is given. This new criterion yields pseudo-random arrays with new parameters. A general construction for such array codes is given. It appears that the arrays generated in this construction can be constructed by folding the nonzero sequences generated by a product of irreducible polynomials of the same degree and the same exponent. Two hierarchies of the pseudo-random array codes are provided. In one hierarchy codewords of one code with smaller windows are contained in codewords of another code which stands above him in the hierarchy. The second hierarchy is a partition of the pseudo-random array codes generated by folding into classes based on the polynomial types which participate in their construction.

01 Apr 2025

computer-science databases machine-learning

FeatInsight: An Online ML Feature Management System on 4Paradigm Sage-Studio Platform

National University of Singapore

Shanghai Jiao Tong University

Tsinghua University 4Paradigm Inc.Shanghai Jiao Tong Univ.National Univ. of Singapore

Feature management is essential for many online machine learning applications and can often become the performance bottleneck (e.g., taking up to 70% of the overall latency in sales prediction service). Improper feature configurations (e.g., introducing too many irrelevant features) can severely undermine the model's generalization capabilities. However, managing online ML features is challenging due to (1) large-scale, complex raw data (e.g., the 2018 PHM dataset contains 17 tables and dozens to hundreds of columns), (2) the need for high-performance, consistent computation of interdependent features with complex patterns, and (3) the requirement for rapid updates and deployments to accommodate real-time data changes. In this demo, we present FeatInsight, a system that supports the entire feature lifecycle, including feature design, storage, visualization, computation, verification, and lineage management. FeatInsight (with OpenMLDB as the execution engine) has been deployed in over 100 real-world scenarios on 4Paradigm's Sage Studio platform, handling up to a trillion-dimensional feature space and enabling millisecond-level feature updates. We demonstrate how FeatInsight enhances feature design efficiency (e.g., for online product recommendation) and improve feature computation performance (e.g., for online fraud detection). The code is available at this https URL.

20

alphaXiv

Explore

State of the Art

Sign In

Labs

Feedback

Dark mode

Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

A simple model for citation curve

FeatInsight: An Online ML Feature Management System on 4Paradigm Sage-Studio Platform

OpenMLDB: A Real-Time Relational Data Feature Computation System for Online ML

Hierarchy of Pseudo-Random Array Codes

FeatInsight: An Online ML Feature Management System on 4Paradigm Sage-Studio Platform

Events

AI for Law

Personalize Your Feed