National Univ. of Singapore
Researchers at the Allen Institute for AI and the University of Washington created SUPER-NATURALINSTRUCTIONS, a public meta-dataset of 1,616 diverse NLP tasks, to advance instruction-following capabilities. Their Tk-INSTRUCT model, trained on this benchmark, outperformed the 175B-parameter InstructGPT by 9.9 ROUGE-L points on unseen English tasks and by 13.3 ROUGE-L points on unseen non-English tasks.
There is considerable interest in the citation count for an author's publications. This has led to many proposals for citation indices for characterizing citation distributions. However, there is so far no tractable model to facilitate the analysis of these distributions and the design of these indices. This paper presents a simple equation for such design and analysis. The equation has three parameters that are calibrated by three geometrical characteristics of a citation distribution. Its simple form makes it tractable. To demonstrate, the equation is used to derive closed-form expressions for various citation indices, analyze the effect of time and identify individual contribution to the Hirsch index for a group.
Feature management is essential for many online machine learning applications and can often become the performance bottleneck (e.g., taking up to 70% of the overall latency in sales prediction service). Improper feature configurations (e.g., introducing too many irrelevant features) can severely undermine the model's generalization capabilities. However, managing online ML features is challenging due to (1) large-scale, complex raw data (e.g., the 2018 PHM dataset contains 17 tables and dozens to hundreds of columns), (2) the need for high-performance, consistent computation of interdependent features with complex patterns, and (3) the requirement for rapid updates and deployments to accommodate real-time data changes. In this demo, we present FeatInsight, a system that supports the entire feature lifecycle, including feature design, storage, visualization, computation, verification, and lineage management. FeatInsight (with OpenMLDB as the execution engine) has been deployed in over 100 real-world scenarios on 4Paradigm's Sage Studio platform, handling up to a trillion-dimensional feature space and enabling millisecond-level feature updates. We demonstrate how FeatInsight enhances feature design efficiency (e.g., for online product recommendation) and improve feature computation performance (e.g., for online fraud detection). The code is available at this https URL
OpenMLDB offers a unified system for consistent, high-performance feature computation across both offline training and online serving environments. This approach addresses the common problem of offline-online inconsistency by generating consistent execution plans from a single SQL-based feature script, demonstrating significantly improved throughput and reduced latency compared to existing systems.
1,668
Pseudo-random arrays are the two-dimensional analog of M-sequences. Pseudo-random array codes are the two-dimensional analog of sequences generated by a product of irreducible polynomials with the same exponent. The union of the arrays in such a code has the window property and the shift-and-add property, implying that these codes are linear. The folding technique is the most basic one for forming such arrays and codes. A new criterion for generating pseudo-random arrays based on folding is given. This new criterion yields pseudo-random arrays with new parameters. A general construction for such array codes is given. It appears that the arrays generated in this construction can be constructed by folding the nonzero sequences generated by a product of irreducible polynomials of the same degree and the same exponent. Two hierarchies of the pseudo-random array codes are provided. In one hierarchy codewords of one code with smaller windows are contained in codewords of another code which stands above him in the hierarchy. The second hierarchy is a partition of the pseudo-random array codes generated by folding into classes based on the polynomial types which participate in their construction.
Feature management is essential for many online machine learning applications and can often become the performance bottleneck (e.g., taking up to 70% of the overall latency in sales prediction service). Improper feature configurations (e.g., introducing too many irrelevant features) can severely undermine the model's generalization capabilities. However, managing online ML features is challenging due to (1) large-scale, complex raw data (e.g., the 2018 PHM dataset contains 17 tables and dozens to hundreds of columns), (2) the need for high-performance, consistent computation of interdependent features with complex patterns, and (3) the requirement for rapid updates and deployments to accommodate real-time data changes. In this demo, we present FeatInsight, a system that supports the entire feature lifecycle, including feature design, storage, visualization, computation, verification, and lineage management. FeatInsight (with OpenMLDB as the execution engine) has been deployed in over 100 real-world scenarios on 4Paradigm's Sage Studio platform, handling up to a trillion-dimensional feature space and enabling millisecond-level feature updates. We demonstrate how FeatInsight enhances feature design efficiency (e.g., for online product recommendation) and improve feature computation performance (e.g., for online fraud detection). The code is available at this https URL.
20
There are no more papers matching your filters at the moment.