Decision trees are well known for their interpretability. Improving accuracy, however, requires growing deep trees or ensembles of trees, which are hard to interpret and thus offset that original benefit. Shapley values have recently become a popular way to explain the predictions of tree-based machine learning models. They provide a linear weighting of features that is independent of the tree structure. Their rise in popularity is mainly due to TreeShap, which solves a problem of generally exponential complexity in polynomial time. Following extensive adoption in industry, more efficient algorithms are required. This paper presents a more efficient and straightforward algorithm: Linear TreeShap. Like TreeShap, Linear TreeShap is exact and requires the same amount of memory.
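To make the "exponential complexity" in the abstract above concrete, the following is a minimal sketch of exact Shapley values computed straight from the definition, by enumerating every coalition of features. It is not TreeShap or Linear TreeShap, and it does not touch any tree structure. The names `shapley_values` and `toy_value` and the three-feature game are hypothetical, chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(n_features, value):
    """Exact Shapley values by brute force.

    `value` maps a frozenset of feature indices to a number.
    The loop visits every coalition, so the cost grows as 2**n_features.
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when it joins coalition s
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_value(coalition):
    # Hypothetical game: feature 0 adds 1 on its own;
    # features 1 and 2 add 2 only when both are present.
    total = 0.0
    if 0 in coalition:
        total += 1.0
    if 1 in coalition and 2 in coalition:
        total += 2.0
    return total


phi = shapley_values(3, toy_value)
```

In this toy game the interaction payoff of 2 is split evenly between features 1 and 2, and the attributions sum to the value of the full coalition. Polynomial-time algorithms for trees avoid this enumeration by exploiting the tree structure.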
Accurate estimation of treatment effects in online A/B testing is challenging with zero-inflated and skewed metrics. Traditional tests, like Welch's t-test, often lack sensitivity with heavy-tailed data because they rely on means rather than, for example, percentiles. The Controlled Experiments Using Pre-experiment Data (CUPED) technique improves sensitivity by reducing variance, yet that variance reduction is insufficient for highly skewed metrics. Alternatively, Yuen's t-test uses trimmed means to handle outliers and skewness robustly. This paper introduces a method that combines the variance reduction of CUPED with the robustness of Yuen's t-test to enhance the sensitivity of hypothesis testing. Our novel approach integrates trimmed data in a principled manner, offering a framework that balances variance reduction with robust location measures. We demonstrate improved detection of significant effects with smaller sample sizes, enabling quicker experimental decisions without sacrificing statistical power. This work broadens the utility of controlled experiments in environments characterized by highly skewed or high-variance data.
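The two ingredients named in the abstract above can be sketched separately. This is a rough illustration of the standard CUPED adjustment and of a symmetric trimmed mean (the location measure behind Yuen's t-test), not the paper's combined method, which the abstract does not spell out. The function names, the synthetic exponential data, and the coefficients are all assumptions made for the example.

```python
import random
import statistics


def cuped_adjust(y, x):
    """Standard CUPED adjustment: y - theta * (x - mean(x)),
    with theta = cov(x, y) / var(x) estimated from the sample."""
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / statistics.variance(x)
    return [b - theta * (a - mean_x) for a, b in zip(x, y)]


def trimmed_mean(values, proportion):
    """Mean after dropping the lowest and highest `proportion` of the values."""
    k = int(proportion * len(values))
    ordered = sorted(values)
    kept = ordered[k:len(ordered) - k]
    return statistics.fmean(kept)


# Synthetic, skewed data (an assumption for illustration only):
# a pre-experiment metric and a correlated in-experiment metric.
rng = random.Random(0)
pre = [rng.expovariate(1.0) for _ in range(2000)]
post = [0.8 * p + rng.gauss(0.0, 0.3) for p in pre]

adjusted = cuped_adjust(post, pre)
variance_before = statistics.variance(post)
variance_after = statistics.variance(adjusted)
robust_location = trimmed_mean(post, 0.1)
```

The adjustment leaves the sample mean unchanged while removing the part of the variance explained by the pre-experiment covariate. The trimmed mean, by contrast, limits the influence of the tails, which CUPED alone does not address.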
Many photography websites such as Flickr, 500px, Unsplash, and Adobe Behance are used by amateur and professional photography enthusiasts. Unlike users of content-based image search, users of these websites are not just looking for photos with certain content, but more generally for photos with a certain photographic "aesthetic". In this context, we explore personalized photo recommendation and propose two aesthetic feature extraction methods based on (i) color space and (ii) deep style transfer embeddings. Using a dataset from 500px, we evaluate how these features can best be leveraged by collaborative filtering methods and show that (ii) provides a significant boost in photo recommendation performance.
The third version of the Hypertext Transfer Protocol (HTTP) is currently in its final standardization phase at the IETF. Besides better security and increased flexibility, it promises benefits in terms of performance. HTTP/3 adopts a more efficient header compression scheme and replaces TCP with QUIC, a transport protocol carried over UDP, originally proposed by Google and currently under standardization too. Although early HTTP/3 implementations already exist and some websites announce support for it, it has been the subject of few studies. In this work, we provide a first measurement study of HTTP/3. We document how, during 2020, it was adopted by some of the leading Internet companies such as Google, Facebook and Cloudflare. We run a large-scale measurement campaign toward thousands of websites adopting HTTP/3, aiming to understand to what extent it achieves better performance than HTTP/2. We find that adopting websites often host most web page objects on third-party servers, which support only HTTP/2 or even HTTP/1.1. Our experiments show that HTTP/3 provides sizable benefits only in scenarios with high latency or very poor bandwidth. Despite the adoption of QUIC, we do not find benefits in cases of high packet loss, but we observe large diversity across website providers' infrastructures.
We explore the dynamics of a one-dimensional lattice of state machines on two states and two symbols sequentially updated via a process of "reflexive composition." The space of 256 machines exhibits a variety of behaviors, including substitution, reversible "billiard ball" dynamics, and fractal nesting. We show that one machine generates the Sierpinski Triangle and, for a subset of boundary conditions, is isomorphic to the cellular automaton Rule 90 in Wolfram's naming scheme. More surprisingly, two other machines follow trajectories that map to Rule 90 in reverse. Whereas previous techniques have been developed to uncover preimages of Rule 90, this is the first study to produce such inverse dynamics naturally from the formalism itself. We argue that the system's symmetric treatment of state and message underlies its expressive power.
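For reference, Rule 90 itself is easy to state: each cell becomes the XOR of its two neighbors. The sketch below implements the ordinary synchronous cellular automaton, not the sequentially updated state-machine lattice or the "reflexive composition" process described in the abstract above. The function names, the zero boundary, and the 31-cell width are assumptions chosen for illustration.

```python
def rule90_step(cells):
    """One synchronous Rule 90 update: new cell = left neighbor XOR right neighbor.
    Cells beyond either end are treated as 0 (an assumed boundary condition)."""
    n = len(cells)
    return [
        (cells[i - 1] if i > 0 else 0) ^ (cells[i + 1] if i < n - 1 else 0)
        for i in range(n)
    ]


def rule90_rows(width, steps):
    """Evolve a single live cell in the middle of the row for `steps` updates."""
    row = [0] * width
    row[width // 2] = 1
    rows = [row]
    for _ in range(steps):
        row = rule90_step(row)
        rows.append(row)
    return rows


rows = rule90_rows(31, 15)
# Text rendering: '#' for a live cell, '.' for a dead one.
picture = "\n".join("".join("#" if c else "." for c in r) for r in rows)
```

Printing `picture` shows the nested triangles of the Sierpinski pattern. Row t is Pascal's triangle modulo 2, so it contains 2 raised to the number of 1 bits in t live cells.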