Boğaziçi University
Iconicity, the resemblance between linguistic form and meaning, is pervasive in signed languages, offering a natural testbed for visual grounding. For vision-language models (VLMs), the challenge is to recover such essential mappings from dynamic human motion rather than static context. We introduce the Visual Iconicity Challenge, a novel video-based benchmark that adapts psycholinguistic measures to evaluate VLMs on three tasks: (i) phonological sign-form prediction (e.g., handshape, location), (ii) transparency (inferring meaning from visual form), and (iii) graded iconicity ratings. We assess 13 state-of-the-art VLMs in zero- and few-shot settings on Sign Language of the Netherlands and compare them to human baselines. On phonological form prediction, VLMs recover some handshape and location detail but remain below human performance; on transparency, they are far from human baselines; and only top models correlate moderately with human iconicity ratings. Interestingly, models with stronger phonological form prediction correlate better with human iconicity judgment, indicating shared sensitivity to visually grounded structure. Our findings validate these diagnostic tasks and motivate human-centric signals and embodied learning methods for modelling iconicity and improving visual grounding in multimodal models.
We present an integrated machine learning framework that transforms how manufacturing cost is estimated from 2D engineering drawings. Unlike traditional quotation workflows that require labor-intensive process planning, our approach extracts about 200 geometric and statistical descriptors directly from 13,684 DWG drawings of automotive suspension and steering parts spanning 24 product groups. Gradient-boosted decision tree models (XGBoost, CatBoost, LightGBM) trained on these features achieve nearly 10% mean absolute percentage error across groups, demonstrating robust scalability beyond part-specific heuristics. By coupling cost prediction with explainability tools such as SHAP, the framework identifies geometric design drivers, including rotated dimension maxima, arc statistics, and divergence metrics, offering actionable insights for cost-aware design. This end-to-end CAD-to-cost pipeline shortens quotation lead times, ensures consistent and transparent cost assessments across part families, and provides a deployable pathway toward real-time, ERP-integrated decision support in Industry 4.0 manufacturing environments.
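The headline metric above, mean absolute percentage error (MAPE), is easy to reproduce. The sketch below computes it on hypothetical quoted versus predicted unit costs; the actual pipeline trains gradient-boosted trees on the DWG-derived descriptors, which this snippet does not attempt to replicate.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, the headline metric in the abstract."""
    return 100.0 * sum(abs(t - p) / abs(t) for t, p in zip(y_true, y_pred)) / len(y_true)

# Hypothetical quoted vs. predicted unit costs for five parts (illustrative values).
actual = [120.0, 85.0, 240.0, 60.0, 150.0]
predicted = [132.0, 80.0, 228.0, 66.0, 141.0]
print(round(mape(actual, predicted), 2))  # → 7.38
```

A MAPE near 10%, as reported across the 24 product groups, means quotes are off by roughly a tenth of the true cost on average.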
We apply localization techniques to topologically $A$-twisted $\mathcal{N}=(2,2)$ supersymmetric theories of vector and chiral multiplets on $S^2$ and derive a novel exact formula for abelian observables, described by a distribution integrated along the real line. The distributional integral formula is verified by evaluating the correlator of the $A$-twisted $\mathbb{CP}^{N-1}$ gauged linear sigma model and confirming the standard selection rule. Finally, we use hyperfunctions to demonstrate the equivalence between the distributional and complex contour integral descriptions of the $\mathbb{CP}^{N-1}$ correlator, and find agreement with the Jeffrey-Kirwan residue prescription.
Large language models have drastically changed the prospects of AI by introducing technologies for more complex natural language processing. However, current methodologies to train such LLMs require extensive resources, including but not limited to large amounts of data, expensive machinery, and lengthy training. To address this problem, this paper proposes a new tokenization method, inspired by universal Lempel-Ziv-Welch data compression, that compresses repetitive phrases into multi-word tokens. With MultiTok as a new tokenizing tool, we show that language models can be trained notably more efficiently while offering similar accuracy on more succinct and compressed training data. In fact, our results demonstrate that MultiTok achieves performance comparable to the BERT and GPT-2 standards, both as a stand-alone tokenizer and as an add-on to existing tokenizers, while also providing close to 2.5x faster training with more than 30% less training data.
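The core idea, LZW-style compression in which repeated phrases become single multi-word tokens, can be sketched over a word sequence as below. This is an illustrative reconstruction of the principle, not the paper's actual MultiTok implementation.

```python
def multitok(words):
    """LZW-style tokenization over words: the dictionary starts with single-word
    tokens and grows phrase entries, so repeated phrases emit one multi-word token.
    A sketch of the idea; MultiTok itself may differ in detail."""
    dictionary = {(w,): w for w in set(words)}  # initial single-word vocabulary
    tokens, phrase = [], ()
    for w in words:
        if phrase + (w,) in dictionary:
            phrase = phrase + (w,)             # keep extending a known phrase
        else:
            tokens.append(dictionary[phrase])  # emit longest known phrase
            dictionary[phrase + (w,)] = " ".join(phrase + (w,))  # learn new phrase
            phrase = (w,)
    if phrase:
        tokens.append(dictionary[phrase])
    return tokens

text = "the cat sat the cat sat on the mat".split()
print(multitok(text))  # 9 words compress to 8 tokens, including "the cat"
```

On repetitive corpora this yields the shorter token streams that drive the reported training speedups.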
Keyword spotting plays a critical role in enabling hands-free interaction for battery-powered edge devices. Few-Shot Keyword Spotting (FS-KWS) addresses the scalability and adaptability challenges of traditional systems by enabling recognition of custom keywords with only a few examples. However, existing FS-KWS systems achieve subpar accuracy at desirable false acceptance rates, particularly in resource-constrained edge environments. To address these issues, we propose a training scheme that leverages self-supervised learning models for robust feature extraction, dimensionality reduction, and knowledge distillation. The teacher model, based on Wav2Vec 2.0, is trained using the Sub-center ArcFace loss, which enhances inter-class separability and intra-class compactness. To enable efficient deployment on edge devices, we introduce attention-based dimensionality reduction and train a standard lightweight ResNet15 student model. We evaluate the proposed approach on the English portion of the Multilingual Spoken Words Corpus (MSWC) and on the Google Speech Commands (GSC) dataset. Notably, the proposed training method improves the 10-shot classification accuracy from 33.4% to 74.1% on 11 classes at a 1% false alarm rate on the GSC dataset, making it significantly better suited to real use-case scenarios.
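A NumPy sketch of the Sub-center ArcFace idea named above: each class keeps K weight sub-centers, similarity is the maximum cosine over them, and an additive angular margin is applied only to the true class. The hyperparameters s, m, and K here are illustrative, not the paper's settings.

```python
import numpy as np

def subcenter_arcface_logits(emb, W, labels, s=30.0, m=0.2):
    """Sub-center ArcFace logits (sketch): W has shape (classes, K, dim);
    per-class similarity is the max cosine over the K sub-centers, and the
    angular margin m is added only for each sample's true class."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)   # unit embeddings
    W = W / np.linalg.norm(W, axis=2, keepdims=True)         # unit sub-centers
    cos = np.einsum("nd,ckd->nck", emb, W).max(axis=2)       # max over sub-centers
    theta = np.arccos(np.clip(cos, -1.0, 1.0))
    target = np.zeros_like(cos)
    target[np.arange(len(labels)), labels] = 1.0             # one-hot margin mask
    return s * np.cos(theta + m * target)                    # margin shrinks true-class logit

rng = np.random.default_rng(1)
logits = subcenter_arcface_logits(rng.standard_normal((4, 8)),   # 4 utterance embeddings
                                  rng.standard_normal((11, 2, 8)),  # 11 classes, K=2
                                  labels=np.array([0, 3, 7, 10]))
print(logits.shape)  # → (4, 11)
```

These logits feed a standard cross-entropy loss; the margin forces tighter intra-class clusters, which is what the abstract credits for the accuracy gain.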
We identify raising and lowering operators of the de Sitter algebra, with a focus on their action on states, in particular in 4 spacetime dimensions. The action of the de Sitter ladder operators on states is not uniquely determined; by fixing the action of certain generators, one can deduce the action of the rest. Our main aim is to identify highest- and lowest-weight states so that we can recognize them for quantized fields on a rigid de Sitter background.
We present a search for solar axions produced through the axion-electron coupling ($g_{ae}$) using data from a novel 7-GridPix detector installed at the CERN Axion Solar Telescope (CAST). The detector, featuring ultra-thin silicon nitride windows and multiple veto systems, collected approximately 160 hours of solar tracking data between 2017 and 2018. Using machine learning techniques and the veto systems, we achieved a background rate of $1.06\times 10^{-5}\,\text{keV}^{-1}\,\text{cm}^{-2}\,\text{s}^{-1}$ at a signal efficiency of about 80% in the 0.2-8 keV range. Analysis of the data yielded no significant excess above background, allowing us to set a new upper limit on the product of the axion-electron and axion-photon couplings of $g_{ae}\cdot g_{a\gamma} < 7.35\times 10^{-23}\,\text{GeV}^{-1}$ at 95% confidence level. This result improves upon the previous best helioscope limit and demonstrates the potential of GridPix technology for rare event searches. Additionally, we derived a limit on the axion-photon coupling of $g_{a\gamma} < 9.0\times 10^{-11}\,\text{GeV}^{-1}$ at 95% CL, which, while not surpassing CAST's best limit, provides complementary constraints on axion models.
Sleep is among the most important factors affecting one's daily performance, well-being, and quality of life, and wearable devices now make it possible to measure it unobtrusively in daily life. Rather than relying on camera recordings and extracting the sleep state from images, wrist-worn devices can measure it directly via accelerometer, heart rate, and heart rate variability sensors. Measured features include time to bed, time out of bed, bedtime duration, minutes to fall asleep, and minutes after wake-up. There are several studies in the literature on sleep quality and stage prediction; however, they either use only wearable data or focus solely on the sleep stage. In this study, we use the NetHealth dataset, collected from 698 college students via wearables as well as surveys. Recently, deep learning algorithms have advanced considerably and generally perform better than conventional machine learning techniques; among them, Convolutional Neural Networks (CNNs) perform particularly well. Thus, in this study, we apply different CNN architectures that have already performed well in the human activity recognition domain and compare their results. We also apply Random Forest (RF), since it performs best among the conventional methods. In future studies, we will compare them with other deep learning algorithms.
Omnidirectional videos (ODVs) are redefining viewer experiences in virtual reality (VR) by offering an unprecedented full field-of-view (FOV). This study extends the domain of saliency prediction to 360-degree environments, addressing the complexities of spherical distortion and the integration of spatial audio. Contextually, ODVs have transformed user experience by adding a spatial audio dimension that aligns sound direction with the viewer's perspective in spherical scenes. Motivated by the lack of comprehensive datasets for 360-degree audio-visual saliency prediction, our study curates YT360-EyeTracking, a new dataset of 81 ODVs, each observed under varying audio-visual conditions. Our goal is to explore how to utilize audio-visual cues to effectively predict visual saliency in 360-degree videos. Towards this aim, we propose two novel saliency prediction models: SalViT360, a vision-transformer-based framework for ODVs equipped with spherical geometry-aware spatio-temporal attention layers, and SalViT360-AV, which further incorporates transformer adapters conditioned on audio input. Our results on a number of benchmark datasets, including our YT360-EyeTracking, demonstrate that SalViT360 and SalViT360-AV significantly outperform existing methods in predicting viewer attention in 360-degree scenes. Interpreting these results, we suggest that integrating spatial audio cues in the model architecture is crucial for accurate saliency prediction in omnidirectional videos. Code and dataset will be available at this https URL.
In this paper we present a new algorithm for online (sequential) inference in Bayesian neural networks, and show its suitability for tackling contextual bandit problems. The key idea is to combine the extended Kalman filter (which locally linearizes the likelihood function at each time step) with a (learned or random) low-dimensional affine subspace for the parameters; the use of a subspace enables us to scale our algorithm to models with $\sim$1M parameters. While most other neural bandit methods need to store the entire past dataset in order to avoid the problem of "catastrophic forgetting", our approach uses constant memory. This is possible because we represent uncertainty about all the parameters in the model, not just the final linear layer. We show good results on the "Deep Bayesian Bandit Showdown" benchmark, as well as MNIST and a recommender system.
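The subspace trick can be sketched in a few lines: uncertainty is tracked only over low-dimensional coordinates z with theta = theta0 + A z, so a Kalman update costs memory in d rather than D. The minimal sketch below uses a linear observation and a random projection A (the paper also considers learned subspaces); all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Full model has D parameters, but uncertainty lives in a d-dim affine subspace:
# theta = theta0 + A @ z, with z ~ N(mu, Sigma).
D, d = 1000, 5
theta0 = np.zeros(D)
A = rng.standard_normal((D, d)) / np.sqrt(d)   # random projection (illustrative)
mu, Sigma = np.zeros(d), np.eye(d)

# One Kalman step for a scalar observation y ≈ x @ theta; for a neural network
# the EKF would use the locally linearized output, with Jacobian J = (df/dtheta) @ A.
x = rng.standard_normal(D)
y, obs_var = 1.0, 0.1
J = x @ A                              # Jacobian in subspace coordinates, shape (d,)
S = J @ Sigma @ J + obs_var            # innovation variance (scalar)
K = Sigma @ J / S                      # Kalman gain, shape (d,)
mu = mu + K * (y - x @ (theta0 + A @ mu))
Sigma = Sigma - np.outer(K, J) @ Sigma
print(mu.shape, Sigma.shape)           # memory scales with d, not D
```

Because only (mu, Sigma) are carried forward, past data never needs to be stored, which is the constant-memory property claimed above.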
This paper introduces foundational resources and models for natural language processing (NLP) of historical Turkish, a domain that has remained underexplored in computational linguistics. We present the first named entity recognition (NER) dataset, HisTR, and the first Universal Dependencies treebank, OTA-BOUN, for a historical form of the Turkish language, along with transformer-based models trained on these datasets for named entity recognition, dependency parsing, and part-of-speech tagging. Additionally, we introduce the Ottoman Text Corpus (OTC), a clean corpus of transliterated historical Turkish texts spanning a wide range of historical periods. Our experimental results show significant improvements in the computational analysis of historical Turkish, achieving promising results in tasks that require understanding of historical linguistic structures. They also highlight existing challenges, such as domain adaptation and language variation across time periods. All of the presented resources and models are made available at this https URL to serve as a benchmark for future progress in historical Turkish NLP.
Research from institutions in Istanbul demonstrates that a standard U-Net architecture, when paired with an extensive and strategically developed data augmentation pipeline, achieves an AUC of 0.9855 and an accuracy of 0.9712 on the DRIVE dataset for retinal vessel segmentation. This approach matches or exceeds the performance of many more complex, state-of-the-art models while requiring significantly fewer computational resources and less training time.
Quantum computing is a new model of computation based on quantum physics. Quantum computers can be exponentially faster than conventional computers for problems such as factoring. Besides full-scale quantum computers, more restricted models such as quantum versions of finite automata have been studied. In this paper, we survey various models of quantum finite automata and their properties. We also provide some open questions and new directions for researchers.
Keywords: quantum finite automata, probabilistic finite automata, nondeterminism, bounded error, unbounded error, state complexity, decidability and undecidability, computational complexity
Incorporating domain-specific constraints into machine learning models is essential for generating predictions that are both accurate and feasible in real-world applications. This paper introduces new methods for training Output-Constrained Regression Trees (OCRT), addressing the limitations of traditional decision trees in constrained multi-target regression tasks. We propose three approaches: M-OCRT, which uses split-based mixed integer programming to enforce constraints; E-OCRT, which employs an exhaustive search for optimal splits and solves constrained prediction problems at each decision node; and EP-OCRT, which applies post-hoc constrained optimization to tree predictions. To illustrate their potential uses in ensemble learning, we also introduce a random forest framework operating under convex feasible sets. We validate the proposed methods through a computational study on both synthetic and industry-driven hierarchical time series datasets. Our results demonstrate that imposing constraints during decision tree training yields accurate and feasible predictions.
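As a toy illustration of the post-hoc step in EP-OCRT, consider a hierarchical constraint requiring child predictions to sum to a parent total. Projecting an unconstrained tree prediction onto this convex set is a one-line Euclidean projection; this is illustrative, not the paper's exact formulation.

```python
def project_to_sum(pred, total):
    """Euclidean projection of a multi-target prediction onto the hyperplane
    sum(y) == total: shift every coordinate by the same amount. A minimal
    stand-in for a post-hoc constrained-optimization step."""
    shift = (total - sum(pred)) / len(pred)
    return [p + shift for p in pred]

raw = [3.0, 5.0, 4.0]                 # leaf averages from an unconstrained tree (toy values)
feasible = project_to_sum(raw, 10.0)  # children must sum to the parent series value
print(feasible)
```

The projection preserves the differences between targets while restoring feasibility, which is why post-hoc correction can work well when the unconstrained tree is already accurate.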
Despite the growing accessibility of augmented reality (AR) for visualization, existing computer-aided design (CAD) systems remain confined to traditional screens or require complex setups or predefined parameters, limiting immersion and accessibility for novices. We present TinkerXR, an open-source AR interface enabling in-situ design and fabrication through Constructive Solid Geometry (CSG) modeling. TinkerXR operates solely with a headset and 3D printer, allowing users to design directly in and for their physical environments. By leveraging spatial awareness, depth occlusion, recognition of physical constraints, reference objects, and hand movement controls, TinkerXR enhances realism, precision, and ease of use. Its AR-based workflow integrates design and 3D printing with a drag-and-drop interface for printers' virtual twins. A user study comparing TinkerXR with Tinkercad shows that TinkerXR offers novices higher accessibility, engagement, and ease of use. Participants highlighted how designing directly in physical space made the process more intuitive. By bridging the gap between digital creation and physical output, TinkerXR aims to transform everyday spaces into expressive creative studios. We release TinkerXR as open source to encourage further exploration of accessible, spatially grounded CAD tools.
Although machine translation systems are mostly designed to serve the general domain, there is a growing tendency to adapt them to other domains, such as literary translation. In this paper, we focus on English-Turkish literary translation and develop machine translation models that take into account the stylistic features of translators. We fine-tune a pre-trained machine translation model on the manually aligned works of a particular translator. We make a detailed analysis of the effects of manual and automatic alignments, data augmentation methods, and corpus size on the translations. We propose an approach based on stylistic features to evaluate the style of a translator in the output translations. We show that the human translator's style can be highly recreated in the target machine translations by adapting the models to the style of the translator.
With the development of wearable technologies, a new kind of healthcare data has become valuable as medical information. These data provide meaningful information regarding an individual's physiological and psychological states, such as activity level, mood, stress, and cognitive health. These biomarkers are called digital because they are collected from digital devices integrated with various sensors. In this study, we explore digital biomarkers related to stress by examining data collected from mobile phones and smartwatches. We apply machine learning techniques, specifically Random Forest, to the Tesserae dataset to extract stress biomarkers. Using feature selection techniques, we utilize weather, activity, heart rate (HR), stress, sleep, and location (work-home) measurements from wearables to determine the most important stress-related biomarkers. We believe we contribute to the interpretation of stress biomarkers with a wide range of features from different devices. In addition, we classify 5 different stress levels with the most important features, and our results show that we can achieve 85% overall class accuracy by adjusting for class imbalance and adding extra features related to personality characteristics. We achieve similar and even better results in recognizing stress states with digital biomarkers in a daily-life scenario, targeting a higher number of classes compared to related studies.
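A minimal scikit-learn sketch of the pipeline described above, with synthetic stand-ins for the wearable features and 5 stress levels; all names, shapes, and numbers are illustrative, not taken from the Tesserae dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)

# Synthetic stand-ins for wearable features (HR, sleep, activity, ...) and
# 5 stress levels (0-4); the real study derives these from Tesserae data.
X = rng.standard_normal((300, 6))
y = np.clip((X[:, 0] + 0.5 * X[:, 1] > 0).astype(int) + rng.integers(0, 4, 300), 0, 4)

# class_weight="balanced" is one common way to adjust for class imbalance,
# as the abstract describes doing.
clf = RandomForestClassifier(n_estimators=100, class_weight="balanced", random_state=0)
clf.fit(X, y)

# Feature importances are what identify the most stress-related biomarkers.
print(clf.feature_importances_)
```

Ranking `feature_importances_` (or running a proper feature-selection loop over it) is the mechanism behind "determining the most important stress-related biomarkers."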
In this work, we revisit, reinterpret, and extend the model-independent analysis method (which we now call spread-luminosity distance fitting, spread-LDF) from our previous work. We apply it to the updated type Ia supernova catalogue Pantheon+ and to recent GRB compilations. The procedure allows us, using only the FLRW assumption, to construct good approximations to the expansion history of the universe, re-confirming its acceleration as a robust feature. When we also assume General Relativity ("GR"), we can demonstrate, without any matter/energy model in mind, the need for (possibly non-constant) dark energy ("GDE"). We find hints of positive pressure of GDE at z>1, with implications for either the complexity of dark energy or the validity of one of the following: the cosmological principle, the interpretation of SN Ia data, or GR.
We discuss an autoencoder model in which the encoding and decoding functions are implemented by decision trees. We use the soft decision tree where internal nodes realize soft multivariate splits given by a gating function and the overall output is the average of all leaves weighted by the gating values on their path. The encoder tree takes the input and generates a lower dimensional representation in the leaves and the decoder tree takes this and reconstructs the original input. Exploiting the continuity of the trees, autoencoder trees are trained with stochastic gradient descent. On handwritten digit and news data, we see that the autoencoder trees yield good reconstruction error compared to traditional autoencoder perceptrons. We also see that the autoencoder tree captures hierarchical representations at different granularities of the data on its different levels and the leaves capture the localities in the input space.
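The gating mechanism above can be illustrated with a depth-1 soft tree: a sigmoid gate routes the input softly between two leaves, and the output is the gate-weighted average. Weights, bias, and leaf values below are illustrative, not learned.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def soft_tree(x, w, b, leaves):
    """Depth-1 soft decision tree: the gating function g = sigmoid(w.x + b)
    gives a soft multivariate split, and the output is the average of the
    leaves weighted by the gating values on their paths (g and 1 - g)."""
    g = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return g * leaves[0] + (1.0 - g) * leaves[1]

# Toy 1-split encoder mapping a 2-d input to a scalar code (illustrative values).
print(round(soft_tree([1.0, -1.0], [2.0, 0.5], 0.0, (1.0, -1.0)), 4))  # → 0.6351
```

Because the output is a smooth function of the gating parameters, the whole encoder-decoder pair is differentiable, which is what makes training by stochastic gradient descent possible.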
We present a new perturbative/non-perturbative (P/NP) relation that applies to a broader class of genus-1 potentials, including those that possess real and complex instantons parametrized by a deformation parameter, such as polynomial and elliptic potentials. Our findings significantly extend the scope of quantum mechanical systems for which perturbation theory suffices to calculate the contributions of non-perturbative effects to energy levels, with or without the need for a boundary condition, depending on the potential. We further provide evidence for our results by predicting the corrections to the large-order behavior of the perturbative expansion for the Jacobi SD elliptic potential using the early terms of the instanton fluctuations that satisfy the P/NP relation.