Wageningen University
SatCLIP is a geographic location encoder that provides efficient, low-dimensional, and general-purpose implicit representations of locations worldwide. It demonstrates superior predictive performance across a diverse set of downstream tasks and robust generalization to unseen geographic areas, achieving this by pretraining a location encoder on globally sampled satellite imagery using a contrastive learning objective.
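The core of the approach is a CLIP-style contrastive objective that pulls a location's embedding toward the embedding of satellite imagery observed at that location. Below is a minimal, hypothetical sketch of such an objective in PyTorch; the toy MLP location encoder, dimensions, and temperature are illustrative stand-ins, not the released SatCLIP implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocationEncoder(nn.Module):
    """Toy MLP over (lon, lat); a stand-in for SatCLIP's actual location encoders."""
    def __init__(self, dim=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, 256), nn.ReLU(), nn.Linear(256, dim))

    def forward(self, lonlat):
        return self.net(lonlat)

def contrastive_loss(loc_emb, img_emb, temperature=0.07):
    # Symmetric InfoNCE: matched (location, image) pairs lie on the diagonal of the logit matrix.
    loc_emb = F.normalize(loc_emb, dim=-1)
    img_emb = F.normalize(img_emb, dim=-1)
    logits = loc_emb @ img_emb.t() / temperature
    targets = torch.arange(logits.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Random stand-ins for a batch of coordinates and precomputed satellite-image embeddings.
loss = contrastive_loss(LocationEncoder()(torch.randn(32, 2)), torch.randn(32, 256))
```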
This work introduces Spherical Harmonic (SH) embeddings and Sinusoidal Representation Networks (SirenNets) for encoding geographic locations, effectively addressing the rectangular domain assumption of prior methods and eliminating polar artifacts. The combined approach demonstrates superior accuracy across diverse tasks, including climate data interpolation and species distribution modeling, while also showing SirenNets can implicitly learn Fourier-like representations.
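As a rough illustration of the SirenNet component, the sketch below stacks sine-activated linear layers over location inputs; the w0 frequency and layer widths are arbitrary choices, and a faithful version would feed spherical-harmonic basis values of longitude/latitude rather than raw coordinates.

```python
import torch
import torch.nn as nn

class SineLayer(nn.Module):
    """Linear layer followed by a scaled sine activation, as in SIREN."""
    def __init__(self, in_dim, out_dim, w0=30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

# Two sine layers followed by a linear head mapping locations to 64-d embeddings.
siren = nn.Sequential(SineLayer(2, 128), SineLayer(128, 128), nn.Linear(128, 64))
embeddings = siren(torch.rand(8, 2))  # 8 placeholder locations -> 64-dim embeddings
```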
This study introduces intrinsic dimension (ID) as a novel, unsupervised metric to quantify the information content of geographic Implicit Neural Representations (INRs). It demonstrates that ID effectively reveals spatial artifacts and biases, and that global ID in frozen embeddings positively correlates with downstream task performance, while ID in task-aligned activations shows a negative correlation.
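For intuition, intrinsic dimension can be estimated without labels from nearest-neighbour distance ratios. The snippet below uses the common TwoNN estimator as an example; it is only a sketch of the general idea, and the paper's exact estimator and data are not assumed here.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def two_nn_id(X):
    """TwoNN maximum-likelihood estimate of intrinsic dimension from 2-NN distance ratios."""
    # Distances to the two nearest neighbours of every point (column 0 is the point itself).
    dists, _ = NearestNeighbors(n_neighbors=3).fit(X).kneighbors(X)
    mu = dists[:, 2] / dists[:, 1]        # ratio of second- to first-neighbour distance
    return len(X) / np.sum(np.log(mu))    # ID estimate: N / sum_i log(mu_i)

X = np.random.randn(2000, 64)             # stand-in for frozen geographic embeddings
print(two_nn_id(X))
```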
Humans show high-level abstraction capabilities in games that require quickly communicating object information. They decompose the message content into multiple parts and communicate them in an interpretable protocol. Toward equipping machines with such capabilities, we propose the Primitive-based Sketch Abstraction task, where the goal is to represent sketches using a fixed set of drawing primitives under the influence of a budget. To solve this task, our Primitive-Matching Network (PMN) learns interpretable abstractions of a sketch in a self-supervised manner. Specifically, PMN maps each stroke of a sketch to its most similar primitive in a given set, predicting an affine transformation that aligns the selected primitive to the target stroke. We learn this stroke-to-primitive mapping end-to-end with a distance-transform loss that is minimal when the original sketch is precisely reconstructed with the predicted primitives. Our PMN abstraction empirically achieves the highest performance on sketch recognition and sketch-based image retrieval given a communication budget, while at the same time being highly interpretable. This opens up new possibilities for sketch analysis, such as comparing sketches by extracting the most relevant primitives that define an object category. Code is available at this https URL
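A schematic view of the stroke-to-primitive mapping is sketched below: a stroke is encoded, scored against a fixed primitive set, and an affine transform is predicted to align the chosen primitive with the stroke. The encoder choice, layer sizes, and output heads are hypothetical placeholders rather than the published PMN architecture, and the distance-transform loss is omitted.

```python
import torch
import torch.nn as nn

class StrokeToPrimitive(nn.Module):
    def __init__(self, n_primitives=8, hidden=64):
        super().__init__()
        self.encoder = nn.GRU(2, hidden, batch_first=True)  # a stroke is a sequence of 2D points
        self.match = nn.Linear(hidden, n_primitives)         # similarity to each primitive
        self.affine = nn.Linear(hidden, 6)                   # parameters of a 2x3 affine transform

    def forward(self, stroke_points):
        _, h = self.encoder(stroke_points)
        h = h.squeeze(0)
        scores = self.match(h).softmax(dim=-1)               # soft primitive selection
        theta = self.affine(h).view(-1, 2, 3)                # per-stroke affine transform
        return scores, theta

scores, theta = StrokeToPrimitive()(torch.randn(4, 50, 2))   # 4 strokes of 50 points each
```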
Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning architectures and models have been proposed, the majority of them have been developed solely on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that a shift from a model-centric view to a complementary data-centric perspective is necessary for further improvements in accuracy, generalization ability, and real impact on end-user applications. Furthermore, considering the entire machine learning cycle, from problem definition to model deployment with feedback, is crucial for building machine learning models that remain reliable in unforeseen situations. This work presents a definition as well as a precise categorization and overview of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric learning in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples, providing actionable steps for applying data-centric machine learning approaches to geospatial data.
Epidemic control frequently relies on adjusting interventions based on prevalence, but designing such policies is a highly non-trivial problem due to uncertain intervention effects and costs and the difficulty of quantifying key transmission mechanisms and parameters. Here, using exact mathematical and computational methods, we reveal a fundamental limit of epidemic control: prevalence-feedback policies are outperformed by a single, optimally chosen constant control level. Specifically, we find no incentive to use prevalence-based control under a wide class of cost functions that depend arbitrarily on interventions and scale with infections. We also identify regimes where prevalence feedback is beneficial. Our results challenge the current understanding that prevalence-based interventions are required for epidemic control and suggest that, for many classes of epidemics, interventions should not be varied unless the epidemic is near the herd immunity threshold.
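One concrete reading of this setting, written in our own notation rather than the paper's, is a controlled SIR model with a running cost that depends on the intervention and scales with infections; the comparison is then between prevalence-feedback policies u(t) = f(I(t)) and a constant level u(t) = ū.

```latex
\begin{aligned}
\dot S &= -\bigl(1 - u(t)\bigr)\,\beta S I, \\
\dot I &= \bigl(1 - u(t)\bigr)\,\beta S I - \gamma I, \\
J[u]   &= \int_0^T c\bigl(u(t)\bigr)\, I(t)\,\mathrm{d}t .
\end{aligned}
```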
This work proposes SAMSelect, an algorithm that obtains a salient three-channel visualization for multispectral images. We develop SAMSelect and show its use for marine scientists visually interpreting floating marine debris in Sentinel-2 imagery. Such debris is notoriously difficult to visualize due to its compositional heterogeneity in medium-resolution imagery. Despite these difficulties, visual interpretation of imagery showing marine debris remains common practice among domain experts, who select bands and spectral indices on a case-by-case basis informed by common practices and heuristics. SAMSelect selects the band or index combination that achieves the best classification accuracy on a small annotated dataset through the Segment Anything Model. Its central assumption is that the three-channel visualization yielding the most accurate segmentation results also provides good visual information for photo-interpretation. We evaluate SAMSelect on three Sentinel-2 scenes containing generic marine debris in Accra, Ghana, and Durban, South Africa, and deployed plastic targets from the Plastic Litter Project. This reveals the potential of new, previously unused band combinations (e.g., a normalized difference index of B8 and B2), which demonstrate improved performance compared to literature-based indices. We describe the algorithm in this paper and provide an open-source code repository that will be helpful for domain scientists doing visual photo-interpretation, especially in the marine field.
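At its core, SAMSelect is an exhaustive scoring loop over candidate visualizations. The sketch below searches plain three-band composites only (spectral indices omitted) and leaves the segmentation model and accuracy metric as user-supplied callables; it illustrates the selection logic, not the released implementation.

```python
from itertools import combinations

def samselect(image, mask, segment, iou):
    """image: (H, W, B) multispectral array; mask: (H, W) binary annotation.
    `segment` and `iou` are user-supplied callables, e.g. a SAM prediction and an accuracy metric."""
    best_bands, best_score = None, -1.0
    for bands in combinations(range(image.shape[-1]), 3):
        composite = image[..., list(bands)]       # candidate three-channel visualization
        score = iou(segment(composite), mask)     # segmentation accuracy on the annotated scene
        if score > best_score:
            best_bands, best_score = bands, score
    return best_bands, best_score
```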
This study explores the integration of AI, particularly large language models (LLMs) like ChatGPT, into educational settings, focusing on the implications for teaching and learning. Through interviews with course coordinators from data science courses at Wageningen University, this research identifies both the benefits and challenges associated with AI in the classroom. While AI tools can streamline tasks and enhance learning, concerns arise regarding students' overreliance on these technologies, potentially hindering the development of essential cognitive and problem-solving skills. The study highlights the importance of responsible AI usage, ethical considerations, and the need for adapting assessment methods to ensure educational outcomes are met. With careful integration, AI can be a valuable asset in education, provided it is used to complement rather than replace fundamental learning processes.
This article proposes a graphical model that handles mixed-type, multi-group data. The motivation for such a model originates from real-world observational data, which often contain groups of samples obtained under heterogeneous conditions in space and time, potentially resulting in differences in network structure among groups. Therefore, the i.i.d. assumption is unrealistic, and fitting a single graphical model on all data results in a network that does not accurately represent the between-group differences. In addition, real-world observational data are typically of mixed discrete-and-continuous type, violating the Gaussian assumption that is typical of graphical models, which leaves such models unable to adequately recover the underlying graph structure. The proposed model takes these properties of the data into account by treating observed data as transformed latent Gaussian data by means of the Gaussian copula, thereby allowing for attractive properties of the Gaussian distribution such as estimating the optimal number of model parameters using the inverse covariance matrix. The multi-group setting is addressed by jointly fitting a graphical model for each group and applying a fused group penalty to fuse similar graphs together. In an extensive simulation study, the proposed model is evaluated against alternative models and is better able to recover the true underlying graph structure for different groups. Finally, the proposed model is applied to real production-ecological data pertaining to on-farm maize yield in order to showcase the added value of the proposed method in generating new hypotheses for production ecologists.
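For reference, the joint-estimation idea can be written as a generic fused graphical-lasso objective over K groups; the formulation below uses standard notation with raw sample covariances S_k as an illustration, whereas the proposed model works with correlation estimates derived from the latent Gaussian copula.

```latex
\max_{\{\Theta_k \succ 0\}} \;
\sum_{k=1}^{K} n_k \Bigl( \log\det\Theta_k - \operatorname{tr}\bigl(S_k \Theta_k\bigr) \Bigr)
\;-\; \lambda_1 \sum_{k=1}^{K} \sum_{i \neq j} \bigl|\theta_{k,ij}\bigr|
\;-\; \lambda_2 \sum_{k < k'} \sum_{i,j} \bigl|\theta_{k,ij} - \theta_{k',ij}\bigr|
```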
Convolutional neural networks (CNNs) are known to learn an image representation that captures concepts relevant to the task, but they do so in an implicit way that hampers model interpretability. However, one could argue that such a representation is hidden in the neurons and can be made explicit by teaching the model to recognize semantically interpretable attributes that are present in the scene. We call such an intermediate layer a \emph{semantic bottleneck}. Once the attributes are learned, they can be re-combined to reach the final decision and provide both an accurate prediction and an explicit reasoning behind the CNN decision. In this paper, we look into semantic bottlenecks that capture context: we want attributes to be in groups of a few meaningful elements that participate jointly in the final decision. We use a two-layer semantic bottleneck that gathers attributes into interpretable, sparse groups, allowing them to contribute differently to the final output depending on the context. We test our contextual semantic interpretable bottleneck (CSIB) on the task of landscape scenicness estimation and train the semantic interpretable bottleneck using an auxiliary database (SUN Attributes). Our model yields predictions as accurate as a non-interpretable baseline when applied to a real-world test set of Flickr images, all while providing clear and interpretable explanations for each prediction.
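The sketch below illustrates the two-layer bottleneck structure: a backbone feature vector is mapped to interpretable attributes, the attributes are pooled into a few groups, and only the groups feed the final scenicness score. The layer sizes and the absence of an explicit sparsity penalty are simplifications, not the paper's CSIB.

```python
import torch
import torch.nn as nn

class SemanticBottleneck(nn.Module):
    def __init__(self, feat_dim=512, n_attributes=102, n_groups=8):
        super().__init__()
        self.attributes = nn.Linear(feat_dim, n_attributes)  # attribute layer, supervised with SUN Attributes
        self.grouping = nn.Linear(n_attributes, n_groups)    # gathers attributes into a few groups
        self.score = nn.Linear(n_groups, 1)                  # scenicness predicted from the groups only

    def forward(self, features):
        attrs = torch.sigmoid(self.attributes(features))
        groups = self.grouping(attrs)
        return self.score(groups), attrs, groups

scenicness, attrs, groups = SemanticBottleneck()(torch.randn(4, 512))
```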
In the past decade, we have witnessed a dramatically increasing volume of data collected from varied sources. The explosion of data has transformed the world, as more information is available for collection and analysis than ever before. To maximize its utilization, various machine and deep learning models have been developed, e.g., CNNs [1] and RNNs [2], to study data and extract valuable information from different perspectives. While data-driven applications improve countless products, training models for hyperparameter tuning is still a time-consuming and resource-intensive process. Cloud computing provides infrastructure support for the training of deep learning applications. Cloud service providers, such as Amazon Web Services [3], create isolated virtual environments (virtual machines and containers) for clients, who share physical resources, e.g., CPU and memory. On the cloud, resource management schemes are implemented to enable better sharing among users and boost system-wide performance. However, general scheduling approaches, such as spread-priority and balanced-resource schedulers, do not work well with deep learning workloads. In this project, we propose SpeCon, a novel container scheduler that is optimized for short-lived deep learning applications. Building on container virtualization platforms such as Kubernetes [4] and Docker [5], SpeCon analyzes the common characteristics of training processes. We design a suite of algorithms to monitor the progress of training and speculatively migrate slow-growing models to release resources for fast-growing ones. Extensive experiments demonstrate that SpeCon improves the completion time of an individual job by up to 41.5%, improves system-wide completion time by 14.8%, and reduces makespan by 24.7%.
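The speculative-migration idea can be illustrated with a toy monitor that tracks each job's recent loss improvement and flags slow-growing jobs as migration candidates; the window length, threshold, and migration hook are hypothetical, and the real system operates on Kubernetes-managed containers.

```python
from collections import deque

class JobMonitor:
    """Tracks the recent training losses of one job."""
    def __init__(self, window=5):
        self.losses = deque(maxlen=window)

    def record(self, loss):
        self.losses.append(loss)

    def growth_rate(self):
        # Relative loss improvement over the window; values near zero indicate a slow-growing job.
        if len(self.losses) < 2 or self.losses[0] == 0:
            return 0.0
        return (self.losses[0] - self.losses[-1]) / abs(self.losses[0])

def migration_candidates(monitors, threshold=0.01):
    # Jobs whose models are barely improving are flagged so their resources can be reclaimed.
    return [job for job, m in monitors.items() if m.growth_rate() < threshold]
```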
This contribution introduces a novel statistical learning methodology based on the Bradley-Terry method for pairwise comparisons, where the novelty arises from the method's capacity to estimate the worth of objects for a primary attribute by incorporating data on secondary attributes. These attributes are properties on which objects are evaluated in a pairwise fashion by individuals. By assuming that the main interest of practitioners lies in the primary attribute, and that the secondary attributes only serve to improve estimation of the parameters underlying the primary attribute, this paper utilises the well-known transfer learning framework. To wit, the proposed method first estimates a biased worth vector using data pertaining to both the primary attribute and the set of informative secondary attributes, followed by a debiasing step based on a penalised likelihood of the primary attribute. When the set of informative secondary attributes is unknown, we allow for its estimation by a data-driven algorithm. Theoretically, we show that, under mild conditions, the \ell_\infty and \ell_2 rates are improved compared to fitting a Bradley-Terry model on just the data pertaining to the primary attribute. The favourable (comparative) performance under more general settings is shown by means of a simulation study. To illustrate the usage and interpretation of the method, an application of the proposed method is provided on consumer preference data pertaining to a cassava-derived food product: eba. An R package containing the proposed methodology can be found at this https URL
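Schematically, and in our own notation, the method combines the standard Bradley-Terry likelihood with a two-step transfer-learning estimator: a pooled fit over the primary attribute and the informative secondary attributes, followed by a penalised debiasing step on the primary attribute alone. Here \ell_a denotes the Bradley-Terry log-likelihood for attribute a, attribute 0 is the primary attribute, and \mathcal{A} is the (possibly estimated) set of informative secondary attributes; the specific penalty form is an assumption, not the paper's exact formulation.

```latex
P(i \succ j) = \frac{e^{w_i}}{e^{w_i} + e^{w_j}}, \qquad
\hat{w}^{\mathrm{pool}} = \arg\max_{w} \sum_{a \in \{0\} \cup \mathcal{A}} \ell_a(w), \qquad
\hat{w} = \arg\max_{w} \; \ell_0(w) - \lambda \,\bigl\lVert w - \hat{w}^{\mathrm{pool}} \bigr\rVert_1
```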
Solar greenhouses are crucial infrastructure for modern agricultural production in northern China. However, highly fluctuating temperatures in the winter season result in poor greenhouse temperature control, which affects crop growth and increases energy consumption. To tackle these challenges, an advanced control system that can efficiently optimize multiple objectives under dramatic climate conditions is essential. Therefore, this study proposes a model predictive control-coupled proximal policy optimization (MPC-PPO) control framework. A teacher-student control framework is constructed in which the MPC generates high-quality control experiences to guide the PPO agent's learning process. An adaptive dynamic weighting mechanism is employed to balance the influence of the MPC experiences during PPO training. Evaluation conducted in solar greenhouses across three provinces in northern China (Beijing, Hebei, and Shandong) demonstrates that: (1) the MPC-PPO method achieved the highest temperature control performance (96.31 on a 100-point scale), a 5.46-point improvement over the baseline without experience integration, while reducing the standard deviation by nearly half and enhancing exploration efficiency; (2) the MPC-PPO method achieved a ventilation control reward of 99.19, optimizing ventilation window operations with intelligent time-differentiated strategies that reduce energy loss during non-optimal hours; (3) feature analysis reveals that historical window opening, air temperature, and historical temperature are the most influential features for effective control, with SHAP values of 7.449, 4.905, and 4.747, respectively; and (4) cross-regional tests indicate that MPC-PPO performs best in all test regions, confirming the generalization ability of the method.
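A minimal sketch of the teacher-student coupling is shown below: an imitation term toward MPC actions is added to the PPO loss and its weight decays over training. The linear decay schedule and mean-squared imitation loss are illustrative assumptions; the paper's adaptive dynamic weighting mechanism is more involved.

```python
import torch
import torch.nn.functional as F

def combined_loss(ppo_loss, policy_actions, mpc_actions, step, total_steps):
    # Adaptive weight: lean on MPC experience early in training, on the agent's own returns later.
    w = max(0.0, 1.0 - step / total_steps)
    imitation = F.mse_loss(policy_actions, mpc_actions)  # behaviour-cloning term toward the MPC teacher
    return ppo_loss + w * imitation

loss = combined_loss(torch.tensor(0.3), torch.randn(8, 2), torch.randn(8, 2), step=100, total_steps=1000)
```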
Crystallization and vitrification are two different routes to form a solid. Normally these two processes suppress each other, with the glass transition preventing crystallization at high density (or low temperature). This is even true for systems of colloidal hard spheres, which are commonly used as building blocks for novel functional materials with potential applications, e.g. photonic crystals. By performing Brownian dynamics simulations of glassy systems consisting of mixtures of active and passive hard spheres, we show that the crystallization of such hard-sphere glasses can be dramatically promoted by doping the system with small amounts of active particles. Surprisingly, even hard-sphere glasses with a packing fraction up to ϕ = 0.635 crystallize, which is around 0.5% below random close packing at ϕ ≈ 0.64. Our results suggest a novel way of fabricating crystalline materials from (colloidal) glasses. This is particularly important for materials that easily get kinetically trapped in glassy states, where crystal nucleation hardly occurs.
This research provides a comprehensive comparison of Model Predictive Control (MPC) and Reinforcement Learning (RL) for greenhouse climate management, specifically for lettuce cultivation, using a unified simulation framework. The study demonstrates that while RL achieves slightly higher crop yields with faster execution, MPC offers more economically efficient control with consistent climate management.
Hypotheses: A quantitative molecular-thermodynamic theory of the growth of giant wormlike micelles of nonionic surfactants can be developed on the basis of a generalized model, which includes the classical 'phase separation' and 'mass action' models as special cases. The generalized model describes spherocylindrical micelles, which are simultaneously multicomponent and polydisperse in size. Theory: By analytical minimization of the free-energy functional, we derived explicit expressions for the chain-extension and chain-end distribution functions in the hydrocarbon core of mixed micelles formed from two surfactants of different chain lengths. Findings: The hydrocarbon core of a two-component micelle is divided into two regions, outer and inner, where the ends of the shorter and longer chains are located. The derived analytical expression for the chain-conformation free energy implies that the mixing of surfactants with different chain lengths is always nonideal and synergistic, i.e., it leads to a decrease of the micellar free energy and to enhancement of micellization and micelle growth. The derived expressions are applicable to surfactants with different headgroups (nonionic, ionic, zwitterionic) and to micelles of different shapes (spherical, wormlike, lamellar). The results can be incorporated into a quantitative theory of the growth of giant mixed micelles in formulations with practical applications in detergency.
Hypotheses: A quantitative molecular-thermodynamic theory of the growth of giant wormlike micelles in mixed nonionic surfactant solutions can be developed on the basis of a generalized model, which includes the classical phase separation and mass action models as special cases. The generalized model describes spherocylindrical micelles, which are simultaneously multicomponent and polydisperse in size. Theory: The model is based on explicit analytical expressions for the four components of the free energy of mixed nonionic micelles: the interfacial-tension, headgroup-steric, and chain-conformation components and the free energy of mixing. The radii of the cylindrical part and the spherical endcaps, as well as the chemical composition of the endcaps, are determined by minimization of the free energy. Findings: In the case of multicomponent micelles, an additional term appears in the expression for the micelle growth parameter (scission free energy), which takes into account the fact that the micelle endcaps and cylindrical part have different compositions. The model accurately predicts the mean mass aggregation number of wormlike micelles in mixed nonionic surfactant solutions without using any adjustable parameters. The endcaps are enriched in the surfactant with the smaller packing parameter, which is better accommodated in regions of higher mean surface curvature. The model can be further extended to mixed solutions of nonionic, ionic and zwitterionic surfactants used in personal-care and household detergency.
Context: Software testability is the degree to which a software system or a unit under test supports its own testing. To predict and improve software testability, a large number of techniques and metrics have been proposed by both practitioners and researchers in the last several decades. Reviewing and getting an overview of the entire state-of-the-art and state-of-the-practice in this area is often challenging for a practitioner or a new researcher. Objective: Our objective is to summarize the body of knowledge in this area and to benefit the readers (both practitioners and researchers) in preparing, measuring and improving software testability. Method: To address the above need, the authors conducted a survey in the form of a systematic literature mapping (classification) to find out what we as a community know about this topic. After compiling an initial pool of 303 papers, and applying a set of inclusion/exclusion criteria, our final pool included 208 papers. Results: The area of software testability has been comprehensively studied by researchers and practitioners. Approaches for measuring and improving testability are the most frequently addressed topics in the papers. The two most often mentioned factors affecting testability are observability and controllability. Common ways to improve testability are testability transformation, improving observability, adding assertions, and improving controllability. Conclusion: This paper serves both researchers and practitioners as an "index" to the vast body of knowledge in the area of testability. The results could help practitioners measure and improve software testability in their projects.
Africa is experiencing extensive biodiversity loss due to rapid changes in the environment, where natural resources constitute the main instrument for socioeconomic development and a mainstay source of livelihoods for an increasing population. A lack of data and information on biodiversity, together with budget constraints and insufficient financial and technical capacity, impedes sound policy design and effective implementation of conservation and management measures. The problem is further exacerbated by the lack of harmonized indicators and databases to assess conservation needs and monitor biodiversity losses. We review challenges with biodiversity data (availability, quality, usability, and database access) as a key limiting factor that impacts funding and governance. We also evaluate the drivers of both ecosystem change and biodiversity loss as a central piece of knowledge for developing and implementing effective policies. While the continent focuses more on the latter, we argue that the two are complementary in shaping restoration and management solutions. We thus underscore the importance of establishing monitoring programs focusing on biodiversity-ecosystem linkages in order to inform evidence-based decisions on ecosystem conservation and restoration in Africa.
Video game development is a complex endeavor, often involving complex software, large organizations, and aggressive release deadlines. Several studies have reported that periods of "crunch time" are prevalent in the video game industry, but there are few studies on the effects of time pressure. We conducted a survey with participants of the Global Game Jam (GGJ), a 48-hour hackathon. Based on 198 responses, the results suggest that: (1) iterative brainstorming is the most popular method for conceptualizing initial requirements; (2) continuous integration, minimum viable product, scope management, version control, and stand-up meetings are frequently applied development practices; (3) regular communication, internal playtesting, and dynamic and proactive planning are the most common quality assurance activities; and (4) familiarity with agile development has a weak correlation with perception of success in GGJ. We conclude that GGJ teams rely on ad hoc approaches to development and face-to-face communication, and recommend some complementary practices with limited overhead. Furthermore, as our findings are similar to recommendations for software startups, we posit that game jams and the startup scene share contextual similarities. Finally, we discuss the drawbacks of systemic "crunch time" and argue that game jam organizers are in a good position to problematize the phenomenon.