SAP Labs China
Agriculture plays a crucial role in the global economy and social stability, and accurate crop yield prediction is essential for rational planting planning and decision-making. This study focuses on time-series prediction of crop yield. Using a dataset that covers multiple crops, multiple regions, and many years, it explores the relationships between crop yield and climatic factors (average rainfall, average temperature) and agricultural inputs (pesticide usage). Multiple hybrid machine learning models, including Linear Regression, Random Forest, Gradient Boost, XGBoost, KNN, Decision Tree, and Bagging Regressor, are adopted for yield prediction. In the evaluation, the Random Forest and Bagging Regressor models predict crop yield with high accuracy and low error. As agricultural data becomes increasingly rich and time-series prediction techniques continue to evolve, the results of this study contribute to advancing the practical application of crop yield prediction in agricultural production management. The integration of time-series analysis allows for more dynamic, data-driven decision-making, enhancing the accuracy and reliability of crop yield forecasts over time.
Performance testing in large-scale database systems like SAP HANA is a crucial yet labor-intensive task, involving extensive manual analysis of thousands of measurements, such as CPU time and elapsed time. Manual maintenance of these metrics is time-consuming and susceptible to human error, making early detection of performance regressions challenging. We address these issues by proposing an automated approach to detect performance regressions in such measurements. Our approach integrates Bayesian inference with the Pruned Exact Linear Time (PELT) algorithm, detecting change points and performance regressions with higher precision and efficiency than previous approaches. Our method minimizes false negatives and ensures the reliability and performance quality of SAP HANA. The proposed solution can accelerate testing and contribute to more sustainable performance management practices in large-scale data management environments.
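As an illustration of the change-point step mentioned in the abstract above, the following is a minimal sketch of plain PELT with a least-squares (mean-shift) segment cost. It does not include the Bayesian inference component the authors describe, and the function name `pelt_mean_shift`, the `penalty` value, and the sample measurements are all assumptions chosen for the example, not taken from the paper.

```python
def pelt_mean_shift(signal, penalty):
    """Return sorted change-point indices found by PELT.

    Assumed cost: sum of squared deviations from the segment mean.
    A change point at index k means a new segment starts at signal[k].
    """
    n = len(signal)
    # Prefix sums let us evaluate any segment cost in O(1).
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(a, b):
        # Squared-error cost of the segment signal[a:b], with b > a.
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    best = [0.0] * (n + 1)   # best[t]: optimal penalized cost of signal[:t]
    best[0] = -penalty       # so that the first segment pays no penalty
    last = [0] * (n + 1)     # last[t]: start of the final segment in that optimum
    candidates = [0]         # segment starts that have not been pruned

    for t in range(1, n + 1):
        scored = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scored)
        # PELT pruning: drop any start s that can never be optimal again.
        candidates = [s for value, s in scored if value - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts.
    change_points = []
    t = n
    while last[t] > 0:
        change_points.append(last[t])
        t = last[t]
    return sorted(change_points)


# Hypothetical elapsed-time measurements (ms) with a regression at index 20.
measurements = [100.0] * 20 + [130.0] * 20
print(pelt_mean_shift(measurements, penalty=50.0))
```

On this synthetic series the sketch should report a single change point at index 20. The penalty controls how large a shift must be before it is worth adding a segment, so in practice it would need tuning against the noise level of the real measurements.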
After a developer submits code, the corresponding test cases are run to ensure the quality of software delivery. Test failures such as crashes, errors, and timeouts can occur during this period. Since it takes time for developers to resolve them, many duplicate failures will happen in the meantime. In the delivery practice of SAP HANA, crash triage is considered the most time-consuming task. If duplicate crash failures can be automatically identified, the degree of automation will be significantly enhanced. To find such duplicates, we propose a training-based mathematical model that utilizes component information of SAP HANA to achieve better crash similarity comparison. We implement our approach in a tool named Knowledge-based Detector (K-Detector), which is validated on 11,208 samples and achieves an AUC of 0.986. Furthermore, we have deployed K-Detector to the production environment, where, according to our statistics, it saves 97% of the human effort in crash triage.
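To make the idea of component-aware crash comparison in the abstract above more concrete, here is a small sketch of one way component information could enter a similarity score. It is not the K-Detector model, which is trained and whose details are not given here. The function `component_similarity`, the `decay` weighting, the frame names, and the frame-to-component mapping are all hypothetical.

```python
def component_similarity(stack_a, stack_b, component_of, decay=0.5):
    """Position-weighted agreement of two stacks at component level, in [0, 1].

    Each frame is mapped to its owning component (assumed mapping), and
    frames nearer the top of the stack receive larger weights.
    """
    comps_a = [component_of.get(frame, "unknown") for frame in stack_a]
    comps_b = [component_of.get(frame, "unknown") for frame in stack_b]
    depth = max(len(comps_a), len(comps_b))
    if depth == 0:
        return 1.0  # two empty stacks are treated as identical

    matched = 0.0
    total = 0.0
    for i in range(depth):
        weight = decay ** i  # top frame has weight 1, then 0.5, 0.25, ...
        total += weight
        both_present = i < len(comps_a) and i < len(comps_b)
        if both_present and comps_a[i] == comps_b[i]:
            matched += weight
    return matched / total


# Hypothetical frame-to-component mapping and crash stacks (top frame first).
component_of = {
    "alloc_page": "memory",
    "free_page": "memory",
    "parse_query": "sql",
    "plan_query": "sql",
    "send_reply": "network",
}
crash_1 = ["alloc_page", "parse_query"]
crash_2 = ["free_page", "plan_query"]   # different functions, same components
crash_3 = ["send_reply", "alloc_page"]  # different components at each depth

print(component_similarity(crash_1, crash_2, component_of))
print(component_similarity(crash_1, crash_3, component_of))
```

Under these assumptions, two crashes whose frames differ by function name but belong to the same components score as duplicates, which is the kind of signal that a pure function-name comparison would miss. A trained model would presumably learn the weights rather than fix them with a decay factor.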