The Earth's climate is constantly changing, with significant consequences for human life and societal development. Over the past 420,000 years, the planet has experienced four glacial (cooling) periods interspersed with warmer interglacial phases. The Holocene, which began around 12,000 years ago, is characterized by relatively mild conditions and includes 13 cycles of warming and cooling.
Today, scientists studying ice cores from Greenland and Antarctica (Lake Vostok) can reconstruct climate data going back hundreds of thousands of years. The Holocene climatic optimum ended approximately 5,500 years ago, after which temperatures declined. Since the mid-19th century, however, the trend has reversed toward warming, driven by rising carbon dioxide levels and human activity.
Climate studies are crucial for assessing environmental conditions and economic factors, especially in agriculture. The current climate warming has become noticeable within just one generation and may affect resources and human survival. Analyzing past temperatures using information technologies enables the creation of reliable climate forecasts.
In recent years, significant progress has been made in processing big data, allowing the use of vast amounts of information for more accurate predictions and filling gaps in observations. The concept of the Internet of Things (IoT) connects devices for data collection, opening new horizons for scientific research.
Victoria Erofeeva, an associate professor at the Department of Environmental Safety and Engineering, Zhanna Zhukova, a senior lecturer at the same department, and a group of students from the Faculty of Cybernetics and Information Security compared methods for processing temperature data from several meteorological stations in Queensland, Australia, covering the historical period up to 2018. Using artificial intelligence, they then ran a control forecast for the following five years and produced a final prediction of temperature change up to 2030.
For the analysis and prediction of climate changes, Queensland, Australia, was chosen for its numerous meteorological stations and long temperature records. Data on average annual temperatures from the stations were compiled into two files: the first file contained actual temperature data for 236 stations over the observation period from 1856 to 2022, while the second file included data from five stations located at different latitudes during the same period.
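The two-file layout described above can be sketched roughly as follows. This is a minimal illustration only: the file names, station names, and column labels are assumptions for demonstration, not taken from the study's actual data.

```python
import pandas as pd

# File 1 (illustrative): annual mean temperatures for many stations, 1856-2022.
# Station names and values here are placeholders, not the study's data.
all_stations = pd.DataFrame({
    "station": ["Brisbane", "Brisbane", "Cairns"],
    "year": [1856, 1857, 1856],
    "mean_temp_c": [20.1, 20.3, 24.8],
})

# File 2 (illustrative): a subset of stations at different latitudes,
# covering the same observation period.
five_stations = all_stations[all_stations["station"].isin(["Brisbane", "Cairns"])]

# Pivot to one column per station, a convenient shape for model input.
wide = all_stations.pivot(index="year", columns="station", values="mean_temp_c")
print(wide.head())
```

Missing years simply appear as gaps (NaN) in the pivoted table, which is one way such datasets expose the observation gaps that big-data methods are used to fill.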
To forecast temperatures from the first file, the team applied k-nearest neighbors (KNN), linear regression, and the seasonal autoregressive integrated moving average (SARIMA) model, without introducing random variation into the forecasts.
For a clear demonstration of the work, nine stations with long observation records were selected. A more detailed comparison of temperature variability from the second file was conducted using the Random Forest Regressor method for the five stations. This method allows for comparing the maximum and minimum predicted temperatures with actual values. The methods were evaluated based on the mean squared error (MSE).
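The random-forest step can be sketched as follows, again on synthetic data with assumed settings rather than the study's files. Because a random forest draws bootstrap samples, leaving its random seed unset makes each run produce slightly different forecasts, which mirrors the "random dispersion" across runs described below; the predicted maximum and minimum over the control window are then compared with the actual values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

# Synthetic annual series standing in for one of the five stations
rng = np.random.default_rng(1)
years = np.arange(1856, 2023).reshape(-1, 1)
actual = 25.0 + 0.004 * (years.ravel() - 1856) + rng.normal(0, 0.4, years.size)

train_x, test_x = years[:-5], years[-5:]
train_y, test_y = actual[:-5], actual[-5:]

# random_state is deliberately left unset: each run resamples the training
# data differently, so repeated runs give slightly different predictions
model = RandomForestRegressor(n_estimators=200)
model.fit(train_x, train_y)
pred = model.predict(test_x)

print(f"MSE: {mean_squared_error(test_y, pred):.3f}")
print(f"predicted max/min: {pred.max():.2f} / {pred.min():.2f}")
print(f"actual    max/min: {test_y.max():.2f} / {test_y.min():.2f}")
```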
"The accuracy of the forecast for the stations from the second file was calculated for two runs, as random dispersion was applied using the random forest method during forecasting. Each run of the program produces new values based on those available in file #2. As a result, predictions were obtained that accounted for random variables, which were different (but not significantly) for each run of the program.
The accuracy was calculated by comparing the predicted temperatures from the two runs with the actual temperatures from the first file. Comparing the various methods showed that the random forest regressor predicts temperature values with an accuracy of no less than 96 percent, while the smallest mean squared error, 0.175, was achieved with the k-nearest neighbors (KNN) method. Based on the random forest regressor, we conducted forecasting for five stations up to 2030," noted Victoria Erofeeva.
During the research, the scientists noted that the accuracy of the forecasts depends on the size of the initial dataset and the number of hyperparameters, such as tree depth in the random forest, learning rate in gradient boosting, regularization coefficient in linear models, number of neighbors in the k-nearest neighbors method, and various metrics used for model evaluation.
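A standard way to tune such hyperparameters is a cross-validated grid search. The sketch below is illustrative only: the data are synthetic and the parameter grids (tree depth for the random forest, number of neighbors for KNN) are assumed values, not those used in the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic regression data standing in for a station's temperature record
rng = np.random.default_rng(2)
x = np.arange(200, dtype=float).reshape(-1, 1)
y = 0.01 * x.ravel() + rng.normal(0, 0.2, 200)

# Tune tree depth for the random forest
rf_search = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    {"max_depth": [2, 5, 10]},
    scoring="neg_mean_squared_error",
    cv=3,
)
rf_search.fit(x, y)

# Tune the number of neighbors for KNN
knn_search = GridSearchCV(
    KNeighborsRegressor(),
    {"n_neighbors": [3, 5, 10]},
    scoring="neg_mean_squared_error",
    cv=3,
)
knn_search.fit(x, y)

print("best RF max_depth:", rf_search.best_params_["max_depth"])
print("best KNN n_neighbors:", knn_search.best_params_["n_neighbors"])
```

The same pattern extends to the other hyperparameters mentioned above, such as the learning rate in gradient boosting or the regularization coefficient in linear models.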
"Comparing temperatures at the five stations for the first file using the random forest regression method showed that the highest maximum and minimum temperatures were predicted at the Weipa and Lockhart Airport stations, while the lowest were at Amberley and Aplthorpe stations," said Zhanna Sergeyevna Zhukova.
An important aspect of the study is the use of machine learning and big data for predicting future temperature regimes, providing a more comprehensive understanding of the complex processes occurring in the atmosphere. The results obtained can serve as a foundation for developing predictive models that account for both global trends and local features of climate dynamics.
More detailed data on future temperature changes can be utilized to enhance agricultural practices, urban planning, and ecological design in the context of climate change.
This material was prepared based on the article "Comparison of Temperature Forecasting Methods Using Data from Queensland, Australia."