Context: We were engaged to (i) assess the capabilities of an existing forecasting model for solar energy generation within Great Britain (GB), and (ii) determine whether the forecasting scheme would benefit from higher-fidelity weather data, which would have cost more to purchase. Predictions from this model fed into downstream processes that optimise the energy mix and drive energy market costs. The existing model used specific features constructed from weather data to reflect the diurnal cycle across GB, and could in principle provide forecasts from hours to several days ahead.
Our Assessment Process: Using historical data, we isolated the days on which the forecast departed significantly from the observed generation and characterised their properties. Through a combination of automated data mining and human analysis, we identified the patterns common to the days on which the algorithm performed poorly. This showed that a main contribution to the prediction error arose from the features selected for the model, and that performance could be improved by introducing a degree of spatial variability into the existing features. A separate analysis assessed the advantage of purchasing higher-fidelity weather data and found that, with the existing features, it would offer little benefit.
We also advised on how the model should be deployed within a re-training pipeline, with recommendations based on MLOps best practices.
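The error-isolation step described above can be illustrated with a minimal sketch. It assumes forecasts and observations are available as (day, forecast, actual) tuples and uses mean absolute error per day as the measure of departure. The record layout, the function names `daily_mae` and `flag_poor_days`, and the `top_fraction` threshold are all hypothetical; the source does not state which error metric or threshold was used in the project.

```python
from collections import defaultdict
from statistics import mean


def daily_mae(records):
    """Mean absolute forecast error per day.

    records: iterable of (day, forecast, actual) tuples (assumed layout).
    """
    errors = defaultdict(list)
    for day, forecast, actual in records:
        errors[day].append(abs(forecast - actual))
    return {day: mean(errs) for day, errs in errors.items()}


def flag_poor_days(records, top_fraction=0.1):
    """Return the days with the largest daily error, worst first.

    top_fraction is an illustrative threshold, not a value from the project.
    """
    mae = daily_mae(records)
    ranked = sorted(mae, key=mae.get, reverse=True)
    n_flagged = max(1, round(len(ranked) * top_fraction))
    return ranked[:n_flagged]


# Toy values for illustration only; these are not project data.
records = [
    ("day-1", 10.0, 11.0),
    ("day-1", 12.0, 12.5),
    ("day-2", 10.0, 18.0),
    ("day-3", 9.0, 9.2),
]
worst_days = flag_poor_days(records, top_fraction=0.34)
```

The flagged days would then be the input to the pattern-finding stage, where weather conditions and feature values on those days are compared against the rest of the history.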
Follow-On Based on Recommendations: Following the recommendations in our report, we were engaged to develop a better solar energy forecasting model using the existing weather data. We constructed the recommended features from the diurnal cycle of each region within the UK (sunrise, sunset and other related features), together with regional weather data from the provided Met Office source. After an extensive model development stage, we selected a gradient boosting regression model, which provided superior, explainable predictions in this context. With the existing Met Office data source, the new model was 33% more accurate than the existing model for one-day-ahead forecasts, and the improvement was significantly larger for seven-day-ahead forecasts. Moreover, we found that this model would benefit from higher-resolution weather data should it be made available.
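As an illustration of what region-level diurnal features might look like, the sketch below derives sunrise, sunset and day length from a region's latitude and the day of the year, using a common textbook approximation for solar declination. This is an assumption for illustration, not the project's actual feature code: it works in local solar time and ignores atmospheric refraction, the equation of time and longitude offsets. The function names and the example latitude are hypothetical.

```python
import math


def solar_declination_deg(day_of_year):
    """Approximate solar declination in degrees (simple cosine model)."""
    return -23.44 * math.cos(2.0 * math.pi / 365.0 * (day_of_year + 10))


def diurnal_features(latitude_deg, day_of_year):
    """Sunrise, sunset (local solar time, hours) and day length for one region."""
    lat = math.radians(latitude_deg)
    decl = math.radians(solar_declination_deg(day_of_year))
    # Cosine of the sunrise hour angle; clamped to handle polar day and night.
    cos_h = max(-1.0, min(1.0, -math.tan(lat) * math.tan(decl)))
    # The Earth rotates 15 degrees per hour, so this is half the day length.
    half_day_hours = math.degrees(math.acos(cos_h)) / 15.0
    return {
        "sunrise_solar_hour": 12.0 - half_day_hours,
        "sunset_solar_hour": 12.0 + half_day_hours,
        "day_length_hours": 2.0 * half_day_hours,
    }


# Example: a region at roughly 51.5 degrees north, near the summer solstice.
midsummer = diurnal_features(51.5, day_of_year=172)
```

Computing these per region, rather than once for the whole country, is one simple way to introduce the spatial variability recommended in the assessment; the resulting columns could then be joined to the regional weather data before model training.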
Timescales and Effort: This project predates AQ, so the model analysis was performed entirely manually. The error assessment used to identify underperforming features took 3 man-months, and the cost-benefit analysis of higher-fidelity data took 1 man-month. Within AQ this process is semi-automated, which reduces the human effort significantly: for a similar problem we would estimate 1 man-month of effort for the feature analysis. The subsequent follow-on model development required 10 man-months of effort.