University of Kassel WES MScThesis Sargin - MCP Methodology for a Digital Wind Buoy

Master's Thesis in MSc. Wind Energy Systems

Analysis and Method Selection of a Measure-Correlate-Predict Methodology for a Digital Wind Buoy

January 2022

University of Kassel
Department: Mechanics and Dynamics¹
Degree programme: ONLINE M.SC. WIND ENERGY SYSTEMS – WES.ONLINE
First examiner: Dr. Julia Gottschall
Second examiner: Prof. Dr.-Ing. Detlef Kuhl
Starting date: 13 July 2021
Date of submission: 13 January 2022
Ahmet Okan Sargin
Status of confidentiality: x Public / Internal / Confidential

¹ Fachgebiet: FB 14 Bauingenieur- und Umweltingenieurwesen, Institut für Baustatik und Baudynamik, Baumechanik/Baudynamik

Abstract

The determination of site-specific wind conditions has a significant influence on the development and use of offshore wind energy. Lower uncertainty in the wind potential results in more cost-effective project financing. Floating lidar systems (FLS), or wind lidar buoys, have become increasingly common in recent years as a measuring technology for determining the offshore wind resource. However, owing to harsh offshore environmental conditions, offshore measurements with FLS are prone to reliability issues, which may result in lower data availability than industry guidelines require. FLS are hard to reach in winter, during high-wind periods with higher wave heights, so it is not unusual for several months of FLS data to be unavailable for an MCP process. Motivated by this problem, this work used a measure-correlate-predict (MCP) method to determine whether an interim gap-filling step is required as part of a long-term correction procedure. The performance of a data-filling algorithm based on omnidirectional linear least squares was analyzed in depth at an hourly temporal resolution.
The deviations introduced by incrementally sliding gaps, in scenarios ranging from 1-day to 60-day gaps, were investigated, and KPIs including the MBE, MAE, and RMSE of mean wind speeds over the concurrent periods were summarized. The model performance was assessed for both the training (SelfDF) and validation (ValDF) periods. The long-term wind speeds were derived for each iteration with and without the data-filling algorithm. A strong negative association between the SelfDF and ValDF root mean square errors of mean wind speed was identified for all gap scenarios. This novel relationship (ISPE method) was used to determine the uncertainties in data filling. The jackknife algorithm was deployed to assess the uncertainties in the long-term correction for both scenarios. One of the study's main questions was whether a short-term data-filling phase is required before applying the long-term correction. Both scenarios yielded identical long-term wind speed predictions, negating this requirement for the considered MCP method. This was primarily due to the omnidirectional regression parameters and the reduced impact of the proportion of gaps on the model fit. The study reaffirmed the industry recommendation of 80% minimum availability for measurement campaign data as a reliable threshold, since the mean deviation during 60-day gap periods was no more than 0.3% throughout the investigated iterations.

Acknowledgements

I received much help, support, understanding, and empathy while working on this master's thesis. I would like to sincerely thank my supervisor, Dr Julia Gottschall, for her invaluable advice in developing the research questions and methodology. I really appreciated the insightful feedback, which always reminded me of the big picture while correctly identifying potential issues in the assessment and keeping me heading in the right direction.
I want to express my appreciation to Dr Andre Bisevic, Mr Oppermann from Unikims and the rest of the WES team for always being available to answer questions about the online wind energy systems (WES) programme over the years. My heartfelt gratitude also goes to Prof. Kuhl, who established such a magnificent concept, which allowed us to develop and keep up with the new world without compromising our personal and professional responsibilities.

I am grateful and honoured to have had the opportunity to work with such outstanding colleagues throughout my professional career. Thank you, Bungo, for introducing me to the wind industry, and thank you, Michael and Iain, for being there when I was learning the subtle nuances. Further, I would like to express my gratitude to Wilhelm, Wolfgang, Anna, and the OWC colleagues for their patience and support throughout the master's thesis process.

My warmest gratitude goes to the valuable wind resource analysts and specialists who took part in the questionnaire and provided helpful feedback. I hope that this research will be beneficial to them.

I am indebted for the opportunity and possibilities to live in a free social democracy dedicated to human rights, peace, and justice and built on scientific principles. Society and our broader network are fundamental components of who and what we are, no matter where or when we live.

My dearest friends may have endured even more than I have, and they deserve to be appreciated generously. Thank you for being there, Nicole and Friedrich, even while I was swamped with work and studies. My soul-brother Kivanc was my mentor and brain trust during the whole process. Many others, alongside myself, genuinely value his remarkable positivity, productivity, and heartfelt compassion. My gratitude also goes out to pg.lost for making the study time enjoyable.

My mother and father, whose unconditional support and love have sustained me throughout my life, deserve enormous appreciation.
I am privileged and inexpressibly thankful for their presence.

Asuka, you invested even more energy in this study than I did. Thank you for your love, understanding, patience and support. Without you, nothing would be possible.

Table of contents

Abstract
Acknowledgements
Table of contents
List of figures
List of tables
Acronyms and abbreviations
List of notation
1 Introduction
1.1 Research questions
1.2 Literature review and questionnaire
1.3 Methodology overview
2 Methods and materials
2.1 Wind resource assessment
2.2 Statistical methods
2.2.1 Definition of uncertainty
2.2.2 Definition of type A and type B uncertainties
2.2.3 The mean
2.2.4 Variance and standard deviation
2.2.5 Covariance and correlation coefficient
2.2.6 Coefficient of determination
2.2.7 Mean bias, absolute bias and root mean square errors
2.2.8 Standard error
2.2.9 Kolmogorov-Smirnov statistic
2.2.10 Normal distribution
2.2.11 Weibull distribution
2.3 Review of MCP methods in wind resource assessments
2.3.1 Linear regression methods
2.3.2 Bin methods
2.3.3 Matrix methods
2.3.4 Novel computational methods
2.3.5 Quantile mapping methods
2.3.6 Empirical methods
2.4 Definition of the measure-correlate-predict (MCP) algorithms
2.4.1 Type classification of MCP
2.4.2 Definition of an algorithm
2.4.3 Classification of MCP methods
2.5 Questionnaire results
2.6 Definition of the key performance indicators and uncertainties
2.6.1 The interface of the KPIs to the uncertainty method
2.6.2 Uncertainties in the long-term correction
2.6.3 MCP method uncertainty
2.7 Selection of the base-algorithm
2.8 Design of the code for iterative analysis
2.9 Datasets
2.9.1 Selection of the measurement dataset
2.9.2 Selection of the long-term reference dataset
2.9.3 Measurement campaign overview
2.9.4 Pre-processing and data preparation
3 Results
3.1 Evaluation of the MCP algorithms
3.2 Evaluation of the base-case algorithm results
3.2.1 Key performance indicators during the process
3.2.2 Data filling results
3.2.3 Long term correction results
3.3 Evaluation of the DF and LTC uncertainties
3.4 Proposed combined MCP uncertainty method
4 Discussion and conclusions
5 References
Annex A Questionnaire
Annex B PreDF - Sectorwise exemplary results of the concurrent period
Annex C KPI Results
Annex D Evolution of self-prediction RMSE of MWS results
Annex E Evolution of validation MBE of MWS results
Annex F Evolution of validation MAE of MWS results
Annex G Evolution of validation RMSE of MWS results
Annex H Regression plots of self-prediction and validation RMSE
Annex I MMIJ transfer functions to obtain data-filling uncertainties in a representative location
Annex J Evolution of DFWS
Annex K Evolution of LTWS
Annex L Evolution of DF uncertainties
Annex M Evolution of JK uncertainties
Annex N Evolution of final uncertainties in LTWS

List of figures

Figure 1-1. Scheme of the measure-correlate-predict (MCP) procedure
Figure 1-2. Flow chart of the methodology
Figure 2-1. Illustration of KS test
Figure 2-2. Block diagram of a typical MCP
Figure 2-3. Illustration of the TLS method
Figure 2-4. Minimization of errors in LLS (left) and TLS (right) with respect to model fit
Figure 2-5. Model fits [57] based on measurements (left) and representative algorithm points (right)
Figure 2-6. VS method (left) and NL-MoM (right) with bins and resulting piecewise linear fits
Figure 2-7. Sample data and first-order model for the wind speed-up
Figure 2-8. Flowchart of the matrix method
Figure 2-9. Polynomial model used within matrix method, samples (left) and polynomial model (right)
Figure 2-10. Schematic diagram of an ANN with 2N wind speed and wind direction input signals of N reference stations and two wind data output signals of the target station
Figure 2-11. Definition of an MCP algorithm at the example of linear regression
Figure 2-12. Mind map of energy production uncertainty according to the draft IEC 61400-15
Figure 2-13. Selection of number of subsets based on concurrent period
Figure 2-14. Sketch of the difference between JK and bootstrap resampling
Figure 2-15. Mapping of sub-uncertainty components
Figure 2-16. Flowchart of evaluation of MCP uncertainty sub-uncertainty components at the example of industry practice [18] and TG6 technical guideline [35], with "correlation" uncertainty shown in amber as target of this study
Figure 2-17. Flow chart of the code
Figure 2-18. Picture of the MMIJ station
Figure 2-19. Weibull fit and histogram of MMIJ measurements in 2013 (left: ECN analysis, right: Fraunhofer IWES dataset)
Figure 2-20. Time synchronisation
Figure 2-21. Annual trend analysis and comparison of reference datasets for the selected long-term period 2000-2018
Figure 3-1. MBE, MAE and DE results of the investigated MCP methods
Figure 3-2. Comparison of LTWS with different MCP methods
Figure 3-3. PreDF – Heatmap of measured Weibull scale and shape factors for 1-day (left) and 60-days gap scenarios (right) in each column, respectively
Figure 3-4. PreDF – Heatmap of R² values of sectorwise hourly wind speeds correlation for 1-day (left) and 60-days gap scenarios (right)
Figure 3-5. PreDF – Box plot of R² values of sectorwise hourly wind speeds correlation for 1-day
Figure 3-6. PreDF – Box plot of R² values of sectorwise hourly wind speeds correlation for 60-days gap
Figure 3-7. PreDF – Heatmap of MBE of Weibull shape (left) and scale (right) factors for all iterations and gap scenarios – weighted from sectorwise analysis
Figure 3-8. SelfDF – Boxplot of R² values of sectorwise hourly wind speeds correlation for 1-day scenario
Figure 3-9. SelfDF – Boxplot of R² values of sectorwise hourly wind speeds correlation for 60-day scenario
Figure 3-10. SelfDF – 3D evolution of RMSE of MWS for all sectors and gaps
Figure 3-11. SelfDF – Heatmap of MBE of Weibull shape (left) and scale (right) factors for all iterations and gap scenarios – weighted from sectorwise analysis
Figure 3-12. SelfDF – Evolution of RMSE of MWS for 1-day (top) and 60-days (bottom) scenarios – omnidirectional analysis
Figure 3-13. SelfDF – Heatmap of RMSE of MWS for all iterations and gap scenarios – omnidirectional analysis
Figure 3-14. ValDF – Evolution of MBE of MWS for 1-day (top) and 60-days (bottom) scenarios
Figure 3-15. ValDF – Evolution of MAE of MWS for 1-day (top) and 60-days (bottom) scenarios
Figure 3-16. ValDF – Evolution of RMSE of MWS for 1-day (top) and 60-days (bottom) scenarios
Figure 3-17. Regression plots of self-prediction and validation RMSE for 1-day (top) and 60-days (bottom) scenarios
Figure 3-18. Evolution of MBE of observed vs predicted wind speeds for 60-days gap period
Figure 3-19. Time series of observed vs predicted wind speeds for 60-days gap period starting on 01.07.2012
Figure 3-20. Scatter plot of observed vs predicted wind speeds for 60-days gap period starting on 01.07.2012
Figure 3-21. Comparison of wind direction frequency of observed vs predicted wind speeds for 60-days gap period starting on 01.07.2012
Figure 3-22. Evolution of STDF-WS for 1-day (top) and 60-days (bottom) gap scenarios
Figure 3-23. Evolution of LTWS without DF and LTWS with DF for 1-day (top) and 60-days (bottom) gap scenarios
Figure 3-24. Scatter plot of DF predicted vs LTC predicted wind speeds for 60-days gap period starting on 01.07.2012
Figure 3-25. Concurrent measured and referenced monthly wind speeds during short-term period
Figure 3-26. Monthly windiness comparison of the short and long-term period
Figure 3-27. Measured wind frequency roses, measurement period 2013 (top left), measurement period 2014 (top right), measurement period 2015 (bottom left), long-term reference period (bottom right)
Figure 3-28. Evolution of DF uncertainties for 1-day and 60 days gap scenarios
Figure 3-29. Evolution of JK uncertainties in LT correction for 1-day and 60 days gap scenarios
Figure 3-30. Evolution of combined uncertainties in LT correction for 1-day and 60 days gap scenarios
Figure 3-31. Comparison of empirical and calculated uncertainties in wind speeds for 60 days gap period starting on 01.07.2012
Figure 3-32. Comparison of bootstrap and calculated uncertainties in wind speeds for 60-days gap period starting on 01.07.2012

List of tables

Table 2-1. Statistical characteristics of the wind [25]
Table 2-2. ANN settings at the example of regression methodologies used for the MCP methodology
Table 2-3. SVR settings at the example of regression methodologies used for the MCP methodology
Table 2-4. Classification of MCP methods according to Hanslian
Table 2-5. MCP method 1: Properties of linear regression methods
Table 2-6. MCP method 2: Properties of bin methods
Table 2-7. MCP method 2: Properties of matrix methods
Table 2-8. MCP method 3: Properties of novel computational methods
Table 2-9. MCP Method 4: Properties of quantile mapping methods
Table 2-10. MCP Method 5: Properties of empirical methods
Table 2-11. Summarized survey response to the question regarding KPI metrics
Table 2-12. Definition of KPIs
Table 2-13. PreDF KPI
Table 2-14. SelfDF KPI
Table 2-15. ValDF KPI for the gap
Table 2-16. Uncertainty estimators in the area of long-term corrections method uncertainty
Table 2-17. MCP algorithms for implementation of linear regression (LinReg)
Table 2-18. MCP algorithms for implementation of other methods
Table 2-19. Relationships and datasets for data filling at the example of data segments
Table 2-20. Reference relationships for the KPI classification
Table 2-21. MMIJ Instrumentation
Table 2-22. MMIJ short-term statistics
Table 2-23. Reference dataset statistics for the concurrent and long-term periods
Table 3-1. Coefficients of variation of considered MCP methods
Table 3-2. PreDF - Summary statistics of RMSE of MWS for 1-day and 60-days gap scenarios
Table 3-3. PreDF - Summary statistics of KS of MWS for 1-day and 60-days gap scenarios
Table 3-4. SelfDF - Summary statistics of RMSE of MWS for 1-day and 60-days gap scenarios
Table 3-5. Summary statistics of ValDF for all gap periods
Table 3-6. LLS model parameter of validation period for 60-days gap period (start at 01.07.2012)
Table 3-7. LLS model parameter of LTC for 1-day, 20-days and 60-days scenarios
Table 3-8. Sectorwise LLS model parameter – full measurement period

Acronyms and abbreviations

ANN: Artificial neural networks
CAPEX: Capital expenditure
CDF: Cumulative distribution function
DE: Distribution error
DF: Data-filling
DFWS: Data-filled short-term wind speeds
DT: Decision trees
ECMWF: European Centre for Medium-Range Weather Forecasts
ERA5: ECMWF Reanalysis 5th Generation
FLS: Floating lidar systems
ISo1: Industry software 1 - Windographer
ISo2: Industry software 2 - Windfarmer
ISo3: Industry software 3 - WindPRO
JK: Jack-knife
JPD: Joint probability distribution
KPI: Key performance indicator
KS: Kolmogorov–Smirnov
LLS: Linear least squares
LTMOMM: Long-term mean of monthly means
LTWS: Long-term wind speed
MCP: Measure-Correlate-Predict
MCPs: Multilayer perceptrons
MEASNET: Measuring Network of Wind Energy Institutes
ML: Machine learning
MTS: Matrix time series
NL-MoM: Non-Linear method of moments
OLS: Ordinary least squares
PCA: Principal component analysis
PreDF: Prerequisites for data filling
SelfDF: Self-predictions for data-filling
SVR: Support vector regression
TLS: Total least squares
ValDF: Validation for data-filling
VM: Variance method
VS: Vertical slice
WAsP: Wind Atlas Analysis and Application Program
WD: Wind direction
WDD: Wind direction deviation
WPD: Wind power density
WRA: Wind resource assessment
WS: Wind speed
WTG: Wind turbine generator

List of notation

: predicted mean
$x$: independent variable
$y_i$: observed (measured) value
$\hat{y}_i$: predicted value
$\bar{x}$: sample mean
$s_x^2$: sample variance
$s_x$: standard deviation
e: the random variable from the triangular distribution corresponding to the standard deviation of the ratios
r: the average of wind speed ratios at the target site

1 Introduction

The recent coalition agreement by the government laid out specific targets for renewables, aiming to meet 80% of electricity demand with renewable energies. Part of this plan envisions substantial investments in offshore wind, targeting 30 GW by 2030 and ramping up to 70 GW by 2045 [1].

As wind is the fuel of wind energy projects, the assessment of the wind resource for offshore projects plays a fundamental role in project financing. The stakeholders in the industry have therefore established best practices and standards. One of the key institutions is a group of commercial institutes named the Measuring Network of Wind Energy Institutes (MEASNET), which aims to standardise wind energy measuring processes so that findings may be recognised and used interchangeably. The MEASNET guideline "Evaluation of Site-Specific Wind Conditions" has established the methodology and standards for a site assessment approach that will result in well-founded outcomes using state-of-the-art techniques and procedures [2].
The guideline prescribes a clear requirement for site-specific wind measurements as input to wind resource and energy yield assessments.

Floating lidar technology was first launched in 2009 as an offshore wind measuring technology aimed at the wind industry's particular demands for wind resource assessment applications. Floating lidar systems (FLS), or wind lidar buoys, have since become increasingly common as a measuring technology for determining the offshore wind resource. They replace wind measuring masts with comparable accuracy at significantly lower cost and with shorter mobilization, saving a large portion of the project's initial capital expenditure (CAPEX) [3].

As the FLS technology matured and several commercial deployments were made, it became apparent that the post-processed data of FLS measurement campaigns exhibited data gaps [4]. The typical way to handle data gaps in an onshore campaign is to synthesize the missing data from another anemometer on the same mast, or to use a correlation analysis with nearby measurement masts. In the offshore environment, an FLS failure results in the unit's complete downtime, making an intra-FLS correlation impossible. Therefore, other methods for dealing with data gaps within the measurement period are being investigated [4].

This study aims to investigate the impact of data gaps on the long-term wind speeds as part of the "Digital Wind Buoy" project, which was started by Fraunhofer IWES to develop procedures that address the limitation posed by the above-mentioned data gaps. Within the scope of this project, methods will be analyzed, evaluated and developed to synthesize and extend measurement data to long-term periods by means of numerical models [5].
Further, as the stakeholders in the offshore wind industry are looking for ways to minimise the uncertainties in the energy yield prediction, the impact of data gaps on the uncertainty has been investigated. In the subsequent subsections, the research questions, a literature review, the feedback from stakeholders and the design of the analysis are presented.

1.1 Research questions

FLS are the de-facto standard measurement technology for offshore wind resource assessment. However, the lower post-processed system availability of FLS compared to offshore platforms requires that the uncertainty and bias introduced by a data gap be understood in a quantifiable manner [3]. Gap filling of meteorological time series is needed for various applications that require continuous data series, such as time series analysis and meteorological and climatological modelling [6].

Motivated by the industry problem stated in the previous paragraph, Fraunhofer IWES looked at the effect of data gaps in terms of bias in estimating siting parameters and at how to mitigate it by correlating and filling the gaps with data from mesoscale models [4]. The authors of [4] have shown that the influence of gaps grows steadily with gap length during the measurement periods.

On both short and extended time periods, wind speed (WS) is subject to irregular variation with a wide variety of time scales superimposed on each other [7]. Current procedures require at least one year (defined as short-term) of measured wind data at the location of interest to provide a meaningful wind resource evaluation [2]. One year, however, is not sufficient to capture the variation of the wind characteristics from year to year. Measure-Correlate-Predict (MCP) methods or a form of long-term scaling approach must be used to estimate long-term wind conditions based on the short-term measurement at the measurement location (target) [2]. The MCP methodology is summarized in Figure 1-1 as defined by MEASNET.
Figure 1-1. Scheme of the measure-correlate-predict (MCP) procedure. Source: [2]

In the same guideline, MEASNET characterises an MCP methodology as suitable for both long-term correction and data filling of the wind speed time series. Similarly, Fraunhofer IWES applied the MCP procedure to test the impact of data gaps in its recent study [4]. Therefore, as an initial step, this study investigated the MCP methods that are broadly used within the industry, to understand why stakeholders use the state-of-the-art techniques rather than alternatives. This is further extended by discussing whether the industry can learn from this experience and short-list methods with good prospects.

By investigating the best method for data filling and its applicability to a wind resource assessment, this study aims to identify a suitable method to fill the data gaps of an offshore measurement. The impact of a data gap on the robustness of the wind resource assessment is a key criterion. Hence, an indication of the maximum tolerable duration of a data gap is considered a very valuable outcome of the study. The basic functions of the different methods should be laid out briefly to ensure proper application, and an appropriate method should be applied for the analysis.

In order to evaluate the uncertainty of the findings, users of an MCP method must have long-term data at each location from which to draw conclusions. There is a resulting concern about whether it is possible to assess the MCP prediction uncertainty using just the long-term reference site data and the shorter-term concurrent data at the target site [8]. This problem could be investigated by recording key performance indicators (KPIs) for the different analysis steps. The definition of KPIs to identify the most appropriate data filling method is a challenge.
What would be the minimum acceptable criteria (key parameter) to perform the operations of the selected method? Is the uncertainty calculation a proxy to define the "best method"? Or should the selected uncertainty method be applicable to the data-filling (DF) process and not to the "long-term correction"? How should the uncertainties of step 1 (gap filling) and step 2 (long-term correction) be combined, as they might be dependent on each other?

This study proceeds with the primary question of whether an interim step of data filling is necessary before the application of the long-term correction. A tangible method to define the uncertainty of the analysis has been investigated.

Typically, the expected end result of a gap-filling and long-term correction operation constrains the type of analysis. As the wind industry moves towards energy time series as a key deliverable, the investigated method should ideally be suitable to deliver such output. Other final deliverables are a Weibull statistic or a sector-wise wind frequency distribution.

Finally, the best combination of methods (sequence) to conduct a data-filling (sometimes also referred to as data synthesis or gap-filling) and long-term correction exercise is investigated as the final step.

The first research question has been approached in two parts. The first part consisted of a literature review on existing MCP methods, and the second part of a stakeholder questionnaire. The results of the literature review and the questionnaire informed the decision about the methodology.

A tangible outcome of this study is to inform the reader about the overall maximum acceptable gap duration in a year for an offshore measurement campaign that still allows a robust wind resource assessment. It is noted that secondary investigations can be carried out to confirm the robustness of the gap-filling process.
Environmental variables could be investigated within such an analysis, which could explore the relationship between reference and target data in the best way possible to account for different weather conditions. There could be a situation where a certain "outlier" weather condition is present within the concurrent data that is not representative of the expected long-term climate. Further consideration of specific environmental conditions might therefore be relevant for the procedure, as certain weather events could skew the results. As the frequency of extreme weather events is expected to increase due to climate change, one should also consider whether this is likely to influence the investigated method beyond the fact that such extreme events are likely to increase the data gaps in commercial floating measurement campaigns.

1.2 Literature review and questionnaire

A review of the literature on MCP methods and the use of data-filling of gaps was conducted to inform and structure the rest of the master's thesis. The literature review included industry publications, white papers, user manuals of industry-standard software, published books and peer-reviewed journal articles. Cross-references from well-known studies such as Carta [9] were helpful to obtain more information on the research topics. Further, a stakeholder questionnaire was designed and distributed to key industry experts to collect feedback on the research questions and suitable methods.

As the long-term wind climate properties are needed in wind resource and energy yield assessments, and as obtaining complete time series data over the whole historical period is typically not possible, the purpose of any long-term wind correction is to derive a statistical representation of the expected long-term climate or an equivalent time series.
The initial MCP methods were introduced in the 1940s to estimate the long-term mean annual wind speed based on a single reference station [9]. The relationship between the reference and observed datasets can be mathematically defined as a transfer model. According to [10], there are at least four main kinds of transfer models:

1. Models that represent the physics of the wind flow (e.g., CFD flow models)
2. Statistical models
3. Empirical models
4. Other (combinations of the above, such as the Wind Atlas Analysis and Application Program (WAsP))

MCP models may fall into any of these categories or a combination of them, which shows that MCP models can be used in a broad range of situations [10].

According to Addison [11], MCP techniques often give a better degree of accuracy than physical modelling methods, particularly in complex terrain. Physical models like CFD or WAsP might also introduce unquantifiable uncertainty into the prediction process. As a result of these advantages, MCP techniques have become a frequent tool for wind farm developers and have been integrated into wind energy software packages [9].

The statistical MCP methods and corresponding correlation techniques introduced by Derrick [12], Mortimer [13], Taylor [14], Bechrakis [15], and Rogers [8] were investigated [16]. These methods are introduced and discussed in Section 2.3 alongside selected empirical methods.

It is noted that MCP methods are typically used to estimate the magnitude of the long-term wind speed at the target location but not the wind direction (WD). As Mifsud notes, nothing in the literature on MCP approaches specifically addresses stand-alone wind direction prediction at a target location [17].
As discussed in Section 2.4, the wind direction measurements are typically used as a classifier to divide the wind speeds into different bins or sectors, which are further processed in the respective algorithms.

1.3 Methodology overview

Following the literature review and stakeholder questionnaire, the different MCP methods are discussed, followed by the preparation of the target and reference datasets for MCP to select a suitable method for a gap-iteration algorithm. The complete methodology applied within this study is presented in Figure 1-2.

In line with the standard industry convention, the data-filling and long-term correction methods used in this study do not replace the observed (measured) time series but instead extend the existing observed dataset to the long term [18].

The long-term correction of the entire measured period was conducted repeatedly with the industry-standard engineering software Windographer and WindPRO, equating to a total of 43 MCP runs. Further, a performance test algorithm was run within the Windographer software to compare the performance of the available MCP methods. Based on the sensitivity analysis of the final long-term wind speeds (LTWS) and the results of the performance test, the omnidirectional linear regression method, with a least-squares model fit with offset, was identified as a suitable solution for the iterative analysis. It is noted at this stage that the sectorwise results were analyzed in parallel during the concurrent period to gain confidence in the MCP algorithm's performance and to collate the KPI metrics.

The KPIs were evaluated in three groups. The first group is "PreDF", the acronym for "prerequisites for data filling". PreDF looks at the relationship between the reference and observed datasets in order to judge whether the reference dataset is suitable for the MCP application.
The PreDF KPI also gives a benchmark for the performance of the MCP, as it does not involve any model and only compares the two independent datasets over the concurrent short-term measurement period.

A common analysis option for MCP performance is to test the result of predictions against a known result, which is referred to as "self-prediction" [19]. Typically, this method is used to slice sufficiently long measured data into chunks and to compare the prediction results of these chunks with the measured data [20]. It is noted that the term "self-prediction" is used in a slightly different context here, where the training period and the predicted period are identical. In statistics, the difference between model output and observed value is sometimes referred to as a residual, which defines the accuracy of the model through the prediction error [21]. It is noted that in this study, the term "residual" is reserved for a random error introduced by the model, as discussed in [22].

The second group of KPIs was obtained by evaluating the model performance over the concurrent periods. The linear fit obtained from the correlation between the measured and reference datasets for the concurrent period is used to obtain the self-predictions for the same period. The results are compared with the measured data. The KPIs defined from this comparison are referred to as the self-predictions for data-filling (SelfDF).

The third KPI group is gathered from the analysis of the gap periods, as these provide the "true" performance of the MCP data-filling procedure. The focus was mainly on the mean wind speed. The relationship between the measured period and the gap period was investigated as well to gain an understanding of the related uncertainties. The signifier for this KPI group is ValDF, standing for "validation for data-filling".
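The three KPI groups described above can be illustrated with a minimal sketch. It is not the thesis code: the synthetic hourly data, the regression coefficients, the noise level, the position and length of the artificial gap, and the names `rmse`, `pre_df_rmse`, `self_df_rmse`, `val_df_rmse` and `val_df_mws_bias` are all assumptions made for illustration. The point-wise RMSE used here may also differ from the exact KPI definitions applied later in the study.

```python
import numpy as np

def rmse(predicted, measured):
    """Root mean square error between two equally long series."""
    return float(np.sqrt(np.mean((predicted - measured) ** 2)))

# Synthetic hourly data (assumption: one year, invented coefficients and noise).
rng = np.random.default_rng(0)
n_hours = 24 * 365
reference = 10.0 * rng.weibull(2.0, n_hours)   # reference wind speed [m/s]
measured = 1.05 * reference + 0.3 + rng.normal(0.0, 0.8, n_hours)

# Artificial 30-day gap; everything outside the gap is the training period.
gap = np.zeros(n_hours, dtype=bool)
gap[24 * 100 : 24 * 130] = True
train = ~gap

# PreDF: compare reference and measured data directly, no model involved.
pre_df_rmse = rmse(reference[train], measured[train])

# Omnidirectional least-squares fit y = m*x + b on the training period.
m, b = np.polyfit(reference[train], measured[train], deg=1)

# SelfDF: predict the training period itself and compare with the measurements.
self_df_rmse = rmse(m * reference[train] + b, measured[train])

# ValDF: predict the withheld gap and compare with the withheld measurements.
val_pred = m * reference[gap] + b
val_df_rmse = rmse(val_pred, measured[gap])
val_df_mws_bias = float(np.mean(val_pred) - np.mean(measured[gap]))
```

Because the least-squares fit minimises the squared error over the training period, the SelfDF RMSE in this sketch cannot exceed the PreDF RMSE, whereas the ValDF values are only known because the gap was cut from data that had actually been measured.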
Different gap periods from one day up to sixty days were investigated to find a quantifiable metric to forecast the performance of the data-filling and long-term correction algorithm with the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis 5th Generation (ERA5) as the reference dataset. For each gap period, the gap was cut from the combined dataset, which produced a measured period with an artificial gap. A sectorwise linear least-squares regression between the measured and reference hourly wind speed values was first applied within this training period (measured period with a gap) to confirm the applicability of the linear regression model. In the subsequent run, an omnidirectional linear regression model was used due to computational limitations. This model fit was used both to obtain the self-prediction performance and to predict the wind speeds in the introduced artificial gap.

The performance of a measure-correlate-predict (MCP) algorithm for data-filling with linear least squares was analysed in detail using two years of the IJmuiden met mast (MMIJ) measurements (see Section 2.9.3), both with and without a data-filling process. A temporal resolution of one hour was selected for the correlations and the model.

The comparison of the root mean square errors (RMSE) of the mean wind speed (MWS) of the self-prediction and validation periods shows a strong negative correlation for the investigated periods, obtained from the metrics of the incremental gaps within the calculated gap period. The function describing this relationship was used as a proxy to assess the quality of the prediction (following the MCP approach). This process is used to calculate the uncertainty in the data-filling of the gaps. The average of the data-filled short-term period is referred to as the data-filled short-term wind speed (DFWS).

The analysis then proceeded by obtaining the long-term wind speeds in a new set of iterations in two loops.
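As a rough illustration of the two loops, the gap windows can be enumerated as in the sketch below. The function name `sliding_gaps`, the zero-based day index and the choice to stop shifting once the gap reaches the end of the measurement period are assumptions for illustration; the thesis does not state how the boundaries were handled.

```python
def sliding_gaps(n_days, max_gap_days):
    """Yield (start_day, gap_days) for every sliding-gap scenario.

    Outer loop: gap duration from 1 day up to max_gap_days.
    Inner loop: gap start shifted by one day through the measurement period.
    """
    for gap_days in range(1, max_gap_days + 1):
        for start_day in range(0, n_days - gap_days + 1):
            yield start_day, gap_days

# Small hypothetical example: a 10-day measurement period, gaps of 1 to 3 days.
windows = list(sliding_gaps(n_days=10, max_gap_days=3))
```

Each yielded pair would correspond to one iteration in which the gap is cut from the measured series before the regression is fitted.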
Within the inner loop, the single-day gap was moved through the measurement period by shifting the start time by one day. The outer loop increased the gap duration incrementally by one day, starting with one day up to a total of 60 days.

The LTWS was calculated in two scenarios for each iteration, with and without the data-filling procedure. In the first scenario, the regression model was used to fill the gap. Subsequently, the gap-filled dataset was used as if it were a measured time series, and a new regression model was obtained for the long-term correction. This relationship was used to calculate the final LTWS for the first scenario.

The second scenario was designed to obtain the LTWS without the data-filling procedure. The regression model was obtained from the relationship between the measured period, including the gap, and the concurrent reference time series. The LTWS was then calculated using the same method.

The uncertainties in the long-term correction were calculated using a jackknife (JK) algorithm [8] with four subsets for each iteration. The results of the LTWS and the uncertainties are compared to derive the conclusions. Subsequently, the final uncertainty in the MCP method was obtained by combining the uncertainties in data-filling and long-term correction. The shortcomings and future work are also discussed.

Figure 1-2. Flow chart of the methodology. Source: Author's own illustration

2 Methods and materials

The statistical methods commonly used in MCP procedures are introduced within this section, and a brief introductory paragraph about wind resource assessment is given.
The review of MCP methods summarises the available MCP algorithms based on a literature review. After that, the reviewed methods are grouped into classes to gain an overview of their applicability for the purpose of the study. Key performance indicators are introduced and discussed, followed by uncertainty assessment methods. Finally, the selection of the base-case algorithm for the iterative gap analysis is introduced. The code design based on the base-case algorithm and the datasets used are described in the last sections.

2.1 Wind resource assessment

Wind resource assessment (WRA) is the discipline of determining the long-term wind climate and the expected seasonal, diurnal, spatial and temporal variation at a proposed renewable energy project location. The outcome of a wind resource assessment typically includes long-term representative wind conditions at the hub height of a wind turbine generator (WTG), and sometimes across the rotor plane. Following flow modelling based on the wind climate statistics, the energy yield is modelled at a project location using WTG-specific power and thrust curves as well as project-specific loss and uncertainty estimations. This information is used as input to a financial model to calculate the financial performance of the wind project. As a result, WRA is the most important activity in determining the feasibility of a wind energy project [23].

The remainder of this subsection recalls the link between the kinetic energy of the wind, the rotor radius and the wind speed, to emphasize that a change in accuracy in the per mille range, as well as a change in the uncertainty estimates, has a high impact on the financial model of offshore projects. Wind power is proportional to the cube of the wind speed.
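The consequence of this cubic dependence can be shown with a short numerical sketch. The function name `wind_power_density`, the sea-level standard air density of 1.225 kg/m³ and the example wind speeds are assumptions chosen for illustration only.

```python
def wind_power_density(v, rho=1.225):
    """Wind power density 0.5 * rho * v**3 in W/m^2.

    rho: air density in kg/m^3 (assumed sea-level standard atmosphere).
    v:   wind speed in m/s.
    """
    return 0.5 * rho * v ** 3

base = wind_power_density(10.0)       # reference case at 10 m/s
shifted = wind_power_density(10.1)    # wind speed 1 % higher
relative_change = shifted / base - 1  # relative change in power density
```

In this sketch a 1 % change in wind speed changes the power density by roughly 3 %, which is why small deviations in the long-term wind speed matter for the energy yield.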
The wind power density (WPD), or the power per unit of area normal to the direction the wind is blowing, is a commonly used quantity, as shown in the equation below [24]:

$$p_w = \frac{1}{2} \rho v_w^3 \quad [\mathrm{W/m^2}]$$ [e1]

where:
$\rho$ = air density at standard atmosphere [kg/m³]
$v_w$ = wind velocity [m/s]

The kinetic energy advected by an air stream is proportional to the wind speed to the third power. Emeis states, therefore, that the climatological mean wind speed is insufficient to determine the amount of wind energy available at a particular location, since wind turbines may react to the actual wind speed within seconds. Additionally, stresses and vibrations on structures such as wind turbines are highly dependent on the high-frequency components of the wind spectrum. As a result, it is critical to quantify the spatial structure and temporal oscillations of the wind speed. This may be accomplished by computing the wind speed distribution at a given location using representative long-term time series [25].

Data distributions are commonly approximated by mathematical functions with a small number of parameters. Emeis summarizes commonly used wind statistics parameters as shown in Table 2-1 [25].

Table 2-1. Statistical characteristics of the wind [25]

| Parameter | Description |
| Mean wind speed | Indicates the overall wind potential at a given site, expected wind speed for a given time interval (first central moment) |
| Wind speed fluctuation | Deviation of the momentary wind speed from the mean wind speed for a given time interval |
| Wind speed increment | Wind speed change for a given time span |
| Variance | Indicates the mean amplitude of temporal or spatial wind fluctuations, expected fluctuation in a given time interval (second central moment) |
| Standard deviation | Indicates the mean amplitude of temporal or spatial wind fluctuations (square root of the variance) |
| Turbulence intensity | Standard deviation normalized by the mean wind speed |
| Gust wind speed | Maximum wind speed in a given time interval |
| Skewness | Indicates the asymmetry of a wind speed distribution around the mean value (third central moment) |
| Kurtosis (flatness) | Indicates the width of the wind speed distribution around the mean value (fourth central moment) |
| Excess kurtosis | Kurtosis minus 3 |
| Frequency spectrum | Indicates the frequencies at which the fluctuations occur |
| Autocorrelation | Indicates the gross spatial scale of the wind speed fluctuations, Fourier transform of the spectrum |
| Structure function | Indicates the amplitude of wind speed fluctuations, computed from wind speed increments |
| Turbulent length scale | Indicates the size of the large energy-containing eddies in a turbulent flow |
| Turbulent time scale | Indicates the time within which wind fluctuations at one point are correlated |
| Probability density function | Indicates the probability with which the occurrence of a certain wind speed or wind speed fluctuation can be expected |

Source: [25]

2.2 Statistical methods

This section discusses the fundamental statistical procedures that were used in the investigations for this study.
When dealing with large statistical populations, in this specific case wind measurements in the boundary layer, counting every object in the population is impossible. Hence, the computation must be done on a sample of the population, and a subset of the data is assumed to represent the statistical population subject to analysis [26]. The dataset used in this analysis is considered a sample of the available statistical population. The following definitions are made with regard to the notation [e2]:

$\hat{y}_i$ = sampled predicted value
$y_i$ = sampled measured value
$x_i$ = sampled reference value

2.2.1 Definition of uncertainty

The formal definition of "uncertainty of measurement" used in this analysis is, as defined in [27], the parameter associated with a measurement result that represents the scatter of values that may reasonably be assigned to the measured quantity. Standard uncertainty is the uncertainty of a measurement result expressed as a standard deviation [27]. Annex E of IEC 61400-12-1 includes a comprehensive summary of the theoretical basis for determining the uncertainty using bin-wise calculations [28].

2.2.2 Definition of type A and type B uncertainties

Type A uncertainty is evaluated by the statistical analysis of a series of observations, whereas Type B uncertainty is evaluated by means other than statistical analysis [27]. The Type A and Type B classifications are intended to identify the two distinct ways of assessing uncertainty components. It should be noted that both forms of evaluation are based on probability distributions, and that the uncertainty components produced by either type are quantified using variances or standard deviations. According to [27], Type B uncertainties are obtained by scientific judgement based on the pool of available information. The uncertainty assessment conducted within this study is categorised as Type A uncertainty.
2.2.3 The mean

The sum of the observations divided by the number of observations gives the arithmetic mean [29]. Time is considered as the independent variable for averaging. The sample mean, $\bar{x}$, is given by the following equation:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n} \sum_{i=1}^{n} x_i$$ [e3]

2.2.4 Variance and standard deviation

The expectation of a random variable's squared difference from its population mean or sample mean is called the variance. Variance is a measure of the spread, or of how much a group of data deviates from its average value [30]. The following equation gives the sample variance:

$$s_x^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$ [e4]

The standard deviation is the positive square root of the variance. The quantity $s_x$ represents the experimental standard deviation of the measurement dataset and is given by the following formula for a series of n measurements of the same measurand [27]:

$$s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$$ [e5]

2.2.5 Covariance and correlation coefficient

Covariance measures how two variables change together, whereas variance examines how a single variable varies. Covariance can therefore be interpreted as a measure of this paired co-movement. The expectation value is used to describe the covariance between two random variates, x and y, each having a sample size of n [31]. The covariance is given in [e6]:

$$\mathrm{cov}(x, y) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$ [e6]

The Pearson correlation coefficient, or Pearson product-moment correlation coefficient (PMCC), is a statistic that quantifies the linear correlation between two sets of data. The sample Pearson correlation coefficient takes values between -1 and 1. The Pearson correlation coefficient will be referred to as the "correlation coefficient" in this study [32].
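The quantities defined in [e3] to [e6] can be checked on a small invented dataset. The sketch below uses NumPy; the four data pairs are hypothetical and only serve to make the arithmetic traceable. Note that `ddof=1` selects the 1/(n-1) normalisation of [e4] and [e5], while the covariance is computed with the 1/n factor as printed in [e6].

```python
import numpy as np

# Hypothetical paired samples (e.g. reference and measured wind speed in m/s).
x = np.array([6.0, 8.0, 10.0, 12.0])
y = np.array([7.0, 8.0, 11.0, 14.0])
n = x.size

x_mean = x.sum() / n                                 # sample mean [e3]
s2_x = ((x - x_mean) ** 2).sum() / (n - 1)           # sample variance [e4]
s_x = np.sqrt(s2_x)                                  # standard deviation [e5]
cov_xy = ((x - x_mean) * (y - y.mean())).sum() / n   # covariance as in [e6]

# Pearson correlation coefficient; it does not depend on whether 1/n or
# 1/(n-1) is used, as long as numerator and denominator use the same factor.
r_xy = np.corrcoef(x, y)[0, 1]
```

The built-in `np.var(x, ddof=1)` and `np.std(x, ddof=1)` return the same values as the explicit sums, which is a convenient cross-check when the normalisation convention matters.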
The correlation coefficient is a measure of how well two variables are related and is obtained by dividing the covariance by the product of the standard deviations of the two variables, while the sign of the correlation coefficient shows the direction of the linear relationship [32]. In the case of the sample correlation, correlations of +1 or -1 correspond to data points lying perfectly on a line. The equation is given in [e7], where $s_{xy}$ is the sample covariance of x and y, normalised with the same factor as $s_x$ and $s_y$:

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$ [e7]

2.2.6 Coefficient of determination

The square of the sample correlation coefficient is commonly denoted R² and is a special case of the coefficient of determination. For a linear regression with a single predictor and an intercept, R² is simply the square of the sample correlation coefficient between the observed outcomes and the observed predictor values [33]. The coefficient of determination can be expressed as a percentage and indicates the share of the variance of the dependent variable that is explained by the regression: if R² is 0.80, the regression line explains 80 percent of the variance of the observed values [34].

2.2.7 Mean bias, absolute bias and root mean square errors

Mean bias, absolute bias and root mean square errors are important metrics for the definition of the method uncertainty in data-filling, as discussed later in this study in Section 2.6.2 [35]. Mean bias error, or statistical bias, occurs when the predicted value of the results differs from the true underlying quantitative parameter being evaluated [36].
The mean bias error (MBE) is, therefore, the metric that determines how closely a collection of predicted values matches a set of observed values on average, and is given by the following equation:

$$\mathrm{MBE} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$ [e8]

The mean absolute bias, or mean absolute error (MAE), is the arithmetic average of the absolute errors and is defined as a measure of the errors between paired observations that reflect the same phenomenon [37]. The MAE is defined in the following equation:

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |\hat{y}_i - y_i|$$ [e9]

The root mean square error (RMSE) is a metric for comparing the values predicted by a model or estimate with the values observed. The deviations between predicted and observed values are referred to as prediction errors [38]. The RMSE is derived by taking the square root of the average of the squared prediction errors. It is a measure of how spread out these prediction errors are, and it is often used to validate experimental results in climatology, forecasting, and regression analysis [39]. The RMSE is defined in the following equation:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2}$$ [e10]

2.2.8 Standard error

The standard error of a statistic is the standard deviation of its sampling distribution. The standard error of the mean is the standard deviation of the sampling distribution of the mean. Standard errors are crucial for the application of confidence intervals and significance testing [40]. The accuracy of a statistic is commonly expressed in terms of its standard error, which is a measure of the spread of its sampling distribution [40].
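A short worked example may help to see how the error metrics [e8] to [e10] differ and how the standard error of the mean is obtained as the sample standard deviation divided by the square root of the sample size. The four predicted and measured values are invented for illustration, and the variable names are not taken from the thesis code.

```python
import numpy as np

predicted = np.array([8.0, 10.0, 12.0, 9.0])   # predicted values (hypothetical)
measured = np.array([7.0, 11.0, 12.0, 10.0])   # measured values (hypothetical)

err = predicted - measured                     # individual prediction errors
mbe = err.mean()                               # [e8]: signed errors can cancel
mae = np.abs(err).mean()                       # [e9]: magnitude of the errors
rmse = np.sqrt((err ** 2).mean())              # [e10]: penalises large errors

# Standard error of the mean of the measured sample.
se_mean = measured.std(ddof=1) / np.sqrt(measured.size)
```

In this example the signed errors partly cancel, so the magnitude of the MBE is smaller than the MAE, and the RMSE is at least as large as the MAE.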
Standard error, in other words, is a measure of the uncertainty in the estimated model parameter values [41]. For the mean, it is given by the following formula, where $s_x$ is the sample standard deviation [e5] and n the number of observations:

$$s_{\bar{x}} = \frac{s_x}{\sqrt{n}}$$ [e11]

2.2.9 Kolmogorov-Smirnov statistic

The Kolmogorov-Smirnov test (KS test, named after Andrey Kolmogorov and Nikolai Smirnov) is a nonparametric statistical test. In its two-sample form it is used to compare two samples, here to determine how closely the distribution of a set of predicted values matches that of the observed or true values [42]. The test examines the cumulative distributions of two datasets and calculates the greatest vertical distance between their empirical distribution functions. The test is sensitive to changes in the location and shape of the samples [42]. Two datasets with identical cumulative distributions result in a test statistic of zero. Figure 2-1 illustrates the KS test statistic, where the black arrow represents the two-sample KS statistic and the red and blue lines are the empirical distribution functions.

Figure 2-1. Illustration of KS test. Source: [42]

The KS statistic is defined in the following equation:

$$D = \sup_x |F_0(x) - F_{data}(x)|$$ [e12]

where:
$F_0(x)$ = the cumulative distribution function (CDF) of the predicted distribution
$F_{data}(x)$ = the empirical distribution function of the observed dataset

In addition to the KS test, the distribution error (DE) introduced by UL [43] can be calculated with the following equation, once the predicted and observed frequency distributions have been created as defined in the manual of Windographer [43]:

$$\mathrm{DE} = \sum_{i=1}^{N} \frac{(\hat{F}_i - F_i)^2}{F_i}$$ [e13]

where:
$\hat{F}_i$ = frequency of the ith bin of the true observed distribution
$F_i$ = frequency of the ith bin of the predicted distribution

2.2.10 Normal distribution

The normal distribution is a continuous probability distribution for a real-valued random variable and is the most important and most extensively used distribution in statistics [44]. Normal distributions are broadly used, for example in the natural sciences, to describe real-valued random variables whose distributions are unknown. The probability density function (PDF) of the normal distribution is given by the following formula [44]:

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} e^{-\frac{1}{2} \left( \frac{x - \bar{x}}{\sigma} \right)^2}$$ [e14]

2.2.11 Weibull distribution

The frequency distribution of wind speed is typically described in a compact form by means of a Weibull distribution [45]. The two-parameter Weibull distribution is expressed mathematically in [45] as:

$$f(u) = \frac{k}{A} \left( \frac{u}{A} \right)^{k-1} e^{-\left( \frac{u}{A} \right)^k}$$ [e15]

where:
u = horizontal wind speed [m/s]
f(u) = frequency of occurrence of wind speed u
A = scale parameter [m/s]
k = shape parameter [-]

2.3 Review of MCP methods in wind resource assessments

A common approach used within MCP methods is shown in the block diagram in Figure 2-2. The operation is divided into two steps by the authors of [9].
The first step is to study the concurrent period to establish a relationship between the reference and observed datasets. In the second step, this relationship is applied to the reference dataset to obtain the long-term site-specific time series. However, the procedure is not identical in every MCP method, and the relationship is sometimes applied to the short-term dataset instead [9]. Further, within the wind resource industry, it is common to apply the relationship only to the remaining period of the reference dataset and to combine the result with the measured dataset to obtain the long-term site-specific dataset. The result is commonly referred to as an extended dataset and represents the long-term time series.

Reference data is defined as a consistent, sufficiently long, high-quality time series of the same measurement types (in this case, wind speed and wind direction) with a high temporal resolution, such as hourly. Depending on the use case and application, long-term wind measurement data, reanalysis data, mesoscale analysis, the long-term yield from wind turbines, or wind indexes derived from wind turbine yield data or wind data might be used as reference data [35].

Figure 2-2. Block diagram of a typical MCP
Source: As presented in [9]

As stated by Addison [11], MCP's main difficulty is with the prediction model. Historically, MCP methods have focused on deriving the long-term wind speed as accurately as possible. Nevertheless, the MCP procedure also accounts for the wind direction deviation, as described in the next paragraph.

When predicting the long-term site wind speed and direction distribution, systematic direction changes between reference and site observations may be taken into account. Typically, however, the long-term wind direction distributions are expected to remain the same.
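The two-step procedure and the extended dataset described in this section can be sketched as follows. This is a minimal illustration, not the implementation of any industry software: the function names, the integer timestamps and the wind speed values are hypothetical, and the correlate step uses a deliberately simple ratio of means as a placeholder for whichever relationship an actual MCP method would fit.

```python
from statistics import mean


def fit_ratio(reference, target):
    """Step 1 (correlate): placeholder relationship target = ratio * reference,
    fitted on the concurrent period only (an assumption for illustration)."""
    return mean(target) / mean(reference)


def extend_dataset(reference_long, measured, ratio):
    """Step 2 (predict): keep measured values where they exist and fill the
    remaining time steps with predictions derived from the reference series.

    reference_long: dict {timestamp: reference wind speed}, long-term period
    measured:       dict {timestamp: measured wind speed}, campaign period
    """
    extended = {}
    for t, ref_speed in sorted(reference_long.items()):
        if t in measured:
            extended[t] = measured[t]        # measured data are kept as-is
        else:
            extended[t] = ratio * ref_speed  # synthesised from the reference
    return extended


# Hypothetical hourly wind speeds in m/s; timestamps are plain integers.
reference_long = {0: 8.0, 1: 6.0, 2: 10.0, 3: 4.0, 4: 12.0, 5: 8.0}
measured = {3: 5.0, 4: 15.0, 5: 10.0}

# The concurrent period is the set of time steps present in both datasets.
concurrent = sorted(set(reference_long) & set(measured))
ratio = fit_ratio([reference_long[t] for t in concurrent],
                  [measured[t] for t in concurrent])
extended = extend_dataset(reference_long, measured, ratio)
```

Applying the fitted relationship to the whole reference period instead, without splicing in the measured values, would correspond to the first variant described above.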
The direction shift between the time series is determined by binning the reference time series by direction and obtaining the difference between the site direction and the reference direction for each time step. The mean of these differences within a direction sector is computed as an offset [46]. The offset is then added to the reference wind direction measurements to obtain the long-term representative site wind direction time series.

The main MCP methods are presented briefly in the subsequent sections to inform the rest of the study.

2.3.1 Linear regression methods

Linear regression models the relationship between two variables by fitting a linear equation to observed data. One variable is regarded as the independent variable, while the other is regarded as the dependent variable [47]. To visualise the relationship, a scatterplot is often used, with the correlation coefficient (see Section 2.2.5) serving as a numerical measure of association between the two variables. The linear regression line has the following formula, where x is the independent variable, m the slope, b the offset and y the dependent variable:

$$ y = mx + b $$ [e16]

There are various ways to fit the linear regression line. The most common sub-methods (please refer to Section 2.4.2 regarding the taxonomy used in this analysis) are linear least squares (LLS) and total least squares (TLS). Further, the variance ratio method is discussed. There are three primary LLS formulations to choose from, as shown in [48]:
• Ordinary least squares (OLS)
• Weighted least squares
• Generalized least squares

As OLS is the formulation primarily used within the wind industry and its recommended practices, the term LLS refers to the OLS method within this analysis, mainly because the LLS acronym is broadly established within the wind industry [49].
As shown in Figure 2-4, LLS minimizes the vertical distance (residual) between the data points and the model fit. Derrick [12] noted that the LLS fitting approach is the simplest and most often used method for obtaining a model from a collection of points in wind resource assessments [50]. The linear fit parameters of the LLS are calculated using the following equations [46]:

$$ m = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2} $$ [e17]

$$ b = \bar{y} - m\bar{x} $$ [e18]

where:
$\bar{y}$ = mean of the dependent (predicted) variable

The TLS sub-method, on the other hand, minimizes the sum of squared errors (residuals) measured orthogonally to the line of best fit, as shown in Figure 2-4. It is also known as 'orthogonal least squares' and is sometimes referred to as the York method [51]. The industry-standard WindFarmer software refers to TLS as the principal component analysis (PCA) method [20]. PCA is the technique of calculating the principal components and using them to perform a change of basis on the data [52]. The WindFarmer theory manual notes that the principal components are the uncorrelated parameters of the dataset [46]. The TLS method is illustrated in Figure 2-3, with the orthogonal distance from the fit given by the following equation:

$$ d_i = \frac{y_i - m x_i - b}{\sqrt{m^2 + 1}} $$ [e19]

Figure 2-3. Illustration of the TLS method
Source: Author’s illustration based on [46]

The slope and offset values of the TLS fit are calculated as shown in the following equations from [46]:

$$ m = -B + \sqrt{B^2 + 1} $$ [e20]

$$ B = \frac{1}{2} \, \frac{\sum_i \left[ (x_i - \bar{x})^2 - (y_i - \bar{y})^2 \right]}{\sum_i (x_i - \bar{x})(y_i - \bar{y})} $$ [e21]

$$ b = \bar{y} - m\bar{x} $$ [e22]

Figure 2-4. Minimization of errors in LLS (left) and TLS (right) with respect to model fit
Source: [53] (left), [54] (right)

In statistics and economics, orthogonal regression has a long-standing tradition [51].
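To make the difference between the two fits concrete, the following sketch evaluates the LLS parameters from [e17] and [e18] and the TLS parameters from [e20] to [e22] on a small, hypothetical dataset. The function names and the data are illustrative assumptions; the TLS branch additionally assumes positively correlated data, so that the cross-product sum in [e21] is positive.

```python
from math import sqrt
from statistics import mean


def _centred_sums(x, y):
    """Means and centred sums of squares/cross-products used by both fits."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, sxx, syy, sxy


def fit_lls(x, y):
    """Slope and offset minimising vertical residuals, [e17] and [e18]."""
    x_bar, y_bar, sxx, _, sxy = _centred_sums(x, y)
    m = sxy / sxx
    return m, y_bar - m * x_bar


def fit_tls(x, y):
    """Slope and offset minimising orthogonal residuals, [e20] to [e22].
    Assumes sxy > 0, i.e. positively correlated data."""
    x_bar, y_bar, sxx, syy, sxy = _centred_sums(x, y)
    big_b = 0.5 * (sxx - syy) / sxy
    m = -big_b + sqrt(big_b ** 2 + 1.0)
    return m, y_bar - m * x_bar


# Hypothetical reference (x) and target (y) wind speeds in m/s.
ref = [0.0, 1.0, 2.0, 3.0]
tar = [0.0, 2.0, 1.0, 3.0]

m_lls, b_lls = fit_lls(ref, tar)  # slope 0.8, offset 0.3
m_tls, b_tls = fit_tls(ref, tar)  # slope 1.0, offset 0.0
```

In this example both variables have the same variance, so the TLS slope is exactly one, while the LLS slope is pulled below one by the scatter. The two slopes converge as the correlation improves, which is why the choice between LLS and orthogonal regression matters most for weakly correlated datasets.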
In certain cases, orthogonal regression has been thought to be preferable to standard least squares. The primary reason is that conventional LLS minimizes only the vertical distance between the data and the fitted line, so it might fail when there is no clear confidence in the independent (reference) dataset and the dependent and independent variables are likely to have the same error margin [51]. On the other hand, if there is more confidence in the independent variable, LLS might perform better.

Higher-order polynomials may also be used to model the relationship between the reference and measured (target) datasets. This was not investigated further within this analysis, as linear fits were found to provide reasonable results for wind resource applications [10].

According to [10], regression MCP techniques can be improved beyond typical linear regression methods if they contain a residual distribution model. WindPRO, for example, implemented this approach to capture the energy content of MCP-adjusted site wind distributions better than regression models without this option [10]. The residual is defined as the random error in the model. In WindPRO, the residuals can be introduced to the linear regression model by assuming a zero-mean Gaussian distribution or a model constrained on both wind direction and wind speed [22].

Rogers [8] created the variance method (VM) in response to a limitation of linear regression in which the wind resource may be underestimated in poorly correlated datasets [16]. It entails scaling the predicted wind speed at the target location so that its variance is at the same level as the variance of the observed wind speed at the target site.
This is presented in the following equation:

$$ \frac{y_i - \bar{y}}{s_y} = \frac{x_i - \bar{x}}{s_x} $$ [e23]

Source: [16]

Multiple linear regression is a type of regression model in which more than one regressor variable is included [17]. With the development of statistical computer packages, multiple linear regression has become one of the most frequently used statistical procedures [55]. In multiple linear regression, the regressors may be the independent variables themselves or functions of them, including quadratic or hyperbolic terms. The relationship is nevertheless still considered a linear regression, as the model is linear in its coefficients [56].

2.3.2 Bin methods

The method of bins was introduced by Beltran as an alternative to linear regression. It is based on the method of bins of the power curve performance measurement standard [28]. It has been shown that this approach can be used to estimate wind speed data for nacelle anemometers, in addition to being employed in power curve measurements [57]. The dataset is separated into bins and sectors to determine the wind speed. The target wind speeds are binned against the reference wind speeds in bins of 0.5 m/s. In each bin with more than 10 data points, the mean reference and target wind speeds are determined. Then, as shown in the following equation, a linear interpolation between these points provides the target wind speed [57]:

$$ \hat{W}_i^{tar} = W_b^{tar} + \left( W_i^{ref} - W_b^{ref} \right) \frac{W_{b+1}^{tar} - W_b^{tar}}{W_{b+1}^{ref} - W_b^{ref}} $$ [e24]

where:
$\hat{W}_i^{tar}$ = predicted target wind speed
$W_i^{ref}$ = reference wind speed at time step i
$W_b^{tar}$ = bin average of the target measured wind speed in bin b
$W_b^{ref}$ = bin average of the reference measured wind speed in bin b
Source: [57]

The model fits based on measurements and the representative algorithm points are illustrated in Figure 2-5.

Figure 2-5.
Model fits [57] based on measurements (left) and representative algorithm points (right)
Source: [57]

The “Vertical Slice” (VS) MCP method fits a piecewise linear curve to a scatter plot of target wind speeds versus reference wind speeds [58]. The scatter plot is created from the wind speed at the target site versus the concurrent wind speed at the reference location and is divided into equal-sized vertical stripes. The mean target site wind speed is calculated for each stripe and plotted against the mean reference wind speed of that stripe. The piecewise linear fit is then obtained by connecting these points with straight lines, with the first segment starting at the origin [59].

Leblanc further introduced a slightly revised version of the VS method similar to the LLS. This method is called the Non-Linear Method of Moments (NL-MoM) and resembles the VS method in that it likewise splits the wind speed plot into bands or slices. However, as seen in Figure 2-6, the slices are perpendicular to the TLS linear fit of the data [58]. The VS method and NL-MoM are illustrated in Figure 2-6.

Figure 2-6. VS method (left) and NL-MoM (right) with bins and resulting piecewise linear fits
Source: [58]

2.3.3 Matrix methods

Matrix methods are nonlinear models that employ a joint probability distribution (JPD) instead of attempting to impose a linear relationship between two variables [51]. The prerequisite of linear models, that the residuals are normally distributed, does not apply to this method [51]. According to [60], matrix methods is the general overarching term for MCP methods in which the wind speed and wind direction measurements are used to classify the data into bins of more than a single dimension.
Hanslian [37] also notes that the terminology “matrix method” is not used consistently within the industry: the term often refers to different methods, and identical methods are sometimes referred to by different names. The classic matrix method introduced by [61] and Anderson [51], as applied within WindPRO, is discussed here as it is a commonly used approach.

The matrix methods are based on the notion that long-term site data can be described using simultaneous onsite and reference data measurements. A joint distribution of two variables, wind speed-up and wind veer, is used to represent the relationship [62]. The wind speed-up and wind veer are calculated from the differences between the concurrent site and reference wind speeds and wind directions. The differences are then sorted according to the reference wind speed and wind direction in the form of two matrices, with each element corresponding to a user-defined reference wind speed and reference wind direction bin [10]. An example of the wind speed model for three sectors is shown in Figure 2-7.

Figure 2-7. Sample data and first-order model for the wind speed-up
Source: [10]

Thøgersen notes in [19] that the method for modelling the joint distribution matrix should be determined by the specific dataset. According to [19], a mix of binned sample distributions and modelled joint Gaussian distributions might provide reasonable results.

As mentioned above, the matrix approach is based on the joint distribution of the measured wind speed-ups and wind veers [19].
Hence, for each measured sample, the following pair of quantities is calculated per [19]:

$$ \Delta y = y_{observed} - y_{reference} $$ [e25]

$$ \Delta\theta = \theta_{observed} - \theta_{reference} $$

where:
$\Delta y$ = wind speed-up
$y_{observed}$ = observed wind speed
$y_{reference}$ = reference wind speed
$\Delta\theta$ = wind veer
$\theta_{observed}$ = observed wind direction
$\theta_{reference}$ = reference wind direction

The flowchart of the matrix method discussed above is shown in Figure 2-8.

Figure 2-8. Flowchart of the matrix method
Source: Author’s illustration based on [10]

As shown in the concurrent period container of Figure 2-8, whenever observed data pairs are not available, the sample distribution statistics are used to fit a model. The model is then used to interpolate and extrapolate into bins where no data is available. The sample distributions are calculated using the Wood and Watson (WW) method as discussed in [61]. The WW method is a sector-bin approach that uses regression analysis to identify the transfer function that describes the relationship between the observed and reference datasets [63]. The parametric distribution is defined by the mean, standard deviation and correlation values. The samples and the fitted polynomial model are shown in Figure 2-9.

Figure 2-9. Polynomial model used within matrix method, samples (left) and polynomial model (right)
Source: [10]

The matrix time series (MTS) method by Lambert [64] is an adapted version of the matrix method [51]. The MTS method is applied within the Windographer industry software. The first step of the MTS is to build the joint probability distribution.
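The binning that underlies the classic matrix method can be illustrated with a short sketch of the speed-up and veer differences from [e25], sorted by reference wind speed and reference wind direction. The bin widths, function names and sample values are hypothetical, and the sketch only computes sample means per bin; it does not reproduce the model fitting or the interpolation into empty bins described above.

```python
from collections import defaultdict
from statistics import mean


def wrap_deg(angle):
    """Wrap an angular difference to the interval [-180, 180) degrees."""
    return (angle + 180.0) % 360.0 - 180.0


def speedup_veer_matrices(ref_ws, ref_wd, obs_ws, obs_wd,
                          ws_bin=1.0, sector=30.0):
    """Mean speed-up and mean veer per (reference speed bin, reference sector).

    Bin widths are illustrative assumptions, not values from the source.
    """
    samples = defaultdict(lambda: ([], []))
    for rws, rwd, ows, owd in zip(ref_ws, ref_wd, obs_ws, obs_wd):
        key = (int(rws // ws_bin), int((rwd % 360.0) // sector))
        samples[key][0].append(ows - rws)            # speed-up, delta y
        samples[key][1].append(wrap_deg(owd - rwd))  # veer, delta theta
    return {key: (mean(dy), mean(dth)) for key, (dy, dth) in samples.items()}


# Hypothetical concurrent samples: wind speeds in m/s, directions in degrees.
ref_ws = [5.25, 5.75, 9.5]
ref_wd = [350.0, 355.0, 10.0]
obs_ws = [6.25, 6.25, 10.0]
obs_wd = [5.0, 0.0, 20.0]

matrices = speedup_veer_matrices(ref_ws, ref_wd, obs_ws, obs_wd)
# Key (5, 11): reference speed bin 5-6 m/s, reference sector 330-360 degrees.
```

The angular wrap matters for samples near north: an observed direction of 5° against a reference direction of 350° is a veer of +15°, not -345°. The MTS method starts from a binned joint distribution of a comparable kind.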
The MTS algorithm generates a cumulative distribution function (CDF) from the joint probability distribution and the reference dataset, which is then used to convert the observed dataset to a percentile time series. A percentile value of 50% corresponds to the expected average based on the reference time series at the corresponding time step [51]. A Markov-based reconstruction algorithm is then used to extend the observed percentile time series to the long term. This algorithm generates artificial data matching the measured data in terms of frequency distribution, seasonal and diurnal patterns, and autocorrelation [43].

In the final step, Windographer converts the synthetic percentile time series into wind speed values. By using the JPD to determine the target wind speed for each percentile value and reference wind speed in each time step, Windographer reverses the previous procedure. Windographer employs the percentile value instead of the reference wind speed to obtain the predicted wind speed for that time step. Rather than retaining seasonal and diurnal patterns and autocorrelation, this step preserves the statistical link between observed and reference wind speeds [43].

Mortimer's approach [13] is another nonlinear method similar to the matrix method. The wind speed observations are binned by the reference site's wind speed and direction. Two matrices are then created: one containing the average ratios of the observed wind speed to the reference site's wind speed, and the other containing the corresponding standard deviations [65].
The following equation is used to predict the wind speed:

$$ y_i = (r + e)\,x_i $$ [e26]

where:
$r$ = the average of the wind speed ratios at the target site
$e$ = a random variable drawn from a triangular distribution corresponding to the standard deviation of the ratios
Source: [13], [65]

2.3.4 Novel computational methods

Besides the linear regression and matrix models, there are also several novel computational methods for conducting an MCP. These are mainly artificial neural networks (ANNs) and machine learning (ML) methods, including support vector regression (SVR) and decision trees (DTs) [17].

Due to their capacity to identify patterns in noisy or otherwise difficult data, ANNs have been employed to correlate and predict wind data [66]. A neural network comprises linked neurons that each take a set of weighted inputs. An activation function causes the neuron to provide an output when the weighted inputs are above a threshold. A feedforward neural network has layers of neurons with no lateral or backward connections. In the case of MCP, the network's input layer is the data from the reference location. The network's last layer is the output layer, which provides the extended time series [16]. The weights of the interconnections and the biases between the neurons in the different layers are updated through a learning process. The Levenberg–Marquardt algorithm may be used for this process [17]. Feedforward networks with multilayer perceptrons (MLPs) are typically used to perform the regression [67].

In an example study in [9], the wind speed and direction of various reference stations were fed into an ANN's input layer. The model performed better when the wind direction was added to the input signal. As the number of reference stations increased, so did estimation inaccuracies. The schematic diagram is shown in Figure 2-10.

Figure 2-10.
Schematic diagram of an ANN with 2N wind speed and wind direction input signals of N reference stations and two wind data output signals of the target station
Source: [9]

The ANN setup used for the study in [17] is shown in Table 2-2.

Table 2-2. Example ANN settings for the regression methodologies used for the MCP methodology

| Parameter | Value |
|---|---|
| WS - input values | Wind speed and wind direction at the reference site |
| WS - output values | Wind speed at the target site |
| WD - input values | Wind velocity vector in selected directions at reference |
| WD - output values | Wind velocity vector in selected directions at target |
| Number of layers | three |
| Number of neurons in layer | 30, 30, 10 |
| Training methodology | Levenberg–Marquardt algorithm |
| Percentage of points used for training | 70% |
| Percentage of points used for verification | 15% |
| Percentage of points used for testing | 15% |

Source: Author’s compilation, extract from [17]

Provided that the input data is accurate enough and the training is done effectively, ANN is a potential alternative method for long-term corrections in the wind sector [68].

2.3.4.1 Machine learning algorithms

Machine learning is the study of computer algorithms that learn from experience and data. Machine learning algorithms create a model from training data in order to make predictions or judgments without being explicitly programmed. Machine learning algorithms can be based on ANNs as well [69]. ML is typically divided into supervised learning, unsupervised learning, and reinforcement learning.

By formulating the surrogate model construction problem as a quadratic programming problem, the SVR method offers a novel way of constructing smooth, nonlinear regression approximations [66].
The transfer function is shown in the following equation:

$$ \tilde{f}(x) = \langle w, \phi(x) \rangle + b $$ [e27]

$$ \left| \tilde{f}(x_i) - f(x_i) \right| \le \varepsilon $$

where:
$f(x_i)$ = function to be approximated
$w$ = set of coefficients
$\phi(x)$ = map from input space to feature space
$\varepsilon$ = maximum tolerated error, predefined

It is further noted in [66] that the coefficients w may be found by solving a quadratic programming problem with slack variables and a cost function. An exemplary parametrisation of an SVR application is provided in Table 2-3.

Table 2-3. Example SVR settings for the regression methodologies used for the MCP methodology

| Parameter | Value |
|---|---|
| WS - input values | Wind speed and wind direction at the reference site |
| WS - output values | Wind speed at the target site |
| WD - input values | Wind velocity vector in selected directions at reference |
| WD - output values | Wind velocity vector in selected directions at target |
| Method | Hyperparameter optimisation |
| Kernel | Gaussian |
| Solver | Sequential minimal optimisation |

Source: Author’s compilation, extract from [17]

A decision tree method is an ML application with a hierarchical data structure that uses the "divide and conquer" strategy [17]. A single decision tree model divides the feature space into regions and fits a basic model to each region [70]. Taking an example with a continuous response variable y and two independent variables x1 and x2, each part of the space spanned by x1 and x2 is modelled separately in the first stage of the regression. The operation is repeated until a preset stopping rule is met. At each partition, the best fit is attained by selecting a variable and a split-point that divide the region in two [70].

Another method, named diffusion-based transformation, was proposed by Nielsen. In this method, measurements and reference data are transformed to Gaussian variables prior to creating a statistical correlation.
For this purpose, a novel transformation algorithm was developed, inspired by Gastner and Newman's cartogram approach, which was initially created for showing themed maps in geographic information systems. Additionally, once the wind data had been converted to Gaussian variables, conditional simulation of time series was performed using Fourier transformation [71].

Gradient boosting is another application of machine learning. The gradient boosting technique gradually improves prediction capacity by creating many models and focusing on difficult-to-estimate training cases [70]. Körner [6] has shown gradient boosting to be a very effective technique for filling gaps in meteorological time series. Gradient boosting has various advantages over multiple linear regression or neural networks. Compared to neural networks and multiple linear regression, the computations may be performed in 1/500 to 1/300 of the time on a standard desktop PC [6].

2.3.5 Quantile mapping methods

The primary basis of the U&N method is the Q-Q method. This is a quantile method that consists of plotting quantile values derived from the probability distributions of two datasets against each other. If the relation between the two datasets is linear, the Q-Q plot shows a straight line. The method ignores simultaneity and focuses on the statistics of the datasets [20].

The U&N technique is oriented around the wind direction and wind speed, focusing on the probability distributions of both parameters. In contrast to the majority of other long-term correction (LTC) methods, concurrency is merely used to ensure that the data represents the same time period. According to the authors, the approach could be enhanced by incorporating stability [20].

The SpeedSort approach fits a linear regression model with a non-zero intercept, sector by sector, by comparing the observed wind speed data to the reference dataset.
Because the line fitting procedure requires separate sorting of the reference and site wind speeds, the fitted line describes the relationship between the wind speed frequency distributions rather than between hourly values. Additionally, a veer analysis is performed, through which both the direction and the speed of the long-term reference data are adjusted. The technique comprises sector binning, sorting the wind speeds, fitting the line and calculating the average veer for each sector prior to extending the short-term time series to the long term [72].

2.3.6 Empirical methods

The bulk speed ratio (BSR) algorithm is an empirical method implemented in ISo1 (see Section 2.4.3). It uses a relatively straightforward approach of matching observed (target) and reference wind speed data, assuming a linear relationship with a slope parameter only and no offset. The slope is computed by dividing the target mean wind speed by the reference mean wind speed [73].

The 'Weibull Fit' algorithm is an MCP method proposed by van Lieshout [74] and implemented within ISo1. The scale factor α of the Weibull fit is equal to the ratio of the Weibull scale parameter at the target site to the Weibull scale parameter at the reference site raised to the exponent β, where β is the ratio of the reference to the target Weibull shape parameter. The Weibull fit method employs a power law model of the following form [73]:

$$ y = \alpha x^{\beta} $$ [e28]

where:

$$ \beta = \frac{k_x}{k_y}, \qquad \alpha = \frac{A_y}{A_x^{\beta}} $$

Wind index is an empirical MCP approach that is used in ISo3. It performs the MCP analysis using monthly averages of the energy production, without consideration of the directional distribution of the wind climate. While this approach may seem simplistic and rudimentary compared to more advanced MCP methods, it offers significant benefits in terms of stability and performance, even when other MCP methods appear to fail [10].

The KH (Knut Harstveit) method is a non-linear MCP technique used at Kjeller Vindteknikk.
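The two ISo1 relations described in this section, the bulk speed ratio and the Weibull fit power law of [e28], can be sketched in a few lines. The function names and all numerical values are hypothetical, and the Weibull scale and shape parameters are assumed to have been fitted beforehand by whatever procedure the analyst prefers.

```python
from statistics import mean


def bulk_speed_ratio(reference, target):
    """Slope-only fit: ratio of the concurrent mean wind speeds."""
    return mean(target) / mean(reference)


def weibull_fit_parameters(a_ref, k_ref, a_tar, k_tar):
    """Power-law parameters of [e28] from Weibull scale (A) and shape (k)."""
    beta = k_ref / k_tar
    alpha = a_tar / a_ref ** beta
    return alpha, beta


def weibull_fit_predict(x, alpha, beta):
    """Predicted target wind speed for a reference wind speed x."""
    return alpha * x ** beta


# Hypothetical concurrent wind speeds in m/s.
slope = bulk_speed_ratio([4.0, 8.0, 12.0], [5.0, 10.0, 15.0])

# Hypothetical, already fitted Weibull parameters (reference: x, target: y).
alpha, beta = weibull_fit_parameters(a_ref=8.0, k_ref=2.0, a_tar=9.0, k_tar=2.5)

# By construction, the reference scale parameter maps onto the target one.
mapped_scale = weibull_fit_predict(8.0, alpha, beta)
```

The last line is a quick consistency check of the parameter definitions: a reference wind speed equal to the reference scale parameter is mapped onto the target scale parameter. Both relations are single global scalings, whereas the KH method introduced above works with direction bins.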
The KH approach organizes the non-zero concurrent reference and site wind speed data into 12 equal-width direction bins and the zero wind speed values into an additional 13th bin, for both the site and the reference datasets. The average wind speed for each bin is then determined and weighted by its frequency. For each bin, the ratio between the site and the reference weighted averages is then formed. These ratios are used as adjustment factors. While the adjustment factors are based on short-term data, they are assumed to hold over the long term. Using this assumption, the weighted average of the long-term reference data for each bin is corrected. This yields the long-term site sector mean wind speed [20].

Tallhaug and Nygaard developed the non-regression T&N MCP method, which was published in 1993 and is used at Kjeller Vindteknikk. The mean and standard deviation of the site and reference wind speeds, as well as the correlation coefficient of their relationship, are determined for each direction bin of the reference data. This technique explicitly incorporates the correlation coefficient of the relationship between the site and reference data when estimating the site's long-term wind speeds but does not employ the relationship's regression function. The authors note that the strength of the link between the measured site data and the concurrent reference data is critical to the method's accuracy [20].

2.4 Definition of the measure-correlate-predict (MCP) algorithms

Based on the literature review in Section 1.2, the MCP methods are classified, and subsequently, an MCP algorithm is selected for the study.

2.4.1 Type classification of MCP

The classification of MCP methods proposed by Hanslian [60] is considered a useful tool for gaining an overview of the applicable methods and thus for selecting a suitable method for this study. This is shown in Table 2-4.

Table 2-4.
Classification of MCP methods according to Hanslian

| Description | Type 1 | Type 2 | Type 3 |
|---|---|---|---|
| Results based on | Reference | Target | Target |
| Provides time series | Yes | No | No |
| Prediction of wind distribution | No | Yes | No |
| Suitable for the MCP application considered | Yes | No | No |

Source: Author’s own visualisation based on [60]

As the topic of interest requires a time series output, only Type 1 MCP methods are of interest.

2.4.2 Definition of an algorithm

An algorithm is a process for solving a mathematical problem in a limited number of steps, which typically requires the repetition of an action [75]. In this study, an MCP algorithm is defined as the combination of a method, sub-method, model and concept. The MCP methods are already described in Section 2.3. The sub-methods are the primary tools available within a method, whereas the primary settings for conducting the model fits are categorized under the model header. There might be further options for running the MCP algorithm, where the user needs to make project-specific judgements to conduct a robust MCP; these are grouped into concepts.

For the example of linear regression, the sub-methods are LLS and TLS, which describe how the model is optimised to obtain the linear fit, whereas the model options concern the details of the model selection. Finally, the model can be fitted repeatedly for different sectors, or be based on many values, as with a high temporal resolution (hourly), or on fewer values, as with a monthly resolution. For a monthly resolution, one might also consider weighting the individual months. These scenarios define the final MCP algorithm, as shown in Figure 2-11. In the subsequent section, the classification of the MCP methods is further presented in overview tables.
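The four-level definition of an MCP algorithm can also be written down as a small data structure, which is one possible way to keep iterative analyses reproducible. This is only a sketch: the class name, the field values and the label format are hypothetical, and the assignment of particular settings to the model and concept levels is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MCPAlgorithm:
    """One MCP algorithm = method + sub-method + model + concept."""
    method: str      # e.g. linear regression, matrix, quantile mapping
    sub_method: str  # how the fit is optimised, e.g. LLS or TLS
    model: str       # primary settings of the model fit
    concept: str     # project-specific choices, e.g. sectors, resolution

    def label(self):
        """Compact identifier, e.g. for naming result files."""
        return " / ".join((self.method, self.sub_method,
                           self.model, self.concept))


# Illustrative instance; the field values are assumptions, not study results.
example = MCPAlgorithm(
    method="linear regression",
    sub_method="LLS",
    model="y = m*x + b",
    concept="omnidirectional, hourly",
)
```

Making the structure immutable (`frozen=True`) means that two runs with the same four entries compare equal, so an algorithm definition can be used directly as a dictionary key when collecting results.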
Figure 2-11. Definition of an MCP algorithm at the example of linear regression
Source: Author’s own illustration

2.4.3 Classification of MCP methods

Within this section, the MCP methods are further summarized, with examples given from engineering software used within the wind industry. The following software solutions were available at the time of the assessment:
• Industry software 1 (ISo1): Windographer
• Industry software 2 (ISo2): WindFarmer
• Industry software 3 (ISo3): WindPRO

The previously discussed MCP methods are summarized in the following tables. Most of the methods are classified as Type 1, with a time series output. Further, linear regression, empirical, and matrix methods have a broader industry application based on the investigated industry software. As Hanslian stated, Type 1 methods are considered most appropriate for filling data gaps and point predictions, whereas Type 2 methods should be used for an accurate representation of the wind distribution [60]. Accordingly, only Type 1 methods available within industry software were evaluated in Section 2.7 to select the base-case algorithm suitable for iterative analysis.

Based on the above classification criteria and the literature review discussed in Section 2.3, the MCP methods are summarized in Table 2-5 to Table 2-10 with respect to their types and applications in the specific software.

Table 2-5. MCP method 1: Properties of linear regression methods
Method: Linear regression
Reference classification: WD, WD
Type classification: Type 1, Type 2
ISo1 ISo2 ISo3 Sub-method
✓ ✓ ✓ LLS - ✓ ✓ - TLS - ✓ - - VM -
Source: Author’s own calculation/assessment

Table 2-6.
MCP method 2: Properties of bin methods Method: Linear regression Reference classification WD WD Type classification Type 1 Type 2 ISo1 ISo2 ISo3 Sub-method - - - Method of bins - ✓ - - Vertical slice - Source: Author’s own calculation/assessment Table 2-7. MCP method 2: Properties of properties matrix methods Method: Matrix Reference classification WD WS+WD WS+WD Type classification Type 1 Type 2 Type 1 + Type 2 ISo1 ISo2 ISo3 Sub-method - - ✓ Classification - WindPro matrix ✓ - - - Joint probabilistic [76] MTS - - - - - Matrix-analog (Hanslian) Source: Author’s own calculation/assessment based on Table 2-8. MCP method 3: Properties of novel computational methods Method: ANN Reference classification - - Type classification Type 1 Type 2 ISo1 ISo2 ISo3 Sub-method - - - ANN - - - - SVR - Methods and materials University of Kassel WES MScThesis Sargin - MCP Methodology for a Digital Wind Buoy 37 | 110 Method: ANN - - - DT - Source: Author’s own calculation/assessment Table 2-9. MCP Method 4: Properties of quantile mapping methods Method: Quantile mapping Reference classification - - WS Type classification Type 1 Type 2 Type 1 + Type 2 ISo1 ISo2 ISo3 Sub-method - - - - - U&N ✓ - - - - SpeedSort Source: Author’s own calculation/assessment Table 2-10. MCP Method 5: Properties of empirical methods Method: Empirical methods Reference classification - WS+WD Type classification Type 1 Type 2 ISo1 ISo2 ISo3 Sub-method ✓ - ✓ Bulk speed ratio - ✓ - ✓ - Weibull scaling - - ✓ - Wind index - - - KH method - - - T&N method Source: Author’s own calculation/assessment 2.5 Questionnaire results The questionnaire was designed based on the research questions in the empiro environment [77], which is a free survey tool for students. It has been distributed to key industry analysts/experts through direct links using the LinkedIn platform as well as the online community “wind resource assessment group” (WRAG) with more than 400 registered members [78]. 
It is noted that the questionnaire was not accessible to other persons through a search engine or a publicly available link on the LinkedIn platform. The detailed charts of the answers to the questionnaire are presented in Annex A. The following paragraphs give a brief summary of the outcome as well as the comments and recommendations of the participants.

25 analysts answered a total of 31 questions with an average total answer duration of 15 minutes. Half of the respondents (50%) were consultants, followed by 25% developers and other groups (WTG OEM, research and miscellaneous). More than 80% of the analysts had a master’s degree or PhD, and 60% had more than 10 years of industry experience. For about half of the participants, the share of offshore work in their daily wind analysis job exceeded 25%.

The majority of respondents (55%) believed that new approaches were essential to address data gaps in FLS measurements, while 40% expressed no view, and one analyst said it was unnecessary. More than 75% agreed that the interim step of data-filling (DF) should be applied prior to the long-term correction. According to most participants, an algorithm's output should be a time series with the same temporal resolution as the measurement time series (88%), or at least a time series with lower temporal resolution (16%). The responses to the corresponding question on the end result of a concluded long-term correction were more widely spread: output with a temporal resolution identical to the measurement dataset led with 64%, followed by a lower-resolution time series with 48%.
In terms of data filling, the largest group of respondents (36%) agreed that the longest permissible gap duration each year should be less than 15 days, followed by 30 days (28%); the highest duration, 60 days, was selected by just one analyst out of a total of 25. Among the survey participants, 72% used in-house tools based on Python and Excel, 36% used Windographer, and 24% chose WindPRO in the workflow for an MCP operation. Other in-house solutions used by the participants included an internally designed tool programmed in R with a web interface, Vortex LTC, Brightwind open-source Python, in-house Java software analysis and database, as well as Matlab.

Regarding the question of which metrics (key performance indicators, KPI) should be used to assess the performance of a DF / LTC process, the coefficient of determination received the highest percentage of responses (72%), followed by the root mean square error (RMSE), which received 60%. With a 56% share, more than half of the participants regarded the number of samples as a critical factor to consider in the MCP analysis. The distribution of responses per metric and type of participant is presented in Table 2-11.

Table 2-11. Summarized survey response to the question regarding KPI metrics

| Metric | Consultancy | Developer | Other | Research | Skipped | WTG OEM | Total |
|---|---|---|---|---|---|---|---|
| Mean bias error (MBE) | 8 | 3 | 0 | 1 | 0 | 1 | 13 |
| Mean absolute error (MAE) | 7 | 3 | 1 | 0 | 0 | 1 | 12 |
| Root mean square error (RMSE) | 7 | 5 | 1 | 1 | 0 | 2 | 16 |
| R² (coefficient of determination) | 10 | 5 | 1 | 0 | 0 | 2 | 18 |
| Mean wind direction | 3 | 3 | 1 | 0 | 1 | 1 | 9 |
| Wind veer | 2 | 3 | 0 | 0 | 1 | 1 | 7 |
| Weibull scale parameter (A) | 1 | 2 | 0 | 1 | 0 | 1 | 5 |
| Weibull shape parameter (k) | 1 | 2 | 1 | 1 | 0 | 1 | 6 |
| Wind power density | 3 | 2 | 1 | 1 | 0 | 0 | 7 |
| Kolmogorov-Smirnov test statistic regarding wind speed distribution | 4 | 2 | 0 | 0 | 0 | 1 | 7 |
| Number of samples in bin/sector (depending on the method) | 8 | 4 | 1 | 0 | 0 | 1 | 14 |
| Other (please use next question to enter your preference) | 1 | 2 | 1 | 1 | 0 | 0 | 5 |

Source: Author’s own calculation/assessment

Additional important criteria stated by the experts included the number of overlapping data points, the (theoretical) power production of a wind turbine using a real power curve (or several power curves), and whether the data filling is conducted inter- (with nearby measurements) or intra- (from the same measurement location and instrumentation). One response emphasised that it was critical to pay close attention to how effectively the reconstruction captured the energy content of the ensuing wind regime. This might be accomplished by using a synthesis check on data from an identical period. One of the experts stressed that all adjustments, whether data filling or long-term corrections, should be evaluated in terms of their influence on the uncertainty of the annual energy output estimate. Another expert stated that it is also necessary to compare the final long-term mean of monthly means (LTMOMM) wind speeds in order to determine the influence of the data-filling procedure chosen.
In other words, if the source of the data-filling reference data and the technique used to process it do not have a significant influence on the final LTMOMM, then greater confidence may be placed in the results.

The participants were asked to rate the previously described KPIs according to their importance to the MCP process. The scale ranged from one to ten, with one being not significant and ten being highly important. For MBE, MAE, and RMSE, the share of ratings greater than 5 was 72%, 68%, and 76%, respectively. The coefficient of determination achieved the same outcome as the root mean square error (RMSE) at 76%. The distribution of responses was more uniform for wind direction, wind veer, and the Weibull scale and shape parameters. The KS statistic was also regarded as important, with a 56% share of ratings above 5.

80% of the experts indicated that they use the linear regression method for data-filling and long-term correction. The matrix (48%) and ANN methods (40%) were the next most popular. The Variance Ratio approach was mentioned directly in the category "alternative ways." One participant replied that he/she had no visibility of the details of the in-house algorithm. The next question asked participants to elaborate on their choice of sub-method. The LLS and TLS sub-methods of linear regression received 52% and 28% of the votes, respectively, but a sizable part (48%) answered that it depends on the study and that they have no pre-defined preference. When utilizing linear regression for DF and LTC, 64% of respondents reported that they use a linear first-order polynomial.
Linear regression forced through zero, linear regression forced through zero with cut-off wind speed, and second-order polynomial each obtained 16% of the vote, while the “other” choice received 20%. When doing a data-filling / long-term correction study for mean wind speed, 12 sectors were the dominant response (80%) for the typical number of wind direction sectors. This was followed by 16 sectors (32%), 36 sectors (24%), and omnidirectional (single sector, 20%). The leading temporal domain for the data-filling and long-term correction used by the analysts was an hourly resolution with 52%, followed by a 10-minute resolution (44%). 84% of respondents indicated that they take seasonality into account during the data-filling/LTC process, either through seasonal balancing prior to MCP (32%), using monthly intervals (32%), or applying yearly divisions (20%). Another 20% of respondents said that seasonality was not considered in their MCP workflow. For linear regression applications based on monthly values, 52% of respondents indicated using a weighted technique, while the remainder indicated that it was not relevant.

68% of respondents indicated that atmospheric stability should be taken into account when performing a data-filling/MCP activity. This agreement was reduced to 48% in the following question, which asked participants whether metocean conditions such as waves, currents, or air, pressure, or water temperature should be included in the process. It is worth noting that 24% and 32% of participants, respectively, abstained from answering the above-mentioned questions. When analysts were asked for their view on the most important metocean parameter that should be studied in relation to the data-filling process, a slight majority (60%) chose wave height, followed by air temperature (44%).
Finally, experts were asked to comment on whether their choice of data-filling (DF) / long-term correction (LTC) methodology was based on performance testing and/or uncertainty analysis. The overwhelming majority (88%) answered this question in the affirmative. The experts discussed their recommended performance test and approach in greater detail. Two experts stated that they do LTC performance evaluations using industry-standard software (WindPRO/Windographer). Another individual stated that they employ a variety of methods and analyze the statistical distribution of all approaches in order to determine the consensus result. Along with determining how well the synthesised data and correlation capture wind speed and energy content, significant KPIs such as jackknife uncertainty, MBE, MBA, and distribution error were mentioned. Finally, another participant proposed applying methods used in well-known offshore meteorological masts and offshore lidar data sets, such as the FINO mast.

The questionnaire concluded with expert advice and recommendations for this study. One expert objected that the questionnaire omitted a question on analysts' willingness to agree to data filling. One participant recommended applying as many different approaches to the LTC as possible and then selecting the most appropriate methods after comparing final LTMOMM estimates, as this would provide a good sense of the sensitivity of the final result to the approaches employed. One participant expressed interest in comparing long-term results obtained from a measured dataset that was not data-filled to those obtained from the same dataset that was data-filled, derived from measurements at the same location with the same instrumentation.
The concluding comment came from another expert, who suggested that data filling should be performed when it can be proved that the uncertainty associated with filling with unmeasured or non-targeted data is less than the uncertainty associated with leaving the gaps unfilled.

2.6 Definition of the key performance indicators and uncertainties

Based on the findings of the literature research and the results of the questionnaire, the key performance indicators (KPIs) shown in Table 2-12 have been established. These KPIs are divided into two primary categories: test statistic and test parameter. Lower MAE, RMSE and absolute MBE values, as well as a higher R² value, indicate that the predictions are closer to the measured values [6].

Table 2-12. Definition of KPIs

| KPI | Test statistic | Test parameter | Generic parameter |
|---|---|---|---|
| Mean bias error (MBE) | Yes | - | - |
| Mean absolute error (MAE) | Yes | - | - |
| Root mean square error (RMSE) | Yes | - | - |
| R² (coefficient of determination) | Yes | - | - |
| Mean wind speed (MWS) | - | Yes | - |
| Mean wind direction (MWD) | - | Yes | - |
| Wind direction deviation (WDD) | - | Yes | - |
| Weibull scale parameter (A, Weib_A) | - | Yes | - |
| Weibull shape parameter (k, Weib_k) | - | Yes | - |
| Wind power density (WPD) | - | Yes | - |
| Kolmogorov-Smirnov test statistic regarding wind speed distribution (KS) | Yes | - | - |
| Number of samples in bin/sector (depending on the method) (TS) | - | - | Yes |

Source: Author’s own calculation/assessment

The selection of the reference dataset requires high-quality and consistently measured wind speeds in order to obtain accurate estimations of the target site's wind resource [62]. As described in Section 2.9.2, this reference dataset is often a modelled dataset due to a set of limitations. In any case, the consistency and quality requirements remain valid. Further, the representativeness of the reference dataset is another important criterion [35].
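To make the four test statistics concrete, the following sketch computes MBE, MAE, RMSE and R² for a pair of series. It assumes the sign convention "predicted minus measured" for the MBE and the definition R² = 1 - SS_res / SS_tot; the function names and the sample values are hypothetical and serve only as an illustration.

```python
import math


def mbe(pred, obs):
    """Mean bias error; positive values mean over-prediction (assumed sign)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean square error, in the unit of the input (e.g. m/s)."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def r_squared(pred, obs):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for p, o in zip(pred, obs))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


measured = [6.0, 8.0, 10.0, 12.0]    # hypothetical wind speeds in m/s
predicted = [7.0, 8.0, 9.0, 13.0]    # hypothetical model output in m/s
print(mbe(predicted, measured), mae(predicted, measured))
print(rmse(predicted, measured), r_squared(predicted, measured))
```

In this example the positive and negative deviations partly cancel in the MBE (0.25 m/s) but not in the MAE (0.75 m/s), which is why the two statistics are reported side by side.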
In conclusion, one needs to compare the measured and reference datasets before conducting an MCP. The KPIs are summarized as the prerequisite to data-filling KPIs (PreDF). The test statistics used for the PreDF KPIs are shown in Table 2-13 below.

Table 2-13. PreDF KPI

| Test statistic | MWS | MWD | WDD | Weib_A | Weib_k | WPD | Wind rose |
|---|---|---|---|---|---|---|---|
| MBE | ✓ | - | Single value | ✓² | ✓¹ | ✓ | - |
| MAE | ✓ | - | - | - | - | ✓ | - |
| RMSE | ✓ | - | - | - | - | ✓ | - |
| R² | ✓ | ✓ | - | - | - | - | - |
| KS | ✓ | - | - | - | - | - | - |
| Representativeness parameter | - | - | - | - | - | - | ✓ |
| Number of samples in bin/sector | ✓ | - | - | - | - | - | - |

Source: Author’s own calculation/assessment

Table 2-14 and Table 2-15 show, in the same manner, the test statistics that were used for the SelfDF and ValDF KPIs, respectively.

Table 2-14. SelfDF KPI

| Test statistic | MWS | Weib_A | Weib_k | WPD |
|---|---|---|---|---|
| MBE | ✓ | ✓² | ✓² | ✓ |
| MAE | ✓ | - | - | ✓ |
| RMSE | ✓ | - | - | ✓ |
| R² | ✓ | | | |
| KS | ✓ | - | - | - |

Source: Author’s own calculation/assessment

² Measured and reference values are calculated sectorwise; MBE is obtained from a weighted average.

Table 2-15. ValDF KPI for the gap

| Test statistic | MWS |
|---|---|
| MBE | ✓ |
| MAE | ✓ |
| RMSE | ✓ |

Source: Author’s own calculation/assessment

2.6.1 The interface of the KPIs to the uncertainty method

Validating MCP approaches by quantifying and modelling the uncertainty would improve the confidence in the long-term analysis. Uncertainty in on-site wind conditions might be evaluated by modelling the uncertainty in the MCP process [66]. Rogers advocated using the standard deviation of the long-term estimates as an uncertainty measure. He suggested predicting the properties of the long-term target site data using shorter concurrent data sets taken from the longer set. The uncertainty associated with the prediction is then assessed using the standard deviation of predictions across many data sets.
The disadvantage of this strategy is that it can only be used with sufficiently high-quality, long-term measured (target) data. Saarnak used a similar technique, calculating the mean and standard deviation of the biases of the long-term correction for each subgroup of a longer dataset and using them as a measure of uncertainty [63].

Klinkert [68] conducted a very comprehensive literature review of uncertainty estimators in long-term correction procedures. His conclusion was that the correlation and standard deviation were the most common estimators used within the industry, while different metrics are suitable for different purposes. Seasonality and long-term trends, for example, may contribute to uncertainty [68]. Even though Klinkert’s final evaluation of this research comprised just 19 papers on long-term uncertainty correction, the breadth of uncertainty approaches and parameter applications is extensive [68]. The parameters are often used to determine their sensitivity to other variables. Long-term studies often analyze uncertainty in terms of the time period between the measurement station and the reference dataset. Another typical approach is a comparison of the uncertainty introduced by the length of on-site measurements versus the use of sufficiently accurate reference datasets.

Table 2-16. Uncertainty estimators in the area of long-term corrections method uncertainty

Parameter, description and usage:

MBE: MBE is a measure of the systematic errors in a measurement sample to some extent. To determine if the inaccuracy is systematic and, in that case, whether it is over- or underestimating the wind speed. Industry-wide application.

MAE: The MBE's magnitude. The average difference is displayed to ensure that all variations are captured during the analysis. MAE is used in displaying the error and analyzing a process. While the disparities may fluctuate significantly, the sign change might bring them to zero. MAE demonstrates the magnitude of the oscillations independent of their sign. Used in normalized and percentage forms and to a large extent in the ANN approach.

Coefficient of determination: The fraction of the observed response variable variability that can be explained by a linear regression model. Used to determine the degree to which linear regression, e.g. the MCP approach, adequately explains the variability. Frequently used in the reporting of wind assessment uncertainty.

RMSE: A simple-to-understand error indication, as it uses the same unit as the estimated variable. RMSE is involved in all instances involving error analysis. This is a frequent occurrence in short term analysis, as the length of the prediction interval is dependent on the length of the prediction interval. The purpose of long term analysis is to demonstrate the error's convergence with the period of concurrent data.

Standard deviation: The most often used method of expressing uncertainty in long-term adjustments. The standard deviation for each result should be included. It is critical to determine and appropriately estimate the standard deviation, which can be challenging when dealing with serially correlated data.

Source: [68].

In a recent study, Basse [74] noted that further research is needed to determine how systematic biases and, ultimately, the uncertainty associated with long-term correction of short-term wind data may be decreased efficiently and expeditiously.

2.6.2 Uncertainties in the long-term correction

Figure 2-12 below gives a comprehensive overview of the different uncertainty components relevant for the energy production of a WTG. It can be observed that the long-term adjustment (correction) is a sub-component of the historical wind resource category in the proposed draft IEC 61400-15 framework [79].
Figure 2-12. Mind map of energy production uncertainty according to the draft IEC 61400-15
Source: [79]

The proposed framework is mapped in Figure 2-15 below to the current technical guideline TG6 and an example from industry practice [18]. It may be noticed that there is no unanimity in the nomenclature used to describe the components of the uncertainty impacting the historical wind resource. In line with the research questions, the fundamental purpose of this research is to understand the representativeness of the experimental performance test (method uncertainty) in the event of missing data. This study does not aim to thoroughly compare and test the remaining uncertainty components of long-term correction. The topic of this thesis is the correlation and on-site data synthesis uncertainties, which are sub-components of the method uncertainty. The method uncertainty is covered in more detail in the next section.

2.6.3 MCP method uncertainty

TG6 states that the quality of the chosen long-term correction technique should be checked by using the long-term correction procedure to reconstruct the original measurement data or yield data [35]. The overlap period of the existing short-term measurement and a long-term reference time series is separated into training and test periods [35]. The correlation calculated during training is applied to the long-term reference time series during testing [35]. The generated dataset is compared to the test period's original short-term data. This is done using statistical measures such as the mean and standard deviation, as well as measurement data such as the wind speed and direction frequency distribution.
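The training and test procedure described above can be sketched as follows. The example assumes a chronological split of the concurrent period and a simple least-squares line as the correlation; the function names and the sample data are hypothetical, and the sketch is not taken from TG6 itself.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = slope * x + offset."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def holdout_test(reference, target, train_fraction=0.5):
    """Fit on the first part of the concurrent period, test on the rest."""
    n_train = int(len(reference) * train_fraction)
    slope, offset = fit_line(reference[:n_train], target[:n_train])
    # Apply the training correlation to the reference data of the test period.
    predicted = [slope * r + offset for r in reference[n_train:]]
    errors = [p - o for p, o in zip(predicted, target[n_train:])]
    return {
        "mbe": sum(errors) / len(errors),
        "rmse": math.sqrt(sum(e * e for e in errors) / len(errors)),
    }


reference = [4.0, 6.0, 8.0, 10.0, 5.0, 7.0, 9.0, 11.0]   # hypothetical, m/s
target = [4.8, 7.2, 9.1, 11.7, 6.1, 8.0, 10.6, 12.4]     # hypothetical, m/s
result = holdout_test(reference, target)
print(result)
```

Because the test period is not used in the fit, the resulting errors describe how well the correlation reconstructs data it has not seen, which is the purpose of the check.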
The mean bias error, mean absolute error, root mean squared error, and distribution error may all be calculated [35]. It is also a requirement that a self-consistency test be used to determine the quality of the applied MCP extrapolation and the validity of the MCP result [35]. An illustration of the assessment of the MCP uncertainty sub-components, using an example from industry practice and the TG6 technical guideline, is shown in Figure 2-16.

The standard deviation of the estimates proposed by Rogers, introduced in Section 2.6.1, is not suitable for use cases where no long-term measured (target) dataset is available [8]. Additionally, Rogers referred to Derrick, mentioning that the uncertainty of the slope and offset is typically used to describe the uncertainty of the relationship between the reference and target sites in linear regression. Derrick [12] described estimating the standard deviation of the expected wind speed using the slope and offset variances and covariances [8]. However, Rogers dismissed this approach because it assumes that the data are not serially correlated, which is not true in the specific use case.

Windfarmer refers to correlation uncertainty by stating that it is calculated using the scatter of the correlation between the reference and site masts. The smaller the scatter, the lower the uncertainty of the relationship [46]. This is not regarded as an objective, test-based technique appropriate for this sort of analysis.

Brower [80] suggests estimating the method uncertainty using an empirical formula. The following simple equation approximates the overall uncertainty in the long-term mean wind speed as a function of the correlation coefficient, assuming normally distributed yearly wind speed variations and a homogeneous reference station data record [80].
This is given in the following equation, valid only if the concurrent dataset is longer than a single year:

$$\sigma = \sqrt{\frac{r^2}{N_R}\,\sigma_R^2 + \frac{1 - r^2}{N_T}\,\sigma_T^2}$$ [e29]

where
$r$ = correlation coefficient
$N_R$ = number of years of reference data
$N_T$ = number of years of concurrent reference and target data
$\sigma_R$ = standard deviation of the annual mean wind speed of the reference site as a percentage of the mean
$\sigma_T$ = standard deviation of the annual mean wind speed of the target site as a percentage of the mean

Similar to the method referred to in the Windfarmer manual, the empirical method of Brower is not a test-based approach and was not used in this study.

It is possible to execute a performance test using Windographer [43], which offers KPIs for MBE, MAE, RMSE, and DE of the various MCP approaches that have been applied. Version 4 of Windographer does not have any feature for evaluating uncertainty.

Another common method to assess uncertainties is bootstrapping, which is intrinsically linked to Monte Carlo simulation. Monte Carlo techniques have been used to simulate and approximate distributions in various sectors, including the wind industry. However, Monte Carlo is the general term for any method employing random numbers. Accordingly, since it uses the LTC approaches without restricting the computations to likely values and without implying any underlying distribution, bootstrapping may be regarded as a branch of these Monte Carlo simulations [68]. Valk notes that resampling serves the same purpose as Monte Carlo simulation for the evaluation of MCP uncertainties. However, unlike the latter, resampling does not need an explicit probabilistic model [81]. According to Nielsen, the bootstrap technique is the most used resampling approach. The idea is to create artificial data sets of the same size as the actual time series by randomly sampling from it [71].
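The resampling idea can be sketched in a few lines. The example below draws blocks of consecutive values with replacement until a synthetic series of the original length is obtained, and reports the standard deviation of an estimator across the resamples. The function name, the `block_length` parameter (with `block_length=1` giving the plain bootstrap) and the sample data are assumptions for illustration; this is not the implementation of any cited study.

```python
import random
import statistics


def bootstrap_std(series, estimator, block_length=1, n_resamples=200, seed=0):
    """Standard deviation of `estimator` over block-bootstrap resamples."""
    rng = random.Random(seed)            # fixed seed for reproducibility
    n = len(series)
    estimates = []
    for _ in range(n_resamples):
        sample = []
        while len(sample) < n:
            # Draw a random block of consecutive values (with replacement).
            start = rng.randrange(n - block_length + 1)
            sample.extend(series[start:start + block_length])
        estimates.append(estimator(sample[:n]))   # trim to original length
    return statistics.stdev(estimates)


wind = [float(i % 10) for i in range(100)]   # hypothetical hourly wind speeds
spread = bootstrap_std(wind, statistics.fmean, block_length=5)
print(spread)
```

In an MCP setting, the estimator would be the full data-filling and long-term correction chain rather than a simple mean, which is what makes a large number of resamples computationally expensive.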
An example of the bootstrap for uncertainty assessment is provided in [81]. The authors noted that simulations might be used to determine the effect of random error on long-term correction methods. However, when assessing the uncertainty of a wind resource estimate, it is not always required to explicitly address random errors in the data source [81]. Valk added that the block length should be adequate for the bootstrap strategy to succeed. In the study [81], the authors chose a block length of 62.5 days, corresponding to a satisfactory de-correlation of wind speed. The random sampling procedure of bootstrapping can be repeated several times, and the DF and LTC algorithms can be used to generate a new synthetic dataset. The uncertainty associated with the MCP approach can be expressed as the standard deviation of the final estimates. A large number of simulations is necessary to obtain reliable estimates. This is one of the disadvantages of the method, and it renders bootstrapping unsuitable for the sliding gap window analysis used in this study.

In DNV's recommended practice DNV-RP-J101, the jackknife (JK) and bootstrap techniques are recommended for assessing uncertainty. The jackknife estimate of variance quantifies the uncertainty of a study's conclusions by taking into account the variability of outcomes as successive subsets of data are excluded from the analysis [82]. For the JK, Rogers selected the number of jackknife subsets as the median of the best-performing number of jackknife subsets across the 12 data sets used in his study [8]. It denotes the number of jackknife subsets that produces the lowest overall root mean square error for these data sets [8].
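A minimal sketch of the jackknife estimate is given below. It assumes the standard delete-one-group form, in which the data are split into contiguous subsets, each subset is left out in turn, and the variance is computed as (g - 1)/g times the sum of squared deviations of the g leave-one-out estimates from their mean. The function name, the default number of subsets and the sample data are hypothetical.

```python
import math


def jackknife_std(data, estimator, n_subsets=4):
    """Delete-one-group jackknife standard deviation of `estimator`."""
    size = len(data) // n_subsets
    estimates = []
    for k in range(n_subsets):
        # Drop the k-th contiguous subset and re-estimate on the remainder.
        kept = data[:k * size] + data[(k + 1) * size:]
        estimates.append(estimator(kept))
    mean_est = sum(estimates) / n_subsets
    variance = (n_subsets - 1) / n_subsets * sum(
        (e - mean_est) ** 2 for e in estimates)
    return math.sqrt(variance)


def mean(values):
    return sum(values) / len(values)


sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]   # hypothetical values
print(jackknife_std(sample, mean, n_subsets=4))
```

Only as many model runs as there are subsets are needed, so the estimator passed in can be a complete MCP prediction without a large computational cost.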
As a result, four subsets are relevant for the context of this research, as a two-year assessment period is used. This number of jackknife subsets was employed throughout the analysis in this master's thesis.

Figure 2-13. Selection of number of subsets based on concurrent period
Source: [8]

The difference between JK and bootstrapping is presented in Figure 2-14.

Figure 2-14. Sketch of the difference between JK and bootstrap resampling
Source: [71]

In conclusion, it is proposed that the RMSE of the validation phase is employed as an uncertainty metric for the interim data-filling stage. This is compared with Brower's empirical uncertainty method. In terms of the uncertainty associated with the MCP method's long-term correction, the JK method is shown to be suitable, primarily due to computational limitations. An exemplary bootstrapping uncertainty is produced in Section 3.3 for a single gap duration of 60 days at a given gap start time to demonstrate a rudimentary comparison.

Figure 2-15. Mapping of sub-uncertainty components
Source: Author’s own illustration based on the example of industry practice [18], TG6 technical guideline [35] and new proposed framework within IEC 61400-15 [79]
[Figure content not reproduced. Recoverable labels: column headings "Industry practice (example)", "TG 6 Rev 11" and "IEC61400-15", each listing partial uncertainty components and sub-components; component types "statistical" and "assumption"; components Reference data, On-site data synthesis, Long-term adjustment, Wind speed and direction distribution, Long-term period; 2.1 Correlation, 2.2 Quality of the long-term dataset, 2.3 Representativeness of the comparison period - wind speed distribution, 2.4 Representativeness of the comparison period - wind rose, 2.5 On-site data synthesis; 8.1.3a - Representativeness of the long-term data for the site, 8.1.3b - Consistency of the long-term data sources, 8.1.3c - Method uncertainty, 8.1.3d - Selection of the reference time period, 8.1.3e - Selection of the operation period; enclosing categories Energy uncertainty, Historic wind resource, Long-term correction; colour legend: 1 - Long-term representation, 2 - MCP method uncertainty.]

Figure 2-16. Flowchart of evaluation of MCP uncertainty sub-uncertainty components at the example of industry practice [18] and TG6 technical guideline [35], with "correlation" uncertainty shown in amber as target of this study
Source: Author’s own illustration based on [18] and [35]

2.7 Selection of the base-algorithm

Whilst each strategy has advantages and disadvantages, Brower [80] recommends tried-and-true methods for day-to-day applications in the wind industry. In that regard, linear regression methods are highlighted as simple to use and as calculating the long-term mean wind speed as accurately as any linear technique [80].
Table 2-17 lists the MCP algorithms that can be built from a linear regression method. Well over 20 variants are easily possible for this single method (28 are listed). As the focus is to understand the impact on the data filling with an iterative analysis, it should be possible to implement the selected method in an open-source programming language without significant computational effort.

Table 2-17. MCP algorithms for implementation of linear regression (LinReg)
Algorithm ID | Method | Sub Method | Model | Sector | Time Domain | Weights | Algorithm identifier
1 | LinReg | TLS | 1OFtO | 12 | Hourly | nW | LinReg_TLS_1OF_12_Ho_nW
2 | LinReg | TLS | 1OFtO | 16 | Hourly | nW | LinReg_TLS_1OF_16_Ho_nW
3 | LinReg | TLS | 1OFtO | 36 | Hourly | nW | LinReg_TLS_1OF_36_Ho_nW
4 | LinReg | TLS | 1OFtO | omni | Hourly | nW | LinReg_TLS_1OF_omni_Ho_nW
5 | LinReg | TLS | 1OFtO | N/A | Monthly | We | LinReg_TLS_1OF_nS_Mo_We
6 | LinReg | TLS | 1OFtO | N/A | Monthly | iW | LinReg_TLS_1OF_nS_Mo_iW
7 | LinReg | TLS | 1OwOf | 12 | Hourly | nW | LinReg_TLS_1Ow_12_Ho_nW
8 | LinReg | TLS | 1OwOf | 16 | Hourly | nW | LinReg_TLS_1Ow_16_Ho_nW
9 | LinReg | TLS | 1OwOf | 36 | Hourly | nW | LinReg_TLS_1Ow_36_Ho_nW
10 | LinReg | TLS | 1OwOf | omni | Hourly | nW | LinReg_TLS_1Ow_omni_Ho_nW
11 | LinReg | TLS | 1OwOf | N/A | Monthly | We | LinReg_TLS_1Ow_nS_Mo_We
12 | LinReg | TLS | 1OwOf | N/A | Monthly | iW | LinReg_TLS_1Ow_nS_Mo_iW
13 | LinReg | LLS | 1OFtO | 12 | Hourly | nW | LinReg_LLS_1OF_12_Ho_nW
14 | LinReg | LLS | 1OFtO | 16 | Hourly | nW | LinReg_LLS_1OF_16_Ho_nW
15 | LinReg | LLS | 1OFtO | 36 | Hourly | nW | LinReg_LLS_1OF_36_Ho_nW
16 | LinReg | LLS | 1OFtO | omni | Hourly | nW | LinReg_LLS_1OF_omni_Ho_nW
17 | LinReg | LLS | 1OFtO | N/A | Monthly | We | LinReg_LLS_1OF_nS_Mo_We
18 | LinReg | LLS | 1OFtO | N/A | Monthly | iW | LinReg_LLS_1OF_nS_Mo_iW
19 | LinReg | LLS | 1OwOf | 12 | Hourly | nW | LinReg_LLS_1Ow_12_Ho_nW
20 | LinReg | LLS | 1OwOf | 16 | Hourly | nW | LinReg_LLS_1Ow_16_Ho_nW
21 | LinReg | LLS | 1OwOf | 36 | Hourly | nW | LinReg_LLS_1Ow_36_Ho_nW
22 | LinReg | LLS | 1OwOf | omni | Hourly | nW | LinReg_LLS_1Ow_omni_Ho_nW
23 | LinReg | LLS | 1OwOf | N/A | Monthly | We | LinReg_LLS_1Ow_nS_Mo_We
24 | LinReg | LLS | 1OwOf | N/A | Monthly | iW | LinReg_LLS_1Ow_nS_Mo_iW
25 | LinReg | VR | - | 12 | Hourly | nW | LinReg_VR_12_Ho_nW
26 | LinReg | VR | - | 16 | Hourly | nW | LinReg_VR_16_Ho_nW
27 | LinReg | VR | - | 36 | Hourly | nW | LinReg_VR_36_Ho_nW
28 | LinReg | VR | - | omni | Hourly | nW | LinReg_VR_omni_Ho_nW
Source: Author’s own calculation/assessment

Table 2-18 presents the other MCP algorithms possible for the study, which were tested prior to the implementation of the iterations.

Table 2-18. MCP algorithms for implementation of other methods (3)
Algorithm ID | Method | Sub Method | Model | Sector | Time Domain | Weights | Algorithm identifier
29 | Bin Method | VS | - | 12 | Hourly | 10_WSbins | BinMethod_VS_12_Hourly_10_WSbins
30 | Bin Method | VS | - | 16 | Hourly | 10_WSbins | BinMethod_VS_16_Hourly_10_WSbins
31 | Bin Method | VS | - | 36 | Hourly | 10_WSbins | BinMethod_VS_36_Hourly_10_WSbins
32 | Bin Method | VS | - | omni | Hourly | 10_WSbins | BinMethod_VS_omni_Hourly_10_WSbins
33 | Matrix | MTS | Def | 12 | Hourly | - | Matrix_MTS_Def_12_Hourly
34 | Matrix | MTS | Def | 16 | Hourly | - | Matrix_MTS_Def_16_Hourly
35 | Matrix | MTS | Def | 36 | Hourly | - | Matrix_MTS_Def_36_Hourly
36 | Matrix | MTS | Def | omni | Hourly | - | Matrix_MTS_Def_omni_Hourly
37 | Matrix | WindPRO | Def | 12 | Hourly | - | Matrix_WindPRO_Def_12_Hourly
38 | Matrix | WindPRO | Def | 16 | Hourly | - | Matrix_WindPRO_Def_16_Hourly
39 | Matrix | WindPRO | Def | 36 | Hourly | - | Matrix_WindPRO_Def_36_Hourly
40 | Matrix | WindPRO | Def | omni | Hourly | - | Matrix_WindPRO_Def_omni_Hourly
41 | QM | Speed Sort | Def | 12 | Hourly | - | QM_SpeedSort_Def_12_Hourly
42 | QM | Speed Sort | Def | 16 | Hourly | - | QM_SpeedSort_Def_16_Hourly
43 | QM | Speed Sort | Def | 36 | Hourly | - | QM_SpeedSort_Def_36_Hourly
44 | QM | Speed Sort | Def | omni | Hourly | - | QM_SpeedSort_Def_omni_Hourly
45 | EM | BSR | ISo1 | 12 | Hourly | - | EM_BSR_ISo1_12_Hourly
46 | EM | BSR | ISo1 | 16 | Hourly | - | EM_BSR_ISo1_16_Hourly
47 | EM | BSR | ISo1 | 36 | Hourly | - | EM_BSR_ISo1_36_Hourly
48 | EM | BSR | ISo1 | omni | Hourly | - | EM_BSR_ISo1_omni_Hourly
49 | EM | Weibull scale | ISo3 | 12 | N/A | - | EM_Weibull scale_ISo3_12_N/A
50 | EM | Wind index | ISo3 | N/A | N/A | - | EM_Wind index_ISo3_N/A_N/A
Source: Author’s own calculation/assessment
(3) Selected list of methods available in industry software.

The particular focus of this study is on offshore applications. Thus, the complexity of the transfer functions between the target and reference sites is expected to be low. As stated by Duncan, the diurnal and annual changes of near-surface wind speed differ vastly between land and water [83]. Wind speeds offshore are often thought to be stronger and less turbulent than onshore. The diurnal cycle is almost non-existent at sea throughout the whole year due to the considerable thermal inertia of the sea surface, while the increased synoptic activity in winter leads to higher wind speeds than in summer [83].

Accordingly, an MCP method suitable for offshore use does not necessarily need to be complex. A widely used, simple MCP algorithm may therefore deliver sufficiently good results while being easy to validate during the coding process. In other words, the analysis of the gap-filling impact is easier to repeat with a simple but proven method.

Regarding the sub-method, LLS was selected, as there was high confidence in the measured dataset. A first-order linear regression model with offset (1OwOf) was selected, as this is a well-known and widespread method that provides robust results. This assumption is further tested and confirmed for this specific analysis with the performance testing algorithm in Windographer in Section 3.1. Similarly, the consideration of the wind directions, or the number of sectors, is an essential feature of the MCP algorithm.
In general, terrain greatly influences wind direction, with the distance to the coastline from offshore locations having a considerable impact on the directional distribution [68]. The omnidirectional analysis was based on 41910 iterations for a total of 60 gap periods in sequential steps. Due to the design of the code, a sectorwise approach would multiply the number of iterations by the number of sectors. The necessity of a directional MCP was therefore assessed with sectorwise sensitivity runs. The selected target dataset location MMIJ is located far offshore without any coastal effects in the different sectors. Therefore, an omnidirectional analysis was found to be suitable, as there were no directional influences, and it was concluded that the omnidirectional MCP was a reasonable simplification for the purpose of the study.

The hourly temporal resolution was selected for the study, as this was considered important to understand the impact of the data filling.

Finally, considering the above-mentioned criteria, the linear regression method with the LLS sub-method (MCP algorithm ID 22) has been chosen as the base case scenario for this study.

2.8 Design of the code for iterative analysis

Table 2-19 illustrates the relationships and datasets utilized for data filling and long-term correction. The study begins with the complete measured dataset covering a two-year period of 17472 hours. Gaps ranging from one day to sixty days are added in an outer loop, with a 24-hour (1-day) increment in gap length. Each gap is then removed from the measured period in a sliding window that advances in 24-hour increments. This is referred to as the inner loop. At the time of the analysis's inception, the code was implemented sectorwise. As a result, the inner loop comprises a secondary loop across the sector bins.
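The outer and inner loops described above can be sketched as follows. This is a minimal illustration and not the thesis code: the function name `sliding_gaps` is hypothetical, and it assumes that the gap window starts at the first hour of the period and is moved forward by 24 hours until its end reaches the end of the 17472-hour period. Under these assumptions the enumeration yields the 41910 iterations quoted in the text.

```python
def sliding_gaps(total_hours, max_gap_days=60, step_hours=24):
    """Yield (gap_days, start_hour, end_hour) for every sliding-gap iteration."""
    for gap_days in range(1, max_gap_days + 1):      # outer loop: gap length in days
        gap_hours = gap_days * 24
        last_start = total_hours - gap_hours         # last position that still fits
        for start in range(0, last_start + 1, step_hours):  # inner loop: gap position
            yield gap_days, start, start + gap_hours


total_hours = 17472  # two-year concurrent period quoted in the text
iterations = list(sliding_gaps(total_hours))
print(len(iterations))  # 41910
```

Each tuple marks the hours to be masked as an artificial gap; the remaining hours form the training period for that iteration.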
As previously noted, KPIs are gathered throughout each sector for the PreDF, SelfDF, and ValDF groups, based on the correlations provided in Table 2-20. During the validation step, the decision was taken to change the code to an omnidirectional (1 sector) version, primarily due to computational constraints.

The outer loop has been designed in JupyterLab. JupyterLab is a web-based interactive development environment for notebooks, code, and data. Its flexible interface allows users to create and organize data science and scientific computing workflows [84]. The inner loops have been developed using Python [85] within the latest Anaconda environment [86]. NumPy [83] and pandas [84] were used within the Python environment for calculations. The Matplotlib [87] module was utilized for visualisations, whereas sklearn.metrics [88], scipy.stats [89] and dc_stat_think [90] were deployed for statistical analysis. The module xlsxwriter was used to export the results to Excel.

The overall design of the code is presented in Figure 2-17. It is noted that the training and test periods as defined in Table 2-19 are not random and do not have equal durations but are always complementary. The extension (creation of the synthesized data) does not replace observations.

It should be mentioned that throughout the code's creation, the output of the Python code was compared to the output of Windographer in numerous phases to validate the findings. For bin analysis, a separate function was built to partition the data into matching bins. Furthermore, directional averaging was performed using the wind direction's vector components during sectorwise analysis.

Figure 2-17. Flow chart of the code
Source: Author’s own illustration

Table 2-19.
Relationships and datasets for data filling at the example of data segments
Segment columns: LT 1, LT …, S 1 to S 12, LT … -19
Abb. | Dataset | Period | Goal | Note | KPI Group | KPI Description | Relationship for model
Y.1 | Measured w artificial gap | - | - | - | - | - | -
Y.2 | Measured gap | - | - | - | - | - | -
Y.3 | Measured full | - | - | - | - | - | -
X.1 | Reference-DF | - | - | - | - | - | -
X.2 | Reference-DF | - | - | - | - | - | -
X.3 | Reference full | - | - | - | - | - | -
X.4 | Reference LT | - | - | - | - | -
B.0 | Concurrent_w_gap | Training | Suitability | Prerequisite | PreDF | Reference-observed | None
B.1 | Model_w_gap | Training | Uncertainties | 1. Step | SelfDF | Predicted-observed | Y.1-X.1
B.2 | Model-gap_self_prediction | Not part of this study | Y.2-X.2
B.3 | Model_self_prediction | Y.3-X.3
C | Model_gap | Test | Validation | 2. Step | ValDF | Predicted-observed (gap) | Y.1-X.1
D | Model_gapfilled | DF | Input to F | 3. Step | PostDF | Predicted-observed | Y.1-X.1
E | Model_ltc | LTC | Impact of DF | 4. Step | Ltc | Jackknife uncertainty | Y.1-X.1
F | Model_df_ltc | LTC | LTC | 5. Step | LtcDF | Jackknife uncertainty | D-X.3
Source: Author’s own calculation/assessment

Table 2-20: Reference relationships for the KPI classification
Columns: Dataset; Scenarios; Uncertainties; Case; PreDF (no model); SelfDF (self-predictions); ValDF (for verification - onsite metrics)
ConcurrentPeriod (Scenarios: -; Case: Use)
  PreDF: PreDF_KPI_p0; Relation: -; Reference: X.3; Target: Y.3
  SelfDF: PreDF_KPI_p0; Relation: Y.3-X.3; Reference: X.3; Target: B.3
  ValDF: Not available in the use case.
ConcurrentPeriod_gap (Scenarios: 1-day to 60-days; Uncertainties: -; Case: Test)
  PreDF: PreDF_KPI_p2; Relation: -; Reference: X.2; Target: Y.2
  SelfDF: SelfDF_KPI_p2; Relation: Y.2-X.2; Reference: X.2; Target: B.2
  ValDF: ValDF_KPI_p1; Relation: Y.1-X.1; Reference: X.2; Target: C
ConcurrentPeriod_w_gap (Scenarios: 1-day to 60-days; Uncertainties: -; Case: Test)
  PreDF: PreDF_KPI_p1; Relation: -; Reference: X.1; Target: Y.1
  SelfDF: SelfDF_KPI_p1; Relation: Y.1-X.1; Reference: X.1; Target: B.1
  ValDF: -
ConcurrentPeriod_gap_filled (Scenarios: 1-day to 60-days; Uncertainties: RMSE-MWS (ValDF); Case: Test)
  PreDF: -
  SelfDF: SelfDF_KPI_p3; Relation: Y.1-X.1; Reference: X.3; Target: D
  ValDF: -
Source: Author’s own calculation/assessment

The greyed relationships in the above table show possible investigations that were not part of this study.

2.9 Datasets

The measurement and reference datasets are discussed in the subsequent sub-sections.

2.9.1 Selection of the measurement dataset

The meteorological mast Ijmuiden (MMIJ) dataset was preselected and provided by Dr Gottschall for this analysis based on a previous investigation of the impact of gaps on offshore datasets [4]. TNO Energy Transition's wind energy division conducted a four-year meteorological measurement campaign by installing and operating the MMIJ in the Dutch North Sea between 2011 and 2015, commissioned by the Ministry of Economic Affairs, Agriculture, and Innovation [91]. MMIJ was located approximately 75 km west of the coast at Ijmuiden. Sensors were positioned at various heights (between 25 m and 100 m) to observe and record wind speed, direction, temperature, and pressure changes. A light detection and ranging (lidar) system was installed, measuring wind speed and direction up to 300 meters above the mast. The campaign also included sea current and wave measurements using a wave buoy, in order to support the design of safe and cost-effective foundations for future offshore wind turbines. The MMIJ dataset can be requested for research purposes from TNO's data cloud manager [92].
A dataset covering two full years was provided by Dr Gottschall for the analysis, containing wind speed and wind direction data at the top height at a 10-minute temporal resolution.

2.9.2 Selection of the long-term reference dataset

ERA5 was used as the reference dataset in the initial study conducted by Gottschall [4] and was therefore pre-selected for this study. It satisfies the criteria for reference dataset properties established by TG6 [35]. ECMWF is producing the ERA5 reanalysis as part of the Copernicus Climate Change Service (C3S), which contains a thorough record of the global atmosphere, land surface, and ocean waves from 1950 to the present. ERA5 benefits from a decade of advances in model physics, core dynamics, and data assimilation. In addition to a greatly improved horizontal resolution of 31 km, ERA5 includes hourly output [93]. ERA5 has global coverage, is well-documented, and has been extensively validated.

In addition to ERA5, MERRA-2 and KNMI datasets were also evaluated during the analysis of the long-term period, as discussed in Section 2.9.4.3.

2.9.3 Measurement campaign overview

The details of the MMIJ instrumentation are provided in the ECN-Wind Memo-12-010 [94]. Thies First Class anemometers were deployed during the measurement campaign by ECN [94], as shown in Table 2-21.

Table 2-21.
MMIJ Instrumentation
Sensor type | Height above LAT [m] | Analysis use case | Arrangement | Distance from mast [m] | Vertical distance from boom [cm] | Sensor
Anemometer | 92 | Primary | Top anemometer dual boom, 17.5° and 197.5° orientation | - | 1500 | Thies First Class Advanced anemometer
Wind vane | 87 | Primary | Triple boom arrangement with 46.5°, 166.5° and 286.5° orientation | 4.6 | 70 | Thies First Class wind vane
Anemometer | 58.5 | Secondary | Triple boom arrangement with 46.5°, 166.5° and 286.5° orientation | 7.0 | 150 | Thies First Class Advanced anemometer
Anemometer | 27 | Secondary | Triple boom arrangement with 46.5°, 166.5° and 286.5° orientation | 9.2 | 150 | Thies First Class Advanced anemometer
Measured variable: 10-minute average, standard deviation, minimum and maximum values
Source: Author’s own summary based on [94]

The MMIJ is shown in Figure 2-18.

Figure 2-18. Picture of the MMIJ station
Source: [94]

2.9.4 Pre-processing and data preparation

Fraunhofer IWES conducted the screening and pre-processing of the measured time series for the time period 01 June 2012 until 30 June 2014 (short-term period). The pre-processed time series at the 92 m wind speed and 87 m wind direction level (Ijmuiden_filled_2012-2014) was provided as a “txt” file as input into this analysis.

The measurements were taken using several anemometers at the same heights. Following the screening, the anemometers were combined into a virtual anemometer representing the relevant height by removing tower shadow effects. An example methodology for obtaining the virtual anemometer without tower shadow effects is provided in [94], “Chapter 7.5”, under the header “True wind speed”. Further, the data coverage of the top height was increased by means of intra-mast correlation analysis. This was done to have the highest data coverage possible for the research exercise.
A comparison of the time series shows very good alignment between the results obtained by Fraunhofer IWES and ECN [83] for the period in question. As the short-term period does not cover the full years 2012 and 2014, the year 2013 is suitable for a like-for-like comparison. In Figure 2-19, it can be observed that the difference in Weibull fit and histogram between the Fraunhofer IWES and ECN datasets is negligible.

Figure 2-19. Weibull fit and histogram of MMIJ measurements in 2013 (left: ECN analysis, right: Fraunhofer IWES dataset)
Source: Left: [83], right: Author’s own illustration via Windographer

2.9.4.1 Summary statistics of the short-term dataset

The summary statistics of the short-term dataset are shown in Table 2-22.

Table 2-22. MMIJ short-term statistics
Variable | Value – WS92
Measurement height [m] | 92
Mean wind speed [m/s] | 9.88
Median wind speed [m/s] | 9.48
Minimum wind speed [m/s] | 0.29
Maximum wind speed [m/s] | 37.92
Standard deviation [m/s] | 4.78
Weibull k [-] | 2.18
Weibull A [m/s] | 11.16
Possible data points | 105120
Available data points | 104844
Data availability [%] | 99.74
Variable | Value - WD87
Measurement height [m] | 87
Mean wind direction [°] | 223.1
Median wind direction [°] | 207.5
Possible data points | 105120
Available data points | 104842
Data availability [%] | 99.74
Source: Author’s own calculation/assessment

2.9.4.2 Time synchronisation

Following the selection of the reference dataset, a combined dataset consisting of the reference and measured (target) dataset was created. The Pearson correlation coefficient was used within Windographer to find the time offset at which the wind speed correlation between the two datasets is at its maximum.
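A lag search of this kind, in which the reference series is shifted by a few time steps and the Pearson coefficient is recomputed for each shift, can be sketched as below. This is a dependency-free illustration under stated assumptions and does not reproduce the Windographer implementation: the function names, the search range of ±3 steps and the sign convention (`reference[i + shift]` is paired with `target[i]`) are all hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_shift(target, reference, max_shift=3):
    """Return (shift, r) maximising r when reference[i + shift] is paired with target[i]."""
    best = None
    for shift in range(-max_shift, max_shift + 1):
        lo = max(0, -shift)                              # first usable target index
        hi = min(len(target), len(reference) - shift)    # one past the last usable index
        r = pearson(target[lo:hi], reference[lo + shift:hi + shift])
        if best is None or r > best[1]:
            best = (shift, r)
    return best


# Synthetic check: the reference leads the target by one time step.
target = [math.sin(0.3 * i) + 0.1 * math.cos(1.7 * i) for i in range(200)]
reference = target[1:] + [target[-1]]
shift, r = best_shift(target, reference)
print(shift, round(r, 6))
```

With real data, both series would first be resampled to the same hourly time base, and only time steps present in both would enter the correlation.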
The time synchronisation step shifts the reference time step automatically to obtain the offset that maximises the degree of correlation. The results are presented in Figure 2-20, showing that a shift of minus one hour was required. This shift was subsequently applied in Windographer.

Figure 2-20. Time synchronisation
Source: Author’s own illustration via Windographer

2.9.4.3 Definition of the long-term reference period

It was first determined whether there were any trends in the long-term reference dataset. The study looked at different long-term durations ranging from 10 to 20 years in length with the same method proposed in [18]. The slope of the trend was calculated by fitting a linear regression to the normalized yearly wind speeds. The analysis was repeated for the MERRA-2, KNMI and ERA5 nodes nearest to the MMIJ location. A time range was chosen that minimizes the impact of a probable trend while still being representative of the long-term reference period. Following the trend analysis, the ERA5 reference dataset “R5” located at 52.69° North and 3.60° East, covering the 19 years from 2000 to 2018, has been selected as the reference dataset for the analysis. The trend analysis is shown in Figure 2-21.

Figure 2-21. Annual trend analysis and comparison of reference datasets for the selected long-term period 2000-2018
Source: Author’s own illustration

2.9.4.4 Summary statistics of the concurrent and long-term reference period

The summary statistics of the long-term reference dataset are shown in Table 2-23.

Table 2-23.
Reference dataset statistics for the concurrent and long-term periods
Variable | Long-term period, Value – WS100 | Concurrent period, Value – WS100
Model height [m] | 100 | 100
Mean wind speed [m/s] | 9.28 | 9.34
Median wind speed [m/s] | 8.91 | 9.01
Minimum wind speed [m/s] | 0.02 | 0.07
Maximum wind speed [m/s] | 32.98 | 29.24
Standard deviation [m/s] | 4.44 | 4.50
Weibull k [-] | 2.23 | 2.25
Weibull A [m/s] | 10.50 | 10.63
Possible data points | 166559 | 17519
Available data points | 166559 | 17472
Data availability [%] | 100.0 | 99.7
Variable | Value - WD87 | Value - WD87
Model height [m] | 100 | 100
Mean wind direction [°] | 247.7 | 229.00
Median wind direction [°] | 218.1 | 211.9
Possible data points | 166559 | 17519
Available data points | 166559 | 17472
Data availability [%] | 100.0 | 99.7
Source: Author’s own calculation/assessment

3 Results

Based on the evaluation of the MCP algorithms, the base-case scenario for the Python code was developed. The most critical performance indicators during the procedure were evaluated and presented. The outcomes of data-filling and long-term correction are detailed in the following sections. The uncertainty evaluation is given at the end of this section. The detailed results presented in this section are provided in the annexes from Annex B to Annex N.

3.1 Evaluation of the MCP algorithms

In addition to the linear regression concepts shown in Table 2-17, further concepts were tested to understand the suitability of the base case scenario. In order to gain confidence and select a reasonably robust MCP algorithm, the MCP methods presented in Section 2.7 were tested with an omnidirectional selection. This was done in ISo1 using the performance test functionality, which runs a cross-validation experiment: a selected number of segments are created within the concurrent period, and the datasets are divided into training and test periods.
The model is fit using the data within the training segments, and the output is generated for the test periods. The observed and predicted values are compared for a total of 400 randomized datasets, and the resulting test statistics for this study are shown in Figure 3-1.

Figure 3-1. MBE, MAE and DE results of the investigated MCP methods
Source: Author’s own illustration, generated in ISo1

It can be observed that MTS and LLS perform best in terms of MBE and MAE, with error values of -0.0005 m/s and 1.0 m/s, respectively, whereas the TLS, VR, SS and BSR methods perform slightly better regarding the distribution error.

The coefficient of variation (COV) is defined as the ratio of the standard deviation to the mean. COV results of the considered methods and submethods are shown in Table 3-1. It can be seen that despite the high number of algorithms considered for linear regression, the COV values are similar to those of the other methods.

Table 3-1. Coefficients of variation of considered MCP methods
Method | Count | Subtotal submethods | COV
BinMethod | 4 | 1 | 0.05%
EM | 6 | 3 | 0.17%
LinReg | 21 | 3 | 0.30%
Matrix | 8 | 2 | 0.46%
QM | 4 | 1 | 0.03%
Source: Author’s own calculation/assessment

The long-term wind speed (LTWS) results of the different methods are presented in Figure 3-2, showing that the LTWS of the base-case algorithm is in good alignment with the other results.

Figure 3-2. Comparison of LTWS with different MCP methods
Source: Author’s own illustration, generated in ISo1

3.2 Evaluation of the base-case algorithm results

The base-case algorithm was applied in line with the flow chart shown in Figure 2-17. During the validation and simulation runs, the output of the Python code was monitored for plausibility.
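The error metrics used throughout this chapter (MBE, MAE, RMSE) and the coefficient of variation from Section 3.1 can be sketched as follows. The definitions below are the conventional ones and are given as an assumption, since the exact formulas of the thesis code are not reproduced here: errors are taken as predicted minus observed, and the COV uses the population standard deviation. The function names and sample values are illustrative only.

```python
import math
import statistics


def mbe(pred, obs):
    """Mean bias error: average of (predicted - observed)."""
    return sum(p - o for p, o in zip(pred, obs)) / len(obs)


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean square error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def cov(values):
    """Coefficient of variation: standard deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.fmean(values)


observed = [8.0, 10.0, 12.0]    # hypothetical wind speeds in m/s
predicted = [9.0, 9.0, 12.0]
print(mbe(predicted, observed), mae(predicted, observed), rmse(predicted, observed))
print(cov([9.0, 10.0, 11.0]))
```

The example illustrates why MBE alone is not sufficient: the over- and under-prediction cancel to a zero bias, while MAE and RMSE still report the scatter.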
In the subsequent subsections, the findings of the simulations are presented.

3.2.1 Key performance indicators during the process

The KPIs defined for PreDF in Table 2-12 were evaluated as a prerequisite to running an MCP for both data filling and long-term correction. This analysis was conducted sectorwise. The PreDF KPIs also serve as a baseline against which the performance of the self-predictions can be compared with the observed metrics.

The sectorwise exemplary results of the concurrent periods are presented in detail in Annex B. The heatmaps of measured Weibull scale and shape factors, as well as R² values of sectorwise hourly wind speed correlations, are shown in Figure 3-3 and Figure 3-4, respectively, for the 1-day and 60-day gap scenarios. The description “feature” represents the sector, and the colours within the vertical columns represent the iteration results within the gap. In each heatmap, the evolution of iteration results is shown from top to bottom. As shown in the figures, the larger gaps cause a minor distortion in the Weibull A and k parameters.

Figure 3-3. PreDF – Heatmap of measured Weibull scale and shape factors for 1-day (left) and 60-days gap scenarios (right) in each column, respectively
Panels: Scale factors; Shape factors
Source: Author’s own illustration, the description “feature” represents the sector, the colours within the vertical columns represent the iteration results within the gap. Scale factor in m/s, shape factor dimensionless.

Figure 3-4 depicts the heatmaps of R² values of sectorwise hourly wind speed correlations for the 1-day and 60-day gap scenarios. The hourly wind speed correlations (R²) are very good (>0.85) across all sectors and uniform throughout the sliding gap window in the respective period. A slight decrease in correlations can be observed in the longer 60-day gap period, especially in the easterly sectors.
In general, the R² values are considered very good, showing a significant correlation between the reference and measured datasets. As a result, a sector-based MCP approach is deemed appropriate.

Figure 3-4. PreDF – Heatmap of R² values of sectorwise hourly wind speeds correlation for 1-day (left) and 60-days gap scenarios (right)
Source: Author’s own illustration, the description “feature” represents the sector, the colours within the vertical columns represent the iteration results within the gap.

The coefficients of determination of sectorwise hourly wind speeds for the 1-day and 60-day gap periods are presented as box plots in Figure 3-5 and Figure 3-6. In a box (or box-and-whisker) plot, the box is bounded by the third quartile at the top and the first quartile at the bottom, and it is divided by the median. The whiskers represent error bars, with one extending upward from the third quartile to the maximum and the other extending downward from the first quartile to the minimum. Dot markers are used to identify outliers in the data.

For the single-day gap period, the sectorwise R² values show excellent correlation (R²>0.9) throughout the majority of sectors, as well as a very narrow distribution between the 25% and 75% quantiles for the whole duration. Similarly, the correlations do not diminish throughout the 60-day gap period, while the extent of the boxes increases somewhat. Accordingly, the sectorwise correlations are deemed appropriate for hourly modelling with a linear regression MCP, both for data filling and long-term correction.

Figure 3-5. PreDF – Box plot of R² values of sectorwise hourly wind speeds correlation for 1-day gap
Source: Author’s own illustration, the description “feature” represents the sector.
Figure 3-6. PreDF – Box plot of R² values of sectorwise hourly wind speeds correlation for 60-days gap
Source: Author’s own illustration, the description “feature” represents the sector.

The MBE, MAE, and RMSE of mean wind speeds over the concurrent periods were evaluated for the 1-day and 60-day gap scenarios. The reader is reminded that the PreDF metrics do not involve any modelling and just show a comparison of the reference and target datasets in order to determine the dataset's appropriateness and representativeness for an MCP method, as specified in the technical standards [35].

The MBE, MAE, and RMSE values for the PreDF period are relatively high, indicating that despite its good correlations, the reference dataset cannot match the precision of wind speed observations. This is to be expected, given that the reference dataset is a global reanalysis with a coarse grid resolution, as opposed to a mesoscale modelling dataset. Although this is not a concern for this type of study, it does provide an opportunity to evaluate the algorithm's performance against a mesoscale modelling solution in a future exercise. It is noted at this stage that mesoscale simulations are not entirely independent of reanalysis solutions, as they use dynamic downscaling methods driven by reanalysis data [35].

Summary statistics of the RMSE of MWS for the 1-day and 60-day periods are presented in Table 3-2. The summary statistics of the MBE and MAE of MWS are shown in Annex C.

Table 3-2.
PreDF - Summary statistics of RMSE of MWS for 1-day and 60-days gap scenarios
Sector | Mean [m/s] 1-day | Mean [m/s] 60-days | Standard deviation [m/s] 1-day | Standard deviation [m/s] 60-days | Max [m/s] 1-day | Max [m/s] 60-days
0 | 1.184 | 1.084 | 0.206 | 0.378 | 1.229 | 1.283
1 | 1.182 | 1.087 | 0.206 | 0.381 | 1.230 | 1.366
2 | 1.460 | 1.343 | 0.254 | 0.468 | 1.515 | 1.546
3 | 1.673 | 1.535 | 0.291 | 0.536 | 1.736 | 1.804
4 | 1.692 | 1.559 | 0.294 | 0.544 | 1.764 | 1.824
5 | 1.677 | 1.545 | 0.292 | 0.539 | 1.743 | 1.787
6 | 1.600 | 1.476 | 0.278 | 0.515 | 1.658 | 1.755
7 | 1.427 | 1.313 | 0.248 | 0.458 | 1.476 | 1.529
8 | 1.500 | 1.381 | 0.261 | 0.481 | 1.551 | 1.587
9 | 1.228 | 1.130 | 0.214 | 0.394 | 1.272 | 1.302
10 | 1.199 | 1.101 | 0.209 | 0.383 | 1.245 | 1.264
11 | 1.151 | 1.054 | 0.200 | 0.367 | 1.194 | 1.234
Source: Author’s own calculation/assessment

Heatmaps of the mean bias error observed in the Weibull shape and scale factors for all iterations and gap scenarios, weighted from the sectorwise analysis, are shown in Figure 3-7 for the 1-day, 30-day and 60-day gap scenarios. The evolution of the scale and shape factors indicates a good alignment between the reference and measured datasets. The MBE of the shape factor ranges from -0.03 (blue) to -0.00 (yellow), indicating that the Weibull shape factor is nearly identical between the measured and reference datasets. The MBE of the scale factor shows a greater discrepancy but a similarly low dispersion, ranging from 0.61 m/s (blue) to 0.67 m/s (red).

Figure 3-7. PreDF – Heatmap of MBE of Weibull shape (left) and scale (right) factors for all iterations and gap scenarios – weighted from sectorwise analysis
Source: Author’s own illustration, grey means no data values, feature stands for gap.

The MBE, MAE and RMSE of WPD for the 1-day and 60-day gap scenarios were documented during the iterations and are presented in Annex C. The greatest difference in wind power density is seen in the south sector with the fewest samples.
The MBE, MAE, and RMSE of WPD distributions are comparable across sectors, and they marginally decrease for the largest gap scenario.

The overall statistics of the KS values are presented in Table 3-3, showing a moderate performance. The KS statistic has a similar error distribution to the wind power density, with greater errors in the easterly sectors. A somewhat reduced KS error is detected in the primary wind direction sectors 7 and 8, and it does not increase with gap size. This indicates that the distribution is unlikely to be influenced by increasing gap size. It should be highlighted that a mesoscale model product would be expected to show a better KS statistic (lower value) when compared to this observed dataset.

Table 3-3. PreDF - Summary statistics of KS of MWS for 1-day and 60-days gap scenarios
Sector | Mean 1-day | Mean 60-days | Standard deviation 1-day | Standard deviation 60-days | Max 1-day | Max 60-days
0 | 4.9% | 4.4% | 0.8% | 1.6% | 5.2% | 6.1%
1 | 5.6% | 5.3% | 1.0% | 1.9% | 6.1% | 7.0%
2 | 8.6% | 8.0% | 1.5% | 2.8% | 9.2% | 10.3%
3 | 11.2% | 10.5% | 2.0% | 3.7% | 11.9% | 13.2%
4 | 8.6% | 8.1% | 1.5% | 2.9% | 9.2% | 10.3%
5 | 8.7% | 8.3% | 1.5% | 2.9% | 9.2% | 10.9%
6 | 6.4% | 6.1% | 1.1% | 2.1% | 6.7% | 7.8%
7 | 4.1% | 3.8% | 0.7% | 1.4% | 4.3% | 4.9%
8 | 5.0% | 4.7% | 0.9% | 1.7% | 5.3% | 5.9%
9 | 4.5% | 4.2% | 0.8% | 1.5% | 4.7% | 5.3%
10 | 7.1% | 6.6% | 1.2% | 2.4% | 7.6% | 8.3%
11 | 6.4% | 6.0% | 1.1% | 2.1% | 7.0% | 7.6%
Source: Author’s own calculation/assessment

Annex C contains heatmaps showing the sectorwise wind direction deviation for the 1-day and 60-day gap scenarios. Wind direction discrepancies are relatively moderate throughout sectors, ranging from -4.5° to -1.1°, with the primary sectors having the biggest offsets.
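The KS statistic reported in Table 3-3 is not defined in this section. Assuming it is the conventional two-sample Kolmogorov-Smirnov distance, that is, the largest vertical difference between the empirical cumulative distribution functions of the reference and measured wind speeds, a dependency-free sketch looks as follows. The function name and sample values are illustrative; `scipy.stats.ks_2samp` returns the same statistic together with a p-value.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample KS distance: max |ECDF_a(x) - ECDF_b(x)| over all observed x."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    distance = 0.0
    for x in a + b:  # the supremum is attained at one of the data points
        cdf_a = bisect.bisect_right(a, x) / len(a)  # share of a-values <= x
        cdf_b = bisect.bisect_right(b, x) / len(b)  # share of b-values <= x
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


measured = [1.0, 2.0, 3.0, 4.0]    # hypothetical wind speeds in m/s
reference = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(measured, reference))  # 0.5
```

A value of 0 means the two empirical distributions coincide, so the percentages in Table 3-3 can be read as the largest cumulative-frequency mismatch per sector.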
In light of the aforementioned metrics, the chosen reference dataset is appropriate and representative of the target location and can be used to make predictions with the chosen LLS algorithm.

SelfDF KPIs are used to compare the outcome of predictions to a known outcome, which is represented by the true measured values. They therefore allow the performance of the model to be assessed. The relationships for SelfDF were previously detailed in Table 2-19 and Table 2-20. It should be mentioned at this point that, in real-life circumstances, there is no long-term measured dataset available for analysts to use in order to analyze the true performance of a model. Furthermore, because technical analysis time is often limited, a simplified procedure is needed to evaluate the performance of any chosen MCP approach as rapidly as possible.

In order to get further insight into the performance of the model, it is critical to judge the SelfDF performance and, if possible, look for a relationship with the validation performance, in which the predicted values of a trained model are compared to unknown true observed values. This is covered in further detail under the ValDF KPI.

In the subsequent paragraphs, tables and figures, the performance of the LLS model during the concurrent period is presented. The R² correlations of hourly wind speeds between the model predictions and actual values are shown in Annex C with heatmaps for the 1-day and 60-day gap scenarios. The results demonstrate that the distributions of R² are consistent across the sliding gap periods. A similar pattern can be observed across the sector bins: the correlations are very high (up to 0.92), except in the eastern sectors, where they drop to 0.77. The box plots of the data are shown in Figure 3-8 and Figure 3-9.
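The distinction between SelfDF and ValDF described in this subsection can be sketched for a single gap position as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the helper names are hypothetical, the LLS model is taken to be an ordinary least-squares line with offset fitted on the hours outside the gap, and RMSE is computed on hourly values.

```python
import math


def fit_lls(x, y):
    """Ordinary least-squares fit of y = slope * x + offset."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def self_and_val_rmse(reference, measured, gap_start, gap_end):
    """Fit outside the gap, then score inside (ValDF) and outside (SelfDF) the gap."""
    train_ref = reference[:gap_start] + reference[gap_end:]
    train_obs = measured[:gap_start] + measured[gap_end:]
    slope, offset = fit_lls(train_ref, train_obs)

    def predict(values):
        return [slope * v + offset for v in values]

    self_rmse = rmse(predict(train_ref), train_obs)        # training period
    val_rmse = rmse(predict(reference[gap_start:gap_end]),  # withheld gap
                    measured[gap_start:gap_end])
    return slope, offset, self_rmse, val_rmse


# Synthetic check with an exactly linear relationship and a one-day gap.
reference = [float(i % 12) + 3.0 for i in range(96)]
measured = [1.05 * r + 0.4 for r in reference]
slope, offset, self_rmse, val_rmse = self_and_val_rmse(reference, measured, 24, 48)
print(round(slope, 3), round(offset, 3))
```

In an operational campaign only the SelfDF score is available, because the measurements inside a real gap are unknown; this is why a relationship between the two scores is of interest.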
Figure 3-8. SelfDF – Boxplot of R² values of sectorwise hourly wind speeds correlation for 1-day scenario
Source: Author’s own illustration, the description “feature” represents the sector.

Figure 3-9. SelfDF – Boxplot of R² values of sectorwise hourly wind speeds correlation for 60-day scenario
Source: Author’s own illustration, the description “feature” represents the sector.

Figure 3-10. SelfDF - 3D evolution of RMSE of MWS for all sectors and gaps
Source: Author’s own illustration with Paraview [95]

For the examined periods, the model's mean bias error is zero, and for the 1-day gap scenario, the MAE is roughly 1 m/s throughout the bins, as shown in Table 3-4. These findings are easily comparable to those previously given in Figure 3-1 for the various MCP investigations. A total of 1 m/s approximates a 10% relative MAE, which is in excellent agreement with previous MCP methods and demonstrates better performance. The 60-days scenario results in a somewhat lower average MAE.

The root mean square error of mean wind speeds in the bins is visualized in a ParaView plot as shown in Figure 3-10, where the x-axis represents the number of iterations in time, the y-axis the gap duration from 1 to 60 days and the z-axis the directional sectors from 1 to 12. The magnitude of the RMSE is represented by colour. The plot is shown firstly to emphasize that an RMSE value exists for each bin, iteration within the gap, and gap period, and secondly to demonstrate that the results are remarkably uniform around 1.2 m/s and consistent across bins, iterations, and gap periods, with the minor exception of sectors 4 to 7, where a higher error of up to 1.6 m/s can be observed.

Table 3-4.
SelfDF - Summary statistics of RMSE of MWS for 1-day and 60-days gap scenarios

| Sector | Mean [m/s] (1-day) | Mean [m/s] (60-days) | Standard deviation [m/s] (1-day) | Standard deviation [m/s] (60-days) | Max [m/s] (1-day) | Max [m/s] (60-days) |
|---|---|---|---|---|---|---|
| 0 | 1.133 | 1.038 | 0.197 | 0.362 | 1.175 | 1.218 |
| 1 | 1.153 | 1.061 | 0.201 | 0.372 | 1.200 | 1.325 |
| 2 | 1.324 | 1.219 | 0.230 | 0.425 | 1.378 | 1.452 |
| 3 | 1.281 | 1.177 | 0.223 | 0.411 | 1.329 | 1.408 |
| 4 | 1.461 | 1.347 | 0.254 | 0.470 | 1.521 | 1.583 |
| 5 | 1.473 | 1.356 | 0.256 | 0.473 | 1.528 | 1.577 |
| 6 | 1.404 | 1.295 | 0.244 | 0.452 | 1.455 | 1.566 |
| 7 | 1.353 | 1.243 | 0.235 | 0.433 | 1.399 | 1.439 |
| 8 | 1.435 | 1.322 | 0.250 | 0.461 | 1.485 | 1.525 |
| 9 | 1.162 | 1.070 | 0.202 | 0.373 | 1.204 | 1.238 |
| 10 | 1.088 | 0.999 | 0.189 | 0.348 | 1.127 | 1.147 |
| 11 | 1.087 | 0.996 | 0.189 | 0.347 | 1.127 | 1.156 |

Source: Author’s own calculation/assessment

The heatmaps of the MBE of Weibull shape and scale factors are shown in Annex C of this document. When comparing the differences between the measured and reference periods, an anticipated improvement in the MBE values is seen, with a minor variation of 0.13 to 0.15 between the measured and model scale factors across the two periods. Scale factor deviations are insignificant, with values between 0.008 m/s and -0.001 m/s.

Figure 3-11. SelfDF – Heatmap of MBE of Weibull shape (left) and scale (right) factors for all iterations and gap scenarios – weighted from sectorwise analysis
Source: Author’s own illustration, grey means no data values, feature stands for gap.

The WPD statistics for the SelfDF period are provided in Annex C. Compared with the PreDF KPI, an improvement in the MBE, MAE, and RMSE of the WPD has been noticed. The root mean square error of WPD is reduced by 18% in the sector with the maximum error.
This is to be anticipated, given that the global reanalysis dataset was not originally intended to align well with the absolute values in a measurement dataset. The comparison of WPD revealed no more information. A similar observation was made for the KS-statistic. Following the model fit, the predicted distribution exhibits good performance with an error margin of 1.9% to 2.4% in the primary wind directions.

Figure 3-12 depicts the progression of the root mean square error of the MWS for the 1-day (top) and 60-day (bottom) scenarios of the omnidirectional analysis, with the heatmap for all iterations and gap situations shown in Figure 3-13. The findings of the omnidirectional root mean square error (RMSE) are in close agreement with the results of the initial Windographer performance test of different MCP methods. In the 1-day gap case, it can be seen that the spread of the root mean square error during the measurement period is very limited. Although this grows significantly for the 60-day case, the range of RMSE of MWS stays within a 0.05 m/s interval.

Figure 3-12. SelfDF – Evolution of RMSE of MWS for 1-day (top) and 60-days (bottom) scenarios – omnidirectional analysis
Source: Author’s own illustration

Figure 3-13. SelfDF – Heatmap of RMSE of MWS for all iterations and gap scenarios – omnidirectional analysis
Source: Author’s own illustration, grey means no data values, feature stands for gap.

MBE, MAE, and RMSE of mean wind speeds were examined throughout the validation period to determine the genuine performance of the tested approach. The differences between predicted and observed values are used to calculate the ValDF KPI, where the model producing the predicted values is trained using the concurrent period rather than the validation period.
Because such a comparison is not attainable in real-world projects, any knowledge acquired from this part might prove very valuable. It is noted that the validation period KPIs are derived from an omnidirectional LLS modelling. The evolutions of MBE, MAE and RMSE of MWS are shown in Figure 3-14, Figure 3-15 and Figure 3-16, respectively, for 1-day and 60-days scenarios.

Figure 3-14. ValDF – Evolution of MBE of MWS for 1-day (top) and 60-days (bottom) scenarios
Source: Author’s own illustration

The mean bias error shows a strong oscillation around zero. This is considered reasonable given the good performance of the model shown in the SelfDF period. As a result, the absolute errors are much larger. MAE and RMSE of ValDF mean wind speeds both demonstrate a higher dispersion around the mean. The coefficient of variation (COV) declines from 48% in the case of a single-day gap to 9% in the scenario of a 60-day gap. It is worth noting that this COV behaviour is the opposite of what was seen for the SelfDF KPI, as the increase in the gap size results in a higher number of samples. Hence the downward trend is plausible. The summary statistics of the ValDF period over the gap periods are shown in Table 3-5.

Table 3-5. Summary statistics of ValDF for all gap periods

| Description | MBE | MAE | RMSE |
|---|---|---|---|
| Mean of gap MWS [m/s] | -0.003 | 0.930 | 1.243 |
| Mean of standard-deviation [m/s] | 0.200 | 0.293 | 0.397 |
| Mean of maximum gap MWS [m/s] | 0.446 | 1.457 | 1.948 |
| Standard deviation of the mean [m/s] | 0.003 | 0.026 | 0.025 |
| Standard error [m/s] | 0.000 | 0.003 | 0.003 |

Source: Author’s own calculation/assessment

Figure 3-15. ValDF – Evolution of MAE of MWS for 1-day (top) and 60-days (bottom) scenarios
Source: Author’s own illustration

Figure 3-16.
ValDF – Evolution of RMSE of MWS for 1-day (top) and 60-days (bottom) scenarios
Source: Author’s own illustration

The link between SelfDF and ValDF RMSE was examined with the goal of establishing a proxy approach for assessing the uncertainty associated with data filling. For all gap situations, a very high negative relationship was observed between the SelfDF and ValDF RMSE of MWS. It is noteworthy that the relatively small RMSE interval for self-prediction is linked with the larger error interval seen during the validation period. The inverse correlation suggests that if the self-prediction RMSE of MWS is quite large, the uncertainty in the data-filled gap period is very likely to be reduced. This relationship has the potential to be used to empirically assess the anticipated uncertainty in data-filling using normalized transfer functions. The regression plots of SelfDF and ValDF RMSE of MWS are shown in Figure 3-17, with detailed figures shown in Annex H.

Figure 3-17. Regression plots of self-prediction and validation RMSE for 1-day (top) and 60-days (bottom) scenarios
Source: Author’s own illustration

When a representative measurement campaign is accessible, the strong negative relationship discovered between the ValDF and SelfDF KPIs might be used as a proxy to judge the performance of a nearby future measurement campaign. More crucially, in a sufficiently offshore situation, this can serve as a credible empirical tool for assessing the uncertainties associated with data gaps. It should be emphasized that this link has not been mentioned or discussed in any related literature before. Although these results show a strong association, independent validation would be required before this novel approach could be used in future studies.
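The procedure behind this relationship can be sketched as follows, as a rough illustration rather than the thesis implementation. The sketch slides an artificial gap of fixed length through a synthetic concurrent period, records the self-prediction (SelfDF) and validation (ValDF) RMSE for each gap position, and fits a linear transfer function between the two. The synthetic series, the 10-day gap, the 5-day step and all helper names are assumptions; the transfer function here is not normalized.

```python
import math
import random

def fit_lls(x, y):
    """Least-squares fit y ~ intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def rmse(observed, predicted):
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)

def pearson(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)

# Synthetic hourly series in m/s, for illustration only.
random.seed(0)
n_hours = 24 * 120
reference = [max(0.0, random.gauss(9.0, 3.5)) for _ in range(n_hours)]
measured = [0.4 + 1.0 * r + random.gauss(0.0, 1.2) for r in reference]

gap_hours = 24 * 10   # one gap scenario (assumed: 10 days)
step_hours = 24 * 5   # shift of the sliding gap between iterations (assumed)

self_rmse, val_rmse = [], []
for start in range(0, n_hours - gap_hours + 1, step_hours):
    end = start + gap_hours
    # Train on the concurrent data outside the artificial gap.
    x_train = reference[:start] + reference[end:]
    y_train = measured[:start] + measured[end:]
    intercept, slope = fit_lls(x_train, y_train)
    # SelfDF: predictions compared with the training data itself.
    self_rmse.append(rmse(y_train, [intercept + slope * x for x in x_train]))
    # ValDF: predictions compared with the withheld observations in the gap.
    x_gap, y_gap = reference[start:end], measured[start:end]
    val_rmse.append(rmse(y_gap, [intercept + slope * x for x in x_gap]))

# Candidate transfer function: ValDF RMSE as a linear function of SelfDF RMSE.
tf_intercept, tf_slope = fit_lls(self_rmse, val_rmse)
r = pearson(self_rmse, val_rmse)
```

In a real application the outer loop over gap durations (1 to 60 days) would wrap this inner loop, and one transfer function would be derived per gap scenario.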
For the remainder of this work, the approach is referred to as the inverse self-prediction error (ISPE) method.

3.2.2 Data filling results

Figure 3-18 illustrates the evolution of the mean difference between actual and predicted wind speeds during a 60-day period. Additionally, the figure for the 60-day gap period beginning on 01.07.2012 gives insight into the findings by displaying both actual and predicted wind speed time series.

Figure 3-18. Evolution of MBE of observed vs predicted wind speeds for 60-days gap period
Source: Author’s own illustration, generated in ISo1

The overall mean bias error for the gap period beginning on the first day of July 2012 is minimal at -0.05 m/s; however, deviations of up to 3-4 m/s between the observed and predicted time series can be seen distinctly in the time series plot in Figure 3-19. When the scatter plot in Figure 3-20 is evaluated, the magnitude of this variance becomes even more apparent. The parameters of the regression fit are shown in Table 3-6, presenting a good correlation between the independent datasets.

Table 3-6. LLS model parameter of validation period for 60-days gap period (start at 01.07.2012)

| Gap period | Model | Time steps | Intercept [m/s] | Slope | R² |
|---|---|---|---|---|---|
| 60-days | Trained from concurrent time series | 1438 | 0.856 | 1.058 | 0.79 |

Figure 3-19. Time series of observed vs predicted wind speeds for 60-days gap period starting on 01.07.2012
Source: Author’s own illustration, generated in ISo1

Figure 3-20. Scatter plot of observed vs predicted wind speeds for 60-days gap period starting on 01.07.2012
Source: Author’s own illustration, generated in ISo1

Comparison of wind direction frequency of observed versus predicted wind speeds for 60-days gap period starting on 01.07.2012 is shown in Figure 3-21.
While there is acceptable agreement across the broad sectors, it should be noted that the predicted primary wind direction of the simplified MCP is offset by approximately one sector.

[Figure 3-19 plot: wind speed (m/s) over July and August 2012, predicted and observed series, 60 days gap period starting 01.07.2012]

[Figure 3-20 plot: predicted (m/s) against observed (m/s), data and line of best fit, 60 days gap period starting 01.07.2012]

Figure 3-21. Comparison of wind direction frequency of observed vs predicted wind speeds for 60-days gap period starting on 01.07.2012
Source: Author’s own illustration, generated in ISo1

[Figure 3-21 plot: wind direction frequency rose in 30° sectors, observed and predicted, 60 days gap period starting 01.07.2012]

The standard deviation of all calculated STWS is 0.007 m/s, encompassing all gap times. The STWS has a low coefficient of variation, demonstrating a linear trend for the gaps, ranging from 0.01% for a single day gap to 0.12% for a 60-day gap. The highest and smallest deviations from the recorded short-term wind speed are respectively 0.26% and -0.34%, indicating outstanding performance. Figure 3-22 illustrates the progression of STDF-WS for 1-day (top) and 60-day (bottom) gap situations.

Figure 3-22. Evolution of STDF-WS for 1-day (top) and 60-days (bottom) gap scenarios
Source: Author’s own illustration

3.2.3 Long term correction results

The study’s key research question was whether an interim phase of data filling is required prior to applying the long-term correction.
Therefore, two versions of the LTWS were constructed using the Python code and the procedures outlined above for each sliding window of the gap, with the gap duration increasing from 1 day to 60 days. The first LTWS was produced by fitting an omnidirectional linear regression model to concurrent time series and reference datasets. Only concurrent measurements with gaps were utilized in the second relationship. Figure 3-23 depicts the evolution of the LTWS for the 1-day and 60-day gap scenarios. Figure 3-24 compares the data-filled long-term time series with the long-term wind speed time series that did not go through the intermediate step of data filling, for a 60-day gap period beginning on the first of July 2012.

Figure 3-23. Evolution of LTWS without DF and LTWS with DF for 1-day (top) and 60-days (bottom) gap scenarios
Source: Author’s own illustration

Figure 3-24. Scatter plot of DF predicted vs LTC predicted wind speeds for 60-days gap period starting on 01.07.2012
Source: Author’s own illustration, generated in ISo1

[Figure 3-24 plot: LLS with DF (m/s) against LLS without DF (m/s), 60 days gap period starting 01.07.2012]

As seen above, both variants of the LTWS are identical and do not differ at all for the largest gap studied. This is expected, given the omnidirectional regression parameters and the lessened influence of any change in model fit caused by the proportion of gaps. This conclusion may be drawn by examining Table 3-7 more closely; the slope and intercept parameters of the LLS model are equal for the data-filled and the initial (with gap) time series.
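One way to see why the two variants coincide is sketched below. It assumes that the gap is filled with predictions from the same omnidirectional least-squares model that was fitted to the remaining concurrent data; under that assumption the filled points lie exactly on the regression line and contribute zero residuals, so refitting returns the same intercept and slope, while R² can only stay equal or increase. The synthetic series and all names are illustrative and are not taken from the thesis code.

```python
import random

def fit_lls(x, y):
    """Least-squares fit y ~ intercept + slope * x; returns (intercept, slope, r2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1.0 - ss_res / ss_tot

# Synthetic hourly series in m/s, for illustration only.
random.seed(1)
n_hours = 1000
reference = [max(0.0, random.gauss(9.0, 3.5)) for _ in range(n_hours)]
measured = [0.4 + 1.0 * r + random.gauss(0.0, 1.2) for r in reference]
gap = range(300, 370)  # artificial gap (assumed position and length)

# Scenario without data filling: fit on the concurrent data outside the gap.
x_avail = [r for i, r in enumerate(reference) if i not in gap]
y_avail = [m for i, m in enumerate(measured) if i not in gap]
b0, b1, r2_without_df = fit_lls(x_avail, y_avail)

# Scenario with data filling: fill the gap with predictions of that model, then refit.
filled = [b0 + b1 * r if i in gap else m
          for i, (r, m) in enumerate(zip(reference, measured))]
c0, c1, r2_with_df = fit_lls(reference, filled)

# Both fits share the same parameters, so the long-term predictions are identical.
same_fit = abs(b0 - c0) < 1e-9 and abs(b1 - c1) < 1e-9
```

This behaviour would not necessarily hold if the gap were filled by a different model, for example a sectorwise fit or a synthetic data generator, which is why the result should not be generalized beyond the tested configuration.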
The percentage of the biggest gap is 6.9%, which means that any change in the relationship after the gap is filled affects just 7% of the final linear relationship.

Table 3-7. LLS model parameter of LTC for 1-day, 20-days and 60-days scenarios

| Gap period | Data-filling | Fraction of data gap [%] | Time steps | Intercept [m/s] | Slope | R² |
|---|---|---|---|---|---|---|
| 1-day | Data-filled time series | - | 17472 | 0.441 | 1.011 | 0.918 |
| 1-day | Without data-filling | 0.1% | 17448 | 0.442 | 1.011 | 0.918 |
| 20-days | Data-filled time series | - | 17472 | 0.435 | 1.012 | 0.922 |
| 20-days | Without data-filling | 2.7% | 17018 | 0.435 | 1.012 | 0.921 |
| 60-days | Data-filled time series | - | 17472 | 0.404 | 1.014 | 0.926 |
| 60-days | Without data-filling | 6.9% | 16057 | 0.404 | 1.014 | 0.922 |

Source: Author’s own illustration, x-axis start of the gap-time.

Additionally, the study is interested in observing and comprehending the effect of gaps on the LTWS. It can be seen that the long-term correction results vary considerably more than a quarter downward during the gap periods beginning in the early weeks of January 2013. Similarly, for the gap periods beginning in September 2013 and ending at the end of the corresponding year, the divergence is more upward. Annex K has thorough documentation of the LTWS for each gap period, including the measured wind speeds for comparison. While examining these figures, it is critical to note that the gap periods listed above omit a time of high wind periods.

Basse [96] examined the seasonality and behaviour of reanalysis datasets in considerable detail using the linear regression method with residuals. For the majority of the investigated cases, the mean of the adjusted wind speed time series is underestimated for summer measurements, whereas it is overestimated for the winter season, where the outcome was dominated by the reanalysis data’s significant seasonality.
Considering the aforementioned observations and the literature findings, the modest increase in LTWS may be explained by the predicted “overcorrection” of the model fit, which slightly overestimates average short-term wind speeds. This also highlights the importance of having a seasonally balanced short-term dataset while conducting an MCP. While the above argument may explain the overestimation of LTWS in that particular case, it does not fully explain the underestimation of wind speeds in the winter period from February to March 2013, as well as from February to March 2014.

Figure 3-25 shows the course of monthly wind speeds throughout the measurement period for the concurrent combined measured and reference dataset. Figure 3-26, in turn, illustrates the normalized monthly wind speeds obtained from the data, as well as the projected average normalized wind speed at the target site. As seen in Figure 3-26, the period from October 2013 to January 2014 was an exceptional high-wind season. As a result, it is expected that a significant gap established over such a period will raise the LTWS. In comparison, during a typical average year, normalized wind speeds fall below the 100% range beginning in February and gradually recover until September, when an underestimation of LTWS is predicted.

Figure 3-25. Concurrent measured and referenced monthly wind speeds during short-term period
Source: Author’s own illustration, via ISo1

Figure 3-26.
Monthly windiness comparison of the short and long-term period
Source: Author’s own illustration

[Figure 3-26 plot: monthly windiness based on LTWS; y-axis: annualized monthly wind speeds [-]; x-axis: date in month and year; series: measured annualized with LTWS, reference long-term average]

It is noted that the underestimation of the LTWS during the first February-March period is more pronounced than in the second phase of the measurement period.

Figure 3-27. Measured wind frequency roses, measurement period 2013 (top left), measurement period 2014 (top right), measurement period 2015 (bottom left), long-term reference period (bottom right)
Source: Author’s own illustration, via ISo1

As may be seen in Figure 3-27, the wind rose was slightly different in 2013, with a high frequency of easterly winds in February and March. An omnidirectional linear model, as demonstrated by the sectorwise linear model fit parameters in Table 3-8, is bound to underestimate such periods. This argument identifies a disadvantage of omnidirectional evaluation and suggests that sector-specific analyses may be more suitable. Nonetheless, it should be highlighted that the disadvantage of a sector-based correlation would be fewer points in sectors for rare weather events. There might also be a restriction in analysing very brief gaps. This is considered a subject for future study.

Table 3-8.
Sectorwise LLS model parameter – full measurement period

| Sector | Range | Time steps | Intercept [m/s] | Slope | R² |
|---|---|---|---|---|---|
| 0 | 345° - 15° | 1280 | -0.179 | 1.062 | 0.905 |
| 1 | 15° - 45° | 1066 | 0.22 | 1.006 | 0.863 |
| 2 | 45° - 75° | 1131 | 0.428 | 1.024 | 0.885 |
| 3 | 75° - 105° | 1307 | -0.183 | 1.130 | 0.917 |
| 4 | 105° - 135° | 834 | 0.73 | 1.019 | 0.879 |
| 5 | 135° - 165° | 792 | 0.64 | 1.023 | 0.896 |
| 6 | 165° - 195° | 1600 | 0.75 | 1.004 | 0.924 |
| 7 | 195° - 225° | 2651 | 0.754 | 0.974 | 0.923 |
| 8 | 225° - 255° | 2513 | 0.431 | 1.002 | 0.903 |
| 9 | 255° - 285° | 1828 | 0.45 | 0.996 | 0.922 |
| 10 | 285° - 315° | 1280 | 0.314 | 1.023 | 0.938 |
| 11 | 315° - 345° | 1190 | 0.32 | 1.009 | 0.917 |

Source: Author’s own illustration

In conclusion, whilst it is self-evident that the intermediate step of data filling was unnecessary within this study for the purpose of generating long-term wind speeds, generalizing this result without testing more sophisticated methods would be incorrect. Combining alternative data-filling procedures and/or using more advanced methodologies may result in a different output. For instance, ISo1 uses a Markov-based reconstruction mechanism to generate synthetic data to fill in gaps in a measured time series. This synthetic data has the same frequency distribution, seasonal and diurnal trends, and autocorrelation as the observed data [73]. Additionally, it would be beneficial to use statistical testing techniques with hypothesis testing where such methods are implemented.

3.3 Evaluation of the DF and LTC uncertainties

Figure 3-28 illustrates the progress of DF uncertainty for 1-day and 60-day gap scenarios, as well as the percentage deviation from the observed short-term mean. The coefficient of variation for short-term wind speed estimates is between 0.01% and 0.15% for 1-day and 60-day gaps, respectively. With a 52-day gap, the calculated maximum variation of the STWS average is -0.34%, which is considered a modest level. A similar deviation can be observed in Figure 3-28 for the 60-days gap scenario for the gap period starting in February 2013.
The detailed evolution of the DF uncertainties can be seen in Annex L, alongside the measured time series at the bottom of each chart. A visual similarity between the deviation and uncertainty bounds is visible. Similar to the discussion in Section 3.2.3, the evolution of DF uncertainties is driven primarily by the seasonality of the reference dataset and MCP method. This is a direct consequence of the inverse relationship from the predictions in the validation period. Furthermore, it can be observed that the averaged mean deviation in percentage is significantly lower than the associated uncertainty.

Figure 3-28. Evolution of DF uncertainties for 1-day and 60 days gap scenarios
Source: Author’s own illustration

Figure 3-29 below illustrates the evolution of JK uncertainties in LT correction for scenarios with a 1-day and 60-day gap, respectively. The difference between the JK uncertainties for the scenario with data-filling and the scenario without data-filling is quite minor, with the difference growing somewhat for the scenario with the largest 60-day gap. While it can be observed that the JK uncertainties for the scenario without data-filling are more uniform, the scenario with data-filling exhibits greater variability in the 60 days scenario with an increased COV of 38%, as compared to an increased COV of 18% in the scenario without data-filling. Throughout the 2013/2014 winter season, for example, the compensatory impact of the linear model for the very high wind period is highly visible in the JK uncertainties with DF. During that time, a reduction in the JK uncertainty is apparent, which can be attributed to the more uniform dataset due to data filling. The overall level of uncertainty with DF is around 0.21%, whereas it is slightly lower, by 0.02%, for the scenario without data-filling.
Figure 3-29. Evolution of JK uncertainties in LT correction for 1-day and 60 days gap scenarios
Source: Author’s own illustration

It was suggested to take into account both the DF and JK uncertainties while assessing the MCP method's uncertainty. Assuming that each source of uncertainty is statistically independent of the others, the total uncertainty is defined as the square root of the sum of the squared uncertainty estimations. This is referred to as the final uncertainty in DF and LTC, as shown in Figure 3-30.

Figure 3-30. Evolution of combined uncertainties in LT correction for 1-day and 60 days gap scenarios
Source: Author’s own illustration

It can be seen in the above figure that the total uncertainty is predominantly driven by the data filling uncertainty, which is not surprising. In light of the root mean square error metrics from the validation periods, this is deemed reasonable. This also suggests the possibility that the missing gaps from an ideal representative 1-year assessment might account for a considerable portion of the LTC uncertainty.

The standard error of LTWS predictions was determined to be 0.0% for all gap periods, indicating that the model is consistent. This is reasonable given the large number of predictions made throughout the gap period, which totals more than 669 for each gap. Regarding the expected uncertainty, the standard deviation of the LTWS predictions might be a more appropriate comparison metric than the standard error. This metric is sometimes referred to as “standard error” in the literature [72]. Nevertheless, it is clear that the standard deviation of the LTWS considerably underestimates the uncertainty margin, as shown in Figure 3-31.

Figure 3-31.
Comparison of empirical and calculated uncertainties in wind speeds for 60 days gap period starting on 01.07.2012
Source: Author’s own illustration

In a recent wind resource assessment study conducted in the Dutch North Sea [18], the omnidirectional correlation uncertainty was assessed as 1.47% with a Monte Carlo simulation for an FLS measurement campaign with a gap of 69 days. It is interesting to see the good alignment with Figure 3-31, as we would expect to see at a minimum 1.4% total uncertainty in the MCP method for a similar gap size.

Section 2.6.3 introduced bootstrapping, which may be thought of as a variant of Monte Carlo simulations. This technique was not implemented due to the high computational power needed for bootstrapping each iteration as a sliding analysis with several loops. Nonetheless, based on the findings of this study, it can be concluded and proposed that bootstrapping should be studied for MCP corrections, preferably in a comparable research project.

Figure 3-32 presents a sensitivity case using the newest test version 5 of ISo1, which includes a bootstrapping analysis algorithm to estimate the uncertainties in long-term correction. The graph represents the results of a 500-iteration bootstrapping simulation utilizing an hourly omnidirectional LLS technique for the 60-day gap for both the data-filled and gap-free versions of the concurrent time series. Clearly, the acquired uncertainty level is substantially more than the estimate achieved in this research, which is around 1.4% for the 60-day gap scenario. This might be related to a large number of simulations or to other components of the analysis that were not examined at this point. This is unquestionably another area of research that warrants more exploration.

Figure 3-32.
Comparison of bootstrap and calculated uncertainties in wind speeds for 60-days gap period starting on 01.07.2012
Source: Author’s own illustration

3.4 Proposed combined MCP uncertainty method

Suppose a high-quality, wake-free measurement dataset (benchmark dataset) with at least two years of data in an offshore environment is available. In that case, a combined ISPE & JK approach might be used to estimate the uncertainty in the long-term correction of a nearby FLS measurement campaign with data availability issues. Provided an FLS measurement campaign is in a representative location to the benchmark dataset, the combined ISPE & JK method could be tested as follows:

• Conduct a gap analysis for the benchmark dataset
• Obtain self-prediction RMSE and validation period RMSE of mean wind speed as described in this study
• Investigate the linear relationship, and obtain transfer functions if there exists a strong correlation as found in this study for the benchmark dataset
• Apply the transfer function of the benchmark dataset to estimate the DF-uncertainty based on the data gap period (1 to 60-days)
• Conduct a JK uncertainty analysis for the long-term correction
• Combine the DF-uncertainties with the JK uncertainty to obtain a final uncertainty in the long-term correction

In the case of an FLS measurement campaign in the Dutch North Sea, in a representative location to MMIJ, the transfer functions provided in Annex I of this study could be tested.

Finally, maintaining representative offshore measurement masts in far-offshore conditions (a wake-free environment that is representative of broader regions) is considered highly valuable for research. This could be done through joint-industry projects and could provide a valuable function for the verification and validation of FLS campaigns prior to deployment.
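The last two steps of the proposed method in Section 3.4 can be sketched in a few lines. The sketch assumes a generic delete-one jackknife and a root-sum-square combination of statistically independent uncertainty components; the exact jackknife scheme used in this study (for example, which blocks of data are left out) may differ, and the function names and numbers are hypothetical.

```python
import math

def jackknife_standard_error(samples, estimator):
    """Delete-one jackknife standard error of estimator(samples)."""
    n = len(samples)
    # Re-evaluate the estimator with each sample left out in turn.
    leave_one_out = [estimator(samples[:i] + samples[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out))

def combine_uncertainties(*components):
    """Root-sum-square of independent uncertainty components (same units)."""
    return math.sqrt(sum(c * c for c in components))

def mean(values):
    return sum(values) / len(values)

# Illustrative long-term wind speed estimates in m/s (made-up values).
ltws_estimates = [1.0, 2.0, 3.0, 4.0, 5.0]
u_jk = jackknife_standard_error(ltws_estimates, mean)

# Illustrative combination of a DF uncertainty and a JK uncertainty (in %).
u_total = combine_uncertainties(3.0, 4.0)
```

When the two components differ strongly in magnitude, the root-sum-square is dominated by the larger one, which is consistent with the observation that the combined uncertainty is driven by the data-filling component.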
4 Discussion and conclusions

Wind resource assessment for offshore projects is critical for project financing. The duration of the data gap would be a critical criterion for determining the robustness of a wind resource assessment. Gap filling of meteorological time series is required for a variety of applications that need continuous data series. Fraunhofer IWES examined the effect of data gaps on the estimation of siting parameters in order to identify an appropriate method for filling in data gaps for an offshore measurement. This study sought to determine the effect of data gaps on long-term wind speeds as part of the "Digital Wind Buoy" project. This problem could be investigated by recording key performance indicators (KPIs) for different analysis steps. Therefore, the study aims to establish the maximum acceptable gap duration in a year for an offshore measurement campaign for a robust wind resource assessment. Secondary investigations can be done to confirm the robustness of the gap-filling process.

After a literature review and a stakeholder questionnaire, the MCP method was selected, and the target (MMIJ) and reference (ERA5) datasets for MCP were prepared. A performance test algorithm was run to compare the available MCP methods. The omnidirectional linear regression method, with least-squares model fit with offset, was identified as a suitable solution. Different gap periods starting with one day up to sixty days were investigated to find a quantifiable metric to predict the performance of the data-filling and long-term correction algorithm. An omnidirectional linear regression model was used both to obtain self-predictions and to predict the wind speeds at the introduced artificial gap.
The performance of a measure-correlate-predict (MCP) algorithm for data-filling with linear least squares was analysed in detail using two years of the Ijmuiden met mast (MMIJ) measurements. A temporal resolution of one hour was selected for the correlations and model. This model fit was used to obtain both self-prediction performances and to predict the wind speeds at the introduced artificial gap. An inner loop repeated the predictions with a moving gap within the concurrent period, whereas an outer loop increased the gap duration incrementally by 1-day, starting with one day up to a total of 60-days. This modelled relationship was utilized to derive the LTWS twofold. The first scenario generated short-term data-filled time series, which were then used to re-establish the model with the reference dataset and generate the final LTWS. The second scenario was created to acquire the extended (long-term) time series without the need for data-filling. Different MCP methods were tested with an omnidirectional sectoral selection within Windographer using the performance test functionality, and the base-case algorithm was selected as the omnidirectional linear regression with offset for the Python code.

The KPI defined for PreDF was evaluated as a prerequisite to running an MCP for data filling and long-term correction. The MBE, MAE, and RMSE of mean wind speeds over concurrent periods were summarized for 1-day and 60-days. Despite its good correlations, the reference dataset could not match the precision of wind speed observations. Although this was not a concern for this type of study, it does provide an opportunity to evaluate the algorithm's performance against a mesoscale modelling solution. SelfDF KPIs were used to compare the outcome of predictions to a known outcome, which is represented by the true measured values.
An MAE of about 1 m/s corresponded to a relative MAE of roughly 10%, which was in excellent agreement with previous MCP methods. For the examined periods, the model's mean bias error was zero, and for the 1-day gap scenario the MAE was roughly 1 m/s throughout the bins. Compared with the differences between the measured and reference data, the MBE values showed the anticipated improvement. The MBE, MAE, and RMSE of mean wind speeds were examined throughout the validation period to determine the genuine performance of the tested approach. In the 1-day gap case, the spread of the RMSE over the measurement period was very limited. Although the spread grew significantly in the 60-day case, the range of RMSE stayed within a narrow 0.05 m/s interval.

A strong negative relationship between the SelfDF and ValDF RMSE of MWS was observed for all gap scenarios. This relationship had not been addressed or discussed in any of the related literature. Although the data revealed a substantial correlation, independent validation is essential before this novel technique is used in future investigations. The method is referred to as the inverse self-prediction error (ISPE) method. It might serve as a credible empirical tool for assessing the uncertainties associated with data gaps at sites sufficiently far offshore.

The evolution of the mean difference between measured and predicted wind speeds was investigated following the data-filling procedure. The short-term average wind speed (STWS) predictions had a low coefficient of variation that followed a linear trend with gap duration, ranging from 0.01% for a single-day gap to 0.12%. The maximum and minimum deviations of the STWS from the measured short-term wind speed were 0.26% and -0.34%, respectively, indicating very good performance. Considering that a 60-day gap equates to 83% availability, the study reaffirmed the industry standard of 80% for measurement campaign data availability.
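The negative relationship between the self-prediction and validation RMSE can be reproduced qualitatively with synthetic data. The sketch below is an assumption-laden illustration and not the ISPE implementation of the thesis: the data, the 60-day gap with a one-day step, and the linear transfer function fitted with `np.polyfit` are chosen only to show the idea. For a fixed concurrent dataset, the squared errors inside and outside the gap add up to a nearly constant total, so a gap that removes poorly predicted hours leaves a better-looking self-prediction.

```python
import numpy as np

def rmse(pred, obs):
    """Root mean square error between two arrays."""
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

# Synthetic hourly data for one year (assumption, not MMIJ or ERA5 data).
rng = np.random.default_rng(2)
n_hours = 24 * 365
ref = rng.weibull(2.0, size=n_hours) * 10.0
target = 1.05 * ref + 0.3 + rng.normal(0.0, 1.0, size=n_hours)

gap_len = 24 * 60  # 60-day gap, moved in one-day steps
self_rmse, val_rmse = [], []
for start in range(0, n_hours - gap_len + 1, 24):
    in_gap = np.zeros(n_hours, dtype=bool)
    in_gap[start:start + gap_len] = True
    slope, offset = np.polyfit(ref[~in_gap], target[~in_gap], deg=1)
    pred = slope * ref + offset
    self_rmse.append(rmse(pred[~in_gap], target[~in_gap]))  # SelfDF
    val_rmse.append(rmse(pred[in_gap], target[in_gap]))     # ValDF

self_rmse = np.array(self_rmse)
val_rmse = np.array(val_rmse)

# Pearson correlation and a linear transfer function ValDF = a * SelfDF + b.
r = float(np.corrcoef(self_rmse, val_rmse)[0, 1])
tf_slope, tf_offset = np.polyfit(self_rmse, val_rmse, deg=1)

print(f"r={r:.3f}, transfer slope={tf_slope:.2f}, offset={tf_offset:.2f}")
```

In practice only the self-prediction RMSE is known for a real gap, so a transfer function of this kind would be used to estimate the unobservable validation error. Whether a relationship derived at one site transfers to another is exactly the point that needs independent validation.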
One of the study's main questions was whether a short-term data-filling step was required before applying the long-term correction. The LTWS predictions were identical in both scenarios. This was mainly due to the omnidirectional regression parameters and to the limited influence of the proportion of gaps on the model fit.

The long-term correction predictions varied seasonally. Periods with extremely strong winds and sectoral fluctuations influenced the predictions slightly. As expected, the LTWS results showed the overcorrection typical of linear regression methods. In conclusion, the intermediate data-filling step was redundant in this investigation. However, generalizing this result without additional tests might be misleading. It is recommended to explore more advanced approaches for generating synthetic data to fill gaps in a measured time series.

The standard error of the LTWS predictions was determined to be 0.0% for all gap periods, indicating that the model was consistent. This is plausible given the large number of predictions made throughout the gap period, which totals more than 669 for each gap. The evolution of the DF uncertainties was driven primarily by the seasonality of the reference dataset and the MCP method, as a direct consequence of the inverse relationship derived from the predictions in the validation period. The total uncertainty was assessed as the square root of the sum of the squared data-filling (ISPE method) and jackknife uncertainties. The combined uncertainty was driven by the data-filling uncertainty, suggesting that the gaps missing from an ideal, representative one-year assessment might account for a considerable portion of the LTC uncertainty. Furthermore, the standard deviation of the LTWS was observed to underestimate the uncertainty margin considerably.
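The combination of uncertainties described above can be sketched as a root-sum-square of two independent components. The example is a hedged illustration: the leave-one-out jackknife is applied here to the mean of a hypothetical sample of LTWS estimates, and the function names, the sample, and the data-filling uncertainty of 0.5% are assumptions, not values or procedures taken from the thesis.

```python
import numpy as np

def jackknife_std_error(values, statistic=np.mean):
    """Leave-one-out jackknife standard error of a statistic."""
    values = np.asarray(values, dtype=float)
    n = values.size
    # Recompute the statistic n times, each time omitting one sample.
    loo = np.array([statistic(np.delete(values, i)) for i in range(n)])
    return float(np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2)))

def combined_uncertainty(u_df, u_jk):
    """Root-sum-square of data-filling and jackknife uncertainties."""
    return float(np.hypot(u_df, u_jk))

# Hypothetical LTWS estimates in m/s (assumption, for illustration only).
rng = np.random.default_rng(3)
ltws_samples = 9.0 + rng.normal(0.0, 0.2, size=24)

# Jackknife uncertainty expressed in percent of the mean LTWS.
u_jk = jackknife_std_error(ltws_samples) / ltws_samples.mean() * 100.0
u_df = 0.5  # hypothetical data-filling uncertainty in percent
u_total = combined_uncertainty(u_df, u_jk)

print(f"u_jk={u_jk:.2f}%, u_df={u_df:.2f}%, u_total={u_total:.2f}%")
```

For the sample mean, the jackknife standard error equals the usual standard error s/sqrt(n), which gives a simple check of the implementation. The root-sum-square combination assumes that the two components are uncorrelated.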
It is therefore suggested that both the DF and JK uncertainties be taken into account when assessing the uncertainty of the MCP method. Bootstrapping should be studied in further detail as a suitable method for MCP corrections, preferably in a comparable research project. The questionnaire answers are considered extremely valuable and may help shape the design of future studies. These may include additional variables that affect the MCP, more advanced non-linear MCP algorithms, data-filling approaches, and sensitivity analyses of metocean parameters.

Finally, it is important to highlight that there may be significant year-to-year fluctuations in windiness, which may affect data-filling and MCP operations. According to Burton et al., many factors might contribute to these changes, and global climate phenomena such as El Niño, volcanic eruptions, and oscillations in solar activity may be connected. Additionally, the expected effects of human-induced global warming on the climate are controversial and are likely to affect wind conditions in the following decades [97].

This master's thesis contains a thorough set of appendices and a summary of the data collected to allow for verification and investigation of any obtained results.

5 References

[1] MEHR FORTSCHRITT WAGEN: BÜNDNIS FÜR FREIHEIT, GERECHTIGKEIT UND NACHHALTIGKEIT. KOALITIONSVERTRAG ZWISCHEN SPD, BÜNDNIS 90/DIE GRÜNEN UND FDP. [Online]. Available: https://www.spd.de/fileadmin/Dokumente/Koalitionsvertrag/Koalitionsvertrag_2021-2025.pdf (accessed: Oct. 1 2021).
[2] Measnet, “SITE-SPECIFIC WIND CONDITIONS Version 2 April 2016,” April, 2016. [Online]. Available: https://www.measnet.com/wp-content/uploads/2016/05/Measnet_SiteAssessment_V2.0.pdf
[3] J. Gottschall, B. Gribben, D. Stein, and I. Würth, Floating lidar as an advanced offshore wind speed measurement technique: current technology status and gap analysis in regard to full maturity, 2041840X, vol. 6. [Online]. Available: https://onlinelibrary.wiley.com/doi/10.1002/wene.250
[4] J. Gottschall and M. Dörenkämper, “Understanding and mitigating the impact of data gaps on offshore wind resource estimates,” Wind Energy Science, vol. 6, no. 2, pp. 505–520, 2021, doi: 10.5194/wes-6-505-2021.
[5] EnArgus Vorhaben '03EE3024' aus Suche nach ''. [Online]. Available: https://www.enargus.de/detail/?id=1407485 (accessed: Jan. 13 2022).
[6] P. Körner, R. Kronenberg, S. Genzel, and C. Bernhofer, “Introducing Gradient Boosting as a universal gap filling tool for meteorological time series,” Meteorologische Zeitschrift, vol. 27, no. 5, pp. 369–376, doi: 10.1127/metz/2018/0908.
[7] R. B. Stull, An introduction to boundary layer meteorology: Kluwer Academic; Atmospheric Sciences Library, 13, 1988. [Online]. Available: https://books.google.de/books?id=eRRz9RNvNOkC&newbks=1&newbks_redir=0&lpg=PP1&dq=An%20introduction%20to%20boundary%20layer%20meteorology&hl=de&pg=PR4#v=onepage&q=An%20introduction%20to%20boundary%20layer%20meteorology&f=false
[8] A. Rogers, J. Rogers, and J. Manwell, “Uncertainties in Results of Measure-Correlate-Predict Analyses,” European Wind Energy Conference and Exhibition 2006, EWEC 2006, vol. 3, 2005. [Online]. Available: https://www.researchgate.net/publication/237439775_Uncertainties_in_Results_of_Measure-Correlate-Predict_Analyses
[9] J. A. Carta, S. Velázquez, and P. Cabrera, A review of measure-correlate-predict (MCP) methods used to estimate long-term wind characteristics at a target site, 13640321, vol. 27. [Online]. Available: https://linkinghub.elsevier.com/retrieve/pii/S1364032113004498
[10] Morten Lybech Thogersen, WindPRO / MCP Measure-Correlate-Predict: An Introduction to the MCP Facilities in WindPRO: EMD International A/S, 2010.
[11] J. Addison, A. Hunter, J. Bass, and M. Rebbeck, “A neural network version of the measure correlate predict algorithm for estimating wind energy yield,” 2022. [Online]. Available: https://www.semanticscholar.org/paper/A-neural-network-version-of-the-measure-correlate-Addison-Hunter/3c323cabd4d960e605560059194a528cfffa5959
[12] A. Derrick, “Development of the Measure-correlate-predict strategy for site assessment,” Proceedings of the BWEA, 1993. [Online]. Available: https://www.researchgate.net/publication/245913250_Development_of_the_Measure-correlate-predict_strategy_for_site_assessment
[13] A. A. Mortimer, “A new correlation/prediction method for potential wind farm sites,” Proc BWEA, pp. 349–352, 1994, doi: 10.1016/j.energy.2013.10.007.
[14] M. Taylor, M. C. Brower, M. Markus, S. Meteorologist, and A. W. S. Truewind, “An Analysis of Wind Resource Uncertainty in Energy Production Estimates,” Proceedings of the European wind energy conference & exhibition, 2004. [Online]. Available: https://vibdoc.com/an-analysis-of-wind-resource-uncertainty-in-energy-productio.html
[15] D. Bechrakis, J. Deane, and E. Mckeogh, “Wind resource assessment of an area using short term data correlated to a long term data set,” Solar Energy, vol. 76, pp. 725–732, 2004, doi: 10.1016/j.solener.2004.01.004.
[16] C. J. Sheppard, “Analysis of the measure-correlate-predict methodology for wind resource assessment,” 2009. [Online]. Available: https://humboldt-dspace.calstate.edu/handle/2148/542
[17] M. Denis Mifsud, T. Sant, and R. Nicholas Farrugia, “Analysing uncertainties in offshore wind farm power output using measure-correlate-predict methodologies,” Wind Energy Science, vol. 5, no. 2, pp. 601–621, 2020, doi: 10.5194/wes-5-601-2020.
[18] A. Pulo, O. Sargin, S. Schmidt, W. Schlez, and M. Stoaelinga, “Ten noorden van de Waddeneilanden Wind Farm Zone Wind Resource Assessment Prepared for : Wind Farm Zone Ten noorden van de Waddeneilanden Wind Resource Assessment Prepared for,” 2021. [Online]. Available: https://offshorewind.rvo.nl/file/download/55041024
[19] M. L. Thøgersen, M. Motta, T. Sørensen, and P. Nielsen, “Measure-correlate-predict methods: case studies and software implementation,” European Wind Energy Conference & Exhibition, p. 10, 2007. [Online]. Available: https://www.semanticscholar.org/paper/Measure-Correlate-Predict-Methods%3A-Case-Studies-and-Thøgersen-Motta/61b794c53c2744869064c83300600566817d86fe http://emd.dk/files/windpro/Thoegersen_MCP_EWEC_2007.pdf
[20] S. Liléo, E. Berge, O. Undheim, R. Klinkert, and R. E. Bredesen, “Long-term correction of wind measurements. State-of-the-art, guidelines and future work,” January, 2013. [Online]. Available: https://www.researchgate.net/publication/285769739_Long-term_correction_of_wind_measurements_State-of-the-art_guidelines_and_future_work
[21] Datacadamia - Data and Co, Statistics - (Residual|Error Term|Prediction error|Deviation) (e| ). [Online]. Available: https://datacadamia.com/data_mining/residual (accessed: Jan. 7 2022).
[22] The MCP (Measure-Correlate-Predict) module - Learn more. EMD International. Accessed: Dec. 12 2021. [Online]. Available: https://www.emd-international.com/windpro/windpro-modules/energy-modules/mcp/
[23] P. Pramod Jain, Wind Energy Engineering. New York: McGraw-Hill Education, 2011. [Online]. Available: https://www.accessengineeringlibrary.com/content/book/9780071714778
[24] E. Hau and H. von Renouard, Wind Turbines: Fundamentals, Technologies, Application, Economics: Springer Berlin Heidelberg, 2005. [Online]. Available: https://books.google.de/books?id=Z4bhObd65IAC
[25] S. Emeis, “Wind energy meteorology : atmospheric physics for wind power generation,” 1865-3529, 2013. [Online]. Available: https://link.springer.com/book/10.1007/978-3-642-30523-8
[26] Statistical population - Wikipedia. Accessed: Dec. 11 2021. [Online]. Available: https://en.wikipedia.org/wiki/Statistical_population#cite_note-4
[27] BIPM et al., Evaluation of measurement data - Guide to the expression of uncertainty in measurement: JCGM, 2008. [Online]. Available: https://www.bipm.org/documents/20126/2071204/JCGM_100_2008_E.pdf/cb0ef43f-baa5-11cf-3f85-4dcd86f77bd6
[28] IEC, “IEC 61400-12-1:2017 Edition 2.0 Wind energy generation systems – Power performance measurements of electricity producing wind turbines,” International Standard, 2017. [Online]. Available: https://webstore.iec.ch/publication/26603
[29] Arithmetic mean - Wikipedia. Accessed: Dec. 11 2021. [Online]. Available: https://en.wikipedia.org/wiki/Arithmetic_mean
[30] Variance - Wikipedia. Accessed: Dec. 11 2021. [Online]. Available: https://en.wikipedia.org/wiki/Variance#Sample_variance
[31] Covariance - from Wolfram MathWorld. Accessed: Dec. 12 2021. [Online]. Available: https://mathworld.wolfram.com/Covariance.html
[32] Pearson Correlation - SPSS Tutorials - LibGuides at Kent State University. Accessed: Dec. 12 2021. [Online]. Available: https://libguides.library.kent.edu/SPSS/PearsonCorr
[33] Coefficient of determination - Wikipedia. Accessed: Dec. 12 2021. [Online]. Available: https://en.wikipedia.org/wiki/Coefficient_of_determination
[34] Coefficient of Determination (R Squared): Definition, Calculation - Statistics How To. Accessed: Dec. 12 2021. [Online]. Available: https://www.statisticshowto.com/probability-and-statistics/coefficient-of-determination-r-squared/
[35] FGW e.V., “Technical Guidelines for Wind Turbines Part 6 (TG6) Determination of Wind Potential and Energy Yields,” Tg 6, 2020. [Online]. Available: https://wind-fgw.de/wp-content/uploads/2021/03/200921_TR6_Revision11_EN_ST_prev.pdf
[36] Bias (statistics) - Wikipedia. Accessed: Dec. 12 2021. [Online]. Available: https://en.wikipedia.org/wiki/Bias_(statistics)
[37] Mean absolute error - Wikipedia. Accessed: Dec. 12 2021. [Online]. Available: https://en.wikipedia.org/wiki/Mean_absolute_error
[38] Root-mean-square deviation - Wikipedia. Accessed: Dec. 12 2021. [Online]. Available: https://en.wikipedia.org/wiki/Root-mean-square_deviation
[39] RMSE: Root Mean Square Error - Statistics How To. Accessed: Dec. 12 2021. [Online]. Available: https://www.statisticshowto.com/probability-and-statistics/regression-analysis/rmse-root-mean-square-error/
[40] Free Statistics Book. Accessed: Dec. 12 2021. [Online]. Available: https://onlinestatbook.com/
[41] What is the Standard Error of a Sample ? - Statistics How To. Accessed: Dec. 12 2021. [Online]. Available: https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/what-is-the-standard-error-of-a-sample/
[42] Kolmogorov–Smirnov test - Wikipedia. Accessed: Dec. 12 2021. [Online]. Available: https://en.wikipedia.org/wiki/Kolmogorov–Smirnov_test#Two-sample_Kolmogorov–Smirnov_test
[43] UL International, Windographer: UL. Accessed: Jul. 1 2021. [Online]. Available: https://www.windographer.com/
[44] Normal distribution - Wikipedia. Accessed: Dec. 12 2021. [Online]. Available: https://en.wikipedia.org/wiki/Normal_distribution
[45] I. Troen and E. Petersen, “European Wind Atlas,” Roskilde: Riso National Laboratory, 1989, vol. -1, 1989. [Online]. Available: https://www.osti.gov/etdeweb/biblio/5920204
[46] DNV GL - Energy, “WindFARMER Theory Manual Version 5.3,” April, 2014.
[47] Linear Regression. Accessed: Dec. 24 2021. [Online]. Available: http://www.stat.yale.edu/Courses/1997-98/101/linreg.htm
[48] Wikipedia, Linear least squares. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Linear_least_squares&oldid=1054104043 (accessed: Dec. 25 2021).
[49] “Renewables Renewables Software Data/Analytics,” [Online]. Available: https://collateral-library-production.s3.amazonaws.com/uploads/asset_file/attachment/2498/UL_Wind_SoftwareData_163.02.1018.EN.EPT_Digital.pdf
[50] P. Baas, F. C. Bosveld, and G. Burgers, “The impact of atmospheric stability on the near-surface wind over sea in storm conditions,” Wind Energy, vol. 19, no. 2, pp. 187–198, 2016, doi: 10.1002/we.1825.
[51] M. Anderson and J. Bass, “A Review of MCP Techniques,” RES 03, 2004.
[52] Principal component analysis - Wikipedia. [Online]. Available: https://en.wikipedia.org/wiki/Principal_component_analysis (accessed: Dec. 25 2021).
[53] Linear least squares example2 - Linear least squares - Wikipedia. Accessed: Dec. 24 2021. [Online]. Available: https://en.wikipedia.org/wiki/Linear_least_squares#/media/File:Linear_least_squares_example2.svg
[54] Total least squares - Total least squares - Wikipedia. Accessed: Dec. 24 2021. [Online]. Available: https://en.wikipedia.org/wiki/Total_least_squares#/media/File:Total_least_squares.svg
[55] V. A. Barbur, D. C. Montgomery, and E. A. Peck, Journal of the Royal Statistical Society. Series D (The Statistician), vol. 43, no. 2, pp. 339–341, 1994, doi: 10.2307/2348362.
[56] Wikipedia, Regression analysis. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Regression_analysis&oldid=1060800391 (accessed: Jul. 1 2022).
[57] J. Beltran, L. Cosculluela, C. Pueyo, and J. J. Melero, “Comparison of measure-correlate-predict methods in wind resource assessments,” in European Wind Energy Conference and Exhibition 2010, EWEC 2010, 2010, pp. 3280–3286. [Online]. Available: https://www.researchgate.net/publication/266242232_Comparison_of_measure-correlate-predict_methods_in_wind_resource_assessments
[58] M. Leblanc, D. Schoborg, S. Cox, A. Haché, and A. Tindal, “Is a Non-linear MCP method a useful tool for North American wind regimes,” in Proceedings of the AWEA 2009 Windpower Conference and Exhibition, Chicago, IL, USA, 2009. [Online]. Available: https://www.yumpu.com/en/document/view/51447119/non-linear-mcp-gl-garrad-hassan
[59] J. V. Miguel, E. A. Fadigas, and I. L. Sauer, “The influence of the wind measurement campaign duration on a measure-correlate-predict (MCP)-based wind resource assessment,” Energies, vol. 12, no. 19, 2019, doi: 10.3390/en12193606.
[60] D. Hanslian, “The Matrix of Measure-Correlate-Predict Methods,” in 2017. [Online]. Available: https://businessdocbox.com/Green_Solutions/85349495-The-matrix-of-measure-correlate-predict-methods.html
[61] J. C. Woods and S. Watson, “A new matrix method of predicting long-term wind roses with MCP,” Journal of Wind Engineering and Industrial Aerodynamics, vol. 66, pp. 85–94, 1997, doi: 10.1016/S0167-6105(97)00009-3.
[62] S. C. Ramli and M. H. Windolf, “Uncertainty in the application of the Measure-Correlate-Predict (MCP) method in wind resource assessment,” [Online]. Available: http://c2wind.com/f/content/sundus_ramli_p0355.pdf
[63] E. Saarnak, “Case Study of Uncertainties Connected to Long-term Correction of Wind Observations,” Uppsala universitet. [Online]. Available: https://www.diva-portal.org/smash/get/diva2:622452/FULLTEXT01.pdf
[64] T. Lambert and A. Grue, “The Matrix Time Series method for MCP,” in Proceedings of the WINDPOWER 2012 Conference, Atlanta, Georgia, USA, 2012.
[65] N. D. Waars, “Lidar and MCP in wind resource estimations above measurement-mast height,” DTU. [Online]. Available: http://repository.tudelft.nl/
[66] J. Zhang, S. Chowdhury, A. Messac, and B.-M. Hodge, “Assessing Long-Term Wind Conditions by Combining Different Measure-Correlate-Predict Algorithms,” in 2014. [Online]. Available: https://www.nrel.gov/docs/fy13osti/57647.pdf
[67] E. Alpaydin, Introduction to machine learning. Cambridge, MA, London: MIT Press, 2004. [Online]. Available: https://books.google.de/books?hl=de&lr=&id=tZnSDwAAQBAJ&oi=fnd&pg=PR7&dq=introduction+to+machine+learning+&ots=F3VR518nyj&sig=tw5aptPDqObfKsvlzeYUa1vktC0&redir_esc=y#v=onepage&q=introduction%20to%20machine%20learning&f=false
[68] R. Klinkert, “Master of Science Thesis Uncertainty Analysis of Long Term Correction Methods for Annual Average Winds,” UMEÅ UNIVERSITY. [Online]. Available: http://umu.diva-portal.org/smash/record.jsf?pid=diva2%3A556297&dswid=6024
[69] Wikipedia, Machine learning. [Online]. Available: https://en.wikipedia.org/w/index.php?title=Machine_learning&oldid=1063012883 (accessed: Jan. 1 2022).
[70] M. Petrelli, Introduction to Python in Earth Science Data Analysis : From Descriptive Statistics to Machine Learning, Springer Textbooks in Earth Sciences, Geography and Environment, 1st ed. Cham.
[71] M. Nielsen, “Long-term correction of wind observations by diffusion-based transformation,” DTU, 2019. [Online]. Available: https://backend.orbit.dtu.dk/ws/portalfiles/portal/180029732/dtu_wind_e_0183.pdf
[72] C. King and B. Hurley, “The SpeedSort, DynaSort and Scatter Wind Correlation Methods,” Wind Engineering, vol. 29, no. 3, pp. 217–241, 2005, doi: 10.1260/030952405774354868.
[73] UL International, “Windographer Helpfile,” Accessed: Jul. 1 2021. [Online]. Available: https://www.windographer.com/
[74] Paul van Lieshout, Improvements in AEP Calculations Using IEC 61400. [Online]. Available: https://www.windtech-international.com/editorial-features/improvements-in-aep-calculations-using-iec-61400 (accessed: Jan. 13 2022).
[75] Definition of ALGORITHM. [Online]. Available: https://www.merriam-webster.com/dictionary/algorithm (accessed: Jan. 2 2022).
[76] A. Romo Perea, J. Amezcua, and O. Probst, “Validation of three new measure-correlate-predict models for the long-term prospection of the wind resource,” Journal of Renewable and Sustainable Energy, vol. 3, no. 2, p. 23105, 2011, doi: 10.1063/1.3574447.
[77] Questionnaire on "Analysis and Method Selection of a Measure-Correlate-Predict Procedure". [Online]. Available: https://www.empirio.de/s/dd9bXys1XW (accessed: Jan. 7 2022).
[78] wrag groups.io Group. [Online]. Available: https://groups.io/g/wrag (accessed: Jan. 7 2022).
[79] J. C. Y. Lee and M. J. Fields, “An overview of wind-energy-production prediction bias, losses, and uncertainties,” Wind Energy Science, vol. 6, no. 2, pp. 311–365, 2021, doi: 10.5194/wes-6-311-2021.
[80] M. Brower et al., Wind Resource Assessment : A Practical Guide to Developing a Wind Project. Somerset, UNITED STATES: John Wiley & Sons, Incorporated, 2012. [Online]. Available: https://books.google.de/books?id=5dSzcF_cowkC&newbks=1&newbks_redir=0&hl=de&redir_esc=y
[81] C. de Valk, I. L. Wijnant, “Uncertainty analysis of climatological parameters of the Dutch Offshore Wind Atlas (DOWA),” Royal Netherlands Meteorological Institute; Ministry of Infrastructure and Water Management, De Bilt TR-379, 2019. [Online]. Available: https://www.dutchoffshorewindatlas.nl/binaries/dowa/documents/reports/2019/12/10/knmi-report---uncertainty-analysis-of-climatological-parameters/Uncertainty+analysis+of+climatological+parameters+of+the+DOWA.pdf
[82] Det Norske Veritas, “USE OF REMOTE SENSING FOR WIND ENERGY ASSESSMENTS,” April, 2011. [Online]. Available: https://rules.dnv.com/docs/pdf/dnvpm/codes/docs/2011-11/RP-J101.pdf
[83] J. B. Duncan, P. A. van der Werff, and E. Bot, “Understanding of the Offshore Wind Resource up to High Altitudes (≤ 315 m),” TNO, 2018. [Online]. Available: https://repository.tno.nl/islandora/object/uuid%3Ab15f4402-f78f-41b5-bcf1-2d5cad45abf6
[84] Project Jupyter. [Online]. Available: https://jupyter.org/ (accessed: Jan. 10 2022).
[85] Python.org, Welcome to Python.org. [Online]. Available: https://www.python.org/ (accessed: Jan. 10 2022).
[86] Anaconda | The World's Most Popular Data Science Platform. [Online]. Available: https://www.anaconda.com/ (accessed: Jan. 10 2022).
[87] Matplotlib - Visualization with Python. [Online]. Available: https://matplotlib.org/ (accessed: Jan. 10 2022).
[88] scikit-learn, 3.3. Metrics and scoring: quantifying the quality of predictions. [Online]. Available: https://scikit-learn.org/stable/modules/model_evaluation.html#classification-metrics (accessed: Jan. 10 2022).
[89] Statistical functions (scipy.stats) - SciPy v1.7.1 Manual. [Online]. Available: https://docs.scipy.org/doc/scipy/reference/stats.html (accessed: Jan. 10 2022).
[90] PyPI, dc-stat-think. [Online]. Available: https://pypi.org/project/dc-stat-think/ (accessed: Jan. 10 2022).
[91] Meteomast IJmuiden (MMIJ) – Wind op Zee. [Online]. Available: https://www.windopzee.net/en/locations/meteomast-ijmuiden-mmij/ (accessed: Jan. 10 2022).
[92] This is Nimbus! Accessed: Dec. 6 2021. [Online]. Available: https://nimbus.windopzee.net/
[93] H. Hersbach et al., “The ERA5 global reanalysis,” Quarterly Journal of the Royal Meteorological Society, vol. 146, no. 730, pp. 1999–2049, 2020, doi: 10.1002/qj.3803.
[94] E. Werkhoven and J. P. Verhoef, “Abstract of instrumentation report - Offshore Meteorological Mast IJmuiden,” ECN, 2012. [Online]. Available: https://www.windopzee.net/wp-content/uploads/2019/07/ecn-wind_memo-12-010_abstract_of_instrumentatierapport_meetmast_ijmuiden.pdf
[95] ParaView. [Online]. Available: https://www.paraview.org/ (accessed: Jan. 13 2022).
[96] A. Basse, D. Callies, A. Grötzner, and L. Pauscher, “Seasonal effects in the long-term correction of short-term wind measurements using reanalysis data,” Wind Energy Science, vol. 6, no. 6, pp. 1473–1490, 2021, doi: 10.5194/wes-6-1473-2021.
[97] T. Burton, D. Sharpe, N. Jenkins, and E. Bossanyi, Wind Energy Handbook: John Wiley & Sons, 2001. [Online]. Available: https://books.google.de/books?id=4UYm893y-34C

Affirmation

I hereby affirm that I wrote the present thesis independently, that the thesis has not been partly or fully submitted as graded academic work, and that I have used no means other than those indicated. I have indicated all parts of the work in which sources are used according to their wording or to their meaning. I declare agreement to the inspection of my work with software to detect plagiarism. For this purpose I provide an anonymised electronic version of my work in a prevalent text editing format.

Berlin, 05.01.2022
Annex A Questionnaire
Annex B PreDF - Sectorwise exemplary results of the concurrent period
Annex C KPI Results
Annex D Evolution of self-prediction RMSE of MWS results
Annex E Evolution of validation MBE of MWS results
Annex F Evolution of validation MAE of MWS results
Annex G Evolution of validation RMSE of MWS results
Annex H Regression plots of self-prediction and validation RMSE
Annex I MMIJ transfer functions to obtain data-filling uncertainties in a representative location
Annex J Evolution of DFWS
Annex K Evolution of LTWS
Annex L Evolution of DF uncertainties
Annex M Evolution of JK uncertainties
Annex N Evolution of final uncertainties in LTWS