On the Detection and Selection of Informative Subsequences from Large Historical Data Records for Linear System Identification
Performing experiments for system identification of continuously operated plants may be restricted because they can negatively affect normal production or cause safety issues. In such cases, using historical logged data for system identification can be an attractive alternative to carrying out new experiments. However, since such plants normally work at operating points that are seldom changed, parameter estimation from logged data can suffer from numerical problems. Methods to locate and select informative data sequences are a promising area that can support system identification in processes where performing experiments is constrained. At least three main drawbacks of current approaches can be discussed. Firstly, the detection tests used in data selection methods are based on time series models, even though they address dynamical systems where the input sequence should also be considered. For processes operating in closed loop, excitation caused by external disturbances is not detected if current approaches only evaluate changes in the set points. Secondly, upper interval bounds can be wrongly defined, since the process is described by input-output models that assume white Gaussian noise (WGN) as the additive stochastic disturbance. In practical applications, colored noise is more likely to be found than WGN. Thirdly, current methods do not support model estimation with the retrieved intervals, and therefore the quality of the selected data for data-driven modeling cannot be practically assessed. The data selection method proposed in the present thesis, called data selection for system identification (DS4SID), addresses these drawbacks, and robust tests are designed and implemented. DS4SID can be applied to multivariate processes operating in open or closed loop. Two tests are proposed for detecting and bounding informative intervals, which simplifies the choice of user-defined parameters. 
A model is computed using a data merging method and can be used for further analysis. The performance of DS4SID is evaluated on a simulated and a laboratory multivariate process. A process unit of the lab-scale factory "μPlant" is used as an industry-oriented case study. Models estimated from the selected informative intervals are shown to perform similarly to models estimated from the entire data set.
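The numerical problem that motivates data selection can be illustrated with a minimal sketch. It is not the DS4SID tests described in the thesis. It assumes a hypothetical first-order ARX process with white equation noise (the abstract notes that colored noise is more common in practice), an arbitrary window width, and an arbitrary threshold, and it uses the condition number of the windowed information matrix as a simple stand-in measure of informativeness. All names and numerical values below are illustrative assumptions.

```python
import numpy as np

# Hypothetical first-order ARX process (not taken from the thesis):
#   y[k] = a*y[k-1] + b*u[k-1] + e[k]
a, b = 0.9, 0.1
n_samples, width = 1000, 100

rng = np.random.default_rng(0)
e = 0.01 * rng.standard_normal(n_samples)  # small white equation noise (assumed)

# Input rests at one operating point except for a square wave in samples 400..599.
u = np.ones(n_samples)
u[400:600] += (np.arange(200) // 10) % 2

y = np.empty(n_samples)
y[0] = 1.0  # steady state for u = 1 (static gain b / (1 - a) = 1)
for k in range(1, n_samples):
    y[k] = a * y[k - 1] + b * u[k - 1] + e[k]


def window_condition_numbers(y, u, width):
    """Condition number of the ARX information matrix per non-overlapping window."""
    conds = []
    for start in range(0, len(y) - width + 1, width):
        # Regressors [y[k-1], u[k-1]] for the samples inside the window.
        phi = np.column_stack((y[start:start + width - 1],
                               u[start:start + width - 1]))
        info = phi.T @ phi / len(phi)
        conds.append(np.linalg.cond(info))
    return np.array(conds)


cond = window_condition_numbers(y, u, width)
threshold = 1e3  # assumed, illustrative cut-off; not a value from the thesis
informative = cond < threshold

for i, (c, flag) in enumerate(zip(cond, informative)):
    print(f"window {i}: cond = {c:10.1f}  informative = {flag}")
```

In windows where the input stays at its operating point, the two regressor columns are nearly collinear, so the information matrix is badly conditioned and a least-squares estimate of `a` and `b` becomes numerically fragile. Windows that contain the square wave are much better conditioned. A window directly after the excitation may also pass the threshold because of the decaying transient, which is one reason a fixed threshold like this is only a rough heuristic.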
@book{doi:10.17170/kobra-202201055361, author ={Arengas Rojas, David Leonardo}, title ={On the Detection and Selection of Informative Subsequences from Large Historical Data Records for Linear System Identification}, keywords ={620 and Informationsmodell and Erfassung and Daten and Systemidentifikation}, copyright ={http://creativecommons.org/licenses/by-sa/4.0/}, language ={en}, year ={2022} }