4/CY/O8 Advanced System Identification

Lecture 4

Data Pre-processing

When data have been collected from the identification experiment, it is often necessary to do some processing on the data set prior to using it for identification.

There are several possible deficiencies in the data that should be attended to:

1. High-frequency disturbances in the data record, above the frequencies of interest to the system dynamics.

2. Occasional outliers and missing data.

3. Drift, offset and other low-frequency disturbances.

Dr. V.M. Becerra

1


Drifts and detrending

There are two different approaches to dealing with these problems:

1. Removing the disturbances by explicit pre-treatment of the data.

2. Letting the noise model take care of the disturbances.

The first approach involves removing trends and offsets by direct subtraction.


Signal offsets

There are several approaches to dealing with signal offsets or non-zero mean values:

1. Let y(t) and u(t) be deviations from a physical equilibrium (ȳ, ū):

   y(t) = y^m(t) − ȳ,   u(t) = u^m(t) − ū

2. Subtract the mean values from the data:

   ȳ = (1/N) Σ_{t=1}^{N} y^m(t),   ū = (1/N) Σ_{t=1}^{N} u^m(t)

3. Estimate the offset explicitly. For example,

A(q^-1) y^m(t) = B(q^-1) u^m(t) + α + v(t)
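As a quick illustration of approach 2, mean values can be removed in a few lines. This Python sketch uses synthetic signals; all names and numbers are illustrative, not taken from the lecture:

```python
import numpy as np

# Hypothetical measured signals around a non-zero operating point.
rng = np.random.default_rng(0)
N = 200
u_m = 30.0 + rng.standard_normal(N)       # measured input around a 30% operating point
y_m = 15.0 + 0.5 * (u_m - 30.0)           # measured output around its equilibrium

# Approach 2: subtract the sample means to obtain zero-mean deviation signals.
u = u_m - u_m.mean()
y = y_m - y_m.mean()
```

The detrended signals u and y, rather than the raw measurements, are then passed to the parameter estimation step.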


Drift, trends and seasonal variations

Methods to cope with other slow disturbances in the data are analogous to the previous approaches for dealing with offsets.

Drifts and trends can be seen as time-varying equilibrium points, or time-varying mean values.

With some knowledge about the frequencies of the slow variations, an alternative is to high-pass filter the data:

y(t) = F(q) y^m(t),   u(t) = F(q) u^m(t)
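A minimal sketch of this idea, assuming a simple first-order high-pass filter F(q) = (1 − q^-1)/(1 − a q^-1); the pole a and the test signal are illustrative choices, not from the lecture:

```python
import numpy as np

def highpass(x, a=0.95):
    """First-order high-pass filter: y(t) = a*y(t-1) + x(t) - x(t-1).

    Removes constant offsets and slow drift while passing faster variations.
    The pole a sets the cut-off; values close to 1 give a lower cut-off.
    """
    y = np.zeros_like(x, dtype=float)
    for t in range(1, len(x)):
        y[t] = a * y[t - 1] + x[t] - x[t - 1]
    return y

# Synthetic measured input: a slow linear trend plus useful excitation.
N = 500
t = np.arange(N)
u_m = 0.01 * t + np.sin(0.5 * t)
u = highpass(u_m)   # trend removed; the sinusoid passes almost unchanged
```

The same filter F(q) must be applied to both the input and output records so the input-output relation is preserved.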


Outliers and missing data

In practice, data acquisition equipment is not perfect. Single values of the input-output data may be missing or corrupt due to malfunctioning sensors or communication links. A practical method to deal with this problem is to replace outliers or missing measurements by smoothed estimates prior to parameter estimation.
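One practical way to do this, sketched below, flags samples that deviate strongly from a local median and replaces them by linear interpolation from neighbouring good samples. The window length and threshold are illustrative choices, not prescribed by the lecture:

```python
import numpy as np

def replace_outliers(y, window=5, thresh=4.0):
    """Replace samples far from a local median by interpolated estimates."""
    y = np.asarray(y, dtype=float).copy()
    pad = window // 2
    # Sliding local median (robust to isolated spikes).
    med = np.array([np.median(y[max(0, t - pad):t + pad + 1]) for t in range(len(y))])
    dev = np.abs(y - med)
    mad = np.median(dev) + 1e-12          # robust scale estimate of the deviations
    bad = dev > thresh * mad              # outlier mask
    good = ~bad
    # Replace flagged samples by linear interpolation between good neighbours.
    y[bad] = np.interp(np.flatnonzero(bad), np.flatnonzero(good), y[good])
    return y, bad

# Synthetic record with two injected spikes.
rng = np.random.default_rng(1)
y = np.sin(0.1 * np.arange(300)) + 0.05 * rng.standard_normal(300)
y[[50, 170]] += 8.0
y_clean, bad = replace_outliers(y)
```

The cleaned record y_clean is then used for identification in place of the raw data.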


Model Validation – Residual Analysis

Once a model has been identified, it is important to validate it using a data set that is independent of the data used to estimate the model parameters. The 'leftovers' of the modelling process, the part of the data that the model could not reproduce, are the residuals:

ε(t) = ε(t, θ̂_N) = y(t) − ŷ(t, θ̂_N)

These residuals carry information about the quality of the model. When doing model validation, one typically computes the mean square error, the autocorrelation of the residuals, and the cross-correlation between the residuals and the input. The values of the autocorrelation and cross-correlation should be small and lie within certain confidence limits.


The Mean Square Error

This is the average of the squared error:

MSE = (1/N) Σ_{t=1}^{N} ε²(t)

It condenses into a single positive number how well the model output fits the measured data.

Auto-correlation of the residuals

As the residuals are assumed to be a white noise sequence, it is good practice to check their auto-correlation:

R_ε^N(τ) = (1/N) Σ_{t=1}^{N} ε(t) ε(t − τ)

for different values of τ = 1, 2, 3, 4, … If these numbers are not small for τ ≠ 0, then part of ε could have been predicted from past data, which is a sign of deficiency in the model.


Cross-correlation between the residuals and the input

Similarly, the residuals should not be correlated with the input, so it is also good practice to check the cross-correlation between the residuals and the input:

R_εu^N(τ) = (1/N) Σ_{t=1}^{N} ε(t) u(t − τ)

If there are traces of past inputs in the residuals, then there is a part of y(t) that originates from the past input and that has not been properly picked up by the model. Hence, the model could be improved.
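The three residual checks above can be sketched as follows. The data are synthetic (a PRBS-like input and white residuals standing in for a good model), and the ±2.58/√N band is a standard 99% confidence limit for normalised correlations of white noise, assumed here rather than specified in the lecture:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1000
u = np.sign(rng.standard_normal(N))     # PRBS-like input
eps = 0.1 * rng.standard_normal(N)      # residuals of a "good" model: white noise

# Mean square error.
mse = np.mean(eps ** 2)

def xcorr(a, b, max_lag):
    """Normalised sample correlation r(tau) for tau = 1..max_lag."""
    denom = np.sqrt(np.sum(a ** 2) * np.sum(b ** 2))
    return np.array([np.sum(a[tau:] * b[:len(b) - tau]) / denom
                     for tau in range(1, max_lag + 1)])

r_ee = xcorr(eps, eps, 20)              # autocorrelation of the residuals
r_eu = xcorr(eps, u, 20)                # cross-correlation residuals vs input
band = 2.58 / np.sqrt(N)                # ~99% confidence limits if eps is white
```

For a good model, the values in r_ee and r_eu stay inside ±band; excursions outside the band point to unmodelled dynamics (autocorrelation) or unexplained input effects (cross-correlation).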


Subspace Identification – An introduction

A linear system can always be represented in state space form:

x(k+1) = A x(k) + B u(k) + w(k)
y(k) = C x(k) + D u(k) + v(k)

where:
  x is an n-dimensional state vector
  u is an nu-dimensional input vector
  y is an ny-dimensional output vector
  v is an ny-dimensional measurement noise vector
  w is an n-dimensional process noise vector
  A, B, C and D are parameter matrices of the appropriate dimensions.


The main idea behind subspace identification techniques is that, given the input-output data sequences u(t), y(t), t = 1…N, the state sequence x(t), t = 1…N, is estimated first, and then the state space matrices A, B, C, D are found using a least squares procedure.

Assume for a moment that not only y and u are measured, but also the state vector x. With known u, y and x we can form a linear regression from the state space model above:

Y(t) = [ x(t+1) ]    Θ = [ A  B ]    Φ(t) = [ x(t) ]    E(t) = [ w(t) ]
       [ y(t)   ]        [ C  D ]           [ u(t) ]           [ v(t) ]

Then the state space model above may be written as:

Y(t) = Θ Φ(t) + E(t)

From this equation, the matrix elements in Θ can be estimated by the simple least squares method.
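This least squares step can be sketched as follows, assuming for the moment (as above) that the state sequence is known. The system matrices and signals here are hypothetical, chosen only to show that stacking Y(t) = Θ Φ(t) and solving by least squares recovers Θ:

```python
import numpy as np

# A hypothetical known 2-state, 1-input, 1-output system.
rng = np.random.default_rng(3)
A = np.array([[0.9, 0.05], [-0.1, 0.7]])
B = np.array([[0.1], [0.2]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

# Simulate to obtain u, x, y (noise-free here, for clarity).
N = 300
u = rng.standard_normal((1, N))
x = np.zeros((2, N + 1))
y = np.zeros((1, N))
for t in range(N):
    x[:, t + 1] = A @ x[:, t] + (B @ u[:, t:t + 1]).ravel()
    y[:, t] = (C @ x[:, t:t + 1] + D @ u[:, t:t + 1]).ravel()

# Stack the regression Y(t) = Theta * Phi(t) over all t and solve.
Y = np.vstack([x[:, 1:N + 1], y])       # [x(t+1); y(t)], shape (3, N)
Phi = np.vstack([x[:, :N], u])          # [x(t); u(t)],   shape (3, N)
Theta, *_ = np.linalg.lstsq(Phi.T, Y.T, rcond=None)
Theta = Theta.T                          # Theta = [[A, B], [C, D]]
```

With noise present, the least squares estimate of Θ is no longer exact but remains the natural estimator for this linear regression.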


How do we obtain the state sequence x(t), t = 1…N, from the input-output data? All state vectors that can be reconstructed from input-output data are linear combinations of the n k-step ahead output predictors (see Ljung, 1999):

ŷ(t + k | t),   k = 1, …, n

where n is the model order (the dimension of x). We can form these predictors and select an algebraic basis among their components:

x(t) = L [ ŷ(t+1|t) ]
         [    ⋮     ]
         [ ŷ(t+n|t) ]

The choice of L determines the basis of the state space realization. The predictor ŷ(t + k | t) is a linear function of u(s), y(s), s = 1, …, t.

The method is called subspace identification because it is based on subspace projections, a concept from linear algebra. A well-known algorithm for subspace identification is the so-called N4SID, originally developed by Peter Van Overschee at the University of Leuven, Belgium.


EXAMPLE

An identification experiment has been carried out on a two-tank mixing process. This process was located at the Control Engineering Centre Laboratory at City University, London, and a schematic diagram is given in Figure 1.

[Figure 1: The two tank mixing process – schematic showing the valve openings ucold and uhot, the levels L1 and L2, and the temperatures T1 and T2 of Tanks 1 and 2.]

The process has two manipulated inputs, which are the openings (in percentage) of the cold water valve (ucold) and the hot water valve (uhot).


The four measured variables are the levels in tanks 1 and 2, L1 and L2 (in cm), and the temperatures in tanks 1 and 2, T1 and T2 (in degrees C). The experiment was carried out in open loop by manually specifying the values of the valve openings. The input sequences to the process were chosen to be binary pseudo-random signals shifting between 10% and 40% for the cold water valve, and between 20% and 40% for the hot water valve. The sampling time used was 20s. The data set consists of 190 samples.
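Inputs like those described above can be generated as follows. The levels (10–40% and 20–40%) and the 190-sample length match the text; the minimum hold interval between switches is an assumption for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)

def prbs(n_samples, low, high, hold=5):
    """Pseudo-random binary sequence between two levels.

    Each random bit is held for `hold` samples so the input does not
    switch faster than the process can respond (hold is an assumed value).
    """
    bits = rng.integers(0, 2, size=int(np.ceil(n_samples / hold)))
    return np.where(np.repeat(bits, hold)[:n_samples] == 1, high, low).astype(float)

ucold = prbs(190, 10.0, 40.0)   # cold water valve: 10%..40%
uhot = prbs(190, 20.0, 40.0)    # hot water valve: 20%..40%
```

Binary signals of this kind are popular identification inputs because they excite a broad band of frequencies while keeping the process within its normal operating range.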


MATLAB code to estimate a state space model for the level in Tank 1 (uses the System Identification Toolbox):

load ex090398;
mix = iddata([L1],[ucold uhot]);
mixd = detrend(mix,'constant');
mixe = mixd([1:95],:);
mixv = mixd([96:190],:);
ssmodel = n4sid(mixe,2);
compare(mixv, ssmodel);
resid(mixv, ssmodel);
present(ssmodel)

[Figure 2: The measured level L1 (cm) and the valve openings ucold (%) and uhot (%) plotted against sample number.]


[Figure 3: Model validation results. Top: simulated output y1 of ssmodel compared with the measured output on the validation set mixv (fit: 93.25%). Middle: correlation function of the residuals for output y1. Bottom: cross-correlation function between input u1 and the residuals from output y1, for lags −25 to 25.]


These are the model parameters returned by N4SID:

A =        x1         x2
    x1   0.90194    0.053728
    x2  -0.48226   -0.15668

B =        u1           u2
    x1   0.0027808    0.0085512
    x2   0.0010754   -0.0001901

C =        x1        x2
    y1   41.407   -2.2392

D =        u1   u2
    y1      0    0
