Improve your root-mean calculations

BY BRIAN NEUNABER

With Newton's help and a few improvements, real-time digital systems can find root-means efficiently.
Real-time digital systems often require the calculation of a root-mean, such as a root-mean square (RMS) level or average magnitude of a complex signal. While averaging can be efficiently implemented by most microprocessors, the square root may not be, especially with low-cost hardware. If the processor doesn't implement a fast square root function, it must be implemented in software; although this yields accurate results, it may not be efficient.

One common method for computing the square root is Newton's method, which iteratively converges on a solution using an initial estimate. Since we're computing the square root of a slowly varying average value, the previous root-mean value makes a good estimate. Furthermore, we can combine the iterative Newton's method with a first-order recursive averager, resulting in a super-efficient method for computing the root-mean of a signal.

In this article, I'll develop and present three efficient recursive algorithms for computing the root-mean, illustrating each method with signal flow diagrams and example code. To some degree, each of these methods trades hardware complexity for error. I'll compare the computational performance and error of each method and suggest suitable hardware for each implementation.

ROOT-MEAN
The root-mean is computed as the square root of the average over time of its input. This average may be recursive or non-recursive, and I'll briefly review the general case for both.

Non-recursive average
The non-recursive average, or moving average, is the weighted sum of N inputs: the current input and N−1 previous inputs. In digital filtering terminology, this is called a finite impulse response, or FIR filter:
y(n) = Σ_{k=0…N−1} a(k)·x(n−k)    (1)
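To make Equation 1 concrete, here is a minimal C++ sketch; it is not from the article, and the function name and argument layout are illustrative only.

// Hypothetical illustration of Equation 1: one output sample of an
// N-point weighted (FIR) moving average. x[0] holds the current input
// x(n), x[k] holds x(n-k), and a[] holds the N weights.
double WeightedMovingAverage(const double *x, const double *a, int N)
{
    double y = 0.0;
    for (int k = 0; k < N; ++k)
        y += a[k] * x[k];   // weight and sum the current and past inputs
    return y;
}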
The most common use of the moving average typically sets the weights such that a(k) = 1/N. If we were to plot these weights versus time, we would see the "window" of the input signal that is averaged at a given point in time. This 1/N window is called a rectangular window because its shape is an N-by-1/N rectangle.
There is a trick for computing the 1/N average so that all N samples need not be weighted and summed with each output calculation. Since the weights don't change, you can simply add the newest weighted input and subtract the Nth weighted input from the previous sum:
y(n) = y(n−1) + (1/N)·[x(n) − x(n−N)]    (2)
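A minimal sketch of this running-sum trick follows, assuming 1/N weights; the class and method names are illustrative, not from the article.

#include <cstddef>
#include <vector>

// Hypothetical illustration of Equation 2: keep a running sum, add the
// newest input, and subtract the input that falls out of the N-sample window.
class MovingAverage
{
public:
    explicit MovingAverage(std::size_t n) : buf(n, 0.0), sum(0.0), idx(0) {}

    double Update(double x)
    {
        sum += x - buf[idx];              // add newest, subtract oldest
        buf[idx] = x;                     // overwrite the oldest sample
        idx = (idx + 1) % buf.size();     // advance the circular buffer
        return sum / double(buf.size());  // apply the 1/N weight once
    }

private:
    std::vector<double> buf;   // the last N inputs
    double sum;                // running sum of the buffer contents
    std::size_t idx;           // position of the oldest sample
};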
While this technique is computationally efficient, it requires storage and circular-buffer management of N samples.
Of course, many other window shapes are commonly used. Typically, these window shapes resemble, or are a variation of, a raised cosine between –π/2 and π/2. These windows weight the samples in the center more than the samples near the edges. Generally speaking, you should only use one of these windows when there is a specific need to, such as applying a specific filter to the signal. The disadvantage of these windows is that computational complexity and storage requirements increase with N.

Recursive average
The recursive average is the weighted sum of the input, N previous inputs, and M previous outputs:
y(n) = Σ_{k=0…N} a(k)·x(n−k) + Σ_{m=1…M} b(m)·y(n−m)    (3)
The simplest of these in terms of computational complexity and storage (while still being useful) is the first-order recursive average. In this case, the average is computed as the weighted sum of the current input and the previous output. The first-order recursive average also lends itself to an optimization when combined with the computation of the square root, which we'll discuss shortly.
In contrast to the non-recursive average, the first-order recursive average's window is a decaying exponential (Figure 1). Technically, the recursive average has an infinite window, since it never decays all the way to zero. In digital filtering terminology, this is known as an infinite impulse response, or IIR filter.

Figure 1. Comparison of recursive (first-order, exponential) and non-recursive (rectangular 1/N) averaging windows.

From Figure 1, we see that earlier samples are weighted more than later samples, allowing us to somewhat arbitrarily define an averaging time for the recursive average. For the first-order case, we define the averaging time as the time at which the impulse response has decayed to a factor of 1/e, or approximately 37%, of its initial value.
Figure 2. Mean-square computation using Newton's method for the reciprocal square root (signal flow diagram).
An equivalent definition is the time at which the step response reaches 1–(1/e), or approximately 63%, of its final value. Other definitions are possible but will not be covered here.
The weighting of the sum determines this averaging time; to ensure unity gain, the sum of the weights must equal one. As a consequence, only one coefficient needs to be specified to describe the averaging time. Therefore, for first-order recursive averaging, we compute the mean level as:

m(n) = (1−a)·x(n) + a·m(n−1) = x(n) + a·[m(n−1) − x(n)]    (4)

where x(n) is the input, m(n) is the mean value, and a is the averaging coefficient. The averaging coefficient is defined as:

a = e^(−1/(fS·t))    (5)

where t is the averaging time and fS is the sampling frequency. The root-mean may then be calculated by taking the square root of Equation 4:

y(n) = √m(n) = √( x(n) + a·[y(n−1)² − x(n)] )    (6)

where y(n) is the root-mean.
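Before optimizing, it helps to have Equations 4 through 6 in code. The following sketch is mine, not one of the article's listings; the class and method names are illustrative. It computes the root-mean directly with the library square root and serves as the reference that the later methods approximate.

#include <cmath>    // exp(), sqrt()

// Baseline sketch of Equations 4-6: recursive average plus a direct sqrt()
// on every sample. The later listings replace the sqrt() with
// Newton's-method updates.
class DirectRootMean
{
public:
    DirectRootMean(double sampleRate, double avgTime)
        : mean(0.0),
          a(std::exp(-1.0 / (sampleRate * avgTime)))   // Equation 5
    {}

    double Update(double x)
    {
        mean = x + a * (mean - x);     // Equation 4
        return std::sqrt(mean);        // Equation 6
    }

private:
    double mean;   // m(n), the running mean
    double a;      // averaging coefficient
};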
EFFICIENT COMPUTATION METHODS
Googling "fast square root" will get you a plethora of information and code snippets on implementing fast square-root algorithms. While these methods may work just fine, they don't take into account the application in which the square root is required. Oftentimes, you may not need exact precision to the last bit, or the algorithm itself can be manipulated to optimize the computation of the square root. I present a few basic approaches here.
Only calculate it when you need it
Probably the simplest optimization is to only calculate the square root when you absolutely need it. Although this may seem obvious, it can be easily overlooked when computing the root-mean on every input sample. When you don't need an output value for every input sample, it makes more sense to compute the square root only when you read the output value.
One example of an application where this technique can be used is RMS metering of a signal. A meter value that is displayed visually may only require an update every 50 to 100ms, which may be far less often than the input signal is sampled. Keep in mind, however, that recursive averaging should still be computed at the Nyquist rate.

Logarithms
Recall that:

log(√x) = (1/2)·log(x)    (7)

If you'll be computing the logarithm of a square root, it's far less computationally expensive to simply halve the result instead. A common example of this optimization is the calculation of an RMS level in dB, which may be simplified as follows:

RMS(x) in dB = 20·log[ √( mean(x²) ) ] = 10·log[ mean(x²) ]    (8)
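As a small illustration of Equation 8 (the helper function is hypothetical, not from the article), the dB value can be taken straight from the mean square, with no square root at all:

#include <cmath>    // log10()

// Report an RMS level in dB directly from the running mean square
// (the recursive average of x^2 from Equation 4). meanSquare must be > 0.
double RmsLevelIndB(double meanSquare)
{
    // 20*log10(sqrt(meanSquare)) == 10*log10(meanSquare), per Equation 8
    return 10.0 * std::log10(meanSquare);
}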
Newton's Method
Newton's Method (also called the Newton-Raphson Method) is a well-known iterative method for estimating the root of an equation [1]. Newton's Method can be quite efficient when you have a reasonable estimate of the result. Furthermore, if accuracy to the last bit is not required, the number of iterations can be fixed to keep the algorithm deterministic. We may approximate the root of f(x) by iteratively calculating:

y(n) = y(n−1) − f[y(n−1)] / f′[y(n−1)]    (9)

If we wish to find y = √m, then we need to find the root of the equation f(y) = y² − m. Substituting f(y) into Equation 9, we get:

y(n) = y(n−1) − [ y(n−1)² − m ] / [ 2·y(n−1) ]    (10)

Rearranging Equation 10, we get:

y(n) = (1/2)·[ y(n−1) + m(n)/y(n−1) ]    (11)

where y(n) is the approximation of the square root of m(n). Equation 11 requires a divide operation, which may be inconvenient for some processors. As an alternative, we can calculate 1/√m and multiply the result by m to get √m. Again using Newton's Method, we find that we may iteratively calculate the reciprocal square root as:

y_r(n) = (1/2)·y_r(n−1)·[ 3 − y_r(n−1)²·m(n) ]    (12)

and calculate the square root as:

y(n) = y_r(n)·m(n)    (13)
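To see how quickly Equation 11 settles, here is a small stand-alone demonstration (mine, not the article's) that iterates it for a fixed m:

#include <cstdio>

// Iterate Equation 11 for a constant m to watch Newton's method
// converge on sqrt(m).
int main()
{
    const double m = 2.0;   // we want sqrt(2) ~= 1.414213562
    double y = 1.0;         // initial estimate; must be > 0

    for (int i = 0; i < 5; ++i)
    {
        y = 0.5 * (y + m / y);                         // Equation 11
        std::printf("iteration %d: y = %.9f\n", i + 1, y);
    }
    return 0;
}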
ROOT-MEAN USING NEWTON'S METHOD
A subtle difference between Equations 10 and 11 is that m becomes m(n), meaning that we're attempting to find the square root of a moving target. However, since m(n) is a mean value, or slowly varying, it can be viewed as nearly constant between iterations. Since y(n) will also be slowly varying, y(n−1) will be a good approximation to y(n) and require fewer iterations (one, we hope) to achieve a good estimate.
To calculate the root-mean, one may simply apply Newton's Method for calculating the square root to the mean value. As long as the averaging time is long compared to the sample period (t >> 1/fS), one iteration of the square-root calculation should suffice for reasonable accuracy. This seems simple enough, but we can actually improve the computational efficiency, which will be discussed in one of the following sections.

Using reciprocal square root
Unlike the iterative square-root method, however, the iterative reciprocal square root requires no divide. This implementation is best suited for floating-point processing, which can efficiently handle numbers both greater and less than one. We present this implementation as a signal flow diagram in Figure 2. The averaging coefficient, a, is defined by Equation 5, and z⁻¹ represents a unit sample delay.
A code listing for a C++ class that implements the computation in Figure 2 is presented in Listing 1. In this example class, initialization is performed in the class constructor, and each call to CalcRootMean() performs one iteration of averaging and square-root computation.

Listing 1. C++ class that computes the root-mean using Newton's Method for the reciprocal square root

#include <cmath>   // exp()

static const double Fs = 48000.0;   // sample rate
static double AvgTime = 0.1;        // averaging time

class RecipRootMean
{
public:
    double Mean;        // m(n), the running mean of the input
    double RecipRoot;   // y_r(n), the reciprocal square root estimate
    double AvgCoeff;    // 1 - a, per Equation 4

    RecipRootMean()
    {
        AvgCoeff = 1.0 - exp( -1.0 / (Fs * AvgTime) );
        Mean = 0.0;
        RecipRoot = 1.0e-10;   // 1 > initial value > 0
    }
    ~RecipRootMean() {}

    double CalcRootMean(double x)
    {
        Mean += AvgCoeff * (x - Mean);                                // Equation 4
        RecipRoot *= 0.5 * ( 3.0 - (RecipRoot * RecipRoot * Mean) );  // Equation 12
        return RecipRoot * Mean;                                      // Equation 13
    }
};

Although Newton's Method for the reciprocal square root eliminates the divide operation, it can be problematic for fixed-point processors. Assuming that m(n) is a positive integer greater than 1, y_r(n) will be a positive number less than one, beyond the range of representation for integer numbers. Implementation must be accomplished using floating-point or mixed integer/fractional number representation.
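A hypothetical usage sketch follows (the helper function is mine, not from the article): because the class returns the square root of the running mean of whatever it is fed, passing squared samples yields an RMS estimate, which a caller might read only once per block for metering.

// Feed squared samples so the running mean is a mean square and the
// returned value is an RMS estimate.
double BlockRms(RecipRootMean &rms, const double *x, int numSamples)
{
    double level = 0.0;
    for (int i = 0; i < numSamples; ++i)
        level = rms.CalcRootMean(x[i] * x[i]);
    return level;   // read once per block, e.g. to update a meter
}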
Using direct square root
Let's go back and take a closer look at Equation 11. Newton's method converges on the solution as quickly as possible without oscillating around it, but if we slow this rate of convergence, the iterative equation will converge on the square root of the average of its inputs. Adding the averaging coefficient results in the following root-mean equation:

y(n) = (1/2)·(1+a)·y(n−1) + (1/2)·(1−a)·x(n)/y(n−1)    (14)

where a is defined by Equation 5. Now y(n) converges to the square root of the average of x(n). An equivalent signal-flow representation of Equation 14 is presented in Figure 3. Here, an additional y(n−1) term is summed so that only one averaging coefficient is required. Note that x(n) and y(n−1) must be greater than zero.

Figure 3. Mean-square computation that combines averaging and the iterative square root (signal flow diagram).

A code listing for a C++ class that implements the computation shown in Figure 3 is presented in Listing 2. As in the previous example, initialization is performed in the class constructor, and each call to CalcRootMean() performs one iteration of averaging/square-root computation.

Listing 2. C++ class that implements the floating-point version of Figure 3

#include <cmath>   // exp()

static const double Fs = 48000.0;   // sample rate
static double AvgTime = 0.1;        // averaging time

class NewtonRootMean
{
public:
    double RootMean;   // y(n), the running root-mean
    double AvgCoeff;   // (1 - a)/2

    NewtonRootMean()
    {
        RootMean = 1.0;   // > 0 or divide will fail
        AvgCoeff = 0.5 * ( 1.0 - exp( -1.0 / (Fs * AvgTime) ) );
    }
    ~NewtonRootMean() {}

    double CalcRootMean(double x)
    {
        RootMean += AvgCoeff * ( ( x / RootMean ) - RootMean );   // Equation 14
        return RootMean;
    }
};

With some care, Figure 3 may also be implemented in fixed-point arithmetic, as shown in Listing 3. In this example, scaling is implemented to ensure valid results. When sufficient word size is present, x is scaled by nAvgCoeff prior to division to maximize the precision of the result.
Listing 3. C++ class that implements the fixed-point version of Figure 3

#include <cmath>   // exp(), pow(), floor()

static const double Fs = 48000.0;   // sample rate
static double AvgTime = 0.1;        // averaging time
static const unsigned int sknNumIntBits = 32;   // # bits in int
static const unsigned int sknPrecisionBits = sknNumIntBits / 2;
static const double skScaleFactor = pow(2.0, (double)sknPrecisionBits);
static const unsigned int sknRoundOffset = (unsigned int)floor( 0.5 * skScaleFactor );

class IntNewtonRootMean
{
public:
    unsigned int nRootMean;         // y(n) as an integer
    unsigned int nScaledRootMean;   // y(n) scaled by 2^sknPrecisionBits
    unsigned int nAvgCoeff;         // (1 - a)/2, scaled by 2^sknPrecisionBits
    unsigned int nMaxVal;           // threshold for choosing the scaling order

    IntNewtonRootMean()
    {
        nRootMean = 1;   // > 0 or divide will fail
        nScaledRootMean = 0;
        double AvgCoeff = 0.5 * ( 1.0 - exp( -1.0 / (Fs * AvgTime) ) );
        nAvgCoeff = (unsigned int)floor( ( skScaleFactor * AvgCoeff ) + 0.5 );
        nMaxVal   = (unsigned int)floor( ( skScaleFactor / AvgCoeff ) + 0.5 );
    }
    ~IntNewtonRootMean() {}

    unsigned int CalcRootMean(unsigned int x)
    {
        if ( x < nMaxVal )
        {
            // scale x by nAvgCoeff before the divide to keep precision
            nScaledRootMean += ( ( nAvgCoeff * x ) / nRootMean ) - ( nAvgCoeff * nRootMean );
        }
        else
        {
            nScaledRootMean += nAvgCoeff * ( ( x / nRootMean ) - nRootMean );
        }
        nRootMean = ( nScaledRootMean + sknRoundOffset ) >> sknPrecisionBits;
        return nRootMean;
    }
};
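A quick sanity check of Listing 3 (a hypothetical test harness, not from the article): with a constant input, the output should settle near the integer square root of that input.

#include <cstdio>

// Drive the fixed-point class with a constant for one second at 48kHz
// and confirm it converges toward sqrt(1,000,000) = 1000.
int main()
{
    IntNewtonRootMean rm;
    unsigned int y = 0;
    for (int i = 0; i < 48000; ++i)
        y = rm.CalcRootMean(1000000u);
    std::printf("root-mean = %u (expected ~1000)\n", y);
    return 0;
}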
Divide-free RMS using normalization
Now we'll look at the special case of computing an RMS value on fixed-point hardware that does not have a fast divide operation, which is typical for low-cost embedded processors. Although many of these processors can perform division, they do so one bit at a time, requiring at least one cycle for each bit of word length. Furthermore, care must be taken to ensure that the RMS calculation is implemented with sufficient numerical precision. With fixed-point hardware, the square of a value requires twice the number of bits to retain the original data's precision. With this in mind, we manipulate Equation 14 into the following:

y(n) = y(n−1) + ( a / [2·y(n−1)] )·[ x(n)² − y(n−1)² ]    (15)

Although the expression x(n)² − y(n−1)² must be calculated with double precision, this implementation lends itself to a significant optimization. Note that a/[2·y(n−1)] acts like a level-dependent averaging coefficient. If a slight time-dependent variance in the averaging time can be tolerated, which is often the case, 1/y(n−1) can be grossly approximated. On a floating-point processor, shifting the averaging coefficient to the left by the negative of the exponent approximates the divide operation. This process is commonly referred to as normalization. Some fixed-point DSPs can perform normalization by counting the leading bits of the accumulator and shifting the accumulator by that number of bits [2]. In both cases, the averaging coefficient will be truncated to the nearest power of two, so the coefficient must be multiplied by 3/2 to round the result. This implementation is shown in Equation 16:

y(n) = y(n−1) + (3a/4)·2^(−int{log₂[y(n−1)]})·[ x(n)² − y(n−1)² ]    (16)

where 2^(−int{log₂[x]}) denotes a left-shift by the number of leading bits in x.

Figure 4 is the signal-flow diagram that represents Equation 16. Just as in Figure 3, x(n) and y(n−1) must be greater than zero.

Figure 4. RMS computation optimized to eliminate the divide operation (signal flow diagram).

A sample code listing that implements Figure 4 is shown in Listing 4. This assembly-language implementation is for the Freescale (formerly Motorola) DSP563xx 24-bit fixed-point processor. Of course, this method can be implemented even without fast normalization. You can implement a loop to shift x(n)² − y(n−1)² to the left for each leading bit in y(n−1). This will be slower but can be implemented with even the simplest of processors.
Listing 4. Freescale DSP563xx assembly implementation of divide-free RMS using normalization

; r4:   address of output bits 24-47 [y_msw(n)]
; r4+1: address of output bits 0-23  [y_lsw(n)]
; x0:   input [x(n)]

FS          equ   48000.0                    ;sampling rate in Hz
AVG_TIME    equ   0.1                        ;averaging time in seconds
AVG_COEFF   equ   @XPN(-1.0/(FS*AVG_TIME))   ;calculate avg_coeff

RMS
        move    #>AVG_COEFF,x1      ;load avg_coeff
        move    y:(r4)+,a           ;get y_msw(n-1)
        move    y:(r4),a0           ;get y_lsw(n-1)
        clb     a,b                 ;b=number of leading bits in y(n-1)
        mpy     x0,x0,a   a,x0      ;a=x(n)^2, x0=y_msw(n-1)
        mac     -x0,x0,a  x0,y1     ;a=x(n)^2-y_msw(n-1)^2, y1=y_msw(n-1)
        normf   b1,a                ;normalize x(n)^2-y_msw(n-1)^2 by y_msw(n-1)
        move    a,x0                ;x0=[x(n)^2-y_msw(n-1)^2]norm(y_msw(n-1))
        mpy     x1,x0,a   y:(r4),y0 ;a=AVG_COEFF*[x(n)^2-y_msw(n-1)^2]norm(y_msw(n-1)), y0=y_lsw(n-1)
        add     y,a                 ;a=y(n-1)+avg_coeff*[x(n)^2-y_msw(n-1)^2]norm(y_msw(n-1))
        move    a0,y:(r4)-          ;save y_lsw(n)
        move    a,y:(r4)            ;save y_msw(n)
        rts

Figure 5. Comparison of RMS calculation methods: true RMS, Newton's-method approximation, no-divide approximation, and reciprocal square-root method (level vs. time in seconds).
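For processors without the DSP563xx's normalization instructions, the same idea can be expressed in portable code. The following sketch is mine, not the article's listing, and it uses floating point purely for clarity: it replaces the divide in Equation 15 with a power-of-two approximation of 1/y(n−1), in the spirit of Equation 16.

#include <cmath>   // floor(), log2(), ldexp()

// Approximate 1/y(n-1) by a pure power of two so no divide is needed.
// coeff is 3a/4: the 3/2 factor rounds the power-of-two truncation, as
// described in the text above.
class NoDivideRms
{
public:
    explicit NoDivideRms(double a)            // a from Equation 5
        : y(1.0), coeff(0.75 * a) {}          // 3a/4; y(0) must be > 0

    double Update(double x)
    {
        int shift = (int)std::floor(std::log2(y));     // power-of-two exponent of y(n-1)
        double recipApprox = std::ldexp(1.0, -shift);  // 2^(-shift), roughly 1/y(n-1)
        y += coeff * recipApprox * (x * x - y * y);    // Equation 16 with the approximation
        return y;
    }

private:
    double y;       // running RMS estimate
    double coeff;   // 3a/4
};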
HIGHER ORDER AVERAGING
Higher order recursive averaging may be accomplished by inserting additional averaging filters before the iterative square root. These filters may simply be one or more cascaded first-order recursive sections. First-order sections have the advantage of producing no overshoot in the step response. In addition, there is only one coefficient to adjust, and quantization effects (primarily of concern for fixed-point implementation) are far less than those of higher-order filters.
The implementer should be aware that cascading first-order sections changes the definition of averaging time. A simple but gross approximation that maintains the earlier definition of step response is to simply divide the averaging time of each first-order section by the total number of sections. However, it is the implementer's responsibility to verify that this approximation is suitable for the application.
Second-order sections may also be used, if you want (for example) a Bessel-Thomson filter response. If second-order sections are used, it's best to choose an odd-order composite response, since the averaging square-root filter approximates the final first-order filter with Q=0.5. Care must be taken to minimize the overshoot of this averaging filter. Adjusting the averaging time of this filter in real time will prove more difficult, since there are a number of coefficients that must be adjusted in unison to ensure stability.
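As an illustration of the cascaded first-order sections described above (a sketch under assumptions, not the article's code; the library square root stands in for any of the iterative updates), two sections with half the averaging time each might look like this:

#include <cmath>    // exp(), sqrt()

// Two cascaded first-order averagers ahead of the square root, with each
// section's averaging time divided by the number of sections, per the
// approximation above.
class TwoStageRootMean
{
public:
    TwoStageRootMean(double sampleRate, double avgTime)
        : m1(0.0), m2(0.0),
          a(std::exp(-2.0 / (sampleRate * avgTime)))   // avgTime/2 per section
    {}

    double Update(double x)
    {
        m1 = x  + a * (m1 - x);    // first first-order section (Equation 4)
        m2 = m1 + a * (m2 - m1);   // second section, fed by the first
        return std::sqrt(m2);      // square root of the higher-order mean
    }

private:
    double m1, m2;   // section states
    double a;        // per-section averaging coefficient
};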
RESULTS
Three methods of calculating the RMS level are compared in Figure 5. The averaging time is set to 100ms, and the input is one second of 1/f noise with a 48kHz sampling frequency.
The first trace is the true RMS value calculated using Equation 6. The second trace is the RMS calculation using Equation 14. The third trace is the no-divide calculation of Equation 16. The fourth trace is the RMS value using the reciprocal square-root method of Equation 13.
For the most part, the four traces line up nicely. All four approximations appear to converge at the same rate as the true RMS value. As expected, the largest deviation from the true RMS value is the approximation of Equation 16. This approximation will have the greatest error during large changes in the level of the input signal, although this error is temporary: the optimized approximation will converge upon the true RMS value when the level of the input signal is constant.
The errors between the three approximations and the true RMS value are shown in Figure 6. The error of the RMS approximation using Equation 14 slowly decreases until it is below 1e–7, which is sufficient for 24-bit accuracy. The optimized approximation of Equation 16 is substantially worse, at about 1e–4, but still good enough for many applications.
The approximation that uses the reciprocal square root is "in the noise," at less than 1e–9. For highly critical floating-point applications, this is the efficient method of choice.
As you would expect, the errors discussed above will be worse with shorter averaging times and better with longer averaging times. Table 1 summarizes the approximate error vs. averaging time of these three methods, along with suitable hardware architecture requirements.

Figure 6. Error comparison of RMS calculation methods (error vs. time for Newton's method, the no-divide method, and the reciprocal-root method).
Table 1. Approximate error vs. averaging time of RMS calculation methods

Calculation method                 t=1ms   t=10ms   t=100ms   t=1 sec   Preferred hardware
Reciprocal root (Figure 2)         1e–6    1e–8     1e–9      1e–10     floating-point
Combined root-mean (Figure 3)      3e–6    3e–7     1e–7      1e–7      hardware divide
Divide-free root-mean (Figure 4)   1e–4    1e–4     1e–4      3e–5      any

SUITABLE FOR AVERAGE READER
By combining recursive averaging with Newton's method for calculating the square root, you'll gain a very efficient method for computing the root-mean. Although the three methods I presented here are developed for different hardware, and each, to some degree, trades off hardware capabilities for error, most of you should find one of these methods suitable for your application.

Brian Neunaber is currently digital systems architect of software and firmware at QSC Audio Products. He has designed real-time digital audio algorithms and systems for QSC and St. Louis Music and has an MSEE from Southern Illinois University. You may contact him at brian_neunaber@qscaudio.com.

REFERENCES:
1. D. G. Zill. Calculus with Analytic Geometry, 2nd ed., PWS-Kent, Boston, pp. 170-176, 1988.
2. Motorola. DSP56300 Family Manual, Rev. 3, Motorola Literature Distribution, Denver, 2000.