Finger Tracking for Piano Playing through Contactless Sensor System: Signal Processing and Data Training using Artificial Neural Network
Choo Chee Wee
Robotics & Intelligent Systems Research Group, Artificial Intelligence Research Unit (AiRU), Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
[email protected]
Muralindran Mariappan
Robotics & Intelligent Systems Research Group, Artificial Intelligence Research Unit (AiRU), Universiti Malaysia Sabah, Kota Kinabalu, Malaysia
[email protected]
Abstract— Various studies have attempted to unveil the technique of virtuoso pianists using technology. These studies employ different types of sensors to capture motion data of piano playing. They share a common problem: the sensors are in direct contact with the pianist, which changes the piano playing experience. Since piano playing involves very delicate interaction between the pianist and the piano, such a change may affect the pianist’s performance. To address the challenges faced by current technologies, a non-intrusive, long-range capacitive sensor is developed. The sensor employs the RC oscillator method, in which the change in capacitance is recorded as a number of pulses. In this work, a prototype sensor is developed to sense different finger positions on five of the piano's 88 keys. To validate the design, input data with known output positions were collected and fed into an artificial neural network for training. The output of the neural network is shown in regression plots, where the overall correlation coefficient is R = 0.96747. The fit and the accuracy are reasonably good for the data set. The output data represent the location of the fingers on the piano keyboard with approximately 10 mm deviation.
Keywords—piano; pedagogy; capacitive sensor; artificial neural network
I. INTRODUCTION
Tracking finger position on the piano has long been an important part of piano learning. The Theremin was one of the first instruments to fuse electronics with a musical instrument [1]; it uses capacitive sensing and was the first noncontact interface for an electronic musical instrument. Besides capacitive sensing, various other sensing methods have been used for electronic musical instruments [2]. Ultrasonic sensors, for example, have been used for musical motion detection [3] [4]. However, sonar sensors have several drawbacks: they cannot sense through obstacles, their resolution is limited, their sensing beam is narrow, and the beam propagates slowly. Optical sensors are another option, for example a light emitting diode (LED) paired with a photodiode [5], or similar photosensor arrays [6]. Optical methods are limited in application because they need a line of sight, and their errors grow with changes in reflectance and with background light. Another common approach is camera tracking with various image processing techniques [7] [8]. Digital image processing is still affected by changes in the environment and consumes excessive computational resources, so it needs more efficient algorithms and more powerful processors [1]. Occasionally, less common sensing methods such as radar and microwave motion sensors are used in musical instruments [9]. Although these sensors can sense through nonconductive obstacles, they are costly to develop, involve hardware complications, and raise concerns about human exposure to electromagnetic radiation. This area requires further development to become feasible.
The Theremin mentioned above is played with two hands without touching any part of the instrument [10]. This is possible because of several advantages of capacitive sensors. Compared with the methods above, a capacitive sensor generates an electric field that propagates through nonconductive obstacles, which lets it detect conductive material behind them. It does not carry the radiation concern of microwave sensing, and its power consumption is very low. The human body consists mainly of ionized water, which gives it specific electrical properties such as conductivity. By detecting changes in the field parameters and processing them with suitable algorithms, various human activities can be recognized. Current technologies thus indicate that sensing the finger position of a pianist is important.
The main challenge is that the sensor should obtain data without touching the pianist, since contact changes the pianist's playing experience. Hence, this research aims to develop a sensor that senses the finger position of a pianist without altering the playing experience. This paper covers three main areas: first, a brief description of the sensor hardware design; next, an in-depth discussion of the signal processing and data training methods, including the parameter values used; and lastly, a discussion of the outcome of the system.
II. HARDWARE DESIGN
A Schmitt-trigger oscillator is used as the RC oscillator in this experiment. One of the main reasons for choosing the Schmitt-trigger method is that the pulse can be triggered at low current. Since a high-frequency oscillator is generally harder to work with, a high resistance value is selected to lower the frequency; as a result, the current flow is low. By definition, a Schmitt trigger is a comparator circuit that implements hysteresis, that is, two different threshold voltage levels for the rising and falling edges. It is commonly built from a comparator or differential amplifier with positive feedback applied to the noninverting input. A Schmitt trigger is an active circuit that receives analogue signals as input and converts them into digital output signals. The pulse frequency of the digital output varies with the capacitance, as shown in Fig. 1, where C3 simulates the electrode of the system and C2 simulates the stray capacitance generated by the Schmitt trigger integrated circuit.
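The dependence of pulse frequency on capacitance can be illustrated with a minimal sketch. It assumes an idealised inverting Schmitt trigger with a rail-to-rail output and assumes that the electrode and stray capacitances simply add in parallel; neither assumption is confirmed by the paper. All component values, the threshold voltages, and the function names are hypothetical.

```python
import math


def schmitt_rc_frequency(r_ohm, c_farad, vcc, vt_low, vt_high):
    """Frequency of an idealised Schmitt-trigger RC oscillator.

    The capacitor charges from vt_low towards vcc until it reaches vt_high,
    then discharges from vt_high towards 0 V until it reaches vt_low.
    """
    t_charge = r_ohm * c_farad * math.log((vcc - vt_low) / (vcc - vt_high))
    t_discharge = r_ohm * c_farad * math.log(vt_high / vt_low)
    return 1.0 / (t_charge + t_discharge)


def pulses_in_window(freq_hz, window_s):
    """Number of whole pulses counted during a fixed gate window."""
    return int(freq_hz * window_s)


# Hypothetical values, chosen only for illustration.
R = 1e6                 # timing resistor (ohms); a high value keeps frequency low
C_STRAY = 10e-12        # stray capacitance (farads)
C_ELECTRODE = 5e-12     # electrode capacitance with no finger nearby (farads)
C_FINGER = 1e-12        # extra capacitance when a finger approaches (farads)
VCC, VT_LOW, VT_HIGH = 5.0, 1.8, 3.0

f_idle = schmitt_rc_frequency(R, C_STRAY + C_ELECTRODE, VCC, VT_LOW, VT_HIGH)
f_near = schmitt_rc_frequency(
    R, C_STRAY + C_ELECTRODE + C_FINGER, VCC, VT_LOW, VT_HIGH
)
count_idle = pulses_in_window(f_idle, 0.01)
count_near = pulses_in_window(f_near, 0.01)
```

Under these assumptions, a larger capacitance lengthens the period, so fewer pulses are counted in the same gate window when a finger is near the electrode.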
Fig. 3. Grid covered piano keys.
The electrode consists of a conductive metal plate. The surface area of the electrode significantly affects the sensing range, because the capacitance C is governed by
$$C = \frac{\varepsilon A}{d} \tag{1}$$
where $\varepsilon$ is the permittivity of the material between the two conducting plates, A is the overlapping area of the plates, and d is the distance between them. In this application, the permittivity of the piano keys and the air cannot be changed.
III. SIGNAL PROCESSING AND DATA TRAINING
Fig. 1. Schmitt triggered oscillator.
The RC oscillator is built on a printed circuit board (PCB). Six electrodes, together covering the area of five piano keys, are connected to the oscillator input and placed under the keyboard area of the piano, as shown in Fig. 2.
The signals collected through the electrodes are transferred serially to a computer for further processing, as shown in Fig. 4. First, the data are filtered with a band-stop filter to remove environmental noise. A median filter then removes spikes, and a moving average (mean) filter smooths the signal. Next, a threshold value separates the data into key-pressed and non-key-pressed instances. After preprocessing, the data are fed into an artificial neural network for training. Lastly, the trained neural network is validated using a different set of input signals.
Fig. 2. Six electrodes on PCB.
The five piano keys are covered with grids that indicate the location at which a key is pressed, as shown in Fig. 3. When the user plays at any position on a key, the change in capacitance produced by the finger's proximity is recorded at all six electrodes.
Fig. 4. Signal processing and data training.
A. Band-Stop Filter
The acquired signals are first filtered with a band-stop filter to remove environmental noise. The filter is designed to remove the 50 Hz noise caused mainly by the power supply. A Chebyshev band-stop filter is selected because its steeper roll-off removes this specific noise efficiently. The stop frequency is Fstop = 47 Hz and the pass frequency is Fpass = 53 Hz. Fig. 5 shows the simulated magnitude response of the band-stop filter.
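The effect of rejecting a narrow band around 50 Hz can be sketched in a few lines. Note that this is not the Chebyshev design described above: for brevity it swaps in a simple second-order IIR notch (biquad) centred at 50 Hz with a 6 Hz bandwidth, which approximates the 47 Hz to 53 Hz band. The sampling rate is not stated in the paper, so the 500 Hz used here is a hypothetical value, as are the function names.

```python
import math


def notch_coefficients(f0_hz, bandwidth_hz, fs_hz):
    """Second-order IIR notch (biquad) coefficients, normalised so a[0] = 1."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    q = f0_hz / bandwidth_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def biquad_filter(samples, b, a):
    """Apply the biquad to a sequence of samples in direct form I."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x0 in samples:
        y0 = b[0] * x0 + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y0)
        x2, x1 = x1, x0
        y2, y1 = y1, y0
    return out


FS = 500.0  # hypothetical sampling rate in Hz (not given in the paper)
b, a = notch_coefficients(50.0, 6.0, FS)

# A 50 Hz interference tone and a slow 5 Hz component standing in for the signal.
mains = [math.sin(2.0 * math.pi * 50.0 * n / FS) for n in range(2000)]
slow = [math.sin(2.0 * math.pi * 5.0 * n / FS) for n in range(2000)]

mains_out = biquad_filter(mains, b, a)
slow_out = biquad_filter(slow, b, a)
```

After the start-up transient has decayed, the 50 Hz tone is strongly attenuated while the slow component passes almost unchanged. A higher-order Chebyshev design would give the steeper band edges that the authors chose it for.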
Fig. 8. Input with spikes.
A median filter replaces each sample with the median of its neighboring samples. A one-dimensional, third-order median filter is used to remove the spikes, as shown in Fig. 9.
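A third-order median filter can be sketched as below. The handling of the first and last samples is not described in the paper; this sketch assumes the edge values are repeated, and the function name is hypothetical.

```python
from statistics import median


def median_filter(samples, window=3):
    """Replace each sample with the median of a window centred on it.

    Edges are padded by repeating the first and last sample (an assumption).
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    if not samples:
        return []
    half = window // 2
    padded = [samples[0]] * half + list(samples) + [samples[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(samples))]


# A single-sample spike is removed, while the surrounding level is preserved.
despiked = median_filter([10, 10, 90, 10, 10])
```

A window of three removes isolated single-sample spikes; wider spikes would need a longer window.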
The band-stop filter is then tested with input data from the electrode. Fig. 6 shows the Fast Fourier Transform (FFT) of the raw input data, in which significant noise is visible around 50 Hz.
Fig. 5. Magnitude response of the band-stop filter.
Fig. 6. FFT of raw input data.
Fig. 7 shows the FFT of the input data after the band-stop filter. The 50 Hz components are removed from the input.
Fig. 7. FFT of filtered data.
B. Median Filter
Since the piano consists of various moving parts, vibrations change the capacitance, especially when a user hits a key. Fig. 8 shows the raw input from an electrode, in which irregular spikes caused by noise can be observed.
Fig. 9. Input after median filter.
C. Moving Average Filter
Fig. 9 shows a fluctuating signal with no distinct pattern. A moving average filter is used to smooth the signal. The filter length is 40, that is, the filter averages every 40 consecutive samples of the waveform. Fig. 10 shows the result of applying the moving average filter directly after the median filter. After smoothing, a distinct change is visible in one part of the waveform.
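A length-40 moving average can be sketched with a running sum. The paper does not say how the start of the record is treated; this sketch assumes that only complete windows are averaged, so the output is shorter than the input by 39 samples. The function name is hypothetical.

```python
def moving_average(samples, length=40):
    """Average every `length` consecutive samples using a running sum.

    Only complete windows are used (an assumption), so the result has
    len(samples) - length + 1 entries.
    """
    if length <= 0 or len(samples) < length:
        return []
    window_sum = sum(samples[:length])
    out = [window_sum / length]
    for i in range(length, len(samples)):
        # Slide the window: add the newest sample, drop the oldest.
        window_sum += samples[i] - samples[i - length]
        out.append(window_sum / length)
    return out


flat = moving_average([5.0] * 100)          # default length of 40
short = moving_average([1, 2, 3, 4], length=2)
```

The running sum keeps the cost per output sample constant regardless of the window length.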
Fig. 10. Input after moving average filter.
D. Thresholding
When a finger hits a key, the electrode below senses a sudden drop and then a rise in capacitance as the finger approaches the electrode, as in the example shown in Fig. 10. A series of experiments was conducted to determine the threshold for the percentage change when a key is pressed. As mentioned previously, six electrodes are installed under five piano keys. The closer the finger is to an electrode, the larger the change that electrode detects compared with its neighbors. The input signals from all six electrodes are filtered with the filters described above, as shown in Fig. 11. The percentage change is computed for every entry of every signal. The instance with the highest percentage change that also exceeds the threshold is taken as the reference point. Taking Fig. 11 as an example, the fourth electrode shows the highest percentage change around instance 50, and the data from every electrode at that instance are recorded as input data.
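The reference-point selection described above can be sketched as follows. Several details are assumptions, since the paper does not specify them: the baseline for the percentage change is taken as the mean of the first few samples of each channel, the recorded data are the percentage changes themselves, and the threshold value is a free parameter. Function and parameter names are hypothetical.

```python
def percent_change(channel, baseline_len=10):
    """Absolute percentage change of each sample relative to a baseline.

    The baseline is the mean of the first `baseline_len` samples (an
    assumption) and must be non-zero, as a pulse count would be.
    """
    head = channel[:baseline_len]
    baseline = sum(head) / len(head)
    return [abs(v - baseline) / baseline * 100.0 for v in channel]


def extract_reference(channels, threshold_pct, baseline_len=10):
    """Find the instance with the largest percentage change over all electrodes.

    Returns (instance, electrode, features), where features holds the
    percentage change of every electrode at that instance, or None when
    no change exceeds the threshold (no key press detected).
    """
    changes = [percent_change(ch, baseline_len) for ch in channels]
    peak, electrode, instance = max(
        (value, e, i)
        for e, ch in enumerate(changes)
        for i, value in enumerate(ch)
    )
    if peak <= threshold_pct:
        return None
    return instance, electrode, [ch[instance] for ch in changes]


# Synthetic example: six flat channels with a bump at instance 50,
# largest on the fourth electrode (index 3).
demo = [[1000.0] * 100 for _ in range(6)]
for ch in demo:
    ch[50] = 1020.0
demo[3][50] = 1100.0
reference = extract_reference(demo, threshold_pct=5.0)
```

In this synthetic case the fourth electrode determines the reference instance, and the six values at that instance form one input vector for the neural network.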
For each output, six inputs from the electrodes are fed to the neural network with 50 hidden layers. Two values, the X and Y positions, are produced at the output, as shown in Fig. 13. Around 1000 sets of input data were collected across the five keys.
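The input and output dimensions of the network can be illustrated with a forward pass. This sketch reads the "50 hidden layers" as a single hidden layer of 50 neurons, which is an interpretation and not confirmed by the paper. The tanh hidden activation, the linear output layer, and the random weights are also assumptions; a trained network would use weights obtained from training.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random weights and biases for one fully connected layer."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
    return weights, biases


def dense(inputs, weights, biases):
    """Weighted sum plus bias for every neuron in the layer."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def forward(features, hidden, output):
    """Six electrode features in, (X, Y) position estimate out."""
    activations = [math.tanh(v) for v in dense(features, *hidden)]
    return dense(activations, *output)


rng = random.Random(0)               # fixed seed for repeatability
hidden = init_layer(6, 50, rng)      # 6 inputs -> 50 hidden neurons (assumed)
output = init_layer(50, 2, rng)      # 50 hidden neurons -> X and Y
position = forward([0.1] * 6, hidden, output)
```

With untrained weights the output is meaningless; the sketch only shows how six electrode values map to two coordinates.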
Fig. 13. Network Diagram.
The output of the neural network is presented in linear regression plots. The regression plots show the network outputs against the desired outputs for the training, validation, and test sets. If the data fall along the 45-degree line, the fit is perfect; in other words, the network output equals the desired output for every sample in that set. As shown in Fig. 14, the fit is reasonably good for all the data sets, with the correlation coefficient R at about 0.95 or above in all cases.
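The R value reported in such regression plots can be reproduced with a short sketch. It assumes that R is the Pearson correlation coefficient between network outputs and target values, which is the usual convention for regression plots of this kind; the function name is hypothetical.

```python
import math


def pearson_r(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    return cov / math.sqrt(var_p * var_a)


# Outputs that lie exactly on a straight line through the targets give R = 1.
r_perfect = pearson_r([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Squaring R gives the coefficient of determination of the linear fit, so an R of 0.95 corresponds to an R² of about 0.90.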
Fig. 11. Input after moving average filter.
E. Artificial Neural Network
The Levenberg-Marquardt backpropagation algorithm is selected; it is a neural network training function that updates weight and bias values according to Levenberg-Marquardt optimization. It is often the fastest training algorithm and is recommended for general problem solving. One drawback is that it requires more memory than other algorithms [11]. The preprocessed input is fed to the neural network training algorithm, where the expected output is the (X, Y) location in centimeters, as shown in Fig. 12. The user places a finger on a specific point on the grid; that location is recorded as the output for the neural network, and the input is recorded through the electrodes.
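For reference, the Levenberg-Marquardt weight update is commonly written in the form below. The symbols are not defined in the paper and are introduced here only for illustration: $\mathbf{w}_k$ is the vector of weights and biases at iteration $k$, $\mathbf{J}$ is the Jacobian of the network errors with respect to the weights, $\mathbf{e}$ is the vector of network errors, and $\mu$ is a damping factor adjusted during training.

```latex
\mathbf{w}_{k+1} = \mathbf{w}_k - \left(\mathbf{J}^{\top}\mathbf{J} + \mu\,\mathbf{I}\right)^{-1}\mathbf{J}^{\top}\mathbf{e}
```

Because $\mathbf{J}$ has one row per error term and one column per weight, storing it and forming $\mathbf{J}^{\top}\mathbf{J}$ accounts for the higher memory requirement noted above.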
Fig. 14. Plot for Neural Network Regression.
IV. RESULTS AND DISCUSSION
An independent set of 500 input data was used to test the neural network. The output is shown in Fig. 15, where the observed X and Y outputs are compared with the actual values. The deviation is obtained by computing the straight-line distance between the observed location and the actual location. The average error is about 0.99 cm from the actual value.
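The deviation measure described above, the mean straight-line distance between observed and actual locations, can be sketched as follows. The function name and the sample coordinates are hypothetical.

```python
import math


def mean_position_error(predicted, actual):
    """Mean Euclidean distance between predicted and actual (X, Y) points."""
    distances = [
        math.hypot(px - ax, py - ay)
        for (px, py), (ax, ay) in zip(predicted, actual)
    ]
    return sum(distances) / len(distances)


# Two hypothetical test points, in centimeters: one is off by 5 cm, one is exact.
error_cm = mean_position_error(
    [(3.0, 4.0), (0.0, 0.0)],
    [(0.0, 0.0), (0.0, 0.0)],
)
```

Applied to the 500 test samples, this is the quantity the paper reports as about 0.99 cm.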
Fig. 12. Output in X-Y coordinates.
Fig. 15. Results in X-Y coordinates.
Fig. 16 shows the regression plot of the test output, where the correlation coefficient R = 0.92004. In other words, R² ≈ 0.85, so a linear fit between the network output and the actual values explains about 85% of the variance.
Fig. 16. Output Regression plot.
V. CONCLUSION
In this study, the design of a capacitive sensor that can detect a remote capacitance change is briefly discussed, and the signal processing method and data training for the input signal are presented. The output of the neural network is shown in regression plots, where the overall correlation coefficient is R = 0.96747. A further 500 sets of independent input data were used to test the neural network, and the output shows R = 0.92004 (R² ≈ 0.85), indicating strong agreement between the output and the actual values. The fit and the accuracy are reasonably good for all data sets, with the output showing approximately 10 mm deviation in (X, Y) coordinates. This work enables pianists to store their piano playing information in digital form.
ACKNOWLEDGMENT
The authors wish to thank Mr. Tan Tze Tong for his feedback and deep insight drawn from his 45 years of teaching. Most importantly, the authors would like to express sincere appreciation to the Artificial Intelligence Research Unit (AiRU), Universiti Malaysia Sabah, for providing the funding to support this research.
REFERENCES
[1] J. A. Paradiso and N. Gershenfeld, "Musical Applications of Electric Field Sensing," Computer Music Journal, vol. 21, no. 2, pp. 69-89, 1997.
[2] C. Roads, The Computer Music Tutorial, Cambridge: MIT Press, 1996, pp. 617-658.
[3] X. Chabot, "Gesture Interfaces and a Software Toolkit for Performance with Electronics," Computer Music Journal, vol. 14, no. 2, pp. 15-27, 1990.
[4] R. Gehlhaar, "SOUND=SPACE: an Interactive Musical Environment," Contemporary Music Review, vol. 6, no. 1, pp. 59-72, 1991.
[5] R. Rich, "Buchla Lightning MIDI Controller," Electronic Musician, vol. 7, no. 10, pp. 102-108, Oct. 1991.
[6] D. Rubine and P. McAvinney, "Programmable Finger-tracking Instrument Controllers," Computer Music Journal, vol. 14, no. 1, pp. 26-41, 1990.
[7] D. Collinge and S. Parkinson, "The Oculus Ranae," ICMC Proceedings, pp. 15-19, 1988.
[8] C. Wren, F. Sparacino, et al., "Perceptive Spaces for Performance and Entertainment: Untethered Interaction using Computer Vision and Audition," Applied Artificial Intelligence (AAI) Journal, Special Issue on Entertainment and AI/ALife, March 1996.
[9] S. Mann, "DopplerDanse: Some Novel Applications of Radar," Leonardo, vol. 25, no. 1, p. 91, 1992.
[10] A. Glinsky, Theremin: Ether Music and Espionage, University of Illinois Press, p. 403, 2000.
[11] "Levenberg-Marquardt backpropagation," MathWorks, 2006. [Online]. Available: https://www.mathworks.com/help/nnet/ref/trainlm.html.