INDIAN INSTITUTE OF TECHNOLOGY MANDI MANDI- 175 001 (H.P.), INDIA www.iitmandi.ac.in
PROGRESS REPORT FOR THE ACADEMIC YEAR 2018
Scholar’s Name: Praveen Kumar
Roll No: S17007
School: School of Computing and Electrical Engineering
Date of Registration: 01/02/2018
Date of Last Presentation:
Date of Current Presentation: 25/02/2019
1. Research Objective
To develop a low-cost, ultra-low-power Landslide Monitoring and Early Warning System based on MEMS and IoT.
To predict landslides and soil movement from real-world landslide data with the help of machine learning algorithms.
2. Introduction
Landslides caused by the movement of soil mass are a major problem in India, especially in Himachal Pradesh and Uttarakhand [1]. Landslides are natural hazards that often occur without warning and cause massive damage to life and property across the world. With 11,000 deaths in the last 12 years, India tops the world in landslide deaths. According to the Geological Survey of India (GSI), 12 landslides were reported in India in 2017, and the GSI has listed 23 events up to August 2018 [2].
One way to address the landslide problem is to use early warning systems. Existing commercially available early warning systems use sensors such as vibrating wire piezometers and in-place inclinometers (IPI). These systems are installed to determine the magnitude, rate, direction and type of landslides [3]. However, these sensors are very costly, which makes it difficult to install many of them for monitoring landslides. One solution is a low-cost landslide monitoring and early warning system that works on the same principle as a conventional system but has a very low cost and low power consumption.
Furthermore, machine learning algorithms can help in the prediction of landslides. The focus of machine learning in landslide mitigation is to predict soil movements in time so that lives can be saved. Some researchers have applied machine learning to landslides. For example, Hao et al. [4] decomposed landslide displacement into cyclic and trend terms and used the periodicity of the time series to analyse the cyclic terms. Building on the work of Hao, Dujuan et al. [5] used a Back Propagation (BP) neural network to predict landslide displacement. Qiang and Duan [6] proposed a time series analysis capable of forecasting the development trend of complex systems and used it to establish an ARIMA model and a CAR model for dynamic forecasts of landslide displacement. Yesilnacar et al. [7] combined logistic regression and neural networks to overcome the shortcomings of statistical methods that could not effectively model complex geological disasters. Xiangenjun [8] used rough sets to extract the inherent laws of slope disaster activity from historical slope data. Daifuchu [9] focused on spatial prediction of natural landslides in Hong Kong, applied two-class and single-class support vector machines to the spatial prediction of landslide hazard, and compared them with logistic regression models [10].
Machine-learning algorithms could use the data collected by monitoring systems and allow researchers to predict significant debris flow in advance. A large class of algorithms can be used for such predictions, but time-series forecasting algorithms are particularly important. Here, machine-learning algorithms such as SMO [11] and Autoregression [12], ensemble algorithms (Random forest [13], bagging [14], stacking [15], and voting [16]), and SARIMA [17] could be used for time-series forecasting to predict movement one week ahead given the soil displacement of the previous weeks. SMO optimizes the training of a support vector machine. Autoregression is mostly used for prediction and for finding cause-and-effect relationships between variables. Random forest aggregates the predictions of a collection of random trees. Bagging draws subsamples from the dataset with replacement and trains the predictive model on those subsamples. Stacking combines multiple models of different types, and voting combines classifiers that use distinct pattern representations. In this research, we use these machine-learning algorithms for time-series forecasting of debris flow on real-world data.
3. Work Done and Target Set for Last Year
A landslide monitoring and warning system (LMWS) with real-time reporting was engineered for an active landslide site.
System Architecture
Figures 1A and 1B show the deployment architecture and the system architecture of the LMWS, respectively. As shown in Figure 1A, the solar-powered LMWS is deployed on hills prone to landslides, and the data from this system is used to trigger alerts on mobile phones and the web. As shown in Figure 1B, the LMWS consists of a sensing unit, a data logging and thresholding unit, and an alert generation unit. These units work together to sense movement and weather data at a landslide site and log this data at a remote site on the cloud. Thresholding is used to generate alerts from the system on phones and the web.
Figure 1. (A) The deployment architecture of the landslide monitoring and warning system (LMWS). (B) The system architecture of the LMWS.
Figure 2. Cloud data display
Figure 2 shows the data from the LMWS captured in a cloud database at a remote site. The data is logged every 10 minutes and contains weather parameters (temperature, pressure, relative humidity, light intensity, and rainfall) and soil parameters (soil movement, volumetric soil moisture, and the force exerted by the soil at the point of deployment).
Sensors
The LMWS contains several sensors for measuring weather and soil parameters. These sensors include the following:
1. Motion Processing Unit: The MPU sensor combines a MEMS-based accelerometer and a gyroscope in a single chip. The accelerometer measures the acceleration of the debris flow in the X, Y and Z directions, and the gyroscope data is used to find the total rotation of the soil movement in the X, Y and Z directions. The sensor can measure acceleration up to 16 g and rotation up to 2000 degrees per second.
2. Soil Moisture: The soil moisture sensor works on the capacitive principle. When moisture enters the space between the two plates, the dielectric property changes and the capacitance of the plates increases. For most types of slope failure, soil moisture plays a critical role because increased pore water pressure reduces the soil strength and increases stress [18]. The sensor measures soil moisture between 0 and 100 percent.
3. Force Sensor: Water pressure reduces the shearing force between particles. The zone of soil below the water table is fully saturated, and the pressure in its pores is higher than atmospheric pressure; this is called positive pore pressure. The force sensor is a variable resistor. With no force on the surface of the sensor, its resistance is very high. When force is applied to the sensor's surface, the resistance decreases. We build a voltage divider from the force sensor and a 10 kΩ resistor and apply a fixed voltage across the divider. When force is applied to the surface of the sensor, the voltage across the sensor drops accordingly. We calibrate this voltage to force in newtons. When the sensor is buried in the soil, pore water pressure in the wet soil causes additional force on the sensor, which is read by the microcontroller.
4. DHT-22: This sensor measures air temperature in degrees centigrade and the relative humidity of the air. A humidity sensor (or hygrometer) senses, measures and reports both moisture and air temperature. Relative humidity is the ratio of the moisture in the air to the highest amount of moisture the air can hold at a particular temperature.
5. BMP-180: This sensor measures barometric pressure in millibars (mb). Barometric pressure (also known as atmospheric pressure) is the pressure caused by the weight of air pressing down on the Earth. Imagine a column of air rising from the Earth's surface to the top of the atmosphere. The air in the atmosphere has mass, so gravity causes the weight of that column to exert pressure on the surface. Decreasing barometric pressure brings cloudy and windy weather with a chance of precipitation.
6. BH1750 Sensor: This light sensor measures sunlight intensity in lux. If sunlight is low after a rainfall event, moisture does not evaporate from the soil, and the soil stays wet for a long time, which can trigger a landslide. Measuring the light intensity is therefore important.
7. Rain Gauge: The tipping bucket rain gauge collects rain in a collector. A seesaw below the collector holds up to 2.25 ml of water before tipping. Each time the seesaw tips, the magnetic switch in the sensor counts the tip. By counting the tips, one can compute the volume of rain water as well as the amount of rain in mm.
8. LoRa: The LoRa (Long Range) radio module operates at a radio frequency of 433 MHz. It can transmit data up to 10 kilometres, but in hilly regions it is able to transmit data up to 1 kilometre. This module is used to signal a wireless blinker or hooter.
Table 1 compares conventional sensors with low-cost sensors. The conventional sensors offer high accuracy and sensitivity, which makes them far more expensive than low-cost sensors. The low-cost sensors provide a reasonable compromise between accuracy, sensitivity, and cost.
Table 1. Conventional sensors vs. low-cost sensors.
Sensor | Parameter | Conventional Sensors | Low Cost Sensors
Inclinometer | Sensor | MEMS type, uni-axial/bi-axial | MEMS type, tri-axis
Inclinometer | Range | ±3/5/10 and 15º | 0 - 35º
Inclinometer | Accuracy | ±0.05% FS | ±3.0%
Inclinometer | Resolution | 0.008% FS; repeatability ±0.01% FS | Sensitivity changes ±0.02% FS
Inclinometer | Source | http://www.aimil.com/RESOURCES/RESOURCEFILE/542_SMI.pdf | https://store.invensense.com/datasheets/invensense/MPU-6050_DataSheet_V3%204.pdf
Barometric Pressure Sensor | Pressure range | 500 to 1200 mbar | 300 to 1100 mbar
Barometric Pressure Sensor | Operating temperature | -40 to +85 ºC | -40 to +85 ºC
Barometric Pressure Sensor | Accuracy | ±0.4 mbar | Relative accuracy ±0.12 mbar
Barometric Pressure Sensor | Current | 0.6 µA (standby ≤ 0.1 µA at 25 °C) | Standby current 0-4 µA at 25 ºC
Barometric Pressure Sensor | Resolution | 0.00111% FS | 0.01 mbar
Barometric Pressure Sensor | Power dissipation | 100 mW | 260 mW
Barometric Pressure Sensor | Source | https://aerospace.honeywell.com/en/~/media/aerospace/files/user-manual/hpbhpa-usermanual.pdf | https://cdn-shop.adafruit.com/datasheets/BSTBMP180-DS000-09.pdf
Light Sensor | Voltage | Collector-emitter voltage 6 V; emitter-collector voltage 1.5 V | Operating voltage 2.4 - 3.6 V
Light Sensor | Spectral range | Range of spectral bandwidth (λ0.5) 440 to 800 nm | Matched to the human eye (400 - 700 nm)
Light Sensor | Wavelength of peak sensitivity | 570 nm | 560 nm
Light Sensor | Source | http://www.mouser.com/ds/2/427/tept5700247497.pdf | https://www.mouser.com/ds/2/348/bh1750fvi-e186247.pdf
Temperature/Humidity Sensors | Power supply | 2 x AAA batteries (~3 V); 2.3 V to 5.5 V (temperature sensor only) | 3.3 - 6 V DC
Temperature/Humidity Sensors | Operating range, humidity | 0-100% RH | 0-100% RH
Temperature/Humidity Sensors | Operating range, temperature | -40 to 125 °C | -40 to 80 °C
Temperature/Humidity Sensors | Accuracy, humidity | ±3% | ±2% RH (max ±5% RH)
Temperature/Humidity Sensors | Accuracy, temperature | 1% | < ±0.5 °C
Temperature/Humidity Sensors | Source | http://wisense.in/wpcontent/uploads/2018/10/WiSense-Ambient-Humidity-and-Pressure-Sensor.pdf | https://www.sparkfun.com/datasheets/Sensors/Temperature/DHT22.pdf
Soil Pressure Sensor | Range | 0.5 - 15.0 MPa | 0.1 - 77.4 MPa
Soil Pressure Sensor | Repeatability | Not Available | ±2.0%
Soil Pressure Sensor | Accuracy | ±0.5% FS | ±10.0%
Soil Pressure Sensor | Operating Range | -10 ºC to +70 ºC | -30 ºC to +70 ºC
Soil Pressure Sensor | Sensitivity | 0.5 MPa | 0.15 MPa
Soil Pressure Sensor | Hysteresis | Not Available | +10%
Soil Pressure Sensor | Source | http://www.aimil.com/RESOURCES/RESOURCEFILE/542_SMI.pdf | http://www.trossenrobotics.com/productdocs/201010-26-datasheet-fsr402-layout2.pdf
Soil Moisture Sensor | Range | 0 - 100% | 0 - 100%
Soil Moisture Sensor | Accuracy | ±0.20% | ±5.0%
Soil Moisture Sensor | Sensitivity | 0.001% | 0.15%
Soil Moisture Sensor | Life Expectancy | 10 - 20 years | 2 - 5 years
Soil Moisture Sensor | Environmental conditions | Not affected by ambient environmental temperature conditions. | Affected by ambient environmental temperature conditions.
Soil Moisture Sensor | Operating Voltage | 220 VAC | 3.3 - 5.5 VDC
Soil Moisture Sensor | Source | http://commercialequipments.in/wpcontent/uploads/2016/06/Soil.pdf | https://scholar.google.co.in/scholar?hl=en&as_sdt=0%2C5&q=dfrobot+soil+capacitive+moisture+sensor+SEN0193&btnG=
FS: Full Scale
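The force-sensor voltage divider and the tipping-bucket rain gauge described in the sensor list both come down to short conversions. The sketch below is illustrative only: the 5 V supply and the collector area are assumptions that the report does not state, the function names are hypothetical, and the calibration from resistance to newtons is left out because the report does not give it.

```python
def fsr_resistance_ohm(v_sensor, v_supply=5.0, r_fixed_ohm=10_000.0):
    """Force-sensor resistance from the voltage measured across the sensor.

    Voltage divider: v_sensor = v_supply * R_fsr / (R_fsr + r_fixed_ohm).
    v_supply is an assumed value; the 10 kOhm resistor is from the report.
    """
    if not 0.0 <= v_sensor < v_supply:
        raise ValueError("v_sensor must lie in [0, v_supply)")
    return r_fixed_ohm * v_sensor / (v_supply - v_sensor)


def rainfall_from_tips(tips, ml_per_tip=2.25, collector_area_cm2=50.0):
    """Return (volume in ml, rain depth in mm) for a number of bucket tips.

    ml_per_tip is from the report; collector_area_cm2 is a placeholder that
    must be replaced by the real collector opening of the gauge.
    """
    volume_ml = tips * ml_per_tip                # 1 ml equals 1 cm^3
    depth_cm = volume_ml / collector_area_cm2    # volume spread over the opening
    return volume_ml, depth_cm * 10.0            # cm to mm
```

A lower voltage across the sensor maps to a lower resistance, which matches the description that the voltage drops as force is applied.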
Working
The LEWS works in a master-slave configuration. The working of the system is described in the following flow chart.
Figure 3. Flow chart for working of LMEWS
As shown in Figure 3, the master initialises all sensors (accelerometer, soil moisture, DHT, BMP, force and light intensity) and sets the timer for 10 minutes. The master then sleeps in SLEEP_MODE_PWR_DOWN for 10 minutes to save power. In sleep mode the master consumes 5 mA, and in operational mode it consumes 200 mA. After waking up, the master checks whether the 10 minutes are over. If they are, the master first resets the slave system. It then records the values of the accelerometer and the weather sensors and sends these values to the cloud via a GSM module.
If the master is interrupted during sleep, it wakes up and handles the interrupt. The system has two types of interrupt: a rain interrupt and an accelerometer (movement) interrupt. When an accelerometer interrupt is triggered, the master calculates the acceleration values in the X, Y, and Z directions. It also calculates the total rotation in radians per second. If this total rotation breaches the predefined threshold, the master sends an alert to the slave system, and the accelerometer data is sent to the cloud.
The slave system initialises the LoRa module and sets its timer for 11 minutes. If for some reason the master system hangs, the slave system resets the master when the 11-minute timer is over. The slave system receives the accelerometer interrupt from the master system and transmits the alert message to the traffic light and the siren that are connected wirelessly to the LEWS.
The cloud system records the data sent by the LEWS every 10 minutes. When data arrives in the cloud, a Z-score is calculated from the previously stored values. If the Z-score of the total rotation received from the system breaches the threshold, SMS alert messages are sent immediately to the registered mobile numbers.
Deployment: We deployed LEWSs at 10 selected sites in Mandi District. After installing a LEWS on the hill at each site, we placed two poles with a traffic light and a siren alongside the road, with wireless connectivity, to alert vehicular traffic. The MEMS-based accelerometer senses the soil movement (acceleration). The accelerometer measures acceleration (the rate of change of velocity of an object) in three orthogonal directions X, Y and Z. When interfaced with a microcontroller, this sensor provides analogue acceleration values, which are converted to m/s² using an appropriate calibration procedure. If the accelerometer senses soil movement that exceeds the predefined threshold, an alarm signal is sent to the traffic light and siren on the road. In parallel, all the acceleration data and weather data are sent to the cloud, and the cloud system sends alert SMSes to the registered people. These messages contain the Google map coordinates of the movement location. The LEWS sends data to the cloud every 10 minutes for minute-scale prediction of landslides.
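The cloud-side Z-score check described above can be sketched as follows. This is a minimal illustration, not the deployed implementation: the threshold of 3.0, the use of the population standard deviation, and the function names are all assumptions.

```python
import math
from statistics import mean, pstdev


def z_score(history, new_value):
    """Z-score of new_value relative to previously stored readings."""
    if len(history) < 2:
        return 0.0  # not enough stored data to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0.0:
        # Constant history: any change at all is infinitely unusual.
        if new_value == mu:
            return 0.0
        return math.copysign(math.inf, new_value - mu)
    return (new_value - mu) / sigma


def should_alert(history, new_value, z_threshold=3.0):
    """True when the new total-rotation reading breaches the assumed threshold."""
    return abs(z_score(history, new_value)) > z_threshold
```

In use, `history` would hold the total-rotation values already stored in the cloud for a site, and an SMS would be sent when `should_alert` returns True for the newest reading.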
Landslide data processing: While our sensors have been collecting data since their deployment, we also contacted DTRL, DRDO for data from their conventional landslide monitoring system deployed in Chamoli district, Uttarakhand, India, and they provided it. The data consist of inclinometer movement readings in mm per m (essentially the angle by which the inclinometer tilts). The Chamoli landslide site has five boreholes, and each borehole has five sensors, for a total of 25 sensors. First, we calculated the average relative displacement of each sensor from its initial reading at the time of installation. Second, from each borehole we chose the sensor with the maximum average relative displacement. This gave five sensor series, one per borehole. As the data was sparse, we averaged the tilt over each week. The dataset covers 78 weeks, which we split in an 80:20 ratio: 62 weeks for training and 16 weeks for testing the machine learning algorithms.
Algorithms: We applied the following algorithms: Sequential Minimal Optimization, Auto Regression, Random Forest, Bagging, Stacking, Voting, and SARIMA.
Sequential Minimal Optimization: John Platt invented sequential minimal optimization (SMO) in 1998. It is a widely used algorithm for solving the quadratic programming (QP) problem that arises during the training of support vector machines. The running time of a general QP solver is O(N^3) in the worst case, which becomes prohibitive for large training sets. SMO requires an amount of memory that is linear in the training set size N. The goal of the SMO algorithm is to return the alphas that solve the constrained optimization problem below. The constraints are written as 'subject to (s.t.)' in machine learning. These alphas are called Lagrange multipliers, and they play a major role in identifying the support vectors in the data.
$$\min_{\alpha} \; \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, X_i \cdot X_j - \sum_i \alpha_i$$
$$\text{s.t.} \quad \sum_i \alpha_i y_i = 0 \;\; \text{(equality constraint)}, \qquad \alpha_i \in [0, C]$$
For a non-linear SVM, the user chooses a kernel function K to transform the data into a higher dimension. The dot product of X_i and X_j is therefore replaced with the kernel function K:
$$\min_{\alpha} \; \frac{1}{2} \sum_i \sum_j \alpha_i \alpha_j y_i y_j \, K(X_i, X_j) - \sum_i \alpha_i$$
$$K(X_i, X_j) = \Phi(X_i) \cdot \Phi(X_j) \quad \text{(kernel function)}$$
$$\text{s.t.} \quad \sum_i \alpha_i y_i = 0 \;\; \text{(equality constraint)}, \qquad \alpha_i \in [0, C]$$
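To make the kernelised dual problem concrete, the sketch below evaluates the objective and checks the constraints for a given set of alphas. It does not implement SMO itself. The kernel form (x·z)^E is an assumed polynomial kernel chosen so that E = 1 reduces to the linear case, and all function names are hypothetical.

```python
def poly_kernel(x, z, exponent=1):
    """Assumed polynomial kernel (x . z) ** exponent; exponent=1 is linear."""
    return sum(a * b for a, b in zip(x, z)) ** exponent


def dual_objective(alphas, ys, xs, kernel=poly_kernel):
    """0.5 * sum_i sum_j a_i a_j y_i y_j K(x_i, x_j) - sum_i a_i."""
    n = len(alphas)
    quad = sum(
        alphas[i] * alphas[j] * ys[i] * ys[j] * kernel(xs[i], xs[j])
        for i in range(n)
        for j in range(n)
    )
    return 0.5 * quad - sum(alphas)


def is_feasible(alphas, ys, c, tol=1e-9):
    """Check sum_i a_i y_i = 0 and 0 <= a_i <= C within a tolerance."""
    balanced = abs(sum(a * y for a, y in zip(alphas, ys))) <= tol
    boxed = all(-tol <= a <= c + tol for a in alphas)
    return balanced and boxed
```

For two points at (1, 0) and (-1, 0) with labels +1 and -1, the alphas (0.5, 0.5) are feasible for C = 1 and give an objective value of -0.5.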
Instead of solving for all alphas at once, SMO is formulated as an iterative algorithm. It breaks the large QP optimization problem into small sub-problems, each of which is solved analytically, and repeats this until convergence. Since large matrix computation is avoided, SMO scales between linear and quadratic in the training set size N, depending on the data analysis problem. The SMO computation time is dominated by the evaluation of the SVM, so it is faster for linear SVM problems and sparse datasets.
Auto Regression: A regression model, such as linear regression, models an output value as a linear combination of input values. Auto regression is a time series model that uses observations from previous time steps as input to a regression equation to predict the value at the next time step. It is a very simple idea that can result in accurate forecasts on a range of time series problems. For example:
$$Y = \beta_0 + \beta_1 X_1$$
where $Y$ is the prediction, $\beta_0$ and $\beta_1$ are coefficients found by optimizing the model on training data, and $X_1$ is an input value. This technique can be used on time series where the input variables are observations at previous time steps, called lag variables. For example, we can predict the value at the next time step (t) given the observations at the last two time steps (t-1 and t-2). As a regression model, this looks as follows:
$$X(t) = \beta_0 + \beta_1 X(t-1) + \beta_2 X(t-2)$$
Because the regression model uses data from the same input variable at previous time steps, it is referred to as an auto regression.
Seasonal Auto-Regressive Integrated Moving-Average (SARIMA): SARIMA is an extension of the ARIMA model, a statistical forecasting method popular for univariate time-series data. SARIMA can model data with a trend as well as a seasonal component by describing the auto-correlations in the data [19].
Stationarity of Time-Series: A time-series whose mean, variance, and auto-correlation are constant over time is stationary. Most statistical forecasting methods assume that a time-series can be made approximately stationary using mathematical transformations such as differencing [20]. The first step of building a SARIMA model is stationarizing the data.
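The lag-variable regression above, together with the chronological 80:20 split described for the 78-week dataset, can be sketched in plain Python. This is only an illustration under stated assumptions: ordinary least squares via the normal equations is one possible way to fit the coefficients, and the function names are hypothetical.

```python
def split_chronological(series, train_fraction=0.8):
    """Leading part for training, trailing part for testing (no shuffling)."""
    n_train = int(len(series) * train_fraction)
    return series[:n_train], series[n_train:]


def solve_linear(a, c):
    """Solve a small dense system a @ b = c by Gauss-Jordan elimination."""
    n = len(c)
    m = [row[:] + [rhs] for row, rhs in zip(a, c)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: lag columns are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ar2(series):
    """Least-squares fit of x[t] = b0 + b1 * x[t-1] + b2 * x[t-2]."""
    rows = [(1.0, series[t - 1], series[t - 2]) for t in range(2, len(series))]
    targets = series[2:]
    # Normal equations: (R^T R) b = R^T y
    a = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    c = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(3)]
    return solve_linear(a, c)


def forecast_next(series, coeffs):
    """One-step-ahead forecast from the last two observations."""
    b0, b1, b2 = coeffs
    return b0 + b1 * series[-1] + b2 * series[-2]
```

With a 78-element weekly series, `split_chronological` returns 62 training and 16 test values, and `forecast_next(train, fit_ar2(train))` gives the prediction for the first test week.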
Auto-Regressive Models: In an auto-regressive model, we predict a variable using past values of the same variable. An auto-regressive model is defined as:
$$y_t = c + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \dots + \phi_p y_{t-p} + \epsilon_t \tag{1}$$
where p is the auto-regressive trend parameter, $\epsilon_t$ is white noise, and $y_{t-1}, y_{t-2}, \dots, y_{t-p}$ denote the movement at previous time periods [19].
Moving-Average Models: A moving-average model uses past prediction errors in a regression model. A moving-average model is defined as:
$$y_t = c + \epsilon_t + \theta_1 \epsilon_{t-1} + \theta_2 \epsilon_{t-2} + \dots + \theta_q \epsilon_{t-q} \tag{2}$$
where q is the moving-average trend parameter, $\epsilon_t$ is white noise, and $\epsilon_{t-1}, \epsilon_{t-2}, \dots, \epsilon_{t-q}$ are the error terms at previous time periods. If we combine an auto-regressive and a moving-average model on stationary data, we obtain a non-seasonal ARIMA model, which is defined as:
$$y'_t = c + \phi_1 y'_{t-1} + \dots + \phi_p y'_{t-p} + \theta_1 \epsilon_{t-1} + \dots + \theta_q \epsilon_{t-q} + \epsilon_t \tag{3}$$
SARIMA builds upon the ARIMA model and incorporates seasonality. The seasonal parameters of the model are like the non-seasonal parameters, but they use backshifts of the seasonal period. The three trend elements that require calibration, the same as in ARIMA, are the trend auto-regressive order 'p', the trend difference order 'd' and the trend moving-average order 'q'. The four additional seasonal elements that require calibration are the seasonal auto-regressive order 'P', the seasonal difference order 'D', the seasonal moving-average order 'Q' and the number of time steps in a single seasonal period 'm'. The $SARIMA(p, d, q)(P, D, Q)_m$ model is defined as:
$$\Phi(B^m)\,\phi(B)\,\Delta_m^D\,\Delta^d X_t = \Theta(B^m)\,\theta(B)\,Z_t \tag{4}$$
where $B$ is the backshift operator and $Z_t$ is the white noise process. Differencing with the 'D' parameter on the seasonal component and the 'd' parameter on the non-seasonal component of the time-series is given by:
$$\Delta_m X_t = X_t - X_{t-m} \tag{5}$$
$$\Delta X_t = X_t - X_{t-1} \tag{6}$$
On applying equation 1 with the 'P' and 'm' parameters on the seasonal component and the 'p' parameter on the non-seasonal component of the time-series, we obtain:
$$\Phi(B^m) = 1 - \Phi_1 B^m - \dots - \Phi_P B^{Pm} \tag{7}$$
$$\phi(B) = 1 - \phi_1 B - \dots - \phi_p B^p \tag{8}$$
On applying equation 2 with the 'Q' and 'm' parameters on the seasonal component and the 'q' parameter on the non-seasonal component of the time-series, we obtain:
$$\Theta(B^m) = 1 + \Theta_1 B^m + \dots + \Theta_Q B^{Qm} \tag{9}$$
$$\theta(B) = 1 + \theta_1 B + \dots + \theta_q B^q \tag{10}$$
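The seasonal and non-seasonal differencing operators in equations 5 and 6 are simple to apply directly. The sketch below shows how D seasonal differences with period m and d ordinary differences could be taken before fitting; the function names are hypothetical, and applying the seasonal differences first is an arbitrary choice because the two operators commute.

```python
def difference(series, lag=1):
    """Apply X_t - X_{t-lag}; the result is shorter than the input by lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


def sarima_difference(series, d=0, D=0, m=1):
    """Apply D seasonal differences (period m), then d ordinary differences."""
    if D > 0 and m < 1:
        raise ValueError("seasonal differencing needs a period m >= 1")
    out = list(series)
    for _ in range(D):
        out = difference(out, lag=m)   # equation 5
    for _ in range(d):
        out = difference(out, lag=1)   # equation 6
    return out
```

For a series with a linear trend plus a repeating pattern of period 4, one seasonal difference with m = 4 removes the repeating pattern and one further ordinary difference removes what remains of the trend.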
Random Forest: The random forest algorithm was developed by Leo Breiman and Adele Cutler and published in 2001 [13]. Random forests, or random decision forests, are an ensemble learning method for classification and regression. The algorithm builds many decision trees at training time and outputs the majority class (classification) or the mean of the predictions of the individual trees (regression). By aggregating trees, the random forest algorithm corrects the overfitting of single decision trees [22]. Here, the random forest algorithm is used for regression on continuous-valued attributes, with the random tree algorithm as the base learner [23]. Since the target variable is a real-valued number, we fit a regression model to the target variable using each of the independent variables. For each independent variable, the dataset is split at several candidate split points by trial and error. At each split point, we calculate the Residual Sum of Squares (RSS) between the predicted and the actual values. The RSS is defined by the following equation:
$$RSS = \sum_{i \in \text{left}} (y_i - \bar{y}_L)^2 + \sum_{i \in \text{right}} (y_i - \bar{y}_R)^2$$
where $\bar{y}_L$ is the mean y-value on the left side, $\bar{y}_R$ is the mean y-value on the right side, and $y_i$ are the points on the left and right sides of the split point. The attribute with the minimum RSS is selected at a node in the tree, and the split point on this attribute is the one that minimizes the RSS among all split points. This process continues recursively until all attributes are covered. Overall, the dataset is split into several regions, and each region may represent a leaf node in the random tree. The random tree algorithm uses the parameter K for the number of randomly chosen attributes used to build the tree. We choose K by a heuristic rule as the integer part of log2(number of predictors) + 1. The final output of the random forest algorithm is the average of all random tree outputs.
Bagging: Bagging is a machine learning ensemble meta-algorithm designed to improve the stability and accuracy of machine learning algorithms used in statistical classification and regression. It draws subsamples from the dataset with replacement and trains the predictive model on each subsample. The final output is the average of the outputs of these models.
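The RSS-based split search and the bagging procedure described above can be combined in one small sketch: each base learner is a single-split regression tree (a stump) on one attribute, fitted to a bootstrap sample, and the ensemble averages the stumps. This is a simplified illustration, not the random tree learner used in the report. The number of models, the seed, and all function names are assumptions.

```python
import random
from statistics import mean


def rss(values):
    """Residual sum of squares of values around their own mean."""
    if not values:
        return 0.0
    mu = mean(values)
    return sum((v - mu) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, rss_left + rss_right) minimizing the total RSS."""
    best = (None, float("inf"))
    candidates = sorted(set(xs))
    for lo, hi in zip(candidates, candidates[1:]):
        threshold = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        total = rss(left) + rss(right)
        if total < best[1]:
            best = (threshold, total)
    return best


def fit_stump(xs, ys):
    """One-split regression tree: predict the mean of the matching side."""
    threshold, _ = best_split(xs, ys)
    if threshold is None:          # only one distinct x value in the sample
        overall = mean(ys)
        return lambda x: overall
    left_mean = mean(y for x, y in zip(xs, ys) if x <= threshold)
    right_mean = mean(y for x, y in zip(xs, ys) if x > threshold)
    return lambda x: left_mean if x <= threshold else right_mean


def fit_bagging(xs, ys, n_models=25, seed=0):
    """Train stumps on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return lambda x: mean(model(x) for model in models)
```

A full random tree would apply `best_split` recursively over K randomly chosen attributes; the stump keeps only the first split so that the averaging step stays easy to follow.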
Voting: The voting algorithm combines classifiers that use distinct pattern representations. Many existing combination schemes can be considered special cases of compound classification, where all the pattern representations are used jointly to reach a decision.
Stacking: Stacked generalization (or stacking) is a different way of combining multiple models, used to combine models of different types. The procedure is as follows:
1. Split the training set into two disjoint sets.
2. Train several base learners on the first part.
3. Test the base learners on the second part.
4. Using the predictions from step 3 as the inputs and the correct responses as the outputs, train a higher-level learner.
Parameters Tuning: The SMO algorithm has two parameters. The first is the complexity parameter (C), which is used to build the 'hyperplane' between two classes for classification, regression or other tasks. A good separation is one where the hyperplane has the largest distance to the nearest training data points of any class, since in general the larger the margin, the lower the generalization error of the classifier. C controls how soft the class margins are, that is, in practice, how many instances are used as 'support vectors' to draw the linear separation boundary in the transformed Euclidean feature space. The second parameter of the SMO algorithm is the exponent (E) of the kernel. In its simplest form, the kernel trick means transforming data into another dimension that has a clear dividing margin between classes of data [21]. We used the following values of C and E in SMO: C = 0, 1 and E = 1, 2, 3, 4 for the polynomial kernel; C = 0, 1 and E = 1, 2 for the normalized polynomial kernel; and C = 0 and E = 1 for the RBF kernel. The best result for this algorithm was obtained with the polynomial kernel where C = 1 (hyperplane = 1) and exponent E = 1 (linear kernel).
Using a grid search procedure, eight free parameters were optimized in the SARIMA model. These parameters were varied over the ranges given in Table 2. One reason for using the SARIMA model was that it allows one to account for a seasonal trend present in the time-series.
Table 2. Parameter Optimization of SARIMA
Parameter | Range of Values
Trend Auto-Regressive (p) | 0, 1, 2
Trend Differencing (d) | 0, 1
Trend Moving-Average (q) | 0, 1, 2
Trend | Absent, Constant, Trend, Constant Trend
Seasonal Auto-Regressive (P) | 0, 1, 2
Seasonal Differencing (D) | 0, 1
Seasonal Moving-Average (Q) | 0, 1, 2
Seasonal Periods (m) | 0, 1
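The grid search over the ranges in Table 2 amounts to enumerating every combination of the eight parameters and keeping the one with the lowest error. The sketch below shows only that enumeration. The `score` callback is hypothetical and stands in for fitting a SARIMA model with a given configuration and returning its RMSE, which is not shown here; the dictionary keys and trend labels are illustrative names.

```python
from itertools import product

# Ranges copied from Table 2; key names and trend labels are illustrative.
SARIMA_GRID = {
    "p": [0, 1, 2],
    "d": [0, 1],
    "q": [0, 1, 2],
    "trend": ["absent", "constant", "trend", "constant trend"],
    "P": [0, 1, 2],
    "D": [0, 1],
    "Q": [0, 1, 2],
    "m": [0, 1],
}


def iter_configs(grid):
    """Yield one dict per combination of parameter values."""
    names = list(grid)
    for values in product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_search(grid, score):
    """Return (best_config, best_score) for the lowest score(config)."""
    best_config, best_score = None, float("inf")
    for config in iter_configs(grid):
        value = score(config)
        if value < best_score:
            best_config, best_score = config, value
    return best_config, best_score
```

With the ranges in Table 2 the grid contains 3 x 2 x 3 x 4 x 3 x 2 x 3 x 2 = 2592 configurations, so an exhaustive search stays tractable for a 62-week training series.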
Results: The RMSE values for each algorithm are given below.
Table 3. Different algorithms fitted to the training data for different boreholes. TRAINING DATA SET, Root-Mean Squared Error (RMSE).
Algorithm | Borehole 1 Meter 3 | Borehole 2 Meter 12 | Borehole 3 Meter 6 | Borehole 4 Meter 15 | Borehole 5 Meter 15 | AVERAGE RMSE
Random Forest | 5.46 | 0 | 0.01 | 0.10 | 5.44 | 2.202
Voting | 8.10 | 0 | 0.02 | 0.17 | 8.01 | 3.26
Linear Regression | 14.73 | 0 | 0.03 | 0.27 | 12.36 | 5.478
Bagging | 16.60 | 0 | 0.04 | 0.29 | 13.04 | 5.994
SMO | 16.60 | 0 | 0.03 | 0.29 | 13.92 | 6.168
Gaussian | 16.86 | 0 | 0.03 | 0.31 | 15.76 | 6.592
SARIMA | 18.27 | 7.34 | 0.28 | 8.77 | 14.49 | 9.83
Stacking | 33.67 | 0 | 0.10 | 0.44 | 22.07 | 11.256
Table 4. Different algorithms fitted to the test data for different boreholes. TESTING DATA SET, Root-Mean Squared Error (RMSE).
Algorithm | Borehole 1 Meter 3 | Borehole 2 Meter 12 | Borehole 3 Meter 6 | Borehole 4 Meter 15 | Borehole 5 Meter 15 | AVERAGE RMSE
SARIMA | 0.0 | 0.11 | 9.17 | 1.18 | 19.51 | 5.99
SMO | 0.37 | 0 | 10.64 | 1.14 | 20.21 | 6.472
Bagging | 0.14 | 0 | 11.57 | 1.16 | 20.95 | 6.764
Voting | 2.28 | 0 | 19.61 | 1.30 | 16.52 | 7.942
Linear Regression | 11.04 | 0 | 9.98 | 1.17 | 24.76 | 9.39
Random Forest | 0 | 0 | 27.14 | 1.56 | 20.35 | 9.81
Gaussian | 9.81 | 0 | 16.63 | 1.34 | 28.51 | 11.258
Stacking | 24.32 | 0 | 27.18 | 1.72 | 34.01 | 17.446
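For reference, the RMSE metric used in Tables 3 and 4 and the AVERAGE RMSE column can be computed as in the sketch below. The function names are illustrative; the per-borehole values in the usage lines are the SARIMA row of Table 4.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root-mean squared error between two equally long sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))


def average_rmse(per_borehole):
    """Plain mean of the per-borehole RMSE values (the AVERAGE RMSE column)."""
    return sum(per_borehole) / len(per_borehole)


# SARIMA row of Table 4, Borehole 1 to Borehole 5 (mm/m)
sarima_test_rmse = [0.0, 0.11, 9.17, 1.18, 19.51]
sarima_average = average_rmse(sarima_test_rmse)
```

The plain mean of the five SARIMA test values is 5.994 mm/m, which the table reports as 5.99.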
As seen in Table 3, there was a large variation in the average RMSE when the different algorithms were fitted to the training data. For example, the best-performing algorithms on the training data, Random Forest and Voting, had RMSEs of 2.202 mm/m and 3.26 mm/m, whereas SARIMA and Stacking had much larger RMSEs of 9.83 mm/m and 11.256 mm/m. When these models were generalized to the test dataset (Table 4), the SARIMA model had the lowest average RMSE (5.99 mm/m), followed by the SMO algorithm (6.472 mm/m). Thus, both SARIMA and SMO predicted the time-series movement data relatively well. Figure 4 shows the results of the best-performing SARIMA model during training and testing for each borehole. The blue line represents the actual data, and the orange line represents the values predicted by the SARIMA model.
[Figure 4 panels: actual and predicted relative movement (mm/m) plotted against week, with one testing graph and one training graph each for Borehole 1 Meter 3, Borehole 2 Meter 12, Borehole 3 Meter 6, Borehole 4 Meter 15, and Borehole 5 Meter 15.]
Figure 4. Training and Test RMSE in mm/m of SARIMA model
Discussion and Conclusions: The focus of machine learning in landslide mitigation is to predict soil movements in time so that lives can be saved. We applied several algorithms to the landslide data: Sequential Minimal Optimization (SMO), Linear Regression, Random Forest, the ensemble methods bagging, stacking and voting, and SARIMA. Amongst all algorithms, the SARIMA model performed the best on the test data of this dataset. In the near future, we want to compare the results of the SARIMA model with other approaches such as deep learning, the LSTM model, MLP and the Holt-Winters method. We also plan to perform machine learning on the data collected by our low-cost LEWSs deployed in Mandi district.
Courses: Semester
Courses
EE-592p Selected Topics in IoT
1st Semester
HS-616 Managerial Thinking and Decision Making
CS-671 Deep Learning and Applications
2nd Semester
CS-660 Data Mining for Decision Making
3rd Semester
CS-606 Computational Modelling of Social Systems
HS-650 Statistical Methods
CS-601 Probability and Random Process
Grade: C, C, E, O, E
CGPA: 6.1, 7.5
Currently taking, Currently taking
4. Planned Work for the Next Year
Currently, our LEWS can detect soil movement up to a depth of 1 meter from the surface of the earth. This year, we plan to engineer a system that detects soil movement up to a depth of 15 meters. This sub-surface system will be buried in a 15-meter borehole, so it can detect soil movement at every meter down to 15 meters. With the help of this system, we can measure the slope and direction of movement in the landslide and monitor what is happening in the sub-surface area. Landslide events occur due to slope failure in the hill. Sometimes the slope failure is deep inside the hill, so the 1-meter system cannot detect it, whereas the sub-surface system can. For example, if the slope failure is at the 10-meter level, the accelerometers at 10 to 15 meters will remain stable and the accelerometers at 1 to 10 meters will detect the soil movement.
Last year, we deployed 10 LEWSs in Mandi district and 5 LEWSs in Sirmour district in Himachal Pradesh. We are collecting data from these sensors at 10-minute intervals, so the data form a minute-scale time series. We now have a large amount of data from these sensors, and we will apply machine learning to this dataset to predict soil movement in the near future.
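The 10-meter example above can be sketched in code. This is only an illustration under stated assumptions: the function name `estimate_failure_depth`, the movement values and the threshold are hypothetical, and the sensor at the failure level itself is assumed to register movement. The idea is that the deepest accelerometer showing movement marks the approximate depth of the slope failure.

```python
def estimate_failure_depth(movement_by_depth, threshold):
    """Return the deepest depth (in meters) whose sensor shows movement.

    movement_by_depth maps a sensor depth in meters to the magnitude of the
    change observed by the accelerometer at that depth. Sensors below the
    failure surface are expected to stay at or below the threshold.
    Returns None when no sensor exceeds the threshold.
    """
    moving = [depth for depth, value in movement_by_depth.items()
              if abs(value) > threshold]
    return max(moving) if moving else None


# Hypothetical readings: sensors at 1 to 10 m move, sensors at 11 to 15 m are stable.
readings = {depth: (2.5 if depth <= 10 else 0.1) for depth in range(1, 16)}
print(estimate_failure_depth(readings, threshold=0.5))  # prints 10
```

A 1-meter system corresponds to a dictionary with a single key, which is why it cannot locate a failure surface deeper than its own sensor.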
5. Workshops/Conferences Attended
Landslide Mitigation and Detailed Project Report (DPR) Preparation, IIT Mandi, 29 August 2018.
3rd Himachal Pradesh Science Congress, IIT Mandi, October 2018.
Winter School on Cognitive Modelling, IIT Mandi, February 2019.
6. Paper Published/Communicated and Other Achievements
Dutt, V., Chaturvedi, P., Agrawal, S., Kumar, P., Priyanka, S., Mali, N., Pathania, A., & Kala, U. (2018). Smart IOT based test-bed system for lab scale landslide monitoring experiment. Patent Application 201813039735. Patent Office, Dwarka, New Delhi 110078, 2018/10/22.
Kumar, P., Shroti, S., Chaturvedi, P., Sihag, P., Agarwal, S., Pathania, A., Mali, N., Singh, R., Uday, K.V., & Dutt, V. (in press, 2019). Daily-scale predictions of debris movement in Chamoli, Uttarakhand area using conventional and deep machine-learning methods. (ICITG2019, 064, v1).
Pathania, A., Kumar, P., Kesri, J., Agarwal, S., Sihag, P., Mali, N., Singh, R., Chaturvedi, P., Uday, K.V., & Dutt, V. (in press, 2019). Reducing power consumption of weather stations for landslide monitoring. (ICITG2019, 062, v1).
Won the 3rd prize in the Development of Innovative Prototypes for Disaster Risk Reduction (DRR), Shimla, Himachal Pradesh.
7. References
1. Pande, R. K. (2006). Landslide problems in Uttaranchal, India: issues and challenges. Disaster Prevention and Management: An International Journal, 15(2), 247-255.
2. Landslide Recent Incidents - Geological Survey of India. Retrieved from https://gsi.gov.in
3. Chaturvedi, P., Srivastava, S., & Kaur, P. B. (2017). Landslide Early Warning System Development Using Statistical Analysis of Sensors’ Data at Tangni Landslide, Uttarakhand, India. In Proceedings of Sixth International Conference on Soft Computing for Problem Solving (pp. 259-270). Springer, Singapore.
4. Hao, X.Y., Hao, X.H., Xiong, H.M., et al., Journal of Engineering Geology, 7(3): 279-283, 1999.
5. Du, J., Yin, K.L., Chai, B., Chinese Journal of Rock Mechanics and Engineering, (09): 1783-1789, 2009.
6. Li, Q., Li, R.Y., Journal of Yangtze River Scientific Research Institute, 22(6), 2005.
7. Yesilnacar, E., Topal, T., Engineering Geology, 79: 251-266, 2005.
8. Wang, G.Y., Cui, H.L., Li, Q., Rock and Soil Mechanics, 30(8): 2418-2422, 2009.
9. Lin, D.C., An, F.P., Guo, Z.L., et al., Rock and Soil Mechanics, 32(1), 2011.
10. Jian Huang, Zhihuan Liu, and Ni Li. “Study on displacement prediction of landslide based on neural network”, ISSN: 0975-7384, CODEN(USA): JCPRC5.
11. J. Platt, “Sequential minimal optimization: A fast algorithm for training support vector machines,” Microsoft Res., Bengaluru, India Rep. MSR-TR-98-14, Apr. 1998.
12. "What is an Autoregressive Model?". deepai.org.
13. L. Breiman, 2001, Random forest, Machine Learning, vol. 45, no. 1, pp. 5-32.
14. Breiman, Leo (1996). "Bagging predictors". Machine Learning. 24 (2): 123–140.
15. Wolpert, David. (1992). Stacked Generalization. Neural Networks. 5. 241-259.
16. Josef Kittler; Robert P.W. Duin; et al. (1998). "On combining classifiers". IEEE TPAMI. IEEE. 20 (3): 226–239.
17. Hyndman, Rob J; Athanasopoulos, George. 8.9 Seasonal ARIMA models. Forecasting: principles and practice. oTexts. Retrieved 19 May 2015.
18. Ray, Ram & Jacobs, Jennifer. (2006). Relationships among remotely sensed soil moisture, precipitation and landslide events. Natural Hazards. 43. 211-222. 10.1007/s11069-006-9095-9.
19. Asteriou D., Hall S. G., ARIMA Models and the Box-Jenkins Methodology, Applied Econometrics, pp. 265-286, 2011.
20. Hyndman R.J., Athanasopoulos G., Forecasting: Principles and Practice.
21. "The Kernel Trick". deepai.org.
22. Hastie, Trevor; Tibshirani, Robert; Friedman, Jerome (2008). The Elements of Statistical Learning (2nd ed.). Springer. ISBN 0-387-95284-5.
23. Ian H. Witten, Eibe Frank, Mark A. Hall, Data Mining: Practical Machine Learning Tools and Techniques, Morgan Kaufmann Publishers Inc., San Francisco, CA, 2011.
REPORT BY APC/DC COMMITTEE
1. Has the student met the targets set for last year?
(a) Mention the Achieved Targets:
(b) If not, what are the major reasons?
2. Is there a reasonable target set for next year? Give detailed plan.
3. What is the perception of the student and guide(s) about the fraction of thesis work completed?
4. What is the approximate time scale for thesis submission (only for students in their 5th year or above for Ph.D. and 3rd year or above for M.S.)?
5. Any other observations of the committee.
Recommendation of APC/DC (Tick Appropriately)
1. (a) Continuation of Registration is Recommended/Not Recommended.
(b) Continuation of Scholarship/Research Assistantship is Recommended/Not Recommended.
(c) Enhancement of Scholarship from JRF to SRF is Recommended/Not Recommended (only after Two Years of Registration).
2. Source of Funding/Scholarship:
3. OVERALL PERFORMANCE: Very Good/Good/Satisfactory/Unsatisfactory
4. Any Other Recommendation/Comments (Attach separate sheet).
COMMITTEE MEMBERS
S. No.   Faculty Name                       School/Department   Signature
1        Dr. Varun Dutt (guide)             SCEE, IIT Mandi
2        Dr. Venkata Uday Kala (Coguide)    SE, IIT Mandi
3        Dr. Arnav Bhavsar                  SCEE, IIT Mandi
4        Dr. Shyamasree Dasgupta            SHSS, IIT Mandi
5
6
Remarks
Signature of the Supervisor          Date:
Chairperson, School                  Date:
Associate Dean (Research)            Date:
Note:
(i) Ph.D. Scholar shall, after Registration, submit a written report to Doctoral Committee in the required format, annually for the first three years, and every six months thereafter.
(ii) M.S. Scholar shall, after Registration, submit annually a written report to Academic Progress Committee.
Attach additional sheets if required.