Analysis Of Hotel Pricing Data.docx

  • Uploaded by: anirudh chaudhary
  • April 2020

I am Karan Jain, pursuing my B.Tech in Manufacturing Process and Automation Engineering (M.P.A.E.) from N.S.I.T, New Delhi. I wish to present my findings for the hotel pricing data set that you generously provided for me to draw relevant insights from. After analyzing the data for a while and considering many variables, I realized how demanding the task of analysis is. I have been committed to writing insights and managerial implications for the articles you provide, while absorbing the knowledge you share. How long the task actually takes, and how hard it is to decide which features to select to arrive at an acceptable model, struck me today when I had to think like a data scientist.

Introduction

Hotel pricing is a complex phenomenon: a myriad of characteristics must be factored in when conducting the analysis and determining the correct price for a particular room. For example, it would be ludicrous to price a hotel room in a non-metro, non-tourism city at a rate that would draw glares even in a city that is a metro or a tourist destination. The objective of the analysis is therefore to crunch the data for the given 42 cities, some metro and some non-metro, so that many such criteria are covered, and to yield a model that, given a new city and a hotel with a given set of features, can more or less predict the price of a particular room from these characteristics. The key is to capture the relevant characteristics while leaving the irrelevant ones at the margin.

Describing the Dataset

The data set provided involves the following variables:

  • Dependent Variable

DECISION VARIABLE | UNITS | MEANING
RoomRent | Rupees | Rent for the cheapest room, double occupancy, in Indian Rupees. Some hotels have more than one type of double occupancy room. For simplicity, we picked the cheapest room with double occupancy.

  • External Factors

Many external factors can potentially influence the RoomRent. The dataset captures some of these external factors, as explained below.

VARIABLE | UNITS | MEANING
Date | Text | We have hotel room rent data for the following 8 dates for each hotel: {Dec 31, Dec 25, Dec 24, Dec 18, Dec 21, Dec 28, Jan 4, Jan 8}. If a hotel is sold out on a given date, assume that the price of the hotel room on that date is the maximum price from the sample of dates for which prices are available.
IsWeekend | Dummy | We use ‘0’ to indicate week days, ‘1’ to indicate weekend dates (Sat / Sun)
IsNewYearEve | Dummy | ‘1’ for Dec 31, ‘0’ otherwise
CityName | Text | Name of the City where the Hotel is located, e.g. Mumbai
Population | Number | Population of the City in 2011 (See Table A1 below)
CityRank | Dummy | Rank order of City by Population (e.g. Mumbai = 0, Delhi = 1, so on); (See Table A1)
IsMetroCity | Dummy | ‘1’ if CityName is {Mumbai, Delhi, Kolkatta, Chennai}, ‘0’ otherwise
IsTouristDestination | Dummy | We use ‘1’ if the city is primarily a tourist destination, ‘0’ otherwise. For example, Goa and Agra are primarily tourist destinations. We assume that most people who visit Goa and Agra and stay in their hotels are in these cities primarily for tourism.
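The sold-out rule for the Date variable and the IsNewYearEve dummy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `fill_sold_out` and `is_new_year_eve`, the use of `None` to mark a sold-out date, and the sample prices are all hypothetical and do not come from the dataset.

```python
from typing import Dict, Optional


def fill_sold_out(prices: Dict[str, Optional[float]]) -> Dict[str, float]:
    """Replace sold-out (None) prices with the maximum available price."""
    available = [p for p in prices.values() if p is not None]
    if not available:
        raise ValueError("no prices available for this hotel")
    ceiling = max(available)
    return {date: (ceiling if p is None else p) for date, p in prices.items()}


def is_new_year_eve(date_label: str) -> int:
    """Dummy variable: 1 for 'Dec 31', 0 for any other date label."""
    return 1 if date_label.strip() == "Dec 31" else 0


# Hypothetical prices for one hotel; None marks a sold-out date.
sample = {"Dec 24": 4500.0, "Dec 25": 5200.0, "Dec 31": None, "Jan 4": 4100.0}
filled = fill_sold_out(sample)
print(filled)
print({date: is_new_year_eve(date) for date in filled})
```

Note that this rule caps the imputed price at the highest observed price, so a hotel sold out on Dec 31 never gets a price above what it charged on its other dates.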

  • Internal Factors

Many Hotel Features can influence the RoomRent. The dataset captures some of these internal factors, as explained below.

VARIABLE | UNITS | MEANING
HotelName | Text | e.g. Park Hyatt Goa Resort and Spa
StarRating | Number | e.g. 5
Airport | km | Distance between Hotel and closest major Airport
HotelAddress | Text | e.g. Arrossim Beach, Cansaulim, Goa
HotelPincode | Number | e.g. 403712
HotelDescription | Text | e.g. 5-star beachfront resort with spa, near Arossim Beach
FreeWifi | Dummy | ‘1’ if the hotel offers Free Wifi, ‘0’ otherwise
FreeBreakfast | Dummy | ‘1’ if the hotel offers Free Breakfast, ‘0’ otherwise
HotelCapacity | Number | e.g. 242. (enter ‘0’ if not available)
HasSwimmingPool | Dummy | ‘1’ if they have a swimming pool, ‘0’ otherwise

Getting Started in Interpreting Results to Calculate Hotel Room Prices

At first, one may feel lost in a dense jungle of unstructured data: a deluge of information on 42 cities, with as many as 12 distinct features to help decide a cogent room price, where a wrong choice sends all the effort down an erroneous path. I first drew a correlation diagram to get a basic idea of how the variables relate to one another and to the fluctuation in room prices. I won’t discuss the technicalities of the approach in detail, but I would like to keep the reader abreast of the developing analysis, which visuals often do effectively. Here are the correlation diagrams, which I split into two phases so that the variables are easier to follow.

With these you may be able to relate many of the variables, but the correlations with room rent left me flabbergasted: some variables showed no correlation whatsoever despite their logical relevance. What I took from the analysis is that some variables we logically relate to room rent are instead related to occupancy. We tend to think along the lines of "the higher the occupancy, the cheaper the hotel", which may not be true, since that conclusion can be drawn only under certain conditions. The variables related to occupancy and not to room price were as follows:

  • Airport
  • Free Wifi service
  • Weekend incidence
  • Has Breakfast

In lay terms, these translate to: whether the hotel is near an airport, whether it provides Wi-Fi service, whether the day the room is sold falls on a weekend, and whether the hotel provides a complimentary breakfast. I arrived at the following table of relations, which at first glance gives a preview of the model developed later in this report.

Variable | Relation with Room Rent
Star Rating | +ve
Capacity | +ve
Swimming Pool | +ve
Tourist Destination | +ve
New Year Incidence | +ve
Metro | -ve
Population | -ve

The table above indicates the direction of each variable's relation with Room Rent.
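The +ve and -ve entries in such a table are the signs of pairwise correlations with Room Rent. As a minimal sketch of how one might be computed, assuming the Pearson correlation is the measure behind the diagrams (the report does not say which was used), the snippet below derives a sign label from two columns. The helper names and the six sample rows are hypothetical, not taken from the dataset.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length columns."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("columns must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def sign_label(r: float) -> str:
    """Map a correlation to the '+ve' / '-ve' notation used in the table."""
    if r > 0:
        return "+ve"
    if r < 0:
        return "-ve"
    return "0"


# Hypothetical rows, for illustration only.
star_rating = [3, 4, 5, 3, 5, 4]
room_rent = [2500, 4200, 9000, 2800, 11000, 5000]

r = pearson(star_rating, room_rent)
print(f"StarRating vs RoomRent: r = {r:.3f} ({sign_label(r)})")
```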

Linear Model Formulated

I now had the burden of using the 7 features above to determine a sound equation yielding the price that deviates least from the original. I also wanted to include interactions between variables, to capture their nuanced interrelations and reduce the erroneous relations that arise when variables are considered in isolation.

Case 1) Consider the variables Metro and Tourist Attraction. I can combine them by letting the Tourist Attraction variable determine the room price, with the Metro variable interacting as a minor influencing metric.

Case 2) Consider the variables New Year Eve and Tourist Attraction. Following the same line of thought, I would treat New Year Eve as the influencing metric. This is logical, as hotel prices escalate when travelling around the new year because of the sheer demand, even for modest rooms, from the slew of travelers.
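For two dummy variables, an interaction term is simply their product: it equals 1 only when both conditions hold. The sketch below builds the two interactions from Case 1 and Case 2 for a single row. The function name `add_interactions` and the sample row are hypothetical; the column names follow the dataset description.

```python
from typing import Dict


def add_interactions(row: Dict[str, int]) -> Dict[str, int]:
    """Return a copy of the row with the two interaction columns added."""
    out = dict(row)
    # Case 1: Metro acts only through Tourist Destination.
    out["IsMetroCity:IsTouristDestination"] = (
        row["IsMetroCity"] * row["IsTouristDestination"]
    )
    # Case 2: New Year Eve acts only through Tourist Destination.
    out["IsTouristDestination:IsNewYearEve"] = (
        row["IsTouristDestination"] * row["IsNewYearEve"]
    )
    return out


# Hypothetical row: a non-metro tourist destination on New Year's Eve.
example = {"IsMetroCity": 0, "IsTouristDestination": 1, "IsNewYearEve": 1}
print(add_interactions(example))
```

Including an interaction without the main effect of one of its parts (Metro, here) means that part can only change the prediction when the other dummy is 1.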

Thus my linear model included such interaction variables in addition to the other variables. Here is a glimpse of my model and its summary characteristics:

> summary(model7)

Call:
lm(formula = RoomRent ~ HasSwimmingPool + HotelCapacity + Population +
    (IsMetroCity:IsTouristDestination) + IsTouristDestination +
    StarRating + HasSwimmingPool:IsNewYearEve + IsNewYearEve:IsTouristDestination,
    data = pricingtrain)

Residuals:
   Min     1Q Median     3Q    Max
-13653  -2356   -651   1062 309334

Coefficients:
                                    Estimate Std. Error t value Pr(>|t|)
(Intercept)                       -8.345e+03  4.428e+02 -18.845  < 2e-16 ***
HasSwimmingPool                    1.797e+03  1.961e+02   9.163  < 2e-16 ***
HotelCapacity                     -1.113e+01  1.252e+00  -8.893  < 2e-16 ***
Population                        -6.864e-05  3.256e-05  -2.108   0.0350 *
IsTouristDestination               2.307e+03  2.033e+02  11.351  < 2e-16 ***
StarRating                         3.699e+03  1.333e+02  27.749  < 2e-16 ***
IsMetroCity:IsTouristDestination  -1.447e+03  3.474e+02  -4.165 3.14e-05 ***
HasSwimmingPool:IsNewYearEve       1.768e+03  4.471e+02   3.954 7.74e-05 ***
IsTouristDestination:IsNewYearEve  7.727e+02  3.213e+02   2.405   0.0162 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6908 on 9915 degrees of freedom
Multiple R-squared:  0.1795, Adjusted R-squared:  0.1788
F-statistic: 271.1 on 8 and 9915 DF,  p-value: < 2.2e-16
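The figures in this summary can be cross-checked against one another with the standard formulas for a linear regression with an intercept. The Python sketch below recomputes the adjusted R-squared, the F-statistic and one t value from the reported numbers; the function names are hypothetical, and the number of observations is inferred here as the residual degrees of freedom plus the 9 estimated coefficients.

```python
def adjusted_r_squared(r2: float, n_obs: int, n_predictors: int) -> float:
    """Adjusted R-squared for a model with an intercept."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)


def f_statistic(r2: float, n_predictors: int, residual_df: int) -> float:
    """Overall F-statistic expressed through R-squared."""
    return (r2 / n_predictors) / ((1 - r2) / residual_df)


# Values read from the summary output.
R2 = 0.1795
N_PREDICTORS = 8
RESIDUAL_DF = 9915
N_OBS = RESIDUAL_DF + N_PREDICTORS + 1  # inferred: 8 slopes plus the intercept

adj_r2 = adjusted_r_squared(R2, N_OBS, N_PREDICTORS)
f_stat = f_statistic(R2, N_PREDICTORS, RESIDUAL_DF)
t_star_rating = 3.699e+03 / 1.333e+02  # estimate divided by its standard error

print(f"adjusted R-squared: {adj_r2:.4f}")
print(f"F-statistic: {f_stat:.1f}")
print(f"t value for StarRating: {t_star_rating:.3f}")
```

The recomputed values agree with the reported ones up to rounding. An R-squared of 0.1795 also means the model explains about 18% of the variance in RoomRent, so its predictions should be read as rough.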

The ‘7’ appended to the model name reflects the number of attempts in which I failed to yield a coherent analysis, failed to substitute a variable with a better one, or ended up adding redundant variables.

Conclusion

I would like to offer the following statement to help a hotel manager identify the rooms that can be rated as the most expensive. My conclusion is a profile drawn from the model, as follows. A hotel that has a swimming pool, considerable capacity and a high rating for comfort, and is located in a town that has the amenities of a cosmopolitan city but is mainly a tourist attraction, should count itself as the wheat separated from the chaff. If the date on which rooms are booked falls on New Year's Eve, the hotel might end up reversing a downslide in revenue, if there was one.
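To make the conclusion concrete, the reported coefficients can be plugged into the linear equation for one hotel profile. The Python sketch below does this for a hypothetical 5-star hotel with a pool in a non-metro tourist destination. The capacity of 100 and the population of 1,000,000 are assumed values for illustration, and the function name `predict_room_rent` is hypothetical. Note that the reported HotelCapacity coefficient is negative, so within this model a larger capacity lowers the predicted rent when everything else is held fixed.

```python
# Coefficients copied from the summary(model7) output.
COEF = {
    "(Intercept)": -8.345e+03,
    "HasSwimmingPool": 1.797e+03,
    "HotelCapacity": -1.113e+01,
    "Population": -6.864e-05,
    "IsTouristDestination": 2.307e+03,
    "StarRating": 3.699e+03,
    "IsMetroCity:IsTouristDestination": -1.447e+03,
    "HasSwimmingPool:IsNewYearEve": 1.768e+03,
    "IsTouristDestination:IsNewYearEve": 7.727e+02,
}


def predict_room_rent(pool: int, capacity: float, population: float,
                      tourist: int, star: float, metro: int, nye: int) -> float:
    """Evaluate the fitted linear equation for one hotel on one date."""
    return (
        COEF["(Intercept)"]
        + COEF["HasSwimmingPool"] * pool
        + COEF["HotelCapacity"] * capacity
        + COEF["Population"] * population
        + COEF["IsTouristDestination"] * tourist
        + COEF["StarRating"] * star
        + COEF["IsMetroCity:IsTouristDestination"] * metro * tourist
        + COEF["HasSwimmingPool:IsNewYearEve"] * pool * nye
        + COEF["IsTouristDestination:IsNewYearEve"] * tourist * nye
    )


# Hypothetical profile: values chosen for illustration only.
profile = dict(pool=1, capacity=100, population=1_000_000,
               tourist=1, star=5, metro=0)

regular_day = predict_room_rent(nye=0, **profile)
new_year_eve = predict_room_rent(nye=1, **profile)
print(f"regular day:    Rs {regular_day:,.2f}")
print(f"New Year's Eve: Rs {new_year_eve:,.2f}")
print(f"uplift:         Rs {new_year_eve - regular_day:,.2f}")
```

For this profile, the New Year's Eve uplift is the sum of the two New Year interaction coefficients, since both the pool and tourist dummies equal 1.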
