ASSIGNMENT-01 SUBMITTED BY- ASHUTOSH KUMAR

Q1- Describe a big data system with respect to its applications. ANS- Big data is a term that describes the large volume of data, both structured and unstructured, that inundates a business on a day-to-day basis. But it is not the amount of data that is important; it is what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves. Big Data describes collections of data that are huge in size and still growing exponentially with time; such data is so large and complex that none of the traditional data management tools can store or process it efficiently. Examples of Big Data: social media platforms generate huge volumes of posts, photos and videos every day, and a single jet engine can generate more than 10 terabytes of data in 30 minutes of flight time.
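As a quick sanity check on the jet-engine figure above, 10 terabytes in 30 minutes corresponds to a sustained rate of several gigabytes per second:

```python
# Illustrative arithmetic using the figures from the text:
# a jet engine producing 10 TB in 30 minutes of flight time.
terabytes = 10
seconds = 30 * 60
rate_gb_per_s = terabytes * 1000 / seconds  # 1 TB = 1,000 GB (decimal units)
print(f"{rate_gb_per_s:.2f} GB/s")  # about 5.56 GB/s
```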

Q2. Explain the 3Vs with respect to big data. ANS-

Volume – The name Big Data itself is related to an enormous size. The volume of data plays a crucial role in determining the value that can be extracted from it, and whether a particular dataset can be considered Big Data at all depends on its volume. Hence, 'Volume' is one characteristic which needs to be considered while dealing with Big Data.

Variety – The next aspect of Big Data is its variety. Variety refers to heterogeneous sources and the nature of the data, both structured and unstructured. In earlier days, spreadsheets and databases were the only sources of data considered by most applications. Nowadays, data in the form of emails, photos, videos, monitoring devices, PDFs, audio, etc. is also considered by analysis applications. This variety of unstructured data poses issues for storing, mining and analyzing the data.



Velocity – The term 'velocity' refers to the speed at which data is generated. How fast the data is generated and processed to meet demand determines the real potential in the data. Big Data velocity deals with the speed at which data flows in from sources like business processes, application logs, networks, social media sites, sensors, mobile devices, etc. The flow of data is massive and continuous.

Q3. Describe healthcare analytics for big data. ANS- The application of big data analytics in healthcare has many positive, and even life-saving, outcomes. Big data refers to the vast quantities of information created by the digitization of everything, consolidated and analyzed by specific technologies. Applied to healthcare, it uses the health data of a population (or of a particular individual) to help prevent epidemics, cure disease, cut down costs, etc. Big Data Applications in Healthcare:

Patient Predictions for Improved Staffing

For our first example of big data in healthcare, consider one classic problem that any shift manager faces: how many people do I put on staff at any given time? Put on too many workers and you run the risk of unnecessary labor costs adding up; put on too few and you risk poor customer service outcomes, which can be fatal for patients in this industry. Big data is helping to solve this problem, at least at a few hospitals in Paris. A Forbes article details how four hospitals which are part of the Assistance Publique-Hôpitaux de Paris have been using data from a variety of sources to produce daily and hourly predictions of how many patients are expected at each hospital.
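The per-hour prediction described above can be sketched in highly simplified form as averaging historical arrival counts for each hour of the day. The data and function below are illustrative only, not the hospitals' actual model:

```python
# Minimal sketch: predict expected patient arrivals for a given hour of day
# by averaging historical counts observed at that hour.
from collections import defaultdict

def predict_by_hour(history):
    """history: list of (hour_of_day, patient_count) records."""
    by_hour = defaultdict(list)
    for hour, count in history:
        by_hour[hour].append(count)
    # Average the observed counts for each hour.
    return {hour: sum(counts) / len(counts) for hour, counts in by_hour.items()}

history = [(9, 40), (9, 44), (9, 42), (14, 60), (14, 64)]  # made-up data
print(predict_by_hour(history))  # {9: 42.0, 14: 62.0}
```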

Electronic Health Records (EHRs)

It is the most widespread application of big data in medicine. Every patient has their own digital record, which includes demographics, medical history, allergies, laboratory test results, etc. Records are shared via secure information systems and are available to providers from both the public and private sectors. Every record consists of one modifiable file, which means that doctors can make changes over time with no paperwork and no danger of data replication.

Real-Time Alerting

Other examples of big data analytics in healthcare share one crucial functionality – real-time alerting. In hospitals, Clinical Decision Support (CDS) software analyzes medical data on the spot, providing health practitioners with advice as they make prescriptive decisions. However, doctors want patients to stay away from hospitals to avoid costly in-house treatments. Analytics, already trending as one of the business intelligence buzzwords in 2019, has the potential to become part of a new strategy: wearables will collect patients' health data continuously and send this data to the cloud.
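A minimal sketch of the real-time alerting idea: readings streaming in from wearables are checked against fixed thresholds, and out-of-range values raise alerts. The thresholds and patient IDs below are illustrative only, not clinical guidance:

```python
# Illustrative thresholds for heart-rate readings (beats per minute).
LOW, HIGH = 50, 120

def check_stream(readings):
    """readings: iterable of (patient_id, bpm); returns out-of-range readings."""
    alerts = []
    for patient_id, bpm in readings:
        if bpm < LOW or bpm > HIGH:
            alerts.append((patient_id, bpm))
    return alerts

readings = [("p1", 72), ("p2", 135), ("p3", 44)]  # made-up stream
print(check_stream(readings))  # [('p2', 135), ('p3', 44)]
```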

Prevent Opioid Abuse in the US

Our fourth example of big data in healthcare tackles a serious problem in the US. Here is a sobering fact: at the time of writing, overdoses from misused opioids have caused more accidental deaths in the U.S. than road accidents, which were previously the most common cause of accidental death.

Q4. Difference between Structured, Unstructured and Siloed data. ANS

Structured

Any data that can be stored, accessed and processed in a fixed format is termed 'structured' data. Over time, computer science has achieved great success in developing techniques for working with this kind of data (where the format is known well in advance) and in deriving value from it. However, issues now arise when such data grows to a huge extent; typical sizes are in the range of multiple zettabytes. Example of structured data: an 'Employee' table in a database.

Unstructured

Any data with an unknown form or structure is classified as unstructured data. In addition to its huge size, unstructured data poses multiple challenges when processing it to derive value. A typical example of unstructured data is a heterogeneous data source containing a combination of simple text files, images, videos, etc. Organizations today have a wealth of data available to them but, unfortunately, they do not know how to derive value from it, since this data is in its raw, unstructured form. Example of unstructured data: the output returned by a Google search.



Siloed data.

Semi-structured data can contain both forms of data. It appears structured but is not actually defined by, for example, a table definition in a relational DBMS. A typical example of semi-structured data is data represented in an XML file, such as personal data stored in an XML document.
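As a small illustration of working with such semi-structured data, the sketch below parses personal records from an XML string using Python's standard library; the element names are made up for the example:

```python
# Parse semi-structured personal data from XML with the standard library.
import xml.etree.ElementTree as ET

xml_data = """
<people>
  <person><name>Asha</name><age>29</age></person>
  <person><name>Ravi</name><age>34</age></person>
</people>
"""

root = ET.fromstring(xml_data)
# Each <person> has the same tags here, but XML would tolerate missing
# or extra fields -- that flexibility is what makes it semi-structured.
records = [(p.findtext("name"), int(p.findtext("age")))
           for p in root.findall("person")]
print(records)  # [('Asha', 29), ('Ravi', 34)]
```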

Q5. Discuss what launched the Big Data Era. ANS- Several developments converged to launch the Big Data era:

  • Changing signal timing – data began to be generated continuously rather than in periodic batches.
  • Cheap storage – it became possible to store all of the world's music on affordable hardware.
  • Mobile growth – around 5 billion mobile users by 2010.
  • Transaction volume – MasterCard processing on the order of 74 billion transactions per year.
  • Retail data – Walmart handling on the order of 1 million customer transactions per hour.
  • Accurate weather prediction driven by massive sensor datasets.

Q6. Describe the 6Vs of big data. ANS- The six Vs are:

Volume: The ability to ingest, process and store very large datasets. The data can be generated by machines, networks, human interactions with systems, etc. The emergence of highly scalable, low-cost data processing platforms helps to support such huge volumes. The data is measured in petabytes or even exabytes.



Velocity: The speed of data generation and frequency of delivery. The data flow is massive and continuous, which is valuable to researchers as well as to businesses making decisions for strategic competitive advantage and ROI. To process data arriving at high velocity, tools known as streaming analytics were introduced. Sampling the data helps in addressing issues with both volume and velocity.
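The sampling idea mentioned above can be illustrated with reservoir sampling, which keeps a fixed-size uniform random sample from a stream of unknown length, so a high-velocity feed can be summarized in bounded memory. This is a generic sketch, not any specific product's implementation:

```python
# Reservoir sampling: maintain a uniform random sample of size k
# from a stream, using O(k) memory and one pass over the data.
import random

def reservoir_sample(stream, k):
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = random.randint(0, i)     # each item kept with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample

random.seed(0)
print(reservoir_sample(range(1_000_000), 5))  # 5 items drawn uniformly
```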

Variety: Refers to data from different sources and of different types, which may be structured or unstructured. Unstructured data creates problems for storage, data mining and analysis. As the volume of data has grown, the variety of data types has been growing fast as well.



Variability: Refers to establishing whether the contextualizing structure of the data stream is regular and dependable, even in conditions of extreme unpredictability. It defines the need to obtain meaningful data considering all possible circumstances.



Veracity: Refers to the biases, noise and abnormalities in data. This is where we need to identify the relevance of data and ensure data cleansing is done so that only valuable data is stored. Verify that the data is suitable for its intended purpose and usable within the analytic model; the data should be tested against a set of defined criteria.



Value: Refers to the purpose, scenario or business outcome that the analytical solution has to address. Does the data have value? If not, is it worth collecting and storing at all? Analysis also needs to be performed in a way that meets ethical considerations.

Q8. Examine Big Data's contributions to marketing. ANS- Big data is more than just a buzzword. In fact, the huge amounts of data that we're gathering could well change all areas of our lives, from improving healthcare outcomes to helping manage traffic levels in metropolitan areas and, of course, making our marketing campaigns far more powerful.

More targeted advertising

As publishers gather more and more data about their visitors, they will be able to serve up increasingly relevant advertising. In the same way that Google and Facebook already offer detailed targeting options, third-party vendors will offer the same array of choices. Imagine being able to target people based on the articles they have read, or on a lookalike audience of your ideal reader.

Semantic search

Semantic search is the process of searching in natural-language terms instead of the short bursts of keywords that we are more used to. Big data and machine learning make it easier for search engines to fully understand what a user is searching for, and smart marketers are beginning to incorporate this into their site search functionality to improve the user experience for their visitors.

More relevant content

In the same way that Netflix can serve up personalized recommendations, publishers will be able to serve more relevant content to their visitors by tapping into their wealth of data to determine which content people are most likely to enjoy. Content marketers will be able to get in on the act too, and digital marketers will need to stop thinking of their blog as a static site. In the same way that you get different results when you Google the same phrase in different locations, your blog should look different depending on who is looking at it.

Q9. Explain the definition of astronomical scale. ANS- Astronomy is the study of the universe, and when studying the universe we often deal with unbelievable sizes and unfathomable distances. To help us get a better understanding of these sizes and distances, we can put them to scale. Scale is the ratio between an actual object and a model of that object. Some common examples of scaled objects are maps, toy model kits and statues. Maps and toy model kits are usually much smaller than the objects they represent, whereas statues are normally larger than their analogs. Example sheet:

Determine the ratio:
1. Select the fitness ball as the object that the Sun will be scaled to.
2. The diameter of the fitness ball is 24", and the diameter of the Sun is 1,400,000 km.
3. Use the metric system.
4. 2.54 cm = 1", so 24" × 2.54 cm/inch ≈ 61 cm.
5. The fitness ball's diameter needs to be in metres: 61 cm × 1 m/100 cm = 0.61 m. The Sun's diameter also needs to be in metres: 1,400,000 km × 1,000 m/km = 1,400,000,000 m.
6. To get the ratio, divide the diameter of the fitness ball by that of the Sun: 0.61 m / 1,400,000,000 m ≈ 4.4 × 10⁻¹⁰.

Solar System Object | Actual Diameter (km) | Actual Diameter (m) | Scaled Diameter (cm)
Sun                 | 1,400,000            | 1,400,000,000       | 61
Mercury             | 4,900                | 4,900,000           | 0.21
Venus               | 12,000               | 12,000,000          | 0.52
Earth               | 13,000               | 13,000,000          | 0.57
Mars                | 6,800                | 6,800,000           | 0.30
Jupiter             | 140,000              | 140,000,000         | 6.1
Uranus              | 51,000               | 51,000,000          | 2.2
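The worked example and the scaled diameters in the table can be reproduced with a few lines of arithmetic (the 24-inch ball is about 0.61 m across, giving a ratio of roughly 4.4 × 10⁻¹⁰):

```python
# Reproduce the scale-model calculation from the worked example.
INCH_CM = 2.54
ball_m = 24 * INCH_CM / 100          # 24-inch fitness ball in metres (~0.61 m)
sun_m = 1_400_000 * 1_000            # Sun's diameter in metres
ratio = ball_m / sun_m               # ~4.4e-10

diameters_km = {"Sun": 1_400_000, "Mercury": 4_900,
                "Earth": 13_000, "Jupiter": 140_000}
for body, km in diameters_km.items():
    scaled_cm = km * 1_000 * ratio * 100   # km -> m, apply ratio, m -> cm
    print(f"{body}: {scaled_cm:.2f} cm")
```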

Q.10 Discuss the limitations of Hadoop. ANS- Hadoop is an open-source software framework for distributed storage and distributed processing of extremely large data sets. Big limitations of Hadoop for big data analytics:

Issue with Small Files: Hadoop is not suited to small data. The system lacks the ability to efficiently support random reads of small files because of its high-capacity design. Small files are a major problem in HDFS: a small file is one significantly smaller than the HDFS block size (128 MB by default). When huge numbers of small files are stored, HDFS cannot handle them well, because it was designed to work with a small number of large files rather than a large number of small files.
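The small-files problem can be made concrete with a back-of-the-envelope estimate: the HDFS NameNode keeps every file and block object in heap memory, commonly estimated at roughly 150 bytes per object (a rule of thumb, not an exact figure):

```python
# Rough estimate of NameNode heap usage, assuming ~150 bytes per
# in-memory object (file or block) -- a commonly cited rule of thumb.
OBJ_BYTES = 150

def namenode_mem_gb(num_files, blocks_per_file=1):
    objects = num_files * (1 + blocks_per_file)  # one file object + its blocks
    return objects * OBJ_BYTES / 1e9

# 100 million single-block small files vs. 100,000 large files:
print(f"{namenode_mem_gb(100_000_000):.1f} GB")  # ~30 GB of heap
print(f"{namenode_mem_gb(100_000):.3f} GB")      # a tiny fraction of that
```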

Slow Processing Speed: In Hadoop, MapReduce processes large data sets with a parallel, distributed algorithm. Two kinds of tasks need to be performed, Map and Reduce, and MapReduce requires a lot of time to perform them, thereby increasing latency. Data is distributed and processed over the cluster in MapReduce, which adds time and reduces processing speed.
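The Map and Reduce phases described above can be sketched on a single machine with a toy word count; real Hadoop distributes these phases (plus a shuffle between them) across a cluster:

```python
# Toy single-machine sketch of the MapReduce model: map emits
# (word, 1) pairs, reduce groups by key and sums the counts.
from collections import defaultdict

def map_phase(lines):
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    grouped = defaultdict(int)
    for word, count in pairs:   # grouping by key stands in for the shuffle
        grouped[word] += count
    return dict(grouped)

lines = ["big data is big", "data flows fast"]
print(reduce_phase(map_phase(lines)))
# {'big': 2, 'data': 2, 'is': 1, 'flows': 1, 'fast': 1}
```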

No Real-Time Data Processing: Apache Hadoop is designed for batch processing, meaning it takes a huge amount of data as input, processes it, and produces the result. Although batch processing is very efficient for processing high volumes of data, depending on the size of the data and the computational power of the system, output can be delayed significantly; Hadoop is therefore not suitable for real-time data processing.

No Delta Iteration: Hadoop is not efficient for iterative processing, as it does not support cyclic data flow (i.e., a chain of stages in which the output of each stage is the input to the next).
