Open Source SOA in the Cloud: Data Analytics in the Cloud

Tom Plunkett, Michael Sick

SOA World 2009

Overview

Data Analytics in the Cloud

Introductions
• Who are we?
• Baselines & definitions

Opportunity
• Targeted Use Cases
• Technical convergence & opportunities
• Commercial opportunities & drivers

Technology & Standards
• State of current technology
• Commercial & FOSS solutions
• Hadoop Focus

Challenges
• Challenges to Meet Target Use Cases
• Economic challenges & the role of “free”
• Wide scale challenges in Cloud and data analytics

Questions
• Questions
• Contacts

This work is licensed under a Creative Commons Attribution 3.0 United States License


Data Analytics in the Cloud: Introductions


Tom Plunkett

• Extensive Federal Government experience
• IBM Certified SOA Solution Designer
• Patents
• Teaches OOP and Java for Virginia Tech

Michael Sick

• Commercial & Federal Enterprise Architect
• Owner: Serene Software Inc., an EA services firm
• Clients include: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture
• Fascinated by technology, 15 years running

Serene Software

• Serene is a boutique consulting company focusing on delivery of Enterprise Architecture services and solutions
• Service Areas
– IT Governance
– IT Strategy
– IT Cost Containment
– Service Oriented Architectures (SOA)
– IT Solution Selection
– IT Audit & Analysis
• Experience includes: BAE, USAF, Raytheon, BearingPoint, McGraw-Hill, Sun Microsystems, Badcock Furniture, …
• Founded in 2003 (privately held, no debt) and headquartered in Jacksonville, FL

Draft NIST Definition of Cloud Computing

A model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service provider interaction.

Essential Characteristics
• On-demand self-service
• Ubiquitous network access
• Location independent resource pooling
• Rapid elasticity
• Measured Service

Delivery Models
• Cloud Software as a Service (SaaS)
• Cloud Platform as a Service (PaaS)
• Cloud Infrastructure as a Service (IaaS)

Deployment Models
• Private cloud
• Community cloud
• Public cloud
• Hybrid cloud

Source: Draft NIST Definition of Cloud Computing, 06/2009

OSI Open Source Definition

• Free Redistribution
• Source Code
• Derived Works
• Integrity of The Author's Source Code
• No Discrimination Against Persons or Groups
• No Discrimination Against Fields of Endeavor
• Distribution of License
• License Must Not Be Specific to a Product
• License Must Not Restrict Other Software
• License Must Be Technology-Neutral

Source: http://www.opensource.org/docs/osd

The Open Group SOA Definition

• Service-Oriented Architecture (SOA) is an architectural style that supports service orientation
• Service orientation is a way of thinking in terms of services and service-based development and the outcomes of services

Source: http://www.opengroup.org/projects/soa/doc.tpl?gdid=10632

Data Clouds & Data Grids: What's the difference?

The terms Data Cloud and Data Grid are often used interchangeably; we make the following distinctions.

Data Grids
• Grid computing system optimized to share large amounts of distributed data
• Focus on technical capabilities
• Often combined with computational grid computing systems
• Data often moved to compute grid for use
• Often oriented towards highly structured scientific data computing applications

Data Clouds
• Focus on the perception of infinite storage and computing capacity
• Focus on cost, virtualization & flexible capacity
• Enable scale-up/scale-down economics
• Data moved rarely; locality is a key feature
• Clouds thus far focus on column oriented, massively scalable data stores

Sources: Wikipedia & [Grossman 1]

Definition: Mashups

• A web-available resource that combines data/functions from two or more external resources
• The idea of mashup efforts is to reduce the cost of producing and consuming resources
• Integration should be fast and easy
• Often focuses on widely available formats/protocols like RSS or Atom over HTTP
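
The mashup definition above can be illustrated with a minimal Python sketch that combines items from two RSS sources into one listing. The feed contents, titles, and example.com links are hypothetical stand-ins invented for illustration; a real mashup would fetch the feeds over HTTP rather than use inline strings.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 documents standing in for two external resources.
FEED_A = """<rss version="2.0"><channel><title>Feed A</title>
<item><title>Gamma</title><link>http://example.com/a/gamma</link></item>
<item><title>Alpha</title><link>http://example.com/a/alpha</link></item>
</channel></rss>"""

FEED_B = """<rss version="2.0"><channel><title>Feed B</title>
<item><title>Beta</title><link>http://example.com/b/beta</link></item>
</channel></rss>"""


def feed_items(feed_xml):
    """Parse one RSS document into a list of plain dictionaries."""
    root = ET.fromstring(feed_xml)
    source = root.findtext("channel/title")
    return [
        {
            "source": source,
            "title": item.findtext("title"),
            "link": item.findtext("link"),
        }
        for item in root.iter("item")
    ]


def mashup(*feeds):
    """Combine items from several feeds into one list sorted by title."""
    merged = []
    for feed_xml in feeds:
        merged.extend(feed_items(feed_xml))
    return sorted(merged, key=lambda entry: entry["title"])


combined = mashup(FEED_A, FEED_B)
for entry in combined:
    print(entry["source"], entry["title"], entry["link"])
```

Because both sources share a widely available format, the combining step is only a few lines, which is the cost reduction the definition refers to.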

Data Analytics in the Cloud: Opportunities


Use Case: Cloud Data Analytical Tools for Intelligence Community Field Analyst

Problem Statement: Analytical tools are obsolete on deployment; field analysts need timely, configurable data analytics. How does cloud-based DA meet the needs of IC analysts?

Customer Problem
• Traditional business intelligence tools require years to develop
• Field analysts confront situations which are rapidly changing
• Petabytes of data require analysis

Cloud Analytical Tools Solution
• Recomposable Cloud Computing Data Analytical Tools
– Apache Hadoop
– Mashups
– Service-Oriented Architecture

Customer Value
• Enabling field analysts to quickly build the analytical tool they need to analyze petabytes of data

Why the “Buzzword” Soup? Convergence of Capabilities

[Diagram: converging capabilities: Cloud Computing, Data Analytics, SaaS, Mashups, Free Open Source Software (FOSS), Virtualization]

Convergence of capabilities: new opportunities in breadth and depth of DA services

• Big Data: Cloud disk and data storage engines make petabyte environments available to new clients
• Value Based Billing: Heavy use of FOSS in the cloud reduces costs directly & indirectly
• Capacity Scaling: Scaling up/down of capacity in pay-go fashion makes DA available to a wider audience
• Composable UIs: Capability to assemble DA results into various interfaces
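
The capacity-scaling point can be made concrete with a small arithmetic sketch in Python. All numbers here are hypothetical: the hourly demand profile, the price of 10 cents per node-hour, and the simplifying assumption that owned and rented nodes cost the same per hour are illustrative choices, not figures from the talk.

```python
def fixed_capacity_cost(hourly_demand, cents_per_node_hour):
    """Cost when enough nodes for peak demand are paid for in every hour."""
    peak_nodes = max(hourly_demand)
    return peak_nodes * len(hourly_demand) * cents_per_node_hour


def pay_as_you_go_cost(hourly_demand, cents_per_node_hour):
    """Cost when only the nodes actually used in each hour are paid for."""
    return sum(hourly_demand) * cents_per_node_hour


# Hypothetical workload: a short analytic burst in an otherwise quiet period.
demand = [2, 2, 2, 40, 40, 2, 2, 2]   # nodes needed in each hour
price = 10                            # cents per node-hour (assumed)

fixed = fixed_capacity_cost(demand, price)
elastic = pay_as_you_go_cost(demand, price)
print(f"fixed capacity: {fixed} cents, pay-as-you-go: {elastic} cents")
```

With a bursty profile like this one, paying only for the nodes in use costs a fraction of provisioning for the peak, which is one way to read the claim that pay-go scaling opens DA to a wider audience.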

Early Data Analytic Cloud Consumers/Providers

Cloud DA opportunities, grouped by profile. Each entry gives the type, example companies, and services.

Internet Scale Service Providers
• Big Internet Companies: Yahoo, Amazon – can build DA on inf.
• SaaS Companies: Force.com – DA & Warehousing to SBA’s
• Social Platforms: Facebook – sell DA access to anon. user info

Large datacentric Traditional Co’s
• Insurers: BCBS – private clouds across consortium
• Healthcare & Biotech: Kaiser Permanente – common DA services
• Rating Agencies: S & P – open DA cloud to customers

Government Organizations
• Intelligence Community: CIA – private org-wide Cloud
• Defense Managed Services: DISA – offer DA to .mil clients
• Healthcare: SSA – offer DA to fraud prevention analysts

DAaaS Providers
• DAaaS Infrastructure: Cloudera – managed Hadoop instances
• SMB DAaaS Provider: ?? – managed DAaaS, simplified, low cost

Data Analytics in the Cloud: Technology & Standards


Google MapReduce

• Algorithm for computing distributed problems using a divide and conquer approach with a cluster of nodes
• Master node Maps input into smaller sub-problems and distributes the work to the cluster. A worker node may further map the work for a further cluster of nodes.
• The worker nodes then process the smaller problems, and return the answers back to the master node
• Master node then Reduces the set of answers into the answer to the original problem
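
The divide-and-conquer flow described above can be sketched in a few lines of Python. This is a single-process illustration under simplifying assumptions, not Google's implementation: the workers run one after another instead of on separate nodes, and the function names `split_input`, `worker`, and `map_reduce_word_count` are hypothetical.

```python
from collections import Counter
from functools import reduce


def split_input(lines, n_workers):
    """Master 'map' step: divide the input into one sub-problem per worker."""
    return [lines[i::n_workers] for i in range(n_workers)]


def worker(chunk):
    """Worker step: solve one sub-problem (count the words in its lines)."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def map_reduce_word_count(lines, n_workers=3):
    """Distribute the work, then reduce the partial answers into one answer."""
    chunks = split_input(lines, n_workers)
    # On a real cluster each chunk would be processed on a different node.
    partial_answers = [worker(chunk) for chunk in chunks]
    # Master 'reduce' step: merge the partial counts.
    return reduce(lambda left, right: left + right, partial_answers, Counter())


result = map_reduce_word_count(["a b a", "b c"], n_workers=2)
print(result)
```

Word counting is used only because its partial answers merge by simple addition; any problem whose sub-results can be combined this way fits the same shape.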

Apache Hadoop

• Open Source implementation of the MapReduce algorithms
• Hadoop can store and process petabytes of data
• Subprojects include HBase, Chukwa, Hive, Pig, and ZooKeeper
• Yahoo (more than 100,000 CPUs in >25,000 computers running Hadoop) and other companies make extensive use of Hadoop
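
One way to try Hadoop's MapReduce model without writing Java is Hadoop Streaming, in which the mapper and reducer are ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output. The Python sketch below shows that contract for a word count. The function names are hypothetical, and `run_locally` only imitates the sort that Hadoop performs between the two phases; it is an illustration rather than a deployment recipe.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' line for every word seen."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts of each word; input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(lines):
    """Imitate map, the framework's sort by key, and reduce in one process."""
    return list(reduce_lines(sorted(map_lines(lines))))


output = run_locally(["Hadoop stores data", "hadoop processes data"])
print("\n".join(output))
```

For use with Hadoop Streaming, each function would be wrapped in its own script that iterates over `sys.stdin` and prints the yielded lines.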

As-Is Hadoop Simplified Reference Architecture

[Diagram components: Structured Data, Unstructured Data, ETL, Apache Hadoop, HBase, Pig, Hive, Chukwa, ZooKeeper, Business Intelligence]

Apache Hadoop Sub-projects

Hadoop subprojects, with capabilities and example companies:

Chukwa (example company: Yahoo)
• Data collection system for monitoring and analyzing large distributed systems

HBase (example company: Yahoo)
• Similar to Google’s BigTable
• Distributed database for structured data
• Multi-dimensional sorted map

Hive (example company: Facebook)
• Data warehouse infrastructure for large datasets
• Hive QL query language

Pig (example company: Yahoo)
• High-level language for data analysis
• Compiler for Map-Reduce programs

ZooKeeper (example company: Yahoo)
• Configuration, Naming, Distributed Synchronization, and group services
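
The phrase "multi-dimensional sorted map" used for HBase can be unpacked with a toy in-memory model in Python: values are addressed by row key, column, and timestamp, and keys are kept in sorted order so that a range of rows can be scanned cheaply. The class `TinySortedMap` and its methods are hypothetical names for illustration; this is not the HBase API, and it ignores distribution and persistence entirely.

```python
import bisect


class TinySortedMap:
    """Toy model of a (row, column, timestamp) -> value sorted map."""

    def __init__(self):
        self._keys = []    # sorted list of (row, column, -timestamp)
        self._values = {}  # key tuple -> stored value

    def put(self, row, column, timestamp, value):
        # The timestamp is negated so newer versions sort first.
        key = (row, column, -timestamp)
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, row, column):
        """Return the newest value for row/column, or None if absent."""
        probe = (row, column, float("-inf"))
        i = bisect.bisect_left(self._keys, probe)
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._values[self._keys[i]]
        return None

    def scan(self, start_row, stop_row):
        """Yield (row, column, timestamp, value) for start_row <= row < stop_row."""
        i = bisect.bisect_left(self._keys, (start_row,))
        while i < len(self._keys) and self._keys[i][0] < stop_row:
            row, column, negated = self._keys[i]
            yield row, column, -negated, self._values[self._keys[i]]
            i += 1


table = TinySortedMap()
table.put("row2", "cf:a", 1, "x")
table.put("row1", "cf:a", 1, "old")
table.put("row1", "cf:a", 2, "new")
print(table.get("row1", "cf:a"))
print(list(table.scan("row1", "row2")))
```

Keeping keys sorted is what makes row-range scans a matter of finding the start position and walking forward, which is the access pattern this kind of store is built around.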

Data Analytics in the Cloud: Challenges


To-Be Simplified Hadoop Architecture

[Diagram components: Structured Data, Unstructured Data, ETL, Apache Hadoop, HBase, Pig, Hive, Chukwa, ZooKeeper, Query Language, Algorithm Library, REST API, SOAP API, Business Intelligence]

Key Challenges

Emerging challenges by area:

Infrastructure
• Hardware: Speed of rack interconnects, multi-core
• Parallelization: Core platform, Data Analytic Components
• Node Affinity: Make use of super nodes, XML i/o, en/de-crypt

Adoption
• Cost: “brutally efficient” pricing, FOSS advantages
• Cost Models: Accurate, open models of CapEx, OpEx costs
• Migration Pain: Full warehouse migration, ETL

Administration
• Ease of Admin.: Parallel current RDBMS, Warehouse admin
• Debugging: Distributed debugging, integration w/ Provider
• Flexible Provisioning: Multi-level provisioning – co., dept, individual
• System Reporting: Reporting, audit trails, view to DA system

Input & Analysis
• ETL Integration: Interface, metadata optimized for ETL loading
• Intuitive APIs: Declarative & programmatic cross language
• Product Integration: BI, Applications (SAP, Oracle Financial, Lawson)

Output
• Data Visualization: Viewing & drill down of very large data sets
• Intuitive APIs: Declarative & programmatic cross language
• Mashups/Dynamics: Easy discovery of data & functions & workflows

Solutions: Projected & In-Progress

Emerging challenges by area, with projected or in-progress solutions:

Infrastructure
• Hardware: Interconnect costs dropping, hardware maturing
• Parallelization: Platforms advance, market for components
• Node Affinity: Discovery of capability, affinity into Hadoop, …

Adoption
• Cost: FOSS’s game to lose, small diff * a lot = a lot
• Cost Models: Industry standard ROI/IRR models for CC
• Migration Pain: Migration toolkits for traditional DW products

Administration
• Ease of Admin.: Integrated & extended admin packages
• Debugging: Commercial distributed debugging
• Flexible Provisioning: Multi-level provisioning – co., dept, individual
• System Reporting: Reporting, audit trails, view to DA system

Input & Analysis
• ETL Integration: ETL interface, support of popular packages
• Intuitive APIs: SQL-like interface in core, language bindings
• Product Integration: 3rd party adaptors, IWay et al

Output
• Data Visualization: Modeling, meta-data, traceability, and new UIs
• Intuitive APIs: SQL-like interface in core, language bindings
• Mashups/Dynamics: Generic datatypes, discovery services

Data Analytics in the Cloud: Questions


Questions & Contact Information

Principal Architect / Partner
Michael A. Sick
888.777.1847

Cloud Computing Architect
Tom Plunkett
888.777.1847

Address
Serene Software
116 19th Ave. North, Suite 503
Jacksonville Beach, FL
URL: www.serenesoftware.com
