Cloud Computing and the NetBeans IDE Enable the Army Research Laboratory’s Next-Generation Simulation System.
Ron Bowers, Army Research Laboratory
Dennis Reedy, Elastic Grid LLC
3 June 2009
Agenda
> Where we work and what we do
> Overview of the architecture
> Why and how we are using Cloud Computing
> Next steps and follow up
The Army Research Laboratory
> The Army Research Laboratory (ARL) is the Army's corporate basic and applied research laboratory. Our mission is to provide innovative science, technology, and analysis to enable full-spectrum operations.
> We represent the Survivability/Lethality Analysis Directorate (SLAD) of ARL.
We have some experience with computers...
SLAD Mission
Ensure that US personnel and equipment survive and function effectively in hostile circumstances:
> Ballistic Threats
> Nuclear, Biological and Chemical
> Electronic Warfare
> Information Warfare
SLAD performs both experimentation and modeling.
SLAD Ballistic Vulnerability/Lethality (V/L) Modeling
> SLAD’s primary tool for performing ballistic V/L analysis is MUVES.
> MUVES development began in 1984.
> The current version (2.16) is a single-threaded C application.
> We are currently developing MUVES 3, which is an all-new replacement system.
Program Focus
> Provide the next generation of simulation system for the V/L analyst community
  • Mostly Java.
  • Dynamic, distributed, and service-oriented.
  • Will support over 100 concurrent users.
  • Incorporates a computational grid: a parallelized system that distributes tasks and computes results, which are displayed graphically.
  • Will operate in both “batch” and interactive modes.
Interesting Challenges
> We have few servers, but many powerful workstations.
  • Architecture must exploit analyst community machines
  • Share CPU, memory and disk
  • Heterogeneous deployment environment
> Required application assets vary
  • Need real-time provisioning of application assets
  • Must be able to route functionality to machines that are best capable of executing tasks/functions
  • Must be able to scale on demand based on real-time need and use of the system
> Legacy of performance issues and nightmares
Solution Approach
> Choose technology that embraces dynamic distributed capabilities
> Craft a loosely coupled service-oriented architecture that segments the system into functional roles
> Choose persistence technologies and approaches that allow for low latency and high concurrency
> Represent data as it moves through the system
  • In-flight (hot in-memory), Swap, Long Term, Archived
> Keep disk I/O out of the main stream processing
What’s Underneath
[Layered architecture diagram, built up across several slides]
> Domain-specific Services and Algorithms
> Application Infrastructure
> Dynamic Container
> Quality of Service
> Monitoring and Management
> Persistence Management
Technologies highlighted in successive builds of the diagram:
> JavaSpace, Apache ActiveMQ, Apache Derby
> Rio
> Gomez
Gomez
> Was established as a prototype for MUVES 3 architectural enhancements.
> Now forms the basis of the MUVES 3 service-oriented architecture (SOA).
> Includes all of the non-sensitive services used by MUVES 3.
> Is an open source project (LGPLv3).
MUVES 3 System Organization
[Diagram of system components]
> MUVES 3 UI (Client)
> Gateway
  • Attachment point for clients.
  • Monitors system load.
  • Controls job submission.
> Sim
  • Executes analysis jobs.
> Persistence
  • Stores analysis results.
MUVES 3 UI
> The MUVES 3 UI is built on the NetBeans Platform.
> Used for:
  • Input preparation
  • Team collaboration
  • Job submittal and monitoring
  • Result visualization
> Components that interact with back-end services are developed in Gomez and used in the MUVES 3 UI.
Our NetBeans Experience
> Favorite things
  • Swing-based
  • Fast form development
  • Easy deployment via JNLP
  • Extremely well supported by the community
> Biggest issue
  • Integrating libraries that are updated frequently
MUVES 3 Execution
[Diagram sequence; labels recovered from five builds]
> Components: Client, Gateway, Sim Pools (one marked busy), Persistence.
> Client and Gateway: submit job, select Sim Pool.
> Inside the selected Sim Pool: Job Monitor, Task Space, Workers.
> Deploy additional services: Ray Tracer, Vehicle Performance, Personnel Vulnerability, Specialized Physics.
> Store results in Persistence; visualize results in the Client.
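The Task Space, Worker, and Job Monitor labels in the execution diagrams suggest a shared pool of tasks drained by workers. A minimal sketch of that pattern follows, assuming an in-process queue as a stand-in for the Task Space and a countdown latch as a stand-in for the Job Monitor. The class name SimPoolSketch, the method runJob, and the squaring "work" are all hypothetical placeholders and not part of MUVES 3 or Gomez.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for a sim pool: a shared task queue drained by workers. */
final class SimPoolSketch {

    /** Runs taskCount tasks on workerCount workers and returns results keyed by task id. */
    static Map<Integer, Integer> runJob(int taskCount, int workerCount) {
        BlockingQueue<Integer> taskSpace = new LinkedBlockingQueue<>();
        Map<Integer, Integer> results = new ConcurrentHashMap<>();
        CountDownLatch jobMonitor = new CountDownLatch(taskCount); // counts unfinished tasks
        for (int id = 0; id < taskCount; id++) {
            taskSpace.add(id);
        }

        ExecutorService workers = Executors.newFixedThreadPool(workerCount);
        for (int w = 0; w < workerCount; w++) {
            workers.submit(() -> {
                Integer id;
                // Each worker pulls tasks until the task space is empty.
                while ((id = taskSpace.poll()) != null) {
                    results.put(id, id * id); // placeholder for real simulation work
                    jobMonitor.countDown();
                }
            });
        }
        try {
            jobMonitor.await(); // wait until every task has reported completion
            workers.shutdown();
            workers.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("job interrupted", e);
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(runJob(8, 4).size() + " tasks completed");
    }
}
```

Because workers pull tasks rather than having tasks pushed to them, adding workers does not require changing how the job is submitted.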
Dynamic Clustering
[Diagram labels: Service Client, Association declaration, Proxy injection, Proxy, Service Interface, Service Selection Strategy, Service discovery, Service injection, Rio, Services]
Available strategies:
• Fail-Over – Uses one service unless that service goes away.
• Round-Robin – Iterates over all discovered services.
• Utilization – Like round-robin, but ignores services that are low on system resources.
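The three strategies can be illustrated with a small sketch. The interface and class names below (SelectionStrategy, FailOver, RoundRobin, Utilization) are hypothetical and are not Rio's actual API; the resource check for the utilization strategy is assumed to be supplied by the caller as a predicate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical strategy contract; not Rio's actual API. */
interface SelectionStrategy<S> {
    /** Returns the service to use for the next call, or null if none qualifies. */
    S select(List<S> discovered);
}

/** Fail-Over: stay on one service until it disappears from the discovered set. */
final class FailOver<S> implements SelectionStrategy<S> {
    private S current;

    public S select(List<S> discovered) {
        if (current == null || !discovered.contains(current)) {
            current = discovered.isEmpty() ? null : discovered.get(0);
        }
        return current;
    }
}

/** Round-Robin: iterate over all discovered services in turn. */
final class RoundRobin<S> implements SelectionStrategy<S> {
    private int next;

    public S select(List<S> discovered) {
        if (discovered.isEmpty()) {
            return null;
        }
        next = next % discovered.size(); // wrap around, also if the list shrank
        return discovered.get(next++);
    }
}

/** Utilization: round-robin over the services that pass a resource check. */
final class Utilization<S> implements SelectionStrategy<S> {
    private final Predicate<S> hasCapacity;
    private final RoundRobin<S> delegate = new RoundRobin<>();

    Utilization(Predicate<S> hasCapacity) {
        this.hasCapacity = hasCapacity;
    }

    public S select(List<S> discovered) {
        List<S> usable = new ArrayList<>();
        for (S service : discovered) {
            if (hasCapacity.test(service)) {
                usable.add(service);
            }
        }
        return delegate.select(usable);
    }
}

class SelectionDemo {
    public static void main(String[] args) {
        List<String> services = new ArrayList<>(Arrays.asList("sim-a", "sim-b", "sim-c"));
        SelectionStrategy<String> failOver = new FailOver<>();
        SelectionStrategy<String> roundRobin = new RoundRobin<>();
        // Assumption for the demo: "sim-b" is the service that is low on resources.
        SelectionStrategy<String> utilization = new Utilization<>(s -> !s.equals("sim-b"));
        for (int i = 0; i < 4; i++) {
            System.out.println(failOver.select(services) + " "
                    + roundRobin.select(services) + " "
                    + utilization.select(services));
        }
    }
}
```

In this sketch the client only ever sees the strategy's return value, which mirrors the idea of a proxy hiding which concrete service handles a call.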
The Persistence Meta-service
> Stores the results of analyses.
> Consists of four layers:
  • In-memory cache (distributed JavaSpace)
  • Swap (Apache ActiveMQ)
  • Long-term storage (Apache Derby)
  • Archive (Hibernate + Oracle)
> Layers are implemented using dynamic clustering.
> Supports data life-cycle management
  • Data storage is leased, and leases can expire.
  • When the cache fills, results are moved to swap.
> Must work for the next 20 years.
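The leasing and cache-to-swap behavior can be sketched with two in-memory maps. This is only an illustration under stated assumptions: the class LeasedResultStore and its methods are hypothetical, both tiers are plain maps rather than a JavaSpace or a message queue, the oldest cached entry is the one moved to swap, and the current time is passed in explicitly to keep the example deterministic.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical two-tier result store: a bounded cache that spills to swap. */
final class LeasedResultStore {

    private static final class Entry {
        final String result;
        final long expiresAt;

        Entry(String result, long expiresAt) {
            this.result = result;
            this.expiresAt = expiresAt;
        }
    }

    private final int cacheCapacity;
    private final Map<String, Entry> cache = new LinkedHashMap<>(); // oldest entry first
    private final Map<String, Entry> swap = new LinkedHashMap<>();  // stand-in for a slower tier

    LeasedResultStore(int cacheCapacity) {
        if (cacheCapacity < 1) {
            throw new IllegalArgumentException("cacheCapacity must be at least 1");
        }
        this.cacheCapacity = cacheCapacity;
    }

    /** Stores a result under a lease; if the cache is full, the oldest entry moves to swap. */
    void put(String key, String result, long now, long leaseMillis) {
        cache.remove(key);
        swap.remove(key);
        if (cache.size() >= cacheCapacity) {
            Iterator<Map.Entry<String, Entry>> it = cache.entrySet().iterator();
            Map.Entry<String, Entry> oldest = it.next();
            it.remove();
            swap.put(oldest.getKey(), oldest.getValue());
        }
        cache.put(key, new Entry(result, now + leaseMillis));
    }

    /** Returns the result if its lease is still valid at time now, otherwise null. */
    String get(String key, long now) {
        Entry entry = cache.containsKey(key) ? cache.get(key) : swap.get(key);
        if (entry == null) {
            return null;
        }
        if (now >= entry.expiresAt) { // lease expired: drop the data from both tiers
            cache.remove(key);
            swap.remove(key);
            return null;
        }
        return entry.result;
    }

    boolean inSwap(String key) {
        return swap.containsKey(key);
    }

    public static void main(String[] args) {
        LeasedResultStore store = new LeasedResultStore(2);
        store.put("job-1", "r1", 0L, 1000L);
        store.put("job-2", "r2", 0L, 1000L);
        store.put("job-3", "r3", 0L, 1000L); // cache is full, so job-1 moves to swap
        System.out.println("job-1 in swap: " + store.inSwap("job-1"));
        System.out.println("job-1 before expiry: " + store.get("job-1", 500L));
        System.out.println("job-1 after expiry: " + store.get("job-1", 1000L));
    }
}
```

A reader that only calls get does not need to know which tier holds the result, which is the property a layered meta-service would want to preserve.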
Cloud Computing
> Public Cloud
  • Obvious national security issues
> Private Cloud
  • Conceptually already there
    • Real-time provisioning of applications
    • Dynamic allocation
    • SLA-based approach
Cloud Computing: Applicable for Testing?
> Goals
  • Demonstrate the performance and scalability of MUVES 3 over dozens (or hundreds :-) ) of computers.
  • Execute multiple integration tests concurrently.
> Issues
  • Small number of computers available locally for testing.
  • Difficult coordination issues due to Army security policies.
> Approach
  • Test the MUVES 3 architecture (Gomez) on Amazon Elastic Compute Cloud (EC2)
Cloud Adoption Challenges
> Getting approval :)
> Administrative burden
  • We don’t want to build AMIs or go through the time to provision an entire stack every night.
> Minimize changes
  • Avoid developing special code and testing framework for cloud deployment/orchestration.
  • Ideally, transparently switch from LAN-based deployment to the cloud.
> We must preserve the dynamic distributed semantics architected in the system:
  • Service selection strategies
  • Dynamic discovery semantics
> Run multiple concurrent test cases and roll up test results.
Cloud Adoption Approach
> Use Elastic Grid (EG)
  • Eases development and deployment of Java applications into the Cloud
  • Provides automated management, fault detection, and scalability for the application
  • Allows focus on development, not cloud infrastructure
Elastic Grid Overview
> Cloud Management Fabric
  • Dynamically instantiate, monitor & manage application components
  • SLA policy driven, with strategies like service scalability, relocation, fault detection & recovery, etc.
> Cloud Virtualization Layer
  • Abstracts specific Cloud Computing provider technology
  • Allows portability across specific implementations
  • You can deploy on Amazon EC2 or a private LAN-based cloud; Eucalyptus, Sun Cloud, and others are coming soon
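An SLA-driven scalability policy can be pictured as a rule that compares a measured value against thresholds and decides whether to add or remove service instances. The sketch below is an assumption-laden illustration, not Elastic Grid's API: the class ScalingSla, its thresholds, and the utilization metric are all hypothetical.

```java
/** Hypothetical SLA policy: keep a utilization metric between two thresholds. */
final class ScalingSla {

    enum Action { SCALE_UP, SCALE_DOWN, HOLD }

    private final double low;
    private final double high;
    private final int minInstances;
    private final int maxInstances;

    ScalingSla(double low, double high, int minInstances, int maxInstances) {
        if (low >= high || minInstances > maxInstances) {
            throw new IllegalArgumentException("inconsistent SLA bounds");
        }
        this.low = low;
        this.high = high;
        this.minInstances = minInstances;
        this.maxInstances = maxInstances;
    }

    /** Decides what to do for the observed utilization (0.0 to 1.0) and instance count. */
    Action evaluate(double utilization, int instances) {
        if (utilization > high && instances < maxInstances) {
            return Action.SCALE_UP;   // over the upper threshold and room to grow
        }
        if (utilization < low && instances > minInstances) {
            return Action.SCALE_DOWN; // under the lower threshold and above the floor
        }
        return Action.HOLD;
    }

    public static void main(String[] args) {
        ScalingSla sla = new ScalingSla(0.2, 0.8, 1, 4);
        System.out.println(sla.evaluate(0.9, 2));
        System.out.println(sla.evaluate(0.1, 2));
        System.out.println(sla.evaluate(0.5, 2));
    }
}
```

Keeping the instance limits inside the policy means a burst in load cannot request more machines than the deployment allows.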
Cloud Activation & Deployment
[Diagram: Groovy Client, S3, Application Monitors, Application Agents]
Groovy Client: Build, Create Release, Deploy
1. Upload archive to S3
2. Create clusters
3. Download and deploy application resources
4. Upload JUnit test results
5. Download test results and post-process
[Second diagram: Groovy Client, S3, and three Test Clusters]
Extending Continuous Integration
> Automated build and test, both unit and integration tests
> Extend this to include Continuous Deployment
  • If CI passes, use Test Cloud Bursting to deploy, verify and validate the system
[Diagram labels: SVN Repo., Test Runner, Deploy system and run tests, Retrieve and process test results]
Next Steps
> Lots to think about
Summary
> Using the public cloud for scalability testing and verification is a good choice.
> Without Elastic Grid we would have had a much more difficult experience (and may not have done it at all).
> Looking to expand metrics gathering and collection.
> Looking to incorporate NetBeans as a client for cloud visualization.
> We are looking to make this part of our permanent development environment.
> How do we spread the gospel of cloud computing for DoD?
Ron Bowers, Army Research Laboratory
Dennis Reedy, Elastic Grid LLC