Part 1: Hadoop A demo of setting up a Hadoop cluster with PoolParty
Service-oriented deployment • I want to deploy a service (http/mail/etc.)
• I don’t care about the platform • I want it to work • I want to do it a million times • I want it now
Self-managing
Gruntwork Why?
Let’s be • Cutting-edge • Intuitive • Tool-driven • Lazy
Tools Because we are human
And not apes
Big frontier, lots of settlements • Shell scripts • Capistrano • Package managers (apt, yum, tpkg)
• Chef • Puppet
PoolParty Enjoyable cloud infrastructure
Demo
Part 2: Distributed Algorithms Discussion of distributed algorithms, the Hermes project, and “NoSQL”
What? • A distributed algorithm is an algorithm designed to run on computer hardware constructed from interconnected processors. (Wikipedia)
Why? • Because scale is becoming increasingly important
• “Datacenters” are becoming accessible
• Commodity hardware is cheap • Network is cheaper
When?
• Now
Assorted types • MapReduce • Atomic Commit • Consensus • Mutual exclusion (distributed mutex)
• Distributed search
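To make the first item on the list concrete, here is a minimal single-process sketch of the MapReduce pattern, using word counting as the job. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would run the map and reduce steps on many nodes and handle the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["the cloud is big", "the network is cheap"])
print(counts["the"])  # 2
```

Because each `map_phase` call sees only one document and each `reduce_phase` call sees only one key, both steps can be spread across nodes without coordination, which is what makes the pattern attractive at scale.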
Why it’s easy
• Math is fun
Why it’s hard • Account for failure • Unsafe networks • Data sharding • Job distribution
Decentralization assumptions • Nodes are prone to failure • Nodes are homogeneous* • The dataset is large • Nodes are cheap (easy to add/remove)
• Network is unowned
Scale • Greater utilization of hardware • Inexpensive • Cooperative application space • And it’s green
NoSQL • Scaling relational databases is not easy and seriously no fun at all
• Key/Value stores are easier to scale
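One reason key/value stores are easier to scale is that each key can be routed to a node independently of every other key. The sketch below illustrates that idea with plain hash-modulo sharding over in-memory dictionaries. The names `shard_for` and `ShardedStore` are hypothetical, and this is not how any particular store is implemented; Dynamo-style stores typically use consistent hashing instead so that adding a node moves fewer keys.

```python
import hashlib


def shard_for(key, num_shards):
    """Pick a shard from a stable hash of the key (same key, same shard)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


class ShardedStore:
    """Toy key/value store: each dict stands in for one node."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        # Only the owning shard is touched; no cross-node coordination.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key, default=None):
        return self.shards[shard_for(key, len(self.shards))].get(key, default)


store = ShardedStore(num_shards=4)
store.put("user:1", "ari")
print(store.get("user:1"))  # ari
```

A relational join, by contrast, may need rows that live on different shards, which is part of why scaling a relational database is the less fun option.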
NoSQL • BigTable (column-oriented database)
• Cassandra • Voldemort • Scalaris
Paxos • An algorithm for reaching consensus within a network of unreliable nodes.
• “Transaction” layer • Atomic commits • Strong data consistency
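The two phases of basic (single-decree) Paxos can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions: messages are plain method calls, nothing is persisted, there is no learner role, and "unreliable nodes" are simulated by leaving acceptors out of the `reachable` list. The class and function names are hypothetical and are not taken from Hermes or any other implementation.

```python
class Acceptor:
    """One voting node. State would be written to disk in a real system."""

    def __init__(self):
        self.promised = -1          # highest proposal number promised so far
        self.accepted_n = -1        # number of the proposal accepted, if any
        self.accepted_value = None  # value of the proposal accepted, if any

    def prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n."""
        if n > self.promised:
            self.promised = n
            return True, self.accepted_n, self.accepted_value
        return False, -1, None

    def accept(self, n, value):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised:
            self.promised = n
            self.accepted_n = n
            self.accepted_value = value
            return True
        return False


def propose(reachable, cluster_size, n, value):
    """Run one proposal round; return the chosen value, or None on failure."""
    majority = cluster_size // 2 + 1

    # Phase 1: collect promises from the acceptors we can reach.
    replies = [acceptor.prepare(n) for acceptor in reachable]
    promises = [(an, av) for ok, an, av in replies if ok]
    if len(promises) < majority:
        return None

    # If any acceptor already accepted a value, we must propose that value.
    prior_n, prior_value = max(promises, key=lambda p: p[0])
    if prior_n >= 0:
        value = prior_value

    # Phase 2: ask the acceptors to accept the (possibly adopted) value.
    acks = sum(1 for acceptor in reachable if acceptor.accept(n, value))
    return value if acks >= majority else None


cluster = [Acceptor() for _ in range(3)]
print(propose(cluster, 3, n=1, value="a"))      # a
print(propose(cluster[:2], 3, n=2, value="b"))  # a (one node down, value sticks)
print(propose(cluster, 3, n=1, value="c"))      # None (stale proposal number)
```

The second call shows the safety property: once a majority has accepted "a", a later proposer with a different value ends up re-proposing "a" rather than overwriting it.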
Paxos • Devised by Leslie Lamport in 1990 • Published in 1998 • Similar to viewstamped replication (published in 1988, 2 years earlier)
Big names • Google’s Chubby (and BigTable) • IBM SAN Volume Controller • Microsoft
What?
What? (cont’d)
Hermes Open-source internode communication project
What • Erlang-y • Consensus algorithms • Distributed mutex • Mapping/Reduction
Where? (almost) http://github.com/auser/hermes/tree/master
Thanks
[email protected]
Thanks • Ari Lerner • AT&T CloudTeam • And all the various funny image sources
• irc.freenode.net/#poolpartyrb • You