Open Source World '09 Scaling Web Applications

Scaling Web Applications

Gavin M. Roy, Chief Technology Officer - myYearbook.com

2009-08-12

Wednesday, August 12, 2009

About myYearbook.com ✤

Social Network, Top Teen Destination

2007 ✤

100M Page views per Month

2009 ✤

1.5B Page views per Month

Top 20 Page view site as ranked by comScore


“It’s a series of tubes.” Former US Senator Ted Stevens


Early Stages



Emphasis on the application



Usually Monolithic and tightly coupled



Many base architectural decisions are made


Traffic Influx

Playing whack-a-mole: the price of popularity


Pressure Point: Database ✤

Database Strain ✤

High IO



Slow Queries



Single Point of Failure



Maintenance



High Connection Counts


Solutions for Database Scaling ✤

Many methods for scaling databases ✤

Replication



Sharding



Removing Joins



Moving data to different servers



Caching


Solving Database IO Problems ✤

Database in Memory ✤

Expensive!



Linux Kernel Disk Buffer Cache



Works Very Well



Faster Disks



Direct Attached > Remote Storage (SAN/NAS/etc)


Solving Slow Queries



Programmers are not DBAs



Proper indexing



Remove Complexity by reducing Joins



Data partitioning in OLTP



Good Schema > Fully de-normalized data


Solving Slow Queries ✤

Sharding by User ✤

CRC of Unique Key determines server where data lives



PostgreSQL via pl/proxy



MySQL via client code



More complex to back-up



Support partial outages
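The "CRC of Unique Key determines server" bullet can be sketched as follows. This is a minimal illustration in client code, not the pl/proxy setup from the talk; the shard host names and the `shard_for` helper are hypothetical, and CRC32 modulo the shard count is only one possible mapping.

```python
import zlib

# Hypothetical shard list; these host names are placeholders, not from the talk.
SHARDS = [
    "db0.example.internal",
    "db1.example.internal",
    "db2.example.internal",
    "db3.example.internal",
]


def shard_for(user_key: str, shards=SHARDS) -> str:
    """Return the shard that owns user_key, using CRC32 modulo the shard count."""
    # zlib.crc32 returns an unsigned 32-bit integer in Python 3.
    checksum = zlib.crc32(user_key.encode("utf-8"))
    return shards[checksum % len(shards)]
```

The same key always maps to the same server, so every read and write for a user lands on one shard. With a plain modulo, changing the number of shards remaps most keys, so the shard count is usually fixed up front or the data is migrated.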


Database Scaling via Replication ✤



Multi-Master Replication ✤

All servers must agree on data prior to commits (slow).



Consistent data across all servers

Master-Slave Replication ✤

All writes on master server



Reads distributed across slaves
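The master-slave split above can be sketched as a small router that sends writes to the master and rotates reads across the replicas. The `ReplicatedRouter` class and the server names used with it are hypothetical; a real deployment would hand out database connections rather than strings.

```python
import itertools


class ReplicatedRouter:
    """Pick a server for each query: master for writes, replicas for reads."""

    def __init__(self, master, replicas):
        self.master = master
        # Round-robin over the replicas; fall back to the master if none exist.
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_write(self):
        return self.master

    def for_read(self):
        if self._replicas is None:
            return self.master
        return next(self._replicas)
```

With asynchronous replication a replica can lag behind the master, so a read that must see a write made moments earlier may still need to go to the master.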


Service Oriented Architecture



Databases separated by application or service



Service data is stand alone from primary data



May need data from primary service



Need tools to keep data relevant



Allows for distributed application architecture


Solving Large Connection Counts



Connection Pooling ✤

Java has this built in via JDBC connection pooling



Most other languages do not



Not the same as persistent connections



The pooler maintains persistent connections to the database
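The idea behind connection pooling can be sketched in-process: a fixed set of connections is opened once and borrowed per request, so the database sees a bounded connection count. The `ConnectionPool` class and its `connect` argument are hypothetical; external poolers such as pgBouncer apply the same idea between the application and the database.

```python
import queue
from contextlib import contextmanager


class ConnectionPool:
    """Hold a fixed number of open connections and lend them out one at a time."""

    def __init__(self, connect, size=5):
        # connect is any zero-argument callable that returns a connection object.
        self._idle = queue.Queue(maxsize=size)
        for _ in range(size):
            self._idle.put(connect())

    @contextmanager
    def connection(self, timeout=None):
        # Blocks until a connection is free; raises queue.Empty on timeout.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)
```

A caller wraps each unit of work in `with pool.connection() as conn:` and the connection goes back to the pool when the block exits.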


Database Connection Pooling ✤



PostgreSQL ✤

pgBouncer - Skype



pgPool

MySQL ✤

DBSlayer - NY Times



Cherokee Web Server


Caching ✤

Everyone’s favorite daemon: memcached ✤

Clients automatically pool and shard data



Key / Value Store



Volatile*



Economy of Scale ✤

Failure of a single node in a memcached cluster has little impact



Many nodes make light work


Caching Bottleneck: Network



Cache servers can transfer lots of data quickly



Keep regular insight into your network health


Removing Bottlenecks: Caching ✤



Local Caching ✤

Shared Memory (System V Shared Memory / POSIX)



PHP - APC / Zend



File-system



memcached on web server?

Multi-tier caching, local caching then memcached cluster caching
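Multi-tier caching can be sketched as a lookup that tries a short-lived local cache first, then the shared cache, and only then the real data source. The `TwoTierCache` and `DictRemote` names are hypothetical; `DictRemote` is a stand-in for a memcached client, and the 5 second local TTL is an arbitrary assumption.

```python
import time


class DictRemote:
    """Stand-in for a shared cache client offering get(key) and set(key, value)."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


class TwoTierCache:
    """Check a per-process cache first, then the shared cache, then the loader."""

    def __init__(self, remote, local_ttl=5.0, clock=time.monotonic):
        self._remote = remote
        self._local = {}  # key -> (value, expiry time)
        self._ttl = local_ttl
        self._clock = clock

    def get(self, key, loader):
        now = self._clock()
        entry = self._local.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # local hit, no network round trip
        value = self._remote.get(key)
        if value is None:  # miss in both tiers (None values are not cached)
            value = loader(key)
            self._remote.set(key, value)
        self._local[key] = (value, now + self._ttl)
        return value
```

The local tier trades freshness for fewer network round trips, which is why its TTL is kept short in this sketch.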


Pressure Point: Web Application ✤

Load Balancing



Tightly Coupled



High CPU Utilization



High IO and Shared Filesystem Data



Scaling Across Multiple Servers



Application Deployment


Spreading the Load ✤

Hardware ✤

Expensive



Software



By Service



DNS Round Robin ✤



Relies on Client

Content Delivery Network


Decoupling Your Application





Web Applications spend a lot of time waiting ✤

Database connection time



Database query time - Even writes



Caching service connection time

Enter Message Queues and Remote Workers


Decoupling via Message Queues ✤

Offload data writes and actions



Offers granularity and control of impact



Scale workers to scale transaction rate



Many good brokers and standards ✤

RabbitMQ, ActiveMQ, ZeroMQ, OpenAMQP, Java Message Service



Be language / vendor neutral
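The pattern of offloading writes to workers can be sketched with an in-process queue standing in for a broker. The `start_workers` and `stop_workers` helpers are hypothetical; with a real broker such as RabbitMQ the workers would run on other machines and consume over the network.

```python
import queue
import threading


def start_workers(jobs, handler, count=2):
    """Start count threads that take jobs off the queue and pass them to handler."""

    def run():
        while True:
            job = jobs.get()
            if job is None:  # sentinel value: this worker should stop
                jobs.task_done()
                break
            try:
                handler(job)  # a real worker would log and retry failures
            finally:
                jobs.task_done()

    threads = [threading.Thread(target=run, daemon=True) for _ in range(count)]
    for thread in threads:
        thread.start()
    return threads


def stop_workers(jobs, threads):
    """Send one stop sentinel per worker and wait for all of them to exit."""
    for _ in threads:
        jobs.put(None)
    for thread in threads:
        thread.join()
```

The web request only pays for `jobs.put(...)`; the slow write happens later in a worker. Raising `count` is the in-process analogue of scaling workers to scale the transaction rate.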


Decoupling via Remote Workers



Message Queues



Gearman



Hadoop



Map / Reduce



Offload costly CPU activity to farms of machines


Resolving File IO Slowness



Scale-Out NAS



Facebook’s Haystack



PHP: Opcode Caching



Logging, syslog to remote servers


Scaling Across Many Servers ✤



Common Configuration ✤

memcached / tokyo tyrant



NAS stored configuration files

Health Checks ✤

Function of Load Balancers



Need for additional insight


Application Deployment ✤

Pull vs Push ✤

Push works with a small number of hosts ✤

Easy! Use scripts and ssh



Pull upon notification of code updates



Version control ✤

Include an internal API call that returns version information
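One way to use a version call after a deploy is to compare what each host reports against the release that was just shipped. This is a sketch under assumptions: the JSON payload shape, the `APP_VERSION` value, and the `version_payload` and `stale_hosts` helpers are all hypothetical, and fetching the payload from each host over HTTP is left out.

```python
import json

# Hypothetical version string; in practice this would be stamped in at build time.
APP_VERSION = "1.4.2"


def version_payload():
    """Body an internal version endpoint might return."""
    return json.dumps({"version": APP_VERSION})


def stale_hosts(reported, expected):
    """Given {host: payload}, list hosts not running the expected version."""
    return sorted(
        host
        for host, body in reported.items()
        if json.loads(body).get("version") != expected
    )
```

A deploy script could poll every host and refuse to call the rollout finished until `stale_hosts` returns an empty list.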


Knowing Your Pipes

Why is the site down?


Clogged Pipes? ✤

CPU Pegged



Database Bloat



Disk Issues



Network Issues ✤

Lots of Traffic



Hardware Problems?



Crashed Daemons



Crashed Servers


Monitor Your Network ✤

Servers ✤





System Profile (CPU, Memory, Disk)

Network Hardware ✤

Bandwidth



System Profile

Daemons


Monitor Your Network ✤



Use Internal Services ✤

Nagios



Hyperic



OpenNMS

Use External Services ✤

Gomez



Alertra



Pingdom


Monitor your Application



Build Profiling into Application ✤

How much time do I spend connecting to a database?



How much time do I spend in business logic?



Sample profile data in Application
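Built-in profiling with sampling can be sketched as a timer that only records for a fraction of requests. The `Profiler` class, the section names, and the 10% default sample rate are hypothetical choices for illustration.

```python
import random
import time
from collections import defaultdict
from contextlib import contextmanager


class Profiler:
    """Record wall-clock time per named section for a sampled subset of requests."""

    def __init__(self, sample_rate=0.1, rng=random.random):
        self.sample_rate = sample_rate
        self._rng = rng
        self.timings = defaultdict(list)  # section name -> durations in seconds
        self.active = False

    def begin_request(self):
        # Decide once per request whether this request is profiled.
        self.active = self._rng() < self.sample_rate

    @contextmanager
    def section(self, name):
        if not self.active:
            yield  # unsampled request: skip timing entirely
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings[name].append(time.perf_counter() - start)
```

Wrapping the database connect in `with profiler.section("db_connect"):` and the business logic in `with profiler.section("business_logic"):` answers the two questions above without timing every request.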


Trending Application Behavior



How can I predict future utilization based upon existing growth?



Identify trends in application behaviors



Troubleshoot when something is just not right
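A simple way to predict future utilization from existing growth is to fit a compound monthly rate to two measurements and extrapolate. This is a sketch under the assumption that growth is roughly exponential; the `monthly_growth_rate` and `project` helpers are hypothetical, and real trending would use more than two data points.

```python
def monthly_growth_rate(start, end, months):
    """Compound monthly growth rate that turns start into end over months."""
    if start <= 0 or months <= 0:
        raise ValueError("start and months must be positive")
    return (end / start) ** (1.0 / months) - 1.0


def project(current, rate, months_ahead):
    """Extrapolate current forward by months_ahead at a compound monthly rate."""
    return current * (1.0 + rate) ** months_ahead
```

As an example, going from 100M to 1.5B monthly page views over an assumed 24 months (the exact interval is not given in the talk) works out to roughly 12% growth per month via `monthly_growth_rate(100e6, 1.5e9, 24)`.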


Staplr

Or what’s my entire site up to?


Staplr: PostgreSQL


Staplr ✤

Pollers ✤

Apache HTTP, Lighttpd, Cherokee



SNMP (APC PDUs, Isilon NAS, Net-SNMPD)



Memcache



MySQL



PgBouncer



PostgreSQL



SysStat



and many more


Monitor Your Application ✤

Use client profiling ✤

JavaScript sends statistical data to server



Profile JavaScript action times



Page load times



Abandonment rates



Analytical data


Questions?



Follow me on twitter: http://twitter.com/crad



Contact me via email: [email protected]
