Scaling PostgreSQL Under Fire Gavin M. Roy Chief Technology Officer myYearbook.com pgDay Europe 2009
About myYearbook.com 2007 - 100M Page views per Month 2009 - 1.5B Page views per Month Top 5 Social Network in the United States as measured by Hitwise Top 25 trafficked site in the United States as measured by ComScore 99% Uptime
The 1am Phone Call
Growth is a double-edged sword.
Database Project Considerations • Good Schema • Is it designed to Scale? • Is it designed to be Maintained? • Good Hardware • Will you have enough space for unexpected growth? • Will it be fast enough to handled additional load? • Will it be stable in production under load?
Database Project Considerations • Good Planning • What will you do when there is a failure? • How long will it take to recover? • What kind of failures will you have? • How will you handle upgrades and downtime?
Internet Startup Growth Cycle 1. Prototype 2. Launch 3. Re-Engineer (Fix problems) 4. Add new functionality 5. Repeat Steps 3 and 4
Internet Startup Growth Cycle • Steps 1 & 2 • Limited Budget • Limited Time & Resources • Steps 3 & 4 • Increased Budget • Limited Time & Resources
“The best laid schemes o’ mice an’ men Gang aft agley” - Robert Burns, To a Mouse
Prepare for Growth • Hardware • CPU Horsepower based upon need • Disk based upon need • RAM based upon budget. • Get 2
“The Cloud”
Plan for Growth
Concurrency & Data Growth
Concurrency in PostgreSQL • Each running PostgreSQL connection carries overhead • More connections == Slower queries • Pool your connections • pgBouncer - light weight libevent based pooling daemon • pgPool II - Does pooling and much, much more • Language specific pooling
Table Partitioning • Supported in PostgreSQL as of 8.1 • Excellent method for maintaining data • Allows for removal of aged data without bloat • Focused SELECTS while allowing ad-hoc SELECT across all partitions
Vertical Partitioning Data • Isolate application data in different database servers • Replicate common data needed for joins
Horizontally Partitioning Data • Spread table data across multiple database servers • Use a CRC or Hashing algorithm to determine server location of data • Roll your own in your client code • Use pl/Proxy • Plan for growth, use multiple server slots per server
“Anything that can possibly go wrong, does.” - Jack Sack
Recovering from Failures • Daily backup is not enough • Disaster recovery option • Replicate data for failover and maintenance • via Replication Tools like Londiste, Bucardo and Slony • via Warm Standby (PITR Log Shipping) • Hot standby? Maybe in 8.5
PostgreSQL is YeSQL
Saying no to the naysayers
No[SQL] is Not an Option* • Key/Value store databases are not new, they’re just trendy • Facebook still has to use Memcached even though it developed the Cassandra distributed key/value database. • Same type of developers who jumped on MySQL because “it was fast” • Data without schema has limited use • Ad-hoc reporting is a key business value that is realistically unachievable in non-structured data • Find the balance of speed and simplicity with normalization
How to kill PostgreSQL performance.
Inadequate Hardware • Not enough RAM • Not enough disk bandwidth • Depending on topology • Not enough disks • Not enough controllers • Slow communication bus • Over-saturated CPU
Lock Contention == Death • Nothing brings a server to its knees faster than a long running Exclusive Lock • Even high share count locks go hand-and-hand with slowness • Reduce lock contention • Use partitioning schemes • Use concurrent operations for maintenance related activities
Bloat == Slower Death • MVCC is both integral to PostgreSQL performance and impacts performance • Enter Heap Only Tuples (HOT) • Updates to non-indexed columns in rows can re-use the same tuple • Dead tuples can be re-used • Index Bloat • Massive index bloat can occur in high-write transaction databases • Address by concurrently reindex tables without locking*
“Knowledge is Power” - Sir Francis Bacon
Reacting can be Proactive • check_postgres.pl by Greg Mullane • Nagios plugin • Bloat • Management activity such as last analyze and vacuum • wal file count, txid wrap around, sequence exhaustion • many other items it checks
Be Trendy • Know your database behavior over time • Predict future issues and behavior • Identify issues as they occur • Review impact of maintenance • Know if your heap use exceeds your index use • Look for daily change
Trending and Analysis at myYearbook.com • Cacti - http://www.cacti.net • Posuta - http://code.google.com/p/posuta/ • pgFouine - http://pgfouine.projects.postgresql.org/ • Staplr - http://github.com/gmr/staplr
Fin • Questions? • Follow me on Twitter: http://twitter.com/Crad • Leave feedback: http://2009.pgday.eu/feedback