Starling + Workling: Simple Distributed Background Jobs With Twitter's Queuing System Presentation

  • Uploaded by: Oleksiy Kovyrin
  • October 2019
PLAY/TYPE

Starling + Workling: simple distributed background jobs with Twitter’s queuing system

morning, hi, my name is rany keddo, i run a little startup in frankfurt called play/type.

    git clone \
      git://github.com/purzelrakete/cows-not-kittens.git

OR...

• sudo gem install gitjour
• gitjour list
• gitjour clone cows_not_kittens

you might want to start by grabbing the demo project. i’m serving it.


WTF?

this talk is about running code asynchronously in your rails application. this means removing long running or side effect code from your request cycle.

for some reason cows started to creep into the slides while i was working on them. hoping to start a trend *away* from cats in tech presentations... you’ll see this reflected in the example project. make sure it’s working: run rake db:migrate, start irb in the project, then type

    >> CowSubsystem.moo


Cows not Kittens

this is an example app to demonstrate why you need background work, and how you can do this. also, you can milk cows with this application.

    class CowsController < ApplicationController
      resource_this

      # milking has the side effect of causing
      # the cow to moo. we don't want to
      # wait for this while milking, though,
      # it would be a terrible waste of our time.
      def milk
        @cow = Cow.find(params[:id])
        @cow.milk
      end
    end

    class Cow < ActiveRecord::Base

      # TODO: SAP integration
      def milk
        moo
      end

      # Bothersome side-effect
      def moo
        CowSubsystem.moo
      end
    end


Milk it

show the application


Real Examples

let’s look at some real examples!

    AnalyticsHit.create \
      :potential_user_id => potential_user_id,
      :event => "converted"

thinking: statistics don’t really belong in the request cycle.

    class PageView < ActiveRecord::Base
      belongs_to :viewer
      belongs_to :viewable, :polymorphic => true
    end

nor does this: stuff that does not have to have an immediate effect on the page you’re rendering.

    CommentMailer.deliver_created(comment)

or this, thinking: this should be put in the background, really.

    Blackbook.get \
      :username => "[email protected]",
      :password => "milky"

and especially this sort of long running process: scraping contacts from a webmailer.


Wherefore art thou, Rails?

no active* way of handling this... something consistent that works for almost everybody. instead: too many options. that’s why people come to this sort of talk.

You are a snowflake again.

which solution will you tie yourself to? decide now, because doing background stuff last is like deciding to write your tests *after* the code is done.


Trust nobody!

my solution to this: remain independent of all these background technologies, by building a little worker framework with providers, like active record.

Workling

wrote workling. aims of workling are...

• Easy plugging of new Job Runners
• Nice Rails integration
• Plays nicely with tests
• Lightweight and hackable

    script/plugin install \
      git://github.com/purzelrakete/workling.git

    script/plugin install \
      git://github.com/tra/spawn.git

workling will automatically use spawn if it is installed.


create a worker class in app/workers

    #
    # handle asynchronous mooing.
    #
    class CowWorker < Workling::Base

      # let the moo-ings begin!
      def moo(options = {})
        cow = Cow.find(options[:id])
        cow.moo
      end
    end

subclass Workling::Base, add a method. the method needs to take an options argument.

    class Cow < ActiveRecord::Base

      # TODO: SAP integration
      def milk
        CowWorker.async_moo(:id => id)
      end

      # bothersome side-effect
      def moo
        CowSubsystem.moo
      end
    end

now make the asynch call in your milk method.
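
a rough sketch of how a call like CowWorker.async_moo could end up at a pluggable runner. this is not workling's actual code: MiniWorker, InlineRunner, ThreadRunner and MOOS are made-up names for illustration. the only idea taken from the slides is that the worker class stays the same while the runner behind it gets swapped.

```ruby
require 'thread'

# Hypothetical stand-in for a worker base class; not Workling::Base itself.
class MiniWorker
  class << self
    # The runner decides *how* a job is executed (inline, thread, queue, ...).
    attr_accessor :runner

    # Turn CowWorker.async_moo(:id => 1) into runner.run(CowWorker, :moo, {:id => 1}).
    def method_missing(name, *args)
      if name.to_s.start_with?("async_")
        job = name.to_s.sub("async_", "").to_sym
        MiniWorker.runner.run(self, job, args.first || {})
      else
        super
      end
    end

    def respond_to_missing?(name, include_private = false)
      name.to_s.start_with?("async_") || super
    end
  end
end

# Runs the job immediately in the caller; handy in tests.
class InlineRunner
  def run(clazz, method, options = {})
    clazz.new.send(method, options)
    nil
  end
end

# Runs the job in a background thread so the caller does not wait.
class ThreadRunner
  attr_reader :threads

  def initialize
    @threads = []
  end

  def run(clazz, method, options = {})
    @threads << Thread.new { clazz.new.send(method, options) }
    nil
  end
end

MOOS = Queue.new # thread-safe log of what the workers did

class CowWorker < MiniWorker
  def moo(options = {})
    MOOS << "cow #{options[:id]} says moo"
  end
end

MiniWorker.runner = InlineRunner.new
CowWorker.async_moo(:id => 1)

MiniWorker.runner = ThreadRunner.new
CowWorker.async_moo(:id => 2)
MiniWorker.runner.threads.each(&:join)
```

the calling code never changes when the runner does, which is the whole point of keeping the runner behind one small interface.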


Milk it!


What’s Spawn?

    script/plugin install \
      git://github.com/tra/spawn.git

explain what’s going on here... we’ve used spawn as a runner for workling. what’s spawn?

    spawn do
      logger.info("I feel sleepy...")
      sleep 11
      logger.info("Time to wake up!")
    end

by itself you can run it like this. it will fork the process....

    >> fork { sleep 100 }
    => 1060

like this, basically, but with all rails fixes and tweaks in place. above: drops to unix, the OS copies the process & creates a child process. try this in your console and use top to look at the processes.
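
a small plain-ruby sketch of the fork idea, without any of the rails fixes spawn adds. the pipe and the sleep are assumptions made for the demo: the pipe is only there so the parent can see that the child really ran, and the sleep stands in for the slow side effect.

```ruby
# Plain Ruby, no Rails: fork a child, let it do the slow work, and read its
# answer back over a pipe. Unix-like systems only (there is no fork on Windows).
reader, writer = IO.pipe

pid = fork do
  reader.close                      # the child only writes
  sleep 0.1                         # stand-in for the slow side effect
  writer.puts "moo from child #{Process.pid}"
  writer.close
end

writer.close                        # the parent only reads
message = reader.read               # blocks until the child closes its end
Process.wait(pid)                   # reap the child so it does not linger as a zombie
status = $?

puts "parent #{Process.pid} got: #{message}"
```

for real fire and forget you would not block on the child; Process.detach(pid) reaps it in the background instead of Process.wait.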

• Fast. Happens at OS level
• Rails copy can be big. Irb says ~35MB
• Local. Happening on same Machine
• Kill scenario - no persistence, job lost

workling + spawn inherits these traits.

“...If you just want to fire and forget a local process as you say, I think Spawn is pretty good.”

(photo caption: Twitter’s Evan Weaver and nesting friend.)

before i started on workling, i asked evan weaver of chow fame (twitter now) what he thought. this is what he said about spawn.


BackgroundJob

new kid on the block. very nice take on things.

    script/plugin install \
      git://github.com/purzelrakete/workling.git

    ./script/plugin install \
      http://codeforpeople.rubyforge.org/svn/rails/plugins/bj

    ./script/bj setup

let’s start over, workling + bj. don’t need to do anything else, since bj is automatically detected.

    Workling::Remote.dispatcher =
      Workling::Remote::Runners::BackgroundjobRunner.new

however, the workling runner can also be set manually like this, inside of environment.rb or under config/initializers. this is being done automatically for you.


Milk it!


Why the lag?

i will explain... first of all, what is backgroundjob.

• Written by Ara T. Howard (codeforpeople)
• Sponsored by Engineyard
• Lightweight, persistent.

next slide: installing. already did this.

    ./script/plugin install \
      http://codeforpeople.rubyforge.org/svn/rails/plugins/bj

    ./script/bj setup

    create_table :bj_config do |t|
      t.column "command"        , :text
      t.column "state"          , :text
      t.column "priority"       , :integer
      t.column "tag"            , :text
      t.column "is_restartable" , :integer
      t.column "submitter"      , :text
      t.column "runner"         , :text
      t.column "pid"            , :integer
      t.column "submitted_at"   , :datetime
      t.column "started_at"     , :datetime
      t.column "finished_at"    , :datetime
      t.column "env"            , :text
      t.column "stdin"          , :text
      t.column "stdout"         , :text
      t.column "stderr"         , :text
      t.column "exit_status"    , :integer
    end

setup is running this migration.

    job = Bj.submit 'cat /etc/password'
    Bj.table.job.find(:all) # jobs table

If you want something back...

    if(job.finished) ...

    t.column "pid"         , :integer
    t.column "finished_at" , :datetime
    t.column "stdin"       , :text
    t.column "stdout"      , :text
    t.column "stderr"      , :text
    t.column "exit_status" , :integer

these are some useful columns in the db. they are available on the job object, too.

• Warmup speed: load Rails 1x / Request
• Memory: copy of Rails / Request. No leaks.
• Kill scenario - Persistent over DB
• Jobs runner process manages itself
• Runner can be on another machine

workling + bj inherits these traits.

howz it work? this is why the moo came later than with spawn.

• Starts a thread for each job
• The thread invokes a new OS process
• ./script/runner loads rails
• Results written to DB
• Client side gets results from DB
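
the steps above can be sketched in plain ruby. this is not Bj's code, just the shape of it under some assumptions: JOBS is an in-memory hash standing in for the jobs table, submit is a made-up helper, and a bare ruby interpreter stands in for ./script/runner loading rails.

```ruby
require 'open3'
require 'rbconfig'

# In-memory stand-in for the jobs table; a real setup would persist this in the DB.
JOBS = {}
JOBS_LOCK = Mutex.new

# Submit a command: record it, then run it in its own thread + OS process.
def submit(id, *command)
  JOBS_LOCK.synchronize { JOBS[id] = { :state => "pending" } }

  Thread.new do
    stdout, stderr, status = Open3.capture3(*command)
    JOBS_LOCK.synchronize do
      JOBS[id] = {
        :state       => "finished",
        :stdout      => stdout,
        :stderr      => stderr,
        :exit_status => status.exitstatus
      }
    end
  end
end

# Each job boots a fresh Ruby interpreter. With Rails loaded on top, that
# start-up cost is paid once per job, which is where the lag comes from.
thread = submit(1, RbConfig.ruby, "-e", "puts 'moo'")
thread.join

job = JOBS_LOCK.synchronize { JOBS[1] }
puts job.inspect
```

the client side only ever looks at the stored record, so it does not matter which process (or machine) actually ran the command.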

Added Bj Runner to Workling like this...

Added the BJ runner yesterday. here’s how it was done...

    module Workling
      module Remote
        module Runners
          class BackgroundjobRunner < Workling::Remote::Runners::Base
            cattr_accessor :routing

            def initialize
              BackgroundjobRunner.routing =
                Workling::Starling::Routing::ClassAndMethodRouting.new
            end

            def run(clazz, method, options = {})
              stdin = @@routing.queue_for(clazz, method) +
                      " " +
                      options.to_xml(:indent => 0, :skip_instruct => true)

              Bj.submit "./script/runner ./script/bj_invoker.rb",
                        :stdin => stdin

              return nil # that means nothing!
            end
          end
        end
      end
    end

explain what’s going on.

    @routing = Workling::Starling::Routing::ClassAndMethodRouting.new
    unnormalized = REXML::Text::unnormalize(STDIN.read)
    message, command, args = *unnormalized.match(/(^[^ ]*) (.*)/)
    options = Hash.from_xml(args)["hash"]

    if workling = @routing[command]
      workling.send @routing.method_name(command), options.symbolize_keys
    end
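
the message passed over stdin is just "routing key, space, serialized options". a stdlib-only sketch of that round trip, with some swaps that are mine and not the runner's: JSON replaces the XML so no ActiveSupport is needed, encode_job and decode_job are made-up helper names, and "cow_worker:moo" is an invented routing key.

```ruby
require 'json'

# Build the one-line message: routing key, a space, then the serialized options.
# JSON is swapped in for the XML used on the slide to keep this stdlib-only.
def encode_job(queue, options)
  "#{queue} #{JSON.generate(options)}"
end

# Split the line back into routing key and options, using the same regex
# as the invoker script: everything up to the first space is the key.
def decode_job(line)
  _message, command, args = *line.match(/(^[^ ]*) (.*)/)
  [command, JSON.parse(args, :symbolize_names => true)]
end

line = encode_job("cow_worker:moo", { :id => 7 }) # made-up routing key
command, options = decode_job(line)

puts line
puts "#{command} -> #{options.inspect}"
```

the split only works because the routing key itself never contains a space; the payload may contain as many as it likes.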


Starling

    gem sources -a http://gems.github.com/
    sudo gem install starling-starling
    sudo gem install fiveruns-memcache-client
    script/plugin install \
      git://github.com/purzelrakete/workling.git

add github to your sources if you haven’t already done so. explain fiveruns client.

    mkdir /var/spool/starling
    sudo starling -d
    script/workling_starling_client start

need 2 processes running. 1: starling. 2: workling starling client.

    Workling::Remote.dispatcher =
      Workling::Remote::Runners::StarlingRunner.new


Milk it already...

Starling

lightweight queue that speaks memcached. developed at twitter by blaine cook to make twitter’s architecture more message-oriented.

    # Put messages onto a queue:
    require 'memcache'
    starling = MemCache.new('localhost:22122')
    starling.set('my_queue', 1)

    # Get messages from the queue:
    require 'memcache'
    starling = MemCache.new('localhost:22122')
    loop { puts starling.get('my_queue') }
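
the consumer loop above polls as fast as it can, even when the queue is empty. a sketch of a slightly more polite poller follows. it runs against FakeQueueClient, a made-up in-memory class that only mimics the set/get shape of the memcache client ("set appends, get pops"), so it needs no running starling; drain and its keyword arguments are hypothetical too.

```ruby
# In-memory stand-in with the same set/get shape as the memcache client above.
# Hypothetical: it only mimics "set appends, get pops"; it is not Starling.
class FakeQueueClient
  def initialize
    @queues = Hash.new { |hash, name| hash[name] = [] }
  end

  def set(name, value)
    @queues[name] << value
    nil
  end

  def get(name)
    @queues[name].shift # nil when the queue is empty
  end
end

# Poll a queue, handing each message to the block. Sleep briefly while the
# queue is empty and give up after a few idle polls in a row.
def drain(client, name, max_idle_polls: 3, idle_sleep: 0.01)
  idle = 0
  while idle < max_idle_polls
    message = client.get(name)
    if message.nil?
      idle += 1
      sleep idle_sleep
    else
      idle = 0
      yield message
    end
  end
end

client = FakeQueueClient.new
client.set('my_queue', 1)
client.set('my_queue', 2)

seen = []
drain(client, 'my_queue') { |message| seen << message }
puts seen.inspect
```

a long-running worker would loop forever instead of giving up; the idle limit is only there so the sketch terminates.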

Memcache Client

• Errors in Memcache Client (Robot Co-Op 1.5.0)
• Solution: http://github.com/fiveruns/memcache-client/tree/master

• Warmup speed: very fast.
• Memory low, unless you’re leaking. Use God to monitor / restart your workers.
• Kill scenario - Persistent over Starling
• Need to manage processes

workling + starling inherits these traits.



... The main things lacking in Starling are non-destructive reads (transactions), and speed.

• Transactions. Imagine Starling is killed just after reading a msg off a queue... not reliable. Doesn’t map nicely onto memcache

It can take 20 minutes to play back a Starling journal after a crash on a very powerful machine. In production, this is about 19.5 minutes too many.

twitter moving away from starling. putting msgs back onto queue not possible after kill/crash.

apparently stable, millions of messages / day with workling + starling. we are using starling at play/type and for us, it’s fine. but if replay for huge traffic / destructive reads are an issue, starling isn’t for you.
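
to make the destructive-read problem concrete, here is a toy comparison. DemoQueue is a made-up in-memory class, not starling's code, and the reserve/ack/recover methods are only one assumed way a queue could offer non-destructive reads.

```ruby
# Hypothetical in-memory queue showing two read styles; not Starling's code.
class DemoQueue
  def initialize(messages)
    @ready = messages.dup
    @reserved = {}
    @next_id = 0
  end

  # Destructive read: the message is gone as soon as it is handed out.
  def get
    @ready.shift
  end

  # Non-destructive read: the message is parked until the worker acks it.
  def reserve
    message = @ready.shift
    return nil if message.nil?
    @next_id += 1
    @reserved[@next_id] = message
    [@next_id, message]
  end

  def ack(id)
    @reserved.delete(id)
  end

  # Called after a worker crash: put unacknowledged messages back in front.
  def recover
    @ready = @reserved.values + @ready
    @reserved.clear
  end

  def size
    @ready.size
  end
end

# Worker dies right after a destructive read: the message is lost.
lossy = DemoQueue.new(["moo"])
begin
  lossy.get
  raise "worker crashed"
rescue RuntimeError
  # nothing to put back: the queue no longer knows about the message
end
lost = lossy.size

# Worker dies right after a reserve: the message can be recovered.
safe = DemoQueue.new(["moo"])
begin
  safe.reserve
  raise "worker crashed"
rescue RuntimeError
  safe.recover
end
recovered = safe.size

puts "left after destructive read: #{lost}, after reserve + recover: #{recovered}"
```

the memcache get call has no slot for the ack step, which is why this does not map nicely onto the memcache api.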


TODOs

workling is up on github. fork it! here’s what needs to be done, come join the project.

MemcachelikeRunner

• Sparrow (“a really fast lightweight queue written in Ruby that speaks memcache.”)
• RudeQ (DB based, no process for queue)

take the StarlingRunner and refactor it to be generic for all Queue Systems that imitate the memcache api. once this is done, we’ll be able to plug in the following... sparrow + workling running out there, no code unfortunately.

BeanstalkdRunner

• Fast non persistent Queue written in C.
• Written for “Causes” on Facebook

might be possible to run this with a MemcachelikeRunner.

AMQPRunner

BackgrounDRb

heavyweight of backgrounding, oldest solution. lots of people using this.

FUD?




I wish, people will check their facts before making any claims, I am kinda getting tired of fighting this FUD within community. There are few outstanding issues, but BackgrounDRb supports many features that other similar alternatives doesn’t offer. And I am working on it.

- Hemant

backgroundrb comes with emotional baggage, for me. who’s running backgroundrb in the room, hands up? who has problems with it? who has NO problems?

BackgrounDRb

• As of version 1.0.3 - complete rewrite with Packet, no DRb code in there anymore.



Packet is a network programming library in the spirit of EventMachine and yet it has nice functionality of letting you attach callbacks to workers running in separate process. It can even let you invoke callbacks running on worker in different machine and stuff like that. When I took over project it was based on DRb, but since then I have removed DRb and BackgrounDRb is 100% based on evented model of network programming.

- Hemant

my personal impression: still heavy. waiting for somebody to try integrating it into workling, no personal need.

Okay, but what about Workling Status and Return?

have a real world example. old school, circa Feb. 2008: social network imports over gmail scraping... need this out of the request, but the response has to be shown, too.

    class NetworkWorker < Workling::Base
      def search(options)
        accounts = options[:accounts]
        uid = options[:uid]

        # scrape the contacts for each account
        contacts = accounts.map do |network|
          Blackbook.get \
            :username => network[:username],
            :password => network[:password]
        end

        # store the scraped contacts so the poll action can pick them up
        Workling::Return::Store.set(uid, contacts)
      end
    end

explain how this works - scraping gmail. return store: again, using memcache api.

    def poll
      @results = Workling::Return::Store.get \
        params[:workling_uid]

      # TODO: handle no results, results
      # and results with errors
    end
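
the worker/poll pair boils down to "worker writes under a uid, page asks for that uid until something is there". a plain-ruby sketch of that handshake, under assumptions: DemoReturnStore is a made-up in-memory class (workling's own store speaks the memcache api), the gate queue stands in for the slow scrape, and the contact address is invented.

```ruby
require 'securerandom'
require 'thread'

# Hypothetical in-memory return store; only the set/get-by-uid idea is from the slides.
class DemoReturnStore
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  def set(uid, value)
    @lock.synchronize { @data[uid] = value }
  end

  # Returns nil until the worker has stored something under this uid.
  def get(uid)
    @lock.synchronize { @data[uid] }
  end
end

store = DemoReturnStore.new
uid = SecureRandom.hex(8) # the request hands this uid to the page
gate = Queue.new          # lets the demo control when the "scrape" finishes

worker = Thread.new do
  gate.pop                # stand-in for the slow scrape
  store.set(uid, ["bessie@example.com"])
end

early = store.get(uid)    # polled too soon: nothing yet
gate << :done
worker.join
late = store.get(uid)     # polled after the worker finished

puts "early poll: #{early.inspect}, late poll: #{late.inspect}"
```

the poll action has to cope with all three outcomes named in the TODO: nothing yet, a result, or a result that carries an error.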


Rany Keddo [email protected]

Questions? Lunch!
