?
Who is this guy? I didn’t re-implement Ruby in Erlang, or write a web server in assembly that’s 10x faster than Apache, or start a successful company. At the end of the day, I’m just a guy who makes web sites. Jan Tik: http://flickr.com/photos/jantik/6708183/
Testing is overrated Luke Francl
So I had to pick something controversial. Don’t get me wrong, testing is great. I’ll never forget the first time I saved myself from committing buggy code with my own unit test. And once written, programmatic tests provide a nice regression framework that helps catch future errors and makes refactoring possible. But I think it’s overemphasized to the detriment of other defect-detection techniques.
[Word cloud: story runner, fuzz, RSpec, stub, random, fixtures, object mother, unit tests, Shoulda, miniunit, Watir, Mocha, mock, rcov, green bar, Test::Unit, behaviors, BDD, TDD, coverage, Selenium, test-along, test-first, autotest, test cases]
We as developers hear, read, and write a lot about testing. Why so much? I think it’s because it’s something we, as programmers, can control. We usually can’t hire QA testers. It may be a struggle to institute code review in our company. We may not have the authority to set up usability tests. But we can write code! And so we play to our strength -- coding -- and try to code our way out of buggy software.
All you need is tests
In the worst case, this leads to a mindset that developer testing is all you need, and if we can only get to 100% code coverage, we’ll be bug free. You’ve got people having Rcov length contests. I read a blog entry just last week by a guy who was suggesting the “End of Bugs” due to behavior driven development and 100% rcov code coverage. (I didn’t mention his name in my talk, but this was Adam Wiggins from Heroku: http://adam.blog.heroku.com/past/2008/7/6/the_end_of_bugs/ I didn’t know he’d be at RubyFringe, but he came up to me later and was like “Hi, I’m Adam. You called me an idiot.” Sorry Adam! Seriously, he was really nice about it. We had a good talk about testing.)
Extensive research
So I’ve been doing extensive research about the benefits of developer testing...
- Code Complete 2nd, Steve McConnell
- Facts and Fallacies of Software Engineering, Robert L. Glass
And I’ve come to the conclusion that there are some significant weaknesses of developer testing.
audreyjm529: http://flickr.com/photos/audreyjm529/678762774/
testing is hard
Testing is hard, and most developers aren’t very good at it. The reason is that most developers tend to write “clean” tests that verify the normal path of program execution, instead of “dirty” tests that verify error states or boundary conditions (which is where most errors lie). McConnell reports that immature testing organizations write about 5 clean tests for every dirty one, while mature testing organizations write 5 dirty tests for every clean one. That’s not fewer clean tests -- it’s 25x more dirty tests!
aussiegall: http://flickr.com/photos/aussiegall/2238073479/
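To make the clean/dirty distinction concrete, here is a minimal sketch. Everything in it is hypothetical and only for illustration: the `owes_government_retirement?` helper, the limit of 1,000, and the chosen inputs are assumptions, not code from McConnell or from the talk. It pairs one clean case with five dirty ones, matching the 5:1 ratio of a mature testing organization.

```ruby
MAX_GOVT_RETIREMENT = 1_000 # hypothetical limit, for illustration only

# Hypothetical helper: is more government retirement still owed?
def owes_government_retirement?(withheld)
  unless withheld.is_a?(Numeric) && withheld >= 0
    raise ArgumentError, "withheld must be a non-negative number"
  end
  withheld < MAX_GOVT_RETIREMENT
end

# One "clean" case: the normal path.
CLEAN_CASES = { 500 => true }

# Five "dirty" cases: boundary conditions and an error state.
DIRTY_CASES = {
  0 => true,                        # lowest legal value
  MAX_GOVT_RETIREMENT - 1 => true,  # just under the limit
  MAX_GOVT_RETIREMENT => false,     # exactly at the limit
  MAX_GOVT_RETIREMENT + 1 => false, # just over the limit
  -1 => :error                      # invalid input must be rejected
}

# Returns true when every input produces its expected result.
def all_pass?(cases)
  cases.all? do |input, expected|
    actual = begin
      owes_government_retirement?(input)
    rescue ArgumentError
      :error
    end
    actual == expected
  end
end

puts "clean cases pass: #{all_pass?(CLEAN_CASES)}"
puts "dirty cases pass: #{all_pass?(DIRTY_CASES)}"
```

The single clean case would pass even if the comparison were `<=` instead of `<`. Only the case sitting exactly on the limit tells the two apart.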
total_withholdings = 0

employees.each do |employee|
  if employee.government_retirement_withheld < MAX_GOVT_RETIREMENT
    government_retirement = compute_government_retirement(employee)
  end

  company_retirement = 0
  if employee.wants_retirement && eligible_for_retirement(employee)
    company_retirement = get_retirement(employee)
  end

  gross_pay = compute_gross_pay(employee)

  personal_retirement = 0
  if eligible_for_personal_retirement(employee)
    personal_retirement = personal_retirement_contribution(employee, company_retirement, gross_pay)
  end

  withholding = compute_withholding(employee)
  net_pay = gross_pay - withholding - company_retirement -
            government_retirement - personal_retirement
  pay_employee(employee, net_pay)

  total_withholdings = total_withholdings + withholding
  total_government_retirement = total_government_retirement + government_retirement
  total_retirement = total_retirement + company_retirement
end

save_pay_records(total_withholdings, total_government_retirement, total_retirement)
Let’s take a look at an example (see the handout for a version you can read). This is taken from CC2e and I have translated it to Ruby. How many test cases do you think it should take to fully test this code? A simple “clean” test with all booleans true will give you 100% rcov code coverage. Structured basis testing: count 1 for the method, 1 for the each loop, 1 for each of the three ifs, and 1 for the boolean && = 6 test cases. McConnell ultimately lists 17 test cases: logic combinations, boundary conditions, error states... The full list of test cases is in the handout.
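The structured basis count can be sketched in a few lines of Ruby. This is a crude heuristic, not a real parser: the `minimum_basis_test_cases` helper and its keyword pattern are assumptions made for this illustration, and it would miscount code that mentions these keywords inside strings or comments.

```ruby
# Start at 1 for the straight path through the routine, then add 1 for each
# loop, each conditional, and each boolean operator found in the text.
DECISION_PATTERN = /\b(?:if|elsif|unless|while|until|each)\b|&&|\|\|/

def minimum_basis_test_cases(source)
  1 + source.scan(DECISION_PATTERN).length
end

# The payroll example from the slide, held as plain text (never executed).
PAYROLL_SOURCE = <<~RUBY
  total_withholdings = 0
  employees.each do |employee|
    if employee.government_retirement_withheld < MAX_GOVT_RETIREMENT
      government_retirement = compute_government_retirement(employee)
    end
    company_retirement = 0
    if employee.wants_retirement && eligible_for_retirement(employee)
      company_retirement = get_retirement(employee)
    end
    gross_pay = compute_gross_pay(employee)
    personal_retirement = 0
    if eligible_for_personal_retirement(employee)
      personal_retirement = personal_retirement_contribution(employee, company_retirement, gross_pay)
    end
    withholding = compute_withholding(employee)
    net_pay = gross_pay - withholding - company_retirement -
              government_retirement - personal_retirement
    pay_employee(employee, net_pay)
    total_withholdings = total_withholdings + withholding
    total_government_retirement = total_government_retirement + government_retirement
    total_retirement = total_retirement + company_retirement
  end
  save_pay_records(total_withholdings, total_government_retirement, total_retirement)
RUBY

puts minimum_basis_test_cases(PAYROLL_SOURCE) # prints 6
```

That 6 is only the floor for exercising each branch once. The remaining cases on McConnell's list of 17 come from boundaries, logic combinations, and error states, which no keyword count can find.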
Code Coverage
code coverage
Dangers of relying on code coverage. Led my boss to write “the red lines are the valuable ones”. Rcov documentation is very clear about this -- if you read it. But people boil down something very complicated (their tests) to this one number (code coverage) and then compare. Makes no sense. Test-to-code ratio. Could there possibly be a more useless number? Unless it’s 1:0, it tells you just about nothing.
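A small sketch of why the coverage number misleads. The `shipping_cost` method, its threshold, and the "50 or more ships free" spec are all hypothetical, made up for this illustration. Two clean checks execute every line, so a line-coverage tool would report 100%, yet the boundary bug survives.

```ruby
FREE_SHIPPING_MINIMUM = 50 # hypothetical threshold

# Hypothetical spec: orders of 50 or more ship free.
# Deliberate bug for the demonstration: `>` should be `>=`.
def shipping_cost(order_total)
  return 0 if order_total > FREE_SHIPPING_MINIMUM
  5
end

# Two "clean" checks. Together they run every line of shipping_cost.
puts shipping_cost(100) == 0 # prints true
puts shipping_cost(10) == 5  # prints true

# The boundary check that line coverage never asked for.
puts shipping_cost(50) == 0  # prints false: the bug is still there
```

Coverage tells you which lines ran, not which inputs mattered.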
def test_last_day_items_are_privacy_scoped_for_non_friends
  non_friend = create_user

  story = stories(:learning_no)
  story.published_at = 10.minutes.ago
  story.save!

  story = stories(:aaron_private_story)
  story.published_at = 5.minutes.ago
  story.save!

  items_for_non_friend = accounts(:quentin_and_aaron).last_day_items
  assert_privacy_status(items_for_non_friend, "Public")
end
You can’t test code that’s not there
You can’t test what isn’t in the spec. Requirements errors are the most expensive to fix if they sneak into production. Story: Slantwise client wanted monthly billing. We thought “Basecamp”. What they really wanted: customer punches in how many users they want, for how many months, and is then billed all at once. Fortunately, requirements errors are cheap to fix if caught early, which is where iterative development helps.
Tests have bugs
Tests are code, code has bugs. Tests are just as likely to have bugs as the code they’re testing. jpctalbot: http://flickr.com/photos/laserstars/640499324/
def test_critical_functionality
  begin
    # ... Bunch of stuff to exercise code ...

    # Commented out by Luke to fix test failure
    # assert "Some important assert", condition
  rescue
    # Don't let anything fail this test!
  end
end
Sweet! 100% test coverage! So who tests the tests? I don’t think there’s a way to do this automatically. You need to review them by hand. Adapted from: http://thedailywtf.com/Comments/AddComment.aspx?ArticleId=5128&ReplyTo=138758&Quote=Y
Developer testing isn’t very good at finding defects
Flowizm: http://flickr.com/photos/flowizm/178152601/
Defect Detection Rates of Selected Techniques
[Bar chart: unit testing, code reviews, code inspections, prototyping, system test; axis from 0% to 100%]
Defect detection rates from Code Complete. Full table is in your handout.
- Unit test: 15-50%
- Informal code reviews: 20-35%
- Formal code inspections: 45-70%
- Modeling/prototyping: 35-80%
- System test (black box): 25-55%
Note: First of all, unit testing isn’t all that great at finding defects. Formal code inspections can catch up to 70% of the defects. Note also the strength of prototyping, with up to 80%. I think this is what makes iterative development such a big win.
[Venn diagram: manual testing, code reviews, unit tests, user testing]
* Set overlap completely fabricated
The interesting thing is that different defect detection techniques tend to find different types of defects.
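A back-of-the-envelope sketch of what combining techniques could buy. It rests on a loud assumption: that each technique misses defects independently of the others. Nothing in the talk or in the figures quoted from Code Complete establishes that, and the real overlap is unknown, so treat the result as an illustration of the arithmetic rather than a prediction. The `combined_detection_rate` helper is made up for this example.

```ruby
# Detection-rate ranges quoted from Code Complete, as [low, high].
DETECTION_RATES = {
  "unit testing"            => [0.15, 0.50],
  "informal code reviews"   => [0.20, 0.35],
  "formal code inspections" => [0.45, 0.70],
  "modeling/prototyping"    => [0.35, 0.80],
  "system test (black box)" => [0.25, 0.55]
}

# ASSUMPTION: independence. A defect slips through only if every technique
# misses it, so multiply the miss rates and subtract the product from 1.
def combined_detection_rate(rates)
  1.0 - rates.reduce(1.0) { |missed, rate| missed * (1.0 - rate) }
end

low  = combined_detection_rate(DETECTION_RATES.values.map(&:first))
high = combined_detection_rate(DETECTION_RATES.values.map(&:last))
puts format("all five combined: %.0f%% to %.0f%%", low * 100, high * 100)
```

Under that assumption, even the pessimistic ends of the ranges add up to more than unit testing achieves at its best, which is the argument for not relying on one technique.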
Complements to developer testing
GAV01: http://flickr.com/photos/gavinatkinson/196048031/
Manual testing
And of course, there is manual QA. A good QA person is worth their weight in gold. I once worked with a guy who was an absolute machine at finding bugs, and he was really good at explaining how they happened and creating bug reports. You always end up doing some amount of manual testing. It makes sense to have testers do this instead of making programmers do it. Story: how we do manual testing: QA person is responsible for verifying fixes and also does exploratory, black-box tests. Stuck in Customs: http://flickr.com/photos/stuckincustoms/858339201/
So if developers aren’t very good at testing, what are they good at? Criticizing other people’s code. Informal “code reviews” can find between 20-35% of all defects. Formal “code inspections” between 45-70%. The difference between formal and informal code reviews. http://www.osnews.com/story/19266/WTFs_m
code review kitty is not pleased with your code
Sociological aspect to code reviews. Tell story of my first code review. Reviewee’s ego as well as code is on the line. http://flickr.com/photos/louse101/454412441/in/set-72157600062650522/
Growing better developers
Aside: can code reviews help us become better developers? Skeptical of methodology. 10x developers will be successful no matter what methodology they use. So, can code reviews help us become better programmers?
- reading code is the best way to learn.
- constructive criticism from better programmers
As a programmer who’s not young enough to know everything any more, I am hopeful.
Usability testing
I have been blown away by the problems we have found using usability testing.
The ultimate
You can have 150% code coverage and thousands of unit tests. Not one of them will tell you if your application sucks. Jeff Atwood calls usability problems The Ultimate Unit Test Failure. hans.gerwitz: http://flickr.com/photos/phobia/2308371224/
From Don’t Make Me Think by Steve Krug
You may think usability testing involves expensive labs with two-way mirrors and cameras everywhere. But usability testing doesn’t have to be expensive! It’s fun and cheap with Steve Krug’s techniques. We use $20 screen recording software and a USB microphone and pay participants about $50.
Don’t put all your eggs in one basket...
I’m not saying don’t write tests. I’m saying, don’t put all your eggs in one basket. Andrew Dowsett
...or you’ll end up as roadkill.
Thanks!
(You can yell at me over drinks.)
Jan Tik: http://flickr.com/photos/jantik/6708183/
audreyjm529: http://flickr.com/photos/audreyjm529/678762774/
aussiegall: http://flickr.com/photos/aussiegall/2238073479/
jpctalbot: http://flickr.com/photos/laserstars/640499324/
Flowizm: http://flickr.com/photos/flowizm/178152601/
GAV01: http://flickr.com/photos/gavinatkinson/196048031/
Stuck in Customs: http://flickr.com/photos/stuckincustoms/858339201/
hans.gerwitz: http://flickr.com/photos/phobia/2308371224/
Andrew Dowsett