Boston.rb: Collaborative Filtering

  • November 2019
Collaborative Filtering

Tyler McMullen

... which for the purposes of this talk means:

Recommendations

Netflix

Google Reader

Pandora

Last.fm

...and of course... Amazon

(shameless plug)

I like to think of it as a fill-in-the-blank puzzle.

         Item A   Item B   Item C
Bob        5        1        5
Suzie      5        1        ?
Joe        1        5        1
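The puzzle is just a ratings table with a hole in it. As a toy illustration (not the method from the talk's repository), the Ruby sketch below fills Suzie's blank by copying the rating of whichever user agrees with her most on the items they have both rated. `RATINGS`, `disagreement`, and `fill_blank` are hypothetical names.

```ruby
# The fill-in-the-blank puzzle: rows are users, columns are items, nil is the blank.
RATINGS = {
  "Bob"   => { "Item A" => 5, "Item B" => 1, "Item C" => 5 },
  "Suzie" => { "Item A" => 5, "Item B" => 1, "Item C" => nil },
  "Joe"   => { "Item A" => 1, "Item B" => 5, "Item C" => 1 }
}.freeze

# Mean absolute difference over the items both users have rated (lower = more alike).
def disagreement(a, b)
  shared = a.keys.select { |item| a[item] && b[item] }
  return Float::INFINITY if shared.empty?

  shared.sum { |item| (a[item] - b[item]).abs }.to_f / shared.size
end

# Fill a blank by borrowing the rating of the most similar user who rated the item.
def fill_blank(ratings, user, item)
  candidates = ratings.reject { |name, r| name == user || r[item].nil? }
  return nil if candidates.empty?

  nearest, _ = candidates.min_by { |_, r| disagreement(ratings[user], r) }
  ratings[nearest][item]
end

puts fill_blank(RATINGS, "Suzie", "Item C") # => 5 (Bob agrees with Suzie on A and B)
```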

Dataset → Correlate (once per dataset) → Recommendations → Content Booster → Output

Data

Data > Algorithms

Amazon uses a simple item-to-item correlation system. How can they get away with that? ~20 million items, n million users.

If every user bought 200 items, their user-item matrix would be only 0.001% full.
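The 0.001% figure is one user's purchases divided by the catalog size; the number of users cancels out because every row of the matrix is the same width. A quick Ruby check, with illustrative constant and method names:

```ruby
# Density of the user-item matrix: the share of cells that hold a purchase.
ITEMS = 20_000_000         # ~20 million items (figure from the slide)
PURCHASES_PER_USER = 200   # every user bought 200 items (the slide's hypothetical)

# Each user's row has ITEMS cells, of which PURCHASES_PER_USER are filled,
# so the density does not depend on how many users there are.
def density_percent(filled, total)
  filled * 100.0 / total
end

puts density_percent(PURCHASES_PER_USER, ITEMS) # => 0.001
```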

Use every signal you have: purchases, ratings, views, shopping cart, votes, wishlists, baby registry, wedding registry, tell-a-friend... anything you can measure!

Data > Algorithms

more different data > more of the same data

Correlation

Find patterns in the data sets.

Methods: Pearson correlation, Singular Value Decomposition, Kendall tau coefficient, Spearman's rho, point-biserial correlation coefficient.
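Pearson, the first method listed, measures how linearly two users' ratings move together on the items they share. A minimal Ruby sketch, assuming each user's ratings are a `{ item => rating }` hash; `pearson` is a hypothetical helper, not the repository's API, and returning `nil` when there is too little overlap is a design choice of this sketch:

```ruby
# Pearson correlation between two users, computed over the items both have rated.
# Returns a value in [-1, 1], or nil when there is too little overlap to say anything.
def pearson(a, b)
  shared = a.keys & b.keys
  return nil if shared.size < 2

  xs = shared.map { |item| a[item].to_f }
  ys = shared.map { |item| b[item].to_f }
  mean_x = xs.sum / xs.size
  mean_y = ys.sum / ys.size

  cov   = xs.zip(ys).sum { |x, y| (x - mean_x) * (y - mean_y) }
  var_x = xs.sum { |x| (x - mean_x)**2 }
  var_y = ys.sum { |y| (y - mean_y)**2 }
  return nil if var_x.zero? || var_y.zero? # a constant rater has no defined correlation

  cov / Math.sqrt(var_x * var_y)
end

bob = { "Item A" => 5, "Item B" => 1, "Item C" => 5 }
joe = { "Item A" => 1, "Item B" => 5, "Item C" => 1 }
puts pearson(bob, joe) # about -1.0: Bob and Joe rate every item in opposite directions
```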

Word of caution: watch for O(n²) here, since every user (or item) gets compared with every other one.
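The quadratic cost comes from comparing every pair: n users (or items) give n(n - 1)/2 correlations. A small Ruby sketch, with `pair_count` as a hypothetical helper and the ~20 million items from the Amazon slide plugged in as n for scale:

```ruby
# Correlating every user (or item) with every other one needs one comparison
# per pair: n * (n - 1) / 2 of them, which grows as O(n^2).
def pair_count(n)
  n * (n - 1) / 2
end

# For small n we can enumerate the pairs to confirm the formula.
users = %w[Bob Suzie Joe Fred]
puts users.combination(2).count # => 6
puts pair_count(users.size)     # => 6

# At catalog scale the same formula explodes.
puts pair_count(20_000_000)     # => 199999990000000
```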

Recommendation

This is the part where we figure out what you'll like.

So we have all these correlation matrices, one for each of the datasets that we correlated.

         Bob      Suzie    Joe
Bob               0.87     0.74
Suzie   -0.74              -0.9
Joe      0.856    0.1

So let's say we have a user named Fred, whose correlations with the other users are:

Joe 0.9, Bob 0.75, Suzie 0.5

Ratings from Fred's correlated users:

Joe: Item A 5, Item B 4
Bob: Item B 5, Item C 2
Suzie: Item C 2, Item A 2

Grouped by item:

Item A: Joe – 5, Suzie – 2
Item B: Joe – 4, Bob – 5
Item C: Bob – 2, Suzie – 2

Each item's predicted rating for Fred is the average of these ratings, weighted by each rater's correlation with Fred. For Item A: (0.9 × 5 + 0.5 × 2) / (0.9 + 0.5) ≈ 3.93.

Item A: 3.93
Item B: 4.45
Item C: 2
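The three predictions can be reproduced by taking, for each item, the average of the neighbors' ratings weighted by their correlation with Fred. The weighting scheme is inferred from the slide's numbers rather than stated on them, and `WEIGHTS`, `NEIGHBOR_RATINGS`, and `predict` are hypothetical names:

```ruby
# Fred's correlation with each neighbor (from the slide).
WEIGHTS = { "Joe" => 0.9, "Bob" => 0.75, "Suzie" => 0.5 }.freeze

# Each neighbor's ratings (from the slide).
NEIGHBOR_RATINGS = {
  "Joe"   => { "Item A" => 5, "Item B" => 4 },
  "Bob"   => { "Item B" => 5, "Item C" => 2 },
  "Suzie" => { "Item C" => 2, "Item A" => 2 }
}.freeze

# Predict a rating for every item as the correlation-weighted average of the
# ratings given by the neighbors who rated it.
def predict(weights, neighbor_ratings)
  by_item = Hash.new { |h, k| h[k] = [] }
  neighbor_ratings.each do |user, ratings|
    ratings.each { |item, rating| by_item[item] << [weights.fetch(user), rating] }
  end

  by_item.transform_values do |pairs|
    total_weight = pairs.sum { |w, _| w }
    pairs.sum { |w, r| w * r } / total_weight
  end
end

predict(WEIGHTS, NEIGHBOR_RATINGS).sort.each do |item, score|
  puts format("%s: %.2f", item, score)
end
# Item A: 3.93
# Item B: 4.45
# Item C: 2.00
```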

Content Boosting

Your users reveal their preferences in their actions.

If I mark every horror movie in your system as a "1"... I don't like horror movies.
If I rate every Will Smith movie as "5 stars"... I probably like Will Smith.

All items have properties.

Movies have genres, actors, studio, locations, etc.
Comics have genres, writers, artists, publishers, etc.
Kittens have color, gender, breed, cute captions, etc.

I Am Legend: 5 (Action, Will Smith)
Cloverfield: 4 (Action, No Will Smith)
Independence Day: 4 (Action, Will Smith)
Sleepless in Seattle: 1 (Romance, No Will Smith)

So what do my preferences say about me?

My mean rating is 3.5, so...

Action: +0.8
Romance: -2.5
Will Smith: +1

Each value is the mean rating of the movies with that property minus my overall mean: Action is (5 + 4 + 4) / 3 - 3.5 ≈ +0.8, Romance is 1 - 3.5 = -2.5, and Will Smith is (5 + 4) / 2 - 3.5 = +1.
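A Ruby sketch that derives the property offsets from the four ratings (each property's mean rating minus the overall mean). `MOVIES` and `feature_offsets` are hypothetical names, and "No Will Smith" is modeled simply as the feature being absent:

```ruby
# My ratings and each movie's properties (from the slides).
# Assumption: "No Will Smith" is represented by leaving the feature out.
MOVIES = {
  "I Am Legend"          => { rating: 5, features: ["Action", "Will Smith"] },
  "Cloverfield"          => { rating: 4, features: ["Action"] },
  "Independence Day"     => { rating: 4, features: ["Action", "Will Smith"] },
  "Sleepless in Seattle" => { rating: 1, features: ["Romance"] }
}.freeze

# For each feature: mean rating of the movies that have it, minus my overall mean.
def feature_offsets(movies)
  ratings = movies.values.map { |m| m[:rating] }
  overall_mean = ratings.sum.to_f / ratings.size

  by_feature = Hash.new { |h, k| h[k] = [] }
  movies.each_value do |m|
    m[:features].each { |f| by_feature[f] << m[:rating] }
  end

  by_feature.transform_values { |rs| rs.sum.to_f / rs.size - overall_mean }
end

feature_offsets(MOVIES).each do |feature, offset|
  puts format("%s: %+.1f", feature, offset)
end
# Action: +0.8
# Will Smith: +1.0
# Romance: -2.5
```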

Your recommendations are only as good as the amount and quality of your data.

Content boosting is therefore especially useful when you have limited data, since it leans on item properties rather than on overlap with other users.
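The slides don't say exactly how the boost is applied to a collaborative prediction. One simple possibility, shown purely as an assumption: add the average of the user's offsets for the item's properties, and fall back to the user's mean rating when there is no collaborative prediction at all, which is where limited data hurts most. `boosted_rating`, the averaging, and the 1 to 5 clamp are all hypothetical:

```ruby
# Hypothetical content boost: nudge a collaborative-filtering prediction by the
# average of the user's offsets for the item's features. The additive rule,
# the averaging, and the 1..5 clamp are assumptions for illustration only.
USER_MEAN = 3.5
OFFSETS = { "Action" => 13.0 / 3 - 3.5, "Romance" => -2.5, "Will Smith" => 1.0 }.freeze

def boosted_rating(cf_prediction, features, offsets: OFFSETS, user_mean: USER_MEAN)
  known = features.map { |f| offsets[f] }.compact
  boost = known.empty? ? 0.0 : known.sum / known.size
  # With no collaborative prediction (too little data), fall back to the user's mean.
  base = cf_prediction || user_mean
  (base + boost).clamp(1.0, 5.0)
end

puts boosted_rating(3.0, ["Action", "Will Smith"]).round(2) # => 3.92
puts boosted_rating(nil, ["Action", "Will Smith"]).round(2) # => 4.42
puts boosted_rating(nil, ["Romance"]).round(2)              # => 1.0
```

Averaging rather than summing the offsets is a choice of this sketch; summing would reward items that simply have more tagged properties.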

Output

I have nothing interesting to say about output... Moving on.

Now let's look at some code.

http://github.com/tyler/collaborative_filter
