The Snowflake Number




Erik Duval, Katrien Verbert, Xavier Ochoa, Wayne Hodgins

ABSTRACT

This is a paper about mass hyper-personalization; more specifically, it is about how to measure personalization in web-based systems and beyond, using a series of metrics that we call 'snowflake numbers'.

1.

THE SNOWFLAKE EFFECT

Many of the more exciting and successful current web applications rely to a great degree on the personalized user experiences they enable. Typical examples include amazon.com, which presents itself as an individualized book store with personalized recommendations; web-based radio stations like last.fm or pandora.com, which take into account the personal preferences and interests of the listener; and social networking sites like facebook.com, which support the interactions between a user and his network of friends. We refer to this trend of mass hyper-personalization, which can also be observed in non-web-based areas like clothing, food or travel, as "The Snowflake Effect" [3]. The name derives from the notion that, just as every snowflake in a snowstorm is unique, we are all unique individuals, with our specific characteristics and interests. The better a web application adapts to these characteristics and interests, the more relevant and useful it is.

We are interested in mass personalization, where we can achieve this effect at very large scale (for everyone, all the time), and in hyper-personalization, because we believe that it is important to push the boundaries beyond simple changes in generic templates. We believe that it is important to better understand how the Snowflake Effect can be put to good use, for individuals and for society as a whole. Such understanding requires a more precise way to measure the characteristics that make us really unique. This paper introduces a set of metrics that enable such precise measurements. It generalizes our earlier work on learnometrics, which considered the field of technology enhanced learning [5], and on a quantitative analysis of user-generated content on the Web [4].

∗ Dept. Computerwetenschappen, Katholieke Universiteit Leuven, Belgium
† Escuela Superior Politécnica del Litoral, Ecuador
‡ Autodesk, Inc., USA

2.

SNOWFLAKE NUMBER

As a simple example, consider playlists of songs: the iTunes Music Store lists 1,421,247 public iMixes at this moment, and almost 7 million votes on those playlists. Playlists (public as iMixes, or private) are unique for many iTunes users, but maybe not for all. Or consider someone's listening history: if you consider only one song, that is probably not unique for most of us, because other people will have listened to this song too. Take a set of 500 songs that someone has listened to: there may still be other people who listened to all of these songs. But is there a number n that makes the listener unique, in that no other person has listened to the same n songs?

The Snowflake Number for a person, then, is the minimum number of items that make him or her unique in a given application. The items can take several forms: bookmarks or tags on delicious.com; tags on last.fm or bibsonomy.org; songs played on last.fm or pandora.com; books bought on amazon.com or songs bought on the iTunes Music Store; slides favoured on slideshare.com or videos favoured on youtube.com; etc. We can apply a similar point of view to the non-digital world and consider groceries, food and other items in a shopping cart, refrigerator, meal or cupboard; the skills, knowledge and abilities that make an individual the right, or uniquely best, person for a job or a role on a project team; the parts in the assembly of a machine; etc.

For instance, if a reader has only bought very popular books on amazon.com, then any book he has bought will have been bought by someone else as well. Therefore, no single book makes him unique and his snowflake number is greater than one. If every combination of two books that he has bought has also been bought by another user, then his snowflake number is greater than two.
Imagine that every combination of six books bought by this consumer was also bought by at least one other amazon.com customer, and that no other user has bought a particular set of seven books that this reader has bought: then his snowflake number is seven. Intuitively, someone with a more mainstream taste will have a higher snowflake number, whereas someone with a more exotic taste will have a lower one.

More formally, for a given set of items $I$, where $I_i$ is the subset of items related to a particular user $i$, the Snowflake Number $s_i$ for that user $i$ is $n$ if there are no users with whom he shares $n$ items, but there is at least one other user $j$ with whom user $i$ shares $n - 1$ items:

\[ \exists j : \mathrm{card}(I_i \cap I_j) = n - 1 \;\wedge\; \nexists k : \mathrm{card}(I_i \cap I_k) = n \]

Of course, there may be users $i$ whose set $I_i$ is subsumed in that of another user $k$. For those users, we define the Snowflake Number as infinite ($\infty$): this is somewhat arbitrary, but ensures that "users with a lower snowflake number are more unique". Indeed, if two users have exactly the same taste, they are not unique at all.

As a simple example, the snowflake number of the first author of this paper on facebook.com is 2 if we consider friends as items: Erik is a friend of Leo and Wayne, and there is nobody else in the facebook universe who is a friend of both Leo and Wayne.

The snowflake number also applies to groups, when we want to determine the specific characteristics of a group. Moreover, we find many examples of the snowflake effect outside the realm of web-based systems in a strict sense. For instance, consider the votes in the final of the Eurovision Song Contest 2008 [1]: Switzerland and Bulgaria were the only countries that voted for Germany. Switzerland also voted for Albania, whereas Bulgaria did not. Hence, the Snowflake Number for Switzerland is two. On the other hand, the Snowflake Number for Russia is four, as it voted for Armenia, Croatia, Georgia and Serbia, and no other country voted for all four of those countries; moreover, for every subset of three countries that Russia voted for, there is another country that also voted for those three countries.

3.

TOWARDS SNOWFLAKE NUMBER RESEARCH

In this paper, we just touch upon a rather large theme. There are many variations on the snowflake number idea.

• One can consider either the ordered or the unordered list of items, for instance in time: the ordered Snowflake Number of web page visits would consider how many people visited the same n web pages as their most recent pages. That Snowflake Number will probably be much lower than the unordered version, which considers all the web pages one has ever visited.

• It will be interesting to investigate how the snowflake number is distributed over a community of users: will there be the usual heavy-tailed distribution with the well-known long tail effect? Or is the distribution of snowflake numbers more akin to a Gaussian distribution?

• Another interesting theme will be to study the evolution of the Snowflake Number in time: in the initialization phase of an application, most users will have a snowflake number of one, as they will have unique features: for instance, they will be the first, and therefore the only ones, to have introduced a tag. As a system becomes more popular, the snowflake numbers will start to rise: idiosyncratic tags aside, it is probably becoming quite difficult to find a tag that has not yet been entered in systems like delicious.com.

• Applications of the Snowflake Number could be quite diverse: for instance, in some contexts, one explicitly wants to avoid crossing the Snowflake Number. k-anonymity is such an approach, where the general idea is to have duplicates of any sequence of data, so that the data sequence cannot point to one single person.

• If we define the Snowflake Number over a graph, with items and users represented by nodes and with edges that connect users with their items, then there are graph characteristics that we can relate to the Snowflake Number [2]. In that approach, we can also fold the graph over the items to study how the Snowflake Number diffuses over the graph.

• People with a high snowflake number contribute little new information. They are not really unique and do not really add any interesting connections. People with a low snowflake number contribute unique connections between items and therefore make the underlying graph more connected. They act somewhat as hubs in the underlying network [2]. Of course, this is not a value judgment on the people involved (indeed, two 'soul brothers' have by definition an infinite snowflake number), but an evaluation of the connections that they, and they alone, add to the network.

• There are certainly other possible metrics. As a simple example, we could define the snowflake number for a pair of users $a$ and $b$ as $s(a, b) = 1 - \frac{\mathrm{card}(I_a \cap I_b)}{\mathrm{card}(I_a)}$. In that case, we could define the snowflake number of a user $a$ from a set of $N$ users as $s(a) = \frac{\sum_{i=1}^{N} s(a, x_i)}{N - 1}$. This would mean that more unique users get a higher snowflake number; the highest value is 1 and the lowest value is 0.

• Most importantly, we want to extend the scope of what we try to measure by also explicitly addressing in the snowflake number the uniqueness of the situation, environment, context, etc. This requires that we add an adequate way to take the relevancy of connections between users and items into account.
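To make the metrics discussed so far concrete, the following Python sketch may help. It is an illustration under stated assumptions, not the authors' implementation: the function names (snowflake_number, pairwise_snowflake, average_snowflake) and the toy purchase data are hypothetical. The first function follows the informal reading used in the book and Eurovision examples of Section 2, namely the size of the smallest subset of a user's items that no other user fully shares, and finds it by brute-force enumeration, which is only practical for small item sets. The other two functions follow the pairwise metric s(a, b) and its average s(a) from the list above, and assume that every user has at least one item and that there are at least two users.

```python
from itertools import combinations
from math import inf


def snowflake_number(user, items_by_user):
    """Size of the smallest subset of the user's items that no other user fully shares.

    Returns math.inf when no such subset exists, i.e. when the user's
    items are all contained in another user's set.
    """
    own = items_by_user[user]
    # For subset containment, only the overlap with each other user matters.
    overlaps = [own & other for name, other in items_by_user.items() if name != user]
    for size in range(1, len(own) + 1):
        for subset in combinations(sorted(own), size):
            candidate = set(subset)
            if not any(candidate <= overlap for overlap in overlaps):
                return size
    return inf


def pairwise_snowflake(a, b, items_by_user):
    """s(a, b) = 1 - card(I_a & I_b) / card(I_a); assumes I_a is not empty."""
    own = items_by_user[a]
    return 1 - len(own & items_by_user[b]) / len(own)


def average_snowflake(a, items_by_user):
    """s(a): sum of s(a, x) over all N users (s(a, a) is 0), divided by N - 1."""
    total = sum(pairwise_snowflake(a, x, items_by_user) for x in items_by_user)
    return total / (len(items_by_user) - 1)


# Hypothetical toy data: which books each customer bought.
purchases = {
    "ann": {"A", "B", "C"},
    "bob": {"A", "B"},
    "cat": {"C"},
    "dan": {"A", "B", "D"},
}

print(snowflake_number("ann", purchases))  # 2: nobody else bought both A and C
print(snowflake_number("dan", purchases))  # 1: nobody else bought D
print(snowflake_number("bob", purchases))  # inf: ann bought everything bob bought
print(average_snowflake("ann", purchases))
```

Note that the two families of metrics point in opposite directions: with snowflake_number, a lower value means a more unique user, whereas with average_snowflake a higher value (closer to 1) does.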

4.

CONCLUSION

Studying the characteristics of the snowflake number will enable us to better understand what it means to be "unique" in a particular community. Such research will also help us understand how we can best support the unique characteristics, requirements and aims of the user community. That is an essential characteristic of successful and relevant web applications, and therefore we believe that the ideas presented above can contribute significantly to web science.

Acknowledgements

We acknowledge the comments and feedback from Martin Wolpers, Daan Bohnen, "Tom" and Thomas Broeker on earlier versions of this paper that circulated in the blogosphere.

5.

REFERENCES

[1] Eurovision Song Contest 2008. http://en.wikipedia.org/wiki/Eurovision_Song_Contest_2008, 2008.
[2] A.-L. Barabasi. Linked: How Everything Is Connected to Everything Else and What It Means. Plume, 2003.
[3] E. Duval. Snowflake effect voor leren. In WTR Trendrapport 2008: ICT – fundament voor vernieuwing, pages 32–36. SURF, 2008.
[4] X. Ochoa and E. Duval. Quantitative analysis of user-generated content on the web. In Proceedings of WebEvolve2008: Web Science Workshop at WWW2008, 2008.
[5] X. Ochoa and E. Duval. Relevance ranking metrics for learning objects. IEEE Transactions on Learning Technologies, 1(1):34–48, 2008.
