Emerging quality: Creating dynamic user and content profiles in online knowledge networks
Thieme Hennis 02-Sep-08
Type, theme, and length of project

Classification: 1A. This project can be classified as research treating ethical and societal aspects of concrete technological developments. A virtual identity has both psychological and economic significance. In my research, the motivation to share knowledge and help others is an important aspect of the way the virtual identity of an individual is built. It also presumes and supports other economic structures, such as a more flexible employer-employee relationship.
Theme: Virtual reality

The Internet has become an essential element of many economies and societies. Similarly, we are intrinsically attached to and have become part of the Web (Kelly, 2005). In the last few years, we have seen an enormous increase in people being active online: connecting, creating, sharing, and building up identities. Smart data-mining systems are able to create dynamic profiles of people and content, representing expertise, relevance, and quality. Internet pioneer Wendy Hall describes it as follows: "Every time you do something on the internet, it is effectively logged, building up this profile that is with you for your life. ... We will be able to build software that can interpret that profile to help get the answer that you need in the context that you're in" (Smith, 2006).
Length of research: 4 years

The author applies for a single Ph.D. research position (length: 4 years).
Research team

Quality is a socially defined concept. I will try to make it quantifiable by measuring certain use and user relationships within decentralized networks. At present, the research team consists of the following people:

- Professor Wim Veen (TU Delft). Wim has been involved in research into learning and innovation in education for many years, and has developed a highly relevant and useful model of networked learning. Alpha (sociological-educational theories).
- Dr. Jaco Appelman (TU Delft). Jaco has been involved in collaboration-software research for many years. He will assist with methodological and content issues. Beta (collaborative software & systems engineering).
- Job Timmermans, M.Sc. (PEERS). Job is co-founder of PEERS and holds master's degrees in philosophy and systems engineering. His role in the project is to examine the practical application of the developed system. Alpha (philosophy of quality) & Beta (collaborative software & application interface).
Research description

In decentralized (virtual) networks, with tools and technologies that allow anyone to contribute anything, it is increasingly difficult to determine the reliability of online content and people. The research I propose must bring forward rules and variables that can be used (for example by software engineers) to let quality and expertise emerge over time and become visible.
What represents quality?

In the last decade, the internet has evolved into a platform that allows any person to participate and contribute. Various easy-to-use web technologies empower people to share interests and knowledge, search and structure content, and connect with friends and peers.
Web 2.0 represents "a blurring of the boundaries between Web users and producers, consumption and participation, authority and amateurism, play and work, data and the network, reality and virtuality" (Zimmer, 2008). With the increase in online participation, a number of issues have emerged regarding quality, authority, expertise, and trust (Keen, 2007). As organizations become more open and seek ways to make use of the contributions of people around the world, these issues become even more pressing (Abbott, 2000). Alongside the many new tools for publishing and creating content, there are tools specifically made to search, filter, rate, evaluate, and recommend content to people in certain contexts. Still, filtering the ever-growing body of resources available online or in internal networks remains difficult (Benkler, 2006; Howe, 2006).
Finding the right resources and people

Search algorithms of popular search engines focus on popularity (or authority) rather than on what is commonly regarded as quality. No human reviews are involved in this process, so rankings are created in a sub-optimal manner (Lewandowski & Höchstötter, 2008). Because similar search algorithms and ineffective content management systems are used within organizations, much of knowledge workers' time is spent recreating information that already exists (and is available in-house): "A lot of money and intellectual power is spent on reinventing the wheel and searching for knowledge. This is a huge problem for companies and a central challenge for KM research" (Swaak, Ifamova, Kempen, & Graner, 2004). People define quality, and usually this involves relying on others, such as experts or people you trust. The same should hold for the way search engines and content management systems determine quality. Specifically, this means including human reviews and other people-generated metadata (times used, favorited, tagged, or recommended) in structuring and managing content. Quality then becomes linked to context, related to explicit context variables (which may be user input variables in search engines), and more transparent for the user.
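To make this concrete, here is a minimal sketch of how people-generated metadata and passive usage statistics could be blended into one quality score. The signal names, the weights, and the Bayesian-style shrinkage of sparse ratings are illustrative assumptions of mine, not part of the proposed system:

```python
import math
from dataclasses import dataclass

@dataclass
class ContentSignals:
    """Illustrative per-item signals; names and scales are assumptions."""
    views: int            # passive usage
    times_favorited: int  # active, lightweight endorsements
    times_tagged: int
    mean_rating: float    # explicit human reviews, on a 0..5 scale
    num_ratings: int

def quality_score(s: ContentSignals, prior: float = 2.5, k: float = 10.0) -> float:
    """Blend human reviews with usage metadata into one 0..1 score.

    Sparse ratings are shrunk toward a prior so a single 5-star vote
    cannot dominate; usage counts are log-scaled so heavy traffic has
    diminishing returns. The 0.7/0.3 weighting is a placeholder.
    """
    shrunk = (s.mean_rating * s.num_ratings + prior * k) / (s.num_ratings + k)
    usage = math.log1p(s.views) + 2 * math.log1p(s.times_favorited + s.times_tagged)
    return 0.7 * (shrunk / 5.0) + 0.3 * (usage / (usage + 5.0))

print(quality_score(ContentSignals(views=400, times_favorited=12,
                                   times_tagged=7, mean_rating=4.4, num_ratings=9)))
```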
Standards

A number of initiatives, such as PICS (Resnick & Miller, 1996; Armstrong, 1997) and Resource Profiles (Downes, 2005), propose protocols or frameworks that can be used to evaluate, rate, or structure online content. Many websites have implemented rating and reputation mechanisms to increase transparency and indicate trust in content and people. Still, a general standard for online content does not exist. Wikipedia co-founder Larry Sanger has recently called for a system for syndicating and rating online data, claiming it to be the obvious next step (and Big Idea) for the Internet: "It will enable systems to weight data not just on Google-style PageRank algorithms, but also things like quality according to generally trusted sources; or quality according to your peer group; or quality according to academic and academic-endorsed sources; etc." (Sanger, 2008). What Sanger proposes is a system that attaches relevant information about a person to each rating, in order to add context and enrich pieces of content with relevant metadata (such as quality according to peer group, evaluations, and usage). As the Web is populated with more data, it becomes easier to automatically mine these kinds of user and usage statistics (about people and their behavior online, popularity and interest, friends and activities) and turn them into valuable metadata. For example, APML (Attention Profile Markup Language) and ULML (User Labor Markup Language) intend to set standards for capturing and sharing information about people online. When this people metadata is combined with active feedback generated by users (ratings and evaluations), profiles of people and content can be generated automatically (through use). Such profiles can be used to increase the motivation to contribute and share, enhance flexibility for freelance workers and organizations, and improve the efficiency of finding people and content (Choi, Kruk, Grzonkowski, Stankiewicz, Davis, & Breslin, 2006).
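As a sketch of what such combined "people metadata" might look like, the snippet below merges implicit attention data (usage) with explicit feedback into one exchangeable profile. The structure is merely APML/ULML-inspired; the real standards define their own XML schemas, and all field names here are my own illustrative assumptions:

```python
import json
from collections import defaultdict

def build_profile(user_id, implicit_events, explicit_ratings):
    """Combine implicit attention data with explicit user feedback into
    a portable profile (hypothetical structure, loosely APML-inspired)."""
    attention = defaultdict(float)
    for topic, weight in implicit_events:        # e.g. ("sustainability", 0.5)
        attention[topic] += weight
    total = sum(attention.values()) or 1.0       # normalize to relative attention
    return {
        "user": user_id,
        "implicit_concepts": {t: round(w / total, 3) for t, w in attention.items()},
        "explicit_ratings": explicit_ratings,    # e.g. {"doc-17": 4}
    }

profile = build_profile(
    "user-001",
    implicit_events=[("sustainability", 0.5), ("education", 0.3),
                     ("sustainability", 0.4)],
    explicit_ratings={"doc-17": 4, "doc-42": 5},
)
print(json.dumps(profile, indent=2))
```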
Hypothesis, research questions, instruments, and methodology

Both the user profile (expertise level and domain) and usage (number of views, clicks, ratings, recommendations, etc.) are relevant and should be utilized to determine the quality and relevance of content. Furthermore, using this information to profile the original contributor leads to a system of dynamic and up-to-date expertise profiles based on the value of contributions. The hypothesis I will attempt to falsify is that in virtual knowledge networks, findable expert and content profiles can be made by analyzing how content is used, and by whom, and linking the results to the original contributor. The two fundamental assumptions that make up the hypothesis are:

1. User and usage information determine the quality and domain of creations; and
2. The quality of creations determines the creator's expertise.
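A minimal sketch of how these two assumptions could interlock: ratings are weighted by the rater's current expertise (assumption 1), item quality then updates the contributor's expertise (assumption 2), and the two estimates are iterated to a fixed point. The update rules, starting values, and iteration count are illustrative assumptions, not the proposed model:

```python
def propagate(ratings, authors, n_iter=20):
    """ratings: list of (rater, item, score in 0..1); authors: item -> creator.
    Alternates the two assumptions until quality and expertise stabilize."""
    users = {r for r, _, _ in ratings} | set(authors.values())
    expertise = {u: 0.5 for u in users}   # neutral starting expertise
    quality = {}
    for _ in range(n_iter):
        # Assumption 1: quality is the expertise-weighted mean of ratings.
        sums, weights = {}, {}
        for rater, item, score in ratings:
            w = expertise[rater]
            sums[item] = sums.get(item, 0.0) + w * score
            weights[item] = weights.get(item, 0.0) + w
        quality = {i: sums[i] / weights[i] for i in sums}
        # Assumption 2: a creator's expertise is the mean quality of their items.
        for user in expertise:
            own = [q for i, q in quality.items() if authors.get(i) == user]
            if own:
                expertise[user] = sum(own) / len(own)
    return quality, expertise

quality, expertise = propagate(
    [("ann", "d1", 0.9), ("bob", "d1", 0.7), ("ann", "d2", 0.3)],
    authors={"d1": "carl", "d2": "bob"})
print(quality, expertise)
```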
Much has been written and done about these two assumptions. The recent increase in people being active online and sharing content allows for complex data-retrieval and profiling algorithms that determine quality dynamically. How this translates into research is described in the next section.
Research questions and instruments

As mentioned, the above assumptions are addressed by various reputation systems and rating, quality, and profiling mechanisms. I will first investigate the most important relationships, rules, and (upcoming) standards in the generation of metadata about quality, authority, and expertise by these (stand-alone) systems. Concurrently, I will look into the processes of knowledge workers using different types of publishing and rating tools, and identify the most important variables of quality in knowledge networks. These variables are then ordered along different levels: personal (expertise, competencies, passion, etc.), relational (quality of interactions), and informational (usefulness of content, reliability of the source). This first step of literature research and case studies is inductive, and results in a model that will be validated through a large-scale survey. Through regression analysis, personal biases will be filtered out and an empirical foundation will be created for the interpretatively developed model (a sketch of such a bias correction follows the table below).
Step 1
Research question(s) or description: CONTENT PROFILES: What are variables and (metadata) standards and initiatives for defining quality of content? User-driven: active rating and evaluation. Machine-driven: measuring usage.
Instrument / Method: Literature, desk research.
Outcome: Authoritative paper(s) about "Metadata standards and quality in decentralized online networks".

Step 2
Research question(s) or description: USER PROFILES: What are variables and (metadata) standards and initiatives for defining expertise of persons? User-driven: recommendations etc. Machine-driven: determination of authority (based on several factors).
Instrument / Method: Idem.
Outcome: Authoritative paper(s) about "Metadata standards, profiling mechanisms and authority/expertise protocols and rules in decentralized online networks".

Step 3
Research question(s) or description: A first case study will provide insight into criteria, possibilities, and constraints of using different tools. How can the standards and variables be measured, using existing tools?
Instrument / Method: Case study: interview, survey, experiment.
Outcome: Criteria, possibilities, and constraints & toolbox.

Step 4
Research question(s) or description: Using the outcomes of the first three steps, I will describe the most important variables and requirements for determining quality and expertise in online networks. How can user-driven and machine-driven metadata about the quality of content translate into dynamic expertise profiles of content creators (or: how should content profiles influence user profiles)? How should expertise profiles influence content profiles? Additionally, I will clarify the requirements for the case studies and the research that follows, to test the hypothesis. These requirements include instruments/technologies used, user participation, size of network, and more.

Step 5
Research question(s) or description: VALIDATION OF MODEL: Does the interrelation of content metadata and user metadata in determining quality and expertise improve the finding of people and resources in organizations? What are critical success factors?
Instrument / Method: Case study: interview, survey.
Outcome: Framework for measuring search quality within organizations & critical success factors for the model.

Step 6
Research question(s) or description: Describing the outcomes of the research.
Outcome: Report and functional design for the proposed system.
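As promised above, a sketch of how regression analysis might filter personal biases out of the survey data: each score is modeled as an item effect plus a rater leniency effect and fitted by least squares, after which item effects can be compared free of individual rating habits. The additive model form is an illustrative assumption:

```python
import numpy as np

def debias(scores):
    """scores: list of (rater, item, score).
    Fits score ~ item_effect + rater_leniency by least squares and
    returns bias-corrected item effects (useful for relative
    comparison; the model itself is an illustrative assumption)."""
    raters = sorted({r for r, _, _ in scores})
    items = sorted({i for _, i, _ in scores})
    X = np.zeros((len(scores), len(items) + len(raters)))
    y = np.zeros(len(scores))
    for row, (r, i, s) in enumerate(scores):
        X[row, items.index(i)] = 1.0                 # item-quality column
        X[row, len(items) + raters.index(r)] = 1.0   # rater-leniency column
        y[row] = s
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)     # minimum-norm solution
    return dict(zip(items, beta[:len(items)]))

# Rater "r1" scores everything one point higher than "r2"; after
# debiasing, items "a" and "b" keep their one-point relative gap.
print(debias([("r1", "a", 5), ("r1", "b", 4), ("r2", "a", 3), ("r2", "b", 2)]))
```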
Timeline

The steps in the above table are ordered chronologically. The timeline below describes the structure in more detail:

1. Year 1: Steps 1, 2, 3 – Literature research; creating the research framework, quality model, and theory; conducting an exploratory case study; preparing further case studies; and writing papers.
2. Year 2: Steps 4 & 5 – Developing and deploying the model in research communities and evaluating the model. More specifically: describing how different tools are used to create and share information, and how these tools define quality/expertise; evaluating and refining the model and theory. This means describing (i) how usage (popularity, rating, reviewing, etc.) and users (experts versus laymen) together determine the quality of content, and (ii) how this translates to the expertise or authority of the content creator.
3. Year 3: Step 5 – Similar to the second year, but with more focus on converging research results in order to create an improved and more abstract model for quality and expertise in online knowledge networks. The two main requirements are that the model functions as desired and that it can be used as a basis for creating metadata-generating software.
4. Year 4: Step 6 – Describing and finalizing my research: making it useful for practical solutions.
Methodology: Grounded theory

Because I will develop a new theory about quality based on existing literature and research, the chosen methodology is grounded theory. Grounded theory can be described as a research method in which the theory is developed from the data, rather than the other way around. It is an inductive approach, meaning that it moves from the specific to the more general. Because theories about virtual identities, quality and rating systems, and the structures around the increased empowerment of people are currently taking shape, this is the best approach: using it to create a better model.
Societal impact and valorization

My objective is to create a system that measures people's activities and contributions online and automatically translates these into a virtual identity (or karma) that can be found by the right persons in the right context. Such a system allows people to be found and employed more directly and flexibly (Malone & Laubacher, 1998). Depending on how efforts are valued and used by the community, the virtual identity of the contributor changes. I expect this to lead to two things: (1) people contribute valuable content to the community (otherwise it will not add value to their identity); and (2) people are motivated to contribute more by intrinsic factors (fun, community feeling) than by financial reward. Still, the virtual identity forms a bridge to future job opportunities or assignments, based on these (intrinsically motivated) contributions.
Such a system will change organizational structures and create a more flexible and free economy, as speculated by Pekka Himanen: "Could there be a free market economy in which competition would not be based on controlling information but on other factors – an economy in which competition would be on a different level (and, of course, not just in software, but in other fields, too)?" (Himanen, The Hacker Ethic and the Spirit of the Information Age, 2001). Competition would then be based, on the contrary, on the sharing of information and resources between people in flexible networks and communities. I realize that this is another testable assumption, but that could be done in further research. Before we can do that, though, we must build the foundation of this system.
Case study: Sustainability network

My analysis of quality and expertise in virtual environments (like online communities) will form the basis of the PEERS Interaction Management System [1]: software that analyzes the interaction of users with each other and with online content. All described relationships, rules, and standards will be built into it, so it can be tested and applied immediately. Currently, we are deploying our software at different organizations in different settings. The following will serve as an exploratory case study in the research: a sustainability network (100-250 professionals) consisting of DKA (De Kleine Aarde), Enviu, OSIRIS, and the TU Delft Sustainability department (SEPAM faculty). These organizations, concerned with sustainability and alternative technologies, have clearly expressed their interest in and commitment to contributing to and being part of the proposed research. I will deploy different software tools within these organizations, and use the PEERS Interaction Management System to create dynamic, exchangeable profiles of people and content. These tools allow users to make use of content and connect with people outside their own organization. Tools and technologies already used by the organizations will be part of the research, provided they allow measurement of use and users by PEERS IMS.
[1] http://aboutpeers.com
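Because the PEERS Interaction Management System itself is only specified in the course of the research, the sketch below merely illustrates the kind of interaction log such measurement presupposes. The event types, field names, and aggregation are hypothetical, not the actual PEERS IMS schema:

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass
class InteractionEvent:
    """One user-content interaction; all fields are hypothetical."""
    user: str
    content_id: str
    action: str        # e.g. "view", "tag", "rate", "recommend"
    timestamp: datetime
    organization: str  # e.g. "DKA", "Enviu", "OSIRIS", "TU Delft"

def usage_summary(events):
    """Aggregate raw events per content item: the raw material from
    which dynamic, exchangeable profiles would be computed."""
    summary = {}
    for e in events:
        summary.setdefault(e.content_id, Counter())[e.action] += 1
    return summary

events = [
    InteractionEvent("ann", "doc-1", "view", datetime(2008, 9, 1), "Enviu"),
    InteractionEvent("bob", "doc-1", "rate", datetime(2008, 9, 2), "DKA"),
    InteractionEvent("ann", "doc-2", "tag", datetime(2008, 9, 3), "OSIRIS"),
]
print(usage_summary(events))
```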
Works Cited

Abbott, V. (2000). Web page quality: can we measure it and what do we find? A report of exploratory findings. Journal of Public Health, 22(2), 191-197.
Armstrong, C. (1997, May 19). Metadata, PICS and Quality. Retrieved August 10, 2008, from Ariadne magazine: http://www.ariadne.ac.uk/issue9/pics/
Benkler, Y. (2006). The Wealth of Networks: How Social Production Transforms Markets and Freedom. New Haven, CT: Yale University Press.
Choi, H. C., Kruk, S. R., Grzonkowski, S., Stankiewicz, K., Davis, B., & Breslin, J. G. (2006). Trust Models for Community-Aware Identity Management. Identity, Reference, and the Web Workshop at the WWW Conference, 2006.
Downes, S. (2005). Resource Profiles. Journal of Interactive Media in Education, 5.
Himanen, P. (2001). The Hacker Ethic and the Spirit of the Information Age. New York: Random House.
Howe, J. (2006, June). The Rise of Crowdsourcing. Retrieved August 10, 2008, from Wired Magazine (14): http://www.wired.com/wired/archive/14.06/crowds.html
Keen, A. (2007). The Cult of the Amateur. New York: Doubleday Business.
Kelly, K. (2005, August). We Are the Web. Retrieved August 08, 2008, from Wired Magazine (13): http://www.wired.com/wired/archive/13.08/tech.html
Lewandowski, D., & Höchstötter, N. (2008). Web Searching: A Quality Measurement Perspective. In A. Spink & M. Zimmer (Eds.), Web Searching: Interdisciplinary Perspectives (pp. 309-343). Dordrecht: Springer.
Malone, T., & Laubacher, R. (1998, September-October). The Dawn of the E-lance Economy. Harvard Business Review, 144-152.
Resnick, P., & Miller, J. (1996). PICS: Internet Access Controls Without Censorship. Communications of the ACM, 39, 87-93.
Sanger, L. (2008, July 8). Syndicated Web ratings - an idea whose time has come? Retrieved August 8, 2008, from Citizendium Blog: http://blog.citizendium.org/2008/07/09/syndicated-web-ratings-an-idea-whose-time-has-come/
Smith, D. (2006, May 21). All set for a baby.com revolution. Retrieved August 10, 2008, from Guardian - The Observer: http://www.guardian.co.uk/technology/2006/may/21/news.theobserver
Swaak, J., Ifamova, L., Kempen, M., & Graner, M. (2004). Finding in-house knowledge: patterns and implications. I-KNOW '04. Graz, Austria: Telematica Institute. Available at https://doc.telin.nl/dscgi/ds.py/Get/File-40767.
Zimmer, M. (2008). Preface: Critical Perspectives on Web 2.0. First Monday (online), 13(3).
Preliminary budget

I request the full amount needed to complete this research: €300,000 for a full-time (4-year) Ph.D. position, including research team, logistics and travel support, accommodation, and all other expenses.
Valorization workshop

The valorization workshop consists of two parts. In November, a modest online conference will be held, using free conferencing and collaboration technologies. I will put forward four important questions, which will be addressed by the invited speakers (15 minutes per speaker). A discussion with participants follows, with my research and research question as the main topic. An offline meeting will be held in December with all stakeholders, including individuals from PEERS, the research committee, and potential case-study organizations. Depending on the possibility of having this hosted by an institution, a maximum of €1,000 is needed to hire office space and arrange beverages.
Summary for laymen (translated from Dutch)

Over the last ten years, the internet has developed technologies that enable people ever more effectively to create, add, and rate content. Almost anyone with a computer and an internet connection can share his or her passions, interests, and knowledge, and this is exactly what is happening. Various mechanisms exist to filter and categorize this abundance of content, but it remains very difficult to separate the wheat from the chaff online or in virtual networks. This applies both to content (what is reliable or of high quality?) and to people (is this person really an expert in this field?). Besides producing more content, the increase in people's online activities also creates better opportunities to structure and rate that content. This can be done in several ways. First, the use of content can be measured: both the passive reading and the active structuring, rating, and evaluating of content. Second, it can be measured who uses the content. The hypothesis is that by measuring and analyzing both the usage and the user, very specific and up-to-date profiles of content and people can be created. These profiles are dynamic and depend on the activities surrounding the person or the content. The more active someone is, the richer (though not necessarily better) his or her profile becomes, and the more a piece of content is used, the better it can be profiled. Such a system supports the decentralized and flexible way of working of knowledge workers in virtual or open organizations.