Supporting Exploration in Social Data Analysis

Adam Perer
Human-Computer Interaction Lab
Department of Computer Science
University of Maryland
College Park, MD 20742
[email protected]

Ben Shneiderman
Human-Computer Interaction Lab
Department of Computer Science
University of Maryland
College Park, MD 20742
[email protected]

Abstract

Recent implementations of social data analysis services have focused on providing visual overviews of data and forums for public debate and commentary. While these are necessary features for supporting data analysis for social collaboration, we argue that more attention should be paid to improving the exploration experience of users. Improved exploration will allow users to dig deeper into the data. Furthermore, richer analysis tools will benefit from the collaborative efforts of many users. There is a need for sophisticated filtering, which allows users to find patterns, gaps and outliers in the midst of an overwhelming visualization. There is also a need for an integration of statistics with the visualizations, which facilitates discovery by providing important clues to complex data. We believe that supporting advanced filtering techniques and an integration of statistics and visualization will increase the utility of social data analysis services.

Keywords

Exploratory data analysis, social data analysis, information visualization, statistics, filtering

Copyright is held by the author/owner(s). CHI 2008, April 5 – April 10, 2008, Florence, Italy

ACM Classification Keywords

H5.m. Information interfaces and presentation (e.g., HCI): Miscellaneous.

Introduction

Websites such as Data360 [1], ManyEyes [9] and Swivel [8] have introduced the masses to the social capabilities of data analysis tools. On these websites, users can upload data, create visualizations, and enable a community of interested viewers to take part in the analysis. By sharing comments, creating different projections, annotating interesting outliers or patterns, or offering speculation on why the data is the way it is, the audience can go beyond the data provider’s insights. They can point out missed insights, provide different interpretations, or suggest mistakes in the underlying data set. These features are extremely important in delivering accountability to data analysis, as well as leveraging the intellect of many interested analysts. Furthermore, these social data analysis websites act as forums to communicate data in interactive ways previously only available to developers or purchasers of expensive software suites.

Although the features offered by these services are both groundbreaking and exemplary, we argue that improving the exploratory capabilities should receive further attention. In this position paper, we focus on two specific goals: 1) allowing users to dig deeper using sophisticated filtering tools, and 2) integrating statistics with visualizations to guide users to data with interesting properties. We believe these additional features will allow richer exploration on social data analysis services, and consequently, richer social activity.

Beyond Overviews: Digging Deeper with Filtering

In current social data analysis systems, the visualizations often feature limited interactive capabilities for exploration. They present rich overviews of the data, ranging from traditional scatterplots and histograms to more modern treemaps and network visualizations. However, we argue that these sites offer limited functionality for filtering and zooming, step 2 of the Visual Information Seeking Mantra (overview first, zoom and filter, then details-on-demand) [6].

Not properly supporting filtering has several side effects. First, scalability is limited, as all data is forced to always be present in the visualization. Second, users may compensate for this first constraint by filtering off-site. This behavior risks users accidentally discarding important data simply because they did not have access to a visual filtering process.

It is important to note that enabling filtering increases the number of paths of exploration and thus also the complexity of the system. These types of interactions can further confuse novices who wish to contribute to analysis. One possible solution is to allow the creators of the visualization to produce guidance in the spirit of Spotfire guides [7], which can direct novice users through important analytical steps. A more sophisticated solution would support Systematic Yet Flexible (SYF) guides, which record users’ progress towards authored goals while also supporting flexible exploration [4].

Visual filtering works best when supported by dynamic queries and range sliders to give users direct control. Since users should be able to filter in as many dimensions as exist in the data set, the interface should present the stack of filtered dimensions to users in a coherent manner.
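The idea of a stack of range filters, one per dimension, can be illustrated with a minimal Python sketch. It is not taken from any of the systems discussed here: the class names `RangeFilter` and `FilterStack`, the column names, and the sample rows are all hypothetical, and a real interface would bind each filter to a range slider and redraw the visualization on every change.

```python
from dataclasses import dataclass, field


@dataclass
class RangeFilter:
    """One range slider: keeps rows whose value in `column` lies in [low, high]."""
    column: str
    low: float
    high: float

    def accepts(self, row: dict) -> bool:
        return self.low <= row[self.column] <= self.high


@dataclass
class FilterStack:
    """Ordered stack of active filters, at most one per filtered dimension."""
    filters: list = field(default_factory=list)

    def set_range(self, column: str, low: float, high: float) -> None:
        # Moving a slider replaces any existing filter on the same column.
        self.clear(column)
        self.filters.append(RangeFilter(column, low, high))

    def clear(self, column: str) -> None:
        # Removing a filter restores the rows it was hiding.
        self.filters = [f for f in self.filters if f.column != column]

    def apply(self, rows: list) -> list:
        # A row stays visible only if every active filter accepts it.
        return [r for r in rows if all(f.accepts(r) for f in self.filters)]

    def describe(self) -> list:
        # Readable summary of the stack that an interface could display.
        return [f"{f.low} <= {f.column} <= {f.high}" for f in self.filters]


# Hypothetical sample data, invented for illustration only.
rows = [
    {"country": "A", "income": 12000, "life_expectancy": 61},
    {"country": "B", "income": 34000, "life_expectancy": 79},
    {"country": "C", "income": 29000, "life_expectancy": 72},
    {"country": "D", "income": 51000, "life_expectancy": 82},
]

stack = FilterStack()
stack.set_range("income", 20000, 60000)
stack.set_range("life_expectancy", 75, 90)
visible = [r["country"] for r in stack.apply(rows)]
summary = stack.describe()
```

Keeping the filters in an explicit list, rather than discarding rows permanently, means any dimension can be relaxed later without reloading the data, which is the property that off-site filtering loses.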

Integrating Statistics with Visualization

Figure 1. The rank-by-feature framework helps users find important patterns between many columns in data.

Figure 2. A complex network visualization (top) can be simplified using statistical rankings, color-coding, and filtering (bottom).

Even with filtering, certain phenomena cannot be found solely with visualization. Particularly with large data sets, visualizations will not always highlight important trends in the underlying data. Statistical properties can be used to detect important data points, relationships, and clusters. Statistical analysis can aid the comprehension of visualizations by numerically suggesting (or confirming) visual output. Presenting boxplots, standard deviations, and lower-triangular matrices will greatly improve the exploratory data analysis capabilities of these websites.

When users are faced with data with many columns, choosing which dimensions to plot in a scatterplot can be quite tedious and challenging. The rank-by-feature (RBF) framework, shown in Figure 1, suggests statistically interesting pairs of columns that can help guide users to interesting phenomena [5]. The resulting scatterplots (not pictured) appear when a user hovers over each cell in the matrix. When users are navigating stack charts, highlighting other similar stacks in the pattern-finding spirit of TimeSearcher [2] will help users overcome the overview displacement distortions. When users are trying to interpret a chaotic network visualization, color-coding and filtering the nodes by centrality measurements in the spirit of SocialAction [3] can increase comprehension. Similarly, relevant statistical information can also be displayed in scented widgets to further improve navigation [10].

Although statistical techniques are even more complicated to comprehend and use effectively, the discoveries they can lead to can outweigh the cost of instruction. Furthermore, in a collaborative environment, users can partition effort by navigating in their respective areas of statistical expertise. This statistical information will also empower users to filter out statistically unimportant data, bringing simplicity to initially overwhelming visualizations. Finally, having statistical overviews of visual information will also help users trust the resulting information, since such overviews make it harder for users to maliciously hide or distort the visual representations.
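To make the idea of ranking column pairs concrete, the following Python sketch scores every pair of columns by the absolute value of their Pearson correlation and sorts the pairs from most to least strongly correlated. This is only one possible ranking criterion, chosen here as an assumption; it is not a reproduction of the rank-by-feature implementation in [5]. The function names and the sample table are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant column is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def rank_column_pairs(table):
    """Return (col_a, col_b, r) tuples sorted by |r|, strongest first."""
    scored = [
        (a, b, pearson(table[a], table[b]))
        for a, b in combinations(table, 2)
    ]
    return sorted(scored, key=lambda pair: abs(pair[2]), reverse=True)


# Hypothetical table (column name -> values), invented for illustration only.
table = {
    "income": [10, 20, 30, 40, 50],
    "spending": [12, 19, 33, 41, 48],
    "age": [50, 20, 40, 30, 35],
}

ranking = rank_column_pairs(table)
for col_a, col_b, r in ranking:
    print(f"{col_a} vs {col_b}: r = {r:.2f}")
```

An interface could use such a ranking to color the cells of a lower-triangular matrix, so that users open the scatterplots for the top-ranked pairs first instead of inspecting every combination by hand.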

Conclusion

In this paper, we speculate that a richer experience on social data analysis websites can be had if more attention is paid to the exploratory capabilities. Advanced filtering techniques allow users to step beyond overviews and take advanced paths to finding insights. Additionally, integrating the visualizations with statistical analysis can reduce the apparent complexity of visualizations while also guiding users to interesting gaps, outliers and patterns. While both of these requirements suggest an increase in the complexity of an interface, the richer explorative capabilities can leverage the true power of the masses: many explorative paths for many insights.

Citations

[1] Data360. (2007).
[2] Hochheiser, H. and Shneiderman, B. Dynamic Query Tools for Time Series Data Sets, Timebox Widgets for Interactive Exploration. Information Visualization, 3, 1 (2004), 1-18.
[3] Perer, A. and Shneiderman, B. Balancing Systematic and Flexible Exploration of Social Networks. IEEE Transactions on Visualization and Computer Graphics, 12, 5 (2006), 693-700.
[4] Perer, A. and Shneiderman, B. Systematic Yet Flexible Discovery: Guiding Domain Experts through Exploratory Data Analysis. (Under Submission) (2007).
[5] Seo, J. and Shneiderman, B. A Rank-by-Feature Framework for Interactive Exploration of Multidimensional Data. Information Visualization, 4, 2 (2005), 99-113.
[6] Shneiderman, B. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualization. In Proc. Visual Languages (1996), 336-343.
[7] Spotfire DecisionSite. (2007).
[8] Swivel. (2007).
[9] Viégas, F. B., Wattenberg, M., van Ham, F., Kriss, J. and McKeon, M. Many Eyes: A Site for Visualization at Internet Scale. In Proc. IEEE Symposium on Information Visualization (InfoVis 2007) (2007).
[10] Willett, W., Heer, J. and Agrawala, M. Scented Widgets: Improving Navigation Cues with Embedded Visualizations. In Proc. IEEE Symposium on Information Visualization (InfoVis 2007) (2007).
