Digital Library Visualization Using SproutCore and SVG


Abstract


PaperCube was written using the SproutCore JavaScript framework, which is geared towards the creation of highly interactive, “cloud,” or thick client applications in the web browser. Using only JavaScript and standards-based rendering technologies such as Scalable Vector Graphics, PaperCube allows users to browse a version of the CiteSeer digital library and view paper and author relationships using a set of dynamic visualizations.

Stemming from the need for a flexible graph API for various views in PaperCube, the NodeGraph class was created as a generalized solution that can display any type of relational data as an undirected graph rendered using Scalable Vector Graphics. The class is not PaperCube-specific and could be easily integrated into other applications.

The NodeGraph class is able to display a mixed set of content objects and has many customizable methods and properties to keep the base implementation as generalized as possible. In PaperCube, the views that use this API extend and customize these methods and properties. These methods allow the developer to abstract away the details of different content data types and provide a consistent API. When a graph rendered using this class is altered based on any threshold value, including depth, the graph transitions smoothly using animations. All of the properties that determine the currently visible data can be adjusted dynamically from UI controls through bindings. Since SVG is DOM-based, DOM nodes are aggressively cached to optimize performance, so that transitions are applied only to the altered parts of a visualization while the rest is left untouched.


Table of Contents

Introduction
Visualization using SproutCore and SVG
Architectural Overview
Client Architecture
SVG-Based Views
Generalized Animated Transitions
Controlling the Viewport
Rendering SVG-Based Views
The SproutCore-Based NodeGraph Class
Extendible Methods
Customizable Properties
High-Level Rendering Algorithm
Conclusion
Bibliography

The main goal of our research was twofold. The first was to develop, and test the effectiveness of, an application that allows a scholar to interact with a digital library and explore bibliographic meta data using a defined set of visualizations. The second was to push the limits of modern web browsers: by using web standards-based technologies, we explored whether a dynamic, desktop-like experience incorporating rich, interactive visualizations could be created.

The experimental application, PaperCube, was validated through a user study, which showed that it was useful for augmenting digital library search by reducing the “cognitive load” on a researcher and aiding the “discoverability” of new research material. Participants also described it as a “visually exciting and intuitive” application and an “amazing example of the apps that we’ll be seeing on the web in a couple of years.”

PaperCube was developed as part of Peter Bergström’s master’s thesis [1] to test the effectiveness of using a suite of visualizations to augment digital library navigation. PaperCube expands upon previous research done by Peter Bergström as an undergraduate, CircleView [2], which was a basic visualization of digital library meta data that used server-side generated SVG.

Stemming from the need for a flexible graph API for various views in PaperCube, the NodeGraph class was created as a generalized solution that can display any type of relational data as an undirected graph rendered using Scalable Vector Graphics. The class is not PaperCube-specific and could be easily integrated into other applications that use SproutCore.

PaperCube consists of three distinct components: a MySQL database that stores the bibliographic meta data for papers and authors; a PHP-based database interface that listens for requests for paper or author bibliographic meta data from the client and returns the results from the database in JSON format [7]; and the SproutCore-based application that runs in the web browser. Most of the business logic and complexity resides in the web browser, leaving the two layers below it lightweight.

Leveraging powerful SproutCore features such as bindings and observers, PaperCube aims to deliver rich, desktop-like interactivity within the confines of a web browser. Bindings and observers made it easy for PaperCube to implement resolution independence in its visualizations. Resolution independence was key because citation networks can be very large. By adjusting a slider control in the UI, the user can zoom into a visualization using SVG-based transforms without the view needing to be explicitly redrawn. Furthermore, through bindings, slider controls can dynamically alter the display parameters of the visualizations, for example to adjust the depth of a paper’s citation network or to change the thresholds that determine whether a node is displayed as part of the graph.

Using SproutCore’s rich data model, content data objects are retrieved from the server incrementally and asynchronously as JSON objects, so that only the data that is actually needed is loaded and processed. This loading scheme permits complex visualizations of hundreds, if not thousands, of nodes to be rendered progressively using SVG, giving the user instant feedback while additional data loads.

SproutCore applications rely on controllers to bridge the gap between the data model and the views that render the data. The controllers enable the manipulation of the display thresholds through bindings and observers to adjust the amount of information rendered without unnecessary code overhead.
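SproutCore provides bindings and observers natively. The following is a hypothetical plain-JavaScript stand-in, not SproutCore's API, that shows the data flow described above: a slider value is bound to a controller property, and a view observing that property re-renders when it changes. `ObservableController`, `bind`, and the property names are illustrative only.

```javascript
// Hypothetical stand-in for SproutCore-style observers and bindings; this is NOT
// SproutCore's API, just enough plain JavaScript to show the data flow.
class ObservableController {
  constructor(props) {
    this.props = { ...props };
    this.observers = new Map(); // property name -> array of callbacks
  }

  get(key) {
    return this.props[key];
  }

  // Setting a property notifies its observers, but only if the value changed.
  set(key, value) {
    if (this.props[key] === value) return;
    this.props[key] = value;
    for (const fn of this.observers.get(key) || []) fn(value, key);
  }

  addObserver(key, fn) {
    if (!this.observers.has(key)) this.observers.set(key, []);
    this.observers.get(key).push(fn);
  }
}

// One-way "binding": keep target[targetKey] in sync with source[sourceKey].
function bind(source, sourceKey, target, targetKey) {
  target.set(targetKey, source.get(sourceKey));
  source.addObserver(sourceKey, (value) => target.set(targetKey, value));
}

// Usage: a depth slider drives a graph controller; a view observes the depth.
const depthSlider = new ObservableController({ value: 1 });
const graphController = new ObservableController({ depth: 1 });
bind(depthSlider, "value", graphController, "depth");

const renderLog = [];
graphController.addObserver("depth", (depth) => renderLog.push(`render at depth ${depth}`));

depthSlider.set("value", 3); // graphController.depth becomes 3 and the view re-renders
depthSlider.set("value", 3); // unchanged value: no extra render
```

Suppressing notifications for unchanged values keeps redundant redraws out of the rendering path, which matters when every redraw touches many SVG nodes.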

Some of the views in PaperCube require animated transitions. To manage animations easily throughout PaperCube, there is a central animation controller object. This object manages a queue of HTML or SVG DOM elements that need to be animated from one point on the screen to another.

Internally, a heartbeat function executes every forty milliseconds and checks whether any pending animations are present in the scheduling queue. If the queue contains scheduled animations, a loop traverses the queue and executes one frame per animation during each heartbeat. Once an animation is complete and its DOM element is positioned at the desired end point, it is removed from the queue. If a large number of elements are scheduled to be animated, the loop may take longer to execute. In that case, instead of making all the animations take longer to finish, the animator drops frames to ensure that all of the animated subjects complete their animations within the desired time. This is possible because the position of an element is computed from the animation’s start time, its duration, and the current time, rather than advanced by a fixed step per frame.

In Example 1, “Code showing how to queue a subject SVG node to be animated”, an SVG circle element, svgNode, is being animated diagonally left and down across the screen in 250 milliseconds.
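A minimal sketch of such an animator follows. It is not PaperCube's code: the class name, the `cx`/`cy` attributes (which assume circle subjects), and the explicit `tick(now)` method (exposed so the heartbeat can also be driven manually) are assumptions. The usage lines at the end mirror the scenario described for Example 1, with made-up coordinates.

```javascript
// Hypothetical sketch of a central animator; names and details are assumptions.
class Animator {
  constructor(intervalMs = 40) {
    this.intervalMs = intervalMs; // heartbeat period (40 ms in the text)
    this.queue = [];
    this.timer = null;
  }

  // Queue `subject` (anything with setAttribute, e.g. an SVG <circle>) to move
  // from `from` to `to` ({x, y}) over `duration` milliseconds.
  animate(subject, from, to, duration, now = Date.now()) {
    this.queue.push({ subject, from, to, duration, start: now });
  }

  // One heartbeat: advance every queued animation by one frame.
  tick(now = Date.now()) {
    const stillRunning = [];
    for (const a of this.queue) {
      // Progress comes from elapsed time, not from a frame counter, so a slow
      // heartbeat skips intermediate frames instead of stretching the animation.
      const t = a.duration > 0 ? Math.min(1, Math.max(0, (now - a.start) / a.duration)) : 1;
      a.subject.setAttribute("cx", a.from.x + (a.to.x - a.from.x) * t);
      a.subject.setAttribute("cy", a.from.y + (a.to.y - a.from.y) * t);
      if (t < 1) stillRunning.push(a); // finished animations leave the queue
    }
    this.queue = stillRunning;
  }

  start() {
    if (this.timer === null) this.timer = setInterval(() => this.tick(), this.intervalMs);
  }

  stop() {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
  }
}

// Usage: move a circle diagonally left and down (SVG y grows downward) in 250 ms.
// `svgNode` is a stand-in that records attributes; in a browser it would be a real element.
const svgNode = { attrs: {}, setAttribute(name, value) { this.attrs[name] = value; } };
const animator = new Animator();
animator.animate(svgNode, { x: 200, y: 100 }, { x: 100, y: 200 }, 250, 0);
animator.tick(125); // halfway: cx = 150, cy = 150
animator.tick(300); // past the end: cx = 100, cy = 200, and the animation is dequeued
```

In an application, `start()` would be called once and the heartbeat would then call `tick()` with the real clock.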


Resolution independence is at the heart of PaperCube. A controller is used to relay positioning information between the zooming widget and the views. When the dimension values change, the various views are notified through observers that they need to redraw if they are visible. In addition to the current zoom factor, the controller also manages the width and height of the area where the view is rendered, or in other words, the canvas.

Since a view can be zoomed in using various means, more information than just the width and height needs to be managed. The window is treated as a viewport, the portal, that may show only part of the canvas. Therefore, the controller stores the portal height and width, the portal x- and y-position offsets, the canvas height and width, and the zoom value. As a view is zoomed in, the canvas dimensions increase but the portal dimensions remain constant. The x and y offsets determine where the top-left corner of the portal is placed on the canvas; this is accomplished by giving the canvas element negative top and left absolute positions. Figure 3, “The canvas and the preview inside of the browser” illustrates how the different dimension properties are related.
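A hypothetical `computeViewport` function sketches this bookkeeping: the portal keeps its size, the canvas grows with the zoom factor, and the canvas element is shifted with negative left/top values. Clamping the offsets so that the portal never shows space beyond the canvas edges is an added assumption, not something the text specifies.

```javascript
// Hypothetical sketch of the canvas controller's viewport bookkeeping.
function computeViewport({ portalWidth, portalHeight, zoom, offsetX, offsetY }) {
  // The portal (visible window) keeps its size; the canvas grows with the zoom.
  const canvasWidth = portalWidth * zoom;
  const canvasHeight = portalHeight * zoom;

  // Assumption: clamp offsets so the portal stays inside the canvas.
  const maxX = Math.max(0, canvasWidth - portalWidth);
  const maxY = Math.max(0, canvasHeight - portalHeight);
  const x = Math.min(Math.max(0, offsetX), maxX);
  const y = Math.min(Math.max(0, offsetY), maxY);

  return {
    canvasWidth,
    canvasHeight,
    offsetX: x,
    offsetY: y,
    // Absolute CSS position of the canvas element inside the portal.
    left: -x,
    top: -y,
  };
}
```

For example, an 800×600 portal at 2x zoom yields a 1600×1200 canvas, and a requested x offset of 900 is clamped to 800 so the right edge of the canvas lines up with the right edge of the portal.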


Since SVG is vector-based, scaling does not degrade rendering quality. Therefore, the canvas is actually rendered at the highest possible zoom value. For example, if a view allows for 10x zoom, the canvas is rendered at ten times the size of the portal and then scaled to 1/10th of that size if the currently selected zoom value is 1x. The scaling is applied through the transform attribute of the root SVG group element that contains the visualization’s DOM nodes; the JavaScript calculation is shown in Example 2, “Code for applying the SVG transform attribute to scale the visualizations”. The zoom value from the controller is inverted and saved as zI. The scale component of the transform is set to zI, and the group is then repositioned using the translate component, whose offsets are calculated by dividing the width and height of the canvas by zI.
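A minimal sketch of the scaling step follows. It is not Example 2: the scale factor is derived from the 10x example above (the selected zoom divided by the maximum zoom at which the canvas is rendered), the translate component is left out, and `buildZoomTransform` and `applyZoom` are hypothetical names.

```javascript
// Hypothetical sketch: the canvas is drawn at maxZoom, so showing it at `zoom`
// means scaling the root <g> by zoom / maxZoom (1x of 10x gives scale(0.1)).
function buildZoomTransform(zoom, maxZoom) {
  if (!(zoom > 0 && zoom <= maxZoom)) {
    throw new RangeError(`zoom must be in (0, ${maxZoom}], got ${zoom}`);
  }
  return `scale(${zoom / maxZoom})`;
}

// Apply the transform to the root group; its child nodes are not redrawn.
function applyZoom(rootGroup, zoom, maxZoom) {
  rootGroup.setAttribute("transform", buildZoomTransform(zoom, maxZoom));
}
```

Because only one attribute on one element changes, zooming costs the same regardless of how many nodes the graph contains.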


Using Scalable Vector Graphics instead of HTML- or Canvas tag-based rendering methods has many benefits. The drawing primitives in SVG are easy to manage and are easily reused. Furthermore, scaling and rendering text is very simple compared to the other options.

In order to use Scalable Vector Graphics within the SproutCore application, which is HTML-based, PaperCube uses the “XHTML 1.0 Strict” document type and includes both the XHTML and SVG namespaces. This allows PaperCube to create SVG DOM nodes dynamically using DOM Level 2 JavaScript operations. This combined namespace worked correctly in all the web browsers that support native, standards-compliant SVG rendering.

When SVG-based views are initialized, the required base svg and g elements are generated programmatically through DOM manipulation, as shown in Example 3, “The init function of an SVG-based view showing DOM manipulations”.
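A sketch of such an init step, assuming a hypothetical `initSvgView` helper: it uses the standard DOM `createElementNS` call with the SVG namespace URI, and the document object is passed in as a parameter so the sketch can also run outside a browser.

```javascript
const SVG_NS = "http://www.w3.org/2000/svg";

// Hypothetical init step for an SVG-based view. `doc` is the document object
// (window.document in a browser); passing it in keeps the sketch testable.
function initSvgView(doc, container, width, height) {
  // createElementNS with the SVG namespace yields real SVG elements, not unknown HTML tags.
  const svg = doc.createElementNS(SVG_NS, "svg");
  svg.setAttribute("width", String(width));
  svg.setAttribute("height", String(height));

  // Root group: zoom transforms are applied here so child nodes need no redraw.
  const root = doc.createElementNS(SVG_NS, "g");
  svg.appendChild(root);
  container.appendChild(svg);
  return { svg, root };
}
```

In a browser, `doc` is simply `document` and `container` is the view's HTML element.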


Stemming from the need for a graphing component in the Collaborators view, the NodeGraph class was created as a generalized, SproutCore-based solution that can show any data with relationships as an undirected graph rendered using Scalable Vector Graphics. The NodeGraph class is not PaperCube-specific and can easily be integrated into other SproutCore applications.

The NodeGraph class is able to display a mixed set of content objects and has many customizable methods and properties to keep the base implementation as generalized as possible. In PaperCube, the views that use this API extend and customize these methods and properties. These methods allow the developer to abstract away the details of different content data types and provide a consistent API. When a graph rendered using this class is altered based on any threshold value, including depth, the graph transitions smoothly using animations. All of the properties that determine the currently visible data can be adjusted dynamically from UI controls through bindings. Since SVG is DOM-based, DOM nodes are aggressively cached to optimize performance, so that transitions are applied only to the altered parts of a visualization while the rest is left untouched.

The class is used in four PaperCube views: Paper Graph, Papers, Collaborators, and Author Cites. It is agnostic to the type of SproutCore content object being rendered, so a single graph can mix content types, and each view adapts it only by extending the methods and properties described below. When the graph is altered but keeps the same root node, it transitions smoothly using animations.

Table 1, “Extendible methods in the NodeGraph class” shows a list of all of the methods that can be overridden by the developer in order to customize the NodeGraph class to create the desired visualization behavior.


For example, in the Collaborators view, the performCustomRequest method, shown in Example 4, “Customized performCustomRequest method”, calls the appropriate method in the adaptor to retrieve the collected GUIDs that are required.
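A hedged sketch of what such an override might look like; `adaptor.requestAuthors` is a hypothetical name standing in for the real adaptor method, which the text does not name, and the de-duplication step is an added assumption.

```javascript
// Hypothetical sketch of performCustomRequest for a Collaborators-style view.
// `adaptor.requestAuthors` is an assumed name for the server adaptor call.
function performCustomRequest(guids, adaptor) {
  // Drop duplicate GUIDs and skip the round trip when nothing is missing.
  const unique = [...new Set(guids)];
  if (unique.length === 0) return false;
  adaptor.requestAuthors(unique);
  return true;
}
```

Batching all collected GUIDs into one request keeps the number of server round trips independent of how many nodes are missing.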


The NodeGraph class requires the actual content object during some steps in the rendering process. This needs to be implemented on a per content-type basis because there are different types of content. Shown in Example 5, “Customized findCustomObject method”, the findCustomObject method takes a GUID string and returns the associated item. In this case it will return a Papercube.Author content object. If no author is found, null will be returned.
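A sketch of the lookup under stated assumptions: a `Map` stands in for SproutCore's record store, and the GUIDs and fields are made-up sample data.

```javascript
// Hypothetical in-memory store standing in for SproutCore's record store;
// the GUIDs and fields below are made-up sample data.
const authorStore = new Map([
  ["author-1", { guid: "author-1", name: "Author One", paperCount: 12 }],
  ["author-2", { guid: "author-2", name: "Author Two", paperCount: 3 }],
]);

// GUID string -> author content object, or null if no such author is loaded.
function findCustomObject(guid, store = authorStore) {
  if (typeof guid !== "string") return null;
  return store.has(guid) ? store.get(guid) : null;
}
```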


The relations on which the graph is based vary depending on the content type. In this case, the graph shows collaboration relationships, and each relationship is provided as an array of GUIDs. Accessing the _attributes array on a content object directly is not recommended unless the developer knows that the property is not computed. In the case of the Collaborators view, it is not computed, and it is accessed directly as shown in Example 6, “Customized findCustomObjectAttr method”. A computed property does not exist in the array and needs to be accessed using the general SproutCore accessor function, get.
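The pitfall can be illustrated with a hypothetical stand-in for a content object (`ContentObject` is not SproutCore's record class): stored values live in `_attributes`, while computed properties are only reachable through `get`.

```javascript
// Hypothetical stand-in for a SproutCore content object: stored attributes live in
// _attributes, while computed properties are only reachable through get().
class ContentObject {
  constructor(attributes, computed = {}) {
    this._attributes = attributes;
    this._computed = computed; // property name -> function(obj) returning a value
  }

  get(key) {
    if (Object.prototype.hasOwnProperty.call(this._computed, key)) return this._computed[key](this);
    return this._attributes[key];
  }
}

// Direct access is only safe for stored (non-computed) properties.
function findCustomObjectAttr(obj, key, isComputed = false) {
  return isComputed ? obj.get(key) : obj._attributes[key];
}

// Usage: collaborators is stored; collaboratorCount is computed from it.
const author = new ContentObject(
  { collaborators: [["author-2", 3], ["author-3", 1]] },
  { collaboratorCount: (o) => o._attributes.collaborators.length },
);
findCustomObjectAttr(author, "collaborators");           // the stored relation array
findCustomObjectAttr(author, "collaboratorCount", true); // 2, via get()
findCustomObjectAttr(author, "collaboratorCount");       // undefined: not in _attributes
```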


Given a relation, the NodeGraph needs to access its GUID. In the case of the Collaborators view, this is simple because the relation object is an array containing the GUID and weight attributes, as shown in Example 7, “Customized getGuidForRelation method”. In other views, these functions can be considerably more complex.


Shown in Example 8, “Customized calcRelationWeight method”, the calcRelationWeight method determines the importance of the relation. In this case, it is the author’s number of collaborators, which is stored in the second element (index 1) of the relation array.
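For the Collaborators view, both accessors reduce to indexing into the relation array. A sketch, assuming the `[guid, weight]` shape described above; the fallback to 0 for a missing or non-numeric weight is an added assumption.

```javascript
// Hypothetical relation shape for the Collaborators view: [guid, weight].
function getGuidForRelation(relation) {
  return relation[0];
}

function calcRelationWeight(relation) {
  // The weight is the second element; fall back to 0 if it is missing or not numeric.
  const weight = Number(relation[1]);
  return Number.isFinite(weight) ? weight : 0;
}
```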


The relationMeetsCustomThreshold method, shown in Example 9, “Customized relationMeetsCustomThreshold method”, takes a relation and determines whether it should be displayed. Most views have a set of sliders that determine the thresholds that an item needs to meet in order to be shown. If more than one threshold is used, a composite truth value needs to be computed. In this case, the method looks at two thresholds: the linkThreshold, which is the number of collaborators, and the paperThreshold, which is the number of papers that the author has written. Only if the author meets both thresholds is the relation included.
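A sketch of the composite check, under assumptions: the relation is `[guid, collaboratorCount]`, `lookup` is a hypothetical function returning the related author, and `paperCount` is an assumed field name.

```javascript
// Hypothetical sketch for a Collaborators-style view. A relation is assumed to be
// [guid, collaboratorCount]; `lookup` returns the related author (or null/undefined),
// whose `paperCount` field is also an assumption.
function relationMeetsCustomThreshold(relation, lookup, thresholds) {
  const [guid, collaboratorCount] = relation;
  const author = lookup(guid);
  if (!author) return false; // not loaded yet, so there is nothing to show
  // Composite truth value: both slider thresholds must be met.
  return (
    collaboratorCount >= thresholds.linkThreshold &&
    author.paperCount >= thresholds.paperThreshold
  );
}
```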


Lastly, the generateCustomMetaData method, shown in Example 10, “Customized generateCustomMetaData method”, needs to be customized in order to show the meta data for an item. In the Collaborators view, the bibliographic meta data is associated with Papercube.Author content objects.
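A hedged sketch of a meta data generator for authors. The field names (`name`, `affiliation`, `paperCount`) and the definition-list markup are assumptions, not PaperCube's schema; the optional class name plays the role of the metaDataClassName property listed in Table 2.

```javascript
// Escape text before inserting it into HTML markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical sketch: builds the meta data markup for an author object.
function generateCustomMetaData(author, className = "") {
  const rows = [
    ["Name", author.name],
    ["Affiliation", author.affiliation || "Unknown"],
    ["Papers", author.paperCount],
  ];
  const body = rows
    .map(([label, value]) => `<dt>${escapeHtml(label)}</dt><dd>${escapeHtml(value)}</dd>`)
    .join("");
  return `<div class="${escapeHtml(className)}"><dl>${body}</dl></div>`;
}
```

Escaping matters here because titles and names come from the server and are inserted as markup.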


Also, a rich set of properties, listed in Table 2, “Customizable nodeGraph class properties”, are exposed that dictate the rendering and what parts of the graph are visible. The developer can customize the colors, sizing, labeling, and meta data options. One can turn off edge rendering or show edges without weight calculations. Since the NodeGraph class can display any type of content, basic information needs to be provided such as the attribute to use for the title and the human-readable name of the content type.

Name | Description
--- | ---
nodeColor | Node background color. Default '#FFCB2F'.
nodeColorSel | Selected node background color. Default 'yellow'.
nodeBorderColor | Node border color. Default '#EAA400'.
nodeBorderColorSel | Selected node border color. Default '#FFB60B'.
nodeTextColor | Node text color. Default '#666'.
nodeDefaultRadius | The node size as a ratio of the screen size. Default: the radius is 25 times smaller than the height.
nodeBorderWidth | The default width of the node border. Default is 1 unit.
nodeXYRatio | Ratio of the node's x radius to its y radius. Default: the x radius is 1.5 times the y radius.
edgeColor | Edge color. Default is '#333'.
edgeColorSel | Selected edge color. Default is 'red'.
edgeTextColor | Edge text color. Default is '#666'.
edgeMinWidth | The minimum width of an edge. Default is 1 unit.
nodeTextRatio | Used to calculate the font size for the node as radius/nodeTextRatio. Default is 2.
edgeTextPosOffset | The position offset for the edge label: 0.1 puts the label close to the start node, 0.5 puts it in the middle of the edge, and 0.9 puts it close to the end node. Default is 0.3.
nodeOpacity | Node opacity. Default is 1.
edgeOpacity | Edge opacity. Default is 0.2.
edgeOpacitySel | Selected edge opacity. Default is 0.5.
showEdgeLabel | If YES, show the edge label; if NO, hide it. Default is YES.
useEdgeWeightWidth | If YES, calculate the edge width from the weight of the item; otherwise, skip this operation. Default is YES.
showEdges | If NO, do not show edges. Default is YES.
defaultTitleKey | The key in the content object used for the default title display. Default is 'title'.
contentTypeViewing | The name of the content type being displayed. Default is 'Paper'.
viewName | The name of the view. Default is 'none'.
metaDataBoxHeightSmall | Small height of the meta data box. Default is 120 pixels.
metaDataBoxWidthSmall | Small width of the meta data box. Default is 400 pixels.
metaDataClassName | Class name for the meta data DIV. Default is ''.

Table 2. Customizable nodeGraph class properties
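Several of the properties in Table 2 feed directly into simple formulas. The sketch below copies a subset of the defaults into a plain object (the table's YES/NO values are written as JavaScript true/false) and shows how a view might override some of them; the helper names are hypothetical, and treating the computed radius as the y radius is an assumption.

```javascript
// A subset of the Table 2 defaults (values copied from the table; YES/NO become true/false).
const NODEGRAPH_DEFAULTS = {
  nodeColor: "#FFCB2F",
  edgeColor: "#333",
  nodeXYRatio: 1.5,
  nodeTextRatio: 2,
  edgeTextPosOffset: 0.3,
  showEdges: true,
  useEdgeWeightWidth: true,
};

// A view overrides only what it needs; everything else falls back to the defaults.
function makeGraphOptions(overrides = {}) {
  return { ...NODEGRAPH_DEFAULTS, ...overrides };
}

// Label font size: radius / nodeTextRatio.
function nodeFontSize(radius, opts) {
  return radius / opts.nodeTextRatio;
}

// Node ellipse radii, treating `radius` as the y radius (an assumption):
// the x radius is nodeXYRatio times the y radius.
function nodeRadii(radius, opts) {
  return { rx: radius * opts.nodeXYRatio, ry: radius };
}

// Edge label anchor: edgeTextPosOffset of the way from the start node to the end node.
function edgeLabelPosition(start, end, opts) {
  const f = opts.edgeTextPosOffset;
  return { x: start.x + (end.x - start.x) * f, y: start.y + (end.y - start.y) * f };
}
```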


The NodeGraph’s layout algorithm positions nodes around the graph’s root node in a circular layout and uses several different parameters to determine the child node positions. Several passes are made on the data to collect information needed during rendering. First, a recursive method visits all the nodes and determines if they meet the criteria to be visible and collects any additional GUIDs that need to be retrieved from the server. Second, another pass traverses all the visited nodes and positions them. Once they have been positioned, they are rendered on the screen.
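A sketch of the first pass under stated assumptions: `collectVisible` is a hypothetical name, the `view` object supplies the per-content-type hooks named in the previous section with guessed signatures, and a node first reached by a longer path is re-expanded if a shorter path is found later, so each visible node keeps its minimum depth.

```javascript
// Hypothetical sketch of the first pass. `view` supplies the per-content-type hooks
// named in the text; the signatures used here are assumptions.
function collectVisible(rootGuid, maxDepth, view) {
  const visited = new Map(); // guid -> { depth, parent }
  const missing = new Set(); // guids whose content must still be fetched

  function visit(guid, depth, parent) {
    const seen = visited.get(guid);
    if (seen !== undefined && seen.depth <= depth) return; // already reached by a shorter path
    visited.set(guid, { depth, parent });

    const obj = view.findCustomObject(guid);
    if (obj === null) {
      missing.add(guid); // visible, but its data has not been loaded yet
      return;
    }
    if (depth >= maxDepth) return; // do not expand beyond the depth threshold

    for (const rel of view.findCustomObjectAttr(obj)) {
      if (view.relationMeetsCustomThreshold(rel)) {
        visit(view.getGuidForRelation(rel), depth + 1, guid);
      }
    }
  }

  visit(rootGuid, 0, null);
  return { visited, missing: [...missing] };
}
```

The `missing` list is what a view would hand to performCustomRequest; once the data arrives, the pass can simply be run again.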

The default node size and offset are calculated and saved as global variables based on the first recursive traversal of the data. The offset is the default distance between the root and its children. Inside the NodeGraph’s render method, after determining the deepest visited node in the root node’s network of relations, the default radius and offset can be calculated as shown in Example 11, “The code for determining the radius and offset values used in rendering the graph”. NODEGRAPH_DEFAULT_SCALE is the deepest supported zoom level and defaults to 10. In PaperCube, the _h and _z properties are set from the canvasController’s display properties. The nodeDefaultRadius property is determined by the developer, since it can be overridden.
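Example 11 itself is not reproduced in this text, so the formulas in the following sketch are assumptions rather than PaperCube's calculation: the radius is taken as the full-zoom canvas height divided by nodeDefaultRadius (matching the "25 times smaller than height" default in Table 2), and the offset is chosen so that the deepest level still fits within half the canvas height. The role of _z is not modeled, and `defaultRadiusAndOffset` is a hypothetical name.

```javascript
const NODEGRAPH_DEFAULT_SCALE = 10; // deepest supported zoom level (from the text)

// Hypothetical sketch; the formulas are assumptions, not Example 11.
function defaultRadiusAndOffset(h, deepestDepth, nodeDefaultRadius = 25) {
  const canvasHeight = h * NODEGRAPH_DEFAULT_SCALE; // the canvas is rendered at maximum zoom
  const radius = canvasHeight / nodeDefaultRadius;
  // Space the levels evenly so the deepest ring (plus one node radius) fits in half the height.
  const levels = Math.max(1, deepestDepth);
  const offset = (canvasHeight / 2 - radius) / levels;
  return { radius, offset };
}
```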


After the defaults are determined, the nodes are positioned based on these properties during the second pass that traverses all the visited nodes from the first pass. The child nodes for a given node have their radius calculated and individually positioned at the same time. By looking at all the visible child nodes, the radii for all the nodes are determined together. The more child nodes, the smaller they will appear on the screen.

For the child nodes, the radius is determined by the formula r = (2×π×o)×(d/360)/c/4, where r is the radius of the circle, o is the global offset distance for the view, d is the angle, in degrees, available to the children, and c is the number of children of the parent. The formula translates to the code in Example 12, “The radius calculation for the child nodes”.
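The radius formula can be written directly in JavaScript. The sketch below reads d as the angle in degrees available to the children (consistent with the d/360 factor); `childRadius` is a hypothetical name, and rejecting a child count of zero is an added guard.

```javascript
// Child radius: r = (2 * PI * o) * (d / 360) / c / 4, where o is the global offset,
// d the angle in degrees available to the children, and c the number of children.
function childRadius(o, d, c) {
  if (c <= 0) throw new RangeError("a parent with no children has no child radius");
  const arcLength = 2 * Math.PI * o * (d / 360); // arc of the circle of radius o spanning d degrees
  return arcLength / c / 4;                       // each child's share of the arc, divided by 4
}
```

Because r = arc/c/4, each node's diameter is half of its share of the arc, leaving the other half as spacing between neighbors.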


During the rendering pass, the angle of the node is passed in, and its children can span at most the smaller of 90 degrees and 1.5 times the angle of the node’s parent. This limit exists because, if the root node has many children, the space allotted to each node is limited, and the layout algorithm tries to reduce the possibility of overlapping nodes. Whichever angle is chosen, it is bisected so that the child nodes fan out evenly on both sides of the specified angle.

Since the position of the parent node is known, the positions of its child nodes can be calculated using basic geometry. A local offset from the parent node is calculated for each child node: the more children a node has, the closer it is positioned to its parent node. The formula lo = o×(1-c/m) determines the local offset, lo, where o is the global offset distance, c is the number of children of the node, and m is the maximum number of children among all the child nodes of the parent node.

Then, to calculate the local x and y positions for each node, a loop begins at the start angle for the node’s children. From the number of children that should be rendered, the local angle, t, for each node is determined. Using the local offset and the parent node’s position, x and y, the node’s position, lx and ly, can then be determined.

These formulas translate into the code shown in Example 13, “Calculating the position for each child node of a node”. Some additional constants appear in the code, but they are always set and do not affect the formula. The delta and theta variables are passed from the parent’s execution of the recursive method. Additional calculations allow newly created nodes to be animated: the animation transitions each node from its parent’s position to its final position.
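Example 13 is not reproduced here; the following is a hedged sketch that combines the spread limit, the local offset lo = o×(1-c/m), and the polar-to-Cartesian step lx = x + lo·cos(t), ly = y + lo·sin(t). Assumptions: angles are in degrees, children sit at the centers of equal slices of the spread, lo falls back to o when no sibling has children (m = 0), and the additional constants mentioned in the text are omitted. Applied literally, the local offset formula places the sibling with the most children (c = m) at its parent's position; the omitted constants may be what prevents this in the original. `positionChildren` is a hypothetical name.

```javascript
// Hypothetical sketch of positioning one node's children.
// theta: direction (degrees) passed down for this node; delta: angle (degrees)
// passed down from the parent's call; o: global offset.
function positionChildren(parent, children, theta, delta, o) {
  const c = children.length;
  if (c === 0) return [];
  // Children span at most min(90, 1.5 * delta) degrees, bisected around theta.
  const spread = Math.min(90, 1.5 * delta);
  const start = theta - spread / 2;
  const step = spread / c;
  // m: the largest number of children among these siblings.
  const m = Math.max(...children.map((ch) => ch.childCount));
  return children.map((child, i) => {
    const t = start + step * (i + 0.5); // center of this child's slice
    const lo = m === 0 ? o : o * (1 - child.childCount / m); // lo = o * (1 - c/m)
    const rad = (t * Math.PI) / 180;
    return {
      id: child.id,
      angle: t,
      slice: step, // candidate `delta` for this child's own children
      x: parent.x + lo * Math.cos(rad),
      y: parent.y + lo * Math.sin(rad),
      // Newly created nodes start at the parent and are animated to (x, y).
      fromX: parent.x,
      fromY: parent.y,
    };
  });
}
```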


Although the rendering algorithm requires several passes to determine what should be rendered, it is efficient because it first narrows down the data that should be displayed and then works on those nodes exclusively. Also, the positioning component of the algorithm runs only on nodes that need to be repositioned, not on all nodes that may be rendered on screen. The rendering itself is linear and covers only the nodes and edges that require layout.

When the state of a NodeGraph-based visualization changes based on display thresholds, the rendering algorithm looks at the nodes that should remain visible. From the set of visible nodes, the rendering method calculates the new x- and y-positions and radius for the nodes and then adds the nodes to be animated from their previous position. If the nodes change state from being visible to being hidden, they are not animated out, but rather hidden immediately. In the case of adding a set of new nodes, they are animated from their parent to their final positions.
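A hypothetical `planTransition` helper sketches this decision: given the previous and new layouts as maps from node id to position and radius, it lists the nodes to animate, the new nodes to grow out of their parents, and the nodes to hide immediately. Nodes whose layout did not change appear in none of the lists, so their cached DOM nodes can be left untouched. The data shapes and the `parentOf` callback are assumptions.

```javascript
// Hypothetical sketch: compare two layouts (Maps of id -> {x, y, r}) and plan the transition.
function planTransition(prevLayout, nextLayout, parentOf) {
  const moves = []; // still visible: animate from the old to the new position
  const adds = [];  // newly visible: animate out from the parent's position
  const hides = []; // no longer visible: hide immediately, without animation

  for (const [id, next] of nextLayout) {
    const prev = prevLayout.get(id);
    if (prev) {
      if (prev.x !== next.x || prev.y !== next.y || prev.r !== next.r) {
        moves.push({ id, from: prev, to: next });
      } // unchanged nodes are left alone, so their cached DOM nodes stay untouched
    } else {
      const parent = nextLayout.get(parentOf(id));
      adds.push({ id, from: parent ?? next, to: next }); // the root has no parent
    }
  }
  for (const id of prevLayout.keys()) {
    if (!nextLayout.has(id)) hides.push(id);
  }
  return { moves, adds, hides };
}
```

The `moves` and `adds` entries map directly onto calls to a central animator, while `hides` only needs a visibility change.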

PaperCube, which uses the NodeGraph class, was designed to augment, not replace, existing digital library services and set out to push the limits of web browsers. SproutCore makes it possible to create rich, highly interactive web applications that can use a variety of rendering methods, including Scalable Vector Graphics. The NodeGraph class was created as a generalized SVG-based solution to visualize any type of relational data as an undirected graph inside a SproutCore application.