We initially intended to go with a node-network representation, but there were simply too many nodes.

Instead, we opted to use multiple visualisations with related datasets and interactions, as depicted in the video below:

Given more time, we could have further explored representing the datasets as nodes; however, given our time-frame, I stand by our decision to use interaction to build a mental model across complementary visualisations.

The following is our report.


The original core data was translated to form an adjacency list.

This data had to be reduced, eliminating repetitions (so that, among other things, it could be mapped in a spreadsheet). There are also various versions of this data: it was transformed into different formats (CSV, tab-delimited text) and different subsets of the original (~28,630 nodes), to experiment with across different visualization programs.
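To make the reduction step concrete, here is a minimal sketch of the deduplication in Python (the course and unit names are invented placeholders; the real pairs came from the ~28,630-node core dataset):

```python
from collections import defaultdict

# Hypothetical sample of (Course_ID, UOS) pairs; the real dataset was
# exported from the original core data.
raw_pairs = [
    ("BSc", "COMP1001"),
    ("BSc", "MATH1001"),
    ("BSc", "COMP1001"),   # duplicate row to be eliminated
    ("BE",  "COMP1001"),
]

def build_adjacency_list(pairs):
    """Collapse repeated (course, unit) rows into a deduplicated adjacency list."""
    adjacency = defaultdict(set)
    for course, unit in pairs:
        adjacency[course].add(unit)
    # Sort for a stable, spreadsheet-friendly layout
    return {c: sorted(units) for c, units in sorted(adjacency.items())}

adj = build_adjacency_list(raw_pairs)
print(adj)  # {'BE': ['COMP1001'], 'BSc': ['COMP1001', 'MATH1001']}
```

The same dict can then be written out as CSV or tab-delimited text for each subset.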

Spreadsheet software was used to generate some quick visualizations. Apple’s iWork Numbers proved too unstable for large amounts of data, with Excel surpassing it in performance.

The following scatter chart was generated to visualize similar course-to-subject structures. The x-axis is UOS_Index; the y-axis is Course_ID. (Ignore the symbols.)

This same data was then fed into some of my own previous Java visualization applications. However, the full data set ran out of memory (due to inefficiencies in the implementation for this specific case).

Hence, other visualization programs were experimented with.

– GGobi: unfortunately, I could not figure it out, and it seemed outdated.

– Tulip: I had heard good things about its Windows equivalent; however, it did not appear to accept adjacency matrices.

– Gephi: I had also heard good things about this program and decided to try it out. It’s an amazing program (for one, allowing the import of multiple formats).

Gephi also has inbuilt layout algorithms with manipulable variables. These were very fun (and useful) to experiment with. I initially experimented with a ~10% subset of the data. A couple of results are shown below.

Algorithms such as ‘Yifan Hu Proportional’ and ‘Fruchterman Reingold’ seemed well suited to this context. ‘Yifan Hu’s Multilevel’ and ‘Force Atlas’ also appeared interesting.
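For a sense of what these force-directed layouts are doing under the hood, here is a minimal, illustrative Fruchterman–Reingold-style sketch in pure Python (this is not Gephi’s implementation; the node names, constants, and cooling schedule are my own simplifications):

```python
import math
import random

def fruchterman_reingold(nodes, edges, iterations=50, width=1.0, seed=42):
    """Minimal force-directed layout sketch in the Fruchterman-Reingold style."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}
    k = width / math.sqrt(len(nodes))   # ideal edge length
    t = width / 10                      # initial "temperature"
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsive force f_r = k^2 / d between every pair of nodes
        for v in nodes:
            for u in nodes:
                if u == v:
                    continue
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[v][0] += dx / d * f
                disp[v][1] += dy / d * f
        # Attractive force f_a = d^2 / k along every edge
        for u, v in edges:
            dx = pos[v][0] - pos[u][0]
            dy = pos[v][1] - pos[u][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[v][0] -= dx / d * f
            disp[v][1] -= dy / d * f
            disp[u][0] += dx / d * f
            disp[u][1] += dy / d * f
        # Limit displacement by the temperature, then cool
        for v in nodes:
            dx, dy = disp[v]
            d = math.hypot(dx, dy) or 1e-9
            pos[v][0] += dx / d * min(d, t)
            pos[v][1] += dy / d * min(d, t)
        t *= 0.95
    return pos

# Tiny toy graph: two "courses" sharing one "unit of study"
pos = fruchterman_reingold(["BSc", "BE", "COMP1001"],
                           [("BSc", "COMP1001"), ("BE", "COMP1001")])
```

The O(n²) repulsion pass is exactly why running this on the full ~28,630-node dataset takes so long; Yifan Hu’s multilevel variant exists largely to avoid that cost.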

These same algorithms were then applied to the entire UOS to Course dataset.

The algorithms were stopped after an hour or so; a clearer graph may have emerged given more computing power and time to run.

The time this amount of data takes to settle into viable visual patterns in real time is impractical for the project.

Alternatives include:

- Pre-Cluster Nodes for later Visualization in real-time (Eades et al.)
- Pre-Compute Locations of all Nodes
- Ordering the rows in the Excel chart examples (akin to genome/DNA-style visualizations) to compare the similarity between course streams and show shared subjects could also be an alternate way of visualizing similarities in the core dataset.
- Alternative visualizations for the project overall.

.

Such pre-computed data can then be used directly to set node positions.
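As a sketch of that idea, pre-computed positions can simply be serialised alongside the node list for later loading (the positions below are invented placeholders, e.g. exported after a long layout run):

```python
import csv
import io

# Hypothetical pre-computed positions (node id -> (x, y)), e.g. saved
# after letting a layout algorithm run overnight.
positions = {"BSc": (0.0, 1.0), "BE": (1.0, 1.0), "COMP1001": (0.5, 0.0)}

# Write a small CSV that a real-time viewer could load instantly,
# skipping the layout computation entirely.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["node", "x", "y"])
for node, (x, y) in sorted(positions.items()):
    writer.writerow([node, x, y])

csv_text = buf.getvalue()
```

Loading fixed coordinates turns an hours-long layout run into a constant-time lookup at visualization time.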

*Our visualization system is designed to help prospective University of Sydney students explore, and decide upon, available courses and degrees. Through the agile design of such a system, we will conduct user studies to improve our system and to evaluate the adoption of various themes and Natural User Interface (NUI) elements within the context of information visualization.*

The Gestalt Principles are theories of visual perception that can be utilized to create effective visualizations. They consist of the principles of:

- Similarity
- Proximity
- Continuation
- Connectedness
- Figure-Ground Relationship
- Closure
- Symmetry
- Area

.

However, the use of such principles in information visualization is of little value without evaluating their effect (positive or otherwise). Such evaluation can be done through interviews, qualitative/quantitative techniques, analytical inspection (observation, heuristic evaluation), and empirical evaluation methods (usability tests, e.g. ‘Think Aloud’, which are usually done in the early stages of design). When a system is near completion, controlled experiments are often conducted to gain quantitative results through strict procedures.

.

Eye tracking can also be used for studies. In one particular study, users were presented with different stimuli, and their eye movements were used to discern how edge crossings affect eye movements and performance, how the impact of crossings differs with crossing angle and graph size, and that people follow a geodesic-path tendency when searching for shortest paths.

.

The diagram below is also an interesting representation of where the graph visualisation research community is heading. I think it’s great that we’re developing theories on how users read graphs, but I believe there should always be a user-test feedback loop between the making of algorithms and the theories used.


- Edges roughly pointed in one direction
- Nodes evenly distributed
- Long edges avoided
- Edge crossings minimized
- Edges as straight / vertical as possible

.

The Sugiyama Method aims to address these conventions, and is useful for dependency diagrams, flow diagrams, conceptual lattices and other directed graphs. Essentially, layered networks are useful in representing dependency relations.

The Sugiyama Method:

- Cycle Removal

– may temporarily reverse some edges

– each cycle must have at least one edge against the flow; minimising the reversed edges is NP-hard, requiring heuristics (e.g. an enhanced greedy heuristic) or randomized algorithms

- Layering (assigning y)

– dummy vertices may be introduced to split long edges

- Node Ordering

– only the ordering matters (not co-ordinates); NP-hard

– many heuristics exist, e.g. the layer-by-layer sweep (reducing to the two-layer crossing problem), addressed by sorting, barycentre, or median methods

- Co-ordinate Assignment

Slides on this topic can be found here.

Information Visualisation amplifies human cognitive capabilities by:

- **Increasing Cognitive Resources** – e.g. visuals expanding memory.
- **Reducing Search Space** – e.g. a large amount of data in a small space.
- **Enhancing Pattern Recognition** – e.g. info organised spatially by time.
- **Supporting Perceptual Inference of Relationships** – often otherwise difficult.
- **Perceptual Monitoring** – e.g. a large number of, and unexpected, events.
- **Providing a Manipulation Medium** – e.g. exploration, *collaboration*.

Information visualisation, combined with data analysis, can be applied to analytic reasoning to support the sense-making process.

.

The multidisciplinary field of Visual Analytics consists of:

**Analytical Reasoning Techniques** – the methods by which users obtain *deep insights* that directly support *situation assessment, planning, and decision making*.

**Data Representations and Transformations** – converting all types of conflicting and dynamic data to support visualisation and analysis.

**Techniques for Production, Presentation, & Dissemination** – communicating analysis results to various audiences in the correct context.

**Visual Representation & Interaction Techniques** – allowing users to quickly explore and understand large amounts of data by utilising the human eye’s broad bandwidth.

.

Visual Analytics must facilitate high-quality human judgement within limited time. It must enable diverse analytical tasks:

**Understanding Past & Present** – quickly recognising the trends and events that have produced current conditions.

**Identifying Potential Futures** (and their warning signs).

**Monitoring Current Events** for warning signs and unexpected events.

**Determining Indicators** of the intent of an action or individual.

**Supporting Decision Makers in Times of Crisis.**

These tasks are conducted through individual and collaborative analysis, often under extreme time pressure. Visual analytics must enable **Hypothesis-Based** and **Scenario-Based** Analytical Techniques, providing support for the analyst to reason based on available evidence.

.

Established Graph Drawing algorithms often attempt to solve 1. Scalability, 2. Visual Complexity, or 3. Domain Complexity.

Such algorithms have included:

*[ GEOMI (Geometry for Maximum Insight) is a visual analysis tool for large and complex networks. ]*

Navigation can often be aided by animations that preserve the mental map.

Some visualisations, like the actor–movie network (a two-mode network), suit (p,q)-core filtering: like the k-core, but with one degree threshold per vertex class.

.

Source

- Illuminating the Path: The Research and Development Agenda for Visual Analytics (NVAC)

.

**Centrality** can be measured locally by degree, or via global distance measures. Though more expensive, global measures are often more valuable; example methods include **Betweenness** (‘gatekeepers’), **Closeness** (sum of shortest paths to all vertices), and **Eccentricity** (length of the longest shortest path). Which of these methods to use depends, of course, on the purpose. Feedback measures such as status, hub/authority (useful for ranking web pages), and eigenvector centrality can also assess centrality.

Centrality can be displayed via node-link graphs, radial drawings, hierarchical drawings (dendrograms), etc.
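To make one of the global measures concrete, here is a minimal closeness-centrality sketch using a BFS over an unweighted adjacency dict (the ‘hub’ network below is a toy example, not real course data):

```python
from collections import deque

def closeness(adj, v):
    """Closeness centrality of v: (n - 1) / (sum of shortest-path distances),
    computed with a breadth-first search over an unweighted adjacency dict."""
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Toy star network: 'hub' is the obvious gatekeeper.
adj = {"hub": ["a", "b", "c"], "a": ["hub"], "b": ["hub"], "c": ["hub"]}
print(closeness(adj, "hub"))  # 1.0
print(closeness(adj, "a"))    # 0.6
```

The hub scores the maximum 1.0 because it reaches everyone in one step, which is exactly the intuition behind using closeness to find central actors.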

.

**Cohesive Subgroups** help identify meaningful social groups. Their components can be strongly or weakly connected, cyclic or acyclic, connected (k-connectivity) or isolated, or separated by a cut vertex or separation pair.

Cliques (complete subgraphs, where all nodes are connected) can be generalised via the **n-clique**, with ‘n’ dictating the maximum path length between members of a clique, relaxing the definition as n increases. The **n-clan** usefully extends the n-clique, requiring the diameter of the clique to be no greater than n. Dense areas can also be found with the **k-core**, in which every vertex is adjacent to at least k other vertices. The **k-plex** finds a set of vertices in which every vertex is adjacent to all except k of the other vertices (connected to n−k vertices). **n-clique** and **n-clan** are about **reachability** (path length); **k-core** and **k-plex** are about **degree**.
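The k-core definition translates almost directly into code; this is a toy sketch of the peeling process, not a library implementation:

```python
def k_core(adj, k):
    """Iteratively strip vertices of degree < k; the survivors form the k-core."""
    core = {v: set(ns) for v, ns in adj.items()}
    while True:
        low = [v for v in core if len(core[v]) < k]
        if not low:
            return set(core)
        for v in low:
            # Remove v and notify its surviving neighbours
            for u in core.pop(v):
                if u in core:
                    core[u].discard(v)

# Triangle a-b-c plus a pendant vertex d attached to a.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
print(sorted(k_core(adj, 2)))  # ['a', 'b', 'c']
```

The pendant vertex is peeled away at k=2, leaving the triangle, which is why k-core works well for stripping sparse fringe from dense regions before visualization.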

.

**Structural Equivalence and Network Positions** can greatly reduce complex networks via methods such as the Block Model (or image matrix). This is done by clustering via cliques, distance, or similarity. Similarity of social positions can be conceptualised by structural equivalence, automorphic equivalence, regular equivalence, outdegree and indegree equivalence, blockmodelling, and generalised blockmodelling.

Nodes are structurally equivalent if they hold identical positions in the network; a blockmodel essentially combines similar nodes into one. Nodes that are structurally equivalent are also automorphically equivalent. (This is the common case, covering the vast majority.)

Automorphic nodes do not have to be connected to exactly the same nodes, but to nodes that play analogous roles in the network.

Nodes are regularly equivalent if they have ties to nodes playing the same roles, revealing social structures. Regular equivalence relaxes the definition further, no longer requiring equal degree, but only a tie to at least one member of each class. (Common in niche applications.)

All of these forms of equivalence, and several others [Pattison 1993], have the property that there is a path at the block level if and only if there is at least one path at the node level.
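A crude sketch of the strictest of these notions: grouping vertices with identical neighbour sets gives the structural-equivalence classes that a blockmodel would collapse (the data below is invented):

```python
def structural_classes(adj):
    """Group vertices by identical neighbour sets (structural equivalence)."""
    classes = {}
    for v, nbrs in adj.items():
        classes.setdefault(frozenset(nbrs), []).append(v)
    return [sorted(members) for members in classes.values()]

# Two hypothetical "students" enrolled in exactly the same units
# occupy the same structural position; a blockmodel would merge them.
adj = {
    "s1": {"COMP1001", "MATH1001"},
    "s2": {"COMP1001", "MATH1001"},
    "s3": {"COMP1001"},
}
print(sorted(structural_classes(adj)))  # [['s1', 's2'], ['s3']]
```

Each class becomes a single block node, which is how blockmodelling reduces a complex network to its image matrix.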

.

**Network Measures/Statistics** can be used to analyse and compare networks. These statistics consist of:

– Degree Distribution

– Clustering Coefficient

– Diameter

– Average Path Length

– Connected Component

– Density

Such analyses can be performed using Pajek, by Vladimir Batagelj.
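Several of these statistics are short computations over an adjacency dict; here is a toy sketch of two of them (not Pajek’s implementation, and the path graph below is invented):

```python
from collections import Counter

def density(adj):
    """Edge density of an undirected graph given as an adjacency dict:
    2m / (n * (n - 1))."""
    n = len(adj)
    m = sum(len(ns) for ns in adj.values()) / 2   # each edge is counted twice
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0

def degree_distribution(adj):
    """Histogram mapping degree -> number of vertices with that degree."""
    return dict(Counter(len(ns) for ns in adj.values()))

# Toy 4-vertex path graph a-b-c-d.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(density(adj))              # 0.5
print(degree_distribution(adj))  # {1: 2, 2: 2}
```

Comparing two networks’ degree distributions and densities like this is often the first step before any visual comparison.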

Blockmodelling can aid with matrix rearrangement views and clustering. The identification of ‘Triads’ / Pattern Searching has also been useful in social network analysis, along with the normalization of data.

.

Sources:

- Social Network Analysis: Methods and Applications [Wasserman and Faust 94]
- Network Analysis: Methodological Foundations LNCS 3418 Tutorial [Brandes and Erlebach eds. 04]

**Scale-Free Networks** are quite popular and useful. SFNs have a power-law degree distribution, at least asymptotically. This occurs in social networks (with phenomena such as “Six Degrees of Separation” (S. Milgram; John Guare, 1967)), communication networks, biological (metabolic) networks, the world wide web (1999, ~19 degrees of separation), scientific collaboration, citation patterns, etc. The power-law degree distribution is accompanied by a high clustering coefficient (i.e. “my friends will likely know each other”).

Exponential, or more purely random, networks (e.g. road networks) have a **Poisson degree distribution** (following models such as Erdős–Rényi’s 1960 uniformly random model). The Watts–Strogatz small-world model simulates the spectrum from regular to small-world to random networks, finding that six degrees of separation arises from a few random links to people outside your own circle.

Essentially, the Scale-Free Network model features **exponential growth** and **preferential attachment**. SFNs have a **power-law degree distribution**, a **high clustering coefficient**, and an extremely **small average path length, O(log log n)**. These attributes make SFNs useful for modelling real-world networks. Other networks include small-world networks and power-law random sparse networks (Fan Chung).
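A toy sketch of preferential attachment (my own simplified growth process, not a faithful Barabási–Albert implementation): each new vertex attaches m edges to existing vertices with probability proportional to their current degree, which is what produces the power-law tail:

```python
import random

def preferential_attachment(n, m, seed=1):
    """Grow a graph to n vertices; each newcomer attaches m edges,
    choosing targets with probability proportional to current degree."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))   # start from m initial vertices
    repeated = []              # vertex list weighted by degree
    for v in range(m, n):
        edges.extend((v, t) for t in targets)
        repeated.extend(targets)
        repeated.extend([v] * m)
        # Sample m distinct, degree-weighted targets for the next vertex
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(repeated))
        targets = list(chosen)
    return edges

g = preferential_attachment(100, 2)
degrees = {}
for u, v in g:
    degrees[u] = degrees.get(u, 0) + 1
    degrees[v] = degrees.get(v, 0) + 1
# A few early "hub" vertices accumulate far more edges than the median,
# giving the heavy-tailed degree distribution characteristic of SFNs.
```

Plotting `degrees` on log-log axes would show the approximately straight line expected of a power law; a uniformly random (Erdős–Rényi-style) attachment would instead give a Poisson-shaped histogram.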