UOS to Course Pre-Visualization
September 28, 2011
Our COMP5048 project requires data on UOS to Course relations. Hence, some pre-visualizations were done for analytical purposes.
Data was translated from the original core data to form an adjacency list.
This data had to be reduced by eliminating repetitions (so that, for one thing, it could be mapped on a spreadsheet). There are also various versions of this data: it was transformed into different formats (CSV, tab-delimited text) and into subsets of different sizes of the original (~28630 nodes), to experiment with across different visualization programs.
Spreadsheet software was used to generate some quick visualizations. Apple’s iWork Numbers was found to be too unstable for large amounts of data, with Excel surpassing it in performance.
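As a rough illustration of the reduction and export step, here is a minimal Python sketch. It is not the script that was actually used: the record names (`COURSE_1`, `UOS_A`, ...) are hypothetical, and the `Source`/`Target` header is only an assumption about what an edge-list import would expect.

```python
import csv
import io

def build_adjacency(pairs):
    """Group (course, uos) pairs into course -> sorted UOS list, dropping repeats."""
    adjacency = {}
    for course_id, uos_id in pairs:
        adjacency.setdefault(course_id, set()).add(uos_id)
    return {course: sorted(units) for course, units in adjacency.items()}

def to_delimited(adjacency, delimiter=","):
    """Serialise the adjacency list as one course-to-UOS edge per row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["Source", "Target"])  # assumed header names
    for course in sorted(adjacency):
        for uos in adjacency[course]:
            writer.writerow([course, uos])
    return buffer.getvalue()

# Hypothetical sample records; the first pair is deliberately repeated.
raw_pairs = [
    ("COURSE_1", "UOS_A"),
    ("COURSE_1", "UOS_A"),
    ("COURSE_1", "UOS_B"),
    ("COURSE_2", "UOS_A"),
]

adjacency = build_adjacency(raw_pairs)
csv_text = to_delimited(adjacency)                  # comma-separated
tsv_text = to_delimited(adjacency, delimiter="\t")  # tab-delimited
print(csv_text)
```

Using a set per course removes the repeated relations, and the same function covers both output formats by switching the delimiter.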
The following scatter chart was generated to visualize similar course-to-subject structures. The x-axis is UOS_Index, the y-axis is Course_ID. (Ignore the symbols.)
This same data was then fed into some of my own earlier Java visualization applications. However, they ran out of memory on the full data set (due to inefficiencies in the implementation for this specific case).
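To make the axes concrete, the points for such a chart could be derived roughly as follows. This is only a sketch: it assumes numeric course IDs and assigns each UOS an index in order of first appearance, which may not match how UOS_Index was defined in the spreadsheet. The sample data is hypothetical.

```python
def scatter_points(adjacency):
    """Return (uos_index, course_id) points, one per course-to-UOS relation."""
    uos_index = {}
    points = []
    for course_id in sorted(adjacency):
        for uos in adjacency[course_id]:
            # Assumption: a UOS gets the next free index the first time it is seen.
            index = uos_index.setdefault(uos, len(uos_index))
            points.append((index, course_id))
    return points, uos_index

# Hypothetical data: numeric course IDs mapped to their UOS codes.
adjacency = {1: ["UOS_A", "UOS_B"], 2: ["UOS_A", "UOS_C"]}
points, uos_index = scatter_points(adjacency)
print(points)  # [(0, 1), (1, 1), (0, 2), (2, 2)]
```

Courses that share subjects then show up as rows with marks in the same columns.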
Hence, other visualization programs were experimented with.
– GGobi: unfortunately, I could not figure it out, and it seemed outdated.
– Tulip: I had heard good things about its Windows equivalent; however, it did not appear to accept adjacency matrices.
– Gephi: I had also heard good things about this program and decided to try it out. It’s an amazing program (for one, it allows the import of multiple formats).
Gephi also has built-in layout algorithms with adjustable parameters. These were very fun (and useful) to experiment with. I initially experimented with a ~10% subset of the data. A couple of results are shown below.
Algorithms such as ‘Yifan Hu Proportional’ and ‘Fruchterman Reingold’ seemed quite useful in this context. ‘Yifan Hu’s Multilevel’ and ‘Force Atlas’ also appeared interesting.
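For readers unfamiliar with force-directed layouts, below is a minimal Python sketch of the basic Fruchterman-Reingold idea: every pair of nodes repels, connected nodes attract, and a cooling "temperature" caps how far a node can move per iteration. This is a simplified textbook-style version, not Gephi's implementation; the parameter names and defaults (`iterations`, `size`, `seed`) and the sample graph are assumptions made for illustration.

```python
import math
import random

def fruchterman_reingold(nodes, edges, iterations=50, size=1.0, seed=0):
    """Lay out a graph inside a size x size frame; returns node -> (x, y)."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, size), rng.uniform(0, size)] for n in nodes}
    k = size / math.sqrt(len(nodes))  # ideal edge length: sqrt(area / node count)
    temperature = size / 10.0
    cooling = temperature / (iterations + 1)

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes, magnitude k^2 / distance.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                fx, fy = dx / dist * force, dy / dist * force
                disp[u][0] += fx
                disp[u][1] += fy
                disp[v][0] -= fx
                disp[v][1] -= fy
        # Attraction along each edge, magnitude distance^2 / k.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            fx, fy = dx / dist * force, dy / dist * force
            disp[u][0] -= fx
            disp[u][1] -= fy
            disp[v][0] += fx
            disp[v][1] += fy
        # Move each node, limited by the temperature, and keep it in the frame.
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] = min(size, max(0.0, pos[n][0] + dx / length * step))
            pos[n][1] = min(size, max(0.0, pos[n][1] + dy / length * step))
        temperature -= cooling

    return {n: (p[0], p[1]) for n, p in pos.items()}

# Hypothetical course-to-UOS graph.
nodes = ["COURSE_1", "COURSE_2", "UOS_A", "UOS_B", "UOS_C"]
edges = [("COURSE_1", "UOS_A"), ("COURSE_1", "UOS_B"),
         ("COURSE_2", "UOS_A"), ("COURSE_2", "UOS_C")]
layout = fruchterman_reingold(nodes, edges)
for node, (x, y) in layout.items():
    print(node, round(x, 3), round(y, 3))
```

Note that the repulsion step compares every pair of nodes, so the cost per iteration grows with the square of the node count in this naive form.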
These same algorithms were then applied to the entire UOS to Course dataset.
The algorithms were stopped after an hour or so, but a clearer graph might have emerged given more computing power or more time to run.
The time it takes for this amount of data to settle into usable visual patterns is too long for real-time use in the project, so some alternatives are worth considering:
- Pre-cluster nodes for later real-time visualization (Eades et al.)
- Pre-compute the locations of all nodes
- Order the rows in the Excel graph examples (as in genome/DNA-like visualizations) to compare the similarity between course streams and show shared subjects; this could be an alternate way of visualizing similarities in the core dataset.
- Overall, consider alternative visualizations for the project.
Such pre-computed data can then be used as fixed data to set node positions.
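The row-ordering idea from the list above could be prototyped along these lines. This is a minimal sketch rather than a worked-out method: it assumes Jaccard similarity between the UOS sets of two courses as the similarity measure and a simple greedy ordering (always place the most similar remaining course next). The course and UOS names are hypothetical.

```python
def jaccard(a, b):
    """Shared items divided by total distinct items; 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def order_by_similarity(course_units):
    """Greedy row order: each row is followed by its most similar remaining row."""
    remaining = sorted(course_units)
    if not remaining:
        return []
    order = [remaining.pop(0)]
    while remaining:
        last_units = course_units[order[-1]]
        # On ties, max() keeps the first candidate in sorted order.
        best = max(remaining, key=lambda c: jaccard(last_units, course_units[c]))
        remaining.remove(best)
        order.append(best)
    return order

# Hypothetical courses and their UOS sets.
course_units = {
    "COURSE_1": {"UOS_A", "UOS_B", "UOS_C"},
    "COURSE_2": {"UOS_X", "UOS_Y"},
    "COURSE_3": {"UOS_A", "UOS_B", "UOS_D"},
    "COURSE_4": {"UOS_X", "UOS_Z"},
}
row_order = order_by_similarity(course_units)
print(row_order)  # ['COURSE_1', 'COURSE_3', 'COURSE_2', 'COURSE_4']
```

With rows ordered this way, courses that share subjects sit next to each other, so shared UOS columns would line up visually in the spreadsheet chart.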