SPARQL 1.1 Controlled 3D RDF Visualization - from a Force-Directed Layout to a Molecular Visualization of DNA using Nexus in OpenSimulator

Nexus is an experiment with Semantic Web RDF data visualized in three dimensiodians that can be done collaboratively amongst many people (and concurrently) at disparate locations.  Nexus also acts as a platform to try out various design ideas, technologies, and methodologies.  The original Nexus design read and displayed RDF data and could also export it.  I have reworked the back-end of Nexus to use RDF internally and to communicate with its front-end client(s) in pure N-Triples.  The internal RDF representation enables the use of SPARQL (the query language for RDF) via Jena ARQ to manipulate the RDF graph and thus the over-all visualization.  In this posting, I will show the SPARQL 1.1 commands used to manipulate the structural data of a strand of DNA that has been converted to RDF from the original PDB format.  The resulting display will be shown as a force-directed layout and then manipulated into a physical RDF layout determined by crystal structure coordinates contained within the RDF.  Essentially, this will allow for molecular visualization within Nexus allowing us to actually see the strand of DNA in a physical form.

Basic Visualization Design Concepts in Nexus
The basic unit of information we want to visualize is the RDF triple:

Subject - Predicate - Object

In keeping with the "pure RDF" concept, this triple would be annotated with a RDF triples using a display ontology designed for Nexus, it's prefix being "nex".  Statements like nex:color, nex:xyz, nex:glow, nex:nodesize could be made about any resource whether it be subject or object.  For each resource, a "display node" triple is introduced and attached to the original rdf resource. RDF nex statements would then be made about that display node.  For example:

?s ?p ?o
?s nex:rnode ?displaynode
?displaynode nex:color "1,0,0" (red)
?displaynode nex:xyz "2.34,7.34,1.23"e
?displaynode rdf:type nex:sphere
?displaynode nex:radius "3.4"
    and so on.....

Adding this "display node" layer added a large degree of flexibility for RDF displays.  At one point, the display nodes were represented as blank nodes, but in the current version of Nexus, I converted these to resources.  It was just easier to work with in this way.

Fun with RDF Reification
Visualization nodes cannot be attached directly to predicates and literals because RDF statements cannot be made about predicates or literals.  You can only make RDF statements about resources.  However, you can make statements about statements through a process known as RDF reification.  The triples for a single reified statement for nexus would look as follows:

?s ?p ?literal  (statement to be visualized)

The following RDF statements attach a display node to the predicate (?p) and literal (?literal)

?viznode rdf:type rdf:Statement
?viznode rdf:subject ?s
?viznode rdf:predicate ?p
?viznode rdf:object ?literal
?viznode nex:pnode ?displaynode
?viznode nex:lnode ?displaynode

No, RDF reification is not pretty, I know, I'm not a fan of the syntax, but it does allow you to make statements about other statements, and in my case, be used to make indirect statements about specific predicates and literals without having to modify any of the ontologies or resorting to use named graphs (not that this method is bad, I just haven't thought about it yet much).  So, at this point, we have three kinds of display nodes - rnodes (for resources), pnodes (for predicates), and lnodes (for literals).  These three types are actually all the same, but by assigning them different names it is easier to distinguish them from each other when querying the RDF.  This could have been done with a rdf:type statment but this was a bit more compact.  I may or may not change it later.  The W3C RDF working group had a recent discussion of whether RDF reification should be deprecated (see here).  I think the functionality of reification is needed, I just think it's syntax and design need to be re-worked.  For now, it is enabling me to do my arbitrary 3D visualizations.

OpenSimulator Object References
Rather than relying upon OpenSimulator's inventory mechanism and object ID system, objects are stored as RDF and assigned dereferencable RDF URI's which allow the objects to be accessed from remote OpenSimulator regions via the Nexus server code/triple store.  This will allow multiple regions (even if on different grids) to access concurrently the same RDF visualization.  The same RDF URI method could be used as a universal reference to refer to OpenSimulator users and groups (as well as the objects), RDF data interchange between OpenSimulator regions could also be quite handy, but that's another project for another day... :-)  For now, we'll see how well it works within Nexus.

Laying out the RDF Graph
Nexus implements a basic force-directed layout algorithm where the force of the nodes are modeled with Coulomb's Law and the predicates are modeled as springs with Hooke's Law.  When applying the force-layout to the loaded RDF graph (and this can be any RDF graph), the Nexus triples are ignored.  Later down the road, I would like to experiment with various modifications of the force-directed method and/or different methods all together.  I still have a bit of work to do on Nexus force-directed layout engine so that the results are more usable.

Sending the back-end RDF model to the front-end for visualization
The purpose of the front-end is to render the visualization nodes.  The RDF is pulled from the back-end using http and is sent purely as RDF N-Triples.  In an earlier version of Nexus, this was mostly RDF, now it is purely RDF.  When commands are needed to instruct the front-end to do things, the commands are sent as RDF triples.  For example, if I want the front-end to redraw the model, the back-end sends a triple about the session to the front-end as follows:

<> <nex:redraw> "true"  (an example)

Turning RDF into DNANexus - Semantic DNA
Back when I attended CSHALS 2010, I had started to write a pdb ontology to express PDB as RDF but shelved it to work on the core of Nexus.  No one else had an RDF representation of PDB that I could find.  Periodically, I checked, and finally during the summer of 2010 I discovered that the Michel Dumontier Lab had written a conversion for pdb and made the conversion program available (pdb2rdf).  And there was rejoicing in the streets!  I had a program now that could do the pdb (protein databank format) to RDF conversion.  The pdb file converted resulted in 16,473 triples.  It doesn't look like pdb2rdf transfers the bonding/connect information in the pdb files yet, so I'm limited to space-filled at the moment.  When the bond information gets added to the RDF conversion, I will be able to do ball and stick views as well.

Now, in order to turn the force directed graph into a visualization of DNA as seen in the first figure to that of the second figure, we would issue the following SPARQL 1.1 commands:

Step #1 - Set all display nodes visible property to false.  The nex:visible predicate tells the server whether to include that visualization node in the final display or whether to even consider it in the layout routines.

modify delete {?s <nex:visible> ?o} insert {?s <nex:visible> "0"} where {?s <nex:visible> ?o}

Step #2 - Set all display nodes visible property attached to atom nodes to true.  We use the predicate "pdb:hasSpatialLocation" to select atoms nodes since the atom nodes are the only nodes that have a spatial location.

modify delete {?rnode <nex:visible> ?o} insert {?rnode <nex:visible> "1"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?rnode <nex:visible> ?o}

Step #3 - We now change the coordinates of the force-directed determined atoms (their visualization nodes) to the crystal-determined XYZ location by reconstructing a vector from the XYZ triples.

modify delete {?rnode <nex:xyz> ?o} insert {?rnode <nex:xyz> ?xyz} where {?atom <nex:rnode> ?rnode . ?rnode <nex:xyz> ?o . ?atom pdb:hasSpatialLocation ?loc . ?loc pdb:hasXCoordinate ?xc . ?loc pdb:hasYCoordinate ?yc . ?loc pdb:hasZCoordinate ?zc. ?xc pdb:hasValue ?x . ?yc pdb:hasValue ?y . ?zc pdb:hasValue ?z . let (?xyz := fn:concat(?x,",",?y,",",?z)) }

Step #4 - The following series of commands sets the nodesize (radius) of the atom visualization nodes to the values that represent the actual atomic radii of the various types of atom present in the structure.  If this data was entered into the system as RDF triples, these six commands could be reduced to one.

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "1.0"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:HydrogenAtom . ?rnode <nex:nodesize> ?o}

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "2.8"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:CarbonAtom . ?rnode <nex:nodesize> ?o}

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "2.6"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:NitrogenAtom . ?rnode <nex:nodesize> ?o}

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "3.4"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:PhosphorusAtom . ?rnode <nex:nodesize> ?o}

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "2.4"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:OxygenAtom . ?rnode <nex:nodesize> ?o}

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "3.2"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:SufurousAtom . ?rnode <nex:nodesize> ?o}

Step #5 - Now for a little flair, we set the shininess of the atom visualization nodes to a glossy metallic value, again, using the "hasSpatialLocation" predicate to pick out the atom nodes.

insert {?rnode <nex:shiny> "3"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc}

Step #6 - We now color all atom visualization nodes to blue

insert {?rnode <nex:color> "0,0,1"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc}

Step #7 - The next five commands color the backbone of the DNA green by selecting atom nodes with a name in the form *' and *'', the backbone atom labels are traditionally labeled with apostrophe and double apostrophe. The last three commands handle the phosphates.

modify delete {?rnode <nex:color> ?o} insert {?rnode <nex:color> "0,1,0"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . ?rnode <nex:color> ?o . filter regex (?name, "''")}

modify delete {?rnode <nex:color> ?o} insert {?rnode <nex:color> "0,1,0"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . ?rnode <nex:color> ?o . filter regex (?name, "'")}

modify delete {?rnode <nex:color> ?o} insert {?rnode <nex:color> "0,1,0"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . ?rnode <nex:color> ?o . filter regex (?name, "P")}

modify delete {?rnode <nex:color> ?o} insert {?rnode <nex:color> "0,1,0"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . ?rnode <nex:color> ?o . filter regex (?name, "OP1")}

modify delete {?rnode <nex:color> ?o} insert {?rnode <nex:color> "0,1,0"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . ?rnode <nex:color> ?o . filter regex (?name, "OP2")}

Step #8 - Lastly, we set the phosphor atom display nodes to glow by inserting nex:glow statements attached to the corresponding phosphor atom display nodes.

insert {?rnode <nex:glow> "0.2"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . filter (?name="P")}

The resulting 3D RDF graph now looks like a DNA model that I had done with my prior Monolith project which dealt exclusively with non-RDF PDB-formatted data.  The DNA structure can be colored and effects set in many different ways by using the powerful new SPARQL 1.1 query language using any of the data present in the loaded RDF graph, not just what is displayed.  We can even access remote SPARQL end points and include their data as well.  Since Nexus handles any RDF, we are not limited to just molecular visualization.  We can branch off into other linked data by using the PubMed ID triple present into the RDF-converted PDB file and link over to PubMed publications data or anywhere else in the LOD (Linked Open Data) cloud.  For those of you thinking "these commands are not easy nor obvious" (perhaps to the SemWeb junkies) you would be correct.  I'm exploring ways in which the commands can be executed visually via the 3D front-end interface, but, I needed a flexible foundation on which to build and the SPARQL-driven engine seemed the best way to achieve this.  As it is, several of the above commands could be re-written to be a bit more compact and fewer, but, I am learning about this stuff myself as I go along.  I'm getting better. ;-)

Next Steps
I've been focused on doing the semantic web/molecular visualization cross-over and now that I've hit that milestone, there is some front-end and back-end work that still needs to be done.  The data is there, but I am not currently displaying any of it in the actual visualization (RDF labels and such).  I would also like to enable  a user to interact with the model graphically.  Interaction now is limited to command-line SPARQL commands only.  I had tossed out the half-SPARQL, half my own concoction commands in favor of the SPARQL.

This year, I did a poster presentation of Nexus combined with work that I have done with my colleagues at Stony Brook University (Dr. Janos Hajagos and Tammy DiPrima) for CSHALS 2011 (see poster here).  In the poster, I mentioned a couple of other things I am working on.  One of them is another Nexus front-end client based on WebGL/HTML5.   I had started this last year, but shelved it while I redesigned the Nexus back-end (server) to be all-RDF that it is now.  Now that the server is working again I will get back to the WebGL/HTML5 client.  As part of that project, I wanted to experiment with using WebSockets rather that http calls between the WebGL client and Nexus.  I will also update the original Nexus client which I did in Second Life, but, it will not be able to render as large of displays as I can in OpenSimulator since Linden Labs limits region objects to 15,000 primitives.  The DNA force-directed model seen here is 26,713 primitives, nearly twice what the Second Life regions allow.  But, I have provisions to allow a limited client to see a smaller window of a larger model.  All three clients will use the same back-end server and will be able to view any of the server models at the same time.  For example, 30 avatars in an OpenSimulator region will be able to work with 30 avatars in a Second Life region along with 30 different WebGL/HTML5 clients at the same time and see changes done from any of the clients live.  RDF breaks down the walled gardens between worlds.