Nexus Project: WebGL Client/Server Communications Test using RDF over WebSockets

 A video from July 25, 2011 showing the first successful test run of my Nexus Project's WebGL client/server communications using HTML5 WebSockets rather than HTTP polling.  This visualization shows Friend of a Friend (FOAF) RDF graph data being displayed in 3D as its layout is being determined by a 3D force-directed layout algorithm.  I got tired of digging up the video on my iPhone to show people, so I decided to post it.  Many things have been done since this video (latest browser support, Jetty 8, GLGE 0.9, speed improvements, and better screen capture than my iPhone too ;-)  I have been considering a different RDF serialization rather than N-TRIPLES, since N-TRIPLES is hopelessly uncompressed, but it made for the easiest implementation since N-TRIPLES parsers are easy to write in JavaScript.  Jena also supports N-TRIPLES serialization, so nothing had to be done on the server end of things.  I was just at ISWC 2012 in Boston and it was suggested to me to use Turtle RDF (I was also considering JSON-LD or even the binary RDF format), but honestly, the speed of N-TRIPLES is sufficient for now and I would rather work towards a first release of the software.  It's too alluring to endlessly tinker (and I love to tinker, by the way).
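Since the post mentions how easy N-Triples is to parse in JavaScript, here is a minimal sketch of what such a line parser can look like. This is an illustration, not the actual Nexus parser: it handles only URI subjects/predicates and URI or plain-literal objects, and ignores blank nodes, typed literals, and escape sequences.

```javascript
// Minimal N-Triples line parser (sketch). Returns null for comments,
// blank lines, or any form this simplified grammar does not cover.
function parseNTriple(line) {
  const m = line.trim().match(
    /^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"([^"]*)")\s*\.$/
  );
  if (!m) return null;
  return {
    subject: m[1],
    predicate: m[2],
    // object is either a URI or a plain literal in this sketch
    object: m[3] !== undefined ? { uri: m[3] } : { literal: m[4] }
  };
}
```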


RDF Triples over HTML5 WebSockets

From the beginning, I wanted Nexus to be a collaborative visualization system allowing multiple clients in multiple locations to see the same visualizations in real-time.  The issue that arises here is knowing "where" in the 3D semantic web visualization the other clients (people/avatars) are and which direction they are looking.  In the 3D digital world, you have the concept of a "camera".  This is essentially your point-of-view in a particular 3D simulation.  As the camera moves, your view of the model changes as well.  In order to know where the other clients are in the simulation, the camera position and rotation data on all clients are converted to RDF triples and then sent to the Nexus server to be resent and synchronized to all other clients.  Nexus eats, breathes, and internalizes everything as RDF.  HTTP polling would not work well as a transport for these triples, especially with a dozen or more clients all trying to synchronize with each other.  The solution is sending the RDF N-Triples using the HTML5 WebSocket protocol. 

What are WebSockets?  The WebSocket protocol is a bi-directional, full-duplex communications protocol that is part of the HTML5 specification.  WebSockets allow my WebGL clients to talk back and forth with the Nexus server without resorting to HTTP polling.  I will be adding WebSockets to my OpenSimulator client as well.
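As a sketch of the camera-synchronization idea described above, a client might serialize its camera state to N-Triples and push it through the standard browser WebSocket API. The predicate names (nex:position, nex:rotation) and the client URI below are illustrative assumptions, not the actual Nexus vocabulary.

```javascript
// Serialize a client's camera position and rotation to N-Triples.
// Predicate names and URI scheme are hypothetical placeholders.
function cameraToNTriples(clientUri, pos, rot) {
  return (
    `<${clientUri}> <nex:position> "${pos.join(',')}" .\n` +
    `<${clientUri}> <nex:rotation> "${rot.join(',')}" .\n`
  );
}

// In a browser client, the triples would go out over the standard
// WebSocket API (endpoint URL is a made-up example):
// const socket = new WebSocket('ws://nexus.example/ws');
// socket.onopen = () => socket.send(
//   cameraToNTriples('http://nexus.example/client/42',
//                    [1.0, 2.0, 3.0], [0, 0, 0, 1]));
```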

I've embedded Jetty in Nexus, so Apache Tomcat is no longer necessary to run Nexus, which simplifies the deployment of the Nexus server software.  Jetty also has a nice clean HTML5 WebSockets implementation and allows me to do both HTTP and WebSockets on the same IP and port.  Nexus client/server communications are all just streams of RDF triples going in both directions using the HTML5 WebSockets protocol.

Here is my poster from the 2011 Gordon Conference on Visualization in Science and Education a couple of weeks ago, where I presented the progress so far on Nexus.



Nexus WebGL 3D RDF client in Technicolor

It took less time than I thought it would, but here is an updated version of the 3D FOAF graph from my last posting, with node sizes determined by the log base 10 of the number of links into a particular node.  The Coulomb's-law force for the larger nodes is adjusted so that larger nodes "push" out harder to accommodate the larger spheres, preventing sphere clashes.  This image was taken with WebGL running in Chrome.
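The sizing rule described above can be sketched as follows; the base radius and scale factor here are made-up values for illustration, not Nexus's actual tuning.

```javascript
// Node radius grows with log10 of the inbound link count.
// baseRadius and scale are illustrative defaults.
function nodeRadius(inboundLinks, baseRadius = 1.0, scale = 1.0) {
  // log10(1) = 0, so a node with a single inbound link keeps the base radius
  return baseRadius + scale * Math.log10(Math.max(1, inboundLinks));
}
```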

Next on the agenda for additional functionality is the actual display of text labels over subjects, predicates, and objects.  Also to be added is WebGL camera and avatar positioning data.  What's this?  In the OpenSimulator client, dozens of people can view and interact with the same RDF model/structure.  Where one of those people is looking or focusing their attention is indicated by their 3D cursor or avatar.  However, this leaves the WebGL client users in the dark as to what the OpenSimulator users and/or other WebGL clients are doing in the simulation.  I am planning to synchronize this information between all of the clients by streaming the avatar position data (or camera position data in the case of WebGL) back to the Nexus server, where it will be pushed out to all clients in the form of more RDF triples.

The SPARQL commands for the colors and such for this image are as follows:

1) Make everything blue
insert {?rnode <nex:color> "0,0,1"} where {?node <nex:rnode> ?rnode}
insert {?pnode <nex:color> "0,0,1"} where {?node <nex:pnode> ?pnode}

2) Color all literals white
insert {?lnode <nex:color> "1,1,1"} where {?node <nex:lnode> ?lnode}

3) Color red all triples whose predicate is foaf:knows
modify delete {?rnode <nex:color> "0,0,1"} insert {?rnode <nex:color> "1,0,0"}  where {?node <nex:rnode> ?rnode . ?node foaf:knows ?o }
modify delete {?pnode <nex:color> "0,0,1"} insert {?pnode <nex:color> "1,0,0"}  where {?node <nex:pnode> ?pnode . ?node rdf:predicate foaf:knows }

4) Color green all rdf:type triples
modify delete {?rnode <nex:color> "0,0,1"} insert {?rnode <nex:color> "0,1,0"}  where {?node <nex:rnode> ?rnode . ?node rdf:type ?o }
modify delete {?pnode <nex:color> "0,0,1"} insert {?pnode <nex:color> "0,1,0"}  where {?node <nex:pnode> ?pnode . ?node rdf:predicate rdf:type }

5) Make everything shiny
insert {?rnode <nex:shiny> "3"} where {?node <nex:rnode> ?rnode}
insert {?pnode <nex:shiny> "3"} where {?node <nex:pnode> ?pnode}
insert {?lnode <nex:shiny> "3"} where {?node <nex:lnode> ?lnode}

Yes, I am planning on coming up with a far easier user interface than SPARQL. :-)


3D RDF FOAF in WebGL-HTML5 linked to OpenSimulator

The adjacent image is of Tim Berners-Lee's FOAF file imaged with a new HTML5 / WebGL client I am developing for my Nexus RDF visualization server. WebGL allows for sophisticated 3D graphics within a web browser with no plug-in required.  The visualization is in 3D with a layout determined by a force-directed algorithm driven by the Nexus server.  The below color image is also Tim Berners-Lee's FOAF file, imaged in the same fashion, but from within an OpenSimulator region.  The twist is that both images are created from the same server session.  In other words, the session is occurring concurrently in the HTML5 / WebGL client and the OpenSimulator region, allowing multiple users in the OpenSimulator region to collaborate in real-time with multiple HTML5 / WebGL clients.

In the initial testing/debugging of the HTML5 / WebGL client, I was able to get 14-16 frames per second using Firefox 5 (beta).  Greater frame rates were achievable in testing with Chrome.

To speed the development of the HTML5 / WebGL client, I made use of Paul Brunt's GLGE WebGL library, which is an amazing piece of work in itself.  Currently, N-Triples over HTTP is used to communicate between the clients and the server, but WebSockets is being explored.

The OpenSimulator client avoids the use of the standard OpenSim object inventory for object handling by using an RDF store with dereferenceable URIs.

Hopefully, in the next couple of weeks I will have color and variable node sizes debugged.


SPARQL 1.1 Controlled 3D RDF Visualization - from a Force-Directed Layout to a Molecular Visualization of DNA using Nexus in OpenSimulator

Nexus is an experiment with Semantic Web RDF data visualized in three dimensions, collaboratively (and concurrently) amongst many people at disparate locations.  Nexus also acts as a platform to try out various design ideas, technologies, and methodologies.  The original Nexus design read and displayed RDF data and could also export it.  I have reworked the back-end of Nexus to use RDF internally and to communicate with its front-end client(s) in pure N-Triples.  The internal RDF representation enables the use of SPARQL (the query language for RDF) via Jena ARQ to manipulate the RDF graph and thus the overall visualization.  In this posting, I will show the SPARQL 1.1 commands used to manipulate the structural data of a strand of DNA that has been converted to RDF from the original PDB format.  The resulting display will be shown as a force-directed layout and then manipulated into a physical layout determined by the crystal structure coordinates contained within the RDF.  Essentially, this will allow for molecular visualization within Nexus, letting us actually see the strand of DNA in a physical form.

Basic Visualization Design Concepts in Nexus
The basic unit of information we want to visualize is the RDF triple:

Subject - Predicate - Object

In keeping with the "pure RDF" concept, this triple would be annotated with RDF triples using a display ontology designed for Nexus, its prefix being "nex".  Statements like nex:color, nex:xyz, nex:glow, and nex:nodesize could be made about any resource, whether subject or object.  For each resource, a "display node" triple is introduced and attached to the original RDF resource.  RDF nex statements would then be made about that display node.  For example:

?s ?p ?o
?s nex:rnode ?displaynode
?displaynode nex:color "1,0,0" (red)
?displaynode nex:xyz "2.34,7.34,1.23"
?displaynode rdf:type nex:sphere
?displaynode nex:radius "3.4"
    and so on.....

Adding this "display node" layer added a large degree of flexibility for RDF displays.  At one point, the display nodes were represented as blank nodes, but in the current version of Nexus, I converted these to resources.  It was just easier to work with in this way.

Fun with RDF Reification
Visualization nodes cannot be attached directly to predicates and literals because RDF statements cannot be made about predicates or literals.  You can only make RDF statements about resources.  However, you can make statements about statements through a process known as RDF reification.  The triples for a single reified statement for Nexus would look as follows:

?s ?p ?literal  (statement to be visualized)

The following RDF statements attach a display node to the predicate (?p) and literal (?literal)

?viznode rdf:type rdf:Statement
?viznode rdf:subject ?s
?viznode rdf:predicate ?p
?viznode rdf:object ?literal
?viznode nex:pnode ?pdisplaynode
?viznode nex:lnode ?ldisplaynode

No, RDF reification is not pretty, and I'm not a fan of the syntax, but it does allow you to make statements about other statements, and in my case, to make indirect statements about specific predicates and literals without having to modify any of the ontologies or resort to named graphs (not that that method is bad, I just haven't thought much about it yet).  So, at this point, we have three kinds of display nodes: rnodes (for resources), pnodes (for predicates), and lnodes (for literals).  These three types are actually all the same, but assigning them different names makes it easier to distinguish them from each other when querying the RDF.  This could have been done with an rdf:type statement, but this was a bit more compact.  I may or may not change it later.  The W3C RDF working group had a recent discussion of whether RDF reification should be deprecated (see here).  I think the functionality of reification is needed; I just think its syntax and design need to be reworked.  For now, it is enabling me to do my arbitrary 3D visualizations.
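The reification pattern above can be sketched as a small helper that, given a statement, emits the rdf:Statement triples plus the display-node attachments. All URIs here are supplied by the caller; the triple layout follows the listing above.

```javascript
// Build the six triples that reify a statement (s, p, o) and attach
// a predicate display node and a literal display node to it.
// Returned triples are simple [subject, predicate, object] arrays.
function reifyForDisplay(s, p, o, viznode, pdisplaynode, ldisplaynode) {
  return [
    [viznode, 'rdf:type', 'rdf:Statement'],
    [viznode, 'rdf:subject', s],
    [viznode, 'rdf:predicate', p],
    [viznode, 'rdf:object', o],
    [viznode, 'nex:pnode', pdisplaynode],
    [viznode, 'nex:lnode', ldisplaynode],
  ];
}
```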

OpenSimulator Object References
Rather than relying upon OpenSimulator's inventory mechanism and object ID system, objects are stored as RDF and assigned dereferenceable RDF URIs, which allow the objects to be accessed from remote OpenSimulator regions via the Nexus server code/triple store.  This will allow multiple regions (even on different grids) to concurrently access the same RDF visualization.  The same RDF URI method could be used as a universal reference to OpenSimulator users and groups (as well as objects).  RDF data interchange between OpenSimulator regions could also be quite handy, but that's another project for another day... :-)  For now, we'll see how well it works within Nexus.

Laying out the RDF Graph
Nexus implements a basic force-directed layout algorithm where the repulsive force between nodes is modeled with Coulomb's law and the predicates are modeled as springs with Hooke's law.  When applying the force layout to the loaded RDF graph (and this can be any RDF graph), the Nexus triples are ignored.  Later down the road, I would like to experiment with various modifications of the force-directed method and/or different methods altogether.  I still have a bit of work to do on the Nexus force-directed layout engine so that the results are more usable.
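A minimal sketch of one step of such a layout, assuming simple damped Euler integration; the constants and damping scheme are illustrative, not Nexus's actual tuning.

```javascript
// 3-vector helpers
const add = (a, b) => a.map((v, i) => v + b[i]);
const sub = (a, b) => a.map((v, i) => v - b[i]);
const scl = (a, s) => a.map(v => v * s);
const len = a => Math.hypot(a[0], a[1], a[2]);
const addTo = (a, b) => { for (let i = 0; i < 3; i++) a[i] += b[i]; };

// One integration step: Coulomb-style repulsion between every node
// pair, Hooke-style springs along each edge, then damped Euler update.
// nodes: [{pos: [x,y,z], vel: [x,y,z]}], edges: [[i, j], ...]
function forceStep(nodes, edges, opts = {}) {
  const { kRepel = 1000, kSpring = 0.05, restLen = 30,
          damping = 0.85, dt = 0.1 } = opts;
  const forces = nodes.map(() => [0, 0, 0]);
  // Coulomb's law: F = kRepel / r^2, pushing node pairs apart
  for (let i = 0; i < nodes.length; i++) {
    for (let j = i + 1; j < nodes.length; j++) {
      const d = sub(nodes[j].pos, nodes[i].pos);
      const r = Math.max(len(d), 0.01);   // guard divide-by-zero
      const dir = scl(d, 1 / r);
      const f = kRepel / (r * r);
      addTo(forces[i], scl(dir, -f));
      addTo(forces[j], scl(dir, f));
    }
  }
  // Hooke's law: F = kSpring * (r - restLen); positive = attract
  for (const [a, b] of edges) {
    const d = sub(nodes[b].pos, nodes[a].pos);
    const r = Math.max(len(d), 0.01);
    const dir = scl(d, 1 / r);
    const f = kSpring * (r - restLen);
    addTo(forces[a], scl(dir, f));
    addTo(forces[b], scl(dir, -f));
  }
  // Damped Euler integration
  nodes.forEach((n, i) => {
    n.vel = scl(add(n.vel, scl(forces[i], dt)), damping);
    n.pos = add(n.pos, scl(n.vel, dt));
  });
}
```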

Sending the back-end RDF model to the front-end for visualization
The purpose of the front-end is to render the visualization nodes.  The RDF is pulled from the back-end using HTTP and is sent purely as RDF N-Triples.  In an earlier version of Nexus, this was mostly RDF; now it is purely RDF.  When commands are needed to instruct the front-end to do things, the commands are sent as RDF triples.  For example, if I want the front-end to redraw the model, the back-end sends a triple about the session to the front-end as follows:

<> <nex:redraw> "true"  (an example)

Turning RDF into DNANexus - Semantic DNA
Back when I attended CSHALS 2010, I had started to write a PDB ontology to express PDB (the protein databank format) as RDF but shelved it to work on the core of Nexus.  No one else had an RDF representation of PDB that I could find.  Periodically, I checked, and finally during the summer of 2010 I discovered that the Michel Dumontier Lab had written a conversion for PDB and made the conversion program available (pdb2rdf).  And there was rejoicing in the streets!  I now had a program that could do the PDB-to-RDF conversion.  The converted PDB file resulted in 16,473 triples.  It doesn't look like pdb2rdf transfers the bonding/connection information in the PDB files yet, so I'm limited to space-filling views at the moment.  When the bond information gets added to the RDF conversion, I will be able to do ball-and-stick views as well.

Now, in order to turn the force-directed graph into the visualization of DNA, going from the first figure to the second, we would issue the following SPARQL 1.1 commands:

Step #1 - Set all display nodes' visible property to false.  The nex:visible predicate tells the server whether to include that visualization node in the final display and whether to even consider it in the layout routines.

modify delete {?s <nex:visible> ?o} insert {?s <nex:visible> "0"} where {?s <nex:visible> ?o}

Step #2 - Set to true the visible property of display nodes attached to atom nodes.  We use the predicate "pdb:hasSpatialLocation" to select atom nodes, since the atom nodes are the only nodes that have a spatial location.

modify delete {?rnode <nex:visible> ?o} insert {?rnode <nex:visible> "1"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?rnode <nex:visible> ?o}

Step #3 - We now change the coordinates of the atoms' visualization nodes from their force-directed positions to the crystal-determined XYZ locations by reconstructing a vector from the XYZ triples.

modify delete {?rnode <nex:xyz> ?o} insert {?rnode <nex:xyz> ?xyz} where {?atom <nex:rnode> ?rnode . ?rnode <nex:xyz> ?o . ?atom pdb:hasSpatialLocation ?loc . ?loc pdb:hasXCoordinate ?xc . ?loc pdb:hasYCoordinate ?yc . ?loc pdb:hasZCoordinate ?zc. ?xc pdb:hasValue ?x . ?yc pdb:hasValue ?y . ?zc pdb:hasValue ?z . let (?xyz := fn:concat(?x,",",?y,",",?z)) }

Step #4 - The following series of commands sets the nodesize (radius) of the atom visualization nodes to values that represent the actual atomic radii of the various types of atoms present in the structure.  If this data were entered into the system as RDF triples, these six commands could be reduced to one.

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "1.0"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:HydrogenAtom . ?rnode <nex:nodesize> ?o}

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "2.8"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:CarbonAtom . ?rnode <nex:nodesize> ?o}

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "2.6"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:NitrogenAtom . ?rnode <nex:nodesize> ?o}

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "3.4"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:PhosphorusAtom . ?rnode <nex:nodesize> ?o}

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "2.4"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:OxygenAtom . ?rnode <nex:nodesize> ?o}

modify delete {?rnode <nex:nodesize> ?o} insert {?rnode <nex:nodesize> "3.2"} where {?atom <nex:rnode> ?rnode . ?atom rdf:type pdb:SufurousAtom . ?rnode <nex:nodesize> ?o}

Step #5 - Now for a little flair, we set the shininess of the atom visualization nodes to a glossy metallic value, again, using the "hasSpatialLocation" predicate to pick out the atom nodes.

insert {?rnode <nex:shiny> "3"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc}

Step #6 - We now color all atom visualization nodes blue

insert {?rnode <nex:color> "0,0,1"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc}

Step #7 - The next five commands color the backbone of the DNA green by selecting atom nodes with names of the form *' and *''; backbone atoms are traditionally labeled with a prime or double prime.  The last three commands handle the phosphates.

modify delete {?rnode <nex:color> ?o} insert {?rnode <nex:color> "0,1,0"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . ?rnode <nex:color> ?o . filter regex (?name, "''")}

modify delete {?rnode <nex:color> ?o} insert {?rnode <nex:color> "0,1,0"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . ?rnode <nex:color> ?o . filter regex (?name, "'")}

modify delete {?rnode <nex:color> ?o} insert {?rnode <nex:color> "0,1,0"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . ?rnode <nex:color> ?o . filter regex (?name, "P")}

modify delete {?rnode <nex:color> ?o} insert {?rnode <nex:color> "0,1,0"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . ?rnode <nex:color> ?o . filter regex (?name, "OP1")}

modify delete {?rnode <nex:color> ?o} insert {?rnode <nex:color> "0,1,0"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . ?rnode <nex:color> ?o . filter regex (?name, "OP2")}

Step #8 - Lastly, we set the phosphorus atom display nodes to glow by inserting nex:glow statements attached to the corresponding phosphorus atom display nodes.

insert {?rnode <nex:glow> "0.2"} where {?atom <nex:rnode> ?rnode . ?atom pdb:hasSpatialLocation ?loc . ?atom rdfs:label ?name . filter (?name="P")}

The resulting 3D RDF graph now looks like a DNA model that I had done with my prior Monolith project, which dealt exclusively with non-RDF PDB-formatted data.  The DNA structure can be colored and effects set in many different ways by using the powerful new SPARQL 1.1 query language with any of the data present in the loaded RDF graph, not just what is displayed.  We can even access remote SPARQL endpoints and include their data as well.  Since Nexus handles any RDF, we are not limited to just molecular visualization.  We can branch off into other linked data by using the PubMed ID triple present in the RDF-converted PDB file and link over to PubMed publications data or anywhere else in the LOD (Linked Open Data) cloud.  For those of you thinking "these commands are neither easy nor obvious" (except perhaps to the SemWeb junkies), you would be correct.  I'm exploring ways in which the commands can be executed visually via the 3D front-end interface, but I needed a flexible foundation on which to build, and the SPARQL-driven engine seemed the best way to achieve this.  As it is, several of the above commands could be rewritten to be more compact and fewer in number, but I am learning about this stuff myself as I go along.  I'm getting better. ;-)

Next Steps
I've been focused on doing the semantic web / molecular visualization cross-over, and now that I've hit that milestone, there is some front-end and back-end work that still needs to be done.  The data is there, but I am not currently displaying any of it in the actual visualization (RDF labels and such).  I would also like to enable a user to interact with the model graphically.  Interaction now is limited to command-line SPARQL commands only.  I had tossed out the half-SPARQL, half-my-own-concoction commands in favor of pure SPARQL.

This year, I did a poster presentation of Nexus combined with work that I have done with my colleagues at Stony Brook University (Dr. Janos Hajagos and Tammy DiPrima) for CSHALS 2011 (see poster here).  In the poster, I mentioned a couple of other things I am working on.  One of them is another Nexus front-end client based on WebGL/HTML5.  I had started this last year, but shelved it while I redesigned the Nexus back-end (server) into the all-RDF design it is now.  Now that the server is working again, I will get back to the WebGL/HTML5 client.  As part of that project, I wanted to experiment with using WebSockets rather than HTTP calls between the WebGL client and Nexus.  I will also update the original Nexus client, which I did in Second Life, but it will not be able to render displays as large as I can in OpenSimulator, since Linden Lab limits region objects to 15,000 primitives.  The DNA force-directed model seen here is 26,713 primitives, nearly twice what Second Life regions allow.  But I have provisions to allow a limited client to see a smaller window of a larger model.  All three clients will use the same back-end server and will be able to view any of the server models at the same time.  For example, 30 avatars in an OpenSimulator region will be able to work with 30 avatars in a Second Life region along with 30 different WebGL/HTML5 clients at the same time and see changes made from any of the clients live.  RDF breaks down the walled gardens between worlds.



The Magic behind Monolith - how it works

The other year, I developed a molecular visualization system inside of Second Life (see the demonstration video of it in operation on YouTube) for purposes of learning LSL, the scripting language of Second Life, and because I found the concept of a 3D collaborative visualization environment with IM and group voice incredibly intriguing.  Building Monolith seemed a good way to demonstrate the utility of the environment to myself and, hopefully, others.  There had been other projects for molecular visualization done in Second Life before Monolith (Hiro's molecule rezzer, ORAC, Peter Miller's Protein Rezzing Toolkit, and work by Troy McConaghy), so what was I going to add to this arena?  In short, speed and flexibility.  Easier said than done, so here's how it works, for those of you interested in Monolith and Second Life LSL scripting:

Bringing the data in-world
The source of Monolith's data is the Rutgers Protein Databank.  Rutgers provides an HTTP interface for retrieving the various accession numbers, which can be used to retrieve known structures of various proteins and DNA.  The problem here is that these files are larger than 2048 bytes.  Why is this a problem?  Because accessing HTTP data from within the Second Life environment is handled by the command llHTTPRequest.  Linden Lab (LL) limits singular HTTP requests with this command to only the first 2048 bytes no matter how long the document is, so how do you retrieve documents that are larger?  I got around this by developing a Java servlet back-end, which is what the in-world Monolith front-end talks to.  When the in-world user sends the command to Monolith for a particular accession number, the request is actually sent to the Java servlet back-end, which then turns around and downloads the entire file from Rutgers.  The Java servlet then spoon-feeds the data to Monolith's front-end, which resides in Second Life (aka in-world).  OK, another problem: LL also throttles llHTTPRequest to 1 call per second and no more than 25 calls in any 20-second period, with a burst of up to 25 calls in a one-second period per in-world object.  UGH!  That means a single object in-world can only bring in 50K per 20 seconds, or 2.5K/sec.  I worked around this problem by using multiple HTTP objects, rezzed by the primary object, whose count can go as high as needed; 50 HTTP objects give me 125K/sec, which is a whole lot more workable than 2.5K/sec.  Later, I added this "multiple HTTP object streaming" method to my atom nodes themselves, eliminating the need for separate HTTP objects.  The process to this point, then, is that one command pulls all of the data from Rutgers to the Java back-end, where it is "chunked up" and sent in-world to feed multiple requesting objects, which then re-assemble it all, so to speak.
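The throttling arithmetic above, worked out explicitly (25 calls of at most 2048 bytes per 20-second window, per in-world object):

```javascript
// Aggregate llHTTPRequest throughput for N streaming objects.
// Defaults reflect the Second Life limits described above.
function throughputBytesPerSec(objects, callsPerWindow = 25,
                               windowSec = 20, bytesPerCall = 2048) {
  return objects * (callsPerWindow * bytesPerCall) / windowSec;
}
// One object:  25 * 2048 / 20 = 2560 bytes/sec  (~2.5K/sec)
// 50 objects:  50 * 2560     = 128000 bytes/sec (~125K/sec)
```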

On rezzing objects rapidly in-world aka "en-masse"
In order to understand how Monolith works, you need to know its basic architecture.  Atoms are represented with individual scripted spherical primitives (prims).  So for a decent-sized protein, 3000 atom primitives are used, and thus there are 3000 concurrently running scripts.  Monolith is a parallel processing machine.  The next problem is getting 3000 scripted prims into existence.  Enter the LSL function llRezObject.  This function rezzes a singular object, but there is a 0.1-second sleep delay forced when it is called, to prevent massive numbers of objects being rezzed, which would cause griefing/denial-of-service issues within the Second Life region simulator.  I understand why LL does it, but it does not help people who wish to rez larger numbers of objects for legitimate purposes.  My first choice for solving this was to use multiple rezzers.  Each rezzer creates objects one at a time, and even with the 0.1-second sleep delay, the aggregate rezzing of multiple rezzers would solve the problem.  The first time I tested this algorithm, it failed.  I was greeted with numerous "gray goo wall" errors.  WTH, another barrier!  I thought about rezzing complex linked objects, since llRezObject rezzes objects, not just singular prims.  The problem with this approach is that linking and delinking permissions would be requested by Monolith whenever it wanted to perform them.  I found this approach annoying and confusing for the user.  So what to do?  The solution I used was rezzing complex UNLINKED objects.  What's that?  You can select multiple prims that are not linked (or a combination of linked and unlinked) and pull them into the inventory as a complex singular non-linked object.  One llRezObject call can rez as many unlinked scripted prims as needed, thus avoiding the gray goo wall choke.

On multiple scripted object coordination
The next problem that needed to be solved was the one created by using thousands of scripted objects.  How do you coordinate thousands of objects that all use the SAME script (actually, thousands of copies of the same script)?  Data is brought in-world over as many as 50 different concurrent HTTP calls into separate objects.  How do we send the data from these objects to the different "smart atoms" to let them know if they are an oxygen, a nitrogen, a carbon, or a hydrogen?  How do we tell the different atoms where they are supposed to be?  Should they be blue? Red?  Although all of the atoms run the same script (well, a copy of the same script), they need their own identity to differentiate themselves from other atoms: some way of being uniquely addressed.  Every object in Second Life gets its own unique UUID number that could be used for this.  The problem is: how does the back-end Java server know what the UUIDs are?  One method would be to use llHTTPRequest to send that data out and have each smart atom report its name to the back-end engine.  The problem here is that 3000 smart atoms would send 3000 HTTP calls to the back-end.  I had concerns about scalability and about causing issues for the region simulator with that many HTTP calls.  Now, it would be simpler if the 3000 atoms could just be named 1, 2, 3, 4, .... 3000.  Then I would have a way to uniquely address them that would be known a priori to the back-end, without having to send that data.  Two problems remain: how to get the 1->3000 naming scheme, and how to get the smart atoms to talk to each other.  On the latter, Monolith takes advantage of llListen.  llListen creates a listening function on a set communications channel.  There are about 4 billion potential channels to use; more than enough.  Each "smart atom" in Monolith has its own private communication channel, as well as a global communications channel.  
In this fashion, data can be sent to an individual atom, or to all atoms at the same time, from the primary Monolith object.  But how do we get them named 1->3000?  One method would be to pre-generate 3000 atoms named 1-3000 (with the same script inside each); those scripts could then reference the name of the object to find out their node id/name.  The 3000 atoms could be brought into Monolith's inventory as a singular composite object that could be rezzed with a single llRezObject call.  The problem with this is that not all molecules have 3000 atoms; some have more, some less.  The maximum number of prims (15,000) could be used and then whatever is not needed deleted.  This would work somewhat, but some regions have other things going on and 15,000 prims are not always available, not to mention the lag in needlessly creating 15,000 scripted objects.  The compromise solution is this: create a block of 50 smart atoms, named atom1, atom2, atom3...atom50.  Bring the 50 atoms into Monolith's inventory as a singular scripted object and use multiple llRezObject calls to generate as many multiples as needed.  The maximum "waste" is 49 prims.  Acceptable.  But this method would create multiple groups of 1-50.  So if 10 calls were used to create 500 prims, we would have 10 atom1's, 10 atom2's, and so on.  How do we fix this?  llRezObject has a parameter on the end of the function llRezObject( string inventory, vector pos, vector vel, rotation rot, integer param ) called "param".  An integer placed here is passed to the rezzed object or objects, linked or not.  On the first llRezObject we pass a 0, then 1, then 2, and so on up to what is needed.  We then tell each atom that when it rezzes, it should take the number part of its name atom# (# being a number from 1-50) and add it to the product of the param value times the shard size, in this case 50 (param is the shard number I pass to the llRezObject function), to determine its name and identity.  
We are then left with (1+0*50, 2+0*50, 3+0*50, 4+0*50, 5+0*50, 6+0*50, ... 1+4*50, 2+4*50, 3+4*50), which yields the desired 1->3000 sequence.  Each atom runs the same script, but its behavior will vary depending on the data sent to it by the HTTP calling objects.  Each line of data brought into Monolith from Rutgers is just a compressed version of the PDB file format.  Each atom in the PDB file is numbered 1->n.  This is used to steer the data when it gets in-world by sending, for example, atom 5's data to communication channel #5.  Atom 5 will get the data, since it configures itself to listen on channel 5 because its name is atom5.  Cute, huh? :-)  So atom 5 can independently be told: you are a nitrogen, you are located at xyz.  When global commands like "color red atom type nitrogen" are sent out, they go over the global channel all atoms listen to.  Each atom, now knowing what it is, can say, "Am I a nitrogen? Yes? I will color myself red.  No? Ignore it."
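The naming arithmetic can be sketched as a small function (written here in JavaScript rather than LSL for brevity; the name parsing is an illustrative stand-in for what the in-world script does):

```javascript
// Recover a unique atom id from the prim's name ("atom1".."atom50")
// and the integer param passed to llRezObject for its shard:
// id = atomNumber + param * shardSize
function atomId(objectName, param, shardSize = 50) {
  const n = parseInt(objectName.replace('atom', ''), 10); // "atom7" -> 7
  return n + param * shardSize;
}
```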

On Atom Movement
llRezObject can only rez an object no more than 10m away from the calling script.  Since each atom has a script, it can move itself around.  My first reaction was to use llSetPos and move in 10m increments, but it was easier to use llWarpPos and move the atom in one motion.  In my Monolith demonstration video, I enable "physics" on a strand of DNA, collapsing it into a big pile of balls for effect.  Since each atom knows its original location, a single command can disable physics and reposition all atoms back into their original locations, bringing the DNA back together again.  Useless for molecular visualization, but handy to show how things can be done, and it makes my kids laugh.
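Without llWarpPos, the 10m-per-call llSetPos limit would mean walking each atom toward its target in bounded hops.  A rough sketch of that fallback (Python standing in for the LSL; helper names are mine):

```python
import math

MAX_STEP = 10.0  # per-call llSetPos movement limit in Second Life, in meters


def steps_to_target(start, target, max_step=MAX_STEP):
    """Yield intermediate positions, each no more than max_step from the last."""
    sx, sy, sz = start
    dx, dy, dz = target[0] - sx, target[1] - sy, target[2] - sz
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    hops = max(1, math.ceil(dist / max_step))
    for i in range(1, hops + 1):
        t = i / hops
        yield (sx + dx * t, sy + dy * t, sz + dz * t)


# An atom 35m away needs 4 position updates instead of llWarpPos's single move
path = list(steps_to_target((0, 0, 0), (0, 0, 35)))
assert len(path) == 4
assert path[-1] == (0.0, 0.0, 35.0)
```

llWarpPos collapses all of those hops into one call, which is why it won out.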

On the Risks of these methods
LL has been talking about "script limits".  While I do not know what this will ultimately mean, the danger is that the number of concurrent scripts per region could be capped per person.  This could toast any large-scale Monolith visualization or project using these methods.  While I do understand the need in shared "public" regions, private region owners should be able to disable the chokes and caps up to the maximum a region simulator can handle.  In other words, if I pay for the whole region, I should be able to use its resources the way I want.  Moore's law keeps giving us more in terms of computation, networking, and so on; what a given flat $$$ amount buys in this environment should not remain flat either.

Bringing Monolith to OpenSimulator
Bringing Monolith to OpenSimulator required yanking most of the above out.  The 0.1 second delay on llRezObject can be turned off, and there are no limits on llHTTPRequest.  The code just needed to be simplified, and the shard value of 50 was increased accordingly.  Otherwise, it functions the same.  The trick with non-linked composite objects does not work because it is not yet supported in OpenSimulator; however, being able to disable the sleep on llHTTPRequest in OpenSimulator eliminates the need for it.

The Future of Monolith
I halted development of Monolith in favor of Nexus some time ago.  Nexus swaps out pdb data for Semantic Web RDF data: instead of streaming pdb data in-world, RDF triples are streamed.  Nexus will be able to visualize far more than just molecules, and a near-future release will be able to do what Monolith does, but semantically.  It will also be able to access numerous RDF data sources and follow semantically linked data wherever it goes.  In this fashion, I can do two projects for the price of one and get more in the end.  The first public release of Nexus will be an OpenSimulator region module, followed by a concurrently developed WebGL front-end for WebGL-capable browsers, followed by an LSL version for when installing region modules is not an option.  - E


Nexus 3D RDF Visualization as an OpenSimulator Region Module Displaying Researcher Interest VIVO Data

The trouble with triples (not tribbles ;-), for me, is that there are a lot of them.  Last October, I ported my Nexus 3D RDF Visualizer into OpenSimulator and was quite happy with the high availability of graphics primitives I could use to display larger numbers of triples.  I started working with a set of 31,934 triples representing the molecular structure of a strand of DNA, but realized the programming model I was using wasn't going to scale as far as I wanted.

Nexus, to this point, existed as a massive number of coordinated scripted primitives communicating with a back-end server that kept them all synchronized.  In my prior post, I was able to display 658 triples of FOAF data without any problems.  Those 658 triples amounted to about 935 scripted graphics primitives (prims), and it worked and it was fast.  The front-end operated as a massively parallel processor, but when the number of scripts (in general terms, threads) increases without a corresponding increase in actual physical computation cores, the overhead of the separate threads becomes a liability.  In my case, I was looking at trying to run in excess of 40,000 scripted prims just for the DNA RDF data set. *sighs heavily*  To solve this problem, I rewrote the Nexus OpenSimulator front-end, not as a series of scripted prims, but as an actual OpenSimulator region module.

Region modules are extensions of the core OpenSimulator server software.  OpenSimulator is written in C#, and being open-source, I was able to dig right in and exercise far greater control over OpenSimulator at this level than when I was working with the easier, but more limited, scripted-prims method.  The following images show a display of 20,002 triples (~25,000 prims - front & back with close-ups), representing about 534 individual SUNY researchers and their research interests extracted from a VIVO installation that we are developing at Stony Brook (thank you Tammy DiPrima and Dr. Jizu Zhi, who are part of that team).  The RDF data was then normalized using the MeSH terms extracted from the researchers' PubMed publications and linked through an RDF representation of the UMLS (Unified Medical Language System) created by my colleague, Dr. Janos Hajagos. 

What is this normalization?  When I first tried visualizing our researcher interests from VIVO in Nexus, I found that the research interests did not really link up and that I had 500+ little separate RDF graphs.  Why?  Because everyone had their own way of saying the same thing slightly differently.  We took the publication information we had for these researchers and linked it to an RDF version of PubMed that we developed at Stony Brook.  From this linkage, we extracted the MeSH terms (MeSH is part of the UMLS) and then linked and normalized them through the RDF UMLS.  Once this was done, things began to link up.  Multiple linked datasets are more interesting than a single data set. 
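The effect of that normalization can be shown with a toy sketch (Python; the researchers, interest strings, and string-to-MeSH mapping here are all hypothetical stand-ins for the PubMed/UMLS linkage described above): free-text interests don't join up, but researchers mapped to a shared MeSH descriptor do.

```python
from collections import defaultdict

# Hypothetical free-text research interests: same concept, different strings
raw_interests = {
    "Dr. A": ["cancer of the lung"],
    "Dr. B": ["lung carcinoma"],
    "Dr. C": ["asthma"],
}

# Hypothetical mapping from strings to canonical MeSH descriptor ids,
# standing in for the publication-based MeSH/UMLS linkage
to_mesh = {
    "cancer of the lung": "D008175",
    "lung carcinoma": "D008175",
    "asthma": "D001249",
}

# Group researchers by normalized descriptor instead of raw string
linked = defaultdict(set)
for researcher, interests in raw_interests.items():
    for interest in interests:
        linked[to_mesh[interest]].add(researcher)

# Dr. A and Dr. B now share one node instead of forming separate graphs
assert linked["D008175"] == {"Dr. A", "Dr. B"}
```

With raw strings there were three disconnected islands; after normalization, two of them collapse onto one shared concept node, which is exactly why the 500+ separate VIVO graphs began to link up.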

At the moment, the Nexus visualization of the data is more interesting to look at than useful.  Removal of the over-linked trivial data, which obscures the more useful information, needs to be done next.  Hover-text labeling of triples and data-sensitive clustering algorithms are also on the to-do list (these features were present in the old front-end).  Fortunately, I did not have to change the back-end for the new front-end, because it was my intention from the beginning to have multiple front-ends connecting to the back-end(s).  Concurrently, I started last fall to develop an HTML5/WebGL front-end version of Nexus that would be able to see and share the same sessions as the OpenSimulator-based front-end, with the extra twist of using WebSockets rather than http to pass RDF between front-end and back-end.  Data persistence between sessions is handled by OpenLink Virtuoso tied to the back-end.  On a personal note, it's a lot of fun playing around with collaborative 3D Semantic Web visualization. :-)

Nexus Commands used:

color <0,1,0> p where {?s ?p ?o} color green all predicates (the sticks that represent the predicates)
color <1,0,0> s where {?s ?p ?o} color red all subjects
color <0,0,1> o where {?s ?p ?o} color blue all objects
color <1,1,1> o where {?s ?p ?o . filter(isliteral(?o))} color white all literals (this would overwrite the blue of the last command on literals)
shiny 3 spo where {?s ?p ?o} add metallic sheen to all triples


3D RDF FOAF Graphs in OpenSimulator

Here is an image (click on it for a larger version) of a model I did earlier with Nexus in Second Life, using Tim Berners-Lee's and James Hendler's FOAF data linked and visualized in 3D within OpenSimulator with Nexus.  The only code change needed to port it over to OpenSimulator from Second Life was the removal of the warppos function, since it is no longer needed.  However, I think I may have uncovered a small bug/limitation in OpenSimulator URL lengths.  I put in a small work-around by shortening the URLs to the FOAF data to avoid the bug when loading from the remote http source, but I will have to go back, figure out what is actually going on, and report it if need be to the OpenSimulator programmers.  The Nexus commands used were:

color <1,0,0> spo where { ?s ?p ?o . filter ( ?p=foaf:knows ) }
color <0,1,0> spo where { ?s ?p ?o . filter ( ?p=rdf:type ) }
color <1,1,1> o where { ?s ?p ?o . filter ( isLiteral(?o) ) }
glow 0.2 o where { ?s ?p ?o . filter ( isLiteral(?o) ) }

Since the last time I did this FOAF data, I changed the default shape for literals to cubes and made them smaller so the literals do not dominate the scene as much.  I also added some glow effects to highlight elements of interest.


Monolith Molecular Visualization in OpenSimulator

OpenSimulator is an open-source version of Second Life's server software that emulates much of Second Life's functionality and, in some ways, goes beyond it.  Earlier this year, I mothballed my Monolith project in favor of my Nexus project - its semantic replacement and then some.  But I decided to visit my old friend Monolith again and wanted to see how difficult it would be to port it to OpenSimulator.

It took a couple of hours to do, but as you can see, it works! :-)  Most of the time was spent removing code I had written into Monolith to bypass many of Second Life's Linden Lab-imposed restrictions, namely:

1) the object creation limit of 15,000 primitives per region
2) the number of calls to the llRezObject function (the "gray goo" wall)
3) the http call data cap (Linden Lab only allows 2048 bytes per http call)
4) the http call rate (the number of http calls per script in any period is also capped)
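Restriction 3, for example, forces any larger payload to be split client-side and reassembled on the other end.  A minimal sketch of that chunking (Python; the constant matches the 2048-byte cap above, the function name is mine):

```python
HTTP_BODY_CAP = 2048  # Second Life's per-call http data cap, in bytes


def chunk(payload: bytes, cap: int = HTTP_BODY_CAP):
    """Split a payload into pieces that each fit within one http call."""
    return [payload[i:i + cap] for i in range(0, len(payload), cap)]


data = b"ATOM" * 2000             # 8000 bytes of pdb-style data
pieces = chunk(data)
assert len(pieces) == 4           # four calls instead of one
assert b"".join(pieces) == data   # reassembles losslessly on the receiving end
```

The rate cap (restriction 4) then throttles how quickly those pieces can be sent, which is why removing both limits in OpenSimulator made the workaround code unnecessary.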

The speed enhancements were no longer needed since none of the above is limited in OpenSimulator.  The number of primitives in a region appears to be limited only by the hardware on which OpenSimulator runs.  There were some limits in OpenSimulator that prevented a few of my other tricks from working, but I removed them by editing the OpenSimulator.ini file: the object movement limit (10m max by default) and the number of llListens were also limited in OpenSimulator, and again, a simple edit to the INI file and they were gone.

My current installation of OpenSimulator is running on Windows XP within a virtual machine under VMware Workstation 7.x.  The virtual machine has 4 cores, 4GB RAM, and a 50GB disk.  The underlying machine is a 4GHz over-clocked i7 with a RAID-10 disk system.

I will also be porting my Nexus project to OpenSimulator since I am interested in visualizing a huge number of RDF triples far more than the 15,000 primitive limit of Second Life will allow.  I'm looking forward to seeing how far I can push OpenSimulator.


Haylyn - Collaborative 3D Semantic Web Visualization and Analytics (Formerly Nexus)

Haylyn is an experimental collaborative 3D Semantic Web visualization tool being built (with WebGL/OpenSimulator front-ends) to test various ideas and design concepts in visualization, software design, and algorithms.

Some key paradigms and principles are being followed in Haylyn's design:
1) Must be collaborative - all visualizations must be sharable in real-time across multiple clients regardless of location.
2) All-RDF - rather than use any custom formats or internal data representations, RDF is used throughout Haylyn's architecture.  Haylyn consumes, internalizes, and exports everything as RDF - client/server communications are in RDF, user and client sessions are in RDF, cursor position and directional vectors are in RDF, and even the visualizations themselves are in RDF, which allows them to be tightly coupled with the original data itself.
3) Explore 3D - many graph layout programs use 2D layouts, but Haylyn explores 3D, and even 4D (time-based), layout.
4) If a best-practice dogma is encountered, I follow this quote:

“Do not go where the path may lead, go instead where there is no path and leave a trail.”
Ralph Waldo Emerson

Molecular visualization is achievable in Haylyn because of its ontology-driven visualization model.  The added benefit of doing it this way is that other semantic data sources can be linked and referenced while searching for or working with particular structures.  In addition, since Haylyn is driven by a SPARQL query engine (Jena ARQ), molecular selection criteria become more flexible: a SPARQL query can be used to pick which parts of a structure are acted upon for display or modification.  Haylyn is not limited to molecular visualization; it will be able to visualize various semantic data types (FOAF, DOAC, etc.) from multiple data sources.

