Monolith

Monolith used in photo of Proteopedia, winner of the 2010 TheScientist "Best Science Website" Labby Awards"

Proteopedia is the winner of the 2010 TheScientist "Best Science Website" Labby Awards.  The picture used is of the website Proteopedia (http://www.proteopedia.org) next to an 3D RNA model.  The picture was taken at the Stony Brook University's SOM Second Life region.  The structure was generated using Monolith http://www.ebremer.com/monolith

I hadn't known someone took a picture at the region and used it for the awards photo until Dr. Joel Sussman stopped by my office and told me during a visit.  It gave me a big smile. :-)  Congratulations to the whole Proteopedia community!  The structure is still up at the Stony Brook region.

Tags: 

The Magic behind Monolith - how it works

The other year, I developed a molecular visualization system inside of Second Life (see http://www.ebremer.com/monolith) with a demonstration video of it in operation on YouTube for purposes of learning the scripting language LSL of Second Life and for the fact that I found the concept of a 3D collaborative visualization environment with IM and group voice incredibly intriguing.  Building Monolith seemed a good way to demonstrate the utility of the environment to myself and hopefully others.  There had been other projects for molecular visualization done in Second Life before Monolith (Hiro's molecule rezzer, ORAC, Peter Miller's Protein Rezzing Toolkit, and work by Troy McConaghy), so, what was I going to add to this arena?  In short, speed and flexibility.  Easier said then done, so here's how it works for those of you interested in Monolith and Second Life LSL scripting:

Bringing the data in-world
The source of Monolith's data is the Rutgers Protein Databank.  Rutgers provides a http interface for retrieving the various accession numbers which can be used to retrieve known structures of various proteins and DNA.  The problem here is that these files are larger than 2048 bytes.  Why is this a problem?  Because the command for accessing http data from within the Second Life environment is handled by the command llHTTPRequest.  Linden Lab (LL), limits singular http requests with this command to only the first 2048 bytes no matter how long the document is, so, how do you retrieve documents that are larger?  I got around this by developing a java servlet back-end which is what the in-world Monolith front-end talks to.  When the in-world user sends the command to Monolith for a particular accession number, the request is actually sent to the java servlet back-end which then turns around and downloads the entire file from Rutgers.  The java servlet then spoon-feeds the data to Monolith's front-end which resides in Second Life (aka in-world).  Ok, another problem here, LL also throttles llHTTPRequest to 1 call per second and no more than 25 calls in any 20 second period with a burst of up to 25 calls in a one second time-period per in-world object.  UGH!  That means a single object in-world can only bring in 50k/per 20 seconds or 2.5K/sec.  I worked around this problem by using multiple http objects which are rezzed by the primary object which can go as high as needed, so for 50 http objects gives me 125k/sec which is a whole lot more workable than 2.5K/sec.  Later, I added this "multiple http object streaming" method to my atom nodes themselves eliminating the need for separate http objects.  The process to this point now is that one command pulls all of the data to the java back-end fromn Rutgers where it is "chunked-up" a sent in-world to feed multiple requesting objects which then re-assemble it all so to speak.

On rezzing objects rapidly in-world aka "en-masse"
In order to understand how Monolith works, you need to know the basic architecture of it.  Atoms are represented with individual scripted spherical primitives (prims).  So for a decent sized protein, 3000 atom primitives are used and thus there are 3000 concurrently running scripts.  Monolith is a parallel processing machine.  The next problem is getting 3000 scripted prims into existence.  Enter the LSL function llRezObject.  This function rezzes a singular object, but, there is a 0.1 second sleep delay that is forced when it is called to prevent massive numbers of objects being rezzed which would cause griefing denial of service issues within the Second Life region simulator.  I understand why LL does it, but it does not help people who wish to rez larger number of objects for legitimate purposes.  My first choice for solving this was to use multiple rezzers.   Each rezzer creates objects one at a time and even with the 0.1 second sleep delay, the aggregate rezzing of multiple rezzers would solve the problem.  The first time I tested this algorithm it failed.  I was greeted with numerous "gray goo wall" errors.  WTH, another barrier!  I thought about rezzing complex linked objects since llRezObject rezzes objects not just singular prims.  The problem with this approach is that linking and delinking permissions would be requested by Monolith when it would want to do them.  I found this approach annoying and confusing for the user.  So what to do?  The solution I used was rezzing complex UNLINKED objects.  What's that?  You can select multiple prims that are not linked (or combination of linked and unlinked) and pull them into the inventory as a complex singular non-linked object.  One llRezObject call can rez as many unlinked scripted prims as needed thus avoiding the gray goo wall choke.

On multiple scripted object coordination
The next problem that needed to be solved was the one that using thousands of scripted objects created.  How to coordinate thousands of objects that all use the SAME script (actually thousands of copies of the same script)?  Data is brought in-world over as many as 50 different concurrent http calls into separate objects.  How do we send the data from these objects to the different "smart atoms" to let them know if they are an oxygen, a nitrogen, a carbon, or a hydrogen?  How do we tell the different atoms where they are supposed to be?  Should they be blue? Red? Although all of the atoms run the same script (well copy of the same script), they need their own identity to differentiate themseleves from other atoms.  Some way of being uniquely addressed.  Every object in Second Life gets it's own unique UUID number that could be used for this.  The problem with this is how does the back-end java server know what the the UUIDs are?  One method would be to use llHTTPRequest to send that data out and have each smart atom report it's name to the back-end engine. The problem here is that 3000 smart atoms would send 3000 http calls to the back-end.  I had concerns about scalability and causing issues with the region simulator with that many http calls.  Now it would be simpler if the 3000 atoms could just be named 1, 2, 3, 4, .... 3000.  Then, I would have a way to uniquely address them that would apriori be known to the back-end without having tp send that data.  Two problems now on this, how to get the 1->3000 naming scheme and how to get them to talk to each other (the smart atoms).  On the later, Monolith takes advantage of llListen.  llListen creates a listening function on a set communications channel.  There are about 4 billion potential channels to use.  More than enough.  Each "smart atom" in Monolith has it's own private communication channel, as well as, a global communications channel.  In this fashion, data can be sent to an individual atom and to all atoms at the same time from the primary Monolith object.  But, how do we get them named 1->3000?  One method would be to pre-generate 3000 atoms named 1-3000 (but the same script inside), that then those scripts could reference the name of the object to find out it's node id/name.  The 3000 atoms could be brought into Monolith's inventory as a singular composite object that could be rezzed with a single llRezObject call.  The problem with this is that not all molecules have 3000 atoms, some more, some less.  The maximum number of prims could be used (15,000) and then delete what is not needed.  Would work somewhat, but some regions have other things going on and 15,000 prims not always available, not to mention the lag in creating 15,000 scripted objects needlessly.  The comprimise used solution is this: create a block of 50 smart atoms, named atom1, atom2, atom3...atom50.  Bring the 50 atoms into Monolith's inventory as a singular scripted object and use multiple llRezObject calls to generate as many multiples as needed.  The maximum "waste" prims would be 49.  Acceptable.  But, this method would create multiple groups of 1-50.  So if 10 calls were used to create 500 prims, we would have 10 atom1's, 10, atom2's and so on.  How do we fix this?  llRezObject has a parameter on the end of the function llRezObject( string inventory, vector pos, vector vel, rotation rot, integer param ) called "param".   An integer placed here is passed to the rezz'd object or objects linked or not.  On the first llRezObject we pass a 0, then 1, then 2, and so on up to what is needed.  We then tell each atom that when it rezzes, to reference the part of it's name atom# (# being a # from 1-50) and then adding this number to the product of the the param value times the shard size, in this case 50 (param is the shard # value I pass to the llRezObject function) to determine it's name and identitiy.  We are then left with (1+0*50, 2+0*50, 3+0*50, 4+0*50, 5+0*50, 6+0*50....1+4*50, 2+4*50, 3+4*50) which then yields the desired 1-->3000 sequence but each atom runs the same script, but it's behavior will vary depending on the data sent to it by the http calling objects.  Each line of data brought into Monolith from Rutgers is just a compressed version of the pdb file format.  Each atom in the pdb file is numbered 1->n atoms.  This is used to steer the data when it gets in-world by sending, for example, atom 5's data to communication channel #5.  Atom 5 will get the data since it configures itself to listen on channel 5 because it's name is atom5.  Cute huh? :-)  So atom 5 can be told, you are a nitrogen, you are located at xyz independently.  When global commands like color red atom type nitrogen are sent out, they go over the global channel all atoms listen to.  Each atom, now knowing what it is, can say, "am I a nitrogen? yes?  I will color myself blue.  No? Ignore it".

On Atom Movement
llRezObject can only rez an object no more than 10m away from the calling script.  Since each atom has a script, it can move itself around.  My first reaction was to use llSetPos and move in 10m increments, but it was easier to use llWarpPos and move the atom in one motion.  In my Monolith demostration video, I enable "physics" on a strand of DNA thus collapsing it into a big pile of balls for effect.  Since each atom know it's original location, a single command can disable physics and reposition all atoms back into their original locations thus bringing the DNA back together again.  Useless for molecular visualization, but handy to show how things can be done and it makes my kids laugh.

On the Risks of these methods
LL has been talking about "script limits".  Whereas I do not know what it will ultimately mean, the danger could be if the number of concurrent scripts per region is limited per person.  This could toast any large-scale Monolith visualization or project using these methods.  Whereas I do understand the need in shared "public" regions, private region owners should be able to disable the chokes and caps up to the maximum a region simulator can use.  In other words, if I pay for the whole region, I should be able to use the resources the way I want.  Moore's law gives us more in terms of computation, networking, and such.  Things in this environment should not remain flat for a given flat $$$ amount.

Bringing Monolith to OpenSimulator
Bringing Monolith to OpenSimulator required yanking most of the above out.  The 0.1 second delay on llRezObject can be turned off.  There are no limits on llHTTPRequest.  The code just needed to be simplified and the shard value of 50 was just increased accordingly.  Otherwise, it functions the same.  The trick with non-linked composited objects does not work because it is not yet supported in OpenSimulator.  However, being able to disable sleep on llHTTPRequest in Opensimulator eliminates the need for it.

The Future of Monolith
I've halted development of Monolith in favor of Nexus some time ago.  Nexus swaps out pdb data for Semantic Web RDF data.  Instead of streaming pdb data in-world, rdf triples are streamed.  Nexus will have the ability to visualize far more than just molecules, but will be able to do what Monolith does in a near-future release of Nexus, but it will do it semantically.  It will also be able to access numerous RDF data sources and follow semantically linked data where it goes.  In this fashion, I can do two projects for the price of one and get more in the end.  The first public release of Nexus will be an OpenSimulator region module, followed by a concurrently developed WebGL front-end for WebGL capable browsers, followed by an LSL-version for when installing region modules is not an option.  - E

Tags: 

Monolith Molecular Visualization in OpenSimulator 0.7.0.2

OpenSimulator is a open-source version of Second Life's server software that emulates much of Second Life's functionality and in some ways, goes beyond it.  Earlier this year, I moth-balled my Monolith project in favor of my Nexus project - it's semantic replacement and then some.  But I decided to visit my old friend Monolith again and wanted to see how difficult it would be to port it to OpenSimulator.

It took a couple of hours to do, but as you can see, it works! :-)  Most of the time I spent on it was removing code that I wrote in Monolith to bypass many of Second Life's, Linden Lab-imposed restrictions, namely:

1) object creation limits 15,000 primitives per region
2) # of calls to llRezObject function (gray goo wall)
3) http call data cap (Linden Lab only allows 2048 bytes per http call)
4) http call rate (the # of http calls in any period per script is also capped)

The speed enhancements were no longer needed since none of the above is limited in OpenSimulator.  The limit on the number of primitives in a region appears to be only limited by the hardware on which the OpenSimulator is executed.  There were some limits in OpenSimulator that prevented a few other tricks I do from working, but I removed these by editing the OpenSimulator.ini file.  Namely, the object movement limit (10M max by default) and the # of llListeners were also limited in OpenSimulator, again, a simple edit to the INI file and they were gone.

My current installation of OpenSimulator is running on Windows XP within a virtual machine using VMware workstation 7.X.  The hardware specifications for the virtual machine are 4 cores and 4GB RAM with a 50GB disk.   The underlying machine is an 4Ghz over-clocked I7 with a RAID-10 disk system.

I will also be porting my Nexus project to OpenSimulator since I am interested in visualizing a huge number of RDF triples far more than the 15,000 primitive limit of Second Life will allow.  I'm looking forward to seeing how far I can push OpenSimulator.

Tags: 

Subscribe to Monolith