Ray Augé
NoSQL, MongoDB, Cassandra, etc. and Liferay
December 31, 2010 12:26 PM

LIFERAY STAFF

Rank: Liferay Legend

Posts: 1171

Join Date: February 7, 2005

Hey All,

Over the last several months the hype around NoSQL DB design has reached fever pitch.

At the same time, the hype around dynamic data modeling, web-based form design, dynamic schema design, metadata attachment and the like has been increasing as well (if not under those names, then as creating things online dynamically with little coding).

Meanwhile, there have been concerns that certain aspects of Liferay's architecture may not be well suited to scalability (we are of course talking about lots and lots of data), namely Custom Fields and Expandos in general. A SharePoint demo that Jon Lee (Liferay) gave during the developer retreat, and the ensuing discussion about its scalability, seemed to confirm that the model SharePoint uses, which is much like ours for Expando, does not scale well when data sets become very large.

All this being said, Expando was designed not to be too tightly bound to a specific backend and it has no relations with any other portal domain entities.

As such, I've been thinking of prototyping a NoSQL adapter (effectively a ServiceWrapper hook) that we could plug in as the backend to make Expando scale to huge amounts of data (via some NoSQL DB implementation). With the upcoming User Data Lists/Workflow Forms, which should have adapters for storing in either WCM or Expando, it would allow our web-based data modeling to support huge data sets dynamically and without any portal changes. It would mean that things like custom fields automatically get stored in this new backend.

OR

When developers create Expando tables programmatically, they are in effect creating new document types, tables, field sets (or whatever NoSQL nomenclature is used for defining the data sets) and storing them in highly scalable storage.
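
To make the adapter idea above a bit more concrete, here is a minimal sketch, written in Python rather than Liferay's Java purely for brevity. Everything in it is an assumption for illustration: `CustomFieldStore`, `InMemoryDocumentStore` and `CustomFieldService` are hypothetical names, not Liferay or NoSQL driver APIs, and the in-memory dict only stands in for a real document database. It shows the shape only: callers talk to one service, and the storage behind it can be swapped.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional, Tuple


class CustomFieldStore(ABC):
    """Hypothetical storage contract an Expando-like service would delegate to."""

    @abstractmethod
    def put(self, table: str, row_id: int, column: str, value: Any) -> None:
        ...

    @abstractmethod
    def get(self, table: str, row_id: int, column: str) -> Optional[Any]:
        ...


class InMemoryDocumentStore(CustomFieldStore):
    """Stand-in for a document database: one document (dict) per (table, row)."""

    def __init__(self) -> None:
        self._docs: Dict[Tuple[str, int], Dict[str, Any]] = {}

    def put(self, table: str, row_id: int, column: str, value: Any) -> None:
        # A new column needs no schema change: it is just a new key in the document.
        self._docs.setdefault((table, row_id), {})[column] = value

    def get(self, table: str, row_id: int, column: str) -> Optional[Any]:
        return self._docs.get((table, row_id), {}).get(column)


class CustomFieldService:
    """Hypothetical caller-facing service; it does not know which backend is plugged in."""

    def __init__(self, store: CustomFieldStore) -> None:
        self._store = store

    def set_custom_field(self, entity: str, pk: int, name: str, value: Any) -> None:
        self._store.put(entity, pk, name, value)

    def get_custom_field(self, entity: str, pk: int, name: str) -> Optional[Any]:
        return self._store.get(entity, pk, name)


# Swapping the backend means passing a different CustomFieldStore here.
service = CustomFieldService(InMemoryDocumentStore())
service.set_custom_field("User", 42, "favorite_color", "blue")
service.set_custom_field("User", 42, "shoe_size", 44)
```

A SQL-backed store would implement the same two methods, which is what would let the portal code stay unchanged.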

So if anyone asks whether we have plans for, or ideas about, using the NoSQL model in Liferay, this is one way I see it being used.

Thoughts?
Szymon Gołębiewski
RE: NoSQL, MongoDB, Cassandra, etc. and Liferay
January 3, 2011 7:25 AM

Rank: Regular Member

Posts: 247

Join Date: June 8, 2009

I can only share our experience with a NoSQL database (it was Riak). We built a system that stored ads that users entered on one of our sites. On one node, Riak was unable to run a MapReduce over a set of 10,000 ads (users were getting timeout errors). So we tested different DB systems: MongoDB, CouchDB, PostgreSQL and MySQL. For one node and a flat data structure, MySQL was the fastest database. The problem was that if you want Riak to be as fast as MySQL, you have to provision lots of nodes. Our farm consisted of 5 servers, but 5 nodes were not enough.

So the question is: at what number of NoSQL nodes will those DBs be faster than MySQL (which, by the way, has pretty nice replication options out of the box)?
Ray Augé
RE: NoSQL, MongoDB, Cassandra, etc. and Liferay
January 3, 2011 8:33 AM

I would think that, given a flat data structure and only 10,000 items, adding the overhead of managing a separate DB (especially a NoSQL one) to an existing infrastructure would be a little out of place. I would not think twice about putting that in SQL and, in the worst case, adding an indexer on top to speed up searching. Heck, I wouldn't think too much about doing that with Expandos in Liferay (since those can also be indexed).

I think the problem comes up more when the order of magnitude grows and performance degradation begins to show: the number of records reaches into the millions or more, and perhaps you still need some amount of dynamic behavior, such as the ability to add columns on the fly or add new tables (or documents, in NoSQL speak).

Homomorphic data models (where the schema is defined as data), such as Expando (and SharePoint Lists), start to degrade in performance because the number of tables is fixed. Whether you have 1 virtual table or 1000, all the data is still in the same few real tables. This means contention builds as more and more different apps try to read from and write to those tables.

Now, homomorphic data models suffer from an additional limitation: you can't perform traditional DB operations on the data, because the columns are not stored in such a way as to allow aggregate operations on them (sorting and filtering, for instance). As the number of usable SQL operations goes down, the store begins to look more like a document repository than a SQL one, minus the optimizations that a document repo has (such as built-in indexing). What we've been doing to solve that problem in Liferay is adding the ability to index Expando data (in our own embedded indexer, Lucene, Solr) along with the ORM entities. That solved Search (the Read of CRUD). The problem still lies in Create, Update, and Delete as scale increases, again due to contention on those few tables.
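
As a rough illustration of the schema-as-data layout described above, here is a small sketch using Python's built-in `sqlite3` module. The table and column names (`field_value`, `virtual_table`, `column_name`, `data`) are made up for the example and are not Liferay's actual Expando schema. It shows two things: every virtual table lands in the same real table, and a simple "filter and sort by a custom column" needs a self-join plus a cast, because a virtual column is only a row in that table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One physical table holds the values of every virtual table.
# Names are illustrative only, not Liferay's real Expando tables.
conn.execute(
    "CREATE TABLE field_value ("
    " virtual_table TEXT, row_id INTEGER, column_name TEXT, data TEXT)"
)
rows = [
    ("ads", 1, "title", "Bike"), ("ads", 1, "price", "120"),
    ("ads", 2, "title", "Sofa"), ("ads", 2, "price", "80"),
    ("ads", 3, "title", "Lamp"), ("ads", 3, "price", "15"),
    ("polls", 1, "question", "Tabs or spaces?"),
]
conn.executemany("INSERT INTO field_value VALUES (?, ?, ?, ?)", rows)

# "Titles of ads with price >= 50, cheapest first": one self-join per
# virtual column involved, and a cast because every value is stored as text.
query = """
    SELECT t.data
    FROM field_value AS t
    JOIN field_value AS p
      ON p.virtual_table = t.virtual_table AND p.row_id = t.row_id
    WHERE t.virtual_table = 'ads'
      AND t.column_name = 'title'
      AND p.column_name = 'price'
      AND CAST(p.data AS INTEGER) >= 50
    ORDER BY CAST(p.data AS INTEGER)
"""
titles = [title for (title,) in conn.execute(query)]
```

Each extra virtual column in a filter or sort adds another join against the same shared table, which is where the awkwardness and the contention both come from.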

This is where the idea of using NoSQL comes in:

1 - This data has to be reliable (clustering, replication, backup)
2 - It has to be dynamic (add custom fields to entities on the fly, create new Data Lists on the fly, etc.)
3 - It should offer aggregate operations for sorting and filtering (at least close to what you'd expect from a Document repo with indexing)
4 - It has to scale.
5 - It has to perform well (indexing, MapReduce, etc.)

What I'm looking to figure out is:

1 - Can NoSQL do what Expando does? (i.e. Can we map Expando onto NoSQL? I think so!)
2 - At what point does the Expando backend need to be moved to a higher-scale architecture like NoSQL?
3 - How hard is it to write an Expando -> NoSQL adapter?
4 - Is it really worth the effort?
5 - Does anyone see value in doing that (would it make anyone feel more comfortable, make their job easier, and make them look like Wizards to their bosses when they say they've implemented NoSQL seamlessly into their infrastructure and gained X amount of benefit)?
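
For question 1 above, one possible mapping can be sketched as follows, again in Python and again under assumptions: the `(table, row, column, value)` tuples are a hypothetical flattened view of Expando-style data, `to_documents` is an invented helper, and plain dicts and lists stand in for a document database's documents and collections. The `_id` key simply borrows MongoDB's naming convention for a document identifier.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

# (virtual table, row id, column name, value): a hypothetical flattened view
# of Expando-style data, not an actual Liferay structure.
FieldValue = Tuple[str, int, str, Any]


def to_documents(values: Iterable[FieldValue]) -> Dict[str, List[Dict[str, Any]]]:
    """Group field values into one document per row, one collection per virtual table."""
    rows: Dict[Tuple[str, int], Dict[str, Any]] = defaultdict(dict)
    for table, row_id, column, value in values:
        rows[(table, row_id)][column] = value

    collections: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for (table, row_id), fields in sorted(rows.items()):
        collections[table].append({"_id": row_id, **fields})
    return dict(collections)


values = [
    ("ads", 2, "title", "Sofa"), ("ads", 2, "price", 80),
    ("ads", 1, "title", "Bike"), ("ads", 1, "price", 120),
    ("polls", 1, "question", "Tabs or spaces?"),
]
collections = to_documents(values)

# Once each row is a whole document, filtering and sorting work per collection
# instead of through joins over one shared table.
cheap_first = sorted(
    (doc for doc in collections["ads"] if doc["price"] >= 50),
    key=lambda doc: doc["price"],
)
```

Whether a real document store then sorts and filters fast enough at millions of records is exactly the open question; the sketch only shows that the mapping itself is mechanical.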
Ray Augé
RE: NoSQL, MongoDB, Cassandra, etc. and Liferay
January 8, 2011 8:22 PM

Jonas Yuan
RE: NoSQL, MongoDB, Cassandra, etc. and Liferay
January 9, 2011 9:34 AM

Rank: Liferay Master

Posts: 993

Join Date: April 26, 2007

Great! Thank you, Ray.
Marcelo Ruiz Camauër
RE: NoSQL, MongoDB, Cassandra, etc. and Liferay
January 9, 2011 4:38 PM

Rank: Junior Member

Posts: 78

Join Date: May 8, 2006

Congratulations, this is a very interesting experiment or rather, enhancement!

My question is: would it be possible to replace the ENTIRE Liferay schema with a NoSQL DB?

There's one in particular, VoltDB, which has a great degree of compatibility with SQL (a pretty complete subset of it). Given that LR runs on so many types of DB engines, it could probably be ported to it without too much trouble... VoltDB sounds pretty good, and may enable really large-scale portals with large-scale customizability (i.e. Expandos) of user data...
Ray Augé
RE: NoSQL, MongoDB, Cassandra, etc. and Liferay
January 9, 2011 7:31 PM

The short answer is either:

1) "It's very highly doubtful."
2) "The undertaking might not be worth the effort."

It might be easier to ask "Could Hibernate be made to work on a non-SQL persistence backend?" That would have to happen before Liferay could even consider it in any way.

On the other hand, Liferay does have, as we've demonstrated, several places where we could leverage non-SQL persistence. Another that I can think of (besides those we've already mentioned) is perhaps another implementation of the DocLib Repository backend (Liferay 6.1 will support multiple backend repositories at the same time). I've read that MongoDB (and I imagine there are others) is ideally suited to storing large binary objects (like video) efficiently, with highly concurrent and extremely efficient streaming in either direction.
Ray Augé
RE: NoSQL, MongoDB, Cassandra, etc. and Liferay
January 9, 2011 7:32 PM

Committed to SVN (plugins/trunk).

See http://issues.liferay.com/browse/LPS-14646.
Marcelo Ruiz Camauër
RE: NoSQL, MongoDB, Cassandra, etc. and Liferay
January 10, 2011 8:37 AM

Relational DBs can and do handle very large workloads, and there are a variety of ways of extending their scalability, but in general they are costly to implement (multi-slave DBs, many servers, etc.).

Large binary storage is one useful capability. Currently, putting doc libraries in the DB is not a great strategy for really large-scale storage... it bogs down the whole DB and the backup systems.

One feature of these NoSQL systems is their replication and resynchronization (and resiliency)... maybe they'd be useful for an LR system that can run locally and resync later when connected to the Web? You could do vertical apps with LR as the substrate then... and even in the First World you don't always have great connectivity outside the large cities... Maybe you could have a "P2P" portal?

The best strategy would be to map Hibernate to NoSQL and leave Liferay untouched.