
Hadoop Dynamic File System as Liferay Store

Company Blogs · October 11, 2012 · By Ray Augé, Staff

At this year's Liferay North American Symposium I did a talk on Big Data.

The goal of the talk was to illustrate some cases where Liferay and the broad notion of "Big Data" intersect.

Most importantly, I showed an example that stored Documents & Media portlet files in Hadoop's extremely scalable and fault-tolerant Dynamic File System (a.k.a. HDFS).
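The core of such an integration is implementing Liferay's document library Store contract on top of Hadoop's FileSystem API. The following is a minimal sketch only, not the actual code from the linked SDK; it assumes hadoop-client on the classpath, and the class name, URI, and path layout are illustrative assumptions:

```java
// Illustrative sketch of a Liferay-style store backed by HDFS.
// The real implementation lives in the linked SDK; names here are
// hypothetical. Assumes a reachable NameNode at the configured URI.
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HDFSStore {

	// e.g. read from portal properties in practice
	private static final String _HDFS_URI = "hdfs://localhost:8020";

	public void addFile(
			long companyId, long repositoryId, String fileName,
			InputStream is)
		throws Exception {

		FileSystem fs = _getFileSystem();

		FSDataOutputStream os = fs.create(
			_getPath(companyId, repositoryId, fileName), true);

		// copies the stream into HDFS and closes both ends
		IOUtils.copyBytes(is, os, fs.getConf(), true);
	}

	public InputStream getFileAsStream(
			long companyId, long repositoryId, String fileName)
		throws Exception {

		FileSystem fs = _getFileSystem();

		return fs.open(_getPath(companyId, repositoryId, fileName));
	}

	public void deleteFile(
			long companyId, long repositoryId, String fileName)
		throws Exception {

		FileSystem fs = _getFileSystem();

		fs.delete(_getPath(companyId, repositoryId, fileName), false);
	}

	private FileSystem _getFileSystem() throws Exception {
		Configuration configuration = new Configuration();

		configuration.set("fs.default.name", _HDFS_URI);

		return FileSystem.get(configuration);
	}

	private Path _getPath(
		long companyId, long repositoryId, String fileName) {

		return new Path(
			"/liferay/document_library/" + companyId + "/" + repositoryId +
				"/" + fileName);
	}

}
```

The real store also has to cover versioning, renames, and directory listings, but the read/write/delete trio above is the shape of the whole integration.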

The same example code also demonstrates how you might tee off the indexed text of these documents and store a copy in a separate HDFS location, so that it can be consumed as input to a MapReduce calculation in order to extract insight from it.
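The "tee" itself can be as simple as writing the extracted text to a well-known HDFS directory as each document is indexed. A hypothetical sketch, assuming the same hadoop-client dependency; the hook point, method, and path layout are assumptions rather than the SDK's actual extension point:

```java
// Hypothetical tee: as documents are indexed, also write their text
// to HDFS so it can later serve as MapReduce input.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IndexTee {

	public void tee(long companyId, String uid, String indexedText)
		throws Exception {

		Configuration configuration = new Configuration();

		configuration.set("fs.default.name", "hdfs://localhost:8020");

		FileSystem fs = FileSystem.get(configuration);

		// one small text file per document, keyed by its index uid
		Path path = new Path("/liferay/index/" + companyId + "/" + uid + ".txt");

		FSDataOutputStream os = fs.create(path, true);

		os.write(indexedText.getBytes("UTF-8"));

		os.close();
	}

}
```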

It demonstrates how to use the MapReduce API from a Java client all the way through: creating the job, submitting it for processing, and then (basically) monitoring its progress. The actual logic applied in the example is trivial; the most important part is showing how you could use the APIs to make Liferay and Hadoop talk.
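The create/submit/monitor cycle can be sketched with Hadoop's org.apache.hadoop.mapreduce client API. This is a generic word-count stand-in for the trivial calculation, not the SDK's actual job; the job name and HDFS paths are assumptions, and it needs hadoop-client on the classpath plus a job tracker reachable from the client:

```java
// Sketch: build a MapReduce job in a Java client, submit it
// asynchronously, and poll its progress until it completes.
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobClientExample {

	public static void main(String[] args) throws Exception {
		Configuration configuration = new Configuration();

		Job job = new Job(configuration, "liferay-index-analysis");

		job.setJarByClass(JobClientExample.class);

		job.setMapperClass(TokenMapper.class);
		job.setReducerClass(SumReducer.class);

		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(IntWritable.class);

		// reads the teed-off index content, writes results elsewhere
		FileInputFormat.addInputPath(job, new Path("/liferay/index"));
		FileOutputFormat.setOutputPath(job, new Path("/liferay/output"));

		// submit without blocking, then (basically) monitor progress
		job.submit();

		while (!job.isComplete()) {
			System.out.printf(
				"map %.0f%% reduce %.0f%%%n", job.mapProgress() * 100,
				job.reduceProgress() * 100);

			Thread.sleep(5000);
		}

		System.out.println("Success: " + job.isSuccessful());
	}

	public static class TokenMapper
		extends Mapper<Object, Text, Text, IntWritable> {

		private static final IntWritable _ONE = new IntWritable(1);

		@Override
		protected void map(Object key, Text value, Context context)
			throws IOException, InterruptedException {

			// emit (word, 1) for every whitespace-delimited token
			for (String word : value.toString().split("\\s+")) {
				context.write(new Text(word), _ONE);
			}
		}
	}

	public static class SumReducer
		extends Reducer<Text, IntWritable, Text, IntWritable> {

		@Override
		protected void reduce(
				Text key, Iterable<IntWritable> values, Context context)
			throws IOException, InterruptedException {

			int sum = 0;

			for (IntWritable value : values) {
				sum += value.get();
			}

			context.write(key, new IntWritable(sum));
		}
	}

}
```

Using job.submit() rather than job.waitForCompletion(true) is what lets the client keep control of the thread and report progress on its own schedule.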

The code is on github as a standalone Liferay SDK (based on trunk but easily adaptable to earlier versions):

https://github.com/rotty3000/liferay-plugins-sdk-hadoop

Please feel free to fork it or use it as an example.

[Update] I should also add a few links to the resources I used when setting up Hadoop in hybrid mode (a single-node cluster with access from a remote client, where my Liferay code acts as that remote client):
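For reference, the key piece of that hybrid setup is telling the remote client where the cluster's NameNode and JobTracker live. A minimal sketch of the two client-side config files (the hostname and ports are assumptions; adjust to your cluster):

```xml
<!-- core-site.xml on the client: where HDFS lives -->
<configuration>
	<property>
		<name>fs.default.name</name>
		<value>hdfs://hadoop-master:8020</value>
	</property>
</configuration>

<!-- mapred-site.xml on the client: where jobs are submitted -->
<configuration>
	<property>
		<name>mapred.job.tracker</name>
		<value>hadoop-master:8021</value>
	</property>
</configuration>
```

One common gotcha in this mode is that the Hadoop daemons must bind to an address the remote client can actually reach, i.e. a real hostname or interface rather than localhost.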


That's neat! It would be a great feature to add with BI tools.
Is there anything about using Liferay with Cassandra?
Posted on 10/11/12 3:29 PM.
Sorry, I didn't do any work with Cassandra, unfortunately. However, there is nothing preventing anyone from taking on the challenge. I doubt it's very difficult.
Posted on 10/11/12 3:33 PM in reply to Hitoshi Ozawa.
Have you looked at GridFS? http://www.mongodb.org/display/DOCS/GridFS+Specification
Posted on 10/11/12 11:45 PM.
Yes, we have. In fact, Mike Han already has a prototype of it.

https://github.com/mhan810/liferay-plugins/tree/LPS-28420
Posted on 10/12/12 4:32 AM in reply to Jelmer Kuperus.
How close to production is the ability to store docs and media in HDFS? We have a new project we are starting and would like to leverage all of the Liferay security and logic, but we want to store the media and docs in a very scalable manner.
Posted on 10/13/12 3:53 PM.
Just this weekend, I started a Liferay Store implementation that uses Cassandra.
It's not finished, but it is able to store and retrieve document library files.
It's based on Kundera, so in theory it should also work with MongoDB and HBase.
Posted on 10/14/12 3:08 AM in reply to Bill Dunn.
@Bill, not close. However, it's stable, open source, and you are more than welcome to try it.

@Carlos, Awesome! Will you be open sourcing this project?
Posted on 10/16/12 10:10 AM in reply to Carlos Vicente.
Hey Ray,

That is a great example on how Liferay can interact with some of the new technologies out there. I would like to add a couple of things to your entry:

- The D in HDFS is for distributed, not dynamic

- For those interested in using it in a real system: I would discourage that. Using "raw" HDFS to serve images/docs/etc. (usually small files) is generally not a good idea, because it puts a lot of pressure on the NameNode, slowing down your system. Other implementations like HBase (built on top of HDFS) can satisfy this kind of need.

If someone needs to do some kind of analytics, MapReduce, etc. with tons of data, then maybe a mixed NoSQL + HDFS approach could serve; but that is a different story.

Great post Ray!

Migue
Posted on 10/17/12 12:11 AM in reply to Ray Augé.
I tried to build https://github.com/rotty3000/liferay-plugins-sdk-hadoop, but failed.
According to the documentation it is compatible with Liferay 6.1, but the compiled .war is not.
How can I make it compatible with Liferay 6.1?
Posted on 8/6/14 6:20 AM.