Once you have Liferay installed on more than one node of your application server, several optimizations need to be made. At a minimum, Liferay should be configured in the following way for a clustered environment:
All nodes should be pointing to the same Liferay database
Jackrabbit, the JSR-170 content repository, should be:
    On a shared file system available to all the nodes (not really recommended, though), or
    In a database that is shared by all the nodes
Alternatively, the Document Library should be configured to use the File System Hook, and the files can be stored on a SAN for better performance.
Similarly, Lucene, the full text search indexer, should be:
    On a shared file system available to all the nodes (not really recommended, though), or
    In a database that is shared by all the nodes, or
    On separate file systems for all of the nodes, or
    Disabled, and a separate pluggable enterprise search server configured (recommended).
If you have not configured your application server to use farms for deployment, the hot deploy folder should be a separate folder for all the nodes, and plugins will have to be deployed to all of the nodes individually. This can be done via a script.
Many of these configuration changes can be made by adding or modifying properties in your portal-ext.properties file. Remember that this file overrides the defaults that are in the portal.properties file. The original version of this file can be found in the Liferay source code or can be extracted from the portal-impl.jar file in your Liferay installation. It is a best practice to copy the relevant section that you want to modify from portal.properties into your portal-ext.properties file, and then modify the values there.
All Nodes Should Be Pointing to the Same Liferay Database
This is pretty self-explanatory. Each node should be configured with a data source that points to one Liferay database (or a database cluster) that all of the nodes will share. This ensures that all of the nodes operate from the same basic data set. This means, of course, that Liferay cannot (and should not) use the embedded HSQL database that is shipped with the bundles. It is also best if the database server is a separate physical box from the server which is running Liferay.
Document Library Configuration
Liferay 5.2.x now defaults to using the file system for storing documents. This has proven to be the highest performing configuration for large document libraries, which is why this decision was made. You can use the file system for your clustered configuration, and Liferay's document library will prevent users from "colliding" with each other by versioning documents and locking the files before they are modified. If you have a Storage Area Network (SAN), you can configure Liferay to store documents there to take advantage of the extra redundancy. To configure the location where your documents are stored, use the following property:
If you wish to cluster your document library configuration in a database, you can still do so using the Jackrabbit JSR-170 repository. You would also use the Jackrabbit repository if you want to have a JSR-170 compliant repository for your documents.
Liferay uses Jackrabbit, a project from Apache, as its JSR-170 compliant document repository. By default, Jackrabbit is configured to store the documents on the local file system upon which Liferay is installed, in the $HOME/liferay/jackrabbit folder. Inside this folder is Jackrabbit's configuration file, called repository.xml.
To simply move the default repository location to a shared folder, you do not need to edit Jackrabbit's configuration file. Instead, find the section in portal.properties labeled JCR and copy/paste that section into your portal-ext.properties file. One of the properties, by default, is the following:
Change this property to point to a shared folder that all of the nodes can see. A new Jackrabbit configuration file will be generated in that location.
Note that because of file locking issues, this is not the best way to share Jackrabbit resources. If you have two people logged in at the same time uploading content, you could encounter data corruption using this method, and because of this, we do not recommend it for a production system. Instead, to enable better data protection, you should redirect Jackrabbit into your database of choice. You can use the Liferay database or another database for this purpose. This will require editing Jackrabbit's configuration file.
The default Jackrabbit configuration file has sections commented out for moving the Jackrabbit repository into the database, which makes this configuration as easy as possible to enable. To make the move, comment out the sections relating to the file system and uncomment the sections relating to the database. By default, these are configured for a MySQL database. If you are using another database, you will likely need to modify the configuration, as some settings are database-specific. For example, the default configuration uses Jackrabbit's DbFileSystem class to mimic a file system in the database. While this works well in MySQL, it does not work for all databases: if you are using an Oracle database, you will need to change this to OracleFileSystem. Please see the Jackrabbit documentation at http://jackrabbit.apache.org for further information.
You will also likely need to modify the JDBC database URLs so that they point to your database. Don't forget to create the database first, and grant the user ID you are specifying in the configuration file access to create, modify, and drop tables.
Once you have configured Jackrabbit to store its repository in a database, the next time you bring up Liferay, the necessary database tables will be created automatically. Jackrabbit, however, does not create indexes on these tables, and so over time this can be a performance penalty. To fix this, you will need to manually go into your database and index the primary key columns for all of the Jackrabbit tables.
All of your Liferay nodes should be configured to use the same Jackrabbit repository in the database. Once that is working, you can create a Jackrabbit cluster (please see the section below).
Search Configuration
You can configure search in one of two ways: use pluggable enterprise search (recommended for a cluster configuration) or configure Lucene in such a way that either the index is stored on each node's file system or is shared in a database.
Pluggable Enterprise Search
As an alternative to using Lucene, Liferay 5.1 and higher now supports pluggable search engines. The first implementation of this uses the open source search engine Solr, but in the future there will be many such plugins for your search engine of choice. This allows you to use a completely separate product for search, which can be installed on another application server in your environment. Your search engine then operates completely independently of your Liferay Portal nodes in a clustered environment, acting as a search service for all of the nodes simultaneously.
This solves the problem described below with sharing Lucene indexes. You can now have one search index for all of the nodes of your cluster without having to put it in a database or maintain separate search indexes on all of your nodes (you can still store the index in a database if you configure Solr or another search engine that way). Each Liferay node will send requests to the search engine to update the search index when needed, and these updates are then queued and handled automatically by the search engine, independently.
Since at the time of this writing there is only one implementation of the pluggable enterprise search, we will cover how to implement this using Solr.
Configuring the Solr Search Server
Since Solr is a standalone search engine, you will need to download it and install it first according to the instructions on the Solr web site (http://lucene.apache.org/solr). Solr is distributed as a .war file with several .jar files which need to be available on your application server's class path. Once you have Solr up and running, integrating it with Liferay is easy, but it will require a restart of your application server.
The first thing you will need to define is the location of your search index. Assuming you are running a Linux server and you have mounted a file system for the index at /solr, create an environment variable that points to this folder. This environment variable needs to be called $SOLR_HOME. So for our example, we would define:
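In a Bourne-style shell such as bash, that definition might look like the following minimal sketch. The /solr path is the example mount point from the text; adjust it to wherever your index file system is actually mounted.

```shell
# Point SOLR_HOME at the file system mounted for the search index.
# /solr is the example mount point used in the text; change it for your server.
export SOLR_HOME=/solr
```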
This environment variable can be defined anywhere you need: in your operating system's start up sequence, in the environment for the user who is logged in, or in the start up script for your application server. If you are going to use Tomcat to host Solr, you would modify catalina.sh or catalina.bat and add the environment variable there.
Once you have created the environment variable, you then can use it in your application server's start up configuration as a parameter to your JVM. This is configured differently per application server, but again, if you are using Tomcat, you would edit catalina.sh or catalina.bat and append the following to the $JAVA_OPTS variable:
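As a hedged sketch for Tomcat on Linux, the lines added to catalina.sh could look like this. It assumes SOLR_HOME has already been defined (the fallback to /solr exists only so the snippet stands alone) and relies on solr.solr.home, the Java system property Solr reads to locate its home directory.

```shell
# Fall back to the example path if SOLR_HOME is not already set
# (an assumption made so this sketch is self-contained).
SOLR_HOME="${SOLR_HOME:-/solr}"

# Append Solr's home directory to the JVM options, keeping any existing options.
JAVA_OPTS="${JAVA_OPTS:-} -Dsolr.solr.home=$SOLR_HOME"
export JAVA_OPTS
```

On Windows, the equivalent change would go into catalina.bat using that shell's own variable syntax.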
This takes care of telling Solr where to store its search index. Go ahead and install Solr to this box according to the instructions on the Solr web site (http://lucene.apache.org/solr). Once it's installed, shut it down, as there is some more configuration to do.
Installing the Solr Liferay Plugin
Next, you have a choice. If you have installed Solr on the same system upon which Liferay is running, you can simply go to the Control Panel and install the solr-web plugin. This, however, defeats much of the purpose of using Solr, because the goal is to offload search indexing to another box in order to free up processing for your installation of Liferay. For this reason, you should not run Liferay and your search engine on the same box. Unfortunately, the configuration in the plugin defaults to having Solr and Liferay running on the same box, so to run them separately, you will have to make a change to a configuration file in the plugin before you install it so you can tell Liferay where to send indexing requests. In this case, go to the Liferay web site (http://www.liferay.com) and download the plugin manually.
Open or extract the plugin. Inside the plugin, you will find a file called solr-spring.xml in the WEB-INF/classes/META-INF folder. Open this file in a text editor and you will see that there are two entries which define where the Solr server can be found by Liferay:
<bean id="indexSearcher" class="com.liferay.portal.search.solr.SolrIndexSearcherImpl">
    <property name="serverURL" value="http://localhost:8080/solr/select" />
</bean>
<bean id="indexWriter" class="com.liferay.portal.search.solr.SolrIndexWriterImpl">
    <property name="serverURL" value="http://localhost:8080/solr/update" />
</bean>
Modify these values so that they point to the server upon which you are running Solr. Then save the file and put it back into the plugin archive in the same place it was before.
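If you prefer to script this change rather than edit the file by hand, the sketch below rewrites both serverURL values in an already extracted solr-spring.xml. The function name, its arguments, and the example host are hypothetical; it assumes the file still contains the default http://localhost:8080/solr base URL.

```shell
# Rewrite the default Solr base URL in an extracted solr-spring.xml.
# Usage: point_plugin_at_solr <path-to-solr-spring.xml> <solr-base-url>
# The function name and arguments are hypothetical, chosen for this sketch.
point_plugin_at_solr() {
    spring_file="$1"
    solr_base="$2"    # e.g. http://search.example.com:8080/solr (placeholder host)
    tmp_file="${spring_file}.tmp"

    # Replace only the base URL; the /select and /update suffixes stay as they are.
    # Note: the new URL must not contain '|' or '&', which are special to sed here.
    sed "s|http://localhost:8080/solr|${solr_base}|g" "$spring_file" > "$tmp_file" &&
        mv "$tmp_file" "$spring_file"
}
```

A call such as `point_plugin_at_solr WEB-INF/classes/META-INF/solr-spring.xml http://search.example.com:8080/solr` would update both beans in one pass; the edited file still has to be put back into the plugin archive afterwards.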
Next, extract the file schema.xml from the plugin. It should be in the docroot/WEB-INF/conf folder. This file tells Solr how to index the data coming from Liferay, and can be customized for your installation. Copy this file to $SOLR_HOME/conf (you may have to create the conf directory) on your Solr box. Now you can go ahead and start Solr.
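The copy step can be sketched as a small shell function. It assumes the plugin has been extracted to a local directory and that the Solr home directory is reachable as a local path; the function name and arguments are hypothetical.

```shell
# Copy schema.xml from an extracted solr-web plugin into Solr's conf directory.
# Usage: install_schema <extracted-plugin-dir> <solr-home>
# The function name and arguments are hypothetical, chosen for this sketch.
install_schema() {
    plugin_dir="$1"
    solr_home="$2"

    # Create the conf directory if it does not exist yet, then copy the schema.
    mkdir -p "$solr_home/conf" &&
        cp "$plugin_dir/docroot/WEB-INF/conf/schema.xml" "$solr_home/conf/schema.xml"
}
```

If Solr runs on a different box from the one where you extracted the plugin, you would transfer the file with a remote copy tool instead of cp.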
You can now hot deploy the solr-web plugin to all of your nodes. See the next section for instructions on hot deploying to a cluster.
Once the plugin is hot deployed, your Liferay search is automatically upgraded to use Solr. It is likely, however, that initial searches will come up with nothing: this is because you will need to reindex everything using Solr.
Go to the Control Panel. In the Server section, click Server Administration. Click the Execute button next to Reindex all search indexes at the bottom of the page. It may take a while, but Liferay will begin sending indexing requests to Solr for execution. When the process is complete, Solr will have a complete search index of your site, and will be running independently of all of your Liferay nodes.
Installing the plugin to your nodes has the effect of overriding any calls to Lucene for searching. All of Liferay's search boxes will now use Solr as the search index. This is ideal for a clustered environment, as it allows all of your nodes to share one search server and one search index, and this search server operates independently of all of your nodes.
Lucene Configuration
Lucene, the search indexer which Liferay uses, can be in a shared configuration for a clustered environment, or an index can be created on each node of the cluster. If you wish to have a shared index, you will need to share the index either on the file system or in the database.
The Lucene configuration can be changed by modifying values in your portal-ext.properties file. Open your portal.properties file and search for the text Lucene. Copy that section and then paste it into your portal-ext.properties file.
If you wish to store the Lucene search index on a file system that is shared by all of the Liferay nodes, you can modify the location of the search index by changing the lucene.dir property. By default, this property points to the lucene folder inside the home folder of the user that is running Liferay:
Change this to the folder of your choice. To make the change take effect, you will need to restart Liferay. You can point all of the nodes to this folder, and they will use the same index.
Like Jackrabbit, however, this is not the best way to share the search index, as it could result in file corruption if different nodes try reindexing at the same time. We do not recommend this for a production system. A better way to share the index is via a database, where the database can enforce data integrity on the index. This is very easy to do; it is a simple change to your portal-ext.properties file.
There is a single property called lucene.store.type. By default this is set to go to the file system. You can change this so that the index is stored in the database by making it the following:
The next time Liferay is started, new tables will be created in the Liferay database, and the index will be stored there. If all the Liferay nodes point to the same database tables, they will be able to share the index. Performance in this configuration is not always ideal, though your DBAs may be able to tweak the database indexes to improve it. For better performance, you should consider using a separate search server (see the section on Solr above).
Note: MySQL users need to modify their JDBC connection string for this to work. Add the following parameter to your connection string:
Alternatively, you can leave the configuration alone, and each node will then have its own index. This ensures that there are no collisions when multiple nodes update the index, because they all will have separate indexes. This, however, creates duplicate indexes and may not be the best use of resources. Again, for a better configuration, you should consider using a separate search server (see the section on Solr above).
Hot Deploy
Plugins which are hot deployed will need to be deployed separately to all of the Liferay nodes. Each node should, therefore, have its own hot deploy folder. This folder needs to be writable by the user under which Liferay is running, because plugins are moved from this folder to a temporary folder when they are deployed. This is to prevent the system from entering an endless loop, because the presence of a plugin in the folder is what triggers the hot deploy process.
When you want to deploy a plugin, copy that plugin to the hot deploy folders of all of the Liferay nodes. Depending on the number of nodes, it may be best to create a script to do this. Once the plugin has been deployed to all of the nodes, you can then make use of it (by adding the portlet to a page or choosing the theme as the look and feel for a page or page hierarchy).
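A deployment script of this kind could be sketched as follows. It assumes each node's hot deploy folder is reachable as a local path (for example through a network mount); the function name and the example paths are hypothetical, and for nodes reachable only over the network you would swap cp for a remote copy tool.

```shell
# Copy one plugin archive into the hot deploy folder of every node.
# Usage: deploy_to_nodes <plugin.war> <deploy-dir> [<deploy-dir> ...]
# The function name and arguments are hypothetical, chosen for this sketch.
deploy_to_nodes() {
    plugin="$1"
    shift

    if [ ! -f "$plugin" ]; then
        echo "Plugin not found: $plugin" >&2
        return 1
    fi

    failed=0
    for deploy_dir in "$@"; do
        # Keep going after a failure so the remaining nodes still get the plugin.
        if cp "$plugin" "$deploy_dir/"; then
            echo "Copied $(basename "$plugin") to $deploy_dir"
        else
            echo "Failed to copy to $deploy_dir" >&2
            failed=1
        fi
    done

    # Non-zero if any node was missed, so a calling script can react.
    return "$failed"
}
```

For example, `deploy_to_nodes sample-portlet.war /mnt/node1/deploy /mnt/node2/deploy` would place the same archive in both folders and return a non-zero status if either copy failed.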
Some containers provide a facility which allows the end user to deploy an application to one node, after which it will get copied to all of the other nodes. If you have configured your application server to support this, you won't need to hot deploy a plugin to all of the nodes; your application server will handle it transparently. Make sure, however, that you use Liferay's hot deploy mechanism to deploy plugins, as in many cases Liferay slightly modifies plugin .war files when hot deploying them.
All of the above will get basic Liferay clustering working; however, the configuration can be further optimized. We will see how to do this next.