Message Boards

6011 EE clustering issue

Updated 9 years ago by Eskendir Mulugeta.

6011 EE clustering issue

New Member · Posts: 5 · Join Date: 13/07/01
Hello,

We have a cluster with two Tomcat nodes and a shared database. Apache is the front end and handles load balancing.

The cluster is set up to use multicast with the following in portal-ext.properties:

ehcache.multi.vm.config.location=/distEhcache/liferay-multi-vm-clustered.xml
ehcache.cache.manager.peer.provider.factory=net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory
ehcache.cluster.link.replication.enabled=true
cluster.link.enabled=true

Multicast is enabled and send/receive tests work as expected.

The bind_addr property is set because the production servers are multihomed.
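For reference, the bind address is pinned roughly like this in Tomcat's setenv.sh (a sketch only: the IP below is a placeholder for the NIC both nodes can reach, and the exact mechanism in our setup may differ):

    # sketch - placeholder IP for the interface both nodes can reach
    JAVA_OPTS="$JAVA_OPTS -Djgroups.bind_addr=192.168.10.11 -Djava.net.preferIPv4Stack=true"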

The default JGroups jar that ships with Liferay is 2.8.1 GA.

Clustering somewhat works: if I add a portlet on one node, it shows up on the other node. However, adding a document or any user add/update only works intermittently.

I have turned the JGroups and Ehcache log levels up to DEBUG.
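Roughly, the override looks like this (a sketch of a portal-log4j-ext.xml fragment; the same categories can also be raised through Server Administration > Log Levels):

    <!-- sketch: raise JGroups and Ehcache logging to DEBUG -->
    <category name="org.jgroups">
        <priority value="DEBUG" />
    </category>
    <category name="net.sf.ehcache">
        <priority value="DEBUG" />
    </category>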

Also, I see

DEBUG [MulticastKeepaliveHeartbeatReceiver:163] We are already processing these rmiUrls. Another heartbeat came before we finished:

with tons of RMI URLs after that. Is that normal?

I also see a bunch of RMI messages like
DEBUG [RMICacheManagerPeerProvider:126] Lookup URL

I also see

[NAKACK:779] ugnise021-20342: dropped message from <host>-62953 (not in xmit_table), keys are [<host>-20342], view=[<host>-20342|0] [<host>-20342]

Do I need to upgrade JGroups to 2.9 (http://comments.gmane.org/gmane.comp.java.javagroups.general/6018)? Can I just swap in the newer jgroups jar, and is it compatible with Liferay?

Any suggestions/ideas?

Thanks

EM
Updated 8 years ago by siddhant jain.

RE: 6011 EE clustering issue

Junior Member · Posts: 69 · Join Date: 13/03/19
Hi Eskendir Mulugeta,

If you go through the documentation for Liferay clustering here, it describes how you can configure the Liferay repository to perform well in a clustered configuration. It also says that to cluster your search indexes, you also need to set the following property:

lucene.replicate.write=true
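As a minimal sketch, the relevant portal-ext.properties lines on each node would be something like this (index replication rides on ClusterLink, so cluster.link.enabled has to stay true, as you already have it):

    # sketch: replicate index writes to the other node over ClusterLink
    cluster.link.enabled=true
    lucene.replicate.write=true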



thanks
Siddhant
Updated 8 years ago by Andew Jardine.

RE: 6011 EE clustering issue

Liferay Legend · Posts: 2416 · Join Date: 10/12/22
Siddhant is correct. There are various aspects of the portal that use the SEARCH API over the SERVICE API in order to retrieve items. If you are using an externalized search server in your cluster, then you can omit the Lucene replication. If you are not, meaning you are using the local storage (LIFERAY_HOME/data/lucene) on each of the nodes (this is the default behaviour), then the indexing actions need to be replicated as well.

If you don't want to have a search server, you could always configure a network share and reconfigure the default property to point to it. The property to reconfigure in this case is --


    #
    # Set the directory where Lucene indexes are stored. This is only referenced
    # if Lucene stores indexes in the file system.
    #
    lucene.dir=${liferay.home}/data/lucene/


If you reconfigure this to a shared location, then the actions taken by one node in the cluster will be visible to the other without replication.
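For example (just a sketch; /mnt/liferay-index is a hypothetical NFS/SAN mount that both nodes can see), the override in portal-ext.properties would be along these lines:

    #
    # Sketch only: point both nodes at the same shared index location.
    # /mnt/liferay-index is a placeholder for your actual mount.
    #
    lucene.dir=/mnt/liferay-index/lucene/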
Updated 8 years ago by Olaf Kock.

RE: 6011 EE clustering issue

Liferay Legend · Posts: 6403 · Join Date: 08/09/23
Two replies have already been given, so here's some more input: just clustering the caches is not sufficient for a proper Liferay cluster. You mention that document upload works intermittently. Note that the metadata ends up in the database, but the default location for storing the actual data is a subdirectory of (each of) your ${liferay.home} directories, i.e. each cluster machine has its own document library storage. With 50:50 load balancing this statistically means that each of your two machines has 50% of the documents available to it, while both have 100% of the metadata.
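To illustrate with the 6.0-era properties (just a sketch; the path is a placeholder for storage that both nodes mount), the default file system hook can be pointed at a shared location in portal-ext.properties:

    #
    # Sketch: keep the file system hook but move its root to a share both nodes see.
    # /mnt/liferay-data is a placeholder path.
    #
    dl.hook.impl=com.liferay.documentlibrary.util.FileSystemHook
    dl.hook.file.system.root.dir=/mnt/liferay-data/document_library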

The clustering documentation has been linked already - please check it and go through everything in there. Note that we also cover clustering in our System Administration Training, and that there are more recent versions of 6.0 EE that you might want to consider upgrading to.
Updated 8 years ago by Eskendir Mulugeta.

RE: 6011 EE clustering issue

New Member · Posts: 5 · Join Date: 13/07/01
Thanks for the replies.

Sorry for the confusion.

lucene.replicate.write=true is already there, and the Lucene indexes are written to the proper directory.

We have many environments (test/dev, etc.) and it is working everywhere except production. I have double- and triple-checked the clustering guide in all environments and didn't find any missing step.

The real differences I noticed between PROD and TEST are:

Production is multihomed; DEV/TEST are not.
Production has bind_addr set to the proper NIC, and the logs show the proper IP; DEV/TEST show the lo (127.0.0.1) address for RMI and everywhere else.
Production has
[NAKACK:779] ugnise021-20342: dropped message from <host>-62953 (not in xmit_table), keys are [<host>-20342], view=[<host>-20342|0] [<host>-20342]
DEV/TEST don't have that.


After googling the NAKACK error, it appears that the jgroups.jar that comes with Liferay (2.8.1 GA) has this issue. I'm not sure why it doesn't show up on DEV/TEST.

Can I just upgrade it to 2.9, or even 3.1 (which has NAKACK2)?

I also see a lot of RMI URLs in the logs of all environments. Is that normal?

Is there any other way to debug this issue?

Thanks

EM
Updated 8 years ago by Olaf Kock.

RE: 6011 EE clustering issue

Liferay Legend · Posts: 6403 · Join Date: 08/09/23
Well, with that information my first recommendation is: you're on EE, so make the best use of it and file an issue with support.

Upgrading JGroups might be possible, but it will also void the warranty unless support tells you that the upgrade is supported (in which case they will probably deliver a fixpack/hotfix, or have long since delivered a service pack containing the upgrade). I doubt that they'll bump the version number; they might just fix the issue and post a minor update to JGroups. But this is just my personal expectation.

Note that with multi-homed servers you must make sure that they're using the correct interface to communicate with each other: if a server chooses an interface on which it can't reach the other, you're toast. Check the "cluster.link.autodetect.address" setting in portal.properties and set it to an address that is accessed over an interface where both servers can talk to each other.
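For example (the host and port below are placeholders; the database listener is a common choice because every node has to be able to reach it over the right interface):

    #
    # Sketch: use an address that is only reachable over the interface
    # the cluster nodes should use to talk to each other.
    #
    cluster.link.autodetect.address=db.example.local:3306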