Forums

How can I index outside websites

Paige Lowe, modified 9 years ago.

Junior Member Posts: 42 Join Date: 11.11.14
Hi, I've tried to get help with this project before, but I haven't always been the best at explaining it, so I'll try again with as much detail as I can. I'm still pretty new to Liferay, and although I've tried to read as much of the documentation as I can, I'll admit I'm still confused about a lot of things.

What I'm trying to do: Index a non-Liferay website on Liferay 6.2 without using the Liferay SDK (I can't access it for long and complex reasons). Basically, I have a non-Liferay website that has a lot of articles on it, and I'd like those articles to show up in Liferay. I've already got a crawler that can read the pages and get their content, but I'm not sure how to put that content into Liferay's search index. Most of the examples seem to be based on custom assets in Liferay, which, if I understand things correctly, means using the SDK (or maybe finding some workaround).

Questions
Can I create a custom asset type without the SDK? I can't use the wizard that (if I understand it correctly) usually links custom code to Liferay, so is it still possible to get the entries into the Lucene index using SearchEngineUtil? To do that, would I need to use SOAP to get things like the user ID and company ID?

Alternatively, if I can't use those methods, could I bypass Liferay entirely and add things directly to the Lucene index? What fields would be absolutely necessary for an entry to appear in the search results? I'm guessing it needs things like a title and a company ID, but does it need a portlet ID or a create date? Does it need fields I wouldn't normally think of because they stay under the hood?

Since this is an outside web page and not a custom portlet, do I still need to create my own asset type? Could I piggyback on an already existing type like web asset? Can I even create an asset type without the SDK?

Thanks for the help. I've been looking at this problem for way too long, and without the SDK I feel like I'm working with one hand tied behind my back. I realize I might be overcomplicating this. Any help (or resteering, if necessary) would be greatly appreciated.
Dave Weitzel, modified 9 years ago.

RE: How can I index outside websites

Regular Member Posts: 208 Join Date: 18.11.09
I think you are confusing two things: Liferay's Asset system, which underpins the Asset Publisher portlet, and search indexing, which underpins the Search portlet and relies on a search engine such as Lucene, Solr, or Google Search Appliance.

If all you want is for your external website's pages to be returned in search results, then you just need to add those pages to the search engine's index. I believe you need to use Solr rather than Lucene, as this kind of use is one of the main drivers for switching to Solr. Your crawler will create the index entries with their relevant URLs. When a Liferay user searches with terms that match your web pages, the search results should include those pages.
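To make the idea concrete, here is a rough sketch (not something I have run against Liferay) of how a crawler could turn one crawled page into a Solr document and prepare a JSON update request. Everything specific in it is an assumption: the field names (uid, companyId, groupId, title, content, url) have to match whatever your Solr schema and Liferay's Solr plugin actually expect, the helper names are made up for illustration, and the update URL depends on your Solr version and core layout.

```python
import hashlib
import json
import urllib.request


def build_solr_doc(url, title, content, company_id, group_id):
    """Map one crawled page to a flat Solr document.

    The field names below are assumptions; check them against your schema.
    """
    # Derive a stable id from the URL so a re-crawl overwrites the old entry.
    uid = "external_" + hashlib.sha1(url.encode("utf-8")).hexdigest()
    return {
        "uid": uid,
        "companyId": str(company_id),
        "groupId": str(group_id),
        "title": title,
        "content": content,
        "url": url,
    }


def build_update_request(solr_update_url, docs):
    """Prepare (but do not send) an HTTP POST carrying the documents as JSON."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        solr_update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To actually send it, you would pass the request to urllib.request.urlopen(), for example with a hypothetical update URL like http://localhost:8983/solr/update?commit=true. Whether Liferay's Search portlet then displays those entries is a separate question that depends on which fields it filters on.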

Personally, I have never done this, so I am not sure how these entries are reflected in the assetTypes search facet.

Hope that helps. Maybe someone else can give you the steps needed to achieve this in more detail.
Paige Lowe, modified 9 years ago.

RE: How can I index outside websites

Junior Member Posts: 42 Join Date: 11.11.14
Thanks Dave!
Paige Lowe, modified 9 years ago.

RE: How can I index outside websites

Junior Member Posts: 42 Join Date: 11.11.14
From what Google can tell me, Solr itself doesn't do the web crawling; Nutch does. People using older versions of Liferay seem to have had problems integrating the two (Nutch search results apparently don't show up in searches), and the original Stack Overflow page that seemed to explain it has been taken down. Most of those questions were from 2-8 years ago. Does anyone know where to find good instructions for integrating Nutch with Liferay, or whether they can work together at all in Liferay 6.2?