Is it possible to search content from the PDF Document? Amit Doshi 22 May 2012 00:55
RE: Is it possible to search content from the PDF Document? Hitoshi Ozawa 22 May 2012 04:14
RE: Is it possible to search content from the PDF Document? Amit Doshi 22 May 2012 05:20
RE: Is it possible to search content from the PDF Document? Hitoshi Ozawa 22 May 2012 05:53
RE: Is it possible to search content from the PDF Document? Amit Doshi 22 May 2012 06:40
RE: Is it possible to search content from the PDF Document? Hitoshi Ozawa 22 May 2012 14:55
RE: Is it possible to search content from the PDF Document? Alexander Chow 23 May 2012 01:10
RE: Is it possible to search content from the PDF Document? Amit Doshi 23 May 2012 03:31
RE: Is it possible to search content from the PDF Document? Alexander Chow 23 May 2012 03:54
RE: Is it possible to search content from the PDF Document? Hitoshi Ozawa 23 May 2012 04:15
RE: Is it possible to search content from the PDF Document? Amit Doshi 23 May 2012 04:28
RE: Is it possible to search content from the PDF Document? Hitoshi Ozawa 23 May 2012 06:11
RE: Is it possible to search content from the PDF Document? Amit Doshi 23 May 2012 06:29
RE: Is it possible to search content from the PDF Document? Hitoshi Ozawa 23 May 2012 14:16
RE: Is it possible to search content from the PDF Document? Amit Doshi 24 May 2012 01:12
RE: Is it possible to search content from the PDF Document? Alexander Chow 23 May 2012 06:08
RE: Is it possible to search content from the PDF Document? Amit Doshi 24 May 2012 01:20
RE: Is it possible to search content from the PDF Document? Alexander Chow 24 May 2012 01:58
RE: Is it possible to search content from the PDF Document? Amit Doshi 24 May 2012 03:43
RE: Is it possible to search content from the PDF Document? Alexander Chow 24 May 2012 05:12
RE: Is it possible to search content from the PDF Document? Hitoshi Ozawa 24 May 2012 06:00
RE: Is it possible to search content from the PDF Document? Amit Doshi 24 May 2012 06:54
RE: Is it possible to search content from the PDF Document? Hitoshi Ozawa 24 May 2012 14:37
RE: Is it possible to search content from the PDF Document? Alexander Chow 25 May 2012 04:37
RE: Is it possible to search content from the PDF Document? Amit Doshi 25 May 2012 06:50
RE: Is it possible to search content from the PDF Document? Alexander Chow 25 May 2012 08:19
RE: Is it possible to search content from the PDF Document? Subhash Pavuskar 23 May 2012 03:36
RE: Is it possible to search content from the PDF Document? Prabhakar Singh 18 December 2013 22:57
RE: Is it possible to search content from the PDF Document? Rashmi S 4 November 2014 01:32
Amit Doshi
Is it possible to search content from the PDF Document?
22 May 2012 00:55

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

Is it possible for Liferay to index the content of PDF documents so that it shows up in search results?

If so, how?

Thanks in advance.
Hitoshi Ozawa
RE: Is it possible to search content from the PDF Document?
22 May 2012 04:14

Rank: Liferay Legend
Posts: 7952
Join date: 23 March 2010

Yes, just upload the PDF files to the Documents and Media library!
Amit Doshi
RE: Is it possible to search content from the PDF Document?
22 May 2012 05:20

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

Hi Hitoshi,

I did exactly that before posting (uploaded the file and then tried to search with the faceted search available in Liferay 6.1), but it did not search the content inside the PDF file. Is there any way to do that?

Thanks & Regards,
Amit Doshi
Hitoshi Ozawa
RE: Is it possible to search content from the PDF Document?
22 May 2012 05:53

Rank: Liferay Legend
Posts: 7952
Join date: 23 March 2010

I'm using the Search portlet.
Amit Doshi
RE: Is it possible to search content from the PDF Document?
22 May 2012 06:40

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

In Liferay 6.1, the Search portlet was replaced by the faceted search, so they are in fact the same. Can you point out what I am missing? How can I check whether the PDF has been indexed?

What I see at present is that only the title and metadata are indexed, not the content inside the file.
Hitoshi Ozawa
RE: Is it possible to search content from the PDF Document?
22 May 2012 14:55

Rank: Liferay Legend
Posts: 7952
Join date: 23 March 2010

I'm talking about "Search" portlet that can be put on a page from "Add" -> "More" -> "Tools" -> "Search".

I just searched for a PDF file I uploaded to the Documents and Media library, and it worked fine.
Alexander Chow
RE: Is it possible to search content from the PDF Document?
23 May 2012 01:10

LIFERAY STAFF
Rank: Liferay Master
Posts: 519
Join date: 19 July 2005

By default, the text layer of every PDF document is extracted and indexed. This index is used for any search of the document in the (faceted) Search portlet or in the Documents and Media library.

The first thing I would do is figure out whether your document has a text layer. Can you open your PDF document in a PDF reader and copy and paste the text into another application (say MS Word)? If not, the document has no text layer for Liferay to extract.
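When there are many files to check, the copy-and-paste test can be roughly approximated in code. The sketch below is a stdlib-only heuristic, not what Liferay does (Liferay hands the file to Apache Tika, as `FileImpl.extractText` in the stack trace later in this thread shows): it decompresses each FlateDecode stream and looks for the PDF text operators `BT` (begin text) and `Tj`/`TJ` (show text). The class and method names are hypothetical; streams using other filters are inspected only as raw bytes, so a `false` result means "probably no text layer", not "certainly none".

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Inflater;

/** Hypothetical heuristic: does any stream in this PDF contain text-showing operators? */
final class PdfTextLayerHeuristic {

    static boolean looksLikeItHasText(byte[] pdf) {
        // ISO-8859-1 maps each byte to one char, so string indexes equal byte offsets
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        int from = 0;
        while (true) {
            int kw = raw.indexOf("stream", from);
            if (kw < 0) {
                return false;
            }
            if (kw >= 3 && raw.startsWith("end", kw - 3)) { // this is "endstream": skip it
                from = kw + 6;
                continue;
            }
            int start = kw + 6;
            // the "stream" keyword is followed by CRLF or LF before the data
            if (raw.startsWith("\r\n", start)) {
                start += 2;
            } else if (raw.startsWith("\n", start)) {
                start += 1;
            }
            int end = raw.indexOf("endstream", start);
            if (end < 0) {
                return false;
            }
            String content = new String(
                inflateOrRaw(Arrays.copyOfRange(pdf, start, end)), StandardCharsets.ISO_8859_1);
            // BT begins a text object; Tj and TJ show strings
            if (content.contains("BT") && (content.contains("Tj") || content.contains("TJ"))) {
                return true;
            }
            from = end + "endstream".length();
        }
    }

    /** Inflates zlib (FlateDecode) data; returns the input unchanged if it is not zlib data. */
    private static byte[] inflateOrRaw(byte[] data) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(data);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    return data; // truncated or unsupported: fall back to the raw bytes
                }
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            return data; // not Flate-compressed: inspect the raw bytes instead
        } finally {
            inflater.end();
        }
    }
}
```

For example, `PdfTextLayerHeuristic.looksLikeItHasText(Files.readAllBytes(Path.of("installing_configuring_openldap.pdf")))` would be expected to return `true` for a PDF exported from a word processor and `false` for a typical scan that was never OCR'd.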

If there is a text layer then, in theory, it should be indexed. You can try to look through what is indexed using Luke.
Amit Doshi
RE: Is it possible to search content from the PDF Document?
23 May 2012 03:31

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

Thanks, Alexander, for pointing these out; it was useful. I found the problem in the PDF.

But I found one issue while reindexing the Documents and Media portlet: it throws the exception below with a new PDF.

```
10:25:22,921 ERROR [FileImpl:304] org.apache.tika.exception.TikaException: Not a HPSF document
org.apache.tika.exception.TikaException: Not a HPSF document
        at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaryEntryIfExists(SummaryExtractor.java:78)
        at org.apache.tika.parser.microsoft.SummaryExtractor.parseSummaries(SummaryExtractor.java:58)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:164)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
        at org.apache.tika.Tika.parseToString(Tika.java:357)
        at org.apache.tika.Tika.parseToString(Tika.java:386)
        at com.liferay.portal.util.FileImpl.extractText(FileImpl.java:300)
        at com.liferay.portal.kernel.util.FileUtil.extractText(FileUtil.java:135)
        at com.liferay.portal.kernel.search.DocumentImpl.addFile(DocumentImpl.java:98)
        at com.liferay.portlet.documentlibrary.util.DLIndexer.doGetDocument(DLIndexer.java:360)
        at com.liferay.portal.kernel.search.BaseIndexer.getDocument(BaseIndexer.java:110)
        at com.liferay.portlet.documentlibrary.util.DLIndexer.reindexFileEntries(DLIndexer.java:540)
        at com.liferay.portlet.documentlibrary.util.DLIndexer.reindexFileEntries(DLIndexer.java:523)
        at com.liferay.portlet.documentlibrary.util.DLIndexer.doReindex(DLIndexer.java:498)
        at com.liferay.portal.kernel.search.BaseIndexer.reindex(BaseIndexer.java:329)
        at com.liferay.portlet.documentlibrary.util.DLIndexer.reindexFolders(DLIndexer.java:581)
        at com.liferay.portlet.documentlibrary.util.DLIndexer.reindexFolders(DLIndexer.java:560)
        at com.liferay.portlet.documentlibrary.util.DLIndexer.doReindex(DLIndexer.java:490)
        at com.liferay.portal.kernel.search.BaseIndexer.reindex(BaseIndexer.java:329)
        at com.liferay.portlet.admin.action.EditServerAction.reindex(EditServerAction.java:325)
        at com.liferay.portlet.admin.action.EditServerAction.processAction(EditServerAction.java:157)
        at com.liferay.portal.struts.PortletRequestProcessor.process(PortletRequestProcessor.java:175)
        at com.liferay.portlet.StrutsPortlet.processAction(StrutsPortlet.java:190)
        at com.liferay.portlet.FilterChainImpl.doFilter(FilterChainImpl.java:70)
        at com.liferay.portal.kernel.portlet.PortletFilterUtil.doFilter(PortletFilterUtil.java:48)
        at com.liferay.portlet.InvokerPortletImpl.invoke(InvokerPortletImpl.java:651)
        at com.liferay.portlet.InvokerPortletImpl.invokeAction(InvokerPortletImpl.java:686)
        at com.liferay.portlet.InvokerPortletImpl.processAction(InvokerPortletImpl.java:361)
        at com.liferay.portal.action.LayoutAction.processPortletRequest(LayoutAction.java:856)
        at com.liferay.portal.action.LayoutAction.processLayout(LayoutAction.java:635)
        at com.liferay.portal.action.LayoutAction.execute(LayoutAction.java:246)
        at org.apache.struts.action.RequestProcessor.processActionPerform(RequestProcessor.java:431)
        at org.apache.struts.action.RequestProcessor.process(RequestProcessor.java:236)
        at com.liferay.portal.struts.PortalRequestProcessor.process(PortalRequestProcessor.java:174)
        at org.apache.struts.action.ActionServlet.process(ActionServlet.java:1196)
        at org.apache.struts.action.ActionServlet.doPost(ActionServlet.java:432)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:641)
        at com.liferay.portal.servlet.MainServlet.callParentService(MainServlet.java:538)
        at com.liferay.portal.servlet.MainServlet.service(MainServlet.java:515)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:722)
```


Can you please explain how to get past the above exception?

Regards,
Amit Doshi
Subhash Pavuskar
RE: Is it possible to search content from the PDF Document?
23 May 2012 03:36

Rank: Regular Member
Posts: 234
Join date: 12 March 2012

Yes, you can do this! I hope this code helps you read content from an uploaded file (it uses Apache POI's HSSF API, so as written it reads Excel .xls workbooks):
```java
import com.liferay.portal.kernel.upload.UploadPortletRequest;
import com.liferay.portal.util.PortalUtil;
import com.liferay.util.bridges.mvc.MVCPortlet;

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.portlet.ActionRequest;
import javax.portlet.ActionResponse;
import javax.portlet.PortletException;

import org.apache.poi.hssf.usermodel.HSSFCell;
import org.apache.poi.hssf.usermodel.HSSFRow;
import org.apache.poi.hssf.usermodel.HSSFSheet;
import org.apache.poi.hssf.usermodel.HSSFWorkbook;

/**
 * Portlet implementation class Isolate.
 *
 * Scans the first column of every sheet in an uploaded .xls workbook and
 * stores the e-mail, website, phone number, address and company it finds
 * as request attributes for result.jsp.
 */
public class Isolate extends MVCPortlet {

    @Override
    public void processAction(ActionRequest request, ActionResponse response)
        throws IOException, PortletException {

        UploadPortletRequest uploadRequest =
            PortalUtil.getUploadPortletRequest(request);
        File file = uploadRequest.getFile("ufile");

        // Close the stream and delete the temp file even if parsing fails
        try (InputStream in = new FileInputStream(file)) {
            HSSFWorkbook wb = new HSSFWorkbook(in);

            for (int k = 0; k < wb.getNumberOfSheets(); k++) {
                HSSFSheet sheet = wb.getSheetAt(k);

                // Rows can be sparse, so iterate up to the last row index
                for (int r = 0; r <= sheet.getLastRowNum(); r++) {
                    String text = getFirstCellText(sheet.getRow(r));

                    if (text == null) {
                        continue; // missing row/cell or non-text cell
                    }

                    String[] tokens = text.split(" ");

                    for (int t = 0; t < tokens.length; t++) {
                        String token = tokens[t];

                        if (token.indexOf('@') > 0) {
                            request.setAttribute("email", token);
                        }

                        if (token.contains("www")) {
                            request.setAttribute("website", token);
                        }

                        if (token.indexOf('+') >= 0) {
                            request.setAttribute("number", token);
                        }
                        else if ((token.contains("91") || token.contains("080")) &&
                                 (t + 1 < tokens.length)) {

                            // Number split over two tokens, e.g. "080 1234567"
                            request.setAttribute("number", token + tokens[t + 1]);
                        }
                    }

                    // A '#' marks an address; the company name is on the row above
                    if (text.indexOf('#') >= 0) {
                        request.setAttribute("address", text);

                        String company =
                            (r > 0) ? getFirstCellText(sheet.getRow(r - 1)) : null;

                        if (company != null) {
                            request.setAttribute("company", company);
                        }
                    }
                }
            }
        }
        finally {
            file.delete();
        }

        response.setRenderParameter("jspPage", "/html/isolate/result.jsp");
    }

    // Returns the text in column 0, or null if the row/cell is absent or not text
    private static String getFirstCellText(HSSFRow row) {
        if (row == null) {
            return null;
        }

        HSSFCell cell = row.getCell(0);

        if (cell == null) {
            return null;
        }

        try {
            return cell.getStringCellValue();
        }
        catch (IllegalStateException ise) {
            return null; // e.g. a numeric cell
        }
    }
}
```
Alexander Chow
RE: Is it possible to search content from the PDF Document?
23 May 2012 03:54

LIFERAY STAFF
Rank: Liferay Master
Posts: 519
Join date: 19 July 2005

Amit, I think your file is being interpreted as an MS Office file. Does it have a .pdf extension, or something else?
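One way to answer that question for a given file is to look at its first bytes rather than its name. The stack trace above runs through Tika's `AutoDetectParser` into `OfficeParser`, the parser for OLE2-based (legacy Microsoft Office) files, so a file whose content starts with the OLE2 signature can take that path whatever its extension. A minimal, stdlib-only sketch, assuming only the well-known signatures `%PDF-` (PDF), `D0 CF 11 E0 A1 B1 1A E1` (OLE2) and `PK 03 04` (ZIP, used by .docx and .odt); the class name is hypothetical and this is not how Liferay or Tika implement detection:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: classify a file by its leading "magic" bytes. */
final class MagicSniffer {

    private static final byte[] PDF = {'%', 'P', 'D', 'F', '-'};
    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};
    private static final byte[] ZIP = {'P', 'K', 0x03, 0x04};

    static String sniff(byte[] header) {
        if (startsWith(header, PDF)) return "pdf";
        if (startsWith(header, OLE2)) return "ole2"; // .doc/.xls/.ppt and other OLE2 containers
        if (startsWith(header, ZIP)) return "zip";   // .docx/.xlsx/.odt and other ZIP packages
        return "unknown";
    }

    /** Reads at most the first 8 bytes of the file and classifies them. */
    static String sniff(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return sniff(in.readNBytes(8));
        }
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }
}
```

If a file named `something.pdf` comes back as `"ole2"`, that would be consistent with Tika handing it to the Office parser, as in the stack trace.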
Hitoshi Ozawa
RE: Is it possible to search content from the PDF Document?
23 May 2012 04:15

Rank: Liferay Legend
Posts: 7952
Join date: 23 March 2010

How are you creating your PDF files? If you're using some Java tool, try Word 2007/2010 or OpenOffice/LibreOffice instead.
If you still have the problem, please attach your PDF file here.

org.apache.tika.exception.TikaException: Not a HPSF document

BTW, this is also a well-known problem. (This one won't be in the JBoss forum :-) )
Amit Doshi
RE: Is it possible to search content from the PDF Document?
23 May 2012 04:28

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

It was working fine with Liferay 6.0.10 (I'm not certain of the exact version), but it throws an exception in Liferay 6.1 EE.

To verify again, I used a vanilla version of Liferay and uploaded the single PDF file (attached here) to the Documents and Media portlet.

Then I reindexed the Documents and Media portlet. The search returns no results and shows no error.

Please find the PDF attached.
Attachments: installing_configuring_openldap.pdf (263.9k)
Hitoshi Ozawa
RE: Is it possible to search content from the PDF Document?
23 May 2012 06:11

Rank: Liferay Legend
Posts: 7952
Join date: 23 March 2010

I just downloaded the attached PDF, uploaded it to Liferay 6.1.0 GA1 CE with the file name "OpenLDAP", added the Tools > Search portlet, and searched for "Installing". The PDF showed up in the list and I was able to open and view it without any problem. Please check whether you're uploading the file to a folder that the user has permission to view.

The only other difference is that you're using Liferay 6.1 EE while I'm using 6.1 CE. Maybe you should file a ticket for this issue, since you're using EE and have paid for it.
Alexander Chow
RE: Is it possible to search content from the PDF Document?
23 May 2012 06:08

LIFERAY STAFF
Rank: Liferay Master
Posts: 519
Join date: 19 July 2005

Amit,

Using that file I am able to search based on the contents of the file in 6.1 CE and EE. I did not get any of your HPSF errors.

Incidentally, the preview is a little weird, but that should not affect your search problem.
Attachments: Documents and Media Portlet.jpg (89.6k), Faceted Search Portlet.jpg (59.3k)
Amit Doshi
RE: Is it possible to search content from the PDF Document?
23 May 2012 06:29

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

The PDF shows up in the list, but the matching words inside the PDF are not highlighted the way they are on the liferay.com website and in Liferay 6.0.10. This does not work in Liferay 6.1 EE.

Attached is a screenshot of what I am looking for. It was taken on Liferay 6.0.10; I want the same behavior in Liferay 6.1 EE.

I hope that is clear.
Attachments: search.PNG (14.5k)
Hitoshi Ozawa
RE: Is it possible to search content from the PDF Document?
23 May 2012 14:16

Rank: Liferay Legend
Posts: 7952
Join date: 23 March 2010

So it seems you are able to find it now. It is highlighted on my setup, but I changed the highlighting logic because it was buggy anyway. Are there any errors now?
Amit Doshi
RE: Is it possible to search content from the PDF Document?
24 May 2012 01:12

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

Hitoshi:
The attached screenshot is from Liferay 6 EE, which by default highlights the matching content inside the PDF, and I am expecting the same functionality in Liferay 6.1 EE. It seems there is a bug in Liferay 6.1 EE.

In Liferay 6.1 EE it behaves like the screenshot Alexander Chow attached: no words inside the PDF are highlighted; only the PDF name and title are displayed.

But I still have a couple of questions:

The error is caused by an OpenOffice DOC file in Documents and Media. So the question is: why does reindexing Documents and Media fail on DOC files?
Is it a bug in Liferay 6.1 EE?

Why doesn't search work the way it does in Liferay 6 EE?
Is that another bug in Liferay 6.1 EE?

Should I raise a ticket with Liferay for both of them?

Please advise.
Amit Doshi
RE: Is it possible to search content from the PDF Document?
24 May 2012 01:20

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

Alexander Chow:

While reindexing an OpenOffice DOC file in Documents and Media, I get the HPSF error.

Also, in Liferay 6.1 EE, search works for the PDF files as shown in your screenshot.

But I am expecting the words inside the PDF to be highlighted, as in the screenshot I attached in my previous post.

Liferay 6 EE provides this by default, so why not Liferay 6.1 EE?

Can we say it is a bug in Liferay 6.1 EE?

Please advise.
Alexander Chow
RE: Is it possible to search content from the PDF Document?
24 May 2012 01:58

LIFERAY STAFF
Rank: Liferay Master
Posts: 519
Join date: 19 July 2005

Amit,

So, let me get this straight.

  1. For the PDF file, it works fine (it shows what my screenshots show).
  2. For an OpenOffice doc, it does not index (HPSF error). Can you upload a test file?
  3. For the highlighting… it seems to be more fundamental, in that the summary itself is not displayed. I've emailed the developer who rewrote the search portlet as a faceted search to ask him whether this was intentional -- so you can hold off on a ticket for now.
Amit Doshi
RE: Is it possible to search content from the PDF Document?
24 May 2012 03:43

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

Thanks, Alexander and Hitoshi, for your cooperation.

I have attached the DOC file. It doesn't produce any error, but it is not indexed in a vanilla Liferay 6.1 EE.

As for the HPSF error, it occurs on our test server, where many documents have been uploaded, so it is difficult to find which one causes it. There are more than 100 files there: JS files, images, CSS, TXT files, etc. I am trying to figure out which particular file type triggers that error and will get back to you on that.
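To narrow down which of the 100+ files triggers the HPSF error, one option is to list only the files that are OLE2 containers: the failing path in the stack trace (`OfficeParser` into `SummaryExtractor`) is the one Tika uses for that container format, so JS, CSS, TXT and image files are unlikely suspects. A minimal sketch, assuming copies of the uploaded files are available in a local directory; the directory and the class name are hypothetical, and the sketch only identifies candidates, it does not reproduce the parse.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: list files under a directory whose content starts with the OLE2 signature. */
final class Ole2Finder {

    private static final byte[] OLE2 = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1};

    static List<Path> findOle2Files(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        List<Path> hits = new ArrayList<>();
        for (Path p : files) {
            if (isOle2(p)) {
                hits.add(p); // candidate for Tika's OfficeParser (and its HPSF summary parsing)
            }
        }
        return hits;
    }

    private static boolean isOle2(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            // files shorter than 8 bytes yield a shorter array and never match
            return Arrays.equals(in.readNBytes(OLE2.length), OLE2);
        }
    }
}
```

Uploading the candidates one at a time to a test instance and reindexing would then show which one produces the exception.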

So at this stage I need to figure out two points:

1) DOC files are not being indexed, and I don't know why.
2) The highlighting for PDFs, as you mentioned.

Regards,
Amit Doshi
Attachments: Liferay Eclipse Setup.doc (348.0k)
Alexander Chow
RE: Is it possible to search content from the PDF Document?
24 May 2012 05:12

LIFERAY STAFF
Rank: Liferay Master
Posts: 519
Join date: 19 July 2005

Amit Doshi wrote:

"As for the HPSF error, it occurs on our test server, where many documents have been uploaded, so it is difficult to find which one causes it. There are more than 100 files there: JS files, images, CSS, TXT files, etc. I am trying to figure out which particular file type triggers that error and will get back to you on that."


So, I just tested this against 6.1 CE and 6.1 EE and both seem to work fine. No console errors and, as you can see in the pictures, they search by content OK. Not sure if it makes any difference, but I'm testing with Tomcat.
Attachments: Search 6.1.0 CE.jpg (47.5k), Search 6.1.0 EE.jpg (58.5k)
Hitoshi Ozawa
RE: Is it possible to search content from the PDF Document?
24 May 2012 06:00

Rank: Liferay Legend
Posts: 7952
Join date: 23 March 2010

I just tested the attached file with the Liferay 6.1.0 CE Tomcat bundle and was able to search by the content of the DOC file.
The searched keyword was also highlighted.

Amit, your question keeps changing with each post. Please test it fully before submitting another question.
Amit Doshi
RE: Is it possible to search content from the PDF Document?
24 May 2012 06:54

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

Hitoshi:

You wrote: "I just tested the attached file with the Liferay 6.1.0 CE Tomcat bundle and was able to search by the content of the DOC file. The searched keyword was also highlighted."


Please check Alexander's post: the searched keywords are not highlighted inside the document or PDF. Alexander attached screenshots for both editions (6.1 CE and 6.1 EE).

You wrote: "Amit, your question keeps changing with each post. Please test it fully before submitting another question."


I had a problem with Lucene. After deleting the lucene folder under data and reindexing, search started finding the document, in the same way as shown by Alexander.
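The recovery step described here (delete the Lucene index directory, then reindex) can be scripted. The sketch below assumes the default `lucene.dir` location of `${liferay.home}/data/lucene/` (check `portal-ext.properties` in case it was overridden) and assumes the portal is stopped while it runs, so that no index writer holds files open; the class name is hypothetical. After restarting, the index can be rebuilt from Control Panel > Server Administration, the reindex action that appears as `EditServerAction.reindex` in the stack trace in this thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: recursively delete an index directory (run only while the portal is stopped). */
final class IndexDirCleaner {

    /** Deletes dir and everything below it; returns the number of entries removed. */
    static int deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return 0;
        }
        List<Path> entries;
        try (Stream<Path> walk = Files.walk(dir)) {
            // reverse lexical order puts children before their parents,
            // so each directory is already empty when it is deleted
            entries = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : entries) {
            Files.delete(p);
        }
        return entries.size();
    }
}
```

For example, `IndexDirCleaner.deleteRecursively(Path.of(liferayHome, "data", "lucene"))`, where `liferayHome` is a hypothetical variable holding your Liferay home directory.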

My question remains the same: the words inside the PDF or DOC file are not highlighted. Alexander replied to that as follows:

"For the highlighting… it seems to be more fundamental, in that the summary itself is not displayed. I've emailed the developer who rewrote the search portlet as a faceted search to ask him whether this was intentional -- so you can hold off on a ticket for now."

Now I am waiting for Alexander's answer on whether it is a bug or the functionality was designed that way.

I hope that is clear.

Thanks & Regards,
Amit Doshi
Hitoshi Ozawa
RE: Is it possible to search content from the PDF Document?
24 May 2012 14:37

Rank: Liferay Legend
Posts: 7952
Join date: 23 March 2010

You wrote: "Please check Alexander's post: the searched keywords are not highlighted inside the document or PDF. Alexander attached screenshots for both editions (6.1 CE and 6.1 EE)."

I probably fixed it in my version, then. It wasn't highlighting Japanese documents correctly anyway, even in older versions. I don't think the lack of highlighting is a new "improved" feature, unless someone complained about it not working correctly and one of the developers decided to delete the feature rather than fix it (I've seen this happen at some sites).
Alexander Chow
RE: Is it possible to search content from the PDF Document?
25 May 2012 04:37

LIFERAY STAFF
Rank: Liferay Master
Posts: 519
Join date: 19 July 2005

Hey Amit,

OK, so what I found is this. The search portlet was refactored significantly when they added in the faceted search. As part of the refactoring, the results were changed to give preference to AssetRenderer data. See main_search_result_form.jsp:

```java
AssetRendererFactory assetRendererFactory = AssetRendererFactoryRegistryUtil.getAssetRendererFactoryByClassName(className);

if (assetRendererFactory != null) {
...
    entryTitle = assetRenderer.getTitle(locale);
    entrySummary = assetRenderer.getSummary(locale);
}
else {
...
    Summary summary = indexer.getSummary(document, locale, snippet, viewFullContentURL);

    if (viewInContext) {
        viewURL = viewFullContentURL.toString();
    }

    entryTitle = summary.getTitle();
    entrySummary = summary.getContent();
}
```


So, for any asset, it tries to display the summary provided by the AssetRenderer. In the case of a document, that summary is the description. So when you upload a file and add a description, the summary in the results will be whatever is in the description, and if the search keywords appear in the description, they will be highlighted. (So, for example, if you set assetRendererFactory to null in that file, you will get the same results as you did in 6.0.)
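The branch in main_search_result_form.jsp quoted above reduces to one decision: use the AssetRenderer's title and summary when a renderer factory is registered, otherwise fall back to the Indexer's summary. A minimal plain-Java model of that decision, with hypothetical `HitSummary` and `SummaryChooser` types standing in for Liferay's classes (this is not Liferay code):

```java
/** Hypothetical title/summary pair for one search hit (stand-in for Liferay's types). */
final class HitSummary {
    final String title;
    final String content;

    HitSummary(String title, String content) {
        this.title = title;
        this.content = content;
    }
}

/** Hypothetical model of the branch in main_search_result_form.jsp; not Liferay code. */
final class SummaryChooser {

    /**
     * @param assetRendererSummary title + description from the AssetRenderer, or null
     *        when no AssetRendererFactory is registered for the entry's class
     * @param indexerSummary title + content snippet from the Indexer
     */
    static HitSummary choose(HitSummary assetRendererSummary, HitSummary indexerSummary) {
        if (assetRendererSummary != null) {
            // 6.1 behaviour: the AssetRenderer wins, so a document shows its description
            return assetRendererSummary;
        }
        // fallback (per Alexander, effectively the 6.0 result): the Indexer summary
        return indexerSummary;
    }
}
```

In this model a document uploaded without a description gets an empty summary, leaving nothing for the keywords to highlight, which is consistent with the 6.1 screenshots discussed earlier in the thread.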

Why is the AssetRenderer the default choice? The basic theory is that the AssetRenderer is supposed to have a much richer API than the Indexer. The AssetRenderer could itself use the Indexer if it wants, but not the other way around. In the future, the AssetRenderer will also be the vehicle for execution of view templates, which will give admins a way to create new presentations for assets dynamically. The Indexer will never provide any sort of templating functionality, and so the Indexer should only be used as a fallback.

Hope that helps to clarify a few things.

Alex
Amit Doshi
RE: Is it possible to search content from the PDF Document?
25 May 2012 06:50

Rank: Liferay Master
Posts: 547
Join date: 29 December 2010

Thanks Alexander for the information that you shared with us.

So I changed the logic in main_search_result_form.jsp accordingly, moving the contents of the else branch into the if branch, and it worked fine for me. I did that because I found that assetRendererFactory is never null in this flow.
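With that change every hit uses the Indexer summary, which is built with a snippet of the matching content (the `snippet` argument to `indexer.getSummary(...)` in the code Alexander quoted). As a rough, stdlib-only illustration of what a highlighted snippet is, and not Liferay's or Lucene's actual highlighter, the sketch below cuts a window of text around the first match and wraps each match in a `<span class="highlight">`; the class name, window size and markup are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical keyword-in-context snippet builder (illustration only). */
final class SnippetHighlighter {

    /**
     * Returns up to radius characters on each side of the first case-insensitive match
     * of term, HTML-escaped, with every match in that window wrapped in a highlight span.
     * Returns null if term is empty or does not occur.
     */
    static String snippet(String text, String term, int radius) {
        if (term.isEmpty()) {
            return null;
        }
        Pattern pattern = Pattern.compile(Pattern.quote(term),
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
        Matcher first = pattern.matcher(text);
        if (!first.find()) {
            return null;
        }
        int from = Math.max(0, first.start() - radius);
        int to = Math.min(text.length(), first.end() + radius);
        String window = text.substring(from, to);

        StringBuilder out = new StringBuilder(from > 0 ? "..." : "");
        Matcher m = pattern.matcher(window);
        int last = 0;
        while (m.find()) {
            out.append(escapeHtml(window.substring(last, m.start())))
               .append("<span class=\"highlight\">")
               .append(escapeHtml(m.group()))
               .append("</span>");
            last = m.end();
        }
        out.append(escapeHtml(window.substring(last)));
        if (to < text.length()) {
            out.append("...");
        }
        return out.toString();
    }

    private static String escapeHtml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

For example, `SnippetHighlighter.snippet("Installing and Configuring OpenLDAP. Installing requires root.", "installing", 5)` returns `<span class="highlight">Installing</span> and ...`.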

So I created a hook for it, and it worked exactly as I expected.
Please see the screenshot below.

Thanks & Regards,
Amit Doshi
Attachments: liferay_6.1_EE.PNG (29.1k)
Alexander Chow
RE: Is it possible to search content from the PDF Document?
25 May 2012 08:19

LIFERAY STAFF
Rank: Liferay Master
Posts: 519
Join date: 19 July 2005

Brilliant! Great to hear.
Prabhakar Singh
RE: Is it possible to search content from the PDF Document?
18 December 2013 22:57

Rank: New Member
Posts: 8
Join date: 1 August 2012

Hi Alexander, Hitoshi, Amit,

This is another awesome post in the Liferay Forums. Thanks a lot!
I got a much clearer picture of the dos and don'ts of Liferay search!

Thanks & Best Regards,
Prabhakar
Rashmi S
RE: Is it possible to search content from the PDF Document?
4 November 2014 01:32

Rank: New Member
Posts: 6
Join date: 2 January 2014

Hi Alexander, Hitoshi, Amit,

Thanks a lot, this post really helped me!

Thanks,
Rashmi S