Viewing 5 replies - 1 through 5 (of 5 total)
  • Plugin Author Mikko Saari

    (@msaari)

    I’ve explored this, and it’s not that simple. PHP can read PDFs, but in my experience it’s very slow. Reading even a single PDF file can take longer than most servers allow, which would lead to problems when building the index.

    The SearchWP extension says “Warning: This Extension requires the use of exec() and also requires you to install Xpdf (upload a file to a non-public location) yourself.” – That’s a HUGE can of worms, and I have very little interest in providing support for something like that.
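
    To make the trade-off above concrete, here is a minimal sketch of the "shell out to an external tool" approach with a hard time limit, so that one slow PDF cannot stall an index build. It is written in Python purely for illustration, not as Relevanssi or SearchWP code. It assumes an Xpdf- or Poppler-style `pdftotext` binary on the PATH; the function name `extract_pdf_text` and its parameters are made up for this example.

```python
import subprocess
from typing import Optional, Sequence


def extract_pdf_text(
    pdf_path: str,
    timeout_s: float = 10.0,
    command: Sequence[str] = ("pdftotext",),  # assumption: binary is on PATH
) -> Optional[str]:
    """Run an external extractor and return its stdout, or None on any failure."""
    try:
        result = subprocess.run(
            [*command, pdf_path, "-"],  # "-" asks pdftotext to write to stdout
            capture_output=True,
            text=True,
            timeout=timeout_s,  # kill the child if it runs too long
            check=True,         # treat a non-zero exit code as a failure
        )
    except (OSError, subprocess.TimeoutExpired, subprocess.CalledProcessError):
        # Tool missing or not executable, too slow, or it reported an error:
        # skip this file instead of aborting the whole indexing run.
        return None
    return result.stdout
```

    A caller would index the returned text when it is a string and skip the file when it is `None`. The support burden mentioned above comes from everything around this call: installing the binary, permissions, and hosts that disable process execution entirely.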

    So it’s something I’d like to offer, but right now it’s not practical enough.

    Thread Starter maxystr

    (@maxystr)

    But maybe you can add it as a “hidden” feature (at least to the premium version)? I’m pretty sure many users have some control over their server configuration and can increase the execution time or install new PHP libraries. You could, e.g., use a filter or action to activate the PDF indexing. Then only experienced users/devs who know what they’re doing can activate the PDF feature, and the others won’t bother you.
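
    The opt-in idea could look roughly like the sketch below. It is a small Python analogy of the WordPress filter pattern, not actual plugin code: `add_filter` and `apply_filters` here are simplified stand-ins defined in the snippet itself, and the hook name `example_index_pdfs` is hypothetical. The feature stays off unless a site owner registers a callback that turns it on.

```python
from typing import Any, Callable, Dict, List

# Registry of callbacks per hook name (a simplified stand-in for WordPress hooks).
_filters: Dict[str, List[Callable[[Any], Any]]] = {}


def add_filter(name: str, callback: Callable[[Any], Any]) -> None:
    """Register a callback that may change the value passed through a hook."""
    _filters.setdefault(name, []).append(callback)


def apply_filters(name: str, value: Any) -> Any:
    """Pass the value through every callback registered for the hook, in order."""
    for callback in _filters.get(name, []):
        value = callback(value)
    return value


def should_index_pdfs() -> bool:
    # Default is False, so PDF indexing stays hidden unless someone opts in.
    # "example_index_pdfs" is a made-up hook name for this sketch.
    return bool(apply_filters("example_index_pdfs", False))
```

    A site owner would opt in with a single line such as `add_filter("example_index_pdfs", lambda enabled: True)`; everyone else never sees the feature.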

    Btw, SearchWP can also index PDF files without Xpdf, but the results may not be as accurate as with Xpdf. I also found this project: http://www.pdfparser.org/ which may help you index PDF files better/faster.

    Thread Starter maxystr

    (@maxystr)

    OK, I found something which might work: https://github.com/mplattu/masala_relevanssi

    Plugin Author Mikko Saari

    (@msaari)

    I’ve tried that code and found it impractical. I’ve also tried PDF Parser. That didn’t work, either.

    Also, my development time is very limited, so I’m going to spend it on bug fixes and features that are useful to all, and not on hidden features for a chosen few.

    It’s a cool feature, but like I said, it’s just not practical. When I get to developing the cloud-based version of Relevanssi, it’s pretty high on my list of things to do.

    Why doesn’t masala_relevanssi index the whole PDF? It indexed just the first 3 lines. Please help me.

The topic ‘Feature Request: PDF Indexing’ is closed to new replies.