PDF to HTML plugin? (accessibility and responsive need)

mediaboxca
(@mediaboxca)

3 years, 7 months ago

Is there a plugin to convert PDF to HTML on the fly, to display the content in HTML inside a WordPress website page?

I work on a website that has several hundred PDF documents. I am trying to get these created as accessible PDFs, so that exporting to HTML will be easier to handle. However, rendering the HTML one document at a time and uploading each as a page would be a big task, and could present problems when a PDF document is updated. It would be better to render on the fly (and cache, I suppose).

I would like to do this both to improve accessibility and to make sure the content is responsive (which PDFs are not, really).

I see many plugins that go from HTML to PDF but none that go the other way around. Seems like an opportunity for a developer 🙂

Moderator bcworkz
(@bcworkz)

3 years, 7 months ago

IMHO, dynamically converting a PDF document on every request would be very inefficient. AFAIK the process is computationally very “expensive”. Of course caching would be a huge benefit. Even better would be to batch process the files one time and save the result in a non-volatile manner. There are a number of command line utilities that will do such a conversion. These typically work on one file at a time, but a batch script could be created to process all files in a given folder.
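The batch approach described above could look something like the following. This is only a minimal sketch, not a tested recipe: it assumes Poppler's `pdftohtml` command line utility is installed, and the function names (`needs_conversion`, `build_command`, `convert_folder`) are made up for illustration. The exact output file name can vary by converter and version, so the staleness check may need adjusting.

```python
import subprocess
from pathlib import Path


def needs_conversion(pdf: Path, html: Path) -> bool:
    """Return True if the HTML output is missing or older than the PDF."""
    return not html.exists() or html.stat().st_mtime < pdf.stat().st_mtime


def build_command(pdf: Path, html: Path) -> list[str]:
    # Assumption: Poppler's `pdftohtml` is on the PATH; swap in any other
    # converter here. -noframes asks for a single HTML file without a frameset.
    return ["pdftohtml", "-noframes", str(pdf), str(html)]


def convert_folder(src: Path, dest: Path, run=subprocess.run) -> list[Path]:
    """Convert every PDF in `src` whose HTML copy in `dest` is missing or stale."""
    dest.mkdir(parents=True, exist_ok=True)
    converted = []
    for pdf in sorted(src.glob("*.pdf")):
        html = dest / (pdf.stem + ".html")
        if needs_conversion(pdf, html):
            # check=True raises an error if the converter exits with a failure code.
            run(build_command(pdf, html), check=True)
            converted.append(html)
    return converted
```

Running `convert_folder(Path("pdfs"), Path("html"))` from a scheduled job would only re-convert files whose PDF has changed since the last run, which covers the "document is updated" case without converting on every page request.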

The resulting HTML document is likely saved as a static .html file. It’s possible to use such files in a WP site. When such a file is requested, WP is not even involved; behavior would be like any old skool web 1.0 site. This means any WP theming, header, footer, etc. are not part of the content. You could embed a static .html file within a themed WP page by using <iframe>. It would be possible to create a custom template that dynamically embeds an .html file in a WP page.

There may even be a plugin that will import static .html files into individual WP pages. If you know PHP coding, it wouldn’t be too difficult to create such a tool. It’d basically read file content and save it with wp_insert_post().
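As a rough illustration of the "read file content and save it as a page" idea, here is a sketch that does the same job from outside WordPress. Note that it swaps in a different technique: instead of PHP and wp_insert_post(), it is written in Python and posts to the WordPress REST API pages endpoint, authenticating with an application password. The function names and the "draft" status are assumptions for illustration, and the regex-based body extraction is deliberately naive.

```python
import base64
import json
import re
import urllib.request
from pathlib import Path


def extract_body(html: str) -> str:
    """Return the markup inside <body>...</body>, or the whole string if there is no body tag."""
    match = re.search(r"<body[^>]*>(.*?)</body>", html, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else html.strip()


def build_page_payload(html_file: Path) -> dict:
    """Build the JSON payload for one page; the title is derived from the file name."""
    html = html_file.read_text(encoding="utf-8", errors="replace")
    return {
        "title": html_file.stem.replace("-", " ").replace("_", " ").title(),
        "content": extract_body(html),
        "status": "draft",  # assumption: review each page before publishing
    }


def create_page(site_url: str, user: str, app_password: str, payload: dict) -> dict:
    """POST the payload to the site's REST API and return the decoded response."""
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    request = urllib.request.Request(
        f"{site_url.rstrip('/')}/wp-json/wp/v2/pages",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Because only the body markup is imported, the page picks up the site's theme, header, and footer. This sketch always creates a new page, so handling an updated PDF would need extra logic to find and update the existing page instead.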

Thread Starter mediaboxca
(@mediaboxca)

3 years, 7 months ago

Thank you bcworkz!

All good points. I’d rather keep the look and feel of the site on all pages, so automating may not be the best solution. If I can get the design team to create well-tagged PDFs, then at least I will have won the first battle. I might be able to get away with doing a copy-and-paste from exported HTML to the “code” section in the body of the page and have the results I wanted in the first place. A little more tedious, but probably worth it.

bretlee61
(@bretlee61)

3 years, 5 months ago

Very good explanation. I got quality information from your post. Now I am able to do this HTML process for the bad bunny WordPress store. Thanks a lot, bcworkz.

memangino
(@memangino)

2 years, 11 months ago

@mediaboxca

I am trying to do the same thing. Were you able to make it work? If so, would you share how you did it? I am not a developer, so I was hoping that there would be a “cut and paste” solution. 🙂

Thanks!!

Thread Starter mediaboxca
(@mediaboxca)

2 years, 11 months ago

@memangino No, not yet. However, I did find out that unless the PDF files are incredibly simple (like, text only) or properly made as accessible PDF files (https://www.adobe.com/accessibility/pdf/pdf-accessibility-overview.html), which usually isn’t the case, the HTML output will not be all that great. If there are images, tables, and other formatting, you can expect pretty bad results.

Maybe in the near future there will be AI tools capable of “reading” PDF files and transcribing them properly to HTML, with all the correct attributes.

Please do share if you find a solution!

Marcus Quinn
(@surferking)

2 years, 8 months ago

https://mathpix.com seems to be doing something with AI OCR for this

Thread Starter mediaboxca
(@mediaboxca)

2 years, 8 months ago

Thank you! That looks very useful.

abolotni
(@abolotni)

2 years, 7 months ago

Look for a plugin called 3D FlipBook.