IMHO, dynamically converting a PDF document on every request would be very inefficient. AFAIK the process is computationally very “expensive”. Of course caching would be a huge benefit. Even better would be to batch process the files one time and save the result in a non-volatile manner. There are a number of command line utilities that will do such a conversion. These typically work on one file at a time, but a batch script could be created to process all files in a given folder.
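For what it's worth, here is a rough sketch of what such a batch script could look like in Python. It assumes the Poppler `pdftohtml` command line tool is installed and on the PATH (any other converter could be swapped in). The function names and folder names are just made up for illustration.

```python
import subprocess
from pathlib import Path


def build_command(pdf_path, out_dir):
    """Build the converter command for one PDF.

    Assumption: Poppler's `pdftohtml` is the converter. The final file
    name is chosen by pdftohtml, based on the output base path given here.
    """
    out_base = Path(out_dir) / Path(pdf_path).stem
    # -s: one HTML document for all pages; -noframes: no frameset wrapper
    return ["pdftohtml", "-s", "-noframes", str(pdf_path), str(out_base)]


def convert_folder(src_dir, out_dir, run=subprocess.run):
    """Convert every *.pdf file in src_dir once and return the commands run.

    `run` is injectable so the loop can be dry-run without the converter.
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for pdf_path in sorted(src_dir.glob("*.pdf")):
        cmd = build_command(pdf_path, out_dir)
        run(cmd, check=True)  # raises if the converter reports an error
        commands.append(cmd)
    return commands
```

Calling something like `convert_folder("pdfs", "html")` once (the folder names are hypothetical) would process everything in the folder, and the saved output can then be served without any further conversion work.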
The resulting HTML document is likely saved as a static .html file. It’s possible to use such files in a WP site. When such a file is requested, WP is not even involved; behavior would be like any old skool web 1.0 site. This means WP theming, header, footer, etc. are not part of the content. You could embed a static .html file within a themed WP page by using <iframe>. It would also be possible to create a custom template that dynamically embeds an .html file in a WP page.
There may even be a plugin that will import static .html files into individual WP pages. If you know PHP coding, it wouldn’t be too difficult to create such a tool. It’d basically read file content and save it with wp_insert_post().
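If PHP isn't your thing, the same "read file content and save it" idea can be sketched from outside WP. The example below is not wp_insert_post(); it swaps in the WordPress REST API pages endpoint, called from Python. It assumes the REST API is enabled and that you authenticate with an Application Password. The site URL, user name, and the title-from-filename rule are my own assumptions.

```python
import base64
import json
import re
import urllib.request
from pathlib import Path


def extract_body(html):
    """Return the markup inside <body>...</body>, or all of it if no body tag."""
    match = re.search(r"<body[^>]*>(.*?)</body>", html, re.IGNORECASE | re.DOTALL)
    return (match.group(1) if match else html).strip()


def build_page_payload(html_path):
    """Turn one static .html file into the fields for a new draft page."""
    html_path = Path(html_path)
    html = html_path.read_text(encoding="utf-8", errors="replace")
    return {
        # Assumption: derive a readable title from the file name
        "title": html_path.stem.replace("-", " ").replace("_", " ").title(),
        "content": extract_body(html),
        "status": "draft",  # review in the editor before publishing
    }


def create_page(site_url, user, app_password, payload):
    """POST the payload to the site's REST API and return the decoded reply."""
    token = base64.b64encode(f"{user}:{app_password}".encode()).decode()
    request = urllib.request.Request(
        f"{site_url.rstrip('/')}/wp-json/wp/v2/pages",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Keep in mind that only the body markup is sent, so any styles the converter put in the file's head section are dropped. The page picks up the theme's look instead, which may or may not suit the converted layout.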
Thank you bcworkz!
All good points. I’d rather keep the look and feel of the site on all pages, so automating may not be the best solution. If I can get the design team to create well-tagged PDFs, then at least I will have won the first battle. I might be able to get away with copying and pasting the exported HTML into the “code” section in the body of the page and get the results I wanted in the first place. A little more tedious, but probably worth it.
Very good explanation. I got quality information from your post. Now I am able to do this HTML process for the bad bunny WordPress store. Thanks a lot, bcworkz.
@mediaboxca
I am trying to do the same thing. Were you able to make it work? If so, would you share how you did it? I am not a developer, so I was hoping that there would be a “cut and paste” solution.
Thanks!!
@memangino No, not yet. However, I did find out that unless the PDF files are incredibly simple (like, text only) or properly made as accessible PDF files (https://www.adobe.com/accessibility/pdf/pdf-accessibility-overview.html), which usually isn’t the case, the HTML output will not be all that great. If there are images, tables, and other formatting, you can expect pretty bad results.
Maybe in the near future there will be AI tools capable of “reading” PDF files and transcribing them properly to HTML, with all the correct attributes.
Please do share if you find a solution!
https://mathpix.com seems to be doing something with AI OCR for this
Thank you! That looks very useful.
Look for plugins called: 3D FlipBook