PDF Metadata Mapping

Resolved kevinfraser
(@kevinfraser)

13 years ago

David, I love the plugin and it looks very promising, but I seem to have a problem.

Either I don’t understand how to use the feature, or there is some other reason it doesn’t work as the documentation describes. I have carefully read and re-read all the documentation I could find that seems relevant. I have also tried it on a new WordPress installation with only the MLA plugin enabled, to rule out any plugin interaction.

I have a collection of only PDF files, all authored “purely” in the latest version of Adobe Acrobat Pro CS6, so the PDF files themselves are, I think, as canonical as possible. Each has had metadata entered using Acrobat. I have confirmed that Acrobat shows this metadata both in its first ‘Document Info’ screen and in its ‘Additional Metadata’ button/tabs, which are supposed to somehow use XML to handle both EXIF and IPTC fields. I have also verified the presence of the metadata using the independent PDFtk command line tool:
( http://www.pdflabs.com/docs/pdftk-man-page )

My point is that the metadata is filled in everywhere I can find that it should be, and I can demonstrate it’s in these PDF files somewhere!

I cannot FOR THE LIFE of me figure out any mode or method to get MLA to see any of that metadata in any of those PDF attachments, either during upload or by using the MLA IPTC/EXIF tab to map it to any standard WordPress field or to the Category or Tag fields.

I am intrigued by the ALL_EXIF and ALL_IPTC fields, but I can’t figure out from the documentation how I would use them to see this metadata in the PDF files. An example expressed as an [MLA-GALLERY] shortcode would be most appreciated.

SUMMARY: Demonstrably populated EXIF and/or IPTC metadata fields in PDF files created with the latest version of Adobe Acrobat, but seemingly no way to see that metadata using the MLA plugin with any method I’ve discovered (so far).

Any ideas? I’m looking forward to slapping myself in the forehead over something cravenly simple.

http://ww.wp.xz.cn/extend/plugins/media-library-assistant/

Viewing 8 replies - 1 through 8 (of 8 total)

Thread Starter kevinfraser
(@kevinfraser)

13 years ago

It occurred to me that WordPress may not be at the bleeding edge of PDF authoring, so I tried converting one of the files to the archivable (“/A”) Acrobat 5 PDF 1.4 format. I verified the presence of the metadata, via Acrobat and also the PDFtk command line tool. Same result: no apparent visibility of any metadata in the PDF files via the MLA plugin. Is there a way you know of to test the WordPress image meta function? That is probably the first place to look. Please let me know if you would like a test PDF file from me.

Plugin Author David Lingren
(@dglingren)

13 years ago

Thank you for your interest in the plugin and for posting this question. As you suspected, WordPress isn’t up to speed on PDF authoring, and MLA hasn’t addressed this issue either.

Neither WordPress nor MLA looks for metadata in anything other than image files. WordPress 3.6 will extend this to audio and video files, but I haven’t seen anything about applying this functionality to PDF files.

This is a great idea for an MLA enhancement, and I am finishing up work on a new version. I will have a serious look at how to extract metadata from PDF files and make it available in MLA.

I will keep you posted on my progress, adding comments to this topic. If you’d like to test a pre-release version, send me your e-mail address, using the “Contact Us” page at our web site:

Fair Trade Judaica/Contact Us

Thanks for your interest, a great suggestion and your patience.

Thread Starter kevinfraser
(@kevinfraser)

13 years ago

David, thank you for your quick reply!

I have also been trying another plugin called MMWW

(http://ww.wp.xz.cn/plugins/mmww)

which is apparently being hampered as well by WordPress’s (in)ability to find all relevant XMP metadata in all PDF files. I got a near-instant reply from MMWW’s developer this morning, sent him some test files, and he’s having a look at it. He has licensed the Zend_PDF module to redistribute with his plugin. I don’t know what your roadmap is, but if you haven’t seen it, maybe this will offer some clues?
( http://framework.zend.com/manual/1.12/en/zend.pdf.info.html )
Perhaps it brings a solution closer and sooner?

Ollie is running into exactly the same problem – incomplete metadata retrieval from PDFs. Perhaps the two of you can share a solution to this problem?

Thanks for your pre-release offer — I’ll drop you my email address via your contact link and I’ll be happy to send you real world test files too if you need them.

OllieJones
(@olliejones)

13 years ago

Kevin contacted me with a similar problem retrieving XMP metadata from his PDF files.

It turns out that some PDF files contain multiple XMP metadata stanzas. (Each metadata stanza is a well-formed chunk of XML.)

The first one is pretty much a stub, but later ones contain the useful metadata. That accounts for the failure of MMWW (my plugin) to read his metadata correctly. My version 1.0.2 corrects it.

Don’t hesitate to steal my open-source code if you want it. It doesn’t use the Zend framework, but it does use PHP’s XML handling.
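
For readers who want to see the multiple-stanza behavior for themselves, here is a minimal sketch in Python. It is not MMWW’s or MLA’s actual code (both are PHP), only an illustration under some assumptions: that the XMP stanzas are stored uncompressed in the file, that each one is wrapped in an `x:xmpmeta` element, and that “the useful one” can be approximated as the stanza with the most XML elements. The function names are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: each XMP stanza is stored uncompressed and wrapped in <x:xmpmeta>.
XMPMETA_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)

# Standard namespace URIs for Dublin Core and RDF, as used inside XMP.
DC = "{http://purl.org/dc/elements/1.1/}"
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def find_xmp_stanzas(data: bytes) -> list:
    """Return every well-formed XMP stanza found in the raw file bytes."""
    stanzas = []
    for match in XMPMETA_RE.finditer(data):
        try:
            stanzas.append(ET.fromstring(match.group(0)))
        except ET.ParseError:
            continue  # skip anything that is not well-formed XML
    return stanzas


def richest_stanza(stanzas):
    """Heuristic (an assumption): prefer the stanza with the most elements."""
    return max(stanzas, key=lambda s: sum(1 for _ in s.iter()), default=None)


def dc_values(stanza, name: str) -> list:
    """Collect the text values of a Dublin Core property, e.g. 'subject'."""
    values = []
    for prop in stanza.iter(DC + name):
        items = [li.text.strip() for li in prop.iter(RDF + "li")
                 if li.text and li.text.strip()]
        if items:
            values.extend(items)
        elif prop.text and prop.text.strip():
            values.append(prop.text.strip())
    return values
```

To try it, read a file with `open(path, "rb").read()`, pass the bytes to `find_xmp_stanzas`, and call `dc_values(richest_stanza(stanzas), "subject")` to list the keywords. Reading only the first stanza is exactly the mistake described above: on a file like Kevin’s it would come back empty.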

Thread Starter kevinfraser
(@kevinfraser)

13 years ago

This is just cool. Thanks, Ollie! I can’t wait to see what David does with your solution, too!

Plugin Author David Lingren
(@dglingren)

12 years, 12 months ago

Ollie,

I’ve begun looking into adding PDF metadata support, and this will be a great help. Thank you for your offer and your generosity!

I will post my progress to this topic.

Thread Starter kevinfraser
(@kevinfraser)

12 years, 12 months ago

David, Ollie’s code to pull metadata strings from XMP in a PDF definitely works! Haven’t tested it on anything except PDFs yet, but from looking at it I suspect it will also pull metadata from anything containing XMP, including humungous media files — without causing server memory issues. Good approach to that, Ollie! Theoretically that will cover every supported media type that Adobe CS6 applications can save.

One little hack I already stuck in was to locate where Ollie formed the list of returned XMP keywords/tags and change the delimiter from a semicolon to a comma, which at least makes drag and drop input of those tags as a WP Media Library item easier.
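
The delimiter change itself is tiny. As an illustration only (this is Python, not Ollie’s PHP, and `retag` is a made-up name), the idea is to split the keyword string on semicolons, trim each tag, and rejoin with commas:

```python
def retag(keywords: str, old_sep: str = ";", new_sep: str = ", ") -> str:
    """Convert a semicolon-delimited keyword string to a comma-delimited one."""
    tags = [tag.strip() for tag in keywords.split(old_sep)]
    # Drop empty entries left behind by trailing or doubled separators.
    return new_sep.join(tag for tag in tags if tag)
```

For example, `retag("alpha; beta;gamma;")` gives `"alpha, beta, gamma"`, which pastes cleanly into a comma-separated tag box.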

It would really help me if I could just have those tags shoot right into the WP db automagically on import, but this at least gets me closer!

Thank you both!

Plugin Author David Lingren
(@dglingren)

12 years, 8 months ago

I have released version 1.50, which adds the ability to extract metadata from PDF documents.

Please let me know if you have any problems with or further questions about this new feature. Thanks again for your interest and for your suggestion.


The topic ‘PDF Metadata Mapping’ is closed to new replies.
