Viewing 8 replies - 1 through 8 (of 8 total)
  • Thread Starter kevinfraser

    (@kevinfraser)

    It occurred to me that WordPress may not be at the bleeding edge of PDF authoring, so I tried converting one of the files to the archivable (PDF/A) Acrobat 5, PDF 1.4 format. I verified the presence of the metadata with Acrobat and with the PDFtk command-line tool. Same result: none of the metadata in the PDF files is visible through the MLA plugin. Is there a way you know of to test the WordPress image meta function? That is probably the first place to look. Please let me know if you would like a test PDF file from me.

    Plugin Author David Lingren

    (@dglingren)

    Thank you for your interest in the plugin and for posting this question. As you suspected, WordPress isn’t up to speed on PDF authoring, and MLA hasn’t addressed this issue either.

    Neither WordPress nor MLA looks for metadata in anything other than image files. WordPress 3.6 will extend this to audio and video files, but I haven’t seen anything about applying this functionality to PDF files.

    This is a great idea for an MLA enhancement, and I am finishing up work on a new version. I will have a serious look at how to extract metadata from PDF files and make it available in MLA.

    I will keep you posted on my progress, adding comments to this topic. If you’d like to test a pre-release version, send me your e-mail address, using the “Contact Us” page at our web site:

    Fair Trade Judaica/Contact Us

    Thanks for your interest, a great suggestion and your patience.

    Thread Starter kevinfraser

    (@kevinfraser)

    David, thank you for your quick reply!

    I have also been trying another plugin called MMWW (http://ww.wp.xz.cn/plugins/mmww), which is apparently also being hampered by WordPress’s (in)ability to find all relevant XMP metadata in PDF files. I got a near-instant reply from MMWW’s developer this morning, sent him some test files, and he’s having a look at it. He has licensed the Zend_PDF module to redistribute with his plugin. I don’t know what your roadmap is, but if you haven’t seen it, maybe this will offer some clues and bring a solution closer and sooner: http://framework.zend.com/manual/1.12/en/zend.pdf.info.html

    Ollie is running into exactly the same problem – incomplete metadata retrieval from PDFs. Perhaps the two of you can share a solution to this problem?

    Thanks for your pre-release offer. I’ll drop you my email address via your contact link, and I’ll be happy to send you real-world test files too if you need them.

    OllieJones

    (@olliejones)

    Kevin contacted me with a similar problem retrieving XMP metadata from his PDF files.

    It turns out that some PDF files contain multiple XMP metadata stanzas. (Each metadata stanza is a well-formed chunk of XML.)

    The first one is pretty much a stub, but later ones contain the useful metadata. That accounts for the failure of MMWW (my plugin) to read his metadata correctly. My version 1.0.2 corrects it.

    Don’t hesitate to steal my open source code if you want it. It doesn’t use the Zend framework, but it does use PHP’s XML handling.
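
    For readers who want to see the multiple-stanza idea in concrete form, here is a minimal sketch in Python. It is not the MMWW code (which is PHP); the function names, the regular expression, and the choice of `dc:subject` as the keyword field are assumptions made for illustration. It also assumes the XMP packets are stored uncompressed and UTF-8 encoded in the file. The point is simply to look at every stanza instead of stopping at the first one, since the first may be a stub.

```python
import re
import xml.etree.ElementTree as ET

# Assumption: each XMP stanza is wrapped in an <x:xmpmeta> element.
XMP_PACKET = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def find_xmp_packets(data: bytes) -> list:
    """Return every XMP stanza found in the raw file bytes, in file order."""
    return XMP_PACKET.findall(data)


def extract_keywords(data: bytes) -> list:
    """Collect dc:subject keywords from all stanzas, not just the first."""
    keywords = []
    for packet in find_xmp_packets(data):
        try:
            root = ET.fromstring(packet)
        except ET.ParseError:
            # Skip a stanza that is not well-formed XML on its own.
            continue
        for li in root.iterfind(".//dc:subject/rdf:Bag/rdf:li", NS):
            text = (li.text or "").strip()
            if text and text not in keywords:
                keywords.append(text)
    return keywords
```

    A stub stanza with no `dc:subject` simply contributes nothing, so the keywords from a later, fuller stanza still come through.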

    Thread Starter kevinfraser

    (@kevinfraser)

    This is just cool. Thanks, Ollie! I can’t wait to see what David does with your solution, too!

    Plugin Author David Lingren

    (@dglingren)

    Ollie,

    I’ve begun looking into adding PDF metadata support, and this will be a great help. Thank you for your offer and your generosity!

    I will post my progress to this topic.

    Thread Starter kevinfraser

    (@kevinfraser)

    David, Ollie’s code to pull metadata strings from XMP in a PDF definitely works! I haven’t tested it on anything except PDFs yet, but from looking at it I suspect it will also pull metadata from anything containing XMP, including humongous media files, without causing server memory issues. Good approach to that, Ollie! Theoretically that will cover every supported media type that Adobe CS6 applications can save.
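
    The thread doesn’t show how Ollie’s code keeps memory use down, so the following is only a sketch of one common approach, written in Python rather than PHP: read the file in fixed-size chunks and keep just enough of the buffer to catch a marker that straddles a chunk boundary. The function name `iter_xmp_packets`, the markers, and the chunk size are all assumptions for illustration.

```python
import io

START = b"<x:xmpmeta"
END = b"</x:xmpmeta>"


def iter_xmp_packets(stream, chunk_size=64 * 1024):
    """Yield each XMP stanza from a binary stream without reading it all at once."""
    buf = b""
    while True:
        chunk = stream.read(chunk_size)
        buf += chunk
        while True:
            start = buf.find(START)
            if start == -1:
                # No stanza start yet: keep only a tail long enough to
                # hold a start marker split across two chunks.
                buf = buf[-(len(START) - 1):]
                break
            end = buf.find(END, start)
            if end == -1:
                # Stanza started but not finished: drop the bytes before it
                # and wait for more data.
                buf = buf[start:]
                break
            end += len(END)
            yield buf[start:end]
            buf = buf[end:]
        if not chunk:
            return


# Usage with an in-memory stream; a real caller would pass open(path, "rb").
sample = io.BytesIO(b"header<x:xmpmeta>demo</x:xmpmeta>trailer")
packets = list(iter_xmp_packets(sample, chunk_size=8))
```

    The buffer only grows while a stanza is open, so memory use tracks the size of the largest stanza rather than the size of the media file. A file with an unterminated stanza would still buffer everything after the start marker, which a production version would want to cap.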

    One little hack I already stuck in was to locate where Ollie formed the list of returned XMP keywords/tags and change the delimiter from a semicolon to a comma, which at least makes drag and drop input of those tags as a WP Media Library item easier.
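
    The delimiter change itself is tiny. As a rough illustration (in Python, with a hypothetical helper name, not the actual line from Ollie’s plugin), converting a semicolon-separated keyword string into the comma-separated form amounts to:

```python
def keywords_to_tag_string(raw, source_delim=";", target_delim=", "):
    """Convert 'a; b;c' style keyword strings into 'a, b, c'."""
    parts = [part.strip() for part in raw.split(source_delim)]
    # Drop empty entries left by trailing or doubled delimiters.
    return target_delim.join(part for part in parts if part)
```

    Stripping whitespace and skipping empty entries avoids blank tags when the source string ends with a stray semicolon.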

    It would really help me if I could just have those tags shoot right into the WP db automagically on import, but this at least gets me closer!

    Thank you both!

    Plugin Author David Lingren

    (@dglingren)

    I have released version 1.50, which adds the ability to extract metadata from PDF documents.

    Please let me know if you have any problems with or further questions about this new feature. Thanks again for your interest and for your suggestion.


The topic ‘PDF Metadata Mapping’ is closed to new replies.