Good to hear from you again, and thanks for your report. Thanks as well for posting the complete text of the PHP Warning message; very helpful.
I tried the link you posted but got a 404 Page Not Found result.
Line 4411 in the MLA code is part of the logic for decoding the metadata in the PDF document. It looks like one or more of your eight documents has metadata that is damaged or simply in a format MLA doesn’t handle correctly. If you can fix your link to give me access to the documents, I can try them out on my system and investigate the problem.
Thanks for any additional information you can provide.
Sorry, as it’s a debug utility I had set it to Private. It is now published at http://godshillnewforest.net/mla-pdf-debug.
Looking at the error items, there are 5. Looking at the PDFs, there are 5 that were all produced by the Epson scan. Since both counts are 5, maybe it is the Epson Scan that produces odd PDFs? I have left all the PDFs on the website.
I must admit I do need a good PDF metadata editing application, so suggestions are welcome.
Thanks for making the debug page available and for your excellent detective work. I downloaded and tested the five Epson scan documents and they were the cause of the problem.
All of those documents have binary zeroes in the CreateDate and ModifyDate fields. That caused the XMP parser to fail, leaving bad data in the array that MLA uses to access the data. The bad data caused the PHP Warning message.
I have updated the code to discard damaged data and avoid the message. I also added a fix to remove the binary zeroes so the rest of the data can be accessed without errors.
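For anyone curious how this kind of clean-up can work, here is a minimal Python sketch of the idea; it is not MLA’s actual PHP code. It assumes the XMP packet has already been pulled out of the PDF as raw bytes, and the function name `parse_xmp_packet` and the sample packet are hypothetical. Because the NUL character is not allowed anywhere in XML 1.0, the sketch strips binary zeroes before parsing and returns None (discarding the data) if the packet is still malformed.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def parse_xmp_packet(raw: bytes) -> Optional[ET.Element]:
    """Hypothetical helper: parse an XMP packet, tolerating embedded NUL bytes."""
    # XML 1.0 forbids the NUL character, so a packet with binary zeroes in
    # fields like xmp:CreateDate is rejected outright by a conforming parser.
    cleaned = raw.replace(b"\x00", b"")
    try:
        return ET.fromstring(cleaned)
    except ET.ParseError:
        # Still damaged after clean-up: discard it rather than keep partial data.
        return None


# Hypothetical sample modeled on the report: zeroed-out date fields.
SAMPLE = (
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description xmlns:xmp="http://ns.adobe.com/xap/1.0/"'
    b' xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b"<xmp:CreateDate>\x00\x00\x00\x00</xmp:CreateDate>"
    b"<xmp:ModifyDate>\x00\x00\x00\x00</xmp:ModifyDate>"
    b"<dc:format>application/pdf</dc:format>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
)

root = parse_xmp_packet(SAMPLE)
if root is not None:
    fmt = root.find(".//{http://purl.org/dc/elements/1.1/}format")
    print(fmt.text)  # application/pdf
```

The sketch cleans first and discards only as a fallback, which matches the two fixes described above: remove the zeroes so the rest of the data stays usable, and drop anything that is still damaged.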
I also tried opening the documents in Adobe Acrobat and Adobe Reader DC and then saving them with a different file name. The saved files do not have the binary zeroes that caused the problem. If you use either program and open the File menu “Properties…” window you can edit the metadata easily.
I have uploaded a new MLA Development Version dated 20151102 that contains the fixes. You can find step-by-step instructions for using the Development Version in this earlier topic:
MLA errors when using plugin
If you get a chance to try the Development Version please let me know how it works for you. I will leave this topic unresolved until the fixes go out in the next MLA version. Thanks for helping me find and fix this issue.
David,
Hi, I just uploaded the Development Version, and the debug-pdf page http://godshillnewforest.net/mla-pdf-debug works just fine.
Many thanks for a speedy fix.
Paul
PS: Have you ever considered doing for Posts what MLA does for media?
PPS: On a wider topic, it is a shame that the ww.wp.xz.cn does not readily allow us users to provide links showing others what MLA can do.
David, there’s another possible rogue PDF, 2015-04-00-Annual-Parish-Meeting-Agendav2.pdf, which displays this info on the debug page:
array ( 'PDF_Version' => 'PDF-1.6', 'PDF_VersionNumber' => '1.6', 'xmptk' => 'Adobe XMP Core 5.2-c001 63.139439, 2010/09/27-13:37:26 ', 'DocumentID' => 'uuid:5bebe852-b1e9-468a-913b-3593adcb788c', 'InstanceID' => 'uuid:edb49045-190f-4243-962c-b2379d7cec45', 'xmlns' => '(ARRAY)', 'xmpMM' => '(ARRAY)', )
However, whenever we look at it on the desktop using Adobe Reader, Cute PDF, FoxIt Reader, or PDF-Info, it shows a valid title and author details.
Are the details there, or are they just not being read by MLA?
After a lot of to-ing and fro-ing with almost all the online and desktop PDF editing tools available, it seems that only the online https://www.cutepdf-editor.com/edit.asp consistently produces readable files with updated metadata (see v5-2 and v6); the others do not.
As a slightly more controlled test, I downloaded 2015-04-00-Annual-Parish-Meeting-Agendav4.pdf, loaded it into CutePDF-editor and ONLY changed the layout from default to single page; I did not change the metadata at all. I saved the file and uploaded it as 2015-04-00-Annual-Parish-Meeting-Agendav4-1.pdf, which now displays the metadata, including the correct producer it contained all along…
I have left the PDFs on the website for you!
Thanks for your updates and for the new and interesting document samples. Thanks as well for all the testing you’ve done on different editing tools.
I downloaded and tested all eight 2015-04-00-Annual-Parish-Meeting-Agenda ... documents. They were very useful and allowed me to find and fix a couple of additional issues with the PDF and XMP metadata processing. Thank you!
I have uploaded a new Development Version dated 20151103 that will give you much better results with all of the example documents and probably others as well. Please let me know if you find any other documents that give you bad or incomplete results. The metadata processing is very data-dependent and I am always interested in new examples to work with.
David,
Hi, many thanks; the latest dev version displays loads of details from the PDFs. A quick question: which title is used, ‘Title’ or ‘title’?
Paul
Thanks for trying the latest Development Version; I am happy to hear it’s working for you.
All of the PDF and XMP metadata field names are case-sensitive, so “Title” and “title” are different fields. “Title” is the more reliable choice. Older PDF documents populate it directly, and MLA will populate it from other XMP fields as needed, for example from the “dc.title” field.
MLA also copies values from namespaces like “dc” up to the root level for easier access. That’s where “title” comes from.
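To make the case-sensitivity point concrete, here is a small Python sketch of the two rules above, assuming the parsed metadata is a plain dictionary with one nested dictionary per namespace prefix. The function `promote_namespace_fields`, the namespace list, and the “don’t overwrite an existing root key” policy are hypothetical choices for illustration; MLA’s actual rules are the ones in its Documentation tab. For simplicity, “dc.title” is treated as a single string, although in real XMP it is a language-alternative array.

```python
from typing import Any, Dict, Iterable


def promote_namespace_fields(
    meta: Dict[str, Any], namespaces: Iterable[str] = ("dc", "xmp", "pdf")
) -> Dict[str, Any]:
    """Hypothetical sketch: copy namespaced values to the root level."""
    result = dict(meta)
    for ns in namespaces:
        fields = meta.get(ns)
        if not isinstance(fields, dict):
            continue
        for name, value in fields.items():
            # Names stay case-sensitive: dc.title becomes "title", never "Title".
            # Assumption: an existing root-level value is not overwritten.
            result.setdefault(name, value)

    # Fill the PDF-style "Title" from dc.title only when it is not already set.
    dc = meta.get("dc")
    if "Title" not in result and isinstance(dc, dict) and "title" in dc:
        result["Title"] = dc["title"]
    return result


newer = promote_namespace_fields({"dc": {"title": "Council Agenda", "creator": "Paul"}})
print(newer["Title"], "|", newer["title"])  # Council Agenda | Council Agenda

older = promote_namespace_fields({"Title": "Info title", "dc": {"title": "XMP title"}})
print(older["Title"], "|", older["title"])  # Info title | XMP title
```

In the second example “Title” and “title” end up with different values, which is why mapping from the more reliable “Title” field avoids surprises.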
All of these rules and enhancements are further explained in the “Field-level metadata in PDF documents” section of the Settings/Media Library Assistant Documentation tab.
David,
Hi, in checking the PDFs I found another one that does not display the details: 2014-02-11-Council-Agenda.pdf.
After a quick update to the subject in https://www.cutepdf-editor.com/edit.asp, it now displays all the details: 2014-02-11-Council-Agenda-1.pdf.
I have left both of them on the site for your perusal.
Paul
Thanks once again for your tireless efforts to find PDF documents with unusual metadata format/content. To be fair, the problem I found in your latest example also occurs in a few of my older test documents; I just never noticed it.
I have uploaded a new Development Version dated 20151104 that will give you better results with the 2014-02-11-Council-Agenda.pdf document. As always, let me know if you find other examples that MLA doesn’t handle; I really appreciate your efforts!
I have released MLA version 2.20, which contains all the fixes we worked on for PDF parsing.
I am marking this topic resolved, but please update it if you find any other PDF metadata issues. Thanks for working with me on these improvements.