img are processed multiple times

Resolved Nickness
(@nickness)

11 years, 1 month ago

Hi,

In some cases, Lazy Load XT processes img tags multiple times. I noticed this issue with a post grid component that I use (Beaver Builder).
This happens because the post grid uses the_post_thumbnail(), which fires the post_thumbnail_html filter, so its img tags are processed by Lazy Load XT. Then, when the the_content filter runs, Lazy Load XT processes the post grid's img tags a second time.

To avoid this, I slightly modified the regex you use to grab img tags.

So I changed this

```
preg_match_all('/<'.$tag.'[\s\r\n]+.*?(\/|\/'.$tag.')>/is',$content,$matches);
```

to this

```
preg_match_all('/<'.$tag.'[\s\r\n]([^<]+)(\/|\/'.$tag.')>(?!<noscript>|<\/noscript>)/is',$content,$matches);
```
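To make the double processing concrete, here is a minimal PHP sketch (not the plugin's actual code). It assumes the first pass, via post_thumbnail_html, left a placeholder img followed by a <noscript> fallback; the markup is a shortened, hypothetical stand-in for that output. It counts the img tags each pattern would pick up on the second pass, through the_content:

```php
<?php
// Sketch: count the img tags each pattern would pick up in markup that has
// already been lazy-loaded once. The markup is a shortened, hypothetical
// stand-in for the plugin's output (placeholder img + <noscript> fallback).
$tag = 'img';

// Pattern before the change: tag end is "/>" or "/img>".
$original = '/<'.$tag.'[\s\r\n]+.*?(\/|\/'.$tag.')>/is';
// Suggested pattern: same tag end, but not when followed by <noscript> or </noscript>.
$suggested = '/<'.$tag.'[\s\r\n]([^<]+)(\/|\/'.$tag.')>(?!<noscript>|<\/noscript>)/is';

$alreadyProcessed = '<img src="placeholder.gif" data-src="photo.jpg" alt="" />'
    . '<noscript><img src="photo.jpg" alt="" /></noscript>';

$originalCount  = preg_match_all($original, $alreadyProcessed, $originalMatches);
$suggestedCount = preg_match_all($suggested, $alreadyProcessed, $suggestedMatches);

echo "original:  $originalCount match(es)\n";  // placeholder and fallback again
echo "suggested: $suggestedCount match(es)\n"; // both are skipped
```

With this input the original pattern finds 2 tags (the placeholder and the fallback inside <noscript>), so both would be processed again, while the suggested pattern finds none.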

Hope this helps, and thanks for this great lazy loading implementation.

Best regards

https://ww.wp.xz.cn/plugins/lazy-load-xt/

Viewing 10 replies - 1 through 10 (of 10 total)

Plugin Author dbhynds
(@dbhynds)

11 years, 1 month ago

Interesting. So does your theme create some HTML that includes get_the_post_thumbnail(), then run apply_filters('the_content', $that_html)?

Either way, I’m making some revisions to the regex for the next version, so I’ll include your (?!<noscript>|<\/noscript>) bit.

I’ll be releasing the next version shortly, so feel free to update to it when you see it come through.

Thread Starter Nickness
(@nickness)

11 years, 1 month ago

Interesting. So does your theme create some HTML that includes get_the_post_thumbnail(), then run apply_filters('the_content', $that_html)?

Yes, exactly, but it’s not the theme that includes get_the_post_thumbnail(). It’s the Beaver Builder plugin, a drag-and-drop front-end editor that lets you drop a post grid module anywhere on a page. I don’t know how other page builder plugins behave, but I suspect they do the same.

I’ll be checking your next version to see how it plays with Beaver Builder.

Regards

Plugin Author dbhynds
(@dbhynds)

11 years, 1 month ago

I have a development version of the plugin here that incorporates the regex change you suggested. I’ll be releasing the next version soon.

Thread Starter Nickness
(@nickness)

11 years, 1 month ago

I checked out revision 1135603. Is this the right one?

There is a problem with the regex, and I think you should change this

```
preg_match_all('/<'.$tag.'[\s\r\n]+.*?'.$tag_end.'>(?!<noscript>|<\/noscript>)/is',$content,$matches);
```

to this

```
preg_match_all('/<'.$tag.'[\s\r\n]([^<]+)'.$tag_end.'>(?!<noscript>|<\/noscript>)/is',$content,$matches);
```

By replacing +.*? with ([^<]+), we match everything except a "<" (the start of another tag), which becomes necessary once the (?!<noscript>|<\/noscript>) lookahead is added.

To see what happens, you can try both regexes against the following HTML:
```
<img width="155" height="300" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-src="http://local.wordpress.dev/wp-content/uploads/2013/03/featured-image-vertical-155x300.jpg" class="attachment-medium wp-post-image" alt="Horizontal Featured Image" /><noscript><img width="155" height="300" src="http://local.wordpress.dev/wp-content/uploads/2013/03/featured-image-vertical-155x300.jpg" class="attachment-medium wp-post-image" alt="Horizontal Featured Image" /></noscript>
<img class="fl-photo-img" src="http://local.wordpress.dev/wp-content/uploads/2015/04/5527cc7c08d02_input_1.jpg" alt="5527cc7c08d02_input_1" itemprop="image" />
```
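A small PHP sketch of that comparison, assuming $tag_end is '\/' for img (so both patterns end in "/>", as in the expanded regex quoted later in this thread) and with the sample URLs shortened:

```php
<?php
// Sketch: revision 1135603's pattern vs. the proposed one, both with /is.
// Assumption: $tag_end is '\/' for img, so both patterns end in "/>".
$tag = 'img';
$tag_end = '\/';

$current  = '/<'.$tag.'[\s\r\n]+.*?'.$tag_end.'>(?!<noscript>|<\/noscript>)/is';
$proposed = '/<'.$tag.'[\s\r\n]([^<]+)'.$tag_end.'>(?!<noscript>|<\/noscript>)/is';

// Shortened version of the sample HTML: a processed image (placeholder plus
// <noscript> fallback), a newline, then an unprocessed fl-photo-img tag.
$html = '<img width="155" height="300" src="placeholder.gif" data-src="featured.jpg" alt="Featured" />'
    . '<noscript><img width="155" height="300" src="featured.jpg" alt="Featured" /></noscript>' . "\n"
    . '<img class="fl-photo-img" src="input_1.jpg" alt="input_1" itemprop="image" />';

preg_match_all($current, $html, $currentMatches);
preg_match_all($proposed, $html, $proposedMatches);

// The lazy .*? skips both "/>" that are followed by <noscript> or </noscript>,
// so its one match runs from the first <img to the end of the fl-photo-img tag.
echo "current:  ", count($currentMatches[0]), " match, ",
    strlen($currentMatches[0][0]), " of ", strlen($html), " chars\n";
// [^<]+ cannot cross a "<", so only the unprocessed fl-photo-img tag matches.
echo "proposed: ", $proposedMatches[0][0], "\n";
```

With the s modifier, the current pattern returns a single match covering all three img tags, while the proposed one returns only the fl-photo-img tag.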
Plugin Author dbhynds
(@dbhynds)

11 years, 1 month ago

1135603 is correct.

I understand the basics of regex, but I’m no pro. I tested both expressions against the HTML you provided, and they both correctly matched the second img but not the first.

I recognize that they both work, so I don’t understand the purpose of changing +.*? to ([^<]+). Can you explain it?

Does ([^<]+) begin here? …
<img class="fl-photo-img" src="http://local.wordpress.dev/wp-content/uploads/2015/04/5527cc7c08d02_input_1.jpg" alt="5527cc7c08d02_input_1" itemprop="image" />

Whereas +.*? begins here? …
<img class="fl-photo-img" src="http://local.wordpress.dev/wp-content/uploads/2015/04/5527cc7c08d02_input_1.jpg" alt="5527cc7c08d02_input_1" itemprop="image" />

Plugin Author dbhynds
(@dbhynds)

11 years, 1 month ago

(Look for the bold characters in those img tags. They’re kinda hard to see.)
Thread Starter Nickness
(@nickness)

11 years, 1 month ago

I’m not a regex pro either, so we speak the same language.
Did you use the s modifier when you tested both regexes?

My understanding is that with the s modifier turned on, the first regex will match the whole string:
```
<img width="155" height="300" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-src="http://local.wordpress.dev/wp-content/uploads/2013/03/featured-image-vertical-155x300.jpg" class="attachment-medium wp-post-image" alt="Horizontal Featured Image" /><noscript><img width="155" height="300" src="http://local.wordpress.dev/wp-content/uploads/2013/03/featured-image-vertical-155x300.jpg" class="attachment-medium wp-post-image" alt="Horizontal Featured Image" /></noscript>
<img class="fl-photo-img" src="http://local.wordpress.dev/wp-content/uploads/2015/04/5527cc7c08d02_input_1.jpg" alt="5527cc7c08d02_input_1" itemprop="image" />
```
Indeed, if we split the regex <img[\s\r\n]+.*?\/>(?!<noscript>|<\/noscript>) into its parts:

<img[\s\r\n]+

matches the opening img tag followed by one or more whitespace characters (spaces, carriage returns or newlines).

.*?

matches any character (newlines included when the s modifier is on), zero or more times, taking as few as possible.

\/>(?!<noscript>|<\/noscript>)

matches the end of the tag, but only if it is not followed by <noscript> or </noscript>.

So the match will not end after the first img:

<img width="155" height="300" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-src="http://local.wordpress.dev/wp-content/uploads/2013/03/featured-image-vertical-155x300.jpg" class="attachment-medium wp-post-image" alt="Horizontal Featured Image" /><noscript>

because the end of that img tag is followed by <noscript>.
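The lookahead part can be tried on its own. This hypothetical PHP snippet runs only the tail of the pattern against a few made-up strings:

```php
<?php
// Sketch: only the tail of the pattern, i.e. "/>" plus the negative lookahead.
// A "/>" counts as a tag end only if it is NOT immediately followed by
// <noscript> or </noscript>. Sample strings are hypothetical.
$tail = '/\/>(?!<noscript>|<\/noscript>)/';

// subject => expected result of preg_match() (1 = found, 0 = not found)
$cases = [
    '<img src="a.jpg" />'            => 1, // plain tag end
    '<img src="a.jpg" /><noscript>'  => 0, // placeholder followed by its fallback
    '<img src="a.jpg" /></noscript>' => 0, // the fallback img inside <noscript>
    '<img src="a.jpg" /><p>text</p>' => 1, // any other following tag is fine
];

$results = [];
foreach ($cases as $subject => $expected) {
    $results[$subject] = preg_match($tail, $subject);
    printf("%-32s found=%d expected=%d\n", $subject, $results[$subject], $expected);
}
```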

However, it will match the whole string: starting at the first opening img tag, .*? takes everything (without any restriction) up to the first tag end that is not followed by <noscript> or </noscript>, which in our example is the end of the second img (the fl-photo-img one):
```
<img width="155" height="300" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" data-src="http://local.wordpress.dev/wp-content/uploads/2013/03/featured-image-vertical-155x300.jpg" class="attachment-medium wp-post-image" alt="Horizontal Featured Image" /><noscript><img width="155" height="300" src="http://local.wordpress.dev/wp-content/uploads/2013/03/featured-image-vertical-155x300.jpg" class="attachment-medium wp-post-image" alt="Horizontal Featured Image" /></noscript>
<img class="fl-photo-img" src="http://local.wordpress.dev/wp-content/uploads/2015/04/5527cc7c08d02_input_1.jpg" alt="5527cc7c08d02_input_1" itemprop="image" />
```
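This would also explain the test result above: without the s modifier, . stops at a newline, so no match can start on the first line. A PHP sketch with the expanded pattern and a shortened, hypothetical version of the sample HTML:

```php
<?php
// Sketch: the expanded revision 1135603 pattern with and without the s modifier.
// The sample mirrors the thread's HTML with shortened, hypothetical URLs.
$withoutS = '/<img[\s\r\n]+.*?\/>(?!<noscript>|<\/noscript>)/i';
$withS    = '/<img[\s\r\n]+.*?\/>(?!<noscript>|<\/noscript>)/is';

$html = '<img src="placeholder.gif" data-src="featured.jpg" alt="" />'
    . '<noscript><img src="featured.jpg" alt="" /></noscript>' . "\n"
    . '<img class="fl-photo-img" src="input_1.jpg" alt="" />';

preg_match_all($withoutS, $html, $a);
preg_match_all($withS, $html, $b);

// Without s, ".*?" cannot cross the newline, so nothing on line 1 can match:
// only the fl-photo-img tag on line 2 is found.
echo "without s:\n", implode("\n", $a[0]), "\n\n";
// With s, the single match starts at the first <img and ends with fl-photo-img.
echo "with s:\n", implode("\n", $b[0]), "\n";
```

Without s only the fl-photo-img tag is found; with s the single match is the whole sample.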
Now if we replace .*? with ([^<]+), we still match everything except the "<" that opens a new tag, which prevents the regex from spreading across multiple tags.

So the resulting regex should be:

<img[\s\r\n]+([^<]+)\/>(?!<noscript>|<\/noscript>)

With this you can even remove the s modifier, as a negated character class always matches newline characters (see http://php.net/manual/en/reference.pcre.pattern.modifiers.php).

You’ll also notice that I added back the + after [\s\r\n] that was missing in my previous version.
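The final pattern could be wrapped like this. This is only a sketch: find_unprocessed_tags() is a hypothetical helper (not a plugin function), the i modifier is kept from the plugin's /is, and s is dropped as suggested above:

```php
<?php
// Sketch of the final pattern wrapped in a helper. find_unprocessed_tags() is a
// hypothetical name, not a function of the plugin. No s modifier is needed:
// the negated class [^<] also matches newlines. (\s already covers \r and \n,
// so [\s\r\n] behaves like \s; it is kept as written in the thread.)
function find_unprocessed_tags(string $tag, string $content): array
{
    $tag = preg_quote($tag, '/');
    $pattern = '/<'.$tag.'[\s\r\n]+([^<]+)\/>(?!<noscript>|<\/noscript>)/i';
    preg_match_all($pattern, $content, $matches);
    return $matches[0]; // full text of each matched tag
}

// An unprocessed img whose attributes span several lines, followed by an
// already-processed one (placeholder + <noscript> fallback).
$content = "<img\n  class=\"fl-photo-img\"\n  src=\"input_1.jpg\" />\n"
    . '<img src="placeholder.gif" data-src="featured.jpg" alt="" />'
    . '<noscript><img src="featured.jpg" alt="" /></noscript>';

$found = find_unprocessed_tags('img', $content);
echo count($found), " tag(s) to process:\n", implode("\n", $found), "\n";
```

Here the multi-line fl-photo-img tag is the only match; the placeholder and its <noscript> fallback are skipped.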

What do you think?
Plugin Author dbhynds
(@dbhynds)

11 years, 1 month ago

Ahah. So my original regex would match this:

<img src="" /><noscript></noscript>Blah blah<br />

because it looks for a “/>”?

Whereas yours looks until a “>” and then checks for the <noscript>?

I think that makes sense. And yeah, that would be a good amendment to the code. I’ll incorporate it and release a new version this weekend.

Thanks for your help!

Thread Starter Nickness
(@nickness)

11 years, 1 month ago

Yes, exactly: the original regex would match all of this.

The version I proposed just doesn’t match this:

<img src="" /><noscript></noscript>

because “<img…” is followed by “<noscript…”.
Nor does it match this:

<img src="" /><noscript></noscript>Blah blah<br />

because there is a "<" between "<img..." and "<br />". In fact, the match breaks at the "<" of "<noscript>".
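Both cases can be checked with a short PHP sketch using the expanded patterns from this thread (with the /is modifiers kept):

```php
<?php
// Sketch: the two strings from this exchange against the expanded patterns
// (revision 1135603 vs. the proposed one), both with the /is modifiers.
$original = '/<img[\s\r\n]+.*?\/>(?!<noscript>|<\/noscript>)/is';
$proposed = '/<img[\s\r\n]+([^<]+)\/>(?!<noscript>|<\/noscript>)/is';

$subjects = [
    '<img src="" /><noscript></noscript>',
    '<img src="" /><noscript></noscript>Blah blah<br />',
];

// For each pattern, record the matched text per subject (null = no match).
$results = [];
foreach (['original' => $original, 'proposed' => $proposed] as $name => $pattern) {
    foreach ($subjects as $subject) {
        $results[$name][] = preg_match($pattern, $subject, $m) ? $m[0] : null;
    }
}
var_export($results);
```

The original pattern matches nothing in the first string but the entire second string, from <img to <br />; the proposed pattern matches neither.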

Glad I could help 🙂

Plugin Author dbhynds
(@dbhynds)

11 years, 1 month ago

Just released 0.4 with this implemented in it. Thanks for your help! I sincerely appreciate it.


The topic ‘img are processed multiple times’ is closed to new replies.