strlen bug with RTL languages?
-
I was playing around with a plugin for an RTL language website. If I use RTL text in the glossary term, it messes up the page quite badly. I figured it relates to the length of the glossary term, and from there to the `gt_get_len` method. I saw that you check whether the text is RTL and then apply `mb_strlen` without an encoding. I’m not a PHP or Unicode expert, but I think `strlen` should do the job without having to check for RTL? E.g. see https://stackoverflow.com/a/12046233
-
Hi,
`strlen` doesn’t do the job.
`mb_strlen` without the encoding parameter uses the encoding configured on the server: https://www.php.net/manual/en/function.mb-strlen.php
The difference between the two is:
* `strlen` works perfectly with ASCII or Latin encodings
* `mb_strlen` works with emoji, Hebrew, kanji, Arabic and other languages (also mixed) where a single printed symbol is in reality stored as several “hidden” bytes
As RTL languages always require a check for multibyte symbols, we enforce it in the code to avoid issues.
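To make the difference concrete, here is a minimal sketch using the Hebrew sample term quoted in this thread. It assumes the file is saved as UTF-8 and that the mbstring extension is loaded; the variable names are only for illustration.

```php
<?php
// Hebrew sample term from this thread: 22 letters, each 2 bytes in UTF-8.
$hebrew = 'אבגדהוזחטיכלמנסעפצקרשת';
$ascii  = 'ATerm';

// strlen() counts bytes, whatever the encoding.
echo strlen($ascii), "\n";                    // 5
echo strlen($hebrew), "\n";                   // 44

// mb_strlen() counts characters in the encoding it is told to use.
echo mb_strlen($ascii, 'UTF-8'), "\n";        // 5
echo mb_strlen($hebrew, 'UTF-8'), "\n";       // 22

// With a single-byte encoding, mb_strlen() falls back to a byte count.
echo mb_strlen($hebrew, 'ISO-8859-1'), "\n";  // 44

// Without the second argument, the result depends on this setting.
echo mb_internal_encoding(), "\n";
```

Passing the encoding explicitly, as above, removes the dependency on how a given server is configured.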
To help you debug your issue I need a page where I can investigate what is happening, as we have automated tests for the languages mentioned above, for example.
I tested with Hebrew and it’s broken. Changing `gt_get_len` to simply `return strlen($stringtomatch)` seems to fix it and works with English or Hebrew glossary terms.
The site I’m working on isn’t live, but I tested with a glossary term containing Hebrew, e.g. אבגדהוזחטיכלמנסעפצקרשת, and then two terms on a page, the first one pointing to the Hebrew one and the second to another term (Hebrew/English), and the HTML gets broken as a result.
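One way a length mismatch could break the output, sketched under assumptions: if a position counted in characters is passed to a byte-based function (or the other way round), every multibyte character before the match shifts the replacement. The code below is a hypothetical illustration, not the plugin’s actual `gt_get_len` or `replace_with_utf_8` code, and the `$text`, `$term` and `$link` values are made up.

```php
<?php
// Hypothetical illustration only; this is NOT the plugin's code.
$text = 'אבגדהוזחטיכלמנסעפצקרשת <p>Also check with the BTerm</p>';
$term = 'BTerm';
$link = '<a>BTerm</a>';

// Same match, measured in two different units.
$charOffset = mb_strpos($text, $term, 0, 'UTF-8'); // characters
$byteOffset = strpos($text, $term);                // bytes

// Consistent units: byte offset and byte length with substr_replace().
$good = substr_replace($text, $link, $byteOffset, strlen($term));

// Mixed units: a character offset fed to the byte-based substr_replace().
$bad = substr_replace($text, $link, $charOffset, strlen($term));

echo $byteOffset - $charOffset, "\n"; // 22
echo $good, "\n";
echo $bad, "\n";
```

Here the 22 Hebrew letters take 44 bytes, so the two offsets differ by 22 and the mixed-unit replacement lands 22 bytes too early, overwriting `p>Als` instead of `BTerm`.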
We have other customers using the plugin with the code as it is with Hebrew and we don’t have the issue you are reporting.
I need a page with the issue to investigate, because the reasons why it isn’t working can be various, from plugin conflicts to a server configured differently.
I can share the HTML output for now? Once you look at the HTML it’s quite clear that the bug is related to the length calculation. The HTML gets garbled based on the length of the term excerpt. Maybe it’s something on my system, but I’m not sure what it is. Changing to use `strlen` seems to work on my system with different languages. I can also check with other Unicode characters.
In general, I’m curious why the `replace_with_utf_8` function needs length at all. Can this be simplified by using a template instead?
-
This reply was modified 2 years, 2 months ago by
yoav.aner.
There are various reasons why it needs the length: for example, the calculation is different with mixed encodings, with broken characters, with complex HTML, and in many other cases.
I can try with the content (I also need a list of all the words found), but it isn’t enough to check the issue.
As I said, I’m not that familiar with PHP, but I imagine text substitution is more-or-less a solved problem. I imagine using some kind of templating?
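As a rough sketch of what substitution without manual length bookkeeping could look like, the example below uses `preg_replace_callback` with the `/u` modifier, so the regex engine tracks positions itself. Everything here is an assumption for illustration: the `link_terms` function, the `$glossary` array and the output markup are hypothetical and are not taken from the plugin. It also only handles plain text; real HTML would need extra care so that matches inside tags or attributes are skipped.

```php
<?php
// Hypothetical glossary (term => tooltip text); not the plugin's data model.
$glossary = [
    'ATerm' => 'אבגדהוזחטיכלמנסעפצקרשת',
    'BTerm' => 'This is Term B',
];

// Hypothetical helper: wraps every glossary term found in plain text.
function link_terms(string $text, array $glossary): string
{
    // Build one alternation of all terms, escaped for use inside a regex.
    $alternation = implode('|', array_map(
        fn (string $term): string => preg_quote($term, '/'),
        array_keys($glossary)
    ));

    // The /u modifier makes PCRE treat pattern and subject as UTF-8.
    return preg_replace_callback(
        '/(' . $alternation . ')/u',
        fn (array $m): string => sprintf(
            '<span title="%s">%s</span>',
            htmlspecialchars($glossary[$m[1]], ENT_QUOTES, 'UTF-8'),
            $m[1]
        ),
        $text
    );
}

echo link_terms('Check ATerm, then BTerm.', $glossary), "\n";
```

Because no offsets or lengths are computed by hand, the Hebrew tooltip text cannot shift where the second term is replaced.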
If I provide access to a page that is broken, what can you test more than looking at the HTML? or would you need admin/backend access?
I just tested with Kanji and it seems to work ok, but not with Hebrew. The website is set to Hebrew/RTL.
I would probably need access, but that kind of support is offered only to premium customers, so I need you to share a link here.
Not a Premium customer, but it looks like a bug to me, and I have a feeling the implementation can be simplified. Thanks so much for responding quickly, Daniele. Happy to share some HTML snippets if it helps.
I hope that it can be simplified, but after years this was the only solution that worked for all the languages we have met.
You can share the HTML snippet anyway and I can do some tests.
<div class="entry-content wp-block-post-content has-global-padding is-layout-constrained wp-block-post-content-is-layout-constrained"><p>Check if you can do <span class="glossary-tooltip glossary-term-299" tabindex="0"><span class="glossary-link"><a href="https://haideberlin.com/glossary/aterm/" target="_blank" class="glossary-only-link">ATerm</a></span><span class="hidden glossary-tooltip-content clearfix"><span class="glossary-tooltip-text">אבגדהוזחטיכלמנסעפצקרשת <a href="https://haideberlin.com/glossary/aterm/">More</a></span></span></span> with the apartment</p> <<span class="glossary-tooltip glossary-term-301" tabindex="0"><span class="glossary-link"><a href="https://haideberlin.com/glossary/bterm/" target="_blank" class="glossary-only-link">BTerm</a></span><span class="hidden glossary-tooltip-content clearfix"><span class="glossary-tooltip-text">This is Term B <a href="https://haideberlin.com/glossary/bterm/">More</a></span></span></span>o check with the BTerm about facilities</p></div>
<div class="entry-content wp-block-post-content has-global-padding is-layout-constrained wp-block-post-content-is-layout-constrained"><p>Check if you can do <span class="glossary-tooltip glossary-term-299" tabindex="0"><span class="glossary-link"><a href="https://haideberlin.com/glossary/aterm/" target="_blank" class="glossary-only-link">ATerm</a></span><span class="hidden glossary-tooltip-content clearfix"><span class="glossary-tooltip-text">This is A term <a href="https://haideberlin.com/glossary/aterm/">More</a></span></span></span> with the apartment</p> <p>Also check with the <span class="glossary-tooltip glossary-term-301" tabindex="0"><span class="glossary-link"><a href="https://haideberlin.com/glossary/bterm/" target="_blank" class="glossary-only-link">BTerm</a></span><span class="hidden glossary-tooltip-content clearfix"><span class="glossary-tooltip-text">This is Term B <a href="https://haideberlin.com/glossary/bterm/">More</a></span></span></span> about facilities</p></div>
So in this case we have the Gutenberg editor with blocks with mixed content, but HTML like this is not helpful for me.
I need the clean text and the words that mismatch, so I can test them in our test suite.
Can you point me to the test suite? If it’s easy enough to set up a test environment and run it, I can try to see if I can reproduce it there.
We don’t share the tests, to keep the plugin deploy simple and to avoid people asking for support with them (also because they contain code for the pro version).
Closing after 2 weeks.
The topic ‘strlen bug with RTL languages?’ is closed to new replies.