• Resolved mlanners

    (@mlanners)


    Hello,

    On a site that we have successfully converted to static with SimplyStatic in the past, we have recently run into a problem with accented characters (umlauts and the like) that are encoded as two-byte UTF-8 sequences in the dynamic page. In the generated static page, each such character becomes a sequence of two HTML entities.

    I cannot readily give access to the dynamic page or the static page with the error, but I can show some code. Dynamic page:

    <head> <meta charset="UTF-8">
    ...
    <a href="#" class="elementor-item elementor-item-anchor">Kommunizéieren mat Luxchat</a>
    ...

    This “é” is encoded as UTF-8 in two bytes, 0xC3 0xA9. Firefox's page source displays it correctly. Previous versions of SimplyStatic left it untouched, but I can’t name the version we used in the past.

    The current version transforms the above to this:

    <head> <meta charset="UTF-8">
    ...
    <a href="#" class="elementor-item elementor-item-anchor">Kommuniz&Atilde;&copy;ieren mat Luxchat</a>
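The output above is consistent with each UTF-8 byte being treated as a separate single-byte character and then written out as an HTML entity. The following is a minimal Python sketch of that mechanism only. It is not taken from the plugin's code, and the function name `mangle` is hypothetical.

```python
from html.entities import codepoint2name


def mangle(text: str) -> str:
    """Reproduce the symptom: misread UTF-8 bytes one by one, then entity-encode."""
    # Assumption: every UTF-8 byte is wrongly interpreted as one Latin-1 character.
    misread = text.encode("utf-8").decode("latin-1")
    parts = []
    for ch in misread:
        cp = ord(ch)
        if cp < 0x80:
            parts.append(ch)                          # plain ASCII passes through
        elif cp in codepoint2name:
            parts.append(f"&{codepoint2name[cp]};")   # named entity, e.g. &Atilde;
        else:
            parts.append(f"&#{cp};")                  # numeric reference otherwise
    return "".join(parts)


print(mangle("Kommunizéieren mat Luxchat"))
# prints: Kommuniz&Atilde;&copy;ieren mat Luxchat
```

The bytes 0xC3 and 0xA9 map to “Ã” and “©” when read individually, which is where `&Atilde;&copy;` comes from.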

    The problem sounds similar to this topic:
    https://ww.wp.xz.cn/support/topic/text-being-converted-to-html-entities/
    However, we are on the more recent version 3.4.6.1.

    Any idea what’s up?

    Thanks and cheers

    Michel

Viewing 4 replies - 1 through 4 (of 4 total)
  • Plugin Author patrickposner

    (@patrickposner)

    Hey @mlanners,

    I’ll give this a test and follow up!

    Update:

    I just gave this a test and it exported just fine (without any UTF-8 conversion):

    <div class="entry-content alignwide wp-block-post-content has-global-padding is-layout-constrained wp-block-post-content-is-layout-constrained">
    <p>Kommunizéieren mat Luxchat</p>
    </div>

    Are you running on the latest version of Simply Static?

    Thread Starter mlanners

    (@mlanners)

    Hello Patrick,

    We were running the then-current version of SimplyStatic. I have since upgraded to 3.5.0 and performed a new export. Some of the accented characters are still mangled, for example:

    • “É”, encoded UTF-8 0xc389, comes out as “Ã&#137;” in the static HTML source code, which corresponds to interpreting the two bytes 0xc3 and 0x89 individually instead of as one two-byte UTF-8 character
    • “Ö”, encoded UTF-8 0xc396, comes out as “Ã&#150;” in the static HTML source code, which corresponds to interpreting the two bytes 0xc3 and 0x96 individually instead of as one two-byte UTF-8 character
    • “’”, encoded UTF-8 0xe28099, comes out as “â&#128;&#153;” in the static HTML source code, which corresponds to interpreting the three bytes 0xe2, 0x80 and 0x99 individually instead of as one three-byte UTF-8 character
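The byte-wise reading described in this list can be checked by running it backwards: resolve the character references, turn every resulting character back into a single byte, and decode those bytes as UTF-8. The sketch below is an illustrative check under that assumption, not a fix for the plugin. The helper name `undo_bytewise` is hypothetical.

```python
import html


def undo_bytewise(fragment: str) -> str:
    """Map a mangled fragment back to the character its bytes stand for."""
    # html.unescape follows the HTML rules, so references in the range
    # 128-159 resolve to their Windows-1252 characters (e.g. &#137; -> "‰").
    visible = html.unescape(fragment)
    # Encoding as cp1252 yields the original byte values, which are then
    # decoded as the UTF-8 sequence they were meant to be.
    return visible.encode("cp1252").decode("utf-8")


for fragment in ("Ã&#137;", "Ã&#150;", "â&#128;&#153;"):
    print(fragment, "->", undo_bytewise(fragment))
```

The three fragments decode to “É”, “Ö” and the right single quotation mark “’”. One caveat: the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no Windows-1252 mapping in Python's codec, so fragments containing them would raise an error in this sketch.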

    Here is an example of a correctly exported accented character:

    • “ë”, encoded UTF-8 0xc3ab, remains “ë” and is still encoded UTF-8 0xc3ab after the static export.

    I fail to see why _some_ characters are wrongly “translated” when passing through SimplyStatic while others remain intact. Is there perhaps a way to redefine the charset for certain HTML elements? Or maybe SimplyStatic loses track of where it is in the markup and treats some characters as HTML code rather than as printable text? The fact that the same text block can contain both good and bad conversions speaks against this theory.

    Any idea to help pinpoint the error?

    Thanks and cheers

    Michel

    Thread Starter mlanners

    (@mlanners)

    In other news, version 3.5.1 seems to keep accented characters intact, but corrupts the HTML code. Investigating…

    Thread Starter mlanners

    (@mlanners)

    Concerning corrupted code, more info:

    Example HTML code extract after export with 3.5.0 (one <div > opening tag):

    <div class="elementor-element elementor-element-d42fdbd elementor-nav-menu__align-end elementor-nav-menu__text-align-center elementor-nav-menu--dropdown-tablet elementor-nav-menu--toggle elementor-nav-menu--burger elementor-widget elementor-widget-nav-menu" data-id="d42fdbd" data-element_type="widget" data-settings="{&quot;submenu_icon&quot;:{&quot;value&quot;:&quot;&lt;svg aria-hidden=\&quot;true\&quot; class=\&quot;e-font-icon-svg e-fas-angle-down\&quot; viewBox=\&quot;0 0 320 512\&quot; xmlns=\&quot;http:\/\/www.w3.org\/2000\/svg\&quot;&gt;&lt;path d=\&quot;M143 352.3L7 216.3c-9.4-9.4-9.4-24.6 0-33.9l22.6-22.6c9.4-9.4 24.6-9.4 33.9 0l96.4 96.4 96.4-96.4c9.4-9.4 24.6-9.4 33.9 0l22.6 22.6c9.4 9.4 9.4 24.6 0 33.9l-136 136c-9.2 9.4-24.4 9.4-33.8 0z\&quot;&gt;&lt;\/path&gt;&lt;\/svg&gt;&quot;,&quot;library&quot;:&quot;fa-solid&quot;},&quot;layout&quot;:&quot;horizontal&quot;,&quot;toggle&quot;:&quot;burger&quot;}" data-widget_type="nav-menu.default">

    Same tag after export with 3.5.1:

    <div class="elementor-element elementor-element-d42fdbd elementor-nav-menu__align-end elementor-nav-menu__text-align-center elementor-nav-menu--dropdown-tablet elementor-nav-menu--toggle elementor-nav-menu--burger elementor-widget elementor-widget-nav-menu" data-id="d42fdbd" data-element_type="widget" data-settings="{" submenu_icon":{"value":"<svg aria-hidden=\"true\" class=\"e-font-icon-svg e-fas-angle-down\" viewBox=\"0 0 320 512\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\">
    <path d=\"M143 352.3L7 216.3c-9.4-9.4-9.4-24.6 0-33.9l22.6-22.6c9.4-9.4 24.6-9.4 33.9 0l96.4 96.4 96.4-96.4c9.4-9.4 24.6-9.4 33.9 0l22.6 22.6c9.4 9.4 9.4 24.6 0 33.9l-136 136c-9.2 9.4-24.4 9.4-33.8 0z\">
    <\/path><\/svg>","library":"fa-solid"},"layout":"horizontal","toggle":"burger"}" data-widget_type="nav-menu.default">

    I have my doubts whether HTML entities should be found inside tags, but the 3.5.1 version definitely doesn’t work: it renders part of the tag as text.
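On the question of entities inside tags: character references such as `&quot;` are valid in attribute values, and they are required when a double-quoted value itself contains double quotes, as the JSON in `data-settings` does. The sketch below uses Python's standard `html.parser` with a shortened, hypothetical settings value to show that the escaped form round-trips while the unescaped form cuts the attribute off at the first inner quote. It says nothing about how the plugin produces either form.

```python
import html
import json
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the attributes of the first <div> start tag that is parsed."""

    def __init__(self):
        super().__init__()
        self.attrs = {}

    def handle_starttag(self, tag, attrs):
        if tag == "div" and not self.attrs:
            self.attrs = dict(attrs)


def parse_div(markup: str) -> dict:
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.attrs


# Hypothetical, shortened stand-in for the Elementor settings JSON.
raw_json = json.dumps({"layout": "horizontal", "toggle": "burger"})

# 3.5.0 style: quotes inside the attribute value are written as &quot;
escaped = f'<div data-settings="{html.escape(raw_json, quote=True)}" data-id="d42fdbd">'
# 3.5.1 style: entities decoded, so inner quotes end the attribute early
unescaped = f'<div data-settings="{raw_json}" data-id="d42fdbd">'

good = parse_div(escaped)
bad = parse_div(unescaped)

print(json.loads(good["data-settings"]))          # the original settings survive
print(bad.get("data-settings") == raw_json)       # False: the value was truncated
```

Browsers follow the same quoting rule, which matches the observation that part of the tag ends up rendered as text after the 3.5.1 export.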


The topic ‘UTF-8 converted to HTML entities’ is closed to new replies.