Title: [Plugin: HTML Import 2] Importing preformatted text (pre tag)
Last modified: August 20, 2016

---

# [Plugin: HTML Import 2] Importing preformatted text (pre tag)

 *  [Mark Tuttle](https://wordpress.org/support/users/markrtuttle/)
 * (@markrtuttle)
 * [14 years, 3 months ago](https://wordpress.org/support/topic/plugin-html-import-2-importing-text/)
 * Are there known issues with importing preformatted text?
 * Importing the test file
 *     ```
       <html>
       <head><title>Test</title></head>
       <body>
       <pre>
       This is the first line
       This is the second line
       </pre>
       </html>
       ```
   
 * with `<pre>` listed under import settings -> content -> allowed html produces
   a single unbroken line:
 * `<pre> This is the first line This is the second line </pre>`
 * Is there a configuration of this plugin that will respect the line breaks within
   the `<pre></pre>` tags?
 * [http://wordpress.org/extend/plugins/import-html-pages/](http://wordpress.org/extend/plugins/import-html-pages/)
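 * A minimal sketch of how the line breaks can get lost, assuming the importer replaces
   every newline or carriage return with a space before saving the content. The helper
   name `flatten_newlines` is hypothetical and is not part of the plugin:
 *     ```php
       <?php
       // Hypothetical helper (assumption): turn each newline or carriage
       // return into a space, with no special handling for <pre>.
       function flatten_newlines($html) {
           return preg_replace('/[\n\r]/', ' ', $html);
       }

       $html = "<pre>\nThis is the first line\nThis is the second line\n</pre>";
       echo flatten_newlines($html), "\n";
       // prints: <pre> This is the first line This is the second line </pre>
       ```
   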

Viewing 3 replies - 1 through 3 (of 3 total)

 *  Thread Starter [Mark Tuttle](https://wordpress.org/support/users/markrtuttle/)
 * (@markrtuttle)
 * [14 years, 3 months ago](https://wordpress.org/support/topic/plugin-html-import-2-importing-text/#post-2550138)
 * I propose adding the following function to the `HTML_Import` class defined in `html-importer.php`:
 *     ```
       // Collapse runs of whitespace to a single space everywhere except
       // inside <pre>...</pre>, where line breaks are significant.
       function strip_insignificant_html_whitespace($string) {
         $pre_start = "<pre(?:>|\\s[^>]*>)";
         $pre_end   = "</pre(?:>|\\s[^>]*>)";
   
         // Split on <pre> and </pre> tags, keeping each tag as its own part.
         $old_parts = preg_split(";($pre_start|$pre_end);i",$string,0,PREG_SPLIT_DELIM_CAPTURE);
         $new_parts = array();
   
         $strip = true;
         foreach ($old_parts as $part) {
           if (preg_match(";$pre_start;i",$part)) {
             // Opening tag: tidy the tag itself, then stop stripping.
             $tmp = preg_replace(";\s+;"," ",$part);
             $new_parts[] = preg_replace("; +>;",">",$tmp);
             $strip = false;
             continue;
           }
           if (preg_match(";$pre_end;i",$part)) {
             // Closing tag: tidy the tag itself, then resume stripping.
             $tmp = preg_replace(";\s+;"," ",$part);
             $new_parts[] = preg_replace("; +>;",">",$tmp);
             $strip = true;
             continue;
           }
           if ($strip)
             $new_parts[] = preg_replace(";\s+;"," ",$part);
           else
             $new_parts[] = $part;  // inside <pre>: keep whitespace as-is
         }
         return implode("",$new_parts);
       }
       ```
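   
 * To try the idea outside the plugin, here is a standalone sketch of the same split-and-toggle
   approach. The name `collapse_outside_pre` is hypothetical, and unlike the method
   above it leaves the `<pre>` tags themselves untouched rather than tidying the
   whitespace inside them:
 *     ```php
       <?php
       // Standalone sketch (not the plugin's code): collapse whitespace
       // everywhere except between <pre> and </pre>.
       function collapse_outside_pre($html) {
           $pre_tag = '</?pre(?:>|\s[^>]*>)';
           // Split on pre tags and keep the tags as separate parts.
           $parts = preg_split(';(' . $pre_tag . ');i', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
           $inside_pre = false;
           $out = '';
           foreach ($parts as $part) {
               if (preg_match(';^' . $pre_tag . '$;i', $part)) {
                   // Opening tag turns the flag on, closing tag turns it off.
                   $inside_pre = ($part[1] !== '/');
                   $out .= $part;
               } elseif ($inside_pre) {
                   $out .= $part;                              // keep line breaks
               } else {
                   $out .= preg_replace(';\s+;', ' ', $part);  // collapse whitespace
               }
           }
           return $out;
       }

       echo collapse_outside_pre("<p>one\n   two</p>\n<pre>\nfirst\nsecond\n</pre>"), "\n";
       // prints:
       // <p>one two</p> <pre>
       // first
       // second
       // </pre>
       ```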
   
 * In `clean_html`, swap the newline replacement for a call to the new function:
 *     ```
       replace
         $string = str_replace( '\n', ' ', $string );
       with
         $string = $this->strip_insignificant_html_whitespace($string);
       ```
   
 * In `get_post`, inside the `!empty($my_post['post_content'])` branch:
 *     ```
       replace
         $my_post['post_content'] = ereg_replace("[\n\r]", " ", $my_post['post_content']);
       with
         $my_post['post_content'] = $this->strip_insignificant_html_whitespace($my_post['post_content']);
       ```
   
 * It would also be nice to strip the contents of CDATA blocks and `<script>...</script>`
   blocks cleanly. I find examples like
 *     ```
       <div id="googleAds">
         <!-- b e g i n   g o o g l e  a d s  -->
         <script type="text/javascript">
           //<![CDATA[
           <!--
           google_ad_client = "...";
           google_ad_slot = "...";
           google_ad_width = ...;
           google_ad_height = ...;
           //-->
           //]]>
         </script>
         <script type="text/javascript" src="/data/../pagead2.googlesyndication.com/pagead/show_ads.js">
         </script> <!-- e n d   g o o g l e  a d s  -->
       </div>
       ```
   
 * that the plugin's call to PHP's `strip_tags` function does not strip cleanly:
   the tags are removed, but the script text between them stays in the content.
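 * A small illustration of that behavior, using a cut-down version of the ad snippet
   (the `<p>Hello</p>` paragraph is added here only for contrast):
 *     ```php
       <?php
       // strip_tags() removes the tags but keeps whatever text sat between
       // them, so inline JavaScript ends up in the imported content.
       $html = '<div><script type="text/javascript">google_ad_slot = "...";</script><p>Hello</p></div>';
       echo strip_tags($html), "\n";
       // prints: google_ad_slot = "...";Hello
       ```
   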
 *  Thread Starter [Mark Tuttle](https://wordpress.org/support/users/markrtuttle/)
 * (@markrtuttle)
 * [14 years, 3 months ago](https://wordpress.org/support/topic/plugin-html-import-2-importing-text/#post-2550173)
 * To strip the CDATA, script, and style blocks, I think it is sufficient to add
   the functions
 *     ```
       // True if $tag (e.g. '<script>') appears in the allowed-tags string.
       function allowed_tag($tag,$allowedtags=NULL) {
         return
           !is_null($allowedtags) &&
           stripos($allowedtags,$tag) !== false;
       }
   
       // Remove <![CDATA[ ... ]]> sections unless '<cdata>' is allowed.
       function strip_cdata_block($string,$allowedtags=NULL) {
         if ($this->allowed_tag('<cdata>',$allowedtags)) return $string;
   
         $delim = "@";
         $cdata_start = preg_quote('<![CDATA[',$delim);
         $cdata_end = preg_quote(']]>',$delim);
         $block = "$cdata_start.*?$cdata_end";
   
         return preg_replace("{$delim}{$block}{$delim}s","",$string);
       }
   
       // Remove a whole <tag>...</tag> block, contents included, unless allowed.
       function strip_tag_block($tag,$string,$allowedtags=NULL) {
         if ($this->allowed_tag($tag,$allowedtags)) return $string;
         if (!preg_match(":<(.*?)>:",$tag,$match)) return $string;
   
         $delim = "@";
         $tag_str = $match[1];
         $tag_start = "<$tag_str(?:>|\\s[^>]*>)";
         $tag_end   = "</$tag_str(?:>|\\s[^>]*>)";
         $block = "$tag_start.*?$tag_end";
   
         return preg_replace("{$delim}{$block}{$delim}is","",$string);
       }
   
       // Remove HTML comments, including multi-line ones.
       function strip_comment_block($string) {
         $delim = "@";
         $comment_start = preg_quote('<!--',$delim);
         $comment_end = preg_quote('-->',$delim);
         $block = "$comment_start.*?$comment_end";
   
         return preg_replace("{$delim}{$block}{$delim}s","",$string);
       }
       ```
   
 * and add the following calls before `strip_tags` at the head of `clean_html`:
 *     ```
       $string = $this->strip_cdata_block($string,$allowtags);
       $string = $this->strip_tag_block('<script>',$string,$allowtags);
       $string = $this->strip_tag_block('<style>',$string,$allowtags);
       $string = $this->strip_comment_block($string);
       ```
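   
 * The same order of operations can be tried outside the class with plain functions.
   This is only a sketch: `strip_block`, `strip_cdata`, and `strip_comments` are
   hypothetical stand-ins for the methods above, they take a bare tag name, and they
   skip the allowed-tags check:
 *     ```php
       <?php
       // Remove a whole <name>...</name> block, contents included.
       function strip_block($tag_name, $html) {
           $name = preg_quote($tag_name, '@');
           $pattern = '@<' . $name . '(?:>|\s[^>]*>).*?</' . $name . '(?:>|\s[^>]*>)@is';
           return preg_replace($pattern, '', $html);
       }

       // Remove <![CDATA[ ... ]]> sections.
       function strip_cdata($html) {
           return preg_replace('@<!\[CDATA\[.*?\]\]>@s', '', $html);
       }

       // Remove HTML comments, including multi-line ones.
       function strip_comments($html) {
           return preg_replace('@<!--.*?-->@s', '', $html);
       }

       $html = "<div id=\"googleAds\">\n"
             . "<!-- begin ads -->\n"
             . "<script type=\"text/javascript\">\n//<![CDATA[\ngoogle_ad_slot = \"...\";\n//]]>\n</script>\n"
             . "<p>Kept text</p>\n"
             . "</div>";

       // Same order as the calls above: CDATA, script, style, comments,
       // and only then strip_tags().
       $html = strip_cdata($html);
       $html = strip_block('script', $html);
       $html = strip_block('style', $html);
       $html = strip_comments($html);
       echo trim(strip_tags($html)), "\n";   // prints: Kept text
       ```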
   
 *  Plugin Author [Stephanie Leary](https://wordpress.org/support/users/sillybean/)
 * (@sillybean)
 * [14 years, 2 months ago](https://wordpress.org/support/topic/plugin-html-import-2-importing-text/#post-2550277)
 * Thanks, Mark! I’ll try to incorporate this into the next version.

 * 3 replies
 * 2 participants
 * Last reply from: [Stephanie Leary](https://wordpress.org/support/users/sillybean/)
 * Last activity: [14 years, 2 months ago](https://wordpress.org/support/topic/plugin-html-import-2-importing-text/#post-2550277)
 * Status: not resolved