• samymsa

    (@samymsa)


    Hi Maxwell,

    I’m using mxchat with complex Elementor-built pages that include heavily nested structures (e.g. vc-rows, shortcodes, nested containers). While the current tag-stripping approach helps, it doesn’t produce reliable or structured enough content for RAG use.

    I also tried the URL and sitemap import options, but in our setup (preloader + JS-rendered content) they only capture placeholder HTML rather than the final rendered page. Not a blocker for me right now, just noting it in case it matters.

    What actually worked well was manually having ChatGPT transform page content into clean, structured knowledge-base text for RAG.

    My question: is there a way to override or hook into the WordPress content import step in mxchat so I can inject a custom preprocessing function (e.g. HTML → structured Markdown/clean text) before indexing? I think hooking into mxchat_before_process_post could do the trick.

    Any pointers to relevant hooks or extension points would be appreciated.

    Thanks,
    Samy

Viewing 1 replies (of 1 total)
  • Plugin Support m4xw3ll

    (@m4xw3ll)

    Hey @samymsa,

    Good instinct – both options are there in 3.2.5.

    Option 1 – filter (easiest for in-WP transforms):

    Use mxchat_before_process_post. It runs right before MxChat builds the KB text from a post, and it hands you the full WP_Post object plus the bot_id. You can render the content through the_content so Elementor and shortcodes resolve, then overwrite post_content with the cleaned version. Something like:

    add_filter('mxchat_before_process_post', function ($post, $bot_id) {
        $rendered = apply_filters('the_content', $post->post_content);
        // strip Elementor wrappers, vc-rows, etc. however you like
        $post->post_content = your_clean_for_rag($rendered);
        return $post;
    }, 10, 2);

    That way the existing “Add to Knowledge” flow keeps working, you just feed it cleaner input.
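
    For illustration, here is a minimal sketch of what your_clean_for_rag() might look like. The function is a placeholder name from the snippet above, not something MxChat ships, and the rules below (headings to Markdown, list items to bullets, scripts and styles dropped) are assumptions about what "clean" means for a RAG index. It uses only core PHP functions, so it can be tried outside WordPress before wiring it into the filter.

```php
<?php
// Hypothetical helper: the name comes from the filter snippet, the body is an
// illustrative assumption and not part of MxChat.
function your_clean_for_rag(string $html): string
{
    // Drop <script> and <style> blocks together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', '', $html);

    // Convert <h1>..<h6> to Markdown headings so section structure survives.
    $html = preg_replace_callback(
        '#<h([1-6])\b[^>]*>(.*?)</h\1>#is',
        function (array $m): string {
            $title = trim(strip_tags($m[2]));
            return "\n\n" . str_repeat('#', (int) $m[1]) . ' ' . $title . "\n\n";
        },
        $html
    );

    // Each list item starts a new "- " bullet line.
    $html = preg_replace('#<li\b[^>]*>#i', "\n- ", $html);

    // <br> and closing block tags become line breaks so paragraphs stay apart.
    $html = preg_replace('#<br\s*/?>|</(p|div|section|ul|ol|tr|table)>#i', "\n", $html);

    // Remove every remaining tag (wrappers, spans, etc.), then decode entities.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of spaces, tabs, and non-breaking spaces.
    $text = preg_replace('/[ \t\x{00A0}]+/u', ' ', $text);
    // Trim spaces around line breaks and cap consecutive blank lines at one.
    $text = preg_replace('/ *\n */', "\n", $text);
    $text = preg_replace('/\n{3,}/', "\n\n", $text);

    return trim($text);
}
```

    Regex-based cleanup like this is fragile on malformed markup; if the rendered pages are messy, parsing with DOMDocument is likely the more robust route.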

    Option 2 – the new REST API:

    If you’d rather build the clean text outside WordPress (your own pipeline, n8n, a script, whatever) and push it in, hit POST /wp-json/mxchat/v1/knowledge with a bearer token from MxChat → API Access. Body takes content, source_url (dedupe key, so re-submitting the same URL replaces the entry), and optional bot_id and content_type. That bypasses the post crawler entirely and gives you full control over what gets embedded.
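
    As a rough sketch of the push side, the request could be assembled and sent from a standalone PHP script like the one below. The endpoint path and the field names (content, source_url, bot_id, content_type) come from the description above; the JSON body encoding, the Content-Type header, and the helper names build_knowledge_request() and send_knowledge_request() are assumptions made for illustration, so check them against the plugin's API documentation.

```php
<?php
// Assemble the pieces of a POST /wp-json/mxchat/v1/knowledge request.
// Assumption: the endpoint accepts a JSON body; only the field names are
// given in the thread.
function build_knowledge_request(
    string $site_url,
    string $token,
    string $content,
    string $source_url,
    ?string $bot_id = null,
    ?string $content_type = null
): array {
    $payload = ['content' => $content, 'source_url' => $source_url];
    // Optional fields are only sent when provided.
    if ($bot_id !== null) {
        $payload['bot_id'] = $bot_id;
    }
    if ($content_type !== null) {
        $payload['content_type'] = $content_type;
    }

    return [
        'url'     => rtrim($site_url, '/') . '/wp-json/mxchat/v1/knowledge',
        'headers' => [
            'Authorization: Bearer ' . $token,
            'Content-Type: application/json',
        ],
        'body'    => json_encode($payload, JSON_UNESCAPED_SLASHES | JSON_UNESCAPED_UNICODE),
    ];
}

// Send the request with PHP's curl extension and return status plus raw body.
function send_knowledge_request(array $request): array
{
    $ch = curl_init($request['url']);
    curl_setopt_array($ch, [
        CURLOPT_POST           => true,
        CURLOPT_HTTPHEADER     => $request['headers'],
        CURLOPT_POSTFIELDS     => $request['body'],
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 30,
    ]);
    $response = curl_exec($ch);
    $status   = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);

    return ['status' => $status, 'body' => $response === false ? null : $response];
}
```

    Usage would be along the lines of send_knowledge_request(build_knowledge_request('https://example.com', $token, $clean_text, 'https://example.com/pricing')), where the site URL and page URL are placeholders. Because source_url is the dedupe key, re-running the script for the same page should replace the existing entry.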

    For your Elementor case I’d lean toward the filter since it stays inside the normal admin flow, but the API is there if you want to do the cleaning in a separate environment.

    Maxwell
