Forum Replies Created

Viewing 15 replies - 1 through 15 (of 15 total)
    Thread Starter samymsa

    (@samymsa)

    Thanks for your reply.

    Here is another related issue:

    Issue 4 — Slug rename leaves orphaned vectors

    File: class-knowledge-manager.php:5149

    When a published post’s slug is renamed, mxchat_store_pre_update_status() correctly captures the old URL in a transient before the update. But mxchat_handle_post_update() never compares the old URL to the new one — it only checks for status transitions (publish → unpublish). Since the post stays published, the unpublish block is skipped, the old URL transient is silently overwritten with the new URL (line 5207), and new vectors are upserted under md5(new_url).

    The old vectors under md5(old_url) (base + all chunks) are never deleted.

    What goes stale: Every slug rename creates a full set of orphaned vectors in Pinecone (or orphaned rows in the WP DB) containing outdated content and a dead source URL. These continue to be returned in chatbot search results, potentially surfacing broken links and old content to users.

    Affected paths:

    • Manual slug edit in the post editor
    • Permalink structure changes that alter the URL
    • WooCommerce product slug renames (same flow via save_post_product → mxchat_handle_post_update)

    Fix: After the unpublish check and before upserting, compare $previous_url to the current get_permalink(). If they differ and $previous_url is set, delete the old vectors via MxChat_Utils::delete_chunks_for_url($previous_url) before storing the new ones.
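    A minimal sketch of that ordering, written as a standalone PHP function so it can run outside WordPress. The function name is hypothetical, and the two callables stand in for MxChat_Utils::delete_chunks_for_url() and the plugin's upsert step. In the plugin, $previous_url would come from the transient set by mxchat_store_pre_update_status() and $current_url from get_permalink().

```php
<?php
/**
 * Hypothetical sketch (not MxChat code): purge vectors stored under the old
 * URL before storing the post under its new URL.
 * $delete_chunks stands in for MxChat_Utils::delete_chunks_for_url();
 * $upsert stands in for the plugin's existing store/upsert step.
 */
function mxchat_sketch_resync_post(
    ?string $previous_url,
    string $current_url,
    callable $delete_chunks,
    callable $upsert
): bool {
    // Only treat it as a rename when an old URL was captured and it differs.
    $renamed = $previous_url !== null
        && $previous_url !== ''
        && $previous_url !== $current_url;

    if ($renamed) {
        // Remove base + chunk vectors under md5(old_url) first.
        $delete_chunks($previous_url);
    }

    // Store the fresh content under md5(new_url).
    $upsert($current_url);

    return $renamed;
}

// Usage: record the calls instead of talking to Pinecone.
$calls = [];
$renamed = mxchat_sketch_resync_post(
    'https://example.com/old-slug/',
    'https://example.com/new-slug/',
    function (string $url) use (&$calls) { $calls[] = "delete:$url"; },
    function (string $url) use (&$calls) { $calls[] = "upsert:$url"; }
);
```

    When the URL is unchanged (or no old URL was captured), the sketch only upserts, which matches what the current code does.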

    Thread Starter samymsa

    (@samymsa)

    Hi Maxwell,

    Here are some issues I found, in case they help.

    Issue 1 — Orphaned chunks on unpublish

    When a published post is changed to draft/pending/private, the unpublish path at line 5152 calls mxchat_delete_from_pinecone_by_url(), which only deletes the base vector (md5(url)). If the content was stored in chunks, all the {md5(url)}_chunk_0, {md5(url)}_chunk_1, etc. vectors remain in Pinecone and keep polluting search results.

    Compare to the delete/trash path at line 5415 which correctly calls the chunk-aware MxChat_Utils::delete_chunks_for_url().
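    To make the difference concrete, here is a small PHP sketch that lists every vector ID one URL can own under the naming scheme above (base md5(url) plus {md5(url)}_chunk_N). The helper name and the explicit chunk count are assumptions for illustration; the point is only which IDs a base-only delete leaves behind.

```php
<?php
/**
 * Hypothetical helper: all vector IDs stored for one URL, assuming a base
 * vector md5(url) plus chunk vectors {md5(url)}_chunk_0 .. _chunk_{N-1}.
 * The chunk count is passed in explicitly; the real plugin may track it
 * differently.
 */
function mxchat_sketch_vector_ids(string $url, int $chunk_count): array
{
    $base = md5($url);
    $ids = [$base];
    for ($i = 0; $i < $chunk_count; $i++) {
        $ids[] = $base . '_chunk_' . $i;
    }
    return $ids;
}

$url = 'https://example.com/my-post/';
$all_ids = mxchat_sketch_vector_ids($url, 3);

// A base-only delete removes just md5($url); everything else is orphaned.
$orphans = array_values(array_diff($all_ids, [md5($url)]));
```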

    Issue 2 — Same problem for WooCommerce products

    mxchat_handle_product_delete at line 5629 also uses the non-chunk-aware mxchat_delete_from_pinecone_by_url(). Trashing or deleting a product with chunked content leaves orphaned chunk vectors.

    Issue 3 — Trashed permalink mismatch

    When wp_trash_post fires, WordPress may have already appended __trashed to the post slug. So get_permalink($post_id) at line 5408 returns something like:

    https://example.com/my-post__trashed/

    Since the vector ID is md5($source_url), this won’t match the original vector ID stored in Pinecone. The delete request targets a non-existent ID and the real entries silently survive.

    This affects both mxchat_handle_post_delete and mxchat_handle_product_delete.
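    One possible workaround, sketched here as a hypothetical PHP helper, is to strip the __trashed suffix before hashing so the computed ID matches the stored one. It assumes pretty permalinks where the slug is the last path segment; the optional -N suffix it also removes is an assumption, not confirmed WordPress behavior.

```php
<?php
/**
 * Hypothetical sketch: recover the pre-trash permalink by removing a
 * "__trashed" suffix (and, as an assumption, an optional "-N") from the
 * last path segment, keeping any trailing slash.
 */
function mxchat_sketch_untrashed_url(string $url): string
{
    // preg_replace() returns null on error; fall back to the input then.
    return preg_replace('#__trashed(?:-\d+)?(/?)$#', '$1', $url) ?? $url;
}

$original = mxchat_sketch_untrashed_url('https://example.com/my-post__trashed/');
$vector_id = md5($original); // should now match the ID stored before trashing
```

    Capturing the permalink before the status change, as mxchat_store_pre_update_status() already does for updates, would avoid this string manipulation altogether.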

    Thread Starter samymsa

    (@samymsa)

    Additional information about my previous post:

    • Using Pinecone
    • Auto Sync on
    • I deleted a page and it is still in the knowledge base
    Thread Starter samymsa

    (@samymsa)

    Hi,

    Thanks for your answer.

    To clarify: I think that if a piece of content is in the knowledge base, its knowledge entry should be updated automatically whenever the content changes. Otherwise, if I add some content to the knowledge base and then update that content with auto sync turned off, the knowledge entry becomes outdated.

    I understand my ideal workflow differs from that of other MxChat users, but a simple hook might do the trick. I’ll send you a code snippet when I have some time.

    However, I had auto sync turned on for “Pages”, and the knowledge entries for my pages were not updated. I am using Pinecone. Something might be wrong in the current implementation.

    Thanks for all the work,

    Samy

    Thread Starter samymsa

    (@samymsa)

    Here is some context to help you understand what I am aiming to do:

    Our website pages contain a lot of shortcodes, and sometimes stripping them isn’t enough: shortcode sections are mixed with genuinely useful information. My goal is to add an LLM preprocessing step that extracts the useful information, restructures it and optimizes it for retrieval before the pages are embedded into the knowledge base.

    Since you currently retrieve the post with get_post() and then read $post->post_content directly, I have no way to intercept the content, for example through the the_content filter.

    Thread Starter samymsa

    (@samymsa)

    You’re welcome, happy to help MxChat become better!

    Thank you for the fixes!

    Thread Starter samymsa

    (@samymsa)

    Hi @m4xw3ll,

    The issue is indeed related to the WordPress import option, sorry if my explanation wasn’t clear enough.

    Anyway, thanks for the update!

    Thread Starter samymsa

    (@samymsa)

    Closing this topic and opening a dedicated one about dynamic data / shortcodes inside system prompt.

    Thanks again for your support, @m4xw3ll!

    Thread Starter samymsa

    (@samymsa)

    Thanks!

    Happy to help improve the product 🚀

    Thread Starter samymsa

    (@samymsa)

    Hi Maxwell,

    Following up on our conversation, would it be possible to add a hook that allows altering system_prompt_instructions, or, perhaps simpler, to allow shortcodes inside the system prompt (not in the current message context)?

    This would let us add dynamic data to the system prompt. I’m currently forced to use a mu-plugin that hooks into get_option(), which is not ideal.
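    For illustration, this is the kind of callback such a hook would make easy: a PHP sketch that substitutes dynamic values into placeholder tokens in the instructions. The {site_name}/{today} syntax and the function name are hypothetical and not an existing MxChat feature.

```php
<?php
/**
 * Hypothetical sketch: replace {placeholder} tokens in the system prompt
 * instructions with dynamic values.
 */
function mxchat_sketch_render_prompt(string $instructions, array $vars): string
{
    $pairs = [];
    foreach ($vars as $key => $value) {
        $pairs['{' . $key . '}'] = (string) $value;
    }
    // strtr() replaces all tokens in a single pass, so a substituted value
    // is never re-scanned for further placeholders.
    return strtr($instructions, $pairs);
}

$prompt = mxchat_sketch_render_prompt(
    'You are the assistant for {site_name}. Today is {today}.',
    ['site_name' => 'Example Site', 'today' => '2024-05-01']
);
```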

    Best,
    Samy

    Thread Starter samymsa

    (@samymsa)

    I understand your point of view, and it does make sense 👍
    Keeping the full 24h context can definitely help the AI maintain continuity as users navigate and ask related questions.

    That said, in my opinion, whatever strategy we choose (current conversation only vs full 24h window), the conversation history shown to the user should always match the one sent to the AI. Otherwise, it can lead to confusing or “unexplained” AI behavior from the user’s perspective, since the model may be reacting to messages the user cannot see.

    I also strongly believe we should keep some conversation history, because users very often ask follow-up questions and expect the assistant to remember what was just discussed.

    My personal suggestion:

    • When chat persistence is off: only show current conversation messages, and only send those to the AI.
    • When chat persistence is on: show the full 24h message history to the user and send this history to the AI.

    Ideally, this would be based on a browser session rather than an arbitrary 24h window, though I understand this may not be easy to implement.
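    The suggestion above can be sketched in PHP as one selection function whose result drives both the chat window and the AI request, so the two cannot diverge. The message array shape and the function name are assumptions, not MxChat’s actual data model.

```php
<?php
/**
 * Hypothetical sketch: pick the messages to both display and send to the AI.
 * Each message is assumed to be ['conversation_id' => string,
 * 'timestamp' => int (Unix seconds), 'text' => string].
 */
function mxchat_sketch_select_history(
    array $messages,
    bool $persistence_on,
    string $current_conversation_id,
    int $now,
    int $window_seconds = 86400
): array {
    return array_values(array_filter(
        $messages,
        function (array $m) use ($persistence_on, $current_conversation_id, $now, $window_seconds): bool {
            if ($persistence_on) {
                // Persistence on: every message inside the 24h window.
                return $m['timestamp'] >= $now - $window_seconds;
            }
            // Persistence off: current conversation only.
            return $m['conversation_id'] === $current_conversation_id;
        }
    ));
}

$now = 1700000000;
$messages = [
    ['conversation_id' => 'a', 'timestamp' => $now - 90000, 'text' => 'too old'],
    ['conversation_id' => 'a', 'timestamp' => $now - 3600, 'text' => 'earlier today'],
    ['conversation_id' => 'b', 'timestamp' => $now - 60, 'text' => 'current'],
];

// Render $history in the chat window AND send the same $history to the model.
$history = mxchat_sketch_select_history($messages, false, 'b', $now);
```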

    • This reply was modified 3 months, 4 weeks ago by samymsa.
    Thread Starter samymsa

    (@samymsa)

    I’ll try this, thanks for your quick answer!

    Thread Starter samymsa

    (@samymsa)

    I see there is an mxchat_prepare_context filter I could maybe use.

    Thread Starter samymsa

    (@samymsa)

    Thanks so much!

    For your information, here is the code I added:

    In your plugin, inside mxchat-basic/admin/class-knowledge-manager.php line 2792:

    // Allow other code to modify the post data before it is processed.
    // This lets me and other developers alter the behavior without modifying MxChat code.
    $post = apply_filters('mxchat_process_post_data', $post);

    In my custom theme:

    /**
     * Enrich post_content with whitelisted post meta for MxChat embeddings.
     */
    function mxchat_enrich_post_content_with_meta($post)
    {
      // Return early if not a WP_Post object
      if (!($post instanceof WP_Post)) {
        return $post;
      }

      // Whitelisted meta keys => labels used in the embedded text
      $meta_map = [
        // EVENT META KEYS
        'event_start_date' => 'Start Date',
        'event_start_time' => 'Start Time',
        'event_end_date' => 'End Date',
        'event_end_time' => 'End Time',
      ];

      $lines = [];

      foreach ($meta_map as $meta_key => $label) {
        $value = get_post_meta($post->ID, $meta_key, true);

        if (empty($value)) {
          continue;
        }

        // Normalize value to plain text
        if (is_array($value)) {
          $value = implode(', ', array_map('wp_strip_all_tags', $value));
        } else {
          $value = wp_strip_all_tags((string) $value);
        }

        if ($value !== '') {
          $lines[] = $label . ': ' . $value;
        }
      }

      // Append the meta block after the original content
      if (!empty($lines)) {
        $post->post_content .= "\n\n" . implode("\n", $lines);
      }

      return $post;
    }

    add_filter('mxchat_process_post_data', 'mxchat_enrich_post_content_with_meta');

    Feel free to take the best approach 😉

    • This reply was modified 4 months, 1 week ago by samymsa.
    Thread Starter samymsa

    (@samymsa)

    Actually, the embeddings should probably never be cached: if the prompt changes, the relevant content changes too (and so does current_valid_urls).

    For now I clear the cache on every request. I suggest either not caching the embeddings at all or reworking the caching logic.
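    As one way to rework it, here is a PHP sketch of a cache keyed by a hash of the exact prompt text, so a different prompt never reuses another prompt’s embedding, matched content or current_valid_urls. The class name and the 300-second TTL are hypothetical; inside WordPress, transients would be the natural storage instead of an in-memory array.

```php
<?php
/**
 * Hypothetical sketch: per-prompt cache with a time-to-live.
 * Entries are stored as [stored_at, value] under a key derived from the prompt.
 */
final class MxChatSketchPromptCache
{
    private array $entries = [];
    private int $ttl;

    public function __construct(int $ttl_seconds = 300)
    {
        $this->ttl = $ttl_seconds;
    }

    // One cache key per exact prompt text.
    public function key(string $prompt): string
    {
        return 'mxchat_ctx_' . md5($prompt);
    }

    public function get(string $prompt, int $now): ?array
    {
        $key = $this->key($prompt);
        if (!isset($this->entries[$key])) {
            return null; // never computed for this prompt
        }
        [$stored_at, $value] = $this->entries[$key];
        if ($now - $stored_at > $this->ttl) {
            unset($this->entries[$key]); // expired
            return null;
        }
        return $value;
    }

    public function set(string $prompt, array $value, int $now): void
    {
        $this->entries[$this->key($prompt)] = [$now, $value];
    }
}

$cache = new MxChatSketchPromptCache(300);
$cache->set('What are your opening hours?', ['current_valid_urls' => ['https://example.com/hours/']], 1000);
$hit = $cache->get('What are your opening hours?', 1100);
$miss = $cache->get('How do I return an item?', 1100);
```

    Compared with clearing the cache on every request, this keeps legitimate repeat hits while making cross-prompt reuse impossible by construction.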
