samymsa
Forum Replies Created
Thanks for your reply.
Here is another related issue:
Issue 4: Slug rename leaves orphaned vectors
File: class-knowledge-manager.php:5149
When a published post’s slug is renamed, mxchat_store_pre_update_status() correctly captures the old URL in a transient before the update. But mxchat_handle_post_update() never compares the old URL to the new one; it only checks for status transitions (publish → unpublish). Since the post stays published, the unpublish block is skipped, the old URL transient is silently overwritten with the new URL (line 5207), and new vectors are upserted under md5(new_url). The old vectors under md5(old_url) (base + all chunks) are never deleted.
What goes stale: every slug rename creates a full set of orphaned vectors in Pinecone (or orphaned rows in the WP DB) containing outdated content and a dead source URL. These continue to be returned in chatbot search results, potentially surfacing broken links and old content to users.
Affected paths:
- Manual slug edit in the post editor
- Permalink structure changes that alter the URL
- WooCommerce product slug renames (same flow via save_post_product → mxchat_handle_post_update)
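To make the orphaning concrete: with the ID scheme described in these reports (base ID md5(url), chunks {md5(url)}_chunk_0, _chunk_1, ...), the IDs left behind by a rename can be computed from the previous URL alone. This is a minimal plain-PHP sketch, not MxChat code. The function names are hypothetical, and it assumes the number of chunks stored for the old URL is known, which the real plugin would have to look up somehow.

```php
<?php
// Plain-PHP sketch, not MxChat code. Function names are hypothetical.

// Rebuild the vector IDs for one URL: the base ID md5(url) plus
// "{md5(url)}_chunk_N" for each stored chunk (assumes the count is known).
function sketch_vector_ids_for_url(string $url, int $chunk_count): array
{
    $base = md5($url);
    $ids  = [$base];
    for ($i = 0; $i < $chunk_count; $i++) {
        $ids[] = $base . '_chunk_' . $i;
    }
    return $ids;
}

// IDs that become orphaned when a published post is saved under a new URL.
// Nothing is orphaned if there is no previous URL or the URL did not change.
function sketch_orphaned_ids_after_update(?string $previous_url, string $current_url, int $old_chunk_count): array
{
    if ($previous_url === null || $previous_url === '' || $previous_url === $current_url) {
        return [];
    }
    return sketch_vector_ids_for_url($previous_url, $old_chunk_count);
}

$old = 'https://example.com/old-slug/';
$new = 'https://example.com/new-slug/';

// Base ID plus two chunk IDs under md5($old): these would need deleting
// before the new vectors are upserted under md5($new).
$orphans = sketch_orphaned_ids_after_update($old, $new, 2);
```

An unchanged URL (the common case of a plain content edit) returns an empty list, so the extra delete would only run on actual renames.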
Fix: after the unpublish check and before upserting, compare $previous_url to the current get_permalink(). If they differ and $previous_url is set, delete the old vectors via MxChat_Utils::delete_chunks_for_url($previous_url) before storing the new ones.
Hi Maxwell,
Here are some issues I flagged, in case it helps.
Issue 1: Orphaned chunks on unpublish
When a published post is changed to draft/pending/private, the unpublish path at line 5152 calls mxchat_delete_from_pinecone_by_url(), which only deletes the base vector (md5(url)). If the content was stored chunked, all {md5(url)}_chunk_0, _chunk_1, etc. vectors remain in Pinecone and will continue polluting search results.
Compare to the delete/trash path at line 5415, which correctly calls the chunk-aware MxChat_Utils::delete_chunks_for_url().
Issue 2: Same problem for WooCommerce products
mxchat_handle_product_delete at line 5629 also uses the non-chunk-aware mxchat_delete_from_pinecone_by_url(). Trashing or deleting a product with chunked content leaves orphaned chunk vectors.
Issue 3: Trashed permalink mismatch
When wp_trash_post fires, WordPress may have already appended __trashed to the post slug. In that case get_permalink($post_id) at line 5408 returns something like:
https://example.com/my-post__trashed/
Since the vector ID is md5($source_url), this won’t match the original vector ID stored in Pinecone. The delete request targets a non-existent ID and the real entries silently survive.
This affects both mxchat_handle_post_delete and mxchat_handle_product_delete.
Additional information on my previous post:
- Using Pinecone
- Auto Sync on
- I deleted a page and it is still in the knowledge base
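For the trashed-permalink mismatch (Issue 3 above), one option is to strip the suffix before hashing, so the delete targets the ID that was actually stored. This is a minimal sketch with a hypothetical function name. It assumes pretty permalinks where the slug is the last path segment (plain ?p=123 permalinks never contain the slug). As far as I know, WordPress core also saves the original slug in the _wp_desired_post_slug post meta when it adds the suffix, which could be a more robust source than string surgery.

```php
<?php
// Hypothetical helper, not MxChat code: undo the "__trashed" suffix on the
// last path segment so the URL hashes to the same ID it had when indexed.
// Assumes pretty permalinks (slug is the last path segment).
function sketch_untrashed_url(string $url): string
{
    // Remove "__trashed" only at the very end, keeping an optional trailing slash.
    return preg_replace('#__trashed(/?)$#', '$1', $url);
}

$trashed   = 'https://example.com/my-post__trashed/';
$vector_id = md5(sketch_untrashed_url($trashed)); // same as md5('https://example.com/my-post/')
```

URLs without the suffix pass through unchanged, so the helper can be applied unconditionally on the delete paths.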
Hi,
Thanks for your answer.
To clarify, I think that if some content is in the knowledge base, the knowledge entries associated with it should be updated automatically. Otherwise, if I add content to the knowledge base and then update that content with auto sync turned off, the knowledge entry becomes outdated.
I understand my ideal workflow differs from that of other MxChat users, but a simple hook might do the trick. I’ll send you a code snippet when I have some time.
However, I had auto sync turned on for “Pages”, and the knowledge entries for those pages did not get updated. I am using Pinecone. Something might be wrong in the current implementation.
Thanks for all the work,
Samy
Here is some context to help you understand what I am aiming to do:
Our website pages contain a lot of shortcodes, and sometimes stripping them isn’t enough: shortcode sections are mixed with genuinely useful information. My goal is to add an AI preprocessing step that uses an LLM to extract the useful information, restructure it, and optimize it for retrieval before it is embedded into the knowledge base.
Since you’re currently retrieving the post content with get_post() and then $post->post_content, I have no way to intercept it, for example via the the_content filter.
You’re welcome, happy to help MxChat become better!
Thank you for the fixes!
Hi @m4xw3ll,
The issue is indeed related to the WordPress import option; sorry if my explanation wasn’t clear enough.
Anyway, thanks for the update!
Forum: Plugins
In reply to: [MxChat - AI Chatbot & Content Generation for WordPress] Dynamic Prompt Data
Closing this topic and opening a dedicated one about dynamic data / shortcodes inside the system prompt.
Thanks again for your support, @m4xw3ll!
Thanks!
Happy to help enhance the product 🚀
Forum: Plugins
In reply to: [MxChat - AI Chatbot & Content Generation for WordPress] Dynamic Prompt Data
Hi Maxwell,
Following up on our conversation, would it be possible for you to add a hook that allows altering system_prompt_instructions or, maybe simpler, to allow shortcodes inside the system prompt (not in the current message context)?
This way we would be able to add dynamic data into the system prompt. I’m currently forced to use a mu-plugin and hook into the get_options, which is not ideal.
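For illustration, the kind of dynamic data I mean could be as simple as placeholder substitution in the prompt text. This is a plain-PHP stand-in, not existing MxChat behaviour: the {{name}} token syntax and the function name are hypothetical, and with real shortcode support WordPress’s do_shortcode() would play this role instead.

```php
<?php
// Hypothetical sketch: expand {{placeholder}} tokens in a system prompt with
// values computed at request time. Not an existing MxChat feature.
function sketch_expand_prompt(string $prompt, array $values): string
{
    $pairs = [];
    foreach ($values as $key => $value) {
        $pairs['{{' . $key . '}}'] = (string) $value;
    }
    // strtr() replaces all tokens in one pass; inserted values are not re-scanned,
    // and tokens without a value are left untouched.
    return strtr($prompt, $pairs);
}

// In practice the values would be computed per request (current date, next event, ...).
$prompt = sketch_expand_prompt(
    'Today is {{date}}. Next event: {{next_event}}.',
    ['date' => '2024-05-01', 'next_event' => 'Open day']
);
```

Exposed through a filter on the stored instructions, this would remove the need for a mu-plugin hooking into the option lookup.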
Best,
Samy
I understand your point of view, and it does make sense 👍
Keeping the full 24h context can definitely help the AI maintain continuity as users navigate and ask related questions.
That said, in my opinion, whatever strategy we choose (current conversation only vs full 24h window), the conversation history shown to the user should always match the one sent to the AI. Otherwise, it can lead to confusing or “unexplained” AI behavior from the user’s perspective, since the model may be reacting to messages the user cannot see.
I also strongly believe we should keep some conversation history, because users very often ask follow-up questions and expect the assistant to remember what was just discussed.
My personal suggestion:
- When chat persistence is off: only show current conversation messages, and only send those to the AI.
- When chat persistence is on: show the full 24h message history to the user and send this history to the AI.
Ideally, this would be based on some kind of browser session rather than an arbitrary 24h window, though I understand that may not be an easy task.
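The key point of this suggestion is that a single selection function decides the history, and its result is used both for rendering and for the API request, so the two cannot diverge. A minimal sketch, assuming a hypothetical message shape (conversation_id, timestamp in seconds, text) and the 24h window expressed as 86400 seconds; none of these names come from MxChat.

```php
<?php
// Hypothetical sketch of the proposed policy. The returned list is meant to be
// used BOTH for display and for the AI request.
function sketch_history_for_request(
    array $messages,
    bool $persistence_on,
    string $current_conversation_id,
    int $now,
    int $window_seconds = 86400
): array {
    return array_values(array_filter(
        $messages,
        function (array $m) use ($persistence_on, $current_conversation_id, $now, $window_seconds): bool {
            if ($persistence_on) {
                // Persistence on: everything inside the 24h window.
                return $now - $m['timestamp'] <= $window_seconds;
            }
            // Persistence off: only the current conversation.
            return $m['conversation_id'] === $current_conversation_id;
        }
    ));
}

$now = 1700000000;
$messages = [
    ['conversation_id' => 'a', 'timestamp' => $now - 90000, 'text' => 'old question'],
    ['conversation_id' => 'a', 'timestamp' => $now - 3600, 'text' => 'earlier today'],
    ['conversation_id' => 'b', 'timestamp' => $now - 60, 'text' => 'follow-up'],
];

$shown_off = sketch_history_for_request($messages, false, 'b', $now); // current conversation only
$shown_on  = sketch_history_for_request($messages, true, 'b', $now);  // last 24h
```

Swapping the 24h window for a browser session would only change the filter condition, not the “one list for both” rule.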
- This reply was modified 3 months, 4 weeks ago by samymsa.
Forum: Plugins
In reply to: [MxChat - AI Chatbot & Content Generation for WordPress] Dynamic Prompt Data
I’ll try this, thanks for your quick answer!
Forum: Plugins
In reply to: [MxChat - AI Chatbot & Content Generation for WordPress] Dynamic Prompt Data
I see there is an mxchat_prepare_context filter I could maybe use.
Thanks so much!
For your information, here is the code I added:
In your plugin, inside mxchat-basic/admin/class-knowledge-manager.php line 2792:
// Apply filter to allow modification of post data before processing
// This allows me and other developers to alter the behaviour without modifying MxChat code
$post = apply_filters('mxchat_process_post_data', $post);
In my custom theme:
/**
 * Enrich post_content with whitelisted post meta for MXChat embeddings
 */
function mxchat_enrich_post_content_with_meta($post)
{
    $meta_map = [
        // EVENT META KEYS
        'event_start_date' => 'Start Date',
        'event_start_time' => 'Start Time',
        'event_end_date'   => 'End Date',
        'event_end_time'   => 'End Time',
    ];
    // Return early if not a WP_Post object
    if (!($post instanceof WP_Post)) {
        return $post;
    }
    $lines = [];
    foreach ($meta_map as $meta_key => $label) {
        $value = get_post_meta($post->ID, $meta_key, true);
        if (empty($value)) {
            continue;
        }
        // Normalize value
        if (is_array($value)) {
            $value = implode(', ', array_map('wp_strip_all_tags', $value));
        } else {
            $value = wp_strip_all_tags((string) $value);
        }
        if ($value !== '') {
            $lines[] = $label . ': ' . $value;
        }
    }
    if (!empty($lines)) {
        $post->post_content .= "\n\n" . implode("\n", $lines);
    }
    return $post;
}
add_filter('mxchat_process_post_data', 'mxchat_enrich_post_content_with_meta');
Feel free to take the best approach 😉
- This reply was modified 4 months, 1 week ago by samymsa.
Actually, the embeddings should probably never be cached: if the prompt is different, the relevant content will be different too (and the same goes for current_valid_urls).
For now I clear the cache on every request. I suggest either not caching the embeddings at all or reworking the caching logic.
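One way to rework the caching logic rather than dropping it: derive the cache key from the exact text being embedded (plus the model), so a different prompt can never reuse another prompt’s embedding, and recompute anything derived from it, such as current_valid_urls, per request. A minimal in-memory sketch; the class name and the $embed callable (standing in for the real embedding API call) are hypothetical. In WordPress the same key could be used with get_transient()/set_transient().

```php
<?php
// Hypothetical sketch: embedding cache keyed on model + exact text.
final class SketchEmbeddingCache
{
    private array $store = [];

    public function get(string $model, string $text, callable $embed): array
    {
        // Different text (or model) => different key => guaranteed cache miss.
        $key = 'emb_' . md5($model . "\0" . $text);
        if (!array_key_exists($key, $this->store)) {
            $this->store[$key] = $embed($text);
        }
        return $this->store[$key];
    }
}

$calls = 0;
$fake_embed = function (string $text) use (&$calls): array {
    $calls++;
    return [strlen($text)]; // toy "embedding" for the demo only
};

$cache = new SketchEmbeddingCache();
$a = $cache->get('some-model', 'hello', $fake_embed);
$b = $cache->get('some-model', 'hello', $fake_embed);   // cache hit, no new call
$c = $cache->get('some-model', 'goodbye', $fake_embed); // different prompt, new call
```

Repeated identical questions still benefit from the cache, while any change in the prompt triggers a fresh embedding and a fresh retrieval.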