Knowledge and Chat issues

bsmolyanov
(@bsmolyanov)

8 months, 3 weeks ago
Hi!

I bought and installed the plugin. Looks quite nice initially. Created a Pinecone database index and uploaded some 400 posts, no errors or notices, so I assume setup is done. Then, I noticed the following issues:

1) None of my questions got answered by the chat. I always get the generic reply that there is no such information in the database (the default answer set in the system prompt). By the way, all my content is in Cyrillic, but I believe this should not be an issue.

2) Then I tried the debugging panel and noticed there that the similarity threshold is 75%, whereas in the chatbot settings it was set to 35. Initially, I thought these were two separate values for different purposes, but I nevertheless decided to drop the value in the chatbot settings to 20. I continued testing with the debugging panel and noticed that the similarity threshold there was now also 20%. At this point I was able to get a couple of answers, but they fell far short of expectations. In any case, I do not see an option to go lower than 20%, and perhaps that is not a good idea anyway.

3) The debugging panel itself does not seem to refresh as new questions are typed in the chat window, despite being open next to it the whole time. It either does not react at all and stays empty, as if I were not having a conversation in the chat, or it picks up some of the questions “by chance” and then does not update on the following ones. I tried clearing the session and clearing the log, but nothing changed.

4) In the Knowledge section of the plugin’s settings I use WP Content (Recommended). Here I have a couple of observations as well.
- there is no option to run it in the background, so you have to do it manually, 50 posts at a time. I realize that perhaps this should be done via the Sitemap option, but in my case the content is partially gated (not visible on the front end to logged-out visitors), and if the Sitemap option views it as a visitor, then it will not work for me. Please kindly confirm.
- the content of my posts contains several shortcodes, which are not stripped, so they end up in the vector database.
- the posts in the vector database seem to be cut off in terms of content at some point. I would expect that the rest of a post’s content is stored as another entry in the database but I am not able to confirm this.
- the posts that are already uploaded to the vector database continue to be marked as “Not In Knowledge Base”, so it looks as if none of the posts were uploaded, which is not true. Also, the filter at the top right (All Content / In Knowledge Base / Not in Knowledge Base) does not work either.

5) You mentioned in the dev updates that the plugin now works with ACF, but how and where does this happen? I see no settings for this.

OK, I think that is enough for a start. Please let me know your thoughts and comments.

Best regards,
Boris
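
On the shortcode observation in point 4, here is a minimal sketch of cleaning post text before it is embedded. It is an illustration only, not the plugin's code: the function name `strip_shortcode_tags` and the regular expression are assumptions, and the pattern removes the bracketed tags while keeping any text enclosed between them. WordPress itself ships a PHP function, `strip_shortcodes()`, for registered shortcodes; whether the plugin calls it is not stated in this thread.

```python
import re

# Assumed pattern: matches opening, closing, and self-closing shortcode tags
# such as [gallery ids="1,2"], [/caption], or [divider /].
# The tag name must start with a letter, so numeric footnotes like [1] are kept.
SHORTCODE_TAG = re.compile(r"\[/?[A-Za-z][\w-]*(?:\s[^\[\]]*)?/?\]")


def strip_shortcode_tags(text: str) -> str:
    """Remove shortcode tags but keep the text enclosed between them."""
    without_tags = SHORTCODE_TAG.sub(" ", text)
    # Collapse the runs of spaces left behind by removed tags.
    return re.sub(r"[ \t]{2,}", " ", without_tags).strip()


cleaned = strip_shortcode_tags('[caption id="a"]Photo text[/caption] end')
print(cleaned)  # prints: Photo text end
```

A known limitation of this sketch is that ordinary bracketed prose such as "[see note]" also matches the pattern, so a production version would probably check tag names against the list of shortcodes actually registered on the site.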

Viewing 2 replies - 1 through 2 (of 2 total)

Plugin Support m4xw3ll
(@m4xw3ll)

8 months, 3 weeks ago
Hi @bsmolyanov,

Thanks for reaching out. Considering these questions are all directly related to only the free version, I will answer them here. There is a lot going on here in various different directions, so I’ll try to be concise.
1. It seems you answered this in your next question, but if you uploaded content into the knowledge database, and confirmed that content is showing, it means your similarity threshold is set too high. Lower it (as you did) and it will likely match. Matching scores vary depending on embedding model, language, and length of content.
2. This is actually a bug. The threshold is supposed to be 35% by default (previously 75%). I just checked on a fresh install: it shows 35 in the backend and 75 in the frontend until you adjust it, and then they align. Lowering it below 20% would mean the content you’re matching is not very semantically similar.
3. You likely have some “actions” enabled. When some actions are triggered the testing panel will not show the query. This is because depending on the action the query can be routed away from the testing panel and documents are not used on most actions, which means nothing would update in the top part of testing panel. You should be able to see which actions are being triggered when this occurs in the testing panel if you scroll down some. In all the testing I did, the testing panel updated on every query when no actions were triggered.
4. A. This is correct: there isn’t an option to process content in the background when using the WordPress import option, and 50 is currently the limit. CRON processing is often unreliable, and processing more than 50 at once can crash the website. The best method for large data sets would be the sitemap option but, as you mentioned, it wouldn’t have access to gated content. You can use both, so my best suggestion is to submit sitemaps for content that is publicly accessible and use the import option for content that isn’t.
The data isn’t chunked if it’s too long (which would be rare). I’ve not figured out a way to chunk data when it’s too long, or even just to chunk it for better responses, while maintaining things like sitemap submission and URL tracking. You can’t have two entries with the same URL. This is to prevent duplicates if you were to submit sitemaps over and over, as most users do: it will only update the content and not duplicate it. Also, with the autosync options, it looks up the URL in the database to update the content. If we were to chunk data from one URL into multiple entries, it would become very messy for things like sitemap and autosync. Considering those options are used very frequently and data being too large to submit is very infrequent, it has remained as is for now.
Only data imported via the WordPress import option will show as “In Knowledge Base”. This is stated at the top left of the WordPress importer: “(Content imported here will be tagged “In Knowledge Base”)”. I will review that and the dropdown, and update you once it is fixed if it is bugged.
5. MxChat auto-detects all ACF fields on posts/pages when processed or auto-synced, supports text/select/post objects/repeaters/flexible content, safely converts relationships to readable text, and adds the relevant ACF data to the knowledge base alongside other metadata.
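
To illustrate why the threshold in points 1 and 2 decides whether anything is retrieved, here is a minimal sketch. It assumes the score is a cosine similarity between the query embedding and each stored embedding, expressed as a percentage; the thread does not say which metric the plugin actually uses. The function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_matches(query_vec, docs, threshold_percent):
    """Return (doc_id, score) pairs at or above the threshold, best match first."""
    threshold = threshold_percent / 100.0
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    kept = [(doc_id, score) for doc_id, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy data (hypothetical): "a" is identical to the query, "b" is partly
# related (score about 0.71), and "c" is unrelated (score 0.0).
query = [1.0, 0.0]
docs = {"a": [1.0, 0.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}

print([doc_id for doc_id, _ in filter_matches(query, docs, 75)])  # prints: ['a']
print([doc_id for doc_id, _ in filter_matches(query, docs, 35)])  # prints: ['a', 'b']
```

With a 75% cutoff only the near-identical entry survives, which is consistent with a chat that always falls back to the default "no such information" reply; at 35% the partly related entry is returned as well.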
Thread Starter bsmolyanov
(@bsmolyanov)

8 months, 3 weeks ago
Hi @m4xw3ll ,

Thank you very much for the extensive reply!

1) Is there a way to set the similarity threshold below 20?
2) Regarding the MxChat Debug Panel: I have no “actions” added or enabled at all, but it is not working in my installation. Yesterday I saw it work a couple of times, but I have had no success since then.

3) About the ACF: I have now spotted how it works. The fields are included after the post’s content, but only if there is enough “space” for them. It would be great to be able to choose which “parts” of the post go to Pinecone: the_content and all ACF; only the_content; only the ACF; or some of the ACF. Perhaps the ACF could go into additional metadata fields, as the source_url does. That would make it much easier to manage the volume and nature of the data being sent to the vector database!

4) Most of my posts are quite long, and I see that when uploaded to Pinecone they are cut off at around 5,700 characters. Is there a way to change the size of the text metadata that goes into each record, or does it depend on the embedding model (“TE3 Large” in my case)?
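
As a rough illustration of how long posts could be split without losing per-URL tracking, here is a sketch under stated assumptions. It is not how MxChat works today: the record layout, the helper names `chunk_text` and `build_records`, and the chunk sizes are all hypothetical. The idea is that every chunk ID is derived from the post URL plus a chunk index, so resubmitting the same URL produces the same IDs and overwrites the earlier entries instead of duplicating them.

```python
import hashlib


def chunk_text(text, max_chars=2000, overlap=200):
    """Split text into pieces of at most max_chars, each overlapping the previous one."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks = []
    start = 0
    step = max_chars - overlap
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last piece already reaches the end of the text
        start += step
    return chunks


def build_records(source_url, text, max_chars=2000, overlap=200):
    """Build one record per chunk with a deterministic, URL-derived ID."""
    url_key = hashlib.sha256(source_url.encode("utf-8")).hexdigest()[:16]
    pieces = chunk_text(text, max_chars, overlap)
    return [
        {
            "id": f"{url_key}-{index}",  # same URL and index always give the same ID
            "metadata": {
                "source_url": source_url,
                "chunk_index": index,
                "chunk_count": len(pieces),
                "text": piece,
            },
        }
        for index, piece in enumerate(pieces)
    ]


# Hypothetical post of 5,700 characters.
records = build_records("https://example.com/long-post/", "x" * 5700)
print(len(records))  # prints: 4
```

One open issue with this approach is that a post which becomes shorter leaves stale higher-numbered chunks behind. Storing `chunk_count` in the metadata makes it possible to detect and delete them on the next sync.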

Thank you!
- This reply was modified 8 months, 3 weeks ago by bsmolyanov.

The topic ‘Knowledge and Chat issues’ is closed to new replies.