Thread Starter
Sim2K
(@sim2k)
Does anyone know the answer to any of these? I might just try them all out but I don’t want to break anything.
Plugin Author
senols
(@senols)
Hello @sim2k,
Thank you for your feedback on the plugin, I’m glad to hear that you like it 🙂
Regarding your questions,
Our plugin uses the text-embedding-ada-002 model for embeddings, which has a maximum input limit of 8,191 tokens according to the OpenAI documentation. Assuming roughly 750 words per 1,000 tokens, that works out to a maximum of around 5,000 to 6,000 words. However, I haven't tested this, so it may vary. I suggest embedding data in smaller splits instead of as huge chunks.
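To make the arithmetic concrete: 8,191 tokens × 750/1,000 ≈ 6,100 words. The sketch below shows one way to split a long page into word-count chunks before embedding. It relies only on the rough 750-words-per-1,000-tokens heuristic above, not on a real tokenizer (OpenAI's tiktoken library gives exact counts), and `split_into_chunks` and `estimate_tokens` are hypothetical helpers, not part of the plugin.

```python
# Rough heuristic from the thread: ~1,000 tokens per 750 words.
# This is an estimate only; a real tokenizer will give different counts.
TOKENS_PER_750_WORDS = 1000
EMBEDDING_TOKEN_LIMIT = 8191  # text-embedding-ada-002 input limit


def estimate_tokens(text: str) -> int:
    """Estimate a token count from the word count (heuristic, not a tokenizer)."""
    words = len(text.split())
    return round(words * TOKENS_PER_750_WORDS / 750)


def split_into_chunks(text: str, max_words: int = 500) -> list[str]:
    """Split text into consecutive chunks of at most max_words words each."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


if __name__ == "__main__":
    page = "word " * 1200  # stand-in for a 1,200-word page
    chunks = split_into_chunks(page, max_words=500)
    print(len(chunks))            # 3 chunks: 500, 500 and 200 words
    print(estimate_tokens(page))  # 1600
    # Every chunk is comfortably under the embedding model's input limit.
    print(all(estimate_tokens(c) <= EMBEDDING_TOKEN_LIMIT for c in chunks))
```

Smaller chunks also keep each search hit small, which matters later when the hits are pasted into the chat prompt.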
If you are using embeddings, then all data goes to Pinecone, because embeddings only work with a vector database, and we use Pinecone as an external long-term memory for the bot. When you use free text, the plugin simply adds whatever you enter. When you use FAQs, it prefixes the question with "Question" and the answer with "Answer". When you use the knowledge base builder, it adds the topic and description. This is just to save users from having to type those labels themselves.
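A minimal sketch of the FAQ labelling described above. The exact format the plugin uses is not stated in this thread, so the `Question: ... / Answer: ...` layout and the `format_faq` name are assumptions for illustration only.

```python
def format_faq(question: str, answer: str) -> str:
    """Label an FAQ pair before embedding it (hypothetical format)."""
    # Surrounding whitespace is trimmed so the labels line up cleanly.
    return f"Question: {question.strip()}\nAnswer: {answer.strip()}"


def format_free_text(text: str) -> str:
    """Free text is stored as entered, with no labels added."""
    return text


print(format_faq("Do you ship abroad?", "Yes, to most countries."))
```

The point of the labels is that the stored text reads like a Q&A pair when it is later retrieved and placed in the prompt.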
Currently, separate bot creation for each page is not supported, but I’m working on it. However, if you are going to use embeddings or an excerpt for the context, then I don’t see the need to create separate bots for each page.
You can create a single index, and the bot will answer based on the content relevant to the question. There is no need to create different collections or indexes.
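Why one index is enough: a vector search ranks every stored chunk by similarity to the question's embedding and returns only the closest ones, whichever page they came from. The sketch below imitates that with a tiny in-memory dictionary and cosine similarity. It is a stand-in for a Pinecone query, not Pinecone's API, and the IDs and 3-dimensional vectors are made up (real ada-002 embeddings have 1,536 dimensions).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the IDs of the k stored vectors most similar to the query."""
    ranked = sorted(
        index,
        key=lambda vid: cosine_similarity(query, index[vid]),
        reverse=True,
    )
    return ranked[:k]


# One shared index holding chunks from every page (toy vectors).
index = {
    "pricing-page#1": [0.9, 0.1, 0.0],
    "shipping-page#1": [0.1, 0.9, 0.0],
    "about-page#1": [0.0, 0.1, 0.9],
}

print(top_k([0.8, 0.2, 0.1], index, k=1))  # ['pricing-page#1']
```

Because ranking happens per question, chunks from unrelated pages simply score lower and are left out, which is why per-page indexes are not needed for relevance.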
Thread Starter
Sim2K
(@sim2k)
So I got some data into Pinecone, and one embedding was 6,500 words (38,709 characters). That went in OK, but whenever I asked a question whose answer was in that embedding, I got this response:
“This model's maximum context length is 4096 tokens. However, your messages resulted in 5941 tokens. Please reduce the length of the messages.”
Thread Starter
Sim2K
(@sim2k)
Questions against smaller embeddings brought back data OK.
Maybe I really need to break it down into chunks, as I'm sure a response of 4096 tokens is going to be expensive even if I do get a proper response.
I wonder if I can just get a response by itself, with no embedding + completion, just a completion. Would that help?
Yeah, the problem is that after you get the embedding search result from Pinecone, you add the original text into your current prompt, which has a smaller token limit (4,096 tokens, or roughly 16,000 characters).
So Senol's answer is wrong. You really need to stick to a limit of around 12,000 characters. And if the search brings back more than one big embedding result, the same thing will happen.
Senol probably needs to limit how much of the embedding search result is added to the prompt, and also put a hard limit of around 12,000 characters on the embeddings themselves.
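A minimal sketch of that suggestion, assuming the search results arrive as plain strings ordered best first: cap each stored chunk, then stop adding results once the context reaches about 12,000 characters. The constants and the `build_context` name are hypothetical. At roughly 4 characters per token, 12,000 characters is about 3,000 tokens, which leaves headroom in a 4,096-token window for the question and the reply.

```python
MAX_CONTEXT_CHARS = 12_000  # overall budget suggested in the thread
MAX_CHUNK_CHARS = 12_000    # hypothetical hard cap on any single result


def build_context(results: list[str], budget: int = MAX_CONTEXT_CHARS) -> str:
    """Join search results (best first) without exceeding `budget` characters."""
    separator = "\n\n"
    parts: list[str] = []
    used = 0
    for text in results:
        text = text[:MAX_CHUNK_CHARS]  # enforce the per-chunk cap
        sep_len = len(separator) if parts else 0
        if used + sep_len + len(text) > budget:
            # Fill whatever room is left with the start of this result, then stop.
            remaining = budget - used - sep_len
            if remaining > 0:
                parts.append(text[:remaining])
            break
        parts.append(text)
        used += sep_len + len(text)
    return separator.join(parts)


# Two large hits: the first fits whole, the second is cut to the remaining room.
context = build_context(["a" * 8000, "b" * 8000])
print(len(context))  # 12000
```

This version truncates the last result that does not fit; dropping it entirely is an equally reasonable choice if a half-chunk would confuse the model.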
Plugin Author
senols
(@senols)
If you are receiving "This model's maximum context length is 4096 tokens…", please try reducing your max tokens setting.
Also, I am using Davinci instead of Turbo for chat, as I found that Davinci is much better, strangely.
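For context on the max tokens advice: the model's context window is shared between the prompt and the completion, so the requested `max_tokens` has to fit in whatever the prompt leaves over. The sketch below does that arithmetic, assuming the prompt's token count is already known; `completion_budget` is a hypothetical helper, not a plugin or OpenAI function.

```python
CONTEXT_LIMIT = 4096  # context length quoted in the error message above


def completion_budget(prompt_tokens: int, requested_max_tokens: int,
                      context_limit: int = CONTEXT_LIMIT) -> int:
    """Return a max_tokens value that keeps prompt + completion within the limit."""
    available = context_limit - prompt_tokens
    if available <= 0:
        # Lowering max_tokens cannot help once the prompt alone is too long.
        raise ValueError(
            f"prompt alone is {prompt_tokens} tokens; trim the context first"
        )
    return min(requested_max_tokens, available)


print(completion_budget(3000, 2000))  # 1096: the request is capped to fit
```

With the 5,941-token prompt from the error quoted earlier, `completion_budget(5941, 500)` raises, since no `max_tokens` value leaves room once the prompt by itself exceeds the limit; in that case the retrieved context has to shrink.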
There is a good discussion here:
https://community.openai.com/t/gpt-3-5-turbo-vs-text-davinci-003-chatbot/82806