Hi @zwalton,
There is a limit for embedding models, measured in tokens rather than characters, but it's rather large. When selecting an embedding model, you can see its token limit listed next to it. Here is a list as well:
- TE3 Small: 8K context (8,192 tokens)
- Ada 2: 8K context (8,192 tokens)
- TE3 Large: 8K context (8,192 tokens)
- Voyage-3 Large: 32K context (32,768 tokens)
- Gemini Embedding: 8K context (8,192 tokens)
Voyage has the largest token window, so if your content is getting cut off because of the limit, you could try that model. If you already have embeddings and switch to a new model, though, you will have to delete your entire database and start over using only the new embedding model. Can you also tell me which model you currently have selected, and whether you are using default WP or Pinecone?
Once again, though, 8K tokens is quite a bit, so the cause could be something else unless your website pages are VERY long.
For example:
8K tokens (most models):
- ~6,000 words of typical English text (at roughly 0.75 words per token)
- ~30,000-40,000 characters including spaces
- Web content equivalent: About 15-20 typical blog posts, or 3-4 long-form articles, or 1-2 very detailed product pages
32K tokens (Voyage-3 Large):
- ~24,000 words of typical English text (at roughly 0.75 words per token)
- ~120,000-160,000 characters including spaces
- Web content equivalent: About 60-80 blog posts, or 10-15 long-form articles, or an entire small website’s worth of content
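If you want to check a page before importing it, here is a minimal sketch of the arithmetic above. It assumes roughly 4 characters per token, which is only a rule of thumb for English text and not what any real tokenizer returns, so treat the result as a ballpark. The function names (`estimate_tokens`, `models_that_fit`, `parts_needed`) are made up for illustration and are not part of the plugin.

```python
import math

# Token limits copied from the list above.
TOKEN_LIMITS = {
    "TE3 Small": 8192,
    "Ada 2": 8192,
    "TE3 Large": 8192,
    "Voyage-3 Large": 32768,
    "Gemini Embedding": 8192,
}

# Assumption: about 4 characters per token for typical English text.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (a heuristic, not a tokenizer)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def models_that_fit(text: str) -> list[str]:
    """Names of the models whose limit is at least the estimated token count."""
    estimated = estimate_tokens(text)
    return [name for name, limit in TOKEN_LIMITS.items() if estimated <= limit]


def parts_needed(text: str, limit: int) -> int:
    """How many separate imports the text would need to stay under the limit."""
    return max(1, math.ceil(estimate_tokens(text) / limit))


if __name__ == "__main__":
    page = "a" * 40_000  # stand-in for a 40,000-character page
    print(estimate_tokens(page))     # estimated token count
    print(models_that_fit(page))     # models that could take the page whole
    print(parts_needed(page, 8192))  # imports needed for an 8K model
```

A page that comes out over the limit for your selected model would need to be split into that many parts, or imported with a larger-window model.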
Apologies, I have been out of town. I had to break up my large page into 2 different imports, which was not a big deal.
I’m just using the default settings.
Thanks.
This reply was modified 10 months, 1 week ago by zwalton.