Split options and facet count
-
This plugin seems to split each WordPress post into multiple records, which are then stored in Algolia. Our average record size is 2.65 KB. I think 10 KB is the limit?
Can this splitting be reduced so that we store fewer records for each post in Algolia?
Also, when we use the instantsearch display page with the suggested code and instantiate the widgets, the count shown there is the number of records (including the split ones), not the number of search results. Any suggestions on how to fix this?
For example, the post type facet shows Posts 9,505 (we have around 1,600 posts). Clicking on it returns "1677 results found", which is accurate. Those 1,677 posts have 9,505 records because they're being split.
-
Regarding the content splitting part, it looks like we handle that at https://github.com/WebDevStudios/wp-search-with-algolia/blob/2.8.1/includes/class-algolia-utils.php#L232-L261
Most specifically, it's controlled by the ALGOLIA_CONTENT_MAX_SIZE constant. We default the record-splitting max size to 2000 bytes, and we urge you to only reduce that value with the constant, never increase it; otherwise your queue will break.

That said, based on what I'm seeing, leaving it at that amount would help keep the record totals as minimal as possible. If you're averaging 2.65 KB, it makes sense that most of those posts would need 2 records instead of one. However, it would be very easy to start publishing longer post content in the future, forget that ALGOLIA_CONTENT_MAX_SIZE was changed, and end up going over that 10 KB limit.

Regarding the facet, it's hard to say offhand, as we're using pretty basic settings for the menu instantsearch.js widget: https://www.algolia.com/doc/api-reference/widgets/menu/js/ That feels more like behavior from Algolia themselves. I'm seeing the same behavior on my local install as well, so you're definitely not alone on that part.
Based on this specific spot in the documentation, https://www.algolia.com/doc/api-reference/widgets/menu/js/#widget-param-attribute, the widget is meant to show counts of records.
That said, I’m checking on something with that and will try to circle back soon.
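A small Python sketch may help show why the menu count and the hit count disagree. It is not the plugin's or instantsearch.js's code; it assumes each split record carries its parent post's ID in an attribute named post_id and its type in post_type (names assumed for illustration), and compares counting a facet value per record with counting it per distinct post.

```python
from collections import Counter, defaultdict


def facet_counts(records, attribute):
    """Count facet values once per record, which matches the behavior described above."""
    return Counter(r[attribute] for r in records)


def distinct_facet_counts(records, attribute, distinct_key="post_id"):
    """Count facet values once per distinct parent post instead of per split record."""
    posts = defaultdict(set)
    for r in records:
        posts[r[attribute]].add(r[distinct_key])
    return {value: len(ids) for value, ids in posts.items()}


# Hypothetical index: post 1 was split into 3 records, post 2 into 2, page 3 into 1.
records = [
    {"post_id": 1, "post_type": "post"},
    {"post_id": 1, "post_type": "post"},
    {"post_id": 1, "post_type": "post"},
    {"post_id": 2, "post_type": "post"},
    {"post_id": 2, "post_type": "post"},
    {"post_id": 3, "post_type": "page"},
]

print(facet_counts(records, "post_type"))           # Counter({'post': 5, 'page': 1})
print(distinct_facet_counts(records, "post_type"))  # {'post': 2, 'page': 1}
```

The first count is the "Posts 9,505" situation in miniature; the second is the per-post number the question asks for.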
Thanks for the quick response and sorry about my late reply.
With our average record size of 2.65 KB, ~1,600 posts require ~9,000 records, which works out to about 5.6 records per post. If the recommended size per record is up to 10 KB, can't I set ALGOLIA_CONTENT_MAX_SIZE to 6000, which would reduce the number of records by a lot? The other attributes apart from the post content shouldn't be enough to push a record above 10 KB.

> We default the record splitting max size to 2000 byte characters, and we implore to only reduce that value with the constant, instead of increasing otherwise your queue will break.

Would the queue break if I set it to 6000 bytes, even if the total size stays below 10 KB?
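The records-per-post arithmetic above can be sanity-checked with a rough Python estimate. It assumes one record per max_size bytes of post content (at least one record per post) and ignores word-boundary cuts and the other attributes, so treat the results as ballpark figures. The post sizes below are hypothetical, not taken from this thread.

```python
import math


def records_needed(content_bytes: int, max_size: int) -> int:
    """Rough estimate: one record per max_size-byte chunk of content, minimum one."""
    return max(1, math.ceil(content_bytes / max_size))


def total_records(post_sizes, max_size):
    """Sum the per-post estimates over a collection of post-content sizes (bytes)."""
    return sum(records_needed(size, max_size) for size in post_sizes)


# Ratio quoted in the thread: ~9,000 records over ~1,600 posts.
print(round(9000 / 1600, 1))  # 5.6

# Hypothetical post-content sizes in bytes.
sizes = [1500, 4200, 11200, 25000]
print(total_records(sizes, 2000))  # 23
print(total_records(sizes, 6000))  # 9
```

Going from 2000 to 6000 roughly divides the record count by the size ratio for long posts, while short posts stay at one record either way.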
Thanks for confirming the facet behaviour. I'll hide it until I find a way to show the number of posts/search results instead of the number of records.

> The queue would break if I set it to 6000 byte characters even if the total size is below 10 KB?
I have to assume so.
For some context, our plugin is a fork of a plugin that Algolia themselves originally created, and there are a good number of parts that we haven't changed from their original work. This includes the explode_content() function. The original from Algolia's code base: https://github.com/algolia/algoliasearch-wordpress/blob/master/includes/class-algolia-utils.php#L162-L191

So I have to believe that if they coded it this way, it was intentional. That said, it could be worth trying an increase to see what happens. Perhaps, though, try it on a test/dev install and indexes first so that you don't "wreck" production usage.
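For anyone weighing the trade-off, the splitting idea can be sketched in Python. This is an approximation written for illustration, not a port of explode_content(): it assumes the limit is counted in UTF-8 bytes and that each chunk should break at the last space before the limit, falling back to a hard cut on a character boundary when there is no space.

```python
def explode_content(content: str, max_size: int = 2000) -> list[str]:
    """Split text into chunks of at most max_size UTF-8 bytes, breaking at spaces when possible."""
    if max_size < 4:
        raise ValueError("max_size must be at least 4 bytes (one UTF-8 character)")
    parts = []
    data = content.strip().encode("utf-8")
    while len(data) > max_size:
        # Prefer the last space at or before the byte limit.
        cut = data.rfind(b" ", 0, max_size + 1)
        if cut <= 0:
            # No usable space: hard cut, backed off to the start of a UTF-8 character.
            cut = max_size
            while (data[cut] & 0xC0) == 0x80:
                cut -= 1
        parts.append(data[:cut].strip().decode("utf-8"))
        data = data[cut:].strip()
    if data:
        parts.append(data.decode("utf-8"))
    return parts


text = "word " * 1000  # about 5 KB of content
print(len(explode_content(text, 2000)))  # 3
print(len(explode_content(text, 6000)))  # 1
```

With the larger limit the same text fits in a single chunk, which is the effect the posts below report at the record-count level.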
I tried it on a dev install and it didn't wreck anything, so I pushed it live, and so far it is OK. I'll be monitoring it, though.
I changed $max_size from 2000 to 6000. I'm down to ~4k records from ~10k, with an average record size of 5.3 KB now compared to 2.6 KB earlier.

Rooting for you. This probably isn't something we're going to change the default for, but it's definitely good to know that it may be workable for certain users' situations and needs.
I’m indexing PDF content and stumbled upon this thread (I’ve got really long PDFs and they’re being split into a really large number of parts).
@sgrx How is the 6000 setting working out for you? Do you have many other attributes being indexed that contribute to your record size?
@tw2113 Do you have any idea what the behaviour is if the content is chunked as such:
Chunk 1: Algolia is a French proprietary search-as-a-service platform, with its headquarters in San
Chunk 2: Francisco and offices in Paris and London. Its main product is a web search platform for individual websites.

If a user comes along and searches for a string that spans chunks, e.g. "with its headquarters in San Francisco and offices in Paris", how does it behave with split chunks? Would it find two items as fairly low partial matches and then de-duplicate them to return one item?
Based on my experience, I believe it would still return one result.
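The "one result" outcome is what you'd expect if hits are de-duplicated per post, as Algolia's distinct feature does when records share a post-level attribute. A minimal Python sketch of just that de-duplication step, assuming hits arrive already ranked and each carries a post_id (an assumed attribute name): only the best-ranked record per post is kept. It does not model whether a query spanning two chunks matches either chunk in the first place; that depends on the index's query settings.

```python
def dedupe_hits(ranked_hits, key="post_id"):
    """Keep only the first (best-ranked) hit for each distinct key, preserving order."""
    seen = set()
    result = []
    for hit in ranked_hits:
        if hit[key] not in seen:
            seen.add(hit[key])
            result.append(hit)
    return result


# Hypothetical ranked hits: two chunks of post 42 and one chunk of post 7.
hits = [
    {"post_id": 42, "chunk": 1},  # chunk ending "...headquarters in San"
    {"post_id": 42, "chunk": 2},  # chunk starting "Francisco and offices..."
    {"post_id": 7, "chunk": 1},
]
print([(h["post_id"], h["chunk"]) for h in dedupe_hits(hits)])  # [(42, 1), (7, 1)]
```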
Had a bit of a budget shock going from Build (1m records included) to Grow (100k records included).
Across all environments we had about 950k records (including replicas for custom sorting). At what seems to be about $40 per 100k records per month beyond the 100k included, that comes to $340+ a month!
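The $340 figure follows if the 100k included records are subtracted first and the remaining 850k are billed pro rata at about $40 per 100k. Both rates come from the post above, not from Algolia's price list, and pro-rata billing is an assumption, so this Python sketch is arithmetic only.

```python
def monthly_overage(total_records: int, included: int = 100_000,
                    unit: int = 100_000, price_per_unit: float = 40.0) -> float:
    """Cost of records beyond the included quota, assuming pro-rata billing per unit."""
    extra = max(0, total_records - included)
    return extra / unit * price_per_unit


print(monthly_overage(950_000))  # 340.0
```

Halving the record count (e.g. via a larger ALGOLIA_CONTENT_MAX_SIZE) shrinks only the part above the included quota, which is why the savings can be larger than they first look.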
My average record size was about 4.5 KB, so I'm currently trialing an increase of ALGOLIA_CONTENT_MAX_SIZE to 4,000. A couple of the records are bigger than 10 KB, but the max record size on Grow is 100 KB (with a maximum average record size of 10 KB).
Seems to be working OK thus far!
@may73alliance I thought I had already replied to this.
Algolia doesn’t have any published best practices, but they agreed with my ideas. To summarize:
- Clear out any test/old indices that may be in the application.
- Remove local development indices/records, especially if you're not actively working on things.
- Limit how many environments are fully indexed. For example, your staging copy may not need much at all for some testing, but the production site should have as much indexed as is possible and intended.
- They also linked me to the cost management section at https://www.algolia.com/doc/guides/getting-started/what-is-algolia/#cost-management.
@graham73may The 6000 setting on $max_size is working well so far.
The site had fewer than 2k posts, and increasing this from 2000 reduced the number of records by 60%. Not sure about other attributes; I think it was just the post content. The average record size is around 5 KB now.

Just happy things are working here.

@sgrx
We've settled on 6,000 too; this has given us a 7.97 KB average record size and 43k records.
On the Build plan (10 KB max / 10 KB average), we needed 3,500 during development.
On the Grow plan (100 KB max / 10 KB average), the handful of larger outliers fit comfortably, so we were able to bump it up to 6,000.
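Since the posts above reason about both a per-record maximum and an average limit, here is a hypothetical Python helper for checking a set of record sizes against them. The limits are the ones reported in this thread (Build: 10 KB max / 10 KB average; Grow: 100 KB max / 10 KB average), not values read from Algolia, and the sample sizes are made up.

```python
from statistics import mean

# Limits in KB as reported in this thread; verify against Algolia's current plans.
PLAN_LIMITS_KB = {
    "build": {"max": 10, "avg": 10},
    "grow": {"max": 100, "avg": 10},
}


def fits_plan(record_sizes_kb, plan):
    """Return (ok, reasons) for record sizes (KB) against a plan's max and average limits."""
    limits = PLAN_LIMITS_KB[plan]
    reasons = []
    largest = max(record_sizes_kb)
    if largest > limits["max"]:
        reasons.append(f"largest record {largest} KB exceeds {limits['max']} KB")
    avg = mean(record_sizes_kb)
    if avg > limits["avg"]:
        reasons.append(f"average {avg:.2f} KB exceeds {limits['avg']} KB")
    return not reasons, reasons


sizes = [6.5, 7.0, 8.2, 9.1, 14.0]  # hypothetical sizes with one large outlier
print(fits_plan(sizes, "build")[0])  # False
print(fits_plan(sizes, "grow")[0])   # True
```

This mirrors the experience above: one outlier rules out a larger chunk size on Build, while Grow's higher per-record maximum tolerates it as long as the average stays under 10 KB.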
Awesome plugin!