Forum Replies Created

Viewing 4 replies - 1 through 4 (of 4 total)
  • I have a slightly odd take on this… I was implementing Algolia on a site that previously had Relevanssi Premium, which was already set up to index PDFs.

    Long story short, I could use the usual Algolia indexing hooks and just piggyback off Relevanssi’s PDF indexing server, which means I don’t need to pay for or maintain something myself to do this. It’s working great.

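    // Have Relevanssi process the PDF attachment.
    relevanssi_index_pdf( $attachment_id, false, false );

    // Read the extracted text back from post meta so it can be added to the Algolia record.
    $content = get_post_meta( $attachment_id, '_relevanssi_pdf_content', true );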

    @sgrx 👍

    We’ve settled on 6,000 too. That has given us a 7.97KB average record size across 43k records.

    The Build plan’s limits (10KB max / 10KB average) meant we needed 3,500 during development.

    The Grow plan’s limits (100KB max / 10KB average) cope better with the handful of larger outliers, so we were able to bump it up to 6,000.
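
    For anyone tuning the same number, here is a rough, plugin-independent sketch of how the split size feeds into record size. It assumes the setting is a character count per content chunk and that every record repeats the same shared attributes. `estimate_record_sizes()` and the sample data are made up for illustration and are not part of the plugin or of Algolia's API.

    ```php
    <?php
    /**
     * Estimate the JSON size of the records produced by splitting content.
     *
     * Hypothetical helper for illustration only.
     *
     * @param string $content    Plain-text content to split.
     * @param array  $shared     Attributes repeated on every record.
     * @param int    $split_size Maximum characters per chunk (assumed meaning of the setting).
     *
     * @return array Record count plus average and maximum record size in bytes.
     */
    function estimate_record_sizes( $content, array $shared, $split_size ) {
        if ( '' === $content ) {
            return array( 'records' => 0, 'average' => 0, 'max' => 0 );
        }

        // mb_str_split() needs PHP 7.4+ and splits by characters, not bytes.
        $chunks = mb_str_split( $content, $split_size );
        $sizes  = array();

        foreach ( $chunks as $chunk ) {
            // Each record carries the shared attributes plus one chunk of content.
            $record  = array_merge( $shared, array( 'content' => $chunk ) );
            $sizes[] = strlen( json_encode( $record ) );
        }

        return array(
            'records' => count( $sizes ),
            'average' => array_sum( $sizes ) / count( $sizes ),
            'max'     => max( $sizes ),
        );
    }

    $shared  = array( 'post_title' => 'Annual report', 'post_type' => 'attachment' );
    $content = str_repeat( 'Lorem ipsum dolor sit amet. ', 1000 ); // 28,000 characters.

    foreach ( array( 3500, 6000 ) as $split_size ) {
        $stats = estimate_record_sizes( $content, $shared, $split_size );

        printf(
            "%d: %d records, avg %.2fKB, max %.2fKB\n",
            $split_size,
            $stats['records'],
            $stats['average'] / 1024,
            $stats['max'] / 1024
        );
    }
    ```

    Running something like this over a sample of real posts gives a feel for how close a given split size gets to the plan limits. Multi-byte text, JSON escaping and extra attributes all push the byte size above the raw character count.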

    Awesome plugin!

    I had an interesting day solving this on one of our websites. The crux of the problem was that we are using WPML’s “show translation or show the default language as a fallback” option.

    If you have everything set to “translate only”, things are pretty simple: you can index the locale of each post and then set your filter to (locale:en-GB).

    With the fallback mode things were trickier, as the conditions were too complex for Algolia’s filters. They would require something like locale:en-gb OR (available_languages !== en-gb AND is_original_translation) (pseudocode).

    To get around this, and to move all of the processing to the PHP side before indexing, I’ve added an attribute called “languages_to_show_on” to my index, and my filter is now simply:

    $current_language = wpml_get_current_language();
    $filters[]        = "languages_to_show_on:$current_language";

    If you’re viewing the French site, for example, this allows for:

    • Show French item (no other languages)
    • Show French item (also available in English / Spanish but these are not shown)
    • Show English item (also available in Spanish but not French, so the original item in English is shown).
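
    The three cases above can be checked without WPML using a small standalone sketch of the same rule. The function name, its parameters and the sample language codes are hypothetical and exist only for this illustration.

    ```php
    <?php
    /**
     * Work out which site languages an item should appear on in fallback mode.
     *
     * Hypothetical, WPML-free helper for illustration only.
     *
     * @param string   $item_language   Language code of the item being indexed.
     * @param string   $source_language Language code the content was originally written in.
     * @param string[] $translated_into Language codes with a published version, including the item's own.
     * @param string[] $site_languages  All active language codes on the site.
     *
     * @return string[] Language codes to store in languages_to_show_on.
     */
    function languages_to_show_on( $item_language, $source_language, array $translated_into, array $site_languages ) {
        $show_on = array();

        foreach ( $site_languages as $site_language ) {
            // The item is in this language, so it always shows here.
            if ( $site_language === $item_language ) {
                $show_on[] = $site_language;
                continue;
            }

            // Fallback: this language has no translation and this item is the original.
            if ( ! in_array( $site_language, $translated_into, true ) && $item_language === $source_language ) {
                $show_on[] = $site_language;
            }
        }

        return $show_on;
    }

    $site_languages = array( 'en', 'fr', 'es' );

    // French item with no other languages: shown on its own site and, as the original, on the others.
    print_r( languages_to_show_on( 'fr', 'fr', array( 'fr' ), $site_languages ) ); // en, fr, es

    // French translation of an English original that also exists in Spanish: French site only.
    print_r( languages_to_show_on( 'fr', 'en', array( 'en', 'fr', 'es' ), $site_languages ) ); // fr

    // English original available in Spanish but not French: falls back onto the French site.
    print_r( languages_to_show_on( 'en', 'en', array( 'en', 'es' ), $site_languages ) ); // en, fr
    ```

    With the array stored on each record, the search-time filter only has to test membership. Note that Algolia only filters on attributes declared in attributesForFaceting, so languages_to_show_on needs to be listed there.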

    This is the whole function I have, which also indexes some other language meta that I’ve found useful:

    /**
     * Prepare WPML data for indexing.
     *
     * @param array   $attributes Algolia attributes array.
     * @param WP_Post $post       The post being indexed.
     *
     * @return void
     */
    function algolia_index_wpml_meta( &$attributes, $post ) {
        $language_info = wpml_get_language_information( '', $post->ID );

        if ( ! empty( $language_info ) && ! $language_info instanceof WP_Error ) {
            $attributes['language_code']   = $language_info['language_code'];
            $attributes['language']        = $language_info['display_name'];
            $attributes['language_native'] = $language_info['native_name'];

            // Reading direction.
            $dir = 'ltr';

            if ( ! empty( $language_info['text_direction'] ) ) {
                $dir = 'rtl';
            }

            $attributes['dir'] = $dir;

            // Process an array of language codes the content is NOT available in.
            $translations               = get_content_translations( $post->ID );
            $languages_with_translation = array();

            if ( ! empty( $translations ) ) {
                $languages_with_translation = array_keys( $translations );
            }

            // Get WPML languages.
            $all_languages                       = apply_filters( 'wpml_active_languages', null, 'orderby=id&order=desc' );
            $all_language_codes                  = array_keys( $all_languages );
            $attributes['unavailable_languages'] = array();

            foreach ( $all_language_codes as $language_code ) {
                if ( ! in_array( $language_code, $languages_with_translation, true ) ) {
                    $attributes['unavailable_languages'][] = $language_code;
                }
            }

            // Add the original language.
            if ( ! empty( $translations[ $language_info['language_code'] ]->source_language_code ) ) {
                $original_language = $translations[ $language_info['language_code'] ]->source_language_code;
            } else {
                $original_language = $language_info['language_code'];
            }

            $attributes['source_language_code'] = $original_language;

            // Add an array of which languages this can be shown on.
            // - Show in the current language.
            // - Show on other languages if not available in the current language and is the original.
            $attributes['languages_to_show_on'] = array();

            foreach ( $all_language_codes as $language_to_show_on ) {
                // This item is in this language, show it for this language.
                if ( $language_info['language_code'] === $language_to_show_on ) {
                    $attributes['languages_to_show_on'][] = $language_to_show_on;

                    continue;
                }

                // 1. If this language doesn't have a translation available.
                // 2. This is the original language item.
                if (
                    ! in_array( $language_to_show_on, $languages_with_translation, true )
                    && $language_info['language_code'] === $original_language
                ) {
                    $attributes['languages_to_show_on'][] = $language_to_show_on;
                }
            }
        }
    }

    /**
     * Get the published translations of a post, keyed by language code.
     *
     * @param int|string $post_id Post ID. Defaults to the current post.
     *
     * @return array Published translations, or an empty array if there are none.
     */
    function get_content_translations( $post_id = '' ) {
        global $sitepress;

        if ( empty( $post_id ) ) {
            $post_id = get_the_ID();
        }

        $wpml_post_type = 'post_' . get_post_type( $post_id );

        $t_post_id               = $sitepress->get_element_trid( $post_id, $wpml_post_type );
        $translations            = $sitepress->get_element_translations( $t_post_id, $wpml_post_type, false, true );
        $translations_that_exist = array();

        // Keep only translations that are actually published.
        foreach ( $translations as $key => $translation ) {
            if ( ! empty( $translation->post_status ) && $translation->post_status === 'publish' ) {
                $translations_that_exist[ $key ] = $translation;
            }
        }

        return $translations_that_exist;
    }

    I’m indexing PDF content and stumbled upon this thread (I’ve got really long PDFs and they’re being split into a really large number of parts).

    @sgrx how is the 6,000 setting working out for you? Do you have many other indexed attributes contributing to your record size?


    @tw2113 Do you have any idea what the behaviour is if the content is chunked like this:

    Chunk 1:

    Algolia is a French proprietary search-as-a-service platform, with its headquarters in San

    Chunk 2:

    Francisco and offices in Paris and London. Its main product is a web search platform for individual websites.

    If a user searches for a string that spans chunks, e.g. “with its headquarters in San Francisco and offices in Paris”, how does it behave? Would it find two items as fairly low partial matches and then de-duplicate them to return one item?
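
    To make the question concrete, here is a small standalone sketch of the split. It only counts which query words land in which chunk and says nothing about how Algolia actually ranks or de-duplicates records. `count_matching_words()` is a hypothetical helper written for this illustration.

    ```php
    <?php
    /**
     * Count how many distinct query words appear in a chunk (case-insensitive).
     *
     * Hypothetical helper for illustration only.
     *
     * @param string $query The search query.
     * @param string $chunk One chunk of the split content.
     *
     * @return int Number of distinct query words found in the chunk.
     */
    function count_matching_words( $query, $chunk ) {
        $query_words = array_unique( str_word_count( strtolower( $query ), 1 ) );
        $chunk_words = str_word_count( strtolower( $chunk ), 1 );

        return count( array_intersect( $query_words, $chunk_words ) );
    }

    $chunks = array(
        'Algolia is a French proprietary search-as-a-service platform, with its headquarters in San',
        'Francisco and offices in Paris and London. Its main product is a web search platform for individual websites.',
    );
    $query  = 'with its headquarters in San Francisco and offices in Paris';

    foreach ( $chunks as $index => $chunk ) {
        printf(
            "Chunk %d: full phrase %s, %d of 9 distinct query words\n",
            $index + 1,
            false === strpos( $chunk, $query ) ? 'missing' : 'present',
            count_matching_words( $query, $chunk )
        );
    }
    ```

    Neither chunk contains the whole phrase on its own: the first holds 5 of the 9 distinct query words and the second holds 6, so each record is at best a partial match for the query.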
