Hi @slamsal, YARPP kinda already has you covered here. When searching for related posts from a “reference post”, it looks for the 20 most common words (that aren’t stop words like “the”, “from”, “by”, etc.) and only uses those for comparison. You can see the code in classes/YARPP_Cache.php, in the function extract_keywords.
So I don’t think the optimization you’re thinking of is necessary, because YARPP only compares using 20 words by default.
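To give a rough idea of the approach, here's a minimal standalone PHP sketch. It is not YARPP's actual code: the function name sketch_top_keywords and the short stop-word list are made up for illustration, and it only handles plain ASCII text.

```php
<?php
// Illustrative sketch only: NOT YARPP's actual extract_keywords() code.
// The function name and the tiny stop-word list are made up for this example.
function sketch_top_keywords(string $text, int $max = 20): array
{
    $stop_words = ['the', 'from', 'by', 'a', 'an', 'and', 'of', 'to', 'in', 'is'];

    // Strip HTML tags, lowercase, and split on anything that isn't an ASCII letter or digit.
    $clean = strtolower(strip_tags($text));
    $words = preg_split('/[^a-z0-9]+/', $clean, -1, PREG_SPLIT_NO_EMPTY);

    // Count every word that isn't a stop word.
    $counts = [];
    foreach ($words as $word) {
        if (in_array($word, $stop_words, true)) {
            continue;
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    // Most frequent first, then keep only the top $max words.
    arsort($counts);
    return array_slice(array_keys($counts), 0, $max);
}

// "cat" and "tree" each appear twice, so they are the top two keywords.
print_r(sketch_top_keywords('The cat sat by the cat tree; the tree fell from the hill.', 2));
```

However long the post is, only the top $max words come out the other end, which is why the comparison stays cheap.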
Does that make sense?
I’m asking because I’m aggregating lots of articles from around the web in one post. The title and, generally, the first two paragraphs are always related to the title of the post, but the rest of the content is usually unrelated.
I’m currently doing something like this:
return apply_filters( 'yarpp_body_keywords', $this->extract_keywords( apply_filters( 'the_content', wp_trim_words( $post->post_content, 120 ) ), $max, $ID ), $max, $ID );
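To show the idea outside of WordPress, here's a rough plain-PHP sketch of the trimming step. The helper name sketch_first_words and the sample text are made up for illustration; in the line above, wp_trim_words() is what does the trimming.

```php
<?php
// Hypothetical helper for illustration; in WordPress, wp_trim_words() does the trimming.
function sketch_first_words(string $text, int $limit): string
{
    // Drop tags, then split on whitespace and keep only the first $limit words.
    $words = preg_split('/\s+/', trim(strip_tags($text)), -1, PREG_SPLIT_NO_EMPTY);
    return implode(' ', array_slice($words, 0, $limit));
}

// Made-up sample: the opening is on-topic, the rest is an unrelated roundup.
$post = 'Headline story about solar power. Solar panels are cheaper. '
      . 'Unrelated roundup: football scores, celebrity news, weather.';

// Prints: Headline story about solar power. Solar panels are cheaper.
echo sketch_first_words($post, 9), "\n";
```

Only the trimmed text would then be passed on for keyword extraction, so words from the unrelated part never get counted.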
Please let me know if you think I did it right.
Thank you again!
Hi @slamsal ahh I see, so it’s not an efficiency issue, gotcha.
What you’re doing there will work… until the next time YARPP is updated, and your changes get overwritten by the update.
I looked into it, and it’s quite hard to do that without editing the plugin’s files in the current codebase. We’re discussing what changes could be made to make it easier to customize which words YARPP uses when evaluating relatedness.
Awesome! Looking forward to the updates.