• Resolved evita086

    (@evita086)


    Hello,

    I’m having the same issue as this:
    https://ww.wp.xz.cn/support/topic/search-spam-6/

    We're seeing 8,000 to 10,000 spam searches showing up in Algolia.

    They show up in the logs like this:

    "GET /de/%21/6/?s=+Wo+kaufen+viagra+in+wien%F0%9F%96%95%F0%9F%A4%AA%F0%9F%8E%96+www.ZavaMed.store+%F0%9F%8E%96%F0%9F%A4%AA%F0%9F%96%95+Kamagra+aus+indien+bestellen+Cialis+5mg+f%C3%BCr+die+frau+preis+im+ausland+bestellen+Viagra+150+mg+kaufen+ohne+rezept+billige HTTP/1.1" 200 33087 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.83 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "MISS:(lb2-c28-2)" "not-in-cache" "-" "66.249.66.81" "0" "0" "1" 0.693 0.690 D=684513

    I'm not sure how they are pulling this off, since we are blocking access to /?s= and /search with robots.txt.

    We turned off autocomplete, but it's still going on. Is there a way to limit the query length? These queries are typically extremely long.

    Or can you recommend any other solution?
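    One way to cap query length is at the web-server level, not through the plugin. The .htaccess rules below are a minimal sketch that refuses any request whose `s` parameter is 100 or more characters long. The 100-character threshold is an arbitrary assumption. `%{QUERY_STRING}` is matched in its URL-encoded form, so each umlaut or emoji counts as several characters. Place the rules above the `# BEGIN WordPress` block. As a later reply in this thread shows, blocking at the server only helps if the spam requests actually pass through Apache's .htaccess processing.

```apache
# Sketch: refuse search requests with an overly long "s" parameter.
# The 100-character limit is an assumed value; tune it for your site.
<IfModule mod_rewrite.c>
RewriteEngine On
# Match "s=" at the start of the query string or right after an "&",
# followed by 100 or more characters that are not "&".
RewriteCond %{QUERY_STRING} (^|&)s=[^&]{100,}
# Respond with 403 Forbidden ([F] also stops further rewriting).
RewriteRule ^ - [F]
</IfModule>
```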

Viewing 11 replies - 1 through 11 (of 11 total)
    Thread Starter evita086

    (@evita086)

    I temporarily added this to .htaccess:

    <If "%{QUERY_STRING} =~ /^(s|search)=/">
      AuthName "WordPress Admin"
      AuthType Basic
      AuthUserFile /home/admin/web/.htpasswd
      require valid-user
    </If>

    This made it so that nobody, bot or human, could search without signing in.

    And yet, those spam requests were STILL showing up in Algolia (about 1 search per second).

    Any ideas would be appreciated.

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Admittedly, I'm in about the same place as in that other thread you pointed out: I don't have any solid leads on this.

    Did you check out https://www.algolia.com/doc/faq/basics/too-many-false-unexplained-search-operations/ at all and try some of the suggestions?

    Thread Starter evita086

    (@evita086)

    Hey Michael,

    Thank you for your reply!

    Yes, I tried everything I could. We can't use the IP from "Search API Logs" in Algolia, because it is always the IP of our hosting provider that's making the request to Algolia. So we used our server logs to track down the IP, and it's… Googlebot! Worth noting: we had the same issue with Majestic's bot, but we were able to block that one.

    This is strange, because I am blocking all bots from accessing search pages with robots.txt, and I verified the block is in place through Google Search Console (I inspected a search URL and confirmed it can't be crawled or fetched).

    We tried blocking with .htaccess, but no luck. This is our latest attempt (in case it works for someone else):

    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot) [NC]
    RewriteCond %{QUERY_STRING} ^s= [OR]
    RewriteCond %{REQUEST_URI} ^/search/
    RewriteRule ^ - [F]

    This should block Googlebot and Bingbot from accessing search pages, but it doesn't. They are still getting through, ignoring both that and robots.txt.

    We also tried using an API key with rate limiting, but that didn't seem to do anything at all. We copied that key into the plugin and search still works (so the key is valid), but the rate limiting doesn't seem to apply. Perhaps that's because Googlebot is using far too many IPs.

    You don't have a way of excluding specific queries (ones containing a given keyword) with a custom function, do you? Or of limiting the query length (they are always long)?

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    You don't have a way of excluding specific queries (ones containing a given keyword) with a custom function, do you? Or of limiting the query length (they are always long)?

    None that I know of offhand, but I do know we provide a lot of WordPress filters in the codebase to help with customization, and some of them may be good candidates for this. I just don't know them off the top of my head.
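    If no suitable plugin filter turns up, there is a server-level alternative for the "contains keyword" case. An Apache 2.4 `<If>` block, similar to the basic-auth test earlier in the thread, can deny search requests whose `s` value contains known spam terms. This is only a sketch. The keyword list is illustrative and taken from the log line in the first post. `Require` in .htaccess needs `AllowOverride AuthConfig`, which the earlier basic-auth test suggests is available on this host.

```apache
# Sketch: deny searches whose "s" value contains known spam keywords.
# The keyword list is illustrative (taken from the spam log line in this
# thread); extend it with terms from your own logs. The /i flag makes the
# match case-insensitive.
<If "%{QUERY_STRING} =~ /(^|&)s=[^&]*(viagra|cialis|kamagra|zavamed)/i">
    # Respond with 403 Forbidden for matching requests only
    Require all denied
</If>
```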

    Thread Starter evita086

    (@evita086)

    Michael,

    Thank you, I’ll take a look!

    Do you have a link to said codebase?
    In the meantime, I’ll update this post if I figure this out.

    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    @evita086 it's just the plugin code itself. If you want the GitHub link, it can be found at https://github.com/WebDevStudios/wp-search-with-algolia/. Otherwise, just crack open the plugin in your IDE or text editor.

    Thread Starter evita086

    (@evita086)

    For anyone possibly facing the same issue, here’s what the spammers are doing:

    1. They are not actually conducting searches. They are linking to the search pages from an indexed site they control. However, every visit to one of those pages (e.g., by Googlebot following the link) does trigger a query in Algolia.
    2. This causes Google to index the pages even if you are blocking access to /?s or /search/ via robots.txt.

    Now, this won't stop them from spamming your search bar, but here's what you can do to at least keep them from achieving their goal:

    1. Remove your Disallow rules for search pages from robots.txt
    2. Make sure you are using noindex for search pages.

    We were doing BOTH, and that was the problem. Because our robots.txt was blocking search engines from accessing search pages, Google (for example) couldn't see the "noindex" tag, so it indexed the pages even though robots.txt was blocking them.
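    The noindex signal can also be sent as an HTTP response header instead of, or alongside, the robots meta tag. Here is a minimal sketch, assuming Apache 2.4.10 or later with mod_headers enabled and `AllowOverride FileInfo` for .htaccess. It only covers `?s=` URLs, not pretty `/search/` permalinks. Crawlers only see this header if robots.txt does not block the URL.

```apache
# Sketch: ask crawlers not to index search result pages.
# The header is added only when the query string contains an "s" parameter.
<IfModule mod_headers.c>
    Header set X-Robots-Tag "noindex" "expr=%{QUERY_STRING} =~ /(^|&)s=/"
</IfModule>
```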

    Hope this helps someone.

    Worth noting: spam searches are almost non-existent now, but not because of anything we did. They just stopped. My guess is that's because the spammers got the pages they wanted indexed. Those pages should get deindexed with the new setup, so the spammers may come back; we'll see.

    • This reply was modified 4 years, 8 months ago by evita086.
    Thread Starter evita086

    (@evita086)

    Update: spammers are back, unfortunately. Getting thousands of queries again.

    Thread Starter evita086

    (@evita086)

    .

    • This reply was modified 4 years, 8 months ago by evita086.
    Plugin Contributor Michael Beckwith

    (@tw2113)

    The BenchPresser

    Sadly I haven’t determined anything new since last week when I last responded.

    Thread Starter evita086

    (@evita086)

    That is fine, thank you for trying. Just to clarify: this code works if you place it in your .htaccess:

    
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot) [NC]
    RewriteCond %{QUERY_STRING} ^s= [OR]
    RewriteCond %{REQUEST_URI} ^/search/
    RewriteRule ^ - [F]
    

    To anyone interested in that solution:

    1. You may have to add more bots to the second line.

    2. Keep in mind this will completely block the bots you specify from accessing your search pages, which means those pages won't get indexed even if you want them to be. Before adding these directives to your .htaccess file, make sure spam search queries haven't already been indexed by Google/Bing. If you do see spam queries indexed, it's best to get them removed first. In our case, search pages with spam queries got indexed because we were blocking bots with robots.txt, so the bots couldn't see the "noindex" tag we were applying to search pages. Here is what we did. We removed the robots.txt block. We gave Google time to crawl the pages, see the noindex tag, and deindex them. Only THEN did we add the above code to .htaccess.
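    A slightly broader variant of the rules above is sketched here, under two assumptions. First, the `s` parameter may not be the first one in the query string. Second, search permalinks may carry a two-letter language prefix, such as the `/de/` seen in the log line in the first post. On the logic: mod_rewrite ANDs conditions by default, and an `[OR]` flag joins a condition with the next one. Both versions therefore read as "bot User-Agent AND (search query OR /search/ path)". The indexing caveats in point 2 apply equally here.

```apache
RewriteEngine On
# Only requests whose User-Agent mentions one of these bots (case-insensitive);
# add other bot names to the alternation as needed.
RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot) [NC]
# ...AND either an "s" parameter anywhere in the query string...
RewriteCond %{QUERY_STRING} (^|&)s= [OR]
# ...OR a /search/ path, optionally behind a two-letter language prefix.
RewriteCond %{REQUEST_URI} ^/([a-z]{2}/)?search/ [NC]
# Respond with 403 Forbidden.
RewriteRule ^ - [F]
```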

    Good luck!


The topic ‘Search Spam!’ is closed to new replies.