tdockweiler
Forum Replies Created
Forum: Plugins
In reply to: [Blackhole for Bad Bots] Need help removing thousands of /?blackhole links
OK, I finally solved this mystery!
I’m no programmer and have only been using Woocommerce for 2 years, but maybe the issue with your plugin is not related to caching, but how Woocommerce handles query strings?
So your plugin is actually what originally caused Googlebot to even find those /?blackhole links. Over 172k of them.
Due to the way Woocommerce handles query strings, Googlebot thinks the links still exist! That’s unless you redirect them ALL to clean urls permanently. This is why they’re still around after removing your plugin.
When I opened the link to one and did “View Source” those “/?blackhole” links were injected into the dynamically generated page content.
Funnily enough, if you change the parameter in the url to something new like “/?Whatever”, guess what happens? You do “View Source” and all the add to cart links and page selection links have that query string injected into them.
This is somehow a “feature” of WooCommerce. I’ve confirmed it does this on 3 clean installs of WooCommerce with no data. I can only get it to happen on the shop page with 2 or 3 test products. Nowhere else.
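A rough sketch of why this probably happens. It assumes WooCommerce builds these links by appending its own parameter to the current request URI, which is what WordPress's add_query_arg() does when no URL is passed to it. The function below is hypothetical, plain PHP written only to mimic that carry-over effect. It is not WooCommerce's actual code.

```php
<?php
// Hypothetical stand-in for "add a parameter to whatever URL was requested".
// It only mimics the carry-over effect; it is not WooCommerce's code.
function append_to_current_uri( string $request_uri, string $key, string $value ): string {
    $path  = strtok( $request_uri, '?' );
    $query = (string) parse_url( $request_uri, PHP_URL_QUERY );

    parse_str( $query, $vars ); // existing parameters, junk included
    $vars[ $key ] = $value;     // the new parameter is merged in, nothing is dropped

    return $path . '?' . http_build_query( $vars );
}

// Junk in the requested URL ends up in the generated link:
echo append_to_current_uri( '/shop/?blackhole=1e2f8c274e', 'add-to-cart', '12' ), "\n";
// /shop/?blackhole=1e2f8c274e&add-to-cart=12

// A clean request gives a clean link:
echo append_to_current_uri( '/shop/', 'add-to-cart', '12' ), "\n";
// /shop/?add-to-cart=12
```

If that assumption holds, it would also explain why only the urls that already carry a junk query string show the injected links, while the clean urls do not.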
Now I feel kind of stupid for wasting my time on this when it’s Woocommerce making things even worse.
The blackhole urls definitely came from your plugin, but I don’t think caching is what caused the problem. Maybe it’s WooCommerce doing something weird with query strings. Plus Googlebot may have been ignoring the disallow directives in robots.txt.
I was able to have ChatGPT make a pretty cool query string remover code snippet that lets you whitelist essential query strings. The whitelist is a MUST if you use a plugin with custom query strings. The snippet strips every other query string and redirects to the clean url. I also have a second one that logs new garbage urls with unknown query strings to debug.log. Pretty cool.
Here is the first one:
WARNING: nobody should use this without a site/database backup and knowing which query strings need to be whitelisted. This works perfectly for me with the Astra theme. This will ALSO of course clean those /?blackhole urls! You just enter this code into the code snippets plugin and activate it. This immediately fixes most of my headaches.
/**
 * Strip non-whitelisted query strings from WooCommerce URLs.
 * The whitelist includes the Advanced Woo Search (AWS) parameters.
 */
add_action( 'template_redirect', function() {
    if ( is_admin() || ! isset( $_SERVER['REQUEST_URI'] ) ) {
        return;
    }

    // Assumes WordPress is installed at the domain root (no subdirectory).
    $current_url = home_url( $_SERVER['REQUEST_URI'] );
    $parsed      = wp_parse_url( $current_url );

    // Parse query vars, if any.
    $query_vars = array();
    if ( ! empty( $parsed['query'] ) ) {
        parse_str( $parsed['query'], $query_vars );
    }

    // Essential WooCommerce / WordPress query parameters,
    // plus the Advanced Woo Search params: post_type and type_aws.
    $essential = array(
        's',            // search
        'orderby',      // product sorting
        'paged',        // pagination
        'add-to-cart',  // add product to cart
        'remove_item',  // cart item removal
        'undo_item',    // undo removal
        'restore_item', // restore cart item
        'quantity',     // add-to-cart quantity
        'update_cart',  // cart update
        'apply_coupon', // coupon application
        'coupon',       // coupon code
        'post_type',    // AWS search post type
        'type_aws',     // AWS search flag
    );

    // Keep only essential query vars.
    $clean_vars = array_intersect_key( $query_vars, array_flip( $essential ) );

    // Nothing was stripped, so leave the request alone. Comparing the vars
    // instead of the URL strings avoids a redirect loop when only the
    // encoding differs (e.g. "+" vs "%20" in a search term).
    if ( count( $clean_vars ) === count( $query_vars ) ) {
        return;
    }

    // Rebuild the clean URL, re-encoding the values that parse_str() decoded.
    $base_url  = strtok( $current_url, '?' );
    $clean_url = $clean_vars ? add_query_arg( urlencode_deep( $clean_vars ), $base_url ) : $base_url;

    // Permanent redirect away from the junk query strings.
    wp_safe_redirect( $clean_url, 301 );
    exit;
}, 1 );
- This reply was modified 9 months, 1 week ago by tdockweiler.
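The second (logging) snippet isn't posted in this thread, but the check at the heart of it can be tried outside WordPress before turning on any redirects. A minimal sketch in plain PHP, assuming the same whitelist idea; unknown_query_keys() is a hypothetical name and the short whitelist is only an example.

```php
<?php
// Hypothetical helper: returns the query keys in $url that are not whitelisted.
function unknown_query_keys( string $url, array $allowed ): array {
    $query = (string) parse_url( $url, PHP_URL_QUERY );
    parse_str( $query, $vars );

    return array_values( array_diff( array_keys( $vars ), $allowed ) );
}

// Example whitelist (shortened); use your own list of essential keys.
$allowed = array( 's', 'orderby', 'paged', 'add-to-cart' );
$unknown = unknown_query_keys( '/shop/?blackhole=1e2f8c274e&ppp=20&add-to-cart=115451', $allowed );

if ( $unknown ) {
    // Inside WordPress this ends up in wp-content/debug.log when
    // WP_DEBUG and WP_DEBUG_LOG are enabled.
    error_log( 'Unknown query keys: ' . implode( ', ', $unknown ) );
}
```

Running something like this against a handful of real urls from your access log is a cheap way to find out which keys need to go on the whitelist before anything gets redirected.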
Forum: Plugins
In reply to: [Blackhole for Bad Bots] Need help removing thousands of /?blackhole links
Want to hear something absolutely crazy? If it’s really true I’m going to feel like a complete dummy. I spent 8 hours today trying to figure this out. Even doing a complete reinstall of WooCommerce on a staging site. ChatGPT was able to write me some code snippets to monitor the injections but it went around in circles. Eventually it thought it was coming from the server level and not WooCommerce or my database.
I thought maybe the OpenLiteSpeed WordPress one click install might have been bad and causing this.
Then I realized I had a blog that I installed a month ago with nothing on it. I added WooCommerce to it and a few products. I added /?blackhole=1e2f8c274e&add-to-cart=12 after my blog domain and did “View Source”. All those /?blackhole links were on my add to cart and page navigation links.
On my main site with 30K products I get 55 links per page and Google seems to be following them.
With a clean install of Woocommerce with nginx/openlitespeed I get those links too. Again, that’s with no database import. No other plugins! No restored backup.
Maybe Woocommerce is using that code for some purpose that’s similar to what your plugin is doing.
Funnily enough, your Blackhole fix causes those sale links to redirect to a clean url.
I asked in the reddit Woocommerce forum what these are for and hopefully it’s normal.
This is the code I found being injected by WooCommerce (or WordPress? but unlikely):
<a href="/shop/?blackhole=1e2f8c274e&add-to-cart=12"
<input type="hidden" name="blackhole" value="1e2f8c274e" /></form>
Hopefully I’ll get a reply soon and can stop all this nonsense.
But if this is all true then I owe you an apology. I still don’t know if all those 127K google links are from WooCommerce or what. I think your plugin might only add 1 link in the footer? I mean it makes no sense why it would add 50, right?
If I hear anything back I’ll post it here.
Forum: Plugins
In reply to: [Blackhole for Bad Bots] Need help removing thousands of /?blackhole links
Thanks for the reply. You said it’s impossible for a plugin to have any effect on anything once removed (as far as you know). In the case of your plugin, this is definitely not true. This is what’s happening now. Maybe it’s a bug or some rare random occurrence. Even with page caching enabled, no plugin should nearly destroy a site like this. I am pretty sure I never even had any page caching enabled. It’s literally injecting code somehow into my entire site. Sometimes 50 urls on one page. They are in all the Add to Cart urls, as follows:
<a href="/?blackhole=1e2f8c274e&ppp=20&add-to-cart=115454"
There is also this code entered into every /?blackhole page:
<input type="hidden" name="blackhole" value="1e2f8c274e" /><input type="hidden" name="ppp" value="20" /><input type="hidden" name="add-to-cart" value="115451" /></form>
I also noticed an absolutely bizarre line in my source code. Almost as if something was trying to run (it’s not from a plugin, I disabled them):
<span claadd-to-cart=115451: command not found
ss="xoo-wsc-sc-bki xoo-wsc-icon-cart2"></span>
This is with ZERO cache enabled. No page cache, object cache etc. Plugin totally disabled.
It’s not a browser cache issue because it’s also showing these lines when using curl.
At this point I don’t know what to do. I guess I’ll have to use 301 redirects while the code injections still happen.
There are no malicious scripts or plugins on my site. I’m not hacked and I don’t know what’s going on. I just want to figure out how to stop these injections without relying on redirects.
My only idea is to use a staging site and import the database with only the essential content and see if it still happens.
I’ve spent a few days of non-stop work on this with no ideas on how to fix it. My site is basically dead at this point because of this and google crawling 127K /?blackhole urls. Each page has a canonical, but google is still indexing many of them.
I should also point out that there are zero references to “blackhole” or any of the above code in my database, except for the product IDs.
The biggest mystery for me is that it’s treating the /?blackhole urls as different pages. Only those pages have injected code. The clean urls do not (with no queries at the end).
- This reply was modified 9 months, 1 week ago by tdockweiler.
Forum: Plugins
In reply to: [Blackhole for Bad Bots] Need help removing thousands of /?blackhole links
Here’s the thing though. I don’t think even you, the author, understand the destructive power of your own plugin. Even with cache enabled, a well written plugin should not go absolutely berserk on one’s site and cause 50+ urls to appear in every page. I’ve got them even injected into all my add-to-cart links. At the very least it should be able to totally reset and remove all those links when the plugin is deleted. It does not. Even if someone accidentally uses it with caching, it should clean up after itself when deleted. It 1000% does not. I can prove this too.
The plugin is 100% removed. No sign of it anywhere and still something left behind is injecting unwanted code into my site.
Currently my homepage is clear of any blackhole links, but when I visit the url with “/?blackhole=1e2f8c274e&ppp=20&add-to-cart=115451” added to the end, it’s being treated as a totally different page with all those injected urls in its page under “View Source”. So it’s being injected dynamically somehow.
Again, the plugin is 100% gone, but something your plugin did is still causing this injection to happen. There are no references to “blackhole”, “bbb_options” or “bbb_badbots” in my database or in /wp-content.
I used curl on the console to make sure browser caching wasn’t the issue and even that showed the /?blackhole links.
I was hoping the Blackhole cleaner would actually remove these links. It appears to just redirect them while the injecting of unwanted code is somehow still happening.
I did have an .htaccess redirect in place to remove the query strings, but it’s frustrating knowing the links are still being injected dynamically.
I do use caching (caching plugin now deleted), and clearing all caches and even disabling caching does nothing.
Do you have any other ideas on what could be injecting these /?blackhole links? It’s like the plugin is still installed, but there is no sign of it anywhere. I did dozens of searches with grep, but found nothing.
Last night I tried to reinstall a clean version of the Astra theme, but no fix. Something is buried deep into my code or database and I can’t find out what is injecting these links.
This is just such a huge puzzle for me. Again, the feeling is as if there is a virus on my site and injecting all this unwanted code. I don’t understand this. It’s being done dynamically and it’s not in any HTML files.
If I can’t figure this out I will have to spend days recreating my 700MB database from scratch and hopefully be able to import all my orders, products and customer names. I don’t even know if this is possible.
Hopefully you have some more ideas.
PS I also tried a “Search and replace” plugin, but since the /?blackhole links are being injected dynamically, there is nothing to be found.
If you were to guess, what might be doing this? There is nothing left over in /wp-content/ or the /plugins/ folder at all.
Basically, do you have any ideas how on earth I can stop these injections from happening? Where is it coming from and how is this even possible? The plugin is gone. It was suggested it’s in functions.php but I see nothing in there.
Hopefully I can figure this out.
- This reply was modified 9 months, 1 week ago by tdockweiler.
Thanks for the reply.
I have good news, but am not sure if this is the cause.
It’s very possible this plugin just doesn’t play well with the Cloudpanel 1-click install from Vultr. My idea is perhaps the server settings are not stock and are causing issues. I did run WP-Optimize for 1 month, but maybe was not aware of it overloading anything.
So after I posted that initial message I did a migration to a totally new server. Same specs, but this time using Plesk (1 click install) and Nginx only (not as reverse proxy).
I’m currently doing a huge preload and my CPU never hits more than 25-30%. With similar page caching plugins it’s about the same.
I also was able to save pages without the loading icon spinning non-stop.
The only major changes to my server are that I’m no longer using PHP 8.4, but a slightly lower version, and MariaDB instead of MySQL. I think the Cloudpanel 1 click install server settings must be wonky or something.
Plesk has been so reliable for me. I don’t know if the issues were with Cloudpanel, but maybe their settings on the OS didn’t work well with WP-Optimize. I didn’t change any myself and used stock ones.