• Resolved flyfisher842

    (@flyfisher842)


    This is in relation to Google's mobile-friendly push.
    If I were to allow robots access to the wp-content/themes, wp-content/plugins and wp-includes folders in robots.txt, but add an X-Robots-Tag noindex header to the htaccess code through BPS, what would the syntax be?

    Example robots.txt:
    Allow: /wp-content/uploads/
    Allow: /wp-content/themes/*js?
    Allow: /wp-content/themes/*css?
    Allow: /wp-content/plugins/*js?
    Allow: /wp-content/plugins/*css?

    Example htaccess code:
    <Files ~\/wp-content\/;>
    Header set X-Robots-Tag “no-index”
    </Files>

    Can all of this be done through htaccess, so that I can stop messing with robots.txt?

    https://ww.wp.xz.cn/plugins/bulletproof-security/

Viewing 15 replies - 1 through 15 (of 17 total)
  • Plugin Author AITpro

    (@aitpro)

    WordPress recommends that you use the WordPress virtual robots.txt function and so do I. http://forum.ait-pro.com/forums/topic/wordpress-robots-txt-wordpress-virtual-robots-txt/ So I do not recommend handling anything related to robots via htaccess code.

    The Files and FilesMatch htaccess directives only allow using filenames or file extensions. You cannot use a URI/folder path.

    http://httpd.apache.org/docs/current/mod/core.html#files

    The <FilesMatch> directive limits the scope of the enclosed directives by filename, just as the <Files> directive does. However, it accepts a regular expression. For example:

    Plugin Author AITpro

    (@aitpro)

    Did this answer all of your questions? If so, please resolve this thread. If not, please post any additional questions you may have. Thanks.

    Thread Start Date: 4-17-2015 to 4-18-2015
    Current Date: 4-25-2015

    Thread Starter flyfisher842

    (@flyfisher842)

    Was there an example for the Regex in your reply of a week ago?

    Plugin Author AITpro

    (@aitpro)

    I can provide a general example and also strongly recommend that you do not use it. So here you go.

    Reference: https://developers.google.com/webmasters/control-crawl-index/docs/robots_meta_tag

    I strongly recommend that this code below is not used and a WordPress Virtual Robots function is used instead.

    <Files ~ "\.(png|jpe?g|gif)$">
      Header set X-Robots-Tag "noindex"
    </Files>
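
    Adapting the same pattern to the stylesheets and scripts from the original question, a minimal sketch might look like the block below. It assumes that mod_headers is enabled and that the code sits in an .htaccess file placed inside the /wp-content/ folder itself (a hypothetical location chosen for illustration), because Files and FilesMatch match file names only and cannot take a folder path. The recommendation above against handling robots in htaccess still applies.

    ```apache
    # Hypothetical location: /wp-content/.htaccess (not the site root .htaccess)
    <IfModule mod_headers.c>
      # Match stylesheets and scripts by extension; the regex is tested against the file name only
      <FilesMatch "\.(css|js)$">
        # Crawlers may still fetch these files to render pages, but are asked not to index them
        Header set X-Robots-Tag "noindex"
      </FilesMatch>
    </IfModule>
    ```

    For the header to have any effect, these files must not also be blocked with Disallow in robots.txt, since a crawler that is not allowed to fetch a file never sees its response headers.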

    Plugin Author AITpro

    (@aitpro)

    Or of course you can use a robots.txt file to do whatever you want to do, since that is what a robots.txt file is for. It is kind of silly to do this with .htaccess code when there is already an Internet standard for this: a robots.txt file.

    Thread Starter flyfisher842

    (@flyfisher842)

    Yes, the htaccess code has caused me some problems by locking my browser onto a version of the site that is not current.

    So how do you keep Google from indexing stuff without the noindex meta tag? Or has Google finally learned not to index the items in the wp-content and wp-includes folders of a WordPress site?

    For most of the themes I have tested for mobile friendliness, Google needs access to the CSS and JS files to render the page on mobile, and usually to some JS in the wp-includes folder too.

    A Disallow, whether in a real robots.txt or a virtual one, would do the same thing, I would think, unless the “\n” at the end is a noindex flag. If that is the case, then yes, the virtual robots file is better.

    Plugin Author AITpro

    (@aitpro)

    So how do you keep Google from indexing stuff without the noindex meta tag?

    If you do not want the googlebot to crawl and index something you would add that in your robots.txt file or in a WordPress Virtual Robots function.

    Example: I do not want the googlebot to crawl or index these folders and pages: /members/, /groups/, /activity/p/ and the login page /wp-login.php

    // WordPress virtual robots.txt additions: append Disallow rules to the generated output
    add_filter( 'robots_txt', 'v_robots', 10, 2 );

    // $output is the robots.txt text built so far; $public is the site's visibility setting (unused here)
    function v_robots( $output, $public ) {
        $output .= "Disallow: /members/" . "\n";
        $output .= "Disallow: /groups/" . "\n";
        $output .= "Disallow: /wp-login.php" . "\n";
        $output .= "Disallow: /activity/p/" . "\n";
        return $output;
    }

    Has Google finally learned not to index the items in the wp-content and wp-includes folders of a WordPress site?

    The googlebot has never crawled or indexed any files in these folders on any of my websites, so if that is happening on your site then something is fubar on your site.

    \n means newline, not noindex. Disallow means do not crawl or index.

    Thread Starter flyfisher842

    (@flyfisher842)

    I know what Disallow does, whether in a virtual or a real robots.txt. My problem is that Googlebot needs access to the theme CSS and JS to render the page for the mobile-friendly test. I don’t want to have to edit a robots.txt file every time I change themes, so I have the entire wp-content/themes folder set to Allow. I also don’t want Googlebot, their mobile bot or any other respectable bot indexing the files in wp-content themes or plugins. But robots.txt is either Allow or Disallow.

    So how do I do this?

    Plugin Author AITpro

    (@aitpro)

    I have never done anything with my /wp-content/ folder or /plugins/ folder that has anything to do with robots, ever, meaning in the last 10 years. I have never had a single problem with any of the files in the wp-content folder or plugins folder being indexed, on around 10 different websites over those 10 years. So the problem you are describing has never happened to me, and I have never heard anyone else mention it except for you. Logically, the only conclusion I can come to is that, if you are experiencing this, it is an isolated and unusual issue that is only occurring on your sites because of some problem on those sites.

    If you do not use Disallow then the default is Allow. In other words, you do not need to add/use Allow.

    The answer that I use on all my sites is that I do not do anything for/with or to either the wp-content or plugins folder that has anything to do with Robots. In other words, I do not have any other suggestions because you should not have to do anything at all with either of these folders that has anything to do with Robots.

    Thread Starter flyfisher842

    (@flyfisher842)

    You say you do not do anything with them. Do you just not put them in any robots.txt, which by default would mean they are allowed? Do you leave wp-admin and wp-includes open too? Or does BPS take care of them in the htaccess?

    Years ago I left the wp-content out of my robots file and sure enough Google indexed part of the plugins folder.

    Plugin Author AITpro

    (@aitpro)

    I literally do not do anything with any of the WordPress Core folders that is related to robots.txt in any way. BPS adds security protection for all WordPress Core folders and all your website files. BPS does security protection and does not do SEO stuff with a robots.txt file.

    Thread Starter flyfisher842

    (@flyfisher842)

    I understand about BPS not doing SEO stuff too. (What I don’t understand is why Google has not indexed your Core files. Google indexes just about everything they spider.)

    This indexing happened in about 2009 or 2010, when I was using an editor plugin with a vulnerability that the developer and I did not know about. Whether a hacker found it by trolling or by using the site command in Google I will never know, but I got hacked. That was when I learned some hard lessons and started looking for better security.

    So I get a bit paranoid about Google indexing stuff they have no need to index. That is why I have been using the Disallow on the Core Folders. Has nothing to do with SEO. This was fine until all this mobile stuff.

    I have tried two different “mobile ready” themes and Google needed to have access to the css/js on both for them to be Google approved “mobile ready”. They looked fine on my tablet and on my phone without doing anything different than I had been doing. But they were not fine for Google.

    I also tried a mobile plugin which was a total disaster. And yes I remember what you said about those too.

    Thread Starter flyfisher842

    (@flyfisher842)

    Here is what I found for your site using the site command:
    http://www.ait-pro.com/wp-content/themes/AITpro/smoothmenu.htm

    This is why I don’t like Google or any other search engine bot wandering around my Core files.

    Plugin Author AITpro

    (@aitpro)

    I need that file to be accessible, crawled and indexed by the googlebot. It is an alternate loading menu for folks who disable JavaScript in their browsers.

    Plugin Author AITpro

    (@aitpro)

    Plus it boosts my SEO quite nicely, since it acts like a sitemap without showing all that crap in my actual site menus. 😉

The topic ‘css,js access control’ is closed to new replies.