While I’m currently developing a client’s website, the root of their domain serves a simple coming soon page. I decided to also set up a subdomain to use as a development environment and to send to the client as a link, so they can see a close representation of the progress and actually interact with the website.
One challenge with this: while I want the root domain with the coming soon page to be indexed by Google, I don’t want the subdomain to be indexed, because once the site is done I’ll probably delete the subdomain.
noindex method
According to Google, including a meta tag with a name value of robots and a content value of noindex will cause Googlebot to completely drop the page from Google Search results the next time it crawls the page. This is what the noindex meta tag looks like in the head of your web page:
<head>
<meta name="robots" content="noindex">
<title>Your cool website</title>
</head>
The meta tag will need to be included in every single page you don’t want Googlebot to index. If you want to block the bot from the whole site instead of telling it which individual pages not to index, you’ll want to use the robots.txt method.
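Since it’s easy to miss a page or a template, here’s a quick way to verify the tag is actually being served. This is just a sketch using Python’s standard library, and the URLs are placeholders for whatever pages your dev site serves:

from html.parser import HTMLParser
from urllib.request import urlopen

class RobotsMetaFinder(HTMLParser):
    # Collects the content of every <meta name="robots"> tag on the page.
    def __init__(self):
        super().__init__()
        self.robots_values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.robots_values.append(attrs.get("content") or "")

def has_noindex(url):
    # Fetch the page and report whether any robots meta tag contains noindex.
    html = urlopen(url).read().decode("utf-8", errors="replace")
    finder = RobotsMetaFinder()
    finder.feed(html)
    return any("noindex" in value.lower() for value in finder.robots_values)

# Placeholder URLs; swap in the pages of your own dev subdomain.
for url in ("http://dev.example-url.com/", "http://dev.example-url.com/about"):
    print(url, "->", "noindex present" if has_noindex(url) else "noindex MISSING")

Run it against your list of dev pages and any line reporting a missing tag points to a page Googlebot could still index.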
robots.txt method
The other method is to block all search engine crawler bots from crawling your site. To do this, you’ll create a robots.txt file and place it at the root of the domain. This method also assumes you have file upload access to your server.
The contents of robots.txt will be:
User-agent: *
Disallow: /
This tells all crawlers not to crawl any page on the domain. So, for example, if I’ve got a subdomain at dev.example-url.com and I want just the dev subdomain to be blocked, I’ll place the robots.txt file at the root of the subdomain:
http://dev.example-url.com/robots.txt
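If you want to confirm the file is in the right place and says what you think it says, here’s a small sketch using Python’s built-in robots.txt parser (again, the subdomain URL is just the example from above):

from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.set_url("http://dev.example-url.com/robots.txt")
parser.read()  # downloads and parses the file

# With the User-agent: * / Disallow: / rules above, nothing should be fetchable.
print(parser.can_fetch("*", "http://dev.example-url.com/"))          # expect False
print(parser.can_fetch("Googlebot", "http://dev.example-url.com/x")) # expect False

If either call prints True, the file isn’t being picked up at the subdomain root, or the rules aren’t what you intended.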
Do I Need Both?
Nope, you only need one method. But remember: with the noindex tag, you’ll need to add it to every page you don’t want indexed, while robots.txt tells crawlers to stay away from the entire subdomain. One caveat worth knowing, per Google’s own documentation: robots.txt blocks crawling, not indexing, so if other sites link to your blocked subdomain, Google can still list the bare URLs in search results. The noindex tag is the more reliable way to keep pages out of Google Search.