While I’m developing a client’s website, the root of their domain has a simple coming-soon page. I also set up a subdomain to use as a development environment and to send to the client, so that they can follow the progress and actually interact with the website.
One challenge with this is that, while I want the root domain with the coming-soon page to be indexed by Google, I don’t want the subdomain to be indexed, because once the site is done I’ll probably delete the subdomain.
According to Google, including a meta tag with a name value of robots and a content value of noindex will cause Googlebot to drop the page from Google Search results the next time it crawls the page.
This is what the noindex meta tag looks like in the head of your web page:
<head>
  <meta name="robots" content="noindex">
  <title>Your cool website</title>
</head>
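To confirm that a page really carries the tag, something like the following could help. It is only a sketch using Python's standard-library HTML parser; the class name RobotsMetaParser and the function has_noindex are made up for this example, and it only inspects HTML you pass in as a string (fetching the page is left out).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute names arrive lowercased; values may be None, so default to "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() == "robots":
            # The content attribute can hold several comma-separated directives.
            parts = attr_map.get("content", "").split(",")
            self.directives.extend(p.strip().lower() for p in parts if p.strip())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives
```

Calling has_noindex on the head snippet above should return True, and False for a page without the tag.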
The meta tag needs to be included on every page you don’t want Googlebot to index. If you want to block crawlers from the whole site instead of marking individual pages, you’ll want to use the robots.txt method.
The other method is to block all search engine crawlers from crawling your site. To do this, create a robots.txt file and place it at the root of the domain. This method also assumes you have file upload access to your server.
The contents of robots.txt will be:
User-agent: *
Disallow: /
This tells all crawlers not to crawl any part of the domain. So, for example, if I’ve got a subdomain of dev.example-url.com and I want just the dev subdomain to be blocked, I’ll place the robots.txt file at the root of the subdomain.
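If you want to sanity-check the rules before uploading the file, Python's standard-library urllib.robotparser can evaluate them locally. This is a minimal sketch: the helper name is_crawl_allowed is made up for this example, and the parser only looks at the path of the URL, so it is up to you to test against the robots.txt that belongs to the right host (here, the dev subdomain).

```python
from urllib.robotparser import RobotFileParser

# The same two rules shown above: every crawler, every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

With the rules above, is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://dev.example-url.com/") should come back False for any path on the subdomain.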
Do I Need Both?
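Nope, you only need one method. Remember that with the noindex tag you’ll need to add it to every page you don’t want indexed, while robots.txt instructs crawlers not to crawl the entire subdomain. Using both together can actually work against you: a crawler blocked by robots.txt never fetches the page, so it never sees the noindex tag.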