The Sitecore Sitemap XML module is one of the brilliant modules on the Sitecore Marketplace. We have used it in many of our Sitecore implementations, and it works seamlessly. Here I would like to share a side effect of this module that we encountered today. Besides PRODUCTION, we all have lower environments such as DEV, QA and STAGING, and we would not want the content from those lower environments to be indexed by any bots. Restricting bots from crawling a website's content is altogether a different topic, and it is not what this post is about.
A word of caution: if you are using this module and have not created a robots.txt in your website's root directory, Sitecore Sitemap XML will autogenerate a robots.txt and place a reference to sitemap.xml in it. It updates robots.txt on every publish, because the code is hooked into Sitecore's publish:end event.
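To check whether a lower environment's robots.txt is advertising a sitemap to crawlers, you could parse the file with Python's standard urllib.robotparser. This is a minimal sketch, not part of the module: the function name audit_robots and the host qa.example.com are hypothetical, it only inspects robots.txt text you have already fetched, and site_maps() requires Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser


def audit_robots(robots_txt: str, site_root: str) -> dict:
    """Hypothetical helper: report the Sitemap entries in a robots.txt body
    and whether the generic user agent "*" may crawl the site root."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines and marks it as read.
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file has no Sitemap directive.
    sitemaps = parser.site_maps() or []
    return {
        "sitemaps": sitemaps,
        "crawl_allowed": parser.can_fetch("*", site_root),
    }


if __name__ == "__main__":
    sample = (
        "User-agent: *\n"
        "Disallow:\n"
        "Sitemap: https://qa.example.com/sitemap.xml\n"
    )
    print(audit_robots(sample, "https://qa.example.com/"))
```

Running it against the robots.txt of DEV, QA or STAGING after a publish gives a quick signal: a non-empty sitemaps list means that environment is pointing bots at its sitemap.xml.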
You might wonder why a lower environment is accessible over the internet at all. The environments may be hosted in the cloud, and in this global village, where people work from different parts of the world, lower environments need to be accessible from anywhere.
The code/package on the Sitecore Marketplace has not been updated, so I would recommend visiting the SitecoreSitemapXML repository on GitHub. It also contains a fix made by hloken in his commit, which introduces a setting in the config file that specifies whether to generate/update robots.txt.
So proud to be part of the Sitecore Community, where you can always find solutions to your problems.