Robots.txt is a plain text file. If you don't have this file in the root of your directory, creating one is easy:

- Open any text editor you like (Notepad, TextEdit, etc.)
- Create a file named robots.txt

[Figure: Robots.txt file]

- Upload it to the root of your site. That's it.

By default, WordPress serves the following robots.txt file at the root of the domain:
    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

You can check your WordPress robots.txt file by simply typing yourwebsitename.com/robots.txt in a new tab of your browser.

[Figure: Search in browser]

This is what a robots.txt file looks like.
Basic Robots.txt File Syntax

Robots.txt syntax is very simple; you don't need to learn a new programming language to create a robots.txt file.

The available directives are few. In fact, knowing just two of them is enough for most purposes.
The two core directives are:

- User-agent – defines the search engine crawler, such as Google, Yandex, or Bing.
- Disallow – tells the crawler to stay away from the specified directory, page, file, or image.

An asterisk (*) can be used to apply a directive to all search engines.
For example, to block every crawler from your entire website, you would configure the robots.txt file in the following way:

    User-agent: *
    Disallow: /

Here, the slash (/) means "don't crawl anything under the root," i.e. the whole site.
Let's work through this example of the robots.txt file before moving on.

Say I want to tell search engines not to index my website. I simply write the following command in a .txt file and upload it to the root of my directory:
    Disallow: /

But this command is incomplete on its own: I also have to specify which search engine it applies to, using the User-agent directive.
    User-agent: *
    Disallow: /

Here the asterisk (*) stands for all search engines. So, according to this command, no search engine will index my website.
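You can check how a crawler would interpret rules like these with Python's standard-library urllib.robotparser. This is just a local sketch: the rule lines are the two-line example above, and the URLs are placeholders.

```python
from urllib import robotparser

# The two-line rule set from above: block every crawler from the whole site.
rules = [
    "User-agent: *",
    "Disallow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# Under this rule set, no agent may fetch any path.
print(rp.can_fetch("Googlebot", "https://example.com/"))          # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/blog/"))  # False
```

Swapping in your own robots.txt lines lets you test a rule before uploading it.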
But if you only want to block Google, you have to configure the robots.txt file in the following way:
    User-agent: Googlebot
    Disallow: /

Note: this command only blocks Google's bot from crawling your website.

If, instead, you want to allow only Googlebot and block all other search engines, write the following command in your WordPress robots.txt file:
    User-agent: Googlebot
    Disallow:

    User-agent: *
    Disallow: /

This code in your robots.txt gives only Google full access to your website while keeping everyone else out.
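Again, urllib.robotparser can confirm how these grouped rules are applied per crawler (the page URL below is a placeholder):

```python
from urllib import robotparser

# Allow only Googlebot; block every other crawler.
rules = [
    "User-agent: Googlebot",
    "Disallow:",             # an empty Disallow means "allow everything"
    "",
    "User-agent: *",
    "Disallow: /",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("Googlebot", "https://example.com/page.html"))  # True
print(rp.can_fetch("Bingbot", "https://example.com/page.html"))    # False
```

Each crawler follows the most specific User-agent group that matches it, falling back to the * group otherwise.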
Note: commands are processed in sequence, so it is important to allow a specific crawler first and then disallow the rest.

Some Other Robots.txt Syntax

- Allow – permits crawling of a specific path.
- Sitemap – tells crawlers where your sitemap file is located.
- Host – tells crawlers your primary domain.

Allow directive:

A common misconception about the Allow directive is that it is used to invite search engines to check out your site. In fact, Allow is used to grant access to a specific subfolder or file inside a disallowed directory.
For example:

    User-agent: *
    Allow: /content/my-file.php
    Disallow: /content/

Search engines would stay away from the content folder in general, but could still access my-file.php.

Note: you need to place the Allow directive first in order for this to work.

Sitemap directive:

This can be used to tell search engines and other robots where your sitemap is located. The relevant part of the robots.txt could look like this:
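A quick check with urllib.robotparser shows the Allow exception in action, using the hypothetical paths from the example above:

```python
from urllib import robotparser

# Block the content folder, but carve out one file with Allow.
rules = [
    "User-agent: *",
    "Allow: /content/my-file.php",
    "Disallow: /content/",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# The allowed file is reachable; everything else in /content/ is not.
print(rp.can_fetch("AnyBot", "https://example.com/content/my-file.php"))  # True
print(rp.can_fetch("AnyBot", "https://example.com/content/other.html"))   # False
```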
For example:

    Sitemap: https://www.hitechwork.com/post-sitemap.xml
    Sitemap: https://www.hitechwork.com/page-sitemap.xml
    Sitemap: https://www.hitechwork.com/category-sitemap.xml

While Disallow rules in a WordPress robots.txt file block particular directories, the Sitemap directive gives robots a list of pages that are available for indexing.
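On Python 3.8 and later, urllib.robotparser also picks up Sitemap lines and exposes them via site_maps(), which is a convenient way to sanity-check the file:

```python
from urllib import robotparser

# Sitemap lines are independent of any User-agent group.
rules = [
    "Sitemap: https://www.hitechwork.com/post-sitemap.xml",
    "Sitemap: https://www.hitechwork.com/page-sitemap.xml",
    "Sitemap: https://www.hitechwork.com/category-sitemap.xml",
]

rp = robotparser.RobotFileParser()
rp.parse(rules)

# site_maps() returns the listed sitemap URLs (or None if there are none).
print(rp.site_maps())
```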
As I already mentioned, giving search engines a sitemap can increase the number of pages they index. The sitemap can also tell robots when a page was last modified, its priority, and how often it is likely to be updated.
Host directive:

The Host directive is only supported by Yandex. It lets you decide whether you want www.example.com or example.com to appear in search results.

[Figure: Host syntax]

For example:

    Host: www.hitechwork.com

I don't recommend relying on this, because only Yandex supports it. But if you want to, you can learn more about the Host directive here.
It is better to use only settings that all search engines honor. Google, for example, uses 301 redirects to handle this situation.
For example:
If your domain starts with www, people who visit your website without www (hitechwork.com) will automatically be redirected to www.hitechwork.com.
Advanced Robots.txt Syntax

The robots.txt file is not only used to prevent search engines from crawling your site.
It can also provide useful information to search engines and block unnecessary files to keep your site's index clean.
For example: on your website, you might have a folder for test content, affiliate links, unnecessary images, and many other things.
If you want to keep this folder out of the search engine index, write the following command in the robots file:
    User-agent: *
    Disallow: /testfolder/

All the content in testfolder is now blocked.
But if you want to block access to all folders whose names begin with wp (or anything else), you could do it like this:

    User-agent: *
    Disallow: /wp-*/

And if you want to exclude all PDF files in the media folder from showing up in search results, you would again write the following command in the robots.txt file:
    User-agent: *
    Disallow: /wp-content/uploads/*/*/*.pdf

Note: when you upload a file, it goes into the Uploads folder.

See the screenshot below.

[Figure: URL of an uploaded image]

Here I replaced the month and day directories that WordPress automatically creates with wildcard asterisks (*).
According to this command, no matter when they were uploaded, all files in the uploads folder ending with .pdf are blocked. For example:

    www.hitechwork.com/wp-content/uploads/2017/04/SEO.pdf
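Python's built-in robotparser does not understand the * wildcard, but the Google-style matching these rules rely on can be sketched by translating a pattern into a regular expression. This is a rough illustration of the matching idea, not a complete robots.txt implementation:

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a Google-style robots.txt path pattern into a regex.

    '*' matches any run of characters, a trailing '$' anchors the
    pattern to the end of the path, and everything else is a plain
    prefix match, just like an ordinary Disallow path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape regex metacharacters, then restore each '*' as '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(disallow_pattern: str, path: str) -> bool:
    return robots_pattern_to_regex(disallow_pattern).match(path) is not None

# The PDF rule from above blocks uploads from any year/month directory...
print(is_blocked("/wp-content/uploads/*/*/*.pdf",
                 "/wp-content/uploads/2017/04/SEO.pdf"))    # True
# ...but leaves non-PDF uploads alone.
print(is_blocked("/wp-content/uploads/*/*/*.pdf",
                 "/wp-content/uploads/2017/04/SEO.png"))    # False
# The /wp-*/ rule matches any top-level folder beginning with "wp-".
print(is_blocked("/wp-*/", "/wp-admin/admin.php"))          # True
```

This mirrors why the wildcards in the rule work regardless of the upload date: each * simply stands in for the year and month directory names.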