

Robots.txt


Search engines crawl web pages to index them so that the pages can be found easily and shown in search results. robots.txt is a text file that tells crawlers which parts of a site they may or may not access, mainly to prevent the server from being overloaded with requests. However, a page blocked in robots.txt can still be indexed in other ways, such as when it is linked from another page that is indexed, so do not use robots.txt to block access to sensitive content.

 

robots.txt is not mandatory, so whether to follow its rules is at each crawler's discretion. However, most well-known crawlers, such as those operated by Google and Bing, honor them.

Implementation

Source: https://designpowers.com/blog/url-best-practices
Source: https://developer.mozilla.org/en-US/docs/Learn/Common_questions/Web_mechanics/What_is_a_URL

A robots.txt file applies to only one protocol, host, and port, and it must be located in the root of the site. For example, https://www.eg.com/robots.txt is different from https://m.eg.com/robots.txt and from http://eg.com/robots.txt; each of those origins needs its own file.
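To make this scope rule concrete, here is a minimal Python sketch (standard library only; the eg.com URLs are the hypothetical examples above) that derives the robots.txt location governing a given page:

from urllib.parse import urlsplit, urlunsplit

def robots_txt_url(page_url: str) -> str:
    # The scheme and netloc (host:port) are kept as-is, because a
    # robots.txt file only applies to its own protocol, host, and port.
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_txt_url("https://www.eg.com/products/item?id=1"))  # https://www.eg.com/robots.txt
print(robots_txt_url("http://eg.com/about"))                    # http://eg.com/robots.txt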

Creating the File

Create a text file named exactly 'robots.txt' (no other name is recognized) and save it in UTF-8. A website can have only one robots.txt file.
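If you generate the site from code, the file is simple to create; this is a minimal sketch, in which the webroot/ output folder and the rule contents are hypothetical:

from pathlib import Path

rules = """\
User-agent: *
Disallow: /private/
"""

# The file must be named exactly robots.txt, encoded as UTF-8,
# and served from the root of the site.
out = Path("webroot") / "robots.txt"
out.parent.mkdir(exist_ok=True)
out.write_text(rules, encoding="utf-8")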


Syntax

The robots.txt file is comprised of groups, and each group is comprised of a target agent and its access rules. The directives are as follows:

▶ User-agent: the target crawler; marks the beginning of a group (Required / You can have one or more per group)

▶ Disallow: a path the crawler must not access (Each group needs at least one Disallow or Allow line / You can have one or more per group)

▶ Allow: a path the crawler may access (Each group needs at least one Disallow or Allow line / You can have one or more per group). By default, access from all crawlers is allowed, so Allow is mainly used to re-open a sub-path under a directory blocked by a Disallow rule.

▶ Sitemap: the location of a sitemap for the site (Optional / You can have one or more per file)

※ A wildcard (*) in User-agent matches all crawlers (however, AdsBot crawlers are an exception: they ignore the wildcard and must be named explicitly).

If a path is a directory, it should start with '/' and end with '/'.


Examples:

# Example 1: Block only Googlebot
User-agent: Googlebot
Disallow: /

# Example 2: Block Googlebot and Adsbot
User-agent: Googlebot
User-agent: AdsBot-Google
Disallow: /

# Example 3: Block all crawlers except AdsBot (AdsBot crawlers must be named explicitly)
User-agent: *
Disallow: /
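Building on the directives above, here is one more example in the same style. It is a sketch of a common pattern, not taken from Google's documentation: the /private/ and /private/public-docs/ paths and the sitemap URL are made up for illustration. Google applies the most specific matching rule, so the Allow line re-opens a sub-path inside the blocked directory:

# Example 4: Block a directory, re-allow one sub-path, and list a sitemap
User-agent: *
Disallow: /private/
Allow: /private/public-docs/
Sitemap: https://www.eg.com/sitemap.xml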

More Examples

 

https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt

Testing

If the robots.txt file is publicly accessible, you can test it with the robots.txt Tester in Google Search Console.
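Besides the Search Console tool, you can sanity-check rules locally with Python's standard urllib.robotparser module. This is a minimal sketch, not Google's own matcher: Python's parser applies rules in file order (first match wins) rather than Google's most-specific-match rule, which is why the Allow line comes first here. The rules and URLs are the made-up examples from this post:

from urllib.robotparser import RobotFileParser

# Parse rules from an in-memory list of lines instead of fetching a live site.
rules = [
    "User-agent: *",
    "Allow: /private/public-docs/",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(rules)

# Ask whether a given user agent may fetch a given URL.
print(parser.can_fetch("Googlebot", "https://www.eg.com/private/secret.html"))         # False
print(parser.can_fetch("Googlebot", "https://www.eg.com/private/public-docs/a.html"))  # True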


References

https://developers.google.com/search/docs/crawling-indexing/robots/intro
https://developers.google.com/search/docs/crawling-indexing/robots/create-robots-txt
https://developer.mozilla.org/en-US/docs/Learn/Common_questions/Web_mechanics/What_is_a_URL

 
