A website is an essential part of any online presence, and your target customers need to be able to find it on Google, whether they search for your brand name or your content. But typically, you have to wait for Googlebot to crawl your website and add it (or your newest content) to the Google index.
So the question is: how do you ensure this happens as quickly as possible? Here are the basics of how website content is crawled and indexed, plus some great ways to get the Googlebot to your website or blog to index your content sooner rather than later.
What Is Google Indexation?
Since Google is the world’s largest search engine, we’re going to be focusing on their index.
What Is A Web Index?
You know what an index is, right? It’s basically a list of information, with instructions on how to find that information. Think of an alphabetical index at the end of a book.
A web index isn’t very different. It’s a database — a list of all the stuff on the Internet. It keeps track of where this information is, and helps you find it.
“All the stuff on the Internet,” by the way, is a mind-blowing amount.
How does Google keep track of all this stuff? They store it in a really big database.
Think of the Internet as a library. There are billions of books (websites), each with individual chapters (pages of a website).
Google’s search engine is the index to this library. You don’t have to know the Dewey Decimal System, thankfully. All you need to know is how to type.
When you type stuff into Google, you are searching Google’s index. The search results page is the index page.
But How Does Google Index The Internet?
In order to index a library as big as the Internet, you have to have some powerful tools.
They’re called spiders.
Why spiders? Because they crawl from site to site, essentially creating a web of information.
The process is called fetching. The spiders fetch information from each page, and Google stores it in the index.
The web spider crawls to a website, indexes its information, crawls on to the next website, indexes it, and keeps crawling wherever the Internet’s chain of links leads it.
Thus, the mighty index is formed.
But What Kind Of Information Are The Spiders Storing?
Spiders try to look at most of the information on a website, but they can’t look at everything. They index the most important information first.
What kind of information?
- URLs — The address. That’s pretty important. Otherwise, you’ll never be able to find the website in the first place.
- Title tags — The title of each page, shown as the headline in search results.
- Metadata — The description of the website along with any relevant keywords.
This is the main information that the spiders retrieve for Google’s index. And this is what you see on a search results page.
That’s the basic idea. Obviously, there’s a lot more complexity to the way that those search results are returned and organized.
Matt Cutts summed all of this up in a brief video. The content is a bit dated, but the information is still valid.
Why does it matter? Because your success at indexation depends on everything you just learned.
Now, let’s figure out how to get Google to index your website.
How To Get Google To Index Your Website
The fact is, Google will probably index your website regardless, unless you’ve specifically refused indexation in your robots.txt file.
But you want your website indexed quickly and successfully. Here’s your step-by-step guide to getting your website indexed in the best way possible.
First, Create A Sitemap
A sitemap is an XML file on your website. Crawlers read it to learn all about your site: how big it is, which pages matter most, and where new content is located. An XML sitemap is the critical first ingredient to successful indexation.
The spiders are smart, but they really do need a map.
Without a sitemap, crawling can take a long time — as long as 24 hours to index a new blog post or website.
That’s too long.
With a sitemap, you can shave that time down to just a few minutes.
That’s right. Your website, blog, or new page can be indexed by Google in less than an hour.
One experiment compared indexation times with and without a sitemap, and the difference was dramatic.
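Many platforms generate a sitemap for you automatically, but the file itself is simple enough to build by hand. Here's a minimal Python sketch of what that generation looks like; the URLs are placeholders, and real sitemaps often add per-page `lastmod` dates and priorities.

```python
# Minimal sitemap generator -- a sketch; the example.com URLs are placeholders.
from datetime import date
from xml.sax.saxutils import escape

def build_sitemap(urls):
    """Return a minimal XML sitemap string for the given list of page URLs."""
    today = date.today().isoformat()
    entries = "\n".join(
        f"  <url>\n"
        f"    <loc>{escape(u)}</loc>\n"
        f"    <lastmod>{today}</lastmod>\n"
        f"  </url>"
        for u in urls
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

sitemap = build_sitemap([
    "https://www.example.com/",
    "https://www.example.com/blog/new-post",
])
print(sitemap)
```

Save the output as sitemap.xml at your site's root, and the crawler has its map.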
Once you’ve created your sitemap, you can upload it in the Google Search Console. Here’s how:
- On your Search Console home page, select your site.
- In the left sidebar, click Site Configuration and then Sitemaps.
- Click the Add/Test Sitemap button in the top right.
- Enter the path to your sitemap file (for example, /sitemap.xml) into the text box that appears.
- Click Submit Sitemap.
Create A Robots.txt
The robots.txt is a simple file on your website that instructs search engines what to index and what not to index.
This is the very first stop that a spider makes on its journey to index your website. If your robots.txt says, “don’t index me,” then the spider will move along.
Thus, it’s very important that your robots.txt gives Google unrestricted permission to crawl the site.
Of course, if there are sections of your website that you don’t want to appear in the search results, you can set this up in your robots.txt.
A robots.txt is essentially a list of commands to the search engine that says, “Index this,” or, “Don’t index this.”
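To illustrate, here's what a simple robots.txt might look like. The /private/ path and the domain are placeholders; strictly speaking, Disallow controls crawling rather than indexing, but in practice it keeps those pages out of the spider's path.

```
# Allow all crawlers everywhere except a private section
User-agent: *
Disallow: /private/

# Point crawlers at the sitemap
Sitemap: https://www.example.com/sitemap.xml
```

The file lives at the root of your site (yoursite.com/robots.txt), which is why it's the spider's first stop.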
Most websites don’t need to set up restrictions for crawling, indexing or serving, so their pages are eligible to appear in search results without having to do any extra work. That said, site owners have many choices about how Google crawls and indexes their sites through Webmaster Tools and a file called “robots.txt”. With the robots.txt file, site owners can choose not to be crawled by Googlebot, or they can provide more specific instructions about how to process pages on their sites.
The more pages you have on your website, and the more you open these up to the index, the better your indexation will be.
Submit Your Site To Search Engines
It used to be that webmasters would submit their site to search engines in order to get it indexed.
In fact, some SEOs actually offered to do this for website owners, promising faster and superior indexing. Today, that step is largely unnecessary: if your site is linked from anywhere on the web, or you’ve submitted a sitemap in Search Console, Google will find it on its own.
Create Internal Links
The most effective way to boost your website’s indexation is through linking.
The paths that the spiders take through the Internet are formed by links. When one page links to another page, the spider follows that path.
Within your own website, make sure that you’ve created links to and from all your most important pages.
Usually, this happens naturally as long as you have a well-organized website. For example, a typical restaurant website’s main navigation links out to its key internal pages: menu, hours, location, and contact.
It’s helpful to link between as many pages as possible. For example, you should link from one blog article to another one, or from a blog article to an evergreen content page on your website.
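In HTML terms, both kinds of internal links are ordinary anchor tags; the page names below are placeholders:

```
<!-- Main navigation: links to the site's key internal pages -->
<nav>
  <a href="/menu">Menu</a>
  <a href="/hours">Hours</a>
  <a href="/contact">Contact</a>
</nav>

<!-- Inside a blog post: link to related and evergreen content -->
<p>For more detail, see our <a href="/blog/seo-basics">SEO basics guide</a>.</p>
```

Every one of those hrefs is a path the spider can follow.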
Earn Inbound Links
The most powerful form of linking, however, comes from outside websites. When other websites link to yours, it gives you a ton of indexation power.
Not only does your website get indexed faster, but it also earns more SEO power.
Encourage Social Sharing
Social sharing is a big part of indexation. Since Google and Twitter have partnered to share data, Google can access this form of data quickly and accurately, improving indexation.
Create A Blog
A blog creates tons of juicy content for spiders to crawl and Google to index. The more high-quality content you put on the web, the more indexation you get, and the more SEO power you earn.
Create An RSS Feed
An RSS feed isn’t absolutely necessary, but it doesn’t hurt.
RSS stands for Really Simple Syndication. It is a way to announce the publication of new content. It used to be a popular way to subscribe to blogs, though its popularity has since faded.
Even so, an RSS feed can be an effective way of telling Google about your new content as soon as it’s published.
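For reference, an RSS feed is just another XML file listing your newest posts. Here's a minimal example; the titles, URLs, and date are placeholders:

```
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <link>https://www.example.com/blog</link>
    <description>New posts, announced as they publish</description>
    <item>
      <title>How Google Indexes Your Site</title>
      <link>https://www.example.com/blog/new-post</link>
      <pubDate>Mon, 01 Jan 2024 09:00:00 GMT</pubDate>
    </item>
  </channel>
</rss>
```

Most blogging platforms generate this feed automatically, so you rarely have to write it yourself.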
Check For Crawl Errors
Sometimes, a website has issues with crawling. This usually happens when you make significant changes to your website such as adding, removing, or moving pages.
- To access your crawl report, open up Google Search Console and select your website.
- Click on Crawl.
- Click Crawl Errors.
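Under the hood, a crawl-error report boils down to the HTTP status code each page returned when the spider fetched it. Here's a Python sketch of that kind of check; the page paths and status codes are made-up examples:

```python
# Classify HTTP status codes the way a crawl-error report does.
# The pages dict below is illustrative -- in practice the codes come
# from actually fetching each URL.

def classify_status(status_code):
    """Map an HTTP status code to a rough crawl outcome."""
    if 200 <= status_code < 300:
        return "ok"
    if status_code in (301, 302, 308):
        return "redirect"       # fine, but update internal links if permanent
    if status_code == 404:
        return "not found"      # the classic error after removing or moving pages
    if 500 <= status_code < 600:
        return "server error"   # crawlers back off and retry later
    return "other"

pages = {"/": 200, "/old-menu": 404, "/blog": 301, "/contact": 500}
errors = {path: classify_status(code)
          for path, code in pages.items()
          if classify_status(code) not in ("ok", "redirect")}
print(errors)
```

Fixing the 404s (usually with redirects from old URLs to new ones) and the server errors is what clears the report.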