This post is part of the weekly Q&A section. Just use the contact form if you want to submit a question.
Rick Regan asks:
I have a two-part question about how Google indexes sites:
1) Is Google supposed to index an entire site at once or does it do so incrementally? My blog (on my own domain) appears to be getting indexed incrementally, to the point where it’s taken a month to index all 30+ of my pages. Most of those pages were present before I manually submitted my URL to Google. I’m wondering if I’m doing something wrong or not doing something I should be doing. Does this have anything to do with my blog being new, or having no external links pointing to it?
2) Does Google eventually drop noindex and 404 pages? I have archive pages that got indexed but I have since added to them the “noindex” tag. I also deleted an empty category that Google now gets a 404 on. Will Google eventually remove those pages as it re-crawls my site?
I will answer each question individually.
1) Most of the time Google will index a new website gradually, yes. That is at least what I have observed with most of my websites. The speed at which Google indexes all your internal pages depends on several factors, though.
If you get some very trusted and relevant backlinks, and on top of that you also have a very efficient internal link structure, all your internal pages will get indexed fast. If, on the other hand, you have very few backlinks and a poor link structure, it might take a while before you get to see all your pages indexed.
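One rough way to judge how efficient an internal link structure is: count how many clicks each page sits from the homepage, and look for pages that no link path reaches at all. The sketch below does this with a breadth-first search. The `site` map and its URLs are made up for illustration, and click depth is only a proxy I am assuming here, not something Google documents as an indexing rule.

```python
from collections import deque


def click_depths(links, home):
    """Return the minimum number of clicks from `home` to each reachable page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:  # first visit is the shortest path in BFS
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical site map: each page lists the pages it links to.
site = {
    "/": ["/about", "/category/math"],
    "/category/math": ["/post-1", "/post-2"],
    "/post-1": ["/post-2"],
    "/orphan": ["/"],  # links out, but nothing links to it
}

depths = click_depths(site, "/")
# Pages that appear in the map but cannot be reached from the homepage.
unreachable = sorted(set(site) - set(depths))
```

Pages with a large depth, or pages that land in `unreachable`, are the ones a crawler following links from your homepage is likely to find last, if at all.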
Keep in mind that using the manual URL submission to Google will have only a small impact on the speed and breadth of your indexation. In fact, many people recommend that if you want to get a site indexed fast you should NOT use that feature, and should instead focus on getting some trusted backlinks to your site.
2) Yes, eventually Google will fix those issues. New sites don’t get crawled very often, so that is most likely the reason those pages are still showing up. As soon as Google re-crawls a page and finds the “noindex” tag, for example, it will remove that page from its index.
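While you wait for the re-crawl, it can be worth confirming that the tag is actually present in the HTML your pages serve. Here is a minimal sketch using only Python's standard library. The `has_noindex` helper and the sample markup are my own illustration, and it only looks at `<meta>` tags named `robots` or `googlebot`, so it will not catch a noindex sent through an HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Records whether a robots/googlebot meta tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None when written without a value.
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot"):
            directives = [d.strip() for d in content.split(",")]
            if "noindex" in directives:
                self.noindex = True


def has_noindex(html):
    """Return True if the given HTML carries a noindex meta directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.noindex


# Hypothetical archive page markup.
archive_page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(archive_page))
```

You would feed it the source of one of your archive pages (saved from the browser, for instance) instead of the sample string.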