How rel=nofollow Works (luigimontanez.com)
209 points by luigi on Jan 12, 2012 | 50 comments


The article is a bit wrong. nofollow does not mean that the crawlers must not follow the link, but means that the ranking algorithms must not consider the link for the ranking as the link could be dubious or spam.

As such, the relationship between the linked page and the linking page is not to be affected by the link. And this is what keeps the links from surfacing.

Basically, the current Twitter HTML says: We have no outbound links.


It's true that there's a difference between crawling, indexing, and ranking, but Google is pretty clear on the ultimate effect.

http://support.google.com/webmasters/bin/answer.py?hl=en&...

How does Google handle nofollowed links?

In general, we don't follow them. This means that Google does not transfer PageRank or anchor text across these links. Essentially, using nofollow causes us to drop the target links from our overall graph of the web. However, the target pages may still appear in our index if other sites link to them without using nofollow, or if the URLs are submitted to Google in a Sitemap.


That's the robots nofollow, we're talking about a/@rel=nofollow which google advertises quite differently: http://googleblog.blogspot.com/2005/01/preventing-comment-sp... and the microformats standardization[0] stuck very carefully to "provide no additional weight or ranking to target" period. Not that crawlers can't follow that link for new content, and not that crawlers can't reverse that link for "shared on" features, just that this precise link won't add to the pagerank of the target.

[0] http://microformats.org/wiki/rel-nofollow


Dear god, you're basically repeating one link from 7 years ago.

Provided here as information for people who might think you are offering a valid resource:

http://support.google.com/webmasters/bin/answer.py?hl=en&...


The thing is, if Google were to say “x was shared by y on twitter”, that is precisely providing some additional weight to the result. Human weight, admittedly, though I believe this also factors into the weight on the page. So it makes sense that twitter links would also not be included in this sharing stuff either.


Google has said that it won't add a URL to its index of URLs to crawl as a result of seeing the URL in a rel=nofollow href [1]. This is over and above not weighting the link if the URL does otherwise make it onto its to-crawl list.

[1] http://googleblog.blogspot.com/2007/02/robots-exclusion-prot...


Note when Google first unveiled their 'rel=nofollow' invention in 2005, they were very careful to say only that it meant 'no weighting', but not necessarily 'no discovery via this reference'. The attempt to formalize the attribute maintained that same distinction. [1]

The Google manager blogging about it two years later (five years ago) was probably correct about then-current practice, but I wouldn't count on that as an enduring commitment that Google had never and would never follow such links.

[1] http://microformats.org/wiki/rel-nofollow


The post you're linking is discussing the META tag version of nofollow, not the hyperlink rel attribute which is detailed here: http://googleblog.blogspot.com/2005/01/preventing-comment-sp...

This article doesn't say explicitly that it won't follow those links and I would suggest it often will as comments are a rich source of pages to index. The nofollow on a link like this just indicates that the page owner can't vouch for the quality of the linked page (e.g. user submitted link) and therefore doesn't want to pass pagerank to it.

Edit: it's also implied in the parent's link that the META tag only prevents the following of links to pages within your site. This sounds reasonable as who are you to tell Google they can't index a third party domain?


"who are you to tell Google they can't index a third party domain?"

No tweets link to third-party domains, since Twitter wraps all links in their own URL shortener (t.co). Even if you use an independent URL shortener, it will be wrapped by t.co.

(I suppose you could argue that there is no way for Google to determine this algorithmically, since "twitter.com" != "t.co", so it should go ahead with the crawl, but there's the question of how Twitter would respond to that.)


They could create a robots.txt for t.co and set it to noindex.
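
A minimal sketch of how a crawler might consult such a file, using Python's standard `urllib.robotparser`. It assumes the comment above means a `Disallow` rule, since robots.txt rules govern crawling rather than indexing. The robots.txt contents and the `t.co` paths below are made up for illustration and are not Twitter's actual file.

```python
from urllib.robotparser import RobotFileParser


def make_parser(robots_txt):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


# Hypothetical robots.txt that blocks every crawler from every path.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"

# Hypothetical robots.txt that blocks only one directory.
BLOCK_PRIVATE = "User-agent: *\nDisallow: /private/\n"

block_all = make_parser(BLOCK_ALL)
block_private = make_parser(BLOCK_PRIVATE)

# A well-behaved crawler asks before fetching each URL.
blocked = block_all.can_fetch("Googlebot", "http://t.co/abc123")
allowed = block_private.can_fetch("Googlebot", "http://t.co/abc123")
```

Under these assumptions `blocked` is False and `allowed` is True. A crawler that honors the first file could never follow a shortened link to learn its destination.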


Sure, but the probability of having an URL shared only through a tweet is zero. Google is insanely efficient at finding new URLs. So, it will find it anyway, but it will not be shown to you as "shared by your friend John" because of the nofollow.


> the probability of having an URL shared only through a tweet is zero

Countless URLs are shared only through tweets, for example URLs that are private URLs to content on services such as TwitPic[1].

1. http://www.dailymail.co.uk/tvshowbiz/article-2084361/AnnaLyn... (NSFW)


The cited Danny Sullivan article has a little of his interview with Eric Schmidt:

sullivan> I countered that Google seemed to have all the permission it needed, in that they’re not blocked from crawling pages.

schmidt>“That’s your opinion,” Schmidt said, then joked: “If you could arrange a letter from Facebook and Twitter to us, that would be helpful.”

sullivan> I pushed back that both have effectively given those letters since their robots.txt files — a method of blocking search engines — weren’t telling Google to go away.

Well, why stop there? Why shouldn't Google ignore the robots.txt file in its search for shareable nuggets for its search results, which they are equally not "blocked" from using? The answer for both rel=nofollow and robots.txt is that Google has explicitly promised webmasters that it will not do this. Sullivan knows this: this is bad journalism. I'd be curious to see more of the transcript of the interview.


That doesn't make sense. Twitter's robots.txt allows profile pages and tweets to be indexed, and Google does index them. Google is simply choosing to exclude them from their people & places feature so that they can favor their own Google+ profiles.

It's only the outbound links within tweets that Google is forbidden from indexing. That has nothing to do with why Google excludes content on twitter.com itself.


> That has nothing to do with why Google excludes content on twitter.com itself.

Um, they don't. That's not what this is about. You need to research the actual feature that's causing controversy, because it's not "Twitter doesn't show up in search", not even close.


The People & Pages feature. On Google search. Is excluding twitter.com content.

Example: http://parislemon.com/post/15682237911/twitter-keeps-right-o...


So what? That's not part of the standard search. Since when is Google not allowed to advertise its own services alongside standard search results? What's next, do you whine that Bing has links to MSN and Hotmail?

Twitter actually contaminates ordinary search results on its site with paid tweets, and you're mad Google puts some Google+ content off to one side? WTF?


Oh, it's definitely part of standard search now. Schmidt said Twitter content wasn't included because they didn't have "permission" to index it. That's false. They are purposefully excluding it -- a point of fact which I'm glad to see you acknowledge now.


No, it's not part of standard search. And since you seem to know everything that goes on between Twitter and Google, perhaps you could share with us the documents showing Google has the permission it needs to include Twitter results in this very not-standard-search feature?

Edit: By the way, are you a lawyer? If not, even given your apparent insider knowledge of Google and Twitter goings-on, how do you know your legal understanding is correct?


Sure, I'd be happy to. Here it is: http://twitter.com/robots.txt


> The answer for both rel=nofollow and robots.txt is that Google has explicitly promised webmasters that it will not do this.

No, not for a/@rel="nofollow", the only promise Google makes is that it will not transfer pagerank credit to the link's destination[0].

This does not prevent them from using that link as an outbound in order to discover new content, nor does it prevent Google from reversing the link for the "Shared On" feature. Google specifically advocates[0] using a/@rel=nofollow for user-generated links in order to prevent rewarding spammers, Twitter's use of the attribute is not just sensible, it's necessary.

niyazpk's comment[1] looks far more sensible: "Shared On" is a specific feature for content producers and part of a special agreement and API access.

[0] http://googleblog.blogspot.com/2005/01/preventing-comment-sp...

[1] http://news.ycombinator.com/item?id=3456233


> the only promise Google does is that it will not transfer pagerank credit to the link's destination

Google is fairly clear about this:

http://support.google.com/webmasters/bin/answer.py?hl=en&...

"How does Google handle nofollowed links?

In general, we don't follow them. This means that Google does not transfer PageRank or anchor text across these links. Essentially, using nofollow causes us to drop the target links from our overall graph of the web. However, the target pages may still appear in our index if other sites link to them without using nofollow, or if the URLs are submitted to Google in a Sitemap. Also, it's important to note that other search engines may handle nofollow in slightly different ways."

So, if your tweet includes a link, they drop that link from the post as far as their graph is concerned. And that graph is what makes up their search.

It's not just page rank.
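
To make the quoted help text concrete, here is a toy sketch of a link graph that drops nofollowed edges. It only illustrates the wording above and is not a description of Google's actual pipeline. The function names and example URLs are hypothetical.

```python
from collections import defaultdict


def build_link_graph(links):
    """Build a source -> set(targets) mapping, skipping nofollowed links.

    `links` is an iterable of (source_url, target_url, is_nofollow) tuples.
    """
    graph = defaultdict(set)
    for source, target, is_nofollow in links:
        if is_nofollow:
            continue  # the edge never enters the graph
        graph[source].add(target)
    return dict(graph)


def known_targets(graph):
    """Every URL that is the target of at least one kept edge."""
    return set().union(*graph.values())


# Hypothetical crawl data: two tweets with nofollowed links, one blog post
# with an ordinary link.
links = [
    ("https://twitter.com/example/status/1", "https://news.example/story", True),
    ("https://blog.example/post", "https://news.example/story", False),
    ("https://twitter.com/example/status/2", "https://other.example/page", True),
]

graph = build_link_graph(links)
targets = known_targets(graph)
```

In this toy data the story is still reachable through the blog's followed link, but neither tweet appears as a source in the graph. A feature that needs to know who shared the story would have no tweet-to-story edge to work from.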


Interesting, so their implementation has broadly expanded upon the original meaning of the attribute (and the one standardized as a microformat), and in recommending this be broadly used to fight against spam they've paved the way for an interesting time for everybody else?


Broadly expanded? Standardized? Please.

I think the nofollow attribute's intent was fairly straightforward from the beginning: to designate links that should not be accorded attention in search results. In 2005, this was merely via PageRank. Today, it's recommendations in search results.

Why would I want to see search results that shouldn't be given weight in a search result in a search result? Explain that to me.


Where in Google's blog post do they claim to still follow the link? They provide no technical details for their process, so if we're speculating, it would make far more sense that rel=nofollow adheres to the same guidelines as a meta/robots.txt nofollow.

And this speculation for the "shared on" functionality doesn't exactly add up, either. If they're not following links with rel=nofollow, why should Google be legally or even morally responsible for storing a massive and entirely extraneous relational database, and then checking against it every time they follow a short URL from another site, just to see if it was shared on Twitter at some point?


But MG Siegler is correct, is he not, that Google has chosen not to link to artists' Twitter pages alongside their G+ pages, and that this could easily be seen as anti-competitive?

e.g. https://www.google.com/search?&q=music


That's a new sidebar feature that goes where ads have traditionally gone. Google has decided to literally promote G+ over ads, which is a story in itself, but a totally different one.

It doesn't change the search results in the main column.


How is it a "totally different" story when the two changes are launched at the same time under the same name ("Search Plus Your World") and both underplay the dominant social networks?

They seem rather related parts of the same story.

You make a big deal about not "ascribing evil motives to Google" but I think it's pretty clear that Google is going all in on G+ and both aspects of Search+ that Danny Sullivan has taken issue with are directly related to this.

I don't care for the word "evil" because I think Google may be perfectly within their rights to do this but it's also very unlike how Google has historically operated. Inasmuch as that historical behavior personified "don't be evil", well it's hard to be shocked when people call this new, tough negotiating Google "evil".


Yep, you're right.

There are two aspects to the new G+ search features: indicating that a link was shared by one of your friends, and the sidebar that links relevant G+ profiles. The former is presumably limited by rel=nofollow, but the latter has no relation to rel=nofollow.

If you search for "WWE", Google doesn't show you the @wwe Twitter account in the sidebar. The fact that Google is told not to follow any outgoing links posted by the @wwe account isn't a valid explanation for that, especially when you consider that (a) Google definitely has twitter.com/wwe indexed, and (b) twitter.com/wwe almost certainly has a high PageRank due to incoming links.


Thank you. A lot of people are confusing these two aspects.

I'll add one more thing though: Schmidt actually claimed they didn't have "permission" to include Twitter and Facebook pages in the suggested profiles sidebar. That's just plain false.

Watch at about 1:10 in the video. http://marketingland.com/schmidt-google-not-favored-happy-to...


This guy completely misread MG Siegler. MG's point is rel=nofollow has nothing to do with the twitter profile pages that google is refusing to include in their people & places feature. Nothing to do with outbound links.

This entire debate over rel=nofollow is a red herring.


I'm amazed really by how much FUD has been spread around by Twitter, and most reports of the story. This article clears it up perfectly.


It does not. a/@rel=nofollow has nothing to do with it: the only thing Google says it will do with these is break searchrank credit transference, and Google themselves recommend setting this attribute:

> anywhere that users can add links by themselves, including within comments, trackbacks, and referrer lists

http://googleblog.blogspot.com/2005/01/preventing-comment-sp...

In fact, Google created this attribute specifically and solely to combat comment/user spam.

If Google decided to expand the role of @rel="nofollow" as a gigantic "fuck you" to everybody that's a different issue, but the article still is not right.


You are repeating this misinformation. As I explained here: http://news.ycombinator.com/item?id=3456404 Google makes this clear in this answer: http://support.google.com/webmasters/bin/answer.py?hl=en&...


FWIW, bingbot basically DDOS'ed our site a few months ago by crawling links that were labeled nofollow/noindex in the hrefs. Adding a rule to robots.txt fixed the problem.


"The hubbub is centered around a complaint by Twitter that links shared on Twitter are not surfacing in the search results."

I think this mischaracterizes the complaint. I don't think anyone is complaining that the LINKS themselves aren't surfacing. That complaint would imply that there ought to be some influence between links on Twitter and the search rank of those linked pages. And I don't think anyone wants to see spammy tweets on Twitter skewing search ranking.

Rather, the complaint is that recommendations from Twitter are not APPEARING in the search results. And to that point: Why is it difficult for Google to index the fact that someone linked to a site (whether it be rel=nofollow or not)? Why does this information need to be coupled with the passing of page rank?


I don't think this is what they're complaining about. Twitter is saying why are they not linking to the @wwe Twitter account in the new social recommendations like the Google+ links to @wwe. They're not talking about links outbound of Twitter, rather links inbound to twitter, like the @wwe twitter or Facebook account pages.


What would happen if twitter removed rel="nofollow"? Couldn't that open the possibility for more spam on twitter? I tend to agree with Google here, but maybe when it comes to social recommendation, Twitter could be recognized as special and specific algorithms could be written to figure out the worth of a recommendation.


I can't imagine more spam on Twitter.

In all seriousness though, you can probably analyze the Twitter "social graph" to weed out a lot of spam.


I don't think anyone wants to remove the rel=nofollow attribute. I think people are confused why you can't have a social recommendation appear on a serp even if it is a rel=nofollow.


Brilliant article. I was starting to doubt Google for a minute, but this clears it up concisely.


There seems to be some confusion about the different kinds of nofollow.

<a rel="nofollow"...> means not to follow the link to which the nofollow belongs. The target may still be indexed, however, if it is found via either an internal or external followed link.

<meta name="robots" content="nofollow"> means not to follow/crawl any of the links on the page. The linked pages may still be indexed, however, if they are reached via either an internal or external followed link.

To keep the page out of the index, "noindex" should be applied to the meta tag of the page that is to be removed from the index.

Further, a robots.txt Disallow rule does not remove pages from the index. A noindex must be used on the page to remove it from the index, or a removal requested via Google's Webmaster Tools.

That said, it seems feasible that if Google wanted to, they could parse the external links from a Twitter stream without actually following them. This wouldn't necessitate Twitter removing nofollow from their links.
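
The distinction between the link-level attribute and the page-level meta tag can be sketched with Python's standard `html.parser`. This is only an illustration of the rules described above, not how any real crawler is implemented. The `NofollowScanner` class, the `scan` helper, and the example URLs are hypothetical names.

```python
from html.parser import HTMLParser


class NofollowScanner(HTMLParser):
    """Collects links and records which nofollow signals apply to them."""

    def __init__(self):
        super().__init__()
        self.page_nofollow = False  # <meta name="robots" content="...nofollow...">
        self.page_noindex = False   # <meta name="robots" content="...noindex...">
        self.links = []             # list of (href, has_rel_nofollow)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            content = attrs.get("content") or ""
            tokens = {t.strip().lower() for t in content.split(",")}
            self.page_nofollow = self.page_nofollow or "nofollow" in tokens
            self.page_noindex = self.page_noindex or "noindex" in tokens
        elif tag == "a" and attrs.get("href"):
            # rel is a space-separated token list, e.g. rel="nofollow external"
            rel_tokens = (attrs.get("rel") or "").lower().split()
            self.links.append((attrs["href"], "nofollow" in rel_tokens))

    def followable(self):
        """Links excluded by neither the page-level tag nor their own rel."""
        if self.page_nofollow:
            return []
        return [href for href, nofollow in self.links if not nofollow]


def scan(html):
    scanner = NofollowScanner()
    scanner.feed(html)
    scanner.close()
    return scanner


# Link-level nofollow: only the marked link is excluded.
per_link = scan(
    '<a href="http://a.example/" rel="nofollow">a</a>'
    '<a href="http://b.example/">b</a>'
)

# Page-level nofollow: every link on the page is excluded.
per_page = scan(
    '<meta name="robots" content="noindex, nofollow">'
    '<a href="http://b.example/">b</a>'
)
```

Note that `per_link.links` still contains the nofollowed URL. Seeing a link in the markup and choosing to follow it are separate steps, which is the point made in the last paragraph above.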


"if Google wanted to, they could parse the external links from a Twitter stream without actually following them"

The problem is that there are no external links in the tweets. All tweets point to t.co, which is run by Twitter. You can't find out the ultimate destination of the link without following it through Twitter's redirector.


Twitter does in fact implement rel=nofollow to external links on their site but this does not mean content (especially profiles) can not be indexed. Mentioning Twitter’s use of rel=nofollow is a definite red herring.


Individual tweets will be indexed, but google has no idea who's sharing what. That's the issue at hand. Google's changes to Search will promote links that your social connections have shared. If googlebot is ignoring links in tweets, at Twitter's request, then Twitter has taken themselves out of the Search plus Your World game.

Again, the issue is not that googlebot doesn't index tweets. The issue is that it doesn't index any of the links in those tweets. Thus Google has no way of displaying "@Someone tweeted this" in their Search plus Your World results.


Personally I think it's dumb for Twitter to nofollow its outbound links. If I were them I would let the links be followed and then let Google come up with an algorithm on how to rank the links from Twitter.


Can someone please explain to me what nofollow has to do with Google indexing or displaying tweets in their SERPs? It's like saying Google discourages pages which may contain relevant/important information on the topic a user may be looking for because they contain nofollow links. What am I missing here?


What have outbound links to do with this at all? Many tweets can and do have zero links. I don't see how that's relevant to this conversation where the topic is about searching the tweets themselves, which has nothing whatsoever to do with outbound links from those tweets.


(I am new to this debate and so may be way off, but)

As far as I understand, even this article is not very clear. It says:

Google is simply complying with Twitter.com’s directive to not follow outbound links in tweets it crawls, and the consequence is that there will never be ”… shared this on Twitter” in the search results.

Wrong.

You see, when I share some random link in my blog and then you search for that topic, Google will not say "niyazpk shared this in ...." in the search result. Why? Because Google probably considers shares from a few trusted sites/partners only.

Let us read Google's explanation again:

We are a bit surprised by Twitter’s comments about Search plus Your World, because they chose not to renew their agreement with us last summer (http://goo.gl/chKwi), and since then we have observed their rel=nofollow instructions.

And this quote from Google[1]:

Since October of 2009, we have had an agreement with Twitter to include their updates in our search results through a special feed, and that agreement expired on July 2. While we will not have access to this special feed from Twitter, information on Twitter that’s publicly available to our crawlers will still be searchable and discoverable on Google.

It is pretty clear what happened. Twitter did not renew the agreement with Google and Google stopped considering Twitter as a source for the "shared on" snippet. The "no-follow" attribute has nothing to do with it, except that it works exactly like it works for any other site.

[1] http://searchengineland.com/as-deal-with-twitter-expires-goo...


> new to this ... may be way off

You are indeed "way off".

The article is factually and technically correct, while your "Because Google probably considers" is speculation.

The "special" feed supported crawling efficiency, giving Google a real-time firehose of new tweets, instead of having to crawl Twitter as any visitor or spider would.

The "no follow" breaks the association between a tweeter and the shared content, exactly as the linked article states.

In Google's own words, "Essentially, using nofollow causes us to drop the target links from our overall graph of the web."



