Rupert Murdoch’s interest in erecting paywalls around NewsCorp content and removing it from Google’s index is in the news seemingly daily.  And today, it looks like Google made a play to keep the Wall Street Journal and other NewsCorp properties in the index.  Mashable reports on the two changes, and the second one has interesting fair use implications:

Publishers now have the option to tell Google’s spiders to only crawl and index the “preview pages.” This refers to pages that display the first few paragraphs of an article on subscription sites like in order to entice them to pay for a subscription. If a publisher chooses to have spiders crawl their articles in this manner, they will be labeled with “subscription” within Google News.

If you don’t know what fair use is, there are four factors that courts look at to see if an otherwise infringing use of one’s copyright is “fair” and therefore non-infringing:

  1. the purpose and character of your use
  2. the nature of the copyrighted work
  3. the amount and substantiality of the portion taken, and
  4. the effect of the use upon the potential market.

For more, this Stanford guide is a good start (indeed, the list above is a direct copy from there), but if you can handle some legalese, you are best off reading a case.  I recommend Warner Bros. Entertainment Inc. v. RDR Books (“RDR,” for short), for two reasons: (1) it’s about Harry Potter and (2) it shows that fair use is not at all intuitive.  The case doesn’t apply here, but it’s easier to slog through than most cases.

Anyway: The fair use analysis.  I am going to skip the second factor because I think it’s basically irrelevant in this case.

Factor 1:  Is Google’s use transformative?

Yes, generally.  NewsCorp provides the content to report the news.  Google uses the content to provide a research tool.

While Google would almost certainly agree to not index NewsCorp content irrespective of the legal question, I wonder if NewsCorp could force the issue legally.  The answer: Probably not.  Crawling and indexing are transformative: the search service (Google, Google News, etc.) is a research tool.  This article about litigation over Google’s Image Search explains more, and the analogy should hold.  In that case, Google argued that it provides the image thumbnails in order to allow it to act as an effective research tool.  The court noted that Google isn’t providing the full-size image, but rather, just enough for the searcher to see if they’re on the right track and click through.

If Google provided access to the full text instead of the synopsis, it seems that they’d be hard-pressed to claim they are providing a research tool.  This factor would then be in play and would likely tilt toward NewsCorp.

Factor 3: Is the amount and substantiality of the content “too much”?

Let’s look at academic papers and their research tools; specifically, PubMed.  Check out this search result for “food protein induced enterocolitis syndrome” and you’ll see it’s a mere syllabus.  It’s substantial enough to meet the needs of the researcher, and as for the amount, well, the copyright holder wrote the passage so that researchers can find the paper.  It’d be hard for the author to argue that using this is “too much,” but it’d also be hard for the research tool provider to argue that this is not enough.

The parallel to preview pages, written by the publisher (NewsCorp, for example) for the benefit of the researcher, is stark.  If NewsCorp provides these pages to Google (and the pages are indeed adequate to meet the researchers’ needs), Google would be very hard-pressed to make a fair use argument for indexing the content behind the paywall.

Factor 4: What is the effect of Google’s use on the potential market for NewsCorp’s content?

Seems like an easy one.  If Google indexes the preview page, the effect is minimal at worst.  If Google indexes the article itself (and provides it for free), different story.  NewsCorp is drawing a line, albeit maybe one in the sand, saying that there’s a paid-for market for their complete content.  If Google crosses that line, the factor tilts in NewsCorp’s favor, quickly.

So, I think that at the end of the day, Google is doing what it has to do, not necessarily what it wants to do.