Archive for December, 2009
Introducing BitMe.Me
Posted by Dan Lewis in Uncategorized on December 28th, 2009
On December 17th — eleven days ago — I emailed my friend Ashish over at Setfive Consulting with a pretty simple idea: a topical meme tracker, using pre-selected RSS feeds to limit the scope of acceptable articles and bit.ly to weight and further filter the wheat from the chaff. Specifically, as a die-hard New York Mets fan who has a second identity on Twitter, @metstweets, I wanted to be able to tweet out “hot” Mets stories. I considered using TweetMeme, but I really didn’t want some generic article from an out-of-market newspaper, something non-English, spam, or even worse, something entirely unrelated to the “Mets” I’m talking about.
(Also, I wanted to test a model for a startup: a reverse-incubator. But that’s a story for another day — probably later this week — so if you’re interested, subscribe to my RSS feed.)
Limiting the universe of potential sources seemed like the right starting point. So I quickly wrote it up and sent it to Ashish to see if it was a reasonable build. The end result is BitMe.Me, and you’ll note that the front page has nothing to do with the Mets. The tool itself applies to any vertical, which is why I liked the idea so much, and can be adapted incredibly quickly. When you see how it works, you’ll see why.
We started with my target vertical — the Mets. I read enough Mets content to know that there are fewer than 100 key sources, and while I’m sure there are diamonds out there in other sites, I am willing to forgo them in my meme tracker. Why? Because as the volume of content expands out from beyond the core group, the signal to noise ratio decreases. Also, because the number of sources is relatively small and, in any event, finite, this allowed us to build an index without having to worry about spidering, discovery, etc.
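For the curious, the core idea (limit the universe of sources first, then let click counts do the ranking) can be sketched in a few lines of Python. This is an illustration only, not BitMe.Me's actual code: the `Article` type, the `rank_articles` function, the example hostnames, and the `click_counts` mapping (standing in for whatever numbers bit.ly reports for each link) are all assumptions made up for the example.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Article:
    title: str
    url: str


def rank_articles(articles, allowed_hosts, click_counts, top_n=5):
    """Drop anything outside the pre-selected sources, then rank by clicks.

    click_counts is a hypothetical {url: clicks} mapping; in a real build
    those numbers would come from a link-tracking service.
    """
    allowed = {host.lower() for host in allowed_hosts}
    # Step 1: keep only articles whose host is on the pre-selected list.
    kept = [
        a for a in articles
        if (urlparse(a.url).hostname or "").lower() in allowed
    ]
    # Step 2: most-clicked first; articles with no click data count as zero.
    kept.sort(key=lambda a: click_counts.get(a.url, 0), reverse=True)
    return kept[:top_n]


# Hypothetical data: two trusted Mets sources and one off-list site.
articles = [
    Article("Trade rumor", "http://mets-blog.example/trade"),
    Article("Unrelated spam", "http://spam.example/mets"),
    Article("Game recap", "http://beat-writer.example/recap"),
]
click_counts = {
    "http://mets-blog.example/trade": 120,
    "http://spam.example/mets": 9000,
    "http://beat-writer.example/recap": 340,
}
hot = rank_articles(
    articles,
    allowed_hosts=["mets-blog.example", "beat-writer.example"],
    click_counts=click_counts,
)
```

Note that the off-list link never makes the cut no matter how many clicks it has, which is the whole point of starting from a short, finite list of sources.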
Ashish instinctively expanded it to bigger verticals — technology, sports, gossip, politics, and news — and that’s what you see on the site right now. Yes, there’s a Mets section as well, but that’s not the focus of the site. (And we’re working on a problem which causes MetsBlog content to not register correctly.)
Pretty straightforward. Check it out and let me know what you think in the comments, or by shooting me an email.
Google and NewsCorp Do the Fair Use Hokey Pokey
Posted by Dan Lewis in Uncategorized on December 2nd, 2009
Rupert Murdoch’s interest in erecting paywalls around NewsCorp content and removing it from Google’s index is in the news seemingly daily. And today, it looks like Google made a play to keep the Wall Street Journal and other NewsCorp properties in the index. Mashable reports on the two changes, and the second one has interesting fair use implications:
Publishers now have the option to tell Google’s spiders to only crawl and index the “preview pages.” This refers to pages that display the first few paragraphs of an article on subscription sites like WSJ.com in order to entice them to pay for a subscription. If a publisher chooses to have spiders crawl their articles in this manner, they will be labeled with “subscription” within Google News.
If you don’t know what fair use is, there are four factors that courts look at to see if an otherwise infringing use of one’s copyright is “fair” and therefore non-infringing:
- the purpose and character of your use
- the nature of the copyrighted work
- the amount and substantiality of the portion taken, and
- the effect of the use upon the potential market.
For more, this Stanford guide is a good start (indeed, the bullets above are a direct copy from there), but if you can handle some legalese, you are best off reading a case. I recommend Warner Bros. Entertainment Inc. v. RDR Books (“RDR”, for short), for two reasons: (1) it’s about Harry Potter and (2) it shows that fair use is not at all intuitive. The case doesn’t apply here, but it’s easier to slog through than most cases.
Anyway: The fair use analysis. I am going to skip the second factor because I think it’s basically irrelevant in this case.
Factor 1: Is Google’s use transformative?
Yes, generally. NewsCorp provides the content to report the news. Google uses the content to provide a research tool.
While Google would almost certainly agree to not index NewsCorp content irrespective of the legal question, I wonder if NewsCorp could force the issue legally. The answer: Probably not. Crawling and indexing are transformative: the search service (Google, Google News, etc.) is a research tool. This article about litigation over Google’s Image Search explains more, and the analogy should hold. In that case, Google argued that it provides the image thumbnails in order to allow it to act as an effective research tool. The court noted that Google isn’t providing the full-size image, but rather, just enough for the searcher to see if they’re on the right track and click through.
If Google provided access to the full text instead of the synopsis, it seems that they’d be hard-pressed to claim they are providing a research tool. This factor would then be in play and would likely tilt toward NewsCorp.
Factor 3: Is the amount and substantiality of the content “too much”?
Let’s look at academic papers and their research tools; specifically, PubMed. Check out this search result for “food protein induced enterocolitis syndrome” and you’ll see it’s a mere syllabus. It’s substantial enough to meet the needs of the researcher, and as for the amount, well, the copyright holder wrote the passage so that researchers can find the paper. It’d be hard for the author to argue that using this is “too much,” but it’d also be hard for the research tool provider to argue that this is not enough.
The parallel to preview pages, written by the publisher (NewsCorp, for example) for the benefit of the researcher, is stark. If NewsCorp provides these pages to Google (and the pages are indeed adequate to meet the researchers’ needs), Google would be very hard pressed to make a fair use argument for indexing the content behind the paywall.
Factor 4: What is the effect of Google’s use on the potential market for NewsCorp’s content?
Seems like an easy one. If Google indexes the preview page, the effect is minimal at worst. If Google indexes the article itself (and provides it for free), different story. NewsCorp is drawing a line, albeit maybe one in the sand, saying that there’s a paid-for market for their complete content. If Google crosses that line, the factor tilts in NewsCorp’s favor, quickly.
So, I think that at the end of the day, Google is doing what it has to do, not what it necessarily wants to.
What Does it Mean to “Buy” an E-book?
Posted by Dan Lewis in Uncategorized on December 1st, 2009
The discussion around my post yesterday also entailed this comment over on another blog, and got me thinking further about what one actually purchases when one buys an e-book. The same, of course, applies to mp3s and any other property which can be reproduced by a third party at very low (often no) marginal cost.
There is a disconnect of language here, probably a side effect of legacy businesses working with their legal teams to try and grab control while the consumer base, disorganized as it naturally is, is expected if not forced to make its arguments within the rubric set by the producers. In other words, “buying” an e-book is different from “buying” a book, even though from the consumer’s standpoint, it shouldn’t be.
Let’s start with tangible books — you know, the dead-tree versions that sit on shelves. For those, there’s a pretty clear bundle of rights and lexicon, which I articulated in the comment linked-to above:
If I buy (“own”) a book, I expect to be able to do things such as re-sell, loan, rent, or gift it. If I rent or borrow (“possess”) a book, I don’t, but expect to be able to do things like take it with me on a trip. If I am in your house and flip through (“access”) a book, you being a mensch aside, I probably can’t just walk out the door with it.
That’s not complete, of course, but it’s intuitive. For centuries, we have culturally understood ownership to mean something absolute, constituting exclusivity and control. If I own it, you don’t. You can’t tell me what to do with it, what not to do with it, etc. Obviously, there are going to be some limitations on my ownership when my ownership rights conflict with something in your bundle of rights in something. But the meaning of the term is pretty clear.
What is also typically clear is how one gets to own something. In most cases, you either buy it or receive it as a gift/inheritance. Sure, there are other situations, but even then the transfer of ownership is most often clear, clean, simple. Just over five years ago, two friends were over at my apartment. As one left, he noticed a DVD on my bookcase, and asked if he could borrow it. I of course said yes, and he left. My other friend commented, immediately after, that I would never be getting the DVD back. We both knew he was right (and he still is). What lawyers may deem theft by conversion I instead saw as acceptable, if annoying. But in any event, it was clear: by any definition other than a hard legal one, my friend now owned that DVD.
Indeed, the biggest virtue of “ownership” is that it’s simple. Everyone, even a toddler (“Mine!”), knows what it means.
E-books, specifically, and digital media, generally, muddle that up. Right now, when you buy an e-book on your Kindle, you most definitely do not own that “book” in the typical meaning of the word “own”. You cannot, lawfully and/or technologically, use it in the way(s) which you would have been able to use the paperback version of the same content: you can’t lend it to a friend, donate it to a library, re-sell it, etc. Your rights are clearly delineated, I’ll bet, in the licensing agreement you entered into, but as the consumer, that’s not the bargain you expected. What is expected is the simple, common language: I bought it, therefore I own it.
The troubling part about all this is that the time-tested concept of ownership is clear, yet we have some odd expectation that it will yield to granularity in licensing. You can buy DRM music in the iTunes store or non-DRM. Amazon uses DRM-free content as a sales point. On the other side of the aisle, Creative Commons has six different licenses, some of which use the term “non-commercial.” What does non-commercial mean? Answering that required a year-long study (in which I participated) to yield a 255-page (!) .pdf to “define non-commercial”.
It’s all way, way too complicated. We need to find a way to keep it simple.
