MP3, Copyright, Piracy, Intellectual Property Issues

Copyright Issues, Copying and MP3 File-Sharing

This page discusses various copyright issues that I have examined over the years. I have not updated this page in several years but I now include some new material to cover that span. [Last updated March 2011]

Recent Events

My Background

Research on File-Sharing

Canadian Government Study of File-Sharing

What about the Oberholzer-Gee Strumpf “Harvard” (JPE) Paper?

The Future of Copyright?

Calculate the optimal copyright term

Other Sources of Information

Recent information about File-sharing [March 2011]

I have been negligent in keeping this material up to date.

There are several results from the last two years I want to cover. First, there is relatively new information about the Oberholzer-Gee/Strumpf paper and the Canadian study mentioned earlier (found below since this material begins with the most recent and goes backward in time). I have found a flaw in the main regression results of Oberholzer-Gee/Strumpf. My previous criticisms of their paper concerned my attempted replication of portions of their statistical analyses, where I found results quite different from the ones they reported, as well as an enumeration of what appeared to be a disquieting misrepresentation of factual matters by Oberholzer-Gee and Strumpf. The replications related to secondary ‘tests’ they conducted using publicly available data. A possible defense to my replication criticism, and one made by the Journal of Political Economy, was that even if their additional tests were wrong and their stated facts were wrong, these problems were not directly related to the main regression results. In other words, I did not have evidence indicating that their main regression analysis had errors. This appeared to be something of a Catch-22 since Oberholzer-Gee and Strumpf have not made their data available, seemingly making it impossible to show a flaw in those results.

Nevertheless, I have since found in their main regression results an internal inconsistency with any plausible view of the world. Additionally, the file-sharing data upon which they rely has wildly different patterns than the overall file-sharing market as measured by Big Champagne, referred to by Oberholzer-Gee and Strumpf as the “gold standard” of file-sharing measurements. This implies that their particular results, if there were no other problems, would not be representative of the entire market. Finally, I demonstrate that their key instrumental variable (the number of German kids on vacation) cannot have a measurable impact on American purchases of albums, invalidating their particular methodology. The paper making these points, which has been available for about a year, can be found here.

The 2009 literature overview by Oberholzer-Gee and Strumpf

Oberholzer-Gee/Strumpf have a newer paper (I refer to the May 15, 2009 Working Paper titled “File-sharing and Copyright”) that is basically an overview of the literature on file-sharing. This paper shares one of the attributes of their earlier paper: the use of misleading factual allegations unsupported by any references. Additionally, this paper mischaracterizes the work of other researchers, and the mischaracterizations are used to support the conclusions that Oberholzer-Gee and Strumpf would like to be true.

The most important error by Oberholzer-Gee and Strumpf, however, is in their survey of the literature. Oberholzer-Gee and Strumpf state:

The majority of studies finds that file sharing reduces sales, with estimated displacement rates ranging 3.5% for movies (Rob and Waldfogel, 2007) to rates as high as 30% for music (Zentner, 2006).¹⁸ A typical estimate is a displacement rate of about 20%. One implication of these results is that developments other than file sharing must have had a profound impact on sales. [¹⁸ An outlier is Liebowitz (2008) who reports a displacement rate of more than 100% for a selection of U.S. music markets.]

Oberholzer-Gee and Strumpf use the term ‘displacement’ in multiple inconsistent ways in this single paragraph. The Rob and Waldfogel measure to which they refer is the actual percentage change in sales due to file-sharing, the Zentner measure to which they refer is the change in the probability of purchase due to file-sharing, and the Liebowitz measure to which they refer is the share of the decline due to file-sharing.

The meaning of “Displacement” and the displacement values found in the literature.

Although Oberholzer-Gee and Strumpf never actually define what they mean by the term “displacement,” their third sentence above, stating that file-sharing is a minor cause of the decline in sales since other factors have a “profound impact” on sales, provides guidance. A minor impact would be consistent with a displacement of 20% if “displacement” is the share of the decline due to file-sharing. Confirming this definition is their claim that my 2008 paper finds a displacement of over 100%, since my paper does conclude that file-sharing has led to a decline that is more than 100% of the historical decline (meaning that sales would have increased if file-sharing had not intervened). Thus, the 20% “displacement” that Oberholzer-Gee and Strumpf claim is the average of most studies means that only 20% of the decline that has occurred in record sales during the last decade is due to file-sharing. The problem with such a claim is that it is false, and by a very large margin.

Take a simple example. Peitz and Waelbroeck (2004) found that file-sharing reduces record sales by 20% as of 2002, when their data ended. Oberholzer-Gee and Strumpf correctly report this result in their Table 5. But record sales (in the U.S.) had generally fallen by about 20% as of 2002. The Peitz and Waelbroeck result, therefore, assuming that Europe had a similar decline to the U.S., is consistent with the entire decline being due to file-sharing, or a “displacement” of 100%. Similarly, Zentner (2006) concludes that file-sharing had reduced sales by about 8% as of 2001 [Oberholzer-Gee and Strumpf incorrectly claim, in the above quote, that he found a 30% decline when he actually found a 30% decline in the probability of purchase by file-sharers, not a 30% decline in overall sales]. In the U.S., record sales had dropped by only 11% in 2001 from their 1999 peak; thus Zentner’s result is consistent with a finding that roughly 70% of the overall decline was due to file-sharing (8/11).
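
The arithmetic behind these two examples can be sketched in a few lines. This is only an illustration: the function name is my own, and the inputs are the percentages quoted above.

```python
def share_of_decline(drop_from_file_sharing, total_drop):
    """Fraction of the total sales decline attributable to file-sharing.

    Both arguments are percentage declines in sales over the same period.
    """
    if total_drop <= 0:
        raise ValueError("total_drop must be positive")
    return drop_from_file_sharing / total_drop


# Peitz and Waelbroeck: 20% reduction against a roughly 20% total decline (2002).
print(f"Peitz and Waelbroeck: {share_of_decline(20, 20):.0%}")

# Zentner: about 8% reduction against an 11% total decline (2001).
print(f"Zentner: {share_of_decline(8, 11):.0%}")
```

The second figure comes out slightly above 70%, which is why the text treats 8/11 as roughly 70%.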

Even Rob and Waldfogel’s (2006) result is consistent with 100% or more of the decline being due to file-sharing. Rob and Waldfogel specifically mention finding a “displacement” effect of 20% (although another result, based on instrumental variables, in which they have less statistical confidence, is four times as large). But they are very clear what they mean by displacement: they define it as the portion of each unlawfully downloaded album that replaces the sale of a legitimate album. This definition of displacement does not mean that 20% of the decline is explained by file-sharing. In 2003, when the Rob and Waldfogel data were developed, total sales in the U.S. had dropped by 26%. If there were an equivalent number of unlawfully downloaded files as there were lawfully sold albums, a 20% displacement would lead to a 20% decline in sales, which would be 77% of the total sales decline, or a displacement of 77% as defined by Oberholzer-Gee and Strumpf. But if there were twice as many unlawful files as legitimate songs sold (and most estimates indicate that there were considerably more unlawful than lawful files, see Liebowitz 2006), then the Rob and Waldfogel result would imply a decline in legitimate sales of 40%, which is larger than the decline that had actually occurred, implying a displacement, used in the sense of Oberholzer-Gee and Strumpf, of well over 100%. Even the Leung paper cited by Oberholzer-Gee and Strumpf, which measures elasticities, thus making it hard to map into any form of ‘displacement,’ finds elasticities that are about one third the size of those found by Rob and Waldfogel. Using a simple linear translation, this result would be compatible (using the Oberholzer-Gee and Strumpf definition of displacement) with displacement rates ranging from 33% to over 100% depending on how much larger the illicit market was relative to the legitimate market.
In fact, the only study finding harm where displacement (defined as per Oberholzer-Gee and Strumpf) is as low as 20% is Hong (2007) although Hong’s other result is 100% as reported by Oberholzer-Gee and Strumpf in their Table 5. The other studies of which I am aware also find large displacements: Michel (~50%), Blackburn (~100%), Zentner-2009 (~100%).
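
The conversion from Rob and Waldfogel’s per-download displacement to the share-of-decline measure can be sketched as follows. This is a minimal illustration of the simple linear translation used above, not anything taken from their paper; the function name is hypothetical and the ratios of unlawful to lawful volume are the two cases assumed in the text.

```python
def implied_share_of_decline(per_download_displacement, unlawful_to_lawful_ratio, total_drop):
    """Convert a per-download displacement rate into a share of the total decline.

    per_download_displacement: fraction of a sale lost per unlawfully downloaded album
    unlawful_to_lawful_ratio:  unlawful downloads relative to lawful sales (assumed)
    total_drop:                observed fractional decline in total sales
    """
    implied_sales_drop = per_download_displacement * unlawful_to_lawful_ratio
    return implied_sales_drop / total_drop


# 20% displacement per download, 26% total U.S. decline as of 2003.
for ratio in (1.0, 2.0):
    share = implied_share_of_decline(0.20, ratio, 0.26)
    print(f"unlawful/lawful ratio {ratio:.0f}: share of decline = {share:.0%}")
```

With equal volumes the share is about 77%; with twice as many unlawful files it exceeds 100%.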

The 3.5% “displacement rate” for video attributed to Rob and Waldfogel by Oberholzer-Gee and Strumpf actually measures the size of the decline in sales due to file-sharing. Rob and Waldfogel find that “[O]ur analysis indicates that unpaid consumption, which makes up 5.2 per cent of movie viewing in our sample, reduced paid consumption in our sample by 3.5 per cent.” If these results are extrapolated to a world where illicit movie viewing becomes as large or larger than legal movie viewing, as is the case for music, then a tremendous decline in legal movie viewing would be expected.

This means that Oberholzer-Gee and Strumpf are wrong to claim that the average ‘displacement’ is 20% for studies finding harm. The average value is much higher and is closer to 100% than to 20%. Oberholzer-Gee and Strumpf’s claim in their footnote 18 found above, to the effect that my 2008 result was an outlier, is false [note that their discussion of my paper clearly uses the definition of displacement that I attribute to them here]. The true outlier is the Oberholzer-Gee and Strumpf result of zero impact (along with the Andersen Frenz result which is discussed below).

Other incorrect claims about the literature.

Although Oberholzer-Gee and Strumpf admit that most studies find harm, they still wish to suggest that a finding of no harm is more appropriate:

While many studies find some displacement, an important group of papers reports that file-sharing does not hurt sales at all (Tanaka, 2004; Bhattacharjee et al., 2007; Oberholzer-Gee and Strumpf, 2007; Smith and Telang, 2008). [my emphasis]

It is useful to examine these “important” papers. Start with the second paper, Bhattacharjee et al. (2007). I wish to start with this paper because Oberholzer-Gee and Strumpf mischaracterize its conclusions. This paper does not claim that file-sharing fails to cause harm. This paper concludes that file-sharing harms chart survival for a majority of the albums on the charts. When I pointed this out at a conference, a reporter for the Chronicle of Higher Education, writing an article about the conference, decided to check on my claim and spoke to Dr. Bhattacharjee. Here is the reporter’s description of what Dr. Bhattacharjee said:

“It is not correct to say that our work shows file sharing is unrelated to changes in sales,” said the Management Science paper’s lead author, Sudip Bhattacharjee, in an e-mail message to The Chronicle. “The paper did not look directly at sales, only at chart longevity, also known as chart survival.” And “we did report a decrease in survival over all,” said Mr. Bhattacharjee. Glenn (2010).

The author himself agrees that Oberholzer-Gee and Strumpf have mischaracterized his paper.

Another of the “important” papers that Oberholzer-Gee and Strumpf cite is Tanaka (2004). This paper was presented at a conference but is very clearly not a complete piece of research. The paper consists of 8 pages of double spaced text and some tables. Tanaka himself lists the paper as version “0.1”. I don’t believe any professional economist reading his paper would say that his paper is even close to being a finished paper. His conclusion begins “This research is very preliminary because we have not yet tried sufficient instrumental variables.” This is not just professional modesty. Since his main econometric technique is supposed to be instrumental variables, not having sufficient instrumental variables is a major problem. I have corresponded with Dr. Tanaka and he says he has no intention of finishing his paper. Nevertheless, Oberholzer-Gee and Strumpf claim this incomplete piece of research is an important paper.

The third “important” paper is the Oberholzer-Gee and Strumpf 2007 paper. Given the numerous problems in their papers that I have catalogued, I doubt that an impartial researcher would wish to place very much confidence in its results.

Finally, we have the Smith and Telang paper. This is a very fine paper about the impact of television broadcasts of older movies on the sales of the DVDs of those movies. They find that television broadcasts increase the very small sales of these DVDs, presumably by reminding viewers of the existence of these movies. The paper does not attempt to address the overall impact of file-sharing on movie sales or theatrical releases. Instead, they attempted to discover whether the availability of pirated versions of those old movies had any impact on the increase in DVD sales after broadcast. They found that the pirated versions did not influence the sales of DVDs after broadcast on television. This is interesting but hardly a test on the overall impact of file-sharing on movies.

Therefore, the “important” group of papers that Oberholzer-Gee and Strumpf want readers to rely on does not support the claims made by Oberholzer-Gee and Strumpf.

Oberholzer-Gee and Strumpf also repeat a slightly weakened version of a false claim made in their 2007 paper that file-sharing falls in the summer, supposedly because students leave their high speed college Internet connections.

Figure 3 shows Big Champagne’s count of the monthly number of U.S. file-sharing users from mid-2002 through mid-2006...As with the earlier data on file sharing traffic, there is evidence of secular growth as well as reductions, or least a lack of growth, during summer months.

Oberholzer-Gee and Strumpf originally claimed a reduction in usage in the summer; now it is just a discontinuation of growth. As I pointed out in my original 2007 comment, the Big Champagne data show no pattern of any kind in the summer. Here is the chart:

Finally, Oberholzer-Gee and Strumpf present a rather strange diagram purporting to show how complementary markets allow creators of music to generate increasing revenues even in the face of piracy. They state:

Figure 7 shows spending on CDs, concerts and iPods. The decline in music sales – they fell by 15% from 1997 to 2007 – is the focus of much discussion. However, adding in concerts alone shows the industry has grown by 5% over this period. If we also consider the sale of iPods as a revenue stream, the industry is now 66% larger than in 1997.

Here is their Figure 7:

Here is their discussion:

The question to ask is thus whether the new technology has undermined the incentives to create, market, and distribute entertainment. Sales displacement is a necessary but not a sufficient condition for harm to occur. We also need to know whether income from complementary products offset the decline in income from copyrighted works…there is clear evidence that income from complements has risen in recent years. For example, concert sales have increased more than music sales have fallen. Similarly, a fraction of consumer electronics purchases and internet-related expenditures are due to file sharing.

The first misconception promulgated by the chart is the implication that iPods have a positive impact on creators of sound recordings. The income from iPods goes completely to Apple, not to creators of sound recordings. Including iPod sales in a discussion of revenues going to creators of music is simply wrong and very misleading.

The second error in Oberholzer-Gee and Strumpf’s analysis is their claim that concert sales have increased more than music sales have fallen. File-sharing did not begin until the advent of Napster, which was in 1999, which also happened to be the year of peak sales. Even on their chart, combined concert revenues and CD sales are higher in 1999 than in 2007. But there are two elementary errors in their analysis that cause the above chart to underestimate the decline in combined CD and concert revenues.

The main problem with their chart is the simple point that inflation has distorted the relative value of dollars in 1999 and 2007, which is why economists usually compare ‘real’ revenues and not nominal revenues. The amount of inflation between these years was approximately 25%, so this would imply a decline of about 20% in real revenues (1/1.25 = 0.80) between these years even if the nominal amounts were the same in both years. And even the Oberholzer-Gee and Strumpf chart shows that combined concert and CD revenues are higher in 1999 than in 2007. A second, somewhat less important error is their claim that their chart includes only CD revenues in the ‘recording revenue’. In fact, the revenue statistics used by Oberholzer-Gee and Strumpf include ringtone revenues and performance rights fees from Internet radio broadcasts, both of which existed in 2007 but not in 1999. Since these latter revenues are not due to the sales of CDs, nor are they enhanced by file-sharing, they should not be included in the analysis.
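
The inflation adjustment can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the revenue level of 100 is a placeholder rather than actual industry data, and the 25% figure is the approximate cumulative inflation cited above. Deflating by cumulative inflation of 25% gives a real decline of about 20% when nominal revenue is unchanged (1 - 1/1.25), the same order of magnitude as the inflation rate itself.

```python
def real_change(nominal_start, nominal_end, inflation):
    """Percentage change in inflation-adjusted revenue between two years.

    inflation is the cumulative price growth between the years (0.25 = 25%).
    """
    real_end = nominal_end / (1.0 + inflation)  # express end-year revenue in start-year dollars
    return real_end / nominal_start - 1.0


# Placeholder: identical nominal revenue in both years, 25% cumulative inflation.
print(f"real change with flat nominal revenue: {real_change(100.0, 100.0, 0.25):.0%}")
```

Any nominal comparison that shows flat or slightly lower revenue therefore understates the real decline by a wide margin.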

Here is a table which fixes these mistakes and also includes data from 2009. This table tells quite a different story than the one put forward by Oberholzer-Gee and Strumpf.

The Canadian Study

This study has been published in the Journal of Evolutionary Economics. The conclusions have been changed significantly. Whereas the previous version concluded that file-sharing had a strong positive impact on record sales in Canada (with each download leading to additional sales of 0.44 units), the new published version claims that file-sharing has essentially no impact on sales.

File-Sharing [discussed as of April 2008]

The weight of current evidence strongly supports a view that file-sharing diminishes the revenues of the recording industry. There are two forms of evidence.

The first has to do with general factors: the timing between record sales declines and file-sharing is very close; the current decline is very large compared to previous declines; there are no other explanations for the decline in record sales that hold up upon analysis; economic theory implies that record sales will fall. I have the lead article published in the April 2006 issue of the Journal of Law and Economics which provides a thorough discussion of the history and data used in analyzing file-sharing. Among other things it demonstrates the close linkage between file-sharing growth and record sales declines using half-year data. It also points out some heretofore unnoticed inconsistencies in these data. For example, what appears to be the best estimate of the number of audio files downloaded reports that files downloaded are generally less than one tenth the amounts of previous estimates. File-sharing appears to have hit record retailers less severely than it has hit record clubs, causing possible underestimates of harm by those looking at statistics from retailers. Finally, the claims that DVD sales have been responsible for the decline in CD sales (for those articles that provide any evidence at all) have been based on a statistic that provides a misleading picture of the DVD market. That issue of the Journal of Law and Economics has a symposium on file-sharing that anyone interested in the subject must read, including articles (linked below) by Rob and Waldfogel (Penn), Zentner (UTD) and a quartet of economists from the University of Connecticut.

The second form of evidence can be found in econometric studies of the industry. All the econometric studies, except one, find some degree of harm. I have written a paper (a version of which appears in Management Science) that examines record sales and Internet use in 99 US cities to measure the impact of file-sharing. While I am partial to my own work, I believe this paper provides the strongest analysis to date of these issues. The methodology avoids many empirical difficulties found in other papers. It concludes that file-sharing is responsible for the entire decline in record sales that has occurred, and that except for file-sharing there would have been an increase in sales since 1999 instead of the strong decline. It also examines which genres had the greatest impact from file-sharing, and they are consistent with intuition (genres appealing to older individuals have the smallest sales decline, and vice-versa). As a by-product of the analysis I examine the impact of general Internet use on time spent with television and radio in order to prevent this Internet effect from contaminating my results. It turns out that Internet use decreases television viewing and radio listening, but the size of the effect in this period is not large (5-10%) and the impact on television is greater than the impact on radio.

All the papers that I have seen by other economists, with one notable exception, find some degree of harm (to record producers) caused by file-sharing. These include papers by Blackburn, Hong, Michel, Peitz and Waelbroeck, Rob and Waldfogel, and Zentner. The lone exception, but the most heavily publicized, is a paper by Oberholzer-Gee and Strumpf, which I believe is littered with errors and disingenuousness as discussed in greater detail below.

I have several articles that look at other papers in the economics literature on file-sharing. My first paper to examine the other articles written by economists about file-sharing was published in CESifo Economic Studies in the Summer/Fall of 2005. It examines the (generally unpublished) state of the literature (theory and empirics) on file-sharing. This paper makes the point that ‘sampling’ (the exposure effect) is likely to decrease the sales of records and not increase them, as is normally assumed. It also examined the then-current literature on the empirical question of file-sharing’s impact. An expanded and updated version of this paper was published by MIT Press as part of a book titled “Industrial Organization and the Digital Economy,” which is edited by G. Illing and M. Peitz. This article, besides updating the literature survey, demonstrated that the concept of network effects had not been properly applied to the impacts of file-sharing since the alternative to file-sharing is likely to be radio listening by those unwilling to pay the purchase price of CDs, whereas the theory usually assumes that downloaders are new listeners who provide new network effects.

My initial study of file-sharing was published in Advances in the Study of Entrepreneurship, Innovation and Economic Growth. That paper looks at a 30-year history of record sales as well as changes in income, prices, sales of blank tapes, videogames, prerecorded cassettes, DVDs, possible changes in the interest in music, and changes in the population. None of these other factors appeared capable of explaining any but a very small percentage of the recent decline in CD sales.

Latest Evidence on the Oberholzer-Gee/Strumpf File-sharing Paper

Elsewhere I describe in detail the problems that I have with this paper. There are several recent developments that are newsworthy, however.

My comment, written for the JPE, was submitted in September of 2007. While it was under submission, Dr. Norbert Haring, a reporter for the German business magazine Handelsblatt, contacted me about the comment, which he had read on the SSRN. Dr. Haring had previously written an article discussing the O/S paper. It was just the typical laudatory story about the two economists who had discovered the unusual result that file-sharing did not harm record sales. After reading my comment on the O/S paper he decided to reexamine the issue. He concluded, after speaking both to them and to me, that my critique was warranted, and that the O/S results should be considered suspect. An English translation of this article is here.

The comment was rejected by the JPE in June of 2008. The rejection included two referee reports, one very positive and one very negative. I suspected that the negative referee report, although written as if by a third party, was actually written by O/S. I corresponded with Dr. Haring, sending him the two referee reports and the letter from Steve Levitt, the editor responsible for the decision. I explained my suspicion to him and he was able to verify that the referee report that I received was essentially identical to a response to my comment written by O/S and sent to him. He also wrote, in English, a column very critical of the JPE editorial process, arguing, in essence, that the correct decision had not been made.

I then examined in detail the O/S response. O/S had only responded to 4 of the 9 points that I made in my comment. And their defense to these 4 points was extremely weak, full of the same types of errors, mushy thinking, and misdirection found in their original paper. I then wrote up a critique, which I labeled a “sequel” to my original comment. I included the referee reports and letter from Steve Levitt. This course of action was highly unusual and risked alienating the economists associated with the JPE.

I posted this sequel to the SSRN in July of 2008. Shortly after, a reporter for the Chronicle of Higher Education wrote a story about the entire set of events.

A discussion of the Canadian Study released by Industry Canada authored by Birgitte Andersen and Marion Frenz (A/F).

I have had a chance to look more carefully at this study. My initial criticism was too harsh regarding the simultaneity bias because the authors (A/F) attempt to control for it, but since they do so only imperfectly, some bias is likely to remain. However, the result, now that I have had a chance to inspect the data, is even more implausible than I originally thought. The authors find a result that is not only implausible but actually impossible. There are other important problems as well, such as the use of only part of their data and some problems with the construction of the data and the regression specification.

The result that has attracted the most attention comes from this quote from the A/F report: “Among Canadians who engage in P2P file-sharing, our results suggest that for every 12 P2P downloaded songs, music purchases increase by 0.44 CDs.” Since there are 14 songs on a typical CD, this means that for each CD equivalent of songs downloaded, sales of CDs would increase by one half of a CD.

The average number of files downloaded from peer-to-peer networks in their sample of downloaders is 30 (24 -- weighted values are in parentheses) files per month (the data are publicly available here). This is the equivalent (in terms of the number of tracks) of 26 (20) CDs per year. According to the A/F quote reported above, this would mean that the average downloader increases their purchase of CDs by 14 (10) units per year. Yet the same data indicate that downloaders only purchase an average of 9 (6) CDs per year. Thus, A/F’s reported result is actually impossible since downloaders cannot have increased their consumption by 14 (10) units and yet, after this increase, only consume 9 (6) CDs per year. Note that if true, this result would mean that if it were not for the positive impact of downloading, record sales (to the group who download music) would be zero. Who could possibly believe this? I have to wonder if the people at Industry Canada even understand this.

But it gets even worse. An increase of 14 (10) CDs by the 29% of the population that are downloaders (according to the survey) should have increased sales by about 4 (3) CDs per person for the entire country, yet current sales are only about 2 CDs per person. This implies that no one would buy CDs if it weren’t for the positive impact of file-sharing on record sales. [A/F could claim that actual CD sales are considerably lower than that reported in their survey. Although some might think that this indicates a problem with their data, it would allow A/F to claim that only half of their measured CD sales are due to file-sharing. In this best-case scenario (from the point of view of the implausibility of their result), CD sales, without the existence of file-sharing would fall to only half of their current level and not zero.]
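
The consistency check in the two paragraphs above can be sketched as follows. This is only an illustration: the constant and function names are my own, and the inputs are the figures quoted in the text. Taking the A/F estimate literally (0.44 CDs per 12 songs), the sketch gives roughly 13 (11) implied extra CDs per year, close to the 14 (10) used above; the small differences presumably reflect rounding.

```python
SONGS_PER_CD = 14               # typical CD length assumed in the text
EXTRA_CDS_PER_12_SONGS = 0.44   # A/F estimate quoted above
DOWNLOADER_SHARE = 0.29         # share of the population that downloads, per the survey


def implied_extra_cds_per_year(files_per_month):
    """Extra CD purchases per year implied by the A/F estimate."""
    files_per_year = files_per_month * 12
    return files_per_year / 12 * EXTRA_CDS_PER_12_SONGS


# (label, files downloaded per month, CDs actually purchased per year)
cases = [("unweighted", 30, 9), ("weighted", 24, 6)]

for label, files_per_month, cds_purchased in cases:
    extra = implied_extra_cds_per_year(files_per_month)
    cd_equivalents = files_per_month * 12 / SONGS_PER_CD
    per_capita = extra * DOWNLOADER_SHARE
    print(f"{label}: {cd_equivalents:.1f} CD equivalents downloaded per year, "
          f"{extra:.1f} implied extra CDs vs {cds_purchased} purchased, "
          f"{per_capita:.1f} extra CDs per person nationwide")
    # An increase larger than total purchases is arithmetically impossible.
    print("  implied increase exceeds purchases:", extra > cds_purchased)
```

In both the weighted and unweighted cases the implied increase is larger than the total number of CDs that downloaders report buying.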

The reason it is so hard to square the reported results with actual numbers is that the reported result is simply incorrect and the numbers in the survey are also incorrect.

A/F try to explain this glaring inconsistency by suggesting that their result holds only at a single point, since they assume that the relationship is non-linear, meaning that the relationship will get stronger going in one direction and weaker going in the other. This claim is true but irrelevant. If A/F believe their stated conclusion, then they believe that this point estimate is representative of the overall impact of file-sharing. If they think this point estimate is not representative of their overall analysis then they need to put forward a result that is representative. You cannot deny the relevance of the point estimate when it leads to an embarrassing result but continue to use it to draw empirical conclusions about the impact of file-sharing on the entire market. You cannot have it both ways, as should be obvious.

As Kennedy’s econometrics text (referenced by A/F) points out on page 397, such anomalous results provide useful information to the analyst—such results imply that there is something amiss with the analysis.

To arrive at this bizarre conclusion the authors limit their sample to individuals who download music from peer-to-peer sites and exclude individuals who do not download. Limiting the sample in this way seems nonsensical. When we test the efficacy of a drug we compare those who take the drug with those who do not. If we limited our observations to only those users who take the drug we would be giving up our most useful and important information. It is possible that dosage differences across users might still provide some information about the overall impact of the drug, but the most important information is whether the drug, at any reasonable dosage, causes a change compared to no drug at all. It is also possible that dosage makes very little difference (e.g., aspirins beyond two have little impact) so that we would get no information at all by looking only at users receiving the drug.

Our interest is in the CD purchase behavior of consumers and the ‘treatment’ is peer-to-peer downloading. The best test for that is to compare the group that downloads with peer-to-peer against the group that does not download. In a numerical example I present below I make the simple assumption that all downloaders decrease their CD purchases by the same percentage, which is equivalent to saying the downloading ‘dosage’ is the same for all downloaders. Even though some consumers will download more than others, it need not reflect differential ‘dosages’ of downloading (if you have a higher fever 2 aspirins might lower your temperature by a larger amount than someone with a lower fever, but that doesn’t mean the dosage of aspirins is different). The approach used by A/F assumes that the number of files downloaded provides dosage information about downloading, but this need not be true at all. And even if it were true, there is less information than when the entire sample is used.

A/F have since responded that they do not have a controlled experiment, such as that above, and imply that somehow that changes the logic of the above example. It does not. If we were to examine the impact of tobacco smoking we would compare the smokers to the non-smokers even though it is not a controlled experiment. In fact, this is how the studies were done. It would be illogical to examine only smokers. So A/F still need to provide a cogent explanation for their decision.

If you are not convinced by the above paragraphs, it is relatively easy to show that the results A/F present for the sample of downloaders are incorrect.

It is also the case that the OLS regression in Table 4.3, which does not assume a non-linear structure, reports a coefficient of 1.24, meaning that an additional download (per month) increases CD sales by 1.24 units. With 30 files downloaded per month this would imply that downloading increased CD sales by the more absurd value of 37 units per year. [On the other hand, it is not exactly clear how A/F are measuring downloads. They state in the second complete paragraph of page 19 that it is the number of peer-to-peer downloads in an average month, yet on page 54, appendix 3, they report on line 10 that the average value of peer-to-peer downloads is 3 with a maximum value of 6.22. The raw data show an average value of 30, with a maximum of 500 (yes, 500 files per month). The natural log of 500 is 6.21 and the natural log of 30 is 3.4, so it actually seems as if the variable they are using for the number of peer-to-peer downloads is in natural logs, but only they know for sure.]
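The arithmetic in the bracketed note above is easy to check. The following is only a sketch that uses the figures quoted in the text (30 and 500 downloads per month from the raw data, 3 and 6.22 from appendix 3, and the 1.24 coefficient); the variable names are illustrative and not taken from A/F.

```python
import math

# Figures quoted in the text above (variable names are illustrative).
raw_mean, raw_max = 30, 500             # raw survey data, downloads per month
reported_mean, reported_max = 3, 6.22   # values reported in A/F's appendix 3
ols_coefficient = 1.24                  # OLS coefficient from Table 4.3

log_mean = math.log(raw_mean)   # natural log of 30, about 3.40
log_max = math.log(raw_max)     # natural log of 500, about 6.21

# Implied change in CD purchases if the coefficient is applied to raw downloads.
implied_cd_increase = ols_coefficient * raw_mean   # about 37

print(f"ln(30) = {log_mean:.2f}, ln(500) = {log_max:.2f}")
print(f"reported max minus ln(500) = {reported_max - log_max:.3f}")
print(f"1.24 x 30 = {implied_cd_increase:.1f}")
```

The reported maximum of 6.22 sits within about 0.01 of ln(500), which is what suggests the variable is in logs.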

I will posit some reasons why I think A/F are getting unreliable results. It comes down to several points: (a) focusing on the entire sample is the appropriate way to test the hypothesis about file-sharing, and focusing only on the sample of file-sharers is a mistake; (b) the authors acknowledge that there is a likely simultaneity in their data and attempt to fix it, but the fix is likely to be incomplete; (c) the survey results are likely to be biased toward a positive finding and appear to be suspect in other ways.

Example demonstrating simultaneity and the problem with focusing on just filesharers

A simple example is probably the best way to show the problems that occur from analyzing only the filesharers. Assume there are three types of individuals: those who love music, those who like music, and those who like it a little and are barely above indifference. Assume the following table gives the number of albums that members of each group purchase before file-sharing becomes available.

groups	albums purchased
music lovers	6
music likers	4
music indifferents	2

We next assume that file-sharing is invented and half the individuals in each group engage in file-sharing. We also assume that file-sharing causes individuals who participate to reduce their purchase of albums by 50%, with each forgone purchase replaced by a downloaded album, which gives downloads of 3, 2, and 1 for the three groups in order of how much they like music. We need to throw a little noise into the result (to keep our computers from breaking when running regressions and controlling for simultaneity), which we shall do by adding, subtracting and adding .1 to the numbers in the file-sharing download column:

groups	non-sharers purchase of albums	filesharers purchase of albums	filesharers downloads of albums
music lovers	6	3	3.1
music likers	4	2	1.9
music indifferents	2	1	1.1

Note at the outset that the true impact of file-sharing is -1, meaning that each downloaded file reduces sales by one unit.

If you compare the last two columns to one another you find that each additional download appears to increase sales of albums by one unit. If you run a regression of file-sharers’ album purchases on file-sharers’ album downloads you get a coefficient of .987, implying that each downloaded album is related to an increase of about one purchased album. Yet, as we know, the correct result is -1 since each download actually reduces album sales by one unit. The unreasonably high coefficient is due to the fact that the results are entirely determined by the simultaneity of degree of music interest and music consumption, whether purchased or downloaded. Thus it can be very misleading to use only the sample of filesharers to draw conclusions about the impact of file-sharing.

When both file-sharers and non file-sharers are included we get the following data:

sales	downloads
6	0
4	0
2	0
3	3.1
2	1.9
1	1.1

If you run a regression using downloads to explain sales you get a coefficient of -.498, which is closer to the correct value of -1, but still too positive. That is because it is still infected by the simultaneity problem.
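The two simple regressions above can be reproduced with a short Python sketch. It uses only the numbers from the tables; the helper name `ols_slope` and the variable names are illustrative, and the fit is ordinary least squares with an intercept.

```python
def ols_slope(x, y):
    """Slope from a one-regressor least-squares fit of y on x (with intercept)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Numbers copied from the tables above.
sharer_sales = [3, 2, 1]
sharer_downloads = [3.1, 1.9, 1.1]
nonsharer_sales = [6, 4, 2]
nonsharer_downloads = [0, 0, 0]

# File-sharers only: the slope comes out close to +1 (wrong sign).
slope_sharers_only = ols_slope(sharer_downloads, sharer_sales)

# Full sample: the slope is negative but still well short of the true -1.
slope_full_sample = ols_slope(nonsharer_downloads + sharer_downloads,
                              nonsharer_sales + sharer_sales)

print(f"file-sharers only: {slope_sharers_only:.3f}")
print(f"full sample:       {slope_full_sample:.3f}")
```

Adding the non-sharers flips the sign because their rows supply the only comparison between downloading and not downloading.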

A fix for simultaneity?

The example above provides a quick understanding of the simultaneity problem. A/F are aware of this problem in general (see their first and second complete paragraphs on page 25). They suggest that it is often difficult to fix such a problem, which is true. They then attempt to fix the problem by including a variable that is supposed to measure interest in music, which jointly impacts both purchases and downloads (I erroneously suggested in my earlier post that they did nothing to fix this issue). The variable on music interest asks users to state which of five categories (very strong, somewhat strong, moderate, somewhat limited, very limited) best describes their interest. Different people will ascribe different meanings to these terms and we cannot expect this measure to be anything but a very imperfect indicator of music interest. Variables based on categories are less informative than variables that actually measure the strength of the factor of interest. For example, income is more accurately measured by the number of dollars, or even categories based on dollars, than it would be by asking individuals to put themselves into ill-defined categories such as “very rich”, “rich”, “poor” and so forth. Admittedly, there is no metric for ‘music interest’ that would allow for constructing a better variable. Still, the limitation of the variable used by A/F, namely that it is ordinal and not cardinal (meaning that the size of the difference between categories such as very strong and somewhat strong is unclear), needs to be acknowledged. This limitation means that the simultaneity will be, at best, only partially fixed. The less precise the measure of music interest, the more likely the simultaneity issue will still impact the results. When included in their regressions this variable is significant.

Now what happens when a correction for simultaneity is applied? If the correction were perfect and applied to the entire sample of both file-sharers and non-file-sharers the raw numbers would be as found in the following table:

sales	downloads	music liking
6	0	3
4	0	2
2	0	1
3	3.1	3
2	1.9	2
1	1.1	1

The coefficient from this regression is -.98 which is just about perfect as an estimate of the true impact of file-sharing.

What happens with a less than perfect measure of music liking? We can keep the order correct but assume that the number for music liking is 7 for the highest group, 2 for the middle group and 1.6 for the lowest group, instead of the actual 3, 2, 1. The imperfect measure of music liking has a positive correlation of .9 with the true values, so it would appear to be a very good proxy, almost certainly better than the one created by A/F. We run a regression on the following data:

sales	downloads	music liking
6	0	7
4	0	2
2	0	1.6
3	3.1	7
2	1.9	2
1	1.1	1.6

The coefficient result from the above data is -.91 which is not as good as the -.98 above but is not too far off.
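The two full-sample regressions with a music-liking control can be checked with the sketch below. It assumes ordinary least squares with an intercept and two regressors, solved in closed form from centered sums of squares and cross-products; the function and variable names are illustrative.

```python
def centered_cross(a, b):
    """Sum of cross-products of deviations from the means of a and b."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def downloads_coefficient(sales, downloads, liking):
    """Coefficient on downloads when sales is regressed on downloads and liking."""
    s_dd = centered_cross(downloads, downloads)
    s_mm = centered_cross(liking, liking)
    s_dm = centered_cross(downloads, liking)
    s_dy = centered_cross(downloads, sales)
    s_my = centered_cross(liking, sales)
    return (s_dy * s_mm - s_dm * s_my) / (s_dd * s_mm - s_dm ** 2)

# Numbers copied from the tables above.
sales = [6, 4, 2, 3, 2, 1]
downloads = [0, 0, 0, 3.1, 1.9, 1.1]
true_liking = [3, 2, 1, 3, 2, 1]
noisy_liking = [7, 2, 1.6, 7, 2, 1.6]

coef_perfect = downloads_coefficient(sales, downloads, true_liking)     # about -0.98
coef_imperfect = downloads_coefficient(sales, downloads, noisy_liking)  # about -0.91

# Correlation between the imperfect proxy and the true liking scores (about 0.9).
proxy_correlation = centered_cross(noisy_liking, true_liking) / (
    centered_cross(noisy_liking, noisy_liking) ** 0.5
    * centered_cross(true_liking, true_liking) ** 0.5
)

print(f"perfect proxy:   {coef_perfect:.2f}")
print(f"imperfect proxy: {coef_imperfect:.2f}")
print(f"proxy correlation with true liking: {proxy_correlation:.2f}")
```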

Now run the same regressions for only the file-sharers (downloaders). To save space I am not going to show the data since it just comes from the bottom 3 rows of the above tables.

First, we use the perfect proxy for music liking. The regression coefficient for downloads is 0, when the true value is -1. This shows that using the sample of downloaders provides the wrong result even with a perfect proxy for music liking. The coefficient is still much too positive.

Using the imperfect proxy for music liking gives us a coefficient of 1.3, which is the wrong sign and much too large.
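The downloaders-only results can be verified without a regression package. With only three observations and three parameters (an intercept plus coefficients on downloads and music liking) the least-squares fit passes through every point, so the coefficients can be found by solving the three equations directly. The sketch below does that; the function name and the approach of differencing out the intercept are illustrative choices rather than anything taken from A/F.

```python
def exact_fit_download_coef(sales, downloads, liking):
    """Solve sales = a + b*downloads + c*liking exactly for three observations
    and return b, the coefficient on downloads."""
    # Differencing against the first observation removes the intercept a.
    d1, d2 = downloads[1] - downloads[0], downloads[2] - downloads[0]
    m1, m2 = liking[1] - liking[0], liking[2] - liking[0]
    y1, y2 = sales[1] - sales[0], sales[2] - sales[0]
    # Cramer's rule on the remaining 2x2 system in b and c.
    determinant = d1 * m2 - d2 * m1
    return (y1 * m2 - y2 * m1) / determinant

# Bottom three rows (the file-sharers) of the tables above.
sharer_sales = [3, 2, 1]
sharer_downloads = [3.1, 1.9, 1.1]
true_liking = [3, 2, 1]
noisy_liking = [7, 2, 1.6]

coef_perfect_proxy = exact_fit_download_coef(sharer_sales, sharer_downloads, true_liking)
coef_imperfect_proxy = exact_fit_download_coef(sharer_sales, sharer_downloads, noisy_liking)

print(f"perfect proxy:   {coef_perfect_proxy:.2f}")    # 0, versus a true value of -1
print(f"imperfect proxy: {coef_imperfect_proxy:.2f}")  # about 1.3
```

Because the fit is exact, these estimates have no degrees of freedom left over, which is one more reason to treat the downloaders-only numbers as illustrative.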

These are just simple numerical examples. They illustrate problems with using only downloaders instead of the entire sample, and they illustrate the problem of simultaneity.

The bias and accuracy of the survey

All the data in the study are based on a survey of Canadians. This survey took place after the debate about the impact of file-sharing had attracted considerable attention, and after the lawsuits against file-sharers in the US had begun. One need merely sample the bile against the RIAA lawsuits found in numerous blogs, magazines and even newspapers to gauge the anger over attempts to prevent file-sharing. These are the same sites heaping praise on any study that appears to support a finding that file-sharing is not harmful. Those supporting file-sharing clearly prefer claims that file-sharing does not have a negative impact on sales. A large portion of A/F’s entire sample engages in file-sharing (29%) and everyone in their main sample engages in downloading.

This could be a serious problem.

People taking surveys will have their own opinions. Downloaders know what they should say if they want to make it seem that downloading does not harm sales: they would say that they purchase many more albums than they do while also reporting on their downloading activities. To the extent that survey takers lie in this manner, the results will be biased in favor of a positive impact of file-sharing. This bias, again, is stronger for the sample based entirely on downloaders. As evidence that survey takers do lie about downloading, we know that after the lawsuits started American survey takers reported a much larger decrease in file-sharing activity than actually occurred (according to other measurement techniques).

There are other problems as well. A/F report that, according to the survey, the average Canadian purchased 8.3 albums in 2005. How consistent are these self-reported statistics with actual industry sales? The Canadian Recording Industry Association reports that in total 54 million albums were sold in 2005. If we divide this value by the number of Canadians represented by the survey (children were excluded since they are not normally purchasers of albums) we arrive at an average of 2.2 albums per person, which is similar to the average value in the US. This means that the claimed purchases of CDs exceed actual sales by roughly 270%, that is, they are nearly four times as large, which is a very large deviation. Clearly, respondents were not giving accurate answers to this question.
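The comparison above is simple arithmetic, sketched here with the rounded figures quoted in the text. The adult population behind the 2.2 figure is not stated, so the last line only backs it out from the two quoted numbers; the variable names are illustrative.

```python
# Rounded figures quoted in the text above.
survey_albums_per_person = 8.3       # A/F survey average for 2005
industry_albums_per_person = 2.2     # industry total divided by surveyed population
industry_total_albums = 54_000_000   # albums sold in 2005 per the industry association

# How many times larger the self-reported figure is than actual sales per person.
ratio = survey_albums_per_person / industry_albums_per_person

# The same gap expressed as a percentage overstatement.
overstatement_pct = (ratio - 1) * 100

# Population implied by the two industry figures (an inference, not a quoted number).
implied_population = industry_total_albums / industry_albums_per_person

print(f"survey / actual = {ratio:.2f}")
print(f"overstatement   = {overstatement_pct:.0f}%")
print(f"implied population = {implied_population / 1e6:.1f} million")
```

With these rounded inputs the overstatement lands in the 270% to 280% range, in line with the figure quoted above.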

Other issues

A/F seem to fall into a common error when discussing their results. When a result is not statistically significant that does not mean that the result is the same as zero. The result is whatever the coefficient says it is, and some interpretation about the size and economic importance of the coefficient is required. If it is not statistically significant it merely means that we do not have enough confidence to reject the possibility that the true relationship might be zero at some given, and arbitrary, level, such as 95%. The best estimate of the impact is still the coefficient. We just have less confidence in the estimate than we might like. How much less depends on the p-value, not on a binary decision based on whether the t-statistic is above a given level.

A/F could easily construct a single income variable based on the midpoints of the categories that they used, taking the value for the highest group to be some average value for the population above that income level. This single variable would be easier to interpret, and is more likely to show up as being significant. The current income variables merely show whether the included income groups are different from the left-out income group (the lowest income level), which is a different thing than being different from zero. This is of some importance since income should be positively related to CD sales and the fact that it is not should be troubling (A/F are incorrect when they claim that I did not find a positive significant relationship between income and CD sales).
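A midpoint-based income variable of the kind suggested above might be built as follows. This is only a sketch: the bracket labels, dollar bounds, and the value assigned to the open-ended top bracket are all hypothetical, since the actual survey categories are not reproduced here.

```python
# Hypothetical income brackets in dollars (NOT the survey's actual categories).
# None marks the open-ended top bracket.
income_brackets = {
    "under_20k": (0, 20_000),
    "20k_to_40k": (20_000, 40_000),
    "40k_to_60k": (40_000, 60_000),
    "60k_to_100k": (60_000, 100_000),
    "100k_plus": (100_000, None),
}

# Assumed average income for people above the top threshold (hypothetical).
TOP_BRACKET_VALUE = 150_000

def income_midpoint(category):
    """Map an income category to a single dollar value."""
    low, high = income_brackets[category]
    if high is None:
        return TOP_BRACKET_VALUE
    return (low + high) / 2

# Made-up respondents, used only to show the conversion.
respondent_categories = ["under_20k", "60k_to_100k", "100k_plus"]
income_values = [income_midpoint(c) for c in respondent_categories]
print(income_values)
```

The resulting column can enter a regression as one continuous regressor in place of the set of income dummies.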

Illicitly acquired music is measured both by the peer-to-peer variable as well as the “Copy MP3s from friends” variable. The only difference is that peer-to-peer is acquired from strangers. A/F should conduct a joint significance test on both of these coefficients. The ripping question doesn’t imply that the CDs being ripped were not purchased so it cannot be taken to mean the same thing (I believe it was a mistake not to have asked the ripping question only for borrowed CDs).

A/F should focus more on the OLS estimates since the justification for the other specifications based on count data is fairly weak, and the non-linear specifications are difficult to interpret and have their own weaknesses. The number of observations for which CD purchases are zero is not that large (~20%) and the rest of the observations are reasonably spread out. A/F might want to check the percentage of observations with negative predicted values before deciding to use count data techniques.

It is important to make sure that the ‘downloaders’ who downloaded no tracks in an average month are not impacting the results. The fact that Decima decided that these respondents appear to be more like downloaders is not a compelling reason to include them. It would be better to run the regressions both with and without these individuals to make sure they don’t have an important impact on the results. If they do, then the issue needs to be addressed more extensively.

It seems fairly pointless to examine the impact of file-sharing on downloaded tracks from sites like iTunes since such downloads made up only about 5% of the market in 2005.

Everyone faces just about the same retail prices. There is no point in including a price variable in these regressions.

The authors should conduct some test for influential observations. If they are using Stata they could try the RREG routine.

What about the ‘Harvard’ study?

The Oberholzer-Gee/Strumpf paper claims to demonstrate that downloading is having virtually no negative impacts on record sales. This paper has been published in the Journal of Political Economy, one of the top journals in the profession.

I have written a lengthy critique that is available on SSRN at http://ssrn.com/abstract=1014399. Unfortunately, the paper by Oberholzer-Gee and Strumpf contains numerous factual errors, poorly performed empirical tests, and errors in logic. For example, the authors claim that record sales have been rising or flat in several important markets, that the sales decline in the US might be due to a single company having problems with a merger, and that the decline is due in large part to inventory issues and to increased expenditures on other activities such as telephones. All these factual claims are, according to the data I have examined, false. The authors also run four tests to check whether their main results are consistent with other empirical relationships. I have tried to replicate each of these tests. In each case I come to a conclusion very different from that of the authors. In some cases I appear to be running the same regressions on the same data but I find very different regression coefficients; in other instances I find that the authors have failed to examine their data carefully enough and that when properly examined the results are the opposite of those reported in the JPE. My critique, linked above, provides data and/or links to data so that readers can check for themselves the claims I make. I find the very large number of inaccuracies in the Oberholzer-Gee/Strumpf paper to be particularly troubling.

I have had problems with their overall methodology from the very beginning. One fundamental problem with their methodology was their use of sound recordings as the unit of analysis. This opens up a potential danger since what might be true for individual records might not hold for the industry as a whole, a problem more generally known as the “fallacy of composition”. The sampling impact that they rely on for possibly providing a positive impact of file-sharing (to offset the clear negative impact from substituting downloaded files for the purchase of files) is closely related to advertising. Advertising tends to have strong fallacy of composition issues because much of advertising’s impact is product stealing from other brands, not market enhancement. For this reason using the brand as a unit of analysis will give different results than for the market as a whole since the market stealing impacts do not exist for the industry as a whole.

I also had concerns with their use of instruments (for an early discussion click here). In particular, I thought their use of the number of German schoolkids on vacation made little sense. In my current critique I point out that they make contradictory assumptions about the impact of vacations on the amount of file-sharing in different parts of their paper. They ignore college students in Germany when explaining their logic for using this variable (they assume that vacations increase file-sharing) whereas in a test they perform on the US they ignore high school kids and make the assumption that vacations decrease the amount of file-sharing.

I believe that one symptom of the problems with their analysis was the initial result they found. The results of their March 2004 paper actually indicated that file-sharing has a substantial positive impact on record sales. This last result came from the fact that when they divided their sample into quartiles, the best selling albums had fairly large positive coefficients. Since the best selling albums are responsible for most of the industry sales, the result for the top quartile would be effectively the results for the industry. Their newer versions of this paper remove, without comment or explanation, all evidence based on their quartile analysis. They replace their quartile analysis (showing a positive impact of file-sharing on record sales) with what in my opinion is a clearly inferior methodology that just happens to provide a result that was more consistent with their previous description (a zero impact on sales). Instead of using actual record sales to segregate album performance, they use a predictor of success based on prior success of the artist. Since they do not even mention that they have made a change, they obviously do not provide any explanation for why they might prefer what appears to be an inferior technique. I think this initial result was a sign that there were problems with the analysis and it is too bad that Oberholzer-Gee/Strumpf did not treat it as such.

My Background on Copyright and Copying Issues

I am an economist who has worked on copyright topics since 1979. I conducted examinations for the Canadian government (I was teaching in Canada at the time) of the impact of photocopying on publishers and of the impact on television broadcasters of the retransmission of television signals by cable operators. I concluded that photocopying, the copying activity of that moment, did not harm publishers and that cable retransmission of television signals did not harm television broadcasters. My papers based on those reports were published in the two leading economics journals, the Journal of Political Economy and the American Economic Review. I also wrote a paper discussing the likely impact of videocassette recorders (Betamax) and concluded that no harm to television broadcasters was likely and that the Betamax decision was the right decision, even if the reasoning was incorrect.

I appear to be the first economist to suggest that illicit copying might actually benefit copyright owners. In each of these cases I concluded that copyright owners were not being harmed by the new technologies. I suggested that technologies could have some positive impacts, such as the exposure effect, where consumers would become familiar with a product that they would eventually purchase (e.g., converting a pirated piece of software into a purchased version to keep up to date and for the product support) or indirect appropriability, where producers can raise the price of originals because consumers value the ability to make copies. Indirect appropriability only would work under certain conditions and photocopying was one of the cases where it seemed to provide publishers with an increase in revenues. Here is an offspring of the photocopying report that appeared in the 1985 issue of the Journal of Political Economy. The theory behind indirect appropriability was published as an article in the American Economic Review but in the context of durability. This theory was then picked up and extended by Stan Besen and various coauthors.

Needless to say, I was not very popular with organizations representing copyright holders. But in those days there was not an army of copyright critics to embrace my work and make me a hero, as there is now.

By the late 1980s interest in copying had waned and I worked on other topics, particularly the economics of standards. But then file-sharing came along early in the 21st century. In the fall of 2001 I wrote an analysis for the Cato Institute in which I predicted, based on my prior theories about copying, that the impact of file-sharing would be strongly negative, unlike prior copying technologies, such as photocopying. One major reason for this was the anonymity of file-sharing, which severed the link between the value received by the copier and the value placed on the original by the person making the original available. The other reason was the high variability in the number of copies made from each original. I also reviewed the evidence in the Napster case and concluded that the quality of the evidence used to shut Napster down was very poor and that although the Court had reached a correct decision it had used faulty analysis, as was the case in Betamax.

Changes in my views on file-sharing?

I then had the first of two interviews with Salon magazine, in June 2002, about the impact of MP3 downloads. The reporter expected me to state that file-sharing was harmful to the record companies, based on my Cato analysis. But earlier, in the Spring of 2002, after reading that CD writers had reached a higher market penetration than I had thought, I gave the keynote speech/paper at the first meetings of the Society for Economic Research on Copyright Issues (www.serci.org), in which I suggested that the negative impacts that my theoretical analysis (Cato) had predicted might not be showing up in the data, and that if a decline in record sales didn’t start happening very soon I might need to rethink the theory. The news media was reporting a decline in record sales, but nothing very large, and I had not yet studied record sales in any detail. I told the interviewer that file-sharing might not be as harmful to record sales as I had originally suggested, since we should already have been seeing the beginning of a large decline in record sales. The interview was published and I became something of a hero to the numerous and vocal anti-RIAA forces (e.g., see the Financial Times article by Larry Lessig and my reply).

I enjoyed the acclaim while it lasted. Over that summer, however, I gathered data on record sales to see what was actually happening, and by the late Summer of 2002 I had determined that record sales had indeed begun to suffer a decline consistent with the claim that file-sharing was causing harm. That is when I had a second interview with Salon in which I reported that file-sharing appeared to be harming record sales. Since then the evidence has gotten stronger and stronger.

Other Copyright Topics

I have a paper that takes a more general view of how economists think about copying in general, which is part of a symposium examining the twentieth anniversary of my suggestion that copying might benefit copyright owners. I suggest that economists have become infatuated with the possibility that copying might benefit copyright owners, and that they do not understand how rare this is likely to be.

I have also been interested in the optimal length and breadth of copyright, and the nature of economic forces behind concepts such as fair use and droit de suite (resale rights). A paper about the economic tradeoffs involved in copyright appeared in the 1986 issue of Research in Law and Economics. I published an article in Contemporary Policy Issues about the difficulty economists have in explaining the behavior of copyright owners with respect to droit de suite, performing rights societies, and the old television financial and syndication rules.

You might be interested to see the Amicus brief filed by famous economists in the Eldred case which discusses, quite imperfectly, the economic tradeoffs involved in copyright law. Steve Margolis and I have written a critique of this brief that appears in the Spring 2005 issue of the Harvard Journal of Law and Technology.

I have also written a number of papers examining the impact of radio on the sales of records. It has usually been assumed that radio has a positive impact on record sales but I do not find that to be the case. Apparently radio is more of a substitute than it is a complement.

You can find my vita here.

I am currently involved with several academic organizations interested in these topics. I am Director of the Center for the Economic Analysis of Property Rights and Innovation, which supports research on intellectual property issues. I am on the editorial board of the Review of Economic Research on Copyright Issues and was on the Board for the short-lived journal set up by Larry Lessig named Copyright. I am also the president of the Society for Economic Research on Copyright Issues, and on the advisory board of the Intellectual Property Institute, the Media Institute, and the Center for the Study of Digital Property.

The Future of Copyright: Changes in Copyright Regimes?

There have been recent proposals that the MP3 downloading problems can best be solved by replacing the current copyright system with a Compulsory License system where MP3 files can be downloaded with impunity and revenues are raised by taxes on ancillary products. I wrote a paper that critically examines these proposals for a conference hosted by the Progress and Freedom Foundation in the spring of 2003.

These proposals have been most thoroughly explicated in a book by William Fisher titled “Promises to Keep”. I have recently finished writing a review of that book. I highly recommend the book to anyone interested in the subject, although I disagree with his conclusions.

Play God with Copyright Length

I have constructed a spreadsheet that lets the user determine the optimal copyright length by changing any of several factors. Although it is not completely general, it is fun to play around with.

Here are some links of interest:

Academic Groups

Society for Economic Research on Copyright Issues

Review of Economic Research on Copyright Issues

Intellectual Property Institute

Pho: Interesting Discussion Group

Advocacy and Copyright Groups

Copyleft and free software

Copy Protection Technology Working Group (CPTWG)

Electronic Frontier Foundation

Motion Picture Association of America

Recording Industry Association of America