Twenty years ago today, this email made the rounds among the dozens of engineers and researchers at the Fraunhofer Institute for Integrated Circuits IIS in Germany:
Date: Fri, 14 Jul 1995 12:29:49 +0200
Subject: Layer3 file extension: .mp3

Hi all,

this is the overwhelming result of our poll: everyone voted for .mp3 as extension for ISO MPEG Audio Layer 3!

As a consequence, everyone please mind that for WWW pages, shareware, demos, and so on, the .bit extension is not to be used anymore. There is a reason for that, believe me 🙂

Jürgen Zeller
It’s probably a bit silly to assign a single “birthday” to a technology that was developed over the better part of a decade, and built upon many decades more of research and development. But the Fraunhofer Institute likes to cite July 14th as the day when the ISO standard IS 11172-3 “MPEG Audio Layer 3,” which formerly had the file extension “.bit,” gained the extension that has become world famous: “.mp3”.
This is the story of MP3, the technology that (revolutionized? upended? destroyed? transformed?) changed music forever.
In The Beginning…
The story of the MP3 begins not in Silicon Valley, or Hollywood, or Japan or any of the other likely birthplaces of audio and digital technology, but in the Middle Franconian Bavarian town of Erlangen, Germany.
In the early 1980s, Karlheinz Brandenburg was a doctoral student at the University of Erlangen-Nuremberg, studying electrical engineering and mathematics and the areas where those disciplines intersect. This area of Germany is known as a hotbed for scientific, industrial and academic innovation. The Fraunhofer Society for the advancement of applied research is based nearby, as is the Max Planck Institute for the Science of Light and numerous branches of Siemens AG research and development facilities. For a student with polymath interests like Brandenburg, there was no better place to be.
“Really, doing both these degrees had to do with not being able to decide [what my focus would be],” says Brandenburg. “So, I ended up starting electrical engineering, and at the same time, I was always interested in mathematics (…) And we got a lot of topics in computer science as well. So it was really the idea to have a broad range of possibilities in the future.”
Brandenburg’s thesis advisor was a man named Dieter Seitzer, who had done pioneering work in the obscure discipline of “psychoacoustics,” which is the study of the way humans perceive sound. It turns out that the human auditory system is not an instrument that scoops up all the frequencies in a given environment, like a microphone does. What we “hear” is not an accurate representation of reality, but only those sounds that the brain, over the course of years of evolution, has determined to be the “most important” sounds. Once you understand this, you can understand the ways that human hearing can be manipulated. For example, most people can distinguish between two simultaneous tones that are a half step apart on the diatonic scale. But if the tones are brought closer together in pitch, humans will hear only one tone. Another example would be the impossibility of having a conversation on a sidewalk when a loud truck goes by. The sounds of the conversation are present, but the brain does not process them, in favor of the louder noise of the truck. This is known as auditory masking, which would be a key technique used in audio compression technologies such as the MP3.
When music and audio recordings were first digitized, an emphasis was placed on capturing the full and complete frequency range with total fidelity. But someone familiar with psychoacoustics, like Dieter Seitzer, knew that this was overkill. The human ear and brain do not process or “hear” every tone, note or sound on a CD.
In the early 1980s, Seitzer had a pet project that he called a “digital jukebox.” Sort of like a Spotify or Pandora, way before their time, he envisioned a system where people could connect to a central server and order music to be delivered on demand, over the new ISDN digital telephone lines that were beginning to be installed across Germany. The problem was, digital music files were simply too large to be transmitted, even over this broader band. A typical compact disc uses linear pulse code modulation (PCM), sampling each of its two channels 44,100 times per second at 16 bits per sample. In other words, it takes about 1.4 million bits to store a single second of stereo audio. But to send music files over the wire, Seitzer would need to compress the audio by a factor of roughly 11-to-1, down to about 128,000 bits per second.
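The arithmetic behind those numbers is easy to check. Here is a minimal sketch in Python; the 44,100 Hz sampling rate is the standard CD value (assumed here rather than taken from the interview), and the function name is only illustrative. The ratio it prints is in the neighborhood of the figure quoted above.

```python
# Bit-rate arithmetic for uncompressed CD audio versus Seitzer's target rate.
SAMPLE_RATE_HZ = 44_100   # standard CD sampling rate (assumption, not from the interview)
BITS_PER_SAMPLE = 16      # from the text
CHANNELS = 2              # stereo
TARGET_BPS = 128_000      # target rate from the text

def pcm_bits_per_second(sample_rate_hz, bits_per_sample, channels):
    """Bits needed to store one second of uncompressed linear PCM audio."""
    return sample_rate_hz * bits_per_sample * channels

cd_bps = pcm_bits_per_second(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
ratio = cd_bps / TARGET_BPS

print(cd_bps)           # 1411200
print(round(ratio, 1))  # 11.0
```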
Seitzer applied for a patent on his pet project, only to be rejected by the patent office on the grounds that what he was trying to do was “impossible.”
“You shouldn’t tell that to some German professor who has an idea what he’s doing,” Brandenburg says. “He [Seitzer] was looking for a PhD student who would take on the subject. And I have to admit, I knew enough about the state of the art that I thought, ‘Okay, the patent examiner is right; I will do some analysis to show why this is not possible. This will get me a PhD and then I’m off to something real.'”
But as Brandenburg delved into the research, he began to realize that compression on the level Seitzer was asking for might not be impossible after all. Brandenburg was able to combine previous work done on speech coding with some of the insights Seitzer and others discovered in terms of psychoacoustics to begin to make real headway. The main trick was compressing the audio files in a way that a human ear wouldn’t notice the difference. As mentioned, the audio masking technique was a big key. Brandenburg developed a coding process that could filter a signal into layers of sound which could be saved or discarded depending on his needs. In other words, he could systematically eliminate the parts of a sound recording that the average human ear wouldn’t be able to hear anyway, thereby gaining that much space in the data file. For example, since lower tones cancel out higher ones, if there was a recording with overlapping instruments, Brandenburg could assign fewer bits to the tones that wouldn’t register to a listener. Another example is related to the funny fact that the brain cancels out noise before and after a loud click. So, Brandenburg would be able to eliminate a few precious bits around, say, a loud cymbal crash.
The algorithm Brandenburg developed could be run iteratively. Each time he ran a piece of music through his program, he could turn around and run the output through the algorithm again, reducing the audio file’s size each time. Then Brandenburg combined his algorithm with others. First, he paired it with the basic data compression technique known as “Huffman coding,” which achieves lossless data savings by giving the values that occur most often the shortest codes. Then he applied the famous Fast Fourier Transform, to break a signal down into its component frequencies.
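For the curious, Huffman coding itself fits in a few lines. What follows is a textbook sketch in Python, not Brandenburg's code: the function names and the sample values are hypothetical, chosen only to show that values which occur often end up with shorter bit strings and that nothing is lost on the way back.

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    """Build a prefix code that gives frequent symbols shorter bit strings."""
    counts = Counter(symbols)
    if len(counts) == 1:  # degenerate case: only one distinct symbol
        return {next(iter(counts)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: code built so far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)  # the two lightest subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def encode(symbols, code):
    """Concatenate the bit string of every symbol."""
    return "".join(code[s] for s in symbols)

def decode(bits, code):
    """Walk the bit string, emitting a symbol whenever a full code word is read."""
    inverse = {v: k for k, v in code.items()}
    out, current = [], ""
    for b in bits:
        current += b
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out

# Hypothetical input: zero dominates, larger values are rare
values = [0, 0, 0, 0, 0, 0, 1, 1, 1, -1, -1, 2]
code = huffman_code(values)
bits = encode(values, code)

assert decode(bits, code) == values  # lossless round trip
print(len(bits))  # 21 bits, versus 24 with a fixed 2-bit code per value
```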
Keep in mind, all of this was being done on a graduate student’s research budget and in a comparatively primitive computing era. Seitzer had to compete for time on the university’s mainframe. He could only test twenty seconds of music at a time due to the limited capacity of hard disks. Heck, this was still the age of 3.5 inch floppy disks!
Brandenburg achieved his biggest breakthroughs by 1986, when he was only 31 years old. He received his first patent for his algorithms before he even defended his thesis.
Brandenburg’s technique (and thus, Seitzer’s original pet project) was theoretically proven. But in practice, there would be a long process of refinement ahead. The algorithms worked on paper, but they had to be tweaked for every possible type of sound, every type of music, instrument and recording. And the tweaks had to be made precisely, meticulously. In other words, Brandenburg would spend several years making hands- and ears-on adjustments.
By this point, Brandenburg had followed his mentor Seitzer to the Fraunhofer Institute for Integrated Circuits. The Fraunhofer Society is a research organization, sort of like a German Bell Labs, but funded by the German government. So Brandenburg’s work could graduate from a thesis project to something more comprehensive. There would be a team to help him with the research. There would be better computing available. And there would be a bigger budget.
“It was clear that it needed to be tested with completely different types of music,” Brandenburg recalls. “So we had some funding for the project and from this, I went to a local CD store, and I remember, I told them, ‘Okay, I need around 30 or 40 different CDs, all types of music, what you would recommend?’ Of course he looked at me and said, ‘What?’ So I came home with all these CDs just so that we had enough data to try our algorithms on.”
I am sitting in the morning at the diner on the corner…
And indeed, it was in testing all these various acoustic possibilities that Brandenburg encountered a major roadblock, in the person of Suzanne Vega.
While Brandenburg’s algorithm had been massaged to work well for most instruments, there was one instrument, the human voice, that it still had trouble with. Vega’s a cappella version of her most famous song, Tom’s Diner, put these difficulties in stark relief.
“I read in some high fidelity, high-end journal that people used Suzanne Vega’s song Tom’s Diner, the a cappella version, to test loudspeakers,” Brandenburg says. “And then I got to the Fraunhofer lab and I saw, ‘Oh, they have the CD; let me listen to this.’ I listened to it, and then I said, ‘Okay, please transfer that to the computer.’ (…) And that was, at first, a catastrophe, because this song turned out to be much more difficult than anything else we had used before.”
After running the song through the algorithms, Vega’s emotional vocals came back hoarse, transformed in a way that sounded unnatural.
“At lower bit rates, it was no longer her voice, but like… yeah… very hoarse… distorted in some way. You can explain it in one way… that human ears are specially trained to get a clean understanding of the human voice. We want to understand speech, so we seem to be more susceptible to voice changes than to other [sounds]. And on the other hand, our signal processing… this signal had some properties which just made it more difficult than others. In the end, we found out with these methods that our models of human hearing were in fact too simple, not good enough. So we had two big steps to get rid of that problem. The first one was to get better understanding of human hearing and patch our so-called psychoacoustic models, and then at some point in time, we started to do stereo encoding using similarities between left and right channel and because Suzanne Vega really stands in the middle, this piece, of course, when using this, things became easy.”
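One common way to exploit the similarity between left and right channels is a mid/side transform, and a toy version shows why a voice that “stands in the middle” gets easy. This is only a sketch of the general idea in Python, not the actual Layer 3 stereo code; the function names and sample values are hypothetical.

```python
def to_mid_side(left, right):
    """Re-express left/right samples as their average (mid) and half-difference (side)."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side

def to_left_right(mid, side):
    """Invert the transform: left = mid + side, right = mid - side."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right

# Hypothetical samples of a centered voice: both channels carry the same signal
left = [0.5, -0.25, 0.75, 0.0]
right = [0.5, -0.25, 0.75, 0.0]

mid, side = to_mid_side(left, right)
print(side)  # [0.0, 0.0, 0.0, 0.0]
assert to_left_right(mid, side) == (left, right)  # nothing is lost
```

When the two channels match, the side channel is all zeros, so nearly the whole bit budget can go to the single mid channel.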
After thousands of tiny tweaks (and God knows how many times listening to Tom’s Diner, a song that tends to stick inordinately in a person’s head anyway, as we all know) it turned out that Vega led the way to perfecting the algorithm.
“We were really, for a number of years, hunting for that difficult item, to find out where the problems were. It’s—even today, to do something with 70% of music is easy. That’s an undergraduate project. But to get it to the level that it’s really perfect, or near-perfect, for everything… that’s work.”
Brandenburg met Vega years later, and even heard her perform Tom’s Diner in person. He claims he still enjoys the song, despite having heard it more times than perhaps anyone else on earth.
While we’re still on Vega, an example might be useful. Below is an example of the same Tom’s Diner track from above, but this time, the only things you will hear are the audio elements that the MP3 compression process eliminates. Compare the two. Quite haunting. (h/t Chris Higgins).
The Standard that Almost Wasn’t
Brandenburg finally published his thesis paper in 1989. His algorithm became known as optimum coding in the frequency domain (OCF).
The next step was to get the technology out to the world. Around this time, the early 1990s, there were several new technologies coming to market that were looking for an audio encoding standard to utilize, among them CD-ROMs and DVDs. Then, as now, the committee that decides which standards are included in consumer technologies is the Moving Picture Experts Group (MPEG). Brandenburg and his team from Fraunhofer were among fourteen different groups that submitted entries to MPEG, confident that their audio encoding technique was far and away superior. But they were up against competing technologies, backed by major multinational corporations. Their biggest competition came from a group called MUSICAM, which had strong ties to the Dutch corporation Philips. Philips, famously, held the patents on the Compact Disc.
Brandenburg’s technique produced better audio quality while using less data, which was seemingly the holy grail. But MUSICAM’s technique used less processing power, a not insignificant consideration when you remember the speed of most processors at the time. There was plenty at stake. A fortune in licensing fees was waiting to be collected on behalf of whatever technology was chosen. In the end, after a byzantine round of politicking (and, perhaps, back-room dealing), a compromise was announced. There would be three standards adopted:
- Moving Picture Experts Group, Audio Layer 1 would be a compression method optimized for digital cassette tape.
- MUSICAM’s method would be named Moving Picture Experts Group, Audio Layer 2, and it would be chosen as the standard for digital FM radio, CD-ROMs, Digital Audio Tape, over-the-air HDTV, and, most crucially, for the audio tracks on home DVD players.
- Brandenburg and Fraunhofer’s technology would be called Moving Picture Experts Group, Audio Layer 3, but it was not chosen for any technology.
In a way, it seemed that MPEG, Layer 3 (what would officially be named MP3 in 1995) had lost. It was an orphaned standard. In fact, Brandenburg himself had already led the development of a successor technology to MP3: Advanced Audio Coding (AAC). It seemed that the MP3 was destined for the dustbin of history.
But a funny thing happened on the way to obscurity.
Two new technologies had burst onto the scene in the mid-1990s that would revive the fortunes of the MP3: the world wide web and Windows 95. Partially on a lark, but possibly in an effort to revive MP3’s flagging prospects, Brandenburg (who was, by this point, a director at the Fraunhofer Institute) directed his team to develop a software player for MP3s that could be released to work on Windows computers. It was Windows, with its three-character filename extensions, that led, in July of 1995, to the official naming of the MP3.
In a way, MP3, the unloved stepchild of the audio standards, was suddenly the right technology, in the right place, at the right time. Computer hard drives of about 1 GB in size only became common in 1995, so storage space was paramount. And of course, the vast majority of people getting online in 1995 did so at 28k baud modem speeds… 56k at best. This made bandwidth a major issue. Finally, even though MUSICAM was trying to get its .MP2 files out in the world at exactly the same time, to the uninitiated hundreds of millions of users adopting Windows 95, .MP3 looked like a next-generation technology. If MP2 was good, then didn’t that mean that MP3 was better?
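To put numbers on the storage and bandwidth squeeze, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: a four-minute song, a 128,000 bits-per-second MP3, and a modem that actually delivers 28,800 bits per second with no overhead.

```python
SONG_SECONDS = 4 * 60      # hypothetical four-minute song
CD_BPS = 44_100 * 16 * 2   # uncompressed CD audio, bits per second
MP3_BPS = 128_000          # assumed MP3 bit rate
MODEM_BPS = 28_800         # assumed line rate of a "28k" modem, ignoring overhead

def size_megabytes(bits_per_second, seconds):
    """File size in megabytes, counting 1 MB as 10**6 bytes."""
    return bits_per_second * seconds / 8 / 1_000_000

def download_minutes(bits_per_second, seconds, link_bps):
    """Minutes needed to move the file over a link of the given speed."""
    return bits_per_second * seconds / link_bps / 60

for name, bps in [("CD audio", CD_BPS), ("MP3", MP3_BPS)]:
    print(name,
          round(size_megabytes(bps, SONG_SECONDS), 1), "MB,",
          round(download_minutes(bps, SONG_SECONDS, MODEM_BPS)), "minutes")
# CD audio 42.3 MB, 196 minutes
# MP3 3.8 MB, 18 minutes
```

Under these assumptions, a single uncompressed song would eat about 4% of a 1 GB drive and tie up the phone line for more than three hours, while the MP3 version arrives in under twenty minutes.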
“I still remember one meeting, I think it must have been in ’94, when we discussed different options in our department,” Brandenburg recalls. “And we said, ‘Okay, we have a window of opportunity to make MPEG audio Layer 3 into the internet audio standard.’ That was the time when the first applications, like Progressive Networks with RealAudio, were out there. Now the company’s called Real Networks. And so on, so we said, ‘Okay, we have a chance. What do we need to do?’ In fact around the same time, PCs, for the first time, got fast enough to do on-the-fly decoding of Layer 3 signals. We first had to do some bad fix to make that work, but it did. So you could have, if your computer was equipped with a sound card, which at that time was not yet standard, if it was equipped with a sound card, you could have an MPEG audio Layer 3 file on your hard disk and listen to some music.”
As the web took off and gained mainstream acceptance, MP3, by accident or design, took off as the standard for audio files. As mentioned, there were competitors. RealAudio’s .ra files were understood to be for audio streaming. Microsoft’s own .wma files never quite took off. Brandenburg and Fraunhofer helped MP3 gain traction online by pursuing a “let a thousand flowers bloom” approach. They released MP3 players and encoding software packages as free-to-use shareware on the Internet. The first MP3 website was launched as early as 1995. And the Fraunhofer Society took a hands-off approach when developers began to adopt the MP3 on their own. Popular music software packages like Winamp were allowed to find an audience before Fraunhofer approached them, gently and politely, about licensing agreements.
This strategy paid off handsomely, and in just a few short years. By the late 1990s, “MP3” replaced “sex” as the most queried term on search engines. Entire websites and online communities sprang up solely to trade and disseminate MP3 files. MP3 became the de facto standard for digital audio worldwide and the Fraunhofer Society was suddenly in line to start raking in the licensing fees that it had missed out on all those years previous. Very early on, Microsoft licensed MP3 for inclusion in its Windows Media Player. When the first MP3 hardware devices arrived, such as Saehan Information Systems’ MPMan F10 in 1998 (32MB of on-device storage, enough for only half an album of songs encoded at 128Kb/s), Fraunhofer benefited as well. Obviously, in the years to come, it would continue to do so with every copy of iTunes that was shipped, not to mention every iPod, Zune, iPhone and any cell phone that was capable of playing MP3 files.
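The MPMan’s “half an album” capacity is easy to sanity-check. A quick sketch in Python, assuming 1 MB means one million bytes and a constant 128,000 bits per second (the function name is just illustrative):

```python
STORAGE_BYTES = 32 * 1_000_000  # 32 MB, counting 1 MB as 10**6 bytes (assumption)
BIT_RATE_BPS = 128_000          # 128 kb/s, from the text

def playback_minutes(storage_bytes, bit_rate_bps):
    """Minutes of audio that fit in the given storage at a constant bit rate."""
    return storage_bytes * 8 / bit_rate_bps / 60

print(round(playback_minutes(STORAGE_BYTES, BIT_RATE_BPS), 1))  # 33.3
```

About 33 minutes of music, which is indeed only part of a typical full-length album.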
Brandenburg says, “I think we took the right compromise in terms of business models, to, on one hand still keep royalty income flowing, and on the other hand making it easy to be used for PCs and for internet applications. (…) Even before the iPod, I remember I was on a business trip to Hong Kong, and I saw an electronics shop window with 30 different brands and versions of mp3 players, and I said, ‘Okay, that’s it. We have won, finally.'”
“That’s something I never endorsed and we never endorsed…”
But obviously, this is only part of the story that is the MP3. By the end of the 1990s came piracy. Came Napster. Came the entire upending of the music industry, the fallout from which we are still working through today.
Karlheinz Brandenburg says that from the beginning his team was only focused on perfecting MP3 technology and helping the standard gain broad acceptance. But as early as 1994, an industry representative is reported to have told Brandenburg, “Do you know that you will destroy the music industry?”
When the standard was first released to the public, Brandenburg showed it to the recording industry, only to receive a reaction that was something along the lines of, “‘This is very interesting. You have done good research, but what has that to do with us?'”
By the time that widespread piracy of MP3s began in the late 90s, Brandenburg and Fraunhofer found themselves in an ironic situation. They were thrilled that MP3 had finally gained widespread acceptance: their strategy of shareware distribution and grassroots seeding of the technology had paid off exactly as planned. And because MP3 had become the de facto standard, they were profiting handsomely from hardware and software licenses. And yet, no one had ever had the intention of promoting, even tacitly, the piracy of intellectual property.
“We found later on that a lot of people did not care about who owns the rights,” Brandenburg says. “That [piracy] is something I never endorsed and we never endorsed. I still have a strong opinion that musicians… all the artists, composers, and everybody helping to distribute music should be paid for their work.”
But of course, it is a well-known cautionary tale how the music industry, fractured as it was, could never quite get its act together in any meaningful way to address the rise of digital media. This is in spite of the fact that Brandenburg himself tried to provide them with a solution.
“I think it was early 1998, I was [in town] for a visit in Washington, DC, and I had some contact to see the RIAA, the Recording Industry Association of America. So I wrote an email, ‘I could come to visit you to discuss these issues.’ And they wrote back, ‘We heard about you. We would love to talk to you.’ So I went to RIAA headquarters, discussed different strategies, what to do, and made it very clear we were not interested in the death of the music industry. In the end, there was no direct outcome from this, but later on people started the Secure Digital Music Initiative, SDMI. And it was done in ’99 or 2000, and I went to a lot of these meetings. We were into all these discussions about Digital Rights Management and how to do that. The problem there was that somehow, there were some companies [that were] not interested in any technical standard, in any interoperability. And I thought, ‘Look, you’ve got to do a technical standard which makes sure that every service selling music and every device works together, and has a universal system.’ If you don’t do that, I see a clear winner and that is MP3 without any protection, without any Digital Rights Management. It’s not our problem if that happens and that’s what happened.”
Brandenburg says that he had the same insight Steve Jobs would later have: the only real way to combat “free” music was to make paying for music as easy and painless as possible.
“I said, ‘Look, whatever you do, the most important [thing] is ease of use; second thing is ease of use; third is ease of use.'”
From a music industry perspective, the last decade has been a story of Apple and iTunes, followed by streaming companies like Spotify and Pandora, completely taking over music retail. But behind all of this is really the underlying story of MP3 becoming the triumphant medium for music. In fact, it was just this year that digital music revenue overtook physical music sales for the first time. Through it all, Brandenburg says he is satisfied with the fruits of his research.
“I have to admit I still sometimes feel like, ‘Am I dreaming or is this real?'” Brandenburg says. “Of course, as a young student, you are dreaming about success and things [you create] being used and so on. But [MP3] became so much more successful that… really, for me… it’s the story of my life.”
Brandenburg is asked all the time if he is bitter that others have made a fortune off of his invention, but in fact, he is happy to point out that he has been amply compensated.
“The institute has made a lot of money. Fortunately, compared to my colleagues in the U.S., in Germany, there’s a law that if your company makes a lot of money from patents, then the inventors have to get some share of that.”
20 years on, Brandenburg does not much like being called “the inventor” of the MP3, pointing out how many different people around the world contributed to the development of the technology.
“Everybody looks at me when talking about the birth of mp3, but [they] don’t know who else was involved, and there were many more good ideas. (…) [I am] somebody who has done very important contributions, certainly, yes. So [my] being in the first line is correct; but being the only one is completely incorrect.”
- How Music Got Free: The End of an Industry, the Turn of the Century, and the Patient Zero of Piracy
- Appetite for Self-Destruction: The Spectacular Crash of the Record Industry in the Digital Age
- Ripped: How the Wired Generation Revolutionized Music
- All the Rave: The Rise and Fall of Shawn Fanning’s Napster