How Good Is That Playlist?
Mon, January 11, 2010 at 12:01AM
One question that arises whenever someone wants to listen to music is: "Is it worth creating a playlist -- or do I just press shuffle?" The former is getting more and more difficult, due to the sheer amount of music available on today's computers, while the latter is rendered useless by the huge variety of music on any player.
-- Jakob Frank, Analysing and Evaluating Playlists on Music Maps, WDA'2009
In my previous post, I discussed the different approaches used to generate music playlists automatically, and compared the features of four playlist recommenders. In this post I'll describe an experiment I conducted to discover how the playlists differ when using each of those products.
Automatic Playlisting Products
The products I'm comparing are:
- Pandora Radio from Pandora Media
- iTunes Genius from Apple
- MusicIP Mixer from Amplified Music Services
- Moodagent from Syntonetic Media
These products are not really direct competitors; you could conceivably find a use for each. Using different technologies and approaches to produce their results, each offers a different mix of features and services. In my previous post I summarized those differences in a table.
In one important respect, however, all four share a common goal: they aim to pick music tracks that will sound good together in a playlist. But to evaluate products against that simple goal statement involves some complications that we must discuss first.
Playlist Quality: A Subjective Concept
A playlist is a collection of songs grouped together under a particular principle. The principle could be general, such as “rock songs from the 70’s” or personal like “songs that remind me of Melanie.”
--Barrington, Oda, Lanckriet [pdf]
That quote highlights one of the challenges of evaluating the results of any playlist generation process or tool. Musical taste and appreciation is a highly subjective area to begin with. Furthermore, no single rule could ever define “a good playlist” because people have many different reasons for compiling a music playlist or mix. See, for example, Taxonomy of the Mix.
In their 2006 study, More of an art than a science: Supporting the creation of playlists and mixes, Cunningham, Bainbridge, and Falconer [Ref 2] report on their analysis of forum postings at the Art of the Mix website [Ref 3]. Their Table 1 lists relative frequencies of some common organizing principles for mixes:
- 25.2% -- Artist/Genre/Style
- 25.2% -- Event or Activity (e.g. party, travel, holiday)
- 19.1% -- Romance
- 16.5% -- Message or Story
- 16.5% -- Mood
- 10.4% -- Challenge or Puzzle
- 7.0% -- Orchestration
- 6.1% -- Characteristics of Mix Recipient
- 6.1% -- Cultural References
- 2.8% -- Other
Many of these organizing principles involve personal evaluation criteria that inevitably fall outside the scope of any automatic playlist generation system. Because of this, most studies of playlist recommendations focus on music characteristics that can be defined with enough objectivity to permit discussion and common understanding. For example, citing the study quoted above, Barrington et al. [Ref 4] argue as follows:
Cunningham et al.’s user study reports that 50 percent of requests for help in creating a playlist included a song as an example. Our work focuses on this “query by example” paradigm where the user provides a song as a query or “seed” and the recommender system’s task is to generate a playlist of more music that somehow “fits well” with the seed song. The meaning of “fits well” may depend on a variety of the factors below.
Factors that impact playlist generation
Playlists may be generated (either automatically or by hand) to reflect a mood, accompany an activity or explore novel songs for music discovery. Recommendations can be based on similarity to one or more seed examples or songs may be grouped based on semantic descriptions. The top organization schemes for playlists in [Cunningham et al.] were similar artists, genres and styles so we focus on the impact of these factors for automating playlist generation.
--Barrington, Oda, Lanckriet [pdf]
I am going to adopt a similar starting point for my own analysis. All four playlist generators I want to compare can produce a playlist based on a musical seed — one or more songs, artists, or albums — so that will be my focus.
I know that people also want to create playlists based on other criteria or song characteristics which I am not considering in my comparison. In the paper cited above, Cunningham et al. discuss various ways that products could provide music information retrieval services to better support the creation of playlists and mixes, but these four products make no claim to assist in those ways.
Playlist Similarity and Bias
Similarity and Bias are closely related concepts; both refer to ways in which we perceive certain characteristics of the tracks in a playlist. When we select music as a seed for a playlist, we expect the resulting recommendations to have characteristics that are similar to that seed in some recognizable way.
But just what will be similar? Even a single track has many characteristics, and music recommenders differ in the ones they use to describe, compare, and select tracks. For example, the Wikipedia entry for Pandora Radio states that "over 400 different musical attributes are considered when selecting the next song. These 400 attributes are combined into larger groups called focus traits. There are 2,000 focus traits".
To account for so many different characteristics, recommender systems use a variety of mathematical methods to determine similarity, and those methods continue to evolve. For example, the opening quote in this post comes from a 2009 paper by Jakob Frank, whose work [Ref 5,6] focuses on the use of self-organizing maps, a neural-network technique, to visualize similarity within a large audio collection. Toby Segaran's 2007 book, Programming Collective Intelligence [Ref 1], is an excellent introduction to the field.
To see how similarity and bias are related, consider a simple conceptual algorithm for comparing pairs of tracks and producing a playlist from a seed track:
- For all characteristics, compute numerical distance factors corresponding to the differences between the tracks
- Combine those distance factors using some kind of weighting to produce a Similarity measure for track pairs
- When given a seed, select the tracks with the smallest (i.e. closest) similarity measures
That proposal ignores many real complications. But with that general framework in mind, the concepts of similarity and bias can be viewed simply as different weightings of the distance factors. A perception of bias would be created if, when computing its similarity measures, a recommender assigned undue weight to one characteristic of the seed, or perhaps to a few characteristics.
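The conceptual algorithm above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any of the four products actually works: the characteristic names, the feature values, and both weightings are invented for the example, and the combined measure is treated as a distance, so smaller means more similar.

```python
def weighted_distance(a, b, weights):
    """Combine per-characteristic distances into one measure (smaller = more similar)."""
    return sum(w * abs(a[name] - b[name]) for name, w in weights.items())


def make_playlist(seed, library, weights, length):
    """Return the `length` track names closest to the seed, excluding the seed itself."""
    candidates = [name for name in library if name != seed]
    candidates.sort(
        key=lambda name: weighted_distance(library[seed], library[name], weights)
    )
    return candidates[:length]


# Hypothetical library: every value below is made up for illustration.
library = {
    "seed":          {"tempo": 0.6, "acoustic": 0.2, "popularity": 0.9},
    "sounds_alike":  {"tempo": 0.6, "acoustic": 0.3, "popularity": 0.1},
    "unrelated_hit": {"tempo": 0.2, "acoustic": 0.9, "popularity": 0.9},
}

# Two hypothetical weightings of the same distance factors.
balanced = {"tempo": 1.0, "acoustic": 1.0, "popularity": 1.0}
popularity_biased = {"tempo": 0.1, "acoustic": 0.1, "popularity": 5.0}

print(make_playlist("seed", library, balanced, 1))
print(make_playlist("seed", library, popularity_biased, 1))
```

With the balanced weights the track that sounds like the seed is chosen first; with the popularity-heavy weights the unrelated hit wins instead. Nothing about the tracks changed, only the weighting, which is the sense in which similarity and bias are two views of the same calculation.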
Emphasis is not always seen as bias. If a recommender lets you request a certain characteristic, then the playlist should reflect your preference. Using Pandora, for example, you can home in on a particular emphasis by approving or disapproving its selections. But if I don't explicitly specify a preference for hit singles, 1970's tracks, tracks by a single artist, or tracks tagged as a particular genre, I'm going to view such an emphasis as an unwarranted bias.
Test Framework
To test the quality of the playlists produced by different recommenders, I decided to take a popular song that every music recommender should “recognize”, use each product to generate playlists with several recordings of that song as seeds, and compare the results.
My expectations are that:
- Playlist generators will “know” how to use a popular song as a seed. At minimum, this means accepting each recording as a seed and generating a playlist from it.
- Playlists will have “similar” characteristics to the seed. I realize that my evaluation of similarity will involve subjective judgments. Even so, this criterion is more than reasonable; it is an essential aspect of what any user of a playlist generator expects when selecting a seed song. If the resulting playlist is not similar to the seed in some recognizable way, the playlist is no better than a random selection of tracks from the user's library. Simpler tools can do that.
- Identical (or very similar) seeds will produce very similar playlists, and (conversely) distinctly different seeds will produce distinctly different playlists. Again, these expectations are a natural implication of the "seed song" concept being applied in a consistent and predictable way.
- Playlists will be free of obvious biases that derive from a single characteristic of the seed song.
Seed Song: Layla
For my test, I chose the song Layla by Derek and The Dominos [Ref 8]:
Layla is a song by rock band Derek and the Dominos from their album Layla and Other Assorted Love Songs, released in December 1970. It is considered one of rock music's definitive love songs, featuring an unmistakable guitar figure, played by Eric Clapton and Duane Allman, and a piano coda that comprises the second half of the song.
The song has experienced great critical and popular acclaim. It is often hailed as being among the greatest rock songs of all-time. Two versions have achieved chart success, first in 1972 and again twenty years later as an acoustic Unplugged performance. In 2004, it was ranked #27 on Rolling Stone's list of The 500 Greatest Songs of All Time, and the acoustic version won the 1993 Grammy Award for Best Rock Song.
-- Wikipedia article [edited]
Five Seed Tracks
Among the tracks in my music library, I have five recordings of Layla; a subset of their iTunes metadata is reproduced in the table below. I added the first column for identification purposes here.
| No. | Time | Album | Date | Album Artist | Artist | Genre | Bit Rate | Kind |
|---|---|---|---|---|---|---|---|---|
| 1 | 7:06 | The Layla Sessions [Disk 1] | 1970 | Derek & The Dominos | Derek & The Dominos | Blues | 154 kbps (VBR) | MPEG |
| 2 | 7:07 | Anthology [Disc 1] | 1970 | Duane Allman | Derek & The Dominos | Rock | 128 kbps | AAC |
| 3 | 7:06 | Time Pieces - The Best of Eric Clapton | 1970 | (none) | Eric Clapton | Blues | 256 kbps (VBR) | AAC |
| 4 | 6:26 | Eric Clapton's Rainbow Concert | 1973 | (none) | Eric Clapton | Blues | 256 kbps (VBR) | AAC |
| 5 | 4:46 | Unplugged | 1992 | (none) | Eric Clapton | Blues | 256 kbps (VBR) | AAC |
The first three seed tracks are different digital versions of the original recording. Using sophisticated equipment, a sound engineer can detect digital mastering differences among them, introduced when the original audio tapes were digitized for distribution in CD format. But to the untrained ear, all three versions sound identical. The only obvious differences are in their metadata.
The last two are recordings of live performances of the song, which sound distinctly different from each other, and from the original studio recording.
During my test, I created playlists using each of the five recordings with each of the four playlist generators, and evaluated the results. Each playlist contained 25 tracks: the seed itself, and 24 recommended tracks.
Metadata: To Clean or Not to Clean?
Table 1 lists my own iTunes metadata without modification. It contains some obvious inconsistencies and omissions. Therefore, it may not exactly match what could be found in an Internet-accessible database such as that maintained by Gracenote. But it is what I have accumulated in the normal course of compiling a digital music library, so I made no attempt to clean it up to make it more complete or internally consistent, nor did I deliberately introduce any additional inconsistencies.
I believe that this approach provides a fair and reasonable test of the four playlist generators, because such minor inconsistencies and omissions will be typical of the content of most personal digital music libraries. Only the most obsessively meticulous music owners will have invested in programs like TidySongs or TuneUp to clean up the metadata for all their songs.
Lastly, all four playlist generators in this study interact with their users' data in an online environment. They could perform their own metadata checks to clarify inconsistencies or rectify omissions in a user's metadata, if doing so would improve their recommendations. If they don't perform such checks, it is not my responsibility as a tester to help them out by scrubbing my metadata before the test.
Expectations
Because of Layla's widespread familiarity and status as a rock music "standard," it is reasonable to expect that all playlist generators should be able to produce appropriate playlists when a popular recording of the song is supplied as the seed. In particular, for the five recordings I have selected for this test, I would expect that:
- The playlists generated for all five recordings will have a guitar-oriented rock/blues emphasis
- The playlists generated for all three versions of the original recording will be similar or identical
- The playlist generated for the live Rainbow Concert recording will be different from those for the original studio recording
- The playlist generated for the Unplugged recording will contain a greater emphasis on acoustic rock
- The playlists generated for the Rainbow Concert and Unplugged versions will include some emphasis on live music recordings
- The playlists generated for all five recordings will not contain excessive hit-single bias
I will score each playlist generator based on my opinion of how well it fares against these expectations. I accept that these are just one person's subjective evaluations, but I will explain my conclusions.
Test Results: Details
The four sections below describe the results with each playlist generator.
Pandora Radio
Pandora [Ref 9,10] selects music to play using its "Music Genome" database, which contains detailed song profiles created by Pandora's expert musicologists. It does not play your own music, or play specific tracks on demand. Perhaps this is why it can afford to be less specific when you are selecting a seed. In my experiment, entering "Layla" as the seed track produced a menu of Layla performances, most of which were cover versions. Only these three menu options were related to my five chosen seeds:
- Layla by Derek & The Dominos
- Layla (Unplugged Version) by Eric Clapton
- Layla (Live) by Eric Clapton
It seemed clear that the first must represent seeds #1-3 in my list, and the second must be seed #5. Giving Pandora the benefit of the doubt, I decided that the third could represent seed #4 — a live performance of Layla, maybe at the Rainbow Concert. I would have felt more certain if Pandora had understood "Rainbow Concert" as a search term. Because it didn't, I'm only willing to give Pandora partial credit for recognizing that Eric Clapton has recorded live versions of Layla that are not the familiar Unplugged recording. The resulting playlist seemed to confirm this interpretation.
Pandora's music choices can be progressively refined by indicating whether you approve or disapprove of specific selections; I did not use that feature during my experiment. Here are my playlist observations:
| Five Recordings | Pandora Radio Playlists |
|---|---|
| 1. Original recording of Layla by Derek & The Dominos | Offers original recording only as a seed. Plays 100% classic rock hit singles, 71% from Greatest Hits CDs. |
| 2. Original recording -- on Duane Allman Anthology | (as above) |
| 3. Original recording -- on Time Pieces by Eric Clapton | (as above) |
| 4. Eric Clapton’s Rainbow Concert (live) | Not clearly identified as a seed song |
| 5. Eric Clapton Unplugged (live) | Acoustic rock, 1969-2007, no hit song bias |
Pandora's playlist selections for the original recording of Layla sounded just like a classic rock radio station. All 24 tracks were hit singles, 17 of 24 (71%) were selected from greatest hits compilations, and the others were from instantly recognizable albums like Eat A Peach or Sticky Fingers.
If you think songs like Brown Sugar and It's Only Rock And Roll by the Stones, Ticket To Ride and Come Together by the Beatles, and Proud Mary and Down On The Corner by CCR are similar to Layla, you might enjoy that playlist. Those are all great songs, and (like Layla) they are all classic rock hits. But I know Pandora has enough data about them to know that they are not acoustically similar to Layla. I can only assume that some Pandora customers expect this type of bias, and like it. In my view, it is a perfect example of hit single bias, and I will score it accordingly.
In contrast, selecting the other two options as seeds produced playlists that seemed much more appropriate. Both playlists emphasized live recordings (9/24 for "Unplugged", 12/24 for "Live"), with the Unplugged seed producing a playlist composed largely of acoustic rock tracks. Fewer than 20% of the selections were hit songs. These similarities and biases were consistent with my expectations for the two seeds.
iTunes Genius (Apple)
Apple's Genius feature [Ref 11] uses a collaborative filtering approach to determine which songs belong together in a playlist; it relies entirely on the inter-song relationships it finds in track metadata and in the contents of users' iTunes libraries. In my test, this method produced some inconsistent recommendations:
| Five Recordings | Apple Genius Playlists |
|---|---|
| 1. Original recording of Layla by Derek & The Dominos | Only 7 tracks in common with the playlist for the version on Eric Clapton's Time Pieces |
| 2. Original recording -- on Duane Allman Anthology | Zero tracks in common with the playlists for the other two original versions |
| 3. Original recording -- on Time Pieces by Eric Clapton | Playlists for all three recordings by Eric Clapton have 50% of their tracks in common. Almost all playlist selections are classic rock hits. Playlists having live performances as seeds contain the fewest live performances (<20%). |
| 4. Eric Clapton’s Rainbow Concert (live) | (as above) |
| 5. Eric Clapton Unplugged (live) | (as above) |
First, because it has no understanding of the musical content of tracks, Genius had no idea that the first three tracks in my test were actually the same original recording of Layla. Because they have quite different metadata, Genius created quite different playlists for each. Of the 72 tracks it selected in three playlists, 58 were unique.
- The playlists for seeds #1 and #3 had just 7 tracks, all classic rock hits, in common: Brown Sugar, Crossroads, I Shot The Sheriff, Jumpin' Jack Flash, Money For Nothing, Pinball Wizard, and Rock and Roll.
- The playlist for seed #2, the Duane Allman version of the original, had no tracks in common with either of the other two playlists. On the positive side, however, it was the only playlist not dominated by classic rock hits.
- To see if my seeds' Genre labels had an effect, I counted the tracks labeled Blues and Rock in each playlist. The playlists for seeds #1 and #3 contained just 4 tracks labeled Blues and 20 labeled Rock. For seed #2, the playlist contained only 10 tracks labeled Rock. These results are the opposite of what Table 1 might suggest, indicating that Genius was not much influenced by my iTunes Genre metadata.
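The overlap counts above are simple set arithmetic, and a short Python sketch makes the bookkeeping explicit. The track names here are placeholders, not the actual Genius selections; they are only arranged to mirror the reported shape of the result (24 recommendations per playlist, 7 shared by seeds #1 and #3, none shared with seed #2).

```python
from collections import Counter
from itertools import combinations


def shared_tracks(playlists):
    """Map each pair of playlist names to the set of tracks they have in common."""
    return {
        (a, b): playlists[a] & playlists[b]
        for a, b in combinations(sorted(playlists), 2)
    }


def tracks_in_one_playlist_only(playlists):
    """Return the tracks that appear in exactly one of the playlists."""
    counts = Counter(track for tracks in playlists.values() for track in tracks)
    return {track for track, n in counts.items() if n == 1}


# Placeholder track names (hypothetical), arranged to match the reported overlaps.
common = {f"common-{i}" for i in range(7)}
playlists = {
    "seed1": common | {f"seed1-only-{i}" for i in range(17)},
    "seed2": {f"seed2-only-{i}" for i in range(24)},
    "seed3": common | {f"seed3-only-{i}" for i in range(17)},
}

shared = shared_tracks(playlists)
print({pair: len(tracks) for pair, tracks in shared.items()})
print(len(tracks_in_one_playlist_only(playlists)))
```

Under this arrangement, 58 of the 72 recommendations appear in only one playlist, which agrees with the figure quoted above if "unique" is read that way.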
The Genius playlists for seeds #4-5 were no better. Despite their quite different musical styles and different track lengths, Genius treated seeds #3-5 in my test as if they were quite similar.
- Apparently, the type of performance (studio, live rock concert, and live acoustic) was largely irrelevant, and the fact that the artist was Eric Clapton trumped all the other information it had access to.
- The three playlists were composed almost exclusively of classic rock hits, 12 of which reappeared in all three playlists. Together with 5 of the songs I listed above, Genius also repeatedly recommended Bad Moon Rising, Desperado, Kashmir, Money, Riders On The Storm, Tequila Sunrise, and Time; I don't think I need to list the artists.
- Finally, the playlists for the two live seeds (#4 and #5) contained only 5 live recordings each, compared with a total of 22 live recordings in the other three playlists.
These results did not come close to fulfilling my expectation that playlists would reflect similarities and differences in the seeds. While the Genius playlists did meet my minimum criterion — to have a guitar-oriented rock/blues emphasis — they disappointed in every other respect. In fact, Genius almost always picked the opposite of what I expected.
MusicIP Mixer (Amplified Music Services)
MusicIP Mixer [Ref 13] can create playlists based on a track, an artist, or an album. Initial mix preferences can be set using sliders for style and variety. For this experiment, I set them to minimize playlist variety and make them as comparable as possible to those produced by the other products. Generated playlists can be further refined by indicating whether you approve or disapprove of specific tracks; as with Pandora, I did not use that feature. Here are my observations:
| Five Recordings | MusicIP Mixer Playlists |
|---|---|
| 1. Original recording of Layla by Derek & The Dominos | Three distinct playlists, with rock/blues emphasis. No obvious biases. |
| 2. Original recording -- on Duane Allman Anthology | (as above) |
| 3. Original recording -- on Time Pieces by Eric Clapton | (as above) |
| 4. Eric Clapton’s Rainbow Concert (live) | Distinct playlist, 40% live performances |
| 5. Eric Clapton Unplugged (live) | Distinct playlist with slower, acoustic emphasis |
MusicIP Mixer's recommendations are based on a hybrid approach that combines information from three sources: metadata from ID3 tags, acoustic analysis of sample segments, and user annotation, if present. I rarely add user ratings to tracks, and for this experiment, I removed the few that had been present in my iTunes database. Even so, it turns out that using both musical content and metadata causes MusicIP Mixer to generate a distinct playlist for each seed, even for seeds that sound identical.
In fairness to MusicIP Mixer, software performing acoustic analysis can detect differences between two music tracks that may not be obvious to the ear, except perhaps to a sound engineer using expensive equipment. By sampling two tracks at slightly different points, software can also report differences where none exist in reality. Digital encoding and compression schemes (which differ for seeds #1-3) also introduce differences.
These are some technical complications of using software to analyze music content. But (other than a few geeks) customers don't care about these complications; they expect a product's developers to master their own technology, and rate the results accordingly. I discuss this further below when reviewing Moodagent, which faces similar challenges.
Other than its refusal to produce similar playlists for seeds #1-3, MusicIP Mixer performed well against my evaluation criteria, with all its selections sounding reasonable for the various seeds. The playlists for the two live seeds contained more live tracks, and none of the playlists contained obvious biases other than an overall preference for guitar-oriented rock/blues tracks, which was what I expected.
Moodagent (Syntonetic)
Like MusicIP Mixer, Moodagent [Ref 14] uses digital signal processing software to read portions of each music track. The difference is in what Moodagent does after it has sampled the music file. Moodagent's analysis goes beyond purely musical properties, using artificial intelligence and music science to focus on the way music is perceived emotionally by the listener.
The result is a profile for a track composed of scores for each of five mood-related properties: sensual, tender, joy, aggressive and tempo. These scores are reflected visually in the positions of five slider controls, each of which has 7 possible settings. Moodagent users can define a "mood profile" for a playlist by moving these sliders directly. For this experiment, I set the slider positions indirectly by picking a seed track. Here are my observations, including the mood profiles for each seed:
| Five Recordings | Mood Profiles | Moodagent Playlists |
|---|---|---|
| 1. Original recording of Layla by Derek & The Dominos | 4.3.5.4.6 | Identical playlists, with rock/blues emphasis. No obvious biases. |
| 2. Original recording -- on Duane Allman Anthology | 4.3.5.4.6 | (as above) |
| 3. Original recording -- on Time Pieces by Eric Clapton | 4.3.4.4.6 | Distinct playlist (digital mastering has created a distinct mood profile) |
| 4. Eric Clapton’s Rainbow Concert (live) | 2.2.4.5.6 | Distinct playlist, 56% live performances |
| 5. Eric Clapton Unplugged (live) | 3.4.5.2.4 | Distinct playlist with slower, acoustic emphasis |
Because its recommendations are entirely derived from its programmatic profiling of each track, I was not at all surprised when Moodagent generated identical playlists for seeds #1 and #2. So I was puzzled when it produced a distinctly different playlist for seed #3.
Closer inspection of the three tracks' profiles revealed the reason: the Joy value in the profile for seed #3 is one notch lower. I don't know why. This track actually has the highest bit rate of seeds #1-3 (see Table 1), so simplistic explanations like "file compression must have reduced the contrast" don't seem to work. But I do know that seeds #1-3 are all derived from the same 1970 audio tapes, so any differences must be a consequence of digital mastering. In a 2007 Rolling Stone article [Ref 16], Robert Levine describes a common criticism of modern digital mastering techniques. He quotes David Bendeth, a rock music producer:
Over the past decade and a half, a revolution in recording technology has changed the way albums are produced, mixed and mastered — almost always for the worse. "They make it loud to get [listeners'] attention," Bendeth says. Engineers do that by applying dynamic range compression, which reduces the difference between the loudest and softest sounds in a song. Like many of his peers, Bendeth believes that relying too much on this effect can obscure sonic detail, rob music of its emotional power and leave listeners with what engineers call ear fatigue.
-- The Death of High Fidelity [emphasis added]
[For more about this subject, see References 15-17 below].
I can only conclude that Moodagent's profiling software must be sensitive to this aspect of the seed music in some way. If so, this raises an interesting question of which is more "correct": my expectations for seeds #1-3, or Moodagent's profiling and recommendation behavior for those seeds? For the moment, I'm going to score Moodagent's behavior as "wrong" in this case. I still think my expectation (that seeds #1-3 would produce similar or identical playlists) was reasonable, and is what a typical Moodagent customer would expect.
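The profile comparison that exposed the difference in seed #3 can be reproduced with a small Python sketch. It assumes that the dotted values in the Mood Profiles column follow the order in which the five properties are listed above (sensual, tender, joy, aggressive, tempo). It says nothing about how Moodagent turns a profile into a playlist, which is not documented here.

```python
# Assumed order of the values in a dotted mood profile such as "4.3.5.4.6".
PROPERTIES = ("sensual", "tender", "joy", "aggressive", "tempo")


def parse_profile(text):
    """Turn a dotted profile string into a property -> slider setting mapping."""
    values = [int(part) for part in text.split(".")]
    if len(values) != len(PROPERTIES):
        raise ValueError(f"expected {len(PROPERTIES)} values, got {len(values)}")
    return dict(zip(PROPERTIES, values))


def profile_differences(a, b):
    """Return {property: (a_value, b_value)} for each property where two profiles differ."""
    pa, pb = parse_profile(a), parse_profile(b)
    return {name: (pa[name], pb[name]) for name in PROPERTIES if pa[name] != pb[name]}


# Mood profiles for the five seeds, copied from the table above.
profiles = {
    1: "4.3.5.4.6",
    2: "4.3.5.4.6",
    3: "4.3.4.4.6",
    4: "2.2.4.5.6",
    5: "3.4.5.2.4",
}

print(profile_differences(profiles[1], profiles[2]))
print(profile_differences(profiles[1], profiles[3]))
```

Seeds #1 and #2 come out identical, while seeds #1 and #3 differ only in the third value, a single slider notch. That one-notch difference coincided with a distinctly different playlist.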
The preceding discussion highlights one of the technical challenges that the developers of any music recommendation software must address, namely how to capture the "true nature" of a piece of music in a profile. According to Wikipedia, a robust acoustic fingerprint will allow a recording to be identified after it has gone through ... compression, even if the audio quality has been reduced significantly. I'm not sure if any existing software actually achieves this goal, so I may do some research and revisit this topic in a future post.
Moodagent performed well against my other evaluation criteria. All its selections sounded reasonable for the various seeds, and the playlists for the two live seeds contained more live tracks. None of the playlists contained obvious biases other than the expected preference for guitar-oriented rock/blues tracks.
Test Results: Summary
Table 6 below shows my summary of all four products' playlists side-by-side, and Table 7 shows how they scored against my expectations:
| Five Recordings | Pandora Radio | Apple Genius | MusicIP Mixer | Moodagent |
|---|---|---|---|---|
| 1. Original recording of Layla by Derek & The Dominos | Offers original recording only as a seed. Plays 100% classic rock hit singles, 71% from Greatest Hits CDs. | Only 7 tracks in common with Clapton/Time Pieces version | Three distinct playlists, with rock/blues emphasis. No obvious biases. | Identical playlists, with rock/blues emphasis. No obvious biases. |
| 2. Original recording -- on Duane Allman Anthology | (as above) | Zero tracks in common with other two playlists for original versions | (as above) | (as above) |
| 3. Original recording -- on Time Pieces by Eric Clapton | (as above) | Playlists for all three recordings by Eric Clapton have 50% of tracks in common. Almost all selections are classic rock hits. Live seeds produce fewest (<20%) live performances in playlists. | (as above) | Distinct playlist (digital mastering has created a distinct mood profile) |
| 4. Eric Clapton’s Rainbow Concert (live) | Not clearly identified as a seed song | (as above) | Distinct playlist, 40% live performances | Distinct playlist, 56% live performances |
| 5. Eric Clapton Unplugged (live) | Acoustic rock, 1969-2007, no hit song bias | (as above) | Distinct playlist with slower, acoustic emphasis | Distinct playlist with slower, acoustic emphasis |
| Expectation for Playlist Characteristics | Pandora Radio | Apple Genius | MusicIP Mixer | Moodagent |
|---|---|---|---|---|
| All playlists for all tested recordings have a guitar-oriented rock/blues emphasis (5 points) | 4.5 | 5 | 5 | 5 |
| Playlist for Duane Allman version of the original recording is similar or identical to the first playlist | 1 | 0 | 0 | 1 |
| Playlist for Eric Clapton version of the original recording is similar or identical to the first playlist | 1 | 0 | 0 | 0 |
| Playlist for Rainbow Concert version is different from the first playlist | 1 | 0 | 1 | 1 |
| Playlist for Unplugged version has an acoustic emphasis compared to those for the original and Rainbow Concert versions | 1 | 0 | 1 | 1 |
| Playlists for Rainbow Concert and Unplugged versions have some live music emphasis (2 points) | 2 | 0 | 2 | 2 |
| Playlists do not contain excessive hit-single bias (5 points) | 2 | 1 | 5 | 5 |
| TOTAL SCORES (maximum 16 points) | 12.5 | 6 | 14 | 15 |
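For anyone who wants to re-weight the criteria, the totals in the scoring table can be recomputed with a few lines of Python. The per-row scores are copied from the table; the one assumption is that rows with no stated point value are worth 1 point each, which is what the 16-point maximum implies.

```python
# Maximum points per expectation, in table order. Rows without a stated value
# are assumed to be worth 1 point (consistent with the 16-point maximum).
MAX_POINTS = [5, 1, 1, 1, 1, 2, 5]

# Scores per product, in the same row order as the table above.
scores = {
    "Pandora Radio": [4.5, 1, 1, 1, 1, 2, 2],
    "Apple Genius":  [5, 0, 0, 0, 0, 0, 1],
    "MusicIP Mixer": [5, 0, 0, 1, 1, 2, 5],
    "Moodagent":     [5, 1, 0, 1, 1, 2, 5],
}

totals = {product: sum(points) for product, points in scores.items()}

for product, total in totals.items():
    print(f"{product}: {total} of {sum(MAX_POINTS)}")
```

Changing a value in `MAX_POINTS` or in a product's row and re-running shows how sensitive the ranking is to any single subjective judgment.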
Some Personal Conclusions
As I stated in my previous post, I included Pandora in this comparison even though it does not compete directly with the other three playlist generators; it cannot, because it does not play your own music. On the other hand, it does offer some flexible ways to specify the music you like, and creates playlists containing that kind of music. So I'm using it as a baseline: a somewhat familiar standard for evaluating other music recommendation services.
With that standard in mind, I conclude that Genius performed significantly less effectively than Pandora, while the performance of MusicIP Mixer and Moodagent exceeded Pandora's. It seems obvious to me that these differences in performance are a direct result of the methods used to produce recommendations:
- The collaborative filtering approach used by Genius is inherently biased towards the most popular music tracks, hit singles, which dominate music sales. Oscar Celma Herrada [Ref 7] cites this statistic from Nielsen SoundScan: in 2007, 1% of all digital tracks accounted for 80% of all track sales. Naturally, people's iTunes libraries and playlists, which are the source of Genius's recommendations, reflect this bias.
- The acoustic profiling approaches used by MusicIP Mixer and Moodagent avoid the hit single bias, and are much better at playing all the music in your collection, no matter how obscure it may be. But — as this experiment revealed — they are not always guaranteed to produce the results you might expect. Software performing acoustic analysis may be influenced by music characteristics that are not obvious to the average listener.
Finally, I acknowledge that my test was somewhat artificial. As I stated at the outset, people's expectations regarding playlists are subjective. In practice it may not matter if similar seeds don't produce identical playlists, as long as those playlists still sound appropriate to the listener. In my subjective opinion, MusicIP Mixer and Moodagent pass this test, while Genius does not. Other reviewers, like Paul Lamere, have reached similar conclusions [Ref 12]; your mileage may vary.
References
Listed below are some useful academic papers and web pages that I cited within this and preceding posts on the topic of automatic playlist generation:
1. Programming Collective Intelligence: Building Smart Web 2.0 Applications by Toby Segaran, O'Reilly Media, 2007 [Amazon]
2. More of an art than a science: Supporting the creation of playlists and mixes by Cunningham, Bainbridge, Falconer. U. of Victoria, 2006 [pdf]
3. Art Of The Mix [web site]
4. Smarter Than Genius? Human Evaluation of Music Recommender Systems by Barrington, Oda, Lanckriet. International Soc. for Music Information Retrieval, 2009 [pdf]
5. Enhancing Music Maps by Jakob Frank, Workshop on Data Analysis WDA'2008 [pdf]
6. Analysing and Evaluating Playlists on Music Maps by Jakob Frank, Workshop on Data Analysis WDA'2009 [pdf]
7. Music Recommendation and Discovery In The Long Tail by Oscar Celma Herrada, 2008 (Ph.D. Thesis) [pdf]
8. Layla by Derek and The Dominos [Wikipedia article]
9. What is Pandora Radio? [web page]
10. The Song Decoders by Rob Walker, New York Times, October 14, 2009 [web page]
11. iTunes Genius [Wikipedia article]
12. How smart *is* the Genius? by Paul Lamere [blog post]
13. MusicIP Mixer [Software download]
14. Moodagent: Automatic Playlist DJ for your music, your mood [web page]
15. Loudness War [Wikipedia article]
16. The Death of High Fidelity [Rolling Stone article]
17. The Loudness War [YouTube demonstration]
Reader Comments (1)
This is a very interesting comparison, and your result is absolutely right:
1. There is still the problem of subjectivity, so we need a good user profile that will customize the recommenders' algorithms
2. There is still the problem of the long tail, but solutions like MusicIP or mufin are on their way to solving it
Hence only a solution that combines context and content analysis, adapted to the personal profile, will hopefully win this challenge, and the user must be able to control these similarity settings.
Cheers