My research department works on programming computers to analyse music.
In this field, researchers like to have some idea of whether a problem is naturally easy or difficult for humans.
For example, tapping along with the beat of a musical recording is usually easy, and it’s fairly instinctive—you don’t need much training to do it.
Identifying the instrument that is playing a solo section takes some context. (You need to learn what the instruments sound like.) But we seem well-equipped to do it once we’ve heard the possible instruments a few times.
Naming the key of a piece while listening to it is hard, or impossible, without training, but some listeners can do it easily when practised.
Tasks that a computer scientist might think of as “search problems”, such as identifying performances that are actually the same while disregarding background noise and other interference, tend to be difficult for humans no matter how much experience they have.
Ground truth
It matters to a researcher whether the problem they’re studying is easy or difficult for humans. To judge how successful their methods are, they need something to compare them with. If a problem is straightforward for humans, that part is simple: they can just see how closely their results match those from ordinary listeners.
But if it’s a problem that humans find difficult too, that won’t work. Being as good as a human isn’t such a great result if you’re trying to do something humans are no good at.
Researchers use the term “ground truth” to refer to something they can evaluate their work against. The idea, of course, is that the ground truth is known to be true, and computer methods are supposed to approach it more or less closely depending on how good they are. (The term comes from remote sensing, such as satellite imaging, where the ground truth is literally the set of objects on the ground that the satellite is trying to detect.)
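To make that concrete, here is a minimal sketch of how a beat-tracking method might be scored against a human-annotated ground truth. Everything specific in it is an assumption for illustration: the function name, the 70 ms tolerance, the simple greedy matching, and the made-up beat times. It is not a reference implementation of any published evaluation.

```python
def beat_f_measure(estimated, reference, tolerance=0.07):
    """Score estimated beat times against ground-truth beat times (seconds).

    An estimate counts as a hit if it falls within `tolerance` seconds of a
    reference beat that has not already been matched. The tolerance value
    and the greedy matching are illustrative assumptions.
    """
    unmatched = sorted(reference)
    hits = 0
    for t in sorted(estimated):
        # Closest reference beat that is still available, if any remain.
        best = min(unmatched, key=lambda r: abs(r - t), default=None)
        if best is not None and abs(best - t) <= tolerance:
            hits += 1
            unmatched.remove(best)  # each annotation may be matched only once
    if hits == 0:
        return 0.0
    precision = hits / len(estimated)  # share of estimates that were right
    recall = hits / len(reference)     # share of true beats that were found
    return 2 * precision * recall / (precision + recall)


# Invented example data: a listener's taps (ground truth) and a method's output.
reference_beats = [0.5, 1.0, 1.5, 2.0]
estimated_beats = [0.52, 1.01, 1.48, 2.3]
score = beat_f_measure(estimated_beats, reference_beats)
print(f"F-measure: {score:.2f}")
```

A score of 1.0 would mean the method agrees with the annotations everywhere, within the tolerance. Note that this only works because people tap beats reliably enough for their annotations to serve as the reference.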
Music recommendation
Can there be a human “ground truth” for music recommendation?
When it comes to suggesting music that a listener might like, based on the music they’ve apparently enjoyed in the past, should computers be trying to approach “human” reliability? How else should we decide whether a recommendation method is successful or not?
What do you think?
How good are you at recommending music to the people you know best?
Can a human recommend music to another human better than a computer ever could? Under what circumstances? What does “better” mean anyway?
Or should a computer be able to do better than a human? Why?
(I’m not looking for academically rigorous replies—I’m just trying to get more of an idea about the fuzzy human and emotional factors that research methods would have to contend with in practice.)