Thursday, January 10, 2019

Incomplete editorial fingerings

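Say we have the following fingerings in our gold standard, where x marks a note left unannotated and the trailing number is the fingering's share of the votes: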
  1. 1234 131 (1 annotator vote) .1
  2. 1xx4 1x1 (3 votes) .3
  3. 12xx x3x (2 votes) .2
  4. 1x1x 1xx (4 votes) .4
How much credit should a model get for suggesting 1234 131? Or 1214 131? That is, how likely is it that the user will accept the advice? How confident are we that the advice is likely to be good?

Do we just sum over all the matches? So 1234 131 would get .1 + .3 + .2 = .6? And 1214 131 would get the same: .2 + .4 = .6? But don't we have more evidence that 1234 131 is a good fingering?
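The summing rule can be sketched in Python. This is only a minimal sketch, assuming that an `x` marks an unannotated note and that a suggestion "matches" a gold fingering when it agrees at every annotated position. The names `GOLD`, `matches`, and `summed_score` are hypothetical, introduced for illustration.

```python
# Gold-standard fingerings from the example, with their vote counts.
# Assumption: "x" marks a note the editor left unannotated.
GOLD = {
    "1234131": 1,
    "1xx41x1": 3,
    "12xxx3x": 2,
    "1x1x1xx": 4,
}


def matches(candidate: str, gold: str) -> bool:
    """True if candidate agrees with gold at every annotated position."""
    return len(candidate) == len(gold) and all(
        g == "x" or g == c for c, g in zip(candidate, gold)
    )


def summed_score(candidate: str, gold_votes: dict) -> float:
    """Sum the vote shares of all gold fingerings the candidate matches."""
    total = sum(gold_votes.values())
    matched = sum(v for g, v in gold_votes.items() if matches(candidate, g))
    return matched / total


print(round(summed_score("1234131", GOLD), 3))  # 0.6
```

Note that under this strict position-by-position rule, 1214 131 is consistent with fingerings 2, 3, and 4, so the sketch gives it (3 + 2 + 4) / 10 = .9.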

We could amplify each fingering sequence based on how many actual annotations it has:
  1. 1234 131 (7 notes × 1 annotator = 7 votes) .189
  2. 1xx4 1x1 (4 × 3 = 12 votes) .324
  3. 12xx x3x (3 × 2 = 6 votes) .162
  4. 1x1x 1xx (3 × 4 = 12 votes) .324
Or I am thinking the more specific sequences should be amplified by less specific sequences that do not contradict them:
  1. 1234 131 (1 + 3 + 2 = 6 votes) .353
  2. 1xx4 1x1 (3 + 2 = 5 votes) .294
  3. 12xx x3x (2 votes) .118
  4. 1x1x 1xx (4 votes) .235
Or a combination, like this:
  1. 1234 131 (7 + 12 + 6 = 25 votes) .4
  2. 1xx4 1x1 (12 + 6 = 18 votes) .3
  3. 12xx x3x (6 votes) .1
  4. 1x1x 1xx (12 votes) .2
Shouldn't #3 and #4 reinforce each other somehow?

Also, shouldn't the amplification run both ways? If a complete annotation agrees with a partial, what does that tell us? An editor, who is pitching his advice to a general audience in what we assume is a minimally idiosyncratic way, says these milestone fingerings are the most important.

Given this, I am leaning back toward the simple summing approach I mentioned first, but amplifying by digit. So 1234 131 would get .189 + .324 + .162 = .675, and 1214 131 would get .162 + .324 = .486. This at least passes the smell test.
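A minimal sketch of digit-amplified summing, assuming that an `x` is an unannotated note and that each gold fingering's weight is its vote count times its number of annotated digits, normalized over the whole gold standard. The names `GOLD`, `digit_weights`, and `amplified_score` are hypothetical.

```python
# Gold-standard fingerings and vote counts; "x" = unannotated note (assumption).
GOLD = {"1234131": 1, "1xx41x1": 3, "12xxx3x": 2, "1x1x1xx": 4}


def digit_weights(gold_votes):
    """Weight each fingering by votes times annotated digits, normalized to 1."""
    raw = {g: v * sum(ch != "x" for ch in g) for g, v in gold_votes.items()}
    total = sum(raw.values())
    return {g: r / total for g, r in raw.items()}


def amplified_score(candidate, gold_votes):
    """Sum the digit-amplified weights of every gold fingering matched."""
    weights = digit_weights(gold_votes)
    return sum(
        w
        for g, w in weights.items()
        if all(gc == "x" or gc == cc for cc, gc in zip(candidate, g))
    )


for gold, weight in digit_weights(GOLD).items():
    print(gold, round(weight, 3))
print(round(amplified_score("1234131", GOLD), 3))
```

For 1234 131 this gives 25/37 ≈ .676, which agrees with .189 + .324 + .162 up to rounding. Because every `x` is treated as a wildcard, the sketch also counts 1xx4 1x1 toward 1214 131, giving 30/37 ≈ .811.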

It strikes me that the sparseness of editorial scores may actually be a blessing. All of my travails with edit distances seem moot in editorial scores, or at least rendered less pertinent. The editor implicitly tells us which specific annotations are most important and which are free to vary. Using Hamming distance here is less controversial: a digit either matches or it doesn't. This is clearly justified if we assume the editor is being explicit in the advice that actually appears.
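The Hamming comparison restricted to the editor's annotated positions can be sketched as follows. This is a minimal sketch, assuming `x` marks an unannotated note that is free to vary; the function name `editorial_hamming` is hypothetical.

```python
def editorial_hamming(candidate: str, editorial: str) -> int:
    """Count annotated positions where the candidate's digit differs.

    Positions marked "x" in the editorial fingering are skipped, since the
    editor gave no advice there (assumption).
    """
    if len(candidate) != len(editorial):
        raise ValueError("fingerings must cover the same number of notes")
    return sum(1 for c, e in zip(candidate, editorial) if e != "x" and c != e)


print(editorial_hamming("1234131", "1x1x1xx"))  # 1: only the third note differs
print(editorial_hamming("1214131", "1x1x1xx"))  # 0: agrees at every annotated note
```

A distance of zero corresponds to the "match" used when summing votes; nonzero distances could be used for partial credit, though the post does not say how.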

Are we not really striving to model the behavior of a good editor, more than that of a good pianist?
