Wednesday, December 16, 2015

2015-12-17 status

Done

Data Collection

  • Closed out initial program advance in TEM.
  • Created a number of database views to make survey data easier to process.
  • Created 50 bar charts using R and LaTeX to describe distribution of answers to 50 survey questions.
  • Calculated "coarse-grained" mean agreement of Parncutt sequences in both Parncutt et al. and in our survey. (Think Fleiss's kappa with each fingering representing a category, 191 annotators, one example to classify, and no need to worry about chance agreement. Anyway, it made sense to me while I was doing it.)
  • Split 191 complete responses into two datasets ("exploratory" and "validation") for out-of-sample testing in an effort to mitigate effects of data dredging if/when our quest for correlation runs amok.
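The "coarse-grained" agreement idea above can be sketched in a few lines: treat each distinct fingering string as a category and compute the raw pairwise-agreement term of Fleiss's kappa for a single item, with no chance correction. This is only an illustration; the function name and the sample fingerings are hypothetical.

```python
from collections import Counter

def observed_agreement(labels):
    """Raw pairwise agreement for one item: the fraction of annotator
    pairs that chose the same category (the P_o term of Fleiss's kappa,
    without chance correction)."""
    n = len(labels)
    counts = Counter(labels)
    same = sum(c * (c - 1) for c in counts.values())
    return same / (n * (n - 1))

# Hypothetical fingerings from 6 annotators for one Parncutt sequence.
fingerings = ["12345", "12345", "12345", "12341", "12341", "13245"]
print(round(observed_agreement(fingerings), 3))  # -> 0.267
```

With 191 annotators the same function applies unchanged; only the label list grows.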

     Doing

    1. Arranging with BDE for Alex to receive CS 398 credit next term for BowTIE work.
    2. Performing Chi Square analysis of exploratory dataset to correlate abbreviated Parncutt fingerings with gender, reach, age, Hanon usage, technical practice, preparation actions, injury, etc.
    3. Looking at how well selecting fingering a in Exercise A predicts selecting fingering b in Exercise B. That is, do people have common patterns of fingering preference?
    4. Analyzing "consensus" of finger choice that follows abbreviated Parncutt fingering. How arbitrary were these sequences? Can we identify more suitable sequences in our data?
    5. Doing more basic descriptive statistical analysis of survey data.
    6. Evaluating Tableau for easier (and richer) data visualization. (SQLite support is missing on OS X.)
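For the chi-square analysis in item 2, the statistic itself is just observed-vs-expected counts over a contingency table. Here is a minimal stdlib sketch with a fabricated 2x2 table (say, gender by whether the subject chose the abbreviated Parncutt fingering); in practice a library routine would also supply the p-value and degrees of freedom.

```python
def chi_square_stat(table):
    """Pearson chi-square statistic for an r x c contingency table,
    given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (obs - expected) ** 2 / expected
    return stat

# Hypothetical counts: rows = gender, cols = chose the abbreviated
# Parncutt fingering (yes / no).
table = [[30, 20], [25, 21]]
print(round(chi_square_stat(table), 3))  # -> 0.313
```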

    Struggling

    • Can we legitimately treat fingerings as categories? 
    • How can we conflate the unpopular fingerings meaningfully? 
    • Not sure how out-of-sample testing will interact with the ad hoc category definitions we are contemplating.
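One common way to conflate the unpopular fingerings is to pool every category observed fewer than some threshold number of times into a single "other" bin, which also keeps expected cell counts large enough for chi-square tests to behave. A minimal sketch, with hypothetical data and threshold:

```python
from collections import Counter

def pool_rare(labels, min_count=5, other="other"):
    """Replace categories observed fewer than min_count times with a
    single pooled 'other' category."""
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else other for lab in labels]

# Hypothetical fingering choices across subjects.
labels = ["12345"] * 7 + ["12341"] * 6 + ["13245"] * 2 + ["12321"] * 1
pooled = pool_rare(labels)
print(Counter(pooled))
```

The threshold would have to be chosen on the exploratory dataset only, and then frozen before touching the validation dataset, or the pooling itself becomes another dredging opportunity.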

    Data dredge

    As I am about to embark on my quest to find correlation in the data, I am chastened by fears of data dredging. So I propose we do randomized out-of-sample testing. Toward this end, we will split the data into two subsets of approximately equal size.

    The following query gives us the identifiers for subjects who completed both parts of the survey and for whom at least some fingering data were recorded:

        select response_id
        from well_known_subject s
        inner join (select distinct subject
                    from finger
                    where fingers != '') f
            on s.response_id = f.subject

    We save this query as the "complete_response_id" view. There are 191 such response_ids.
    So we load the "exploratory_response_id" table like so:

        insert into exploratory_response_id
        select response_id
        from complete_response_id
        order by random()
        limit 96
    The 95 response_ids not included in this table are stored in the "validation_response_id" view:

        select c.response_id
        from complete_response_id c
        where not exists (select response_id
                          from exploratory_response_id e
                          where e.response_id = c.response_id)
    The actual (scrubbed) profile data will remain in the "subject" table. We will create views to provide access to the appropriate data ("exploratory_subject" and "validation_subject"), which will leverage the subject_latexable view of the subject data to use camel-case column names. This makes it unnecessary to remap the column names in R.
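The whole split can be rehearsed end to end against an in-memory SQLite database. The sketch below mirrors the table and view names used above, but the 191 response_ids are fabricated; it just confirms that the exploratory and validation sets partition the complete set 96/95.

```python
import sqlite3

# In-memory rehearsal of the exploratory/validation split.
con = sqlite3.connect(":memory:")
con.execute("create table complete_response_id (response_id integer)")
con.executemany("insert into complete_response_id values (?)",
                [(i,) for i in range(1, 192)])  # 191 complete responses

con.execute("create table exploratory_response_id (response_id integer)")
con.execute("""insert into exploratory_response_id
               select response_id from complete_response_id
               order by random() limit 96""")

con.execute("""create view validation_response_id as
               select c.response_id
               from complete_response_id c
               where not exists (select response_id
                                 from exploratory_response_id e
                                 where e.response_id = c.response_id)""")

n_exp = con.execute("select count(*) from exploratory_response_id").fetchone()[0]
n_val = con.execute("select count(*) from validation_response_id").fetchone()[0]
print(n_exp, n_val)  # -> 96 95
```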

    Wednesday, December 2, 2015

    2015-12-02 status

    Done

    Data Collection

    • Paid lottery winner for Survey II and notified UPAY1099 through PEAR of two payments to winners. Received confirmation of receipt from Amazon. 
    • Loaded data from Survey II to SQLite. 
    • Developed workflow to create charts in R and render them in LaTeX. 
    • Did some manual cleanup for "well-known subjects" (people who completed both parts of the survey), on whom we will focus our analysis. (In the most conservative interpretation of the protocol, subjects who walk away at any point are discarded, except for two subjects from whom I obtained permission to retain their nearly complete submissions. I manually set the "finished_2" field in the database so they would be included.) There are 199 such well-known subjects, of whom we have actual fingering data for 191.

    BowTIE

    • Met with Jackson and helped get his Android "Hello World" working. 
    • Arranged for Alex to come to my meeting with BDE this week to discuss CS 398 credit next term. 

    Doing

    • Relearning R and LaTeX. 
    • Doing basic descriptive statistical analysis of survey data.

    Struggling

    • Upgrade to El Capitan at your peril. Root is no longer root in Mac Land; the upgrade broke my Perl and LaTeX environments.
    • Working with Sherice to close out $200 cash advance.