Database Design is Easy — Or Is It?

I came across a Request for Proposal from a government entity whose sole purpose was to conform to the data collection standards of a higher-level government entity. The following entry is paraphrased for brevity.

Yes/No Questions. The “yes” response and the “no” response each have their own column in the spreadsheet tool. To specify “yes” for a given question, the user should enter a “Y” in that question’s “yes” column. To specify “no” for a given question, the user should enter an “N” in that question’s “no” column. One question on the form is an exception to this “standard”: it has only one column in the spreadsheet tool, and users should enter a “Y” for “yes” and an “N” for “no”.

This was written by professionals who should be in a different profession. I struggle to explain just how fubar this is, but here goes.

First of all, these are questions on a form, questions, mind you, with only two possible answers: “yes” or “no”. Really. How complicated can that be? Apparently, very complicated. If a respondent can fill in both columns (a “Y” in the Yes column and an “N” in the No column), or neither, how will anyone know which answer reflects the respondent’s actual situation?

The noted “exception” should actually be the “standard”. Most likely, this was written by lawyers to allow for flexible interpretation of the law when enforcing compliance, so as not to hurt anyone’s feelings. Or maybe it exists to confuse respondents so they give up on the form entirely and seek assistance elsewhere. Whatever the reason, the person designing such forms should be in a different line of work, because either their brain or their heart is not in it for the right reasons.
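To make the point concrete, here is a minimal sketch of the “exception” design as a database table, using Python’s built-in sqlite3 module. The table name `response` and column `q1` are hypothetical. A single column with a CHECK constraint and NOT NULL lets the database itself refuse anything other than exactly one “Y” or “N”, so a contradictory or missing answer never gets stored.

```python
import sqlite3

# Hypothetical table: one row per respondent, one column per yes/no question.
# NOT NULL forces an answer; CHECK admits only 'Y' or 'N'.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE response (
        respondent_id INTEGER PRIMARY KEY,
        q1 TEXT NOT NULL CHECK (q1 IN ('Y', 'N'))
    )
    """
)


def try_insert(respondent_id, answer):
    """Insert one answer; return True if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO response (respondent_id, q1) VALUES (?, ?)",
                (respondent_id, answer),
            )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, "Y"))   # True
print(try_insert(2, "N"))   # True
print(try_insert(3, "YN"))  # False: not a single Y or N
print(try_insert(4, None))  # False: no answer at all
```

With one column per question, “answered both” and “answered neither” simply cannot be represented, which is the whole point.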

Why is this important? Say you want to report how respondents answered a particular question. Count the ones who responded “yes”. Count the ones who responded “no”. Simple, right? OK. Now calculate the percentage of respondents for each. Well, if some put a “Y” in the Yes column AND an “N” in the No column, the two counts would add up to more than the total number of respondents, and the two percentages to more than 100%.
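Here is a small sketch of that arithmetic, assuming the spreadsheet rows have been read into Python dictionaries with hypothetical keys `yes` and `no` (one per column). With just three respondents, one of whom fills in both columns, the two percentages add up to well over 100%.

```python
# Hypothetical responses to one question under the two-column design:
# each respondent has a "yes" cell and a "no" cell, and nothing stops both being filled.
responses = [
    {"yes": "Y", "no": ""},   # answered yes
    {"yes": "", "no": "N"},   # answered no
    {"yes": "Y", "no": "N"},  # answered both: contradictory
]


def tally(rows):
    """Count yes answers, no answers, and contradictory rows (both columns filled)."""
    yes = sum(1 for r in rows if r["yes"] == "Y")
    no = sum(1 for r in rows if r["no"] == "N")
    both = sum(1 for r in rows if r["yes"] == "Y" and r["no"] == "N")
    return yes, no, both


total = len(responses)
yes_count, no_count, both_count = tally(responses)

print(f"Yes: {yes_count} of {total} ({yes_count / total:.0%})")  # Yes: 2 of 3 (67%)
print(f"No:  {no_count} of {total} ({no_count / total:.0%})")    # No:  2 of 3 (67%)
print(f"Sum: {(yes_count + no_count) / total:.0%}")              # Sum: 133%
```

The `both` count is the only way to even detect the problem after the fact; the data itself cannot say which of the two answers is true.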

Now, imagine you are a legislative leader reading a compliance report based on these statistics before casting a vote on a multi-billion-dollar extension of the program. You might see that 65% said “Yes” and 63% said “No” to the very question on which you would like to base your decision. Or maybe the data analyst preparing the “required pie chart” report will recognize the data problem and fudge the responses so the chart reflects HIS political preferences. Democracy in action.

I have all the answers; it’s knowing the right questions to ask that is the challenge.

For more examples of this kind of thing, please visit my favorite website, www.dbdebunk.com.