Statistical Tools

After viewing the initial survey results, researchers usually export the data to a statistical analysis application. Most researchers choose Microsoft Excel for their initial data exploration, and some then move to a more robust statistical tool, such as SPSS or SAS, for more complicated analyses. Since ArchMiner is intended as an exploratory tool, it will most likely replace that aspect of the researchers' use of Excel. For this reason, we chose to look at how researchers currently use Excel to complete their exploratory tasks.

Most researchers use Microsoft Excel to create pivot tables and multi-question charts. One researcher commented that she tries to make a chart for each relationship she thinks could be meaningful. This can result in a large number of charts, only a few of which yield interesting results.

The researchers we interviewed also mentioned that histograms are the most common type of chart they create, but Excel's chart wizard cannot produce one directly, because the data must be manually summarized before such a chart can be generated. This adds further complexity to the process. Researchers also reported that while pivot tables were useful, they were difficult to create and even more difficult to change.
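The manual summarization step described above amounts to counting how many responses fall on each point of the answer scale. The following is a minimal sketch of that step in Python, not part of ArchMiner or Excel; the function name, the 1-5 scale, and the sample responses are assumptions made only for illustration.

```python
from collections import Counter


def summarize_for_histogram(scores, scale):
    """Count how many responses fall on each point of the scale.

    Scale points that nobody chose are kept with a count of 0,
    so the resulting chart still shows an empty bar for them.
    """
    counts = Counter(scores)
    return [(point, counts.get(point, 0)) for point in scale]


# Hypothetical responses to a single question on a 1-5 scale.
responses = [4, 5, 3, 4, 4, 2, 5, 4, 1, 3]
table = summarize_for_histogram(responses, range(1, 6))

# Print a crude text histogram, one row per scale point.
for point, count in table:
    print(f"{point}: {'#' * count}")
```

In a spreadsheet, the researcher has to build this frequency table by hand for every question before a bar chart of it can be drawn.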

Since Excel is a fairly flexible tool, there are many options a user must understand when creating charts. This makes it easy to create images that are meaningless or misleading, such as line charts showing ordinal data or pie charts of values that are not parts of a whole. By forcing the user to consciously choose the type of chart that is appropriate for their data, the application adds yet another step to the process of chart creation.

Overall, the power and flexibility of Excel are very useful for statistical analysis once the user has found relationships she would like to explore, but that same flexibility impedes exploration by making chart creation too complicated.

Creating these charts is time-consuming in itself, and the format of the raw data exported from the database makes the process even more complicated.

The actual question text is not exported with the data, and has only recently been added to the database. This means that the user must look at the survey and map page number, question number, and score to the questions and specific responses they are interested in. While having the raw data in this form is necessary for statistical analysis, there is a significant gulf of execution between the amount of work needed to produce many charts for exploration and the amount of useful information they provide.
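The mapping the user currently performs by hand can be pictured as a lookup from (page number, question number, score) to readable text. The sketch below illustrates the idea in Python; the question wording, response labels, and function name are hypothetical placeholders, not the actual survey content or the database schema.

```python
# Hypothetical lookup tables, keyed by (page number, question number).
# In practice the researcher reconstructs this mapping from the survey itself.
QUESTION_TEXT = {
    (1, 1): "Example question A",
    (1, 2): "Example question B",
}

# Hypothetical response labels for each question, keyed by score.
RESPONSE_LABELS = {
    (1, 1): {1: "Never", 2: "Monthly", 3: "Weekly", 4: "Daily"},
    (1, 2): {1: "Disagree", 2: "Neutral", 3: "Agree"},
}


def label_row(page, question, score):
    """Translate one raw exported row into (question text, response label).

    Unknown questions or scores fall back to the raw values, so a gap in
    the lookup tables stays visible instead of being silently dropped.
    """
    key = (page, question)
    text = QUESTION_TEXT.get(key, f"(unknown question p{page} q{question})")
    label = RESPONSE_LABELS.get(key, {}).get(score, str(score))
    return text, label


# Raw rows in the assumed export shape: (page, question, score).
raw_rows = [(1, 1, 3), (1, 2, 1), (2, 5, 4)]
labeled = [label_row(*row) for row in raw_rows]
```

Doing this translation once, rather than for every chart, is the kind of work an exploratory tool could take over from the researcher.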