This repository contains the results for the Getting and Cleaning Data Course Project.
The following sections describe how the course project script works and provide a code book describing the variables of the output data.
The implementation of run_analysis.R uses the dplyr library along with base R functionality to manipulate the input data.
A base assumption made by the script is that the data contains no missing values and that the measurement, label and subject files contain the same number of records. This was verified against the source data using R.
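A check of this kind can be sketched as follows. The data frames are small in-memory stand-ins for the measurement, label and subject files, and `check_inputs` is a hypothetical helper name that does not come from run_analysis.R.

```r
# Hypothetical helper: stops with an error if the three inputs are
# inconsistent, i.e. contain missing values or differ in row count.
check_inputs <- function(measurements, labels, subjects) {
  stopifnot(
    !anyNA(measurements), !anyNA(labels), !anyNA(subjects),
    nrow(measurements) == nrow(labels),
    nrow(measurements) == nrow(subjects)
  )
  invisible(TRUE)
}

# Stand-in data (not the real files)
measurements <- data.frame(V1 = c(0.1, 0.2, 0.3), V2 = c(-0.5, -0.4, -0.3))
labels       <- data.frame(activityKey = c(1L, 2L, 1L))
subjects     <- data.frame(subjectKey = c(1L, 1L, 2L))

ok <- check_inputs(measurements, labels, subjects)
```

With the real data, the same function could be called on the data frames returned by read.table for each of the test and training sets.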
Data is read using the read.table function. Column labels for the measurements are generated by filtering the features.txt file for all feature names containing mean() or std(). Additional substitutions normalize the feature column names into a camel case style.
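The filtering step might look like the sketch below. The `features` data frame is a hand-written stand-in for a few rows of features.txt, and the column names `index` and `name` are assumptions made for illustration.

```r
# Stand-in for a few rows of features.txt (column index, original feature name)
features <- data.frame(
  index = 1:6,
  name  = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
            "fBodyAcc-meanFreq()-X", "tBodyAccMag-mean()", "tBodyAccMag-std()"),
  stringsAsFactors = FALSE
)

# Keep only names containing the literal text "mean()" or "std()".
# fixed = TRUE makes grepl treat the parentheses literally, so
# "meanFreq()" is not matched.
keep <- grepl("mean()", features$name, fixed = TRUE) |
        grepl("std()",  features$name, fixed = TRUE)
selected <- features[keep, ]
```

Here `selected$index` holds the positions 1, 2, 5 and 6, which could then be used to pick the matching columns from the measurement data frame.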
Descriptive labels based on the activity_labels.txt are joined to each loaded data set. The subject keys are read from the respective files and are appended to the data sets.
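As a minimal sketch of this step, the example below appends subject keys and joins activity labels on small in-memory stand-ins. The column name `activityKey` is an assumption; `subjectKey` and `activityLabel` follow the output columns listed in the code book.

```r
library(dplyr)

# Stand-in for the first rows of activity_labels.txt
activity_labels <- data.frame(
  activityKey   = 1:3,
  activityLabel = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"),
  stringsAsFactors = FALSE
)

# Stand-ins for one measurement column, the activity keys and the subject keys
measurements <- data.frame(tBodyAccMeanX = c(0.28, 0.27, 0.29))
activities   <- data.frame(activityKey = c(2L, 1L, 2L))
subjects     <- data.frame(subjectKey = c(1L, 1L, 2L))

# Bind the three pieces column-wise, then look up the descriptive label.
# left_join keeps the row order of the left-hand data frame.
combined <- bind_cols(subjects, activities, measurements) %>%
  left_join(activity_labels, by = "activityKey")
```

A join that preserves row order is used here because the three input files are aligned by row position only.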
From the test data sets the files
- test/X_test.txt
- test/y_test.txt
- test/subject_test.txt
were loaded, combined and joined with the activity labels.
From the training data sets the files
- train/X_train.txt
- train/y_train.txt
- train/subject_train.txt
were loaded, combined and joined with the activity labels.
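Since both sets share the same file naming pattern, loading them can be sketched with one helper. `load_set` is a hypothetical function, not taken from run_analysis.R, and the demo writes a tiny fake data set to a temporary directory so that the example runs without the real files.

```r
# Hypothetical helper: reads X_<name>.txt, y_<name>.txt and subject_<name>.txt
# from <root>/<name>/ and binds them column-wise.
load_set <- function(root, name) {
  path <- function(prefix) {
    file.path(root, name, paste0(prefix, "_", name, ".txt"))
  }
  measurements <- read.table(path("X"))
  activities   <- read.table(path("y"), col.names = "activityKey")
  subjects     <- read.table(path("subject"), col.names = "subjectKey")
  cbind(subjects, activities, measurements)
}

# Demo on a tiny fake data set (two records, two measurement columns)
root <- tempfile("har_demo")
dir.create(file.path(root, "test"), recursive = TRUE)
writeLines(c("0.1 0.2", "0.3 0.4"), file.path(root, "test", "X_test.txt"))
writeLines(c("5", "4"),             file.path(root, "test", "y_test.txt"))
writeLines(c("2", "2"),             file.path(root, "test", "subject_test.txt"))

test_set <- load_set(root, "test")
```

Calling the same helper with `"train"` would read the training files, assuming they live in a `train` directory next to `test`.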
All names were transformed to follow the widely used camel case notation, starting with a lowercase character. Particularly for the long feature names, I do not agree with using all-lowercase names without separators, as this does not contribute to readability.
The general transformation used for the mean and standard deviation features is to remove brackets and dashes and to convert the name into camel case notation. For the output data set the mean prefix was added, e.g.
tBodyAcc-mean()-X -> tBodyAccMeanX -> meanTBodyAccMeanX
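The two renaming steps can be reproduced with a few substitutions. The helper names `to_camel_case` and `add_mean_prefix` are hypothetical, and the regular expressions are one possible way to get from the original names to the names listed in the code book, not necessarily the ones used in the script.

```r
# Hypothetical helper: original feature name -> camel case name
to_camel_case <- function(name) {
  name <- gsub("-mean\\(\\)", "Mean", name)  # "-mean()" -> "Mean"
  name <- gsub("-std\\(\\)",  "Std",  name)  # "-std()"  -> "Std"
  gsub("-", "", name, fixed = TRUE)          # drop the dash before X/Y/Z
}

# Hypothetical helper: camel case name -> output column name
add_mean_prefix <- function(name) {
  paste0("mean", toupper(substring(name, 1, 1)), substring(name, 2))
}

original <- c("tBodyAcc-mean()-X", "tBodyAccMag-std()",
              "fBodyBodyGyroJerkMag-mean()")
tidy   <- to_camel_case(original)
# "tBodyAccMeanX" "tBodyAccMagStd" "fBodyBodyGyroJerkMagMean"
output <- add_mean_prefix(tidy)
# "meanTBodyAccMeanX" "meanTBodyAccMagStd" "meanFBodyBodyGyroJerkMagMean"
```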
Both test and training data sets have the same column layout and can be combined into one data frame using the union() function of R.
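The sketch below assumes that the data-frame method of union() provided by dplyr is meant. It uses two tiny stand-in data frames with identical columns. Note that union() treats its inputs as sets and keeps an identical row only once, whereas bind_rows() keeps every row. Whether this difference matters for the real data is not stated here.

```r
library(dplyr)

# Stand-ins with the same column layout (values are made up)
test_set  <- data.frame(subjectKey = c(2L, 4L), tBodyAccMeanX = c(0.25, 0.27))
train_set <- data.frame(subjectKey = c(1L, 3L), tBodyAccMeanX = c(0.28, 0.29))

# All rows from both sets
all_data <- union(test_set, train_set)

# Set semantics: combining a data frame with itself adds no rows,
# while bind_rows simply stacks the inputs.
dup_union <- union(test_set, test_set)
dup_bind  <- bind_rows(test_set, test_set)
```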
To generate the output from the tidied data set, dplyr is used in combination with pipes.
After grouping by subject key and activity, the summarise_each function is used to apply the mean function to all feature measurement columns. A subsequent update of the column names indicates that they are mean values.
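The grouping and summarising step can be sketched on a small stand-in data frame. Because summarise_each is deprecated in current dplyr releases, this sketch swaps in summarise(across(...)), which needs dplyr 1.0.0 or later. The helper `add_mean_prefix` is hypothetical.

```r
library(dplyr)

# Stand-in for the tidied data set (values are made up)
tidy <- data.frame(
  subjectKey    = c(1L, 1L, 1L, 2L),
  activityLabel = c("WALKING", "WALKING", "SITTING", "WALKING"),
  tBodyAccMeanX = c(0.2, 0.4, 0.1, 0.5),
  tBodyAccStdX  = c(-0.9, -0.7, -0.5, -0.3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: "tBodyAccMeanX" -> "meanTBodyAccMeanX"
add_mean_prefix <- function(name) {
  paste0("mean", toupper(substring(name, 1, 1)), substring(name, 2))
}

result <- tidy %>%
  group_by(activityLabel, subjectKey) %>%
  # across(everything()) skips the grouping columns automatically
  summarise(across(everything(), mean), .groups = "drop") %>%
  # rename only the summary columns
  rename_with(add_mean_prefix, -c(activityLabel, subjectKey))
```

The result has one row per activity and subject combination, with the two group columns first and the renamed mean columns after them.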
The results are written by the write.table function to a file called subjectActivityAnalysis.txt.
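A minimal sketch of writing and re-reading such a file is shown below. The arguments passed to write.table are assumptions, since they are not listed here, and the file is written to a temporary directory so the example does not overwrite anything.

```r
# Stand-in for the summarised result (values are made up)
result <- data.frame(
  activityLabel     = c("SITTING", "WALKING"),
  subjectKey        = c(1L, 1L),
  meanTBodyAccMeanX = c(0.1, 0.3),
  stringsAsFactors  = FALSE
)

out_file <- file.path(tempdir(), "subjectActivityAnalysis.txt")

# row.names = FALSE (an assumption) avoids an extra column of row numbers
write.table(result, out_file, row.names = FALSE)

# Reading the file back requires header = TRUE to restore the column names
check <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
```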
The output data file is named subjectActivityAnalysis.txt.
The file contains 180 rows and 68 columns.
The first two columns are group columns over which the means were created. All remaining columns contain means for the features in question.
Each record contains the means of all features in question for a specific activity performed by an individual subject.
There are 30 subjects in total, each of whom performed all six activities.
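The stated shape follows from 30 subjects times 6 activities = 180 rows and 2 group columns plus 66 feature means = 68 columns. The sketch below builds a synthetic data frame of that shape to show how such a check could be written. The activity names are placeholders and the feature values are random, so nothing here reflects real results.

```r
# Synthetic stand-in with the documented shape
groups <- expand.grid(
  activityLabel = paste0("ACTIVITY_", 1:6),  # placeholder labels
  subjectKey    = 1:30,
  stringsAsFactors = FALSE
)
features <- as.data.frame(matrix(rnorm(180 * 66), nrow = 180))
output   <- cbind(groups, features)

dims <- dim(output)                               # rows, columns
activities_per_subject <- table(output$subjectKey)  # one count per subject
```

To check the real file, `output` could instead be read with `read.table("subjectActivityAnalysis.txt", header = TRUE)`, assuming the file was written with a header row.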
The following table provides a list of all columns in the data file.
| Column Name | Type | Description |
|---|---|---|
| activityLabel | group column | The label of an activity that was done by a subject |
| subjectKey | group column | The id of a subject |
| meanTBodyAccMeanX | summary column | mean of tBodyAccMeanX feature |
| meanTBodyAccMeanY | summary column | mean of tBodyAccMeanY feature |
| meanTBodyAccMeanZ | summary column | mean of tBodyAccMeanZ feature |
| meanTBodyAccStdX | summary column | mean of tBodyAccStdX feature |
| meanTBodyAccStdY | summary column | mean of tBodyAccStdY feature |
| meanTBodyAccStdZ | summary column | mean of tBodyAccStdZ feature |
| meanTGravityAccMeanX | summary column | mean of tGravityAccMeanX feature |
| meanTGravityAccMeanY | summary column | mean of tGravityAccMeanY feature |
| meanTGravityAccMeanZ | summary column | mean of tGravityAccMeanZ feature |
| meanTGravityAccStdX | summary column | mean of tGravityAccStdX feature |
| meanTGravityAccStdY | summary column | mean of tGravityAccStdY feature |
| meanTGravityAccStdZ | summary column | mean of tGravityAccStdZ feature |
| meanTBodyAccJerkMeanX | summary column | mean of tBodyAccJerkMeanX feature |
| meanTBodyAccJerkMeanY | summary column | mean of tBodyAccJerkMeanY feature |
| meanTBodyAccJerkMeanZ | summary column | mean of tBodyAccJerkMeanZ feature |
| meanTBodyAccJerkStdX | summary column | mean of tBodyAccJerkStdX feature |
| meanTBodyAccJerkStdY | summary column | mean of tBodyAccJerkStdY feature |
| meanTBodyAccJerkStdZ | summary column | mean of tBodyAccJerkStdZ feature |
| meanTBodyGyroMeanX | summary column | mean of tBodyGyroMeanX feature |
| meanTBodyGyroMeanY | summary column | mean of tBodyGyroMeanY feature |
| meanTBodyGyroMeanZ | summary column | mean of tBodyGyroMeanZ feature |
| meanTBodyGyroStdX | summary column | mean of tBodyGyroStdX feature |
| meanTBodyGyroStdY | summary column | mean of tBodyGyroStdY feature |
| meanTBodyGyroStdZ | summary column | mean of tBodyGyroStdZ feature |
| meanTBodyGyroJerkMeanX | summary column | mean of tBodyGyroJerkMeanX feature |
| meanTBodyGyroJerkMeanY | summary column | mean of tBodyGyroJerkMeanY feature |
| meanTBodyGyroJerkMeanZ | summary column | mean of tBodyGyroJerkMeanZ feature |
| meanTBodyGyroJerkStdX | summary column | mean of tBodyGyroJerkStdX feature |
| meanTBodyGyroJerkStdY | summary column | mean of tBodyGyroJerkStdY feature |
| meanTBodyGyroJerkStdZ | summary column | mean of tBodyGyroJerkStdZ feature |
| meanTBodyAccMagMean | summary column | mean of tBodyAccMagMean feature |
| meanTBodyAccMagStd | summary column | mean of tBodyAccMagStd feature |
| meanTGravityAccMagMean | summary column | mean of tGravityAccMagMean feature |
| meanTGravityAccMagStd | summary column | mean of tGravityAccMagStd feature |
| meanTBodyAccJerkMagMean | summary column | mean of tBodyAccJerkMagMean feature |
| meanTBodyAccJerkMagStd | summary column | mean of tBodyAccJerkMagStd feature |
| meanTBodyGyroMagMean | summary column | mean of tBodyGyroMagMean feature |
| meanTBodyGyroMagStd | summary column | mean of tBodyGyroMagStd feature |
| meanTBodyGyroJerkMagMean | summary column | mean of tBodyGyroJerkMagMean feature |
| meanTBodyGyroJerkMagStd | summary column | mean of tBodyGyroJerkMagStd feature |
| meanFBodyAccMeanX | summary column | mean of fBodyAccMeanX feature |
| meanFBodyAccMeanY | summary column | mean of fBodyAccMeanY feature |
| meanFBodyAccMeanZ | summary column | mean of fBodyAccMeanZ feature |
| meanFBodyAccStdX | summary column | mean of fBodyAccStdX feature |
| meanFBodyAccStdY | summary column | mean of fBodyAccStdY feature |
| meanFBodyAccStdZ | summary column | mean of fBodyAccStdZ feature |
| meanFBodyAccJerkMeanX | summary column | mean of fBodyAccJerkMeanX feature |
| meanFBodyAccJerkMeanY | summary column | mean of fBodyAccJerkMeanY feature |
| meanFBodyAccJerkMeanZ | summary column | mean of fBodyAccJerkMeanZ feature |
| meanFBodyAccJerkStdX | summary column | mean of fBodyAccJerkStdX feature |
| meanFBodyAccJerkStdY | summary column | mean of fBodyAccJerkStdY feature |
| meanFBodyAccJerkStdZ | summary column | mean of fBodyAccJerkStdZ feature |
| meanFBodyGyroMeanX | summary column | mean of fBodyGyroMeanX feature |
| meanFBodyGyroMeanY | summary column | mean of fBodyGyroMeanY feature |
| meanFBodyGyroMeanZ | summary column | mean of fBodyGyroMeanZ feature |
| meanFBodyGyroStdX | summary column | mean of fBodyGyroStdX feature |
| meanFBodyGyroStdY | summary column | mean of fBodyGyroStdY feature |
| meanFBodyGyroStdZ | summary column | mean of fBodyGyroStdZ feature |
| meanFBodyAccMagMean | summary column | mean of fBodyAccMagMean feature |
| meanFBodyAccMagStd | summary column | mean of fBodyAccMagStd feature |
| meanFBodyBodyAccJerkMagMean | summary column | mean of fBodyBodyAccJerkMagMean feature |
| meanFBodyBodyAccJerkMagStd | summary column | mean of fBodyBodyAccJerkMagStd feature |
| meanFBodyBodyGyroMagMean | summary column | mean of fBodyBodyGyroMagMean feature |
| meanFBodyBodyGyroMagStd | summary column | mean of fBodyBodyGyroMagStd feature |
| meanFBodyBodyGyroJerkMagMean | summary column | mean of fBodyBodyGyroJerkMagMean feature |
| meanFBodyBodyGyroJerkMagStd | summary column | mean of fBodyBodyGyroJerkMagStd feature |