SebastianStoll/GettingDataCourseProject

Getting Data Course Project

Overview

This repository contains the results for the Getting and Cleaning Data Course Project.

The following sections describe how the course project script works and provide a code book describing the variables of the output data.

Script Implementation

The implementation of run_analysis.R uses the dplyr library along with base R functionality to manipulate the input data.

A base assumption made by the script is that the data contains no missing values and that the numbers of records in the measurement, label and subject files match one another. This was verified in R against the source data.
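Both assumptions can be checked with a few lines of base R. The sketch below is an illustration only: check_split() is a hypothetical helper, not part of run_analysis.R, and it expects the three parts of one split as already-loaded data frames.

```r
# Hypothetical helper (not from run_analysis.R): checks the two assumptions
# for one split, given the measurements, activity ids and subject ids
check_split <- function(x, y, subject) {
  # every measurement row needs exactly one activity id and one subject id
  same_length <- nrow(x) == nrow(y) && nrow(x) == nrow(subject)
  # no missing values in any of the three inputs
  complete <- !any(is.na(x)) && !any(is.na(y)) && !any(is.na(subject))
  same_length && complete
}

# Tiny made-up inputs standing in for X_*.txt, y_*.txt and subject_*.txt
x <- data.frame(V1 = c(0.28, 0.27), V2 = c(-0.02, -0.01))
y <- data.frame(V1 = c(5L, 5L))
subject <- data.frame(V1 = c(1L, 1L))

check_split(x, y, subject)                     # TRUE
check_split(x, y[1, , drop = FALSE], subject)  # FALSE: row counts differ
```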

Reading data

Data is read using the read.table function. Column labels for the measurements are generated by filtering the features.txt file for all features whose names contain mean() or std(). Additional substitutions normalize the feature column names into a camel case style.
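A minimal base R sketch of the feature filter, assuming features.txt has been read with read.table() so that column V1 holds the feature index and V2 the feature name. The helper name select_mean_std_features() is hypothetical. The escaped parentheses make the pattern match only the literal text mean() or std(), so features such as meanFreq() are left out, which is consistent with the 66 summary columns listed in the code book.

```r
# Hypothetical helper: keep only the rows of a features table (V1 = index,
# V2 = name) whose name contains the literal text "mean()" or "std()"
select_mean_std_features <- function(features) {
  # "\\(\\)" matches literal brackets, so "meanFreq()" and
  # "angle(X,gravityMean)" are not selected
  keep <- grepl("mean\\(\\)|std\\(\\)", features$V2)
  features[keep, ]
}

# Small stand-in for features.txt (the real file would be loaded with
# read.table("features.txt", stringsAsFactors = FALSE))
features <- data.frame(
  V1 = 1:5,
  V2 = c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
         "fBodyAcc-meanFreq()-X", "angle(X,gravityMean)"),
  stringsAsFactors = FALSE
)

selected <- select_mean_std_features(features)
selected$V2   # "tBodyAcc-mean()-X" "tBodyAcc-std()-X"
```

The index column V1 of the selected rows can then be used to pick the matching columns from the measurement files.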

Descriptive labels from the activity_labels.txt file are joined to each loaded data set. The subject keys are read from the respective subject files and appended to the data sets.
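One way to express this join in base R, sketched under assumptions: activity_labels.txt is read into a two-column data frame (V1 = activity id, V2 = label), and the hypothetical helper add_activity_label() uses match() in place of a dplyr join. match() keeps the measurement rows in their original order, which matters because the subject keys are appended positionally; merge(), by contrast, sorts its result by the key.

```r
# Stand-in for activity_labels.txt: activity id -> descriptive label
activity_labels <- data.frame(
  V1 = 1:6,
  V2 = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS",
         "SITTING", "STANDING", "LAYING"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the label for each activity id while
# preserving the row order of the measurements
add_activity_label <- function(data, activity_ids, labels) {
  data$activityLabel <- labels$V2[match(activity_ids, labels$V1)]
  data
}

measurements <- data.frame(tBodyAccMeanX = c(0.28, 0.22, 0.25))
y <- c(5L, 1L, 6L)   # activity ids as read from a y_*.txt file

labelled <- add_activity_label(measurements, y, activity_labels)
labelled$activityLabel   # "STANDING" "WALKING" "LAYING"
```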

From the test data set, the files

  • test/X_test.txt
  • test/y_test.txt
  • test/subject_test.txt

were loaded, combined and joined with the activity labels.

From the training data set, the files

  • train/X_train.txt
  • train/y_train.txt
  • train/subject_train.txt

were loaded, combined and joined with the activity labels.
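A sketch of loading and combining one split with base R, assuming the directory layout listed above (<dir>/<set>/X_<set>.txt and so on). load_split() and the activityId column name are hypothetical; the demo writes tiny made-up files to a temporary directory so that the example runs without the real data.

```r
# Hypothetical loader for one split ("test" or "train")
load_split <- function(dir, set) {
  read_part <- function(prefix) {
    read.table(file.path(dir, set, paste0(prefix, "_", set, ".txt")))
  }
  x <- read_part("X")
  y <- read_part("y")
  subject <- read_part("subject")
  # column-bind subject id, activity id and the measurement columns
  cbind(subjectKey = subject$V1, activityId = y$V1, x)
}

# Demo: tiny made-up files in a temporary directory
root <- tempfile("har")
dir.create(file.path(root, "test"), recursive = TRUE)
write.table(matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2),
            file.path(root, "test", "X_test.txt"),
            row.names = FALSE, col.names = FALSE)
writeLines(c("5", "1"), file.path(root, "test", "y_test.txt"))
writeLines(c("2", "2"), file.path(root, "test", "subject_test.txt"))

test_set <- load_split(root, "test")
str(test_set)
```

The same call with set = "train" would load the training split.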

Naming conventions and transformations

All names were transformed into the widely used camel case notation, starting with a lowercase character. Especially for the long feature names, I don't agree with using all-lowercase names without separators, as this does not help readability.

The general transformation used for the mean and standard deviation features is to remove brackets and dashes and to convert the result to camel case. For the output data set, the prefix mean was added, e.g.

  tBodyAcc-mean()-X -> tBodyAccMeanX -> meanTBodyAccMeanX
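The renaming steps can be sketched in base R as follows. to_camel_case() and add_mean_prefix() are hypothetical helpers built on fixed-string gsub() calls; the actual substitutions in run_analysis.R may differ, but these reproduce the example above as well as code book names such as fBodyBodyGyroJerkMagStd.

```r
# Hypothetical helper: raw feature name -> camel case column name,
# e.g. "tBodyAcc-mean()-X" -> "tBodyAccMeanX"
to_camel_case <- function(feature) {
  name <- gsub("()", "", feature, fixed = TRUE)       # remove the brackets
  name <- gsub("-mean", "-Mean", name, fixed = TRUE)  # capitalise mean
  name <- gsub("-std", "-Std", name, fixed = TRUE)    # capitalise std
  gsub("-", "", name, fixed = TRUE)                   # remove the dashes
}

# Hypothetical helper: add the mean prefix used for the output columns,
# e.g. "tBodyAccMeanX" -> "meanTBodyAccMeanX"
add_mean_prefix <- function(name) {
  paste0("mean", toupper(substring(name, 1, 1)), substring(name, 2))
}

camel <- to_camel_case("tBodyAcc-mean()-X")
camel                    # "tBodyAccMeanX"
add_mean_prefix(camel)   # "meanTBodyAccMeanX"
```

Both helpers are vectorised, so they can be applied to the whole vector of selected feature names at once.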

Merging the test and training data sets

Both test and training data sets have the same column layout and can be combined into one data frame using the union() function (as provided by dplyr, which the script loads).
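A base R sketch of the combination step, shown with two tiny stand-in data frames. It uses rbind() as a swapped-in technique. One difference worth knowing: rbind() keeps every row, while a set union such as dplyr's union() also removes duplicated rows, an effect that unique() reproduces here.

```r
# Two tiny data frames standing in for the test and training sets;
# both share the same column layout
test_set  <- data.frame(subjectKey = c(2L, 4L), tBodyAccMeanX = c(0.25, 0.27))
train_set <- data.frame(subjectKey = c(1L, 3L), tBodyAccMeanX = c(0.28, 0.26))

# rbind() stacks the rows and keeps all of them
combined <- rbind(test_set, train_set)
nrow(combined)   # 4

# a set union additionally drops duplicated rows; unique() shows that effect
nrow(unique(rbind(test_set, test_set)))   # 2
```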

Generating the subject activity analysis data

To generate the output from the tidied data set, dplyr is used in combination with pipes.

After grouping by subject key and activity, the summarise_each() function applies the mean function to all feature measurement columns. The column names are then updated to indicate that they hold mean values.
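The same grouping and summary can be sketched with base R's aggregate(), swapped in here for dplyr's grouping and summarise_each(). The input is a tiny made-up stand-in for the tidied data set, and the renaming applies the mean prefix convention from the naming section.

```r
# Tiny made-up tidy data set: one row per measurement window
tidy <- data.frame(
  subjectKey    = c(1L, 1L, 1L, 2L),
  activityLabel = c("WALKING", "WALKING", "LAYING", "WALKING"),
  tBodyAccMeanX = c(0.25, 0.75, 0.5, 1),
  tBodyAccStdX  = c(-0.5, -1, -0.25, -0.75),
  stringsAsFactors = FALSE
)

# mean of every remaining column for each activity/subject pair;
# aggregate() puts the two grouping columns first
summary_df <- aggregate(. ~ activityLabel + subjectKey, data = tidy, FUN = mean)

# rename the feature columns so they show that they now hold means
feature_cols <- names(summary_df)[-(1:2)]
names(summary_df)[-(1:2)] <- paste0(
  "mean", toupper(substring(feature_cols, 1, 1)), substring(feature_cols, 2)
)
summary_df
```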

Writing the results

The results are written out with the write.table() function to a file called subjectActivityAnalysis.txt.
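A short sketch of the write step and of reading the file back. The argument row.names = FALSE is an assumption, since the text does not say which write.table() arguments the script passes; it keeps R's row index out of the file so that the columns match the code book. The file is written to a temporary directory here.

```r
# Tiny stand-in for the summary data frame
summary_df <- data.frame(
  activityLabel = c("LAYING", "WALKING"),
  subjectKey = c(1L, 1L),
  meanTBodyAccMeanX = c(0.5, 0.25),
  stringsAsFactors = FALSE
)

out_file <- file.path(tempdir(), "subjectActivityAnalysis.txt")
# row.names = FALSE (assumed): no extra row-index column in the output
write.table(summary_df, out_file, row.names = FALSE)

# header = TRUE restores the column names from the first line
back <- read.table(out_file, header = TRUE, stringsAsFactors = FALSE)
```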

Code book

The output data file is named subjectActivityAnalysis.txt.

The file contains 180 rows and 68 columns.

The first two columns are the grouping columns over which the means were computed. All remaining columns contain the means of the selected features.

Each record contains the means of all selected features for one specific activity performed by one individual subject.

There are 30 subjects in total, each of whom performed all six activities, which gives 30 × 6 = 180 rows.
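These shape properties can be checked after reading the file with read.table(..., header = TRUE). The sketch below uses a hypothetical check_output_shape() helper and a synthetic all-zero frame with the documented layout in place of the real output.

```r
# Hypothetical check of the code book's shape claims
check_output_shape <- function(df, n_subjects = 30, n_activities = 6,
                               n_features = 66) {
  pairs <- table(df$subjectKey, df$activityLabel)
  nrow(df) == n_subjects * n_activities &&          # 30 * 6 = 180 rows
    ncol(df) == 2 + n_features &&                   # 2 group + 66 summary columns
    all(dim(pairs) == c(n_subjects, n_activities)) &&
    all(pairs == 1)                                 # each pair exactly once
}

# Synthetic frame with the documented layout, filled with zeros
grid <- expand.grid(subjectKey = 1:30,
                    activityLabel = c("WALKING", "WALKING_UPSTAIRS",
                                      "WALKING_DOWNSTAIRS", "SITTING",
                                      "STANDING", "LAYING"),
                    stringsAsFactors = FALSE)
feature_values <- as.data.frame(matrix(0, nrow = nrow(grid), ncol = 66))
fake <- cbind(grid[, c("activityLabel", "subjectKey")], feature_values)

check_output_shape(fake)   # TRUE
```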

The following table provides a list of all columns in the data file.

Column Name                   Type            Description
activityLabel                 group column    The label of an activity that was done by a subject
subjectKey                    group column    The id of a subject
meanTBodyAccMeanX             summary column  mean of tBodyAccMeanX feature
meanTBodyAccMeanY             summary column  mean of tBodyAccMeanY feature
meanTBodyAccMeanZ             summary column  mean of tBodyAccMeanZ feature
meanTBodyAccStdX              summary column  mean of tBodyAccStdX feature
meanTBodyAccStdY              summary column  mean of tBodyAccStdY feature
meanTBodyAccStdZ              summary column  mean of tBodyAccStdZ feature
meanTGravityAccMeanX          summary column  mean of tGravityAccMeanX feature
meanTGravityAccMeanY          summary column  mean of tGravityAccMeanY feature
meanTGravityAccMeanZ          summary column  mean of tGravityAccMeanZ feature
meanTGravityAccStdX           summary column  mean of tGravityAccStdX feature
meanTGravityAccStdY           summary column  mean of tGravityAccStdY feature
meanTGravityAccStdZ           summary column  mean of tGravityAccStdZ feature
meanTBodyAccJerkMeanX         summary column  mean of tBodyAccJerkMeanX feature
meanTBodyAccJerkMeanY         summary column  mean of tBodyAccJerkMeanY feature
meanTBodyAccJerkMeanZ         summary column  mean of tBodyAccJerkMeanZ feature
meanTBodyAccJerkStdX          summary column  mean of tBodyAccJerkStdX feature
meanTBodyAccJerkStdY          summary column  mean of tBodyAccJerkStdY feature
meanTBodyAccJerkStdZ          summary column  mean of tBodyAccJerkStdZ feature
meanTBodyGyroMeanX            summary column  mean of tBodyGyroMeanX feature
meanTBodyGyroMeanY            summary column  mean of tBodyGyroMeanY feature
meanTBodyGyroMeanZ            summary column  mean of tBodyGyroMeanZ feature
meanTBodyGyroStdX             summary column  mean of tBodyGyroStdX feature
meanTBodyGyroStdY             summary column  mean of tBodyGyroStdY feature
meanTBodyGyroStdZ             summary column  mean of tBodyGyroStdZ feature
meanTBodyGyroJerkMeanX        summary column  mean of tBodyGyroJerkMeanX feature
meanTBodyGyroJerkMeanY        summary column  mean of tBodyGyroJerkMeanY feature
meanTBodyGyroJerkMeanZ        summary column  mean of tBodyGyroJerkMeanZ feature
meanTBodyGyroJerkStdX         summary column  mean of tBodyGyroJerkStdX feature
meanTBodyGyroJerkStdY         summary column  mean of tBodyGyroJerkStdY feature
meanTBodyGyroJerkStdZ         summary column  mean of tBodyGyroJerkStdZ feature
meanTBodyAccMagMean           summary column  mean of tBodyAccMagMean feature
meanTBodyAccMagStd            summary column  mean of tBodyAccMagStd feature
meanTGravityAccMagMean        summary column  mean of tGravityAccMagMean feature
meanTGravityAccMagStd         summary column  mean of tGravityAccMagStd feature
meanTBodyAccJerkMagMean       summary column  mean of tBodyAccJerkMagMean feature
meanTBodyAccJerkMagStd        summary column  mean of tBodyAccJerkMagStd feature
meanTBodyGyroMagMean          summary column  mean of tBodyGyroMagMean feature
meanTBodyGyroMagStd           summary column  mean of tBodyGyroMagStd feature
meanTBodyGyroJerkMagMean      summary column  mean of tBodyGyroJerkMagMean feature
meanTBodyGyroJerkMagStd       summary column  mean of tBodyGyroJerkMagStd feature
meanFBodyAccMeanX             summary column  mean of fBodyAccMeanX feature
meanFBodyAccMeanY             summary column  mean of fBodyAccMeanY feature
meanFBodyAccMeanZ             summary column  mean of fBodyAccMeanZ feature
meanFBodyAccStdX              summary column  mean of fBodyAccStdX feature
meanFBodyAccStdY              summary column  mean of fBodyAccStdY feature
meanFBodyAccStdZ              summary column  mean of fBodyAccStdZ feature
meanFBodyAccJerkMeanX         summary column  mean of fBodyAccJerkMeanX feature
meanFBodyAccJerkMeanY         summary column  mean of fBodyAccJerkMeanY feature
meanFBodyAccJerkMeanZ         summary column  mean of fBodyAccJerkMeanZ feature
meanFBodyAccJerkStdX          summary column  mean of fBodyAccJerkStdX feature
meanFBodyAccJerkStdY          summary column  mean of fBodyAccJerkStdY feature
meanFBodyAccJerkStdZ          summary column  mean of fBodyAccJerkStdZ feature
meanFBodyGyroMeanX            summary column  mean of fBodyGyroMeanX feature
meanFBodyGyroMeanY            summary column  mean of fBodyGyroMeanY feature
meanFBodyGyroMeanZ            summary column  mean of fBodyGyroMeanZ feature
meanFBodyGyroStdX             summary column  mean of fBodyGyroStdX feature
meanFBodyGyroStdY             summary column  mean of fBodyGyroStdY feature
meanFBodyGyroStdZ             summary column  mean of fBodyGyroStdZ feature
meanFBodyAccMagMean           summary column  mean of fBodyAccMagMean feature
meanFBodyAccMagStd            summary column  mean of fBodyAccMagStd feature
meanFBodyBodyAccJerkMagMean   summary column  mean of fBodyBodyAccJerkMagMean feature
meanFBodyBodyAccJerkMagStd    summary column  mean of fBodyBodyAccJerkMagStd feature
meanFBodyBodyGyroMagMean      summary column  mean of fBodyBodyGyroMagMean feature
meanFBodyBodyGyroMagStd       summary column  mean of fBodyBodyGyroMagStd feature
meanFBodyBodyGyroJerkMagMean  summary column  mean of fBodyBodyGyroJerkMagMean feature
meanFBodyBodyGyroJerkMagStd   summary column  mean of fBodyBodyGyroJerkMagStd feature
