Problem Statement/Goal/Background

This repo stores code supporting my work for the capstone project of the Udacity Data Scientist nanodegree program.

The goal of the project is to develop a convolutional neural network (CNN) to detect the breed of a dog based on an image containing a dog. The code will take in an image, determine if a dog - or a person - is in the image, and then provide a prediction of the breed that the dog - or the person! - most resembles.
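The decision flow described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `describe_image` and the three callables (`is_dog`, `is_human`, `predict_breed`) are hypothetical stand-ins for whatever detectors and classifier the notebook defines.

```python
from typing import Callable


def describe_image(
    image_path: str,
    is_dog: Callable[[str], bool],         # hypothetical dog detector
    is_human: Callable[[str], bool],       # hypothetical human/face detector
    predict_breed: Callable[[str], str],   # hypothetical breed classifier
) -> str:
    """Return a short message describing what was found in the image."""
    # Check for a dog first; only fall back to the human check if none is found.
    if is_dog(image_path):
        return f"Dog detected; predicted breed: {predict_breed(image_path)}"
    if is_human(image_path):
        return f"Human detected; most resembles: {predict_breed(image_path)}"
    # Neither detector fired, so no breed prediction is attempted.
    return "Neither a dog nor a human was detected."


if __name__ == "__main__":
    # Stub detectors, for illustration only.
    print(describe_image("example.jpg", lambda p: True, lambda p: False, lambda p: "Beagle"))
```

Passing the detectors in as arguments keeps the sketch independent of any particular model, so the same flow can be exercised with simple stubs.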

The Jupyter notebook contains the required code and Q&A for the project itself. When the notebook runs, it will build and save a CNN model (via bottleneck features) that is then used by a web app, where a user can upload an arbitrary image and obtain a breed prediction. There is also an HTML rendering of the executed notebook, for reference.

The data for the project - the training/validation/testing images - was provided by Udacity. Links to them can be found here. The Udacity repo also contains the template for the project notebook (dog_app.ipynb), which this author's repo builds upon.

The model created by the author is a Keras Sequential model: a convolutional neural network that is built up sequentially (layer by layer). The author experimented with the design of this model, including the number of layers and the parameters for each layer, using accuracy - the "hit rate" of the class predictions against the true/known values - as the key metric for training the model and evaluating the results.
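To make the accuracy metric concrete, here is a small sketch of the "hit rate" calculation in plain Python. The function name `accuracy` and the sample labels are illustrative assumptions; in the notebook itself, Keras would typically compute this metric during training and evaluation.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted class labels that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")
    # Count the positions where the prediction equals the known label.
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


if __name__ == "__main__":
    # Made-up breed indices: three of the four predictions are correct.
    print(accuracy([3, 1, 2, 2], [3, 1, 0, 2]))  # 0.75
```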

Requirements

The code runs in Python, and an environment.yml file is provided to allow the creation of an Anaconda environment containing all the libraries used by the author.
- Note: the author used an Apple M3 laptop that requires the Apple channel for TensorFlow/Keras support.
- If the user is not using an Apple silicon platform, remove the 'Apple' channel at the top of the YAML file and remove the 'tensorflow-macos' and 'tensorflow-metal' lines in the pip section at the bottom (and install TensorFlow/Keras as necessary for your platform).
- If not using conda, you will need to at least install numpy, tensorflow, scikit-learn, plotly, flask, matplotlib and opencv (which may be via opencv-python-headless or similar, depending on your platform).
- Note: I uploaded this code to the Udacity virtual Jupyter instance, but got repeated kernel crashes while pre-processing the image data. This is why I moved the code to GitHub.
To create the environment, run "conda env create -f environment.yml".
Some data files are required, but are too large to store in GitHub.
- If you are on macOS or Linux, you can run the download_external_data.sh script to download the required external data, e.g. "sh ./download_external_data.sh" from the command line. This will take several minutes!
- Otherwise, you can do this manually:
  - Create a directory in the root of the repo called 'dogImages', download this and extract it in that directory.
  - Create a directory in the root of the repo called 'lfw', download this and extract it in that directory.
  - Create a directory in the root of the repo called 'bottleneck_features', and download this and this into that directory.
The trained model is stored in the repo in the saved_models directory. The web app will load the model from there. Alternatively, run the dog_app.ipynb file in its entirety, which will re-save the model in that location.
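Before running the notebook, it may help to confirm that the external data directories listed above are in place. The following is a minimal sketch of such a check, not part of the repo: the helper name `missing_data_dirs` is hypothetical, and it only tests that each directory exists and is non-empty, not that its contents are complete.

```python
from pathlib import Path
from typing import List

# Directory names taken from the manual download steps above.
REQUIRED_DIRS = ("dogImages", "lfw", "bottleneck_features")


def missing_data_dirs(repo_root: str = ".") -> List[str]:
    """Return the required data directories that are absent or empty."""
    root = Path(repo_root)
    missing = []
    for name in REQUIRED_DIRS:
        path = root / name
        # Treat a directory with no entries the same as a missing one.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_data_dirs(".")
    if missing:
        print("Missing or empty data directories:", ", ".join(missing))
    else:
        print("All external data directories are present.")
```

Run it from the root of the repo; an empty result suggests the download step completed.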

Running the web app

From the app directory, run "python app.py" to launch the web app.
Allow a few seconds for the model to load, and then click the URL shown in the terminal to access the web app.
From there, you can upload an arbitrary image: several are located in the 'downloaded_test_images' directory of this repo (4 of dogs, 2 of humans, and one with neither).

Acknowledgements

The author's experience with CNNs was limited prior to working on this project, and several sources were VERY helpful in working through the various sections of code:

DataCamp
AnalyticsVidhya
TensorFlow Documentation
GeeksForGeeks
TowardsDataScience
Microsoft's Copilot AI: While I avoided having AI tools generate much code for me, Copilot was helpful in making some boilerplate suggestions that reduced the amount of time I spent figuring out how to build the web app in the way I desired (e.g. the interactions between Python and Flask/Bootstrap/Plotly).
