diff --git a/README.md b/README.md
index 32d542d..562a26e 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@
 # Label Maker
 ## Data Preparation for Satellite Machine Learning
 
-The tool downloads [OpenStreetMap QA Tile]((https://osmlab.github.io/osm-qa-tiles/)) information and satellite imagery tiles and saves them as an [`.npz` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) for use in Machine Learning training.
+The tool downloads [OpenStreetMap QA Tile](https://osmlab.github.io/osm-qa-tiles/) information and satellite imagery tiles and saves them as an [`.npz` file](https://docs.scipy.org/doc/numpy/reference/generated/numpy.savez.html) for use in machine learning training.
 
 ![example classification image overlaid over satellite imagery](examples/images/classification.png)
 _satellite imagery from [Mapbox](https://www.mapbox.com/) and [Digital Globe](https://www.digitalglobe.com/)_
diff --git a/examples/README.md b/examples/README.md
index a329845..2d397cc 100644
--- a/examples/README.md
+++ b/examples/README.md
@@ -5,6 +5,7 @@
 - [Creating a Neural Network to Find Populated Areas in Tanzania](walkthrough-classification-aws.md): Build a classifier using Keras on AWS
 - [Creating a building classifier in Vietnam using MXNet and SageMaker](walkthrough-classification-mxnet-sagemaker.md): Build a classifier on AWS SageMaker
 - [A building detector with TensorFlow API](walkthrough-tensorflow-object-detection.md): Use the TensorFlow Object Detection API for detecting buildings in Mexico City.
+- [Preparing data for `skynet-train`](skynet-train-data-prep.md): Convert `label-maker` segmentation output into training files for `skynet-train`
 
 ## Example Nets
 
diff --git a/examples/skynet-train-data-prep.md b/examples/skynet-train-data-prep.md
new file mode 100644
index 0000000..94716ed
--- /dev/null
+++ b/examples/skynet-train-data-prep.md
@@ -0,0 +1,19 @@
+# Using `label-maker` with `skynet-train`
+
+## Background
+
+[`skynet-data`](https://github.com/developmentseed/skynet-data/) is a tool developed specifically to prepare data for [`skynet-train`](https://github.com/developmentseed/skynet-train/), an implementation of [SegNet](http://mi.eng.cam.ac.uk/projects/segnet/). `skynet-data` predates `label-maker` and prepares data in a very similar way: it downloads OpenStreetMap data and satellite imagery tiles for use in machine learning training. `skynet-data` will eventually be deprecated, since most of its functionality can be replicated with `label-maker`.
+
+## Preparing data
+
+`skynet-train` requires a few additional files specific to [`caffe`](https://github.com/BVLC/caffe). To create them, we've written a [utility script](utils/skynet.py) that converts `label-maker` output into the format [`skynet-train`](https://github.com/developmentseed/skynet-train/) expects. First, prepare segmentation labels and images with `label-maker` by running `download`, `labels`, and `images` from the command line, following the instructions in the [other examples](README.md) or the main [README](../README.md). Then, from inside your data folder (the script reads and writes files using relative paths), run the script, adjusting the path to point at your copy of `skynet.py`:
+
+```bash
+python utils/skynet.py
+```
+
+This should create the files needed for running `skynet-train`: `train.txt` and `val.txt` (a random 80/20 split of image/label pairs) and `labels/label-stats.csv` (per-class pixel and image counts). The script also writes grayscale label images to `labels/grayscale` and copies the `tiles` folder to `images`.
+
+## Training
+
+Now mount your data folder at `/data` (the paths written to `train.txt` and `val.txt` assume this location) as shown in the [`skynet-train` instructions](https://github.com/developmentseed/skynet-train/#quick-start), and training should begin.
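The list files hard-code the `/data` prefix. A quick local check that every listed image and label exists can therefore catch path problems before a long training run starts. Below is a minimal sketch. It assumes the `<image path> <label path>` line format written by `skynet.py` and a data folder that will be mounted at `/data`. `missing_files` is a hypothetical helper and is not part of `label-maker` or `skynet-train`.

```python
from pathlib import Path, PurePosixPath


def missing_files(lines, data_root, container_root='/data'):
    """Return container paths from skynet-train list lines that are absent locally.

    Hypothetical helper. Each non-empty line is assumed to be
    "<image path> <label path>", both rooted at container_root (the folder
    mounted inside the skynet-train container). Paths outside container_root
    raise ValueError from relative_to().
    """
    missing = []
    for line in lines:
        # split() on an empty line yields nothing, so blank lines are skipped
        for container_path in line.split():
            # map e.g. /data/images/x.png -> <data_root>/images/x.png
            rel = PurePosixPath(container_path).relative_to(container_root)
            if not (Path(data_root) / rel).exists():
                missing.append(container_path)
    return missing
```

Run from the data folder, `missing_files(open('train.txt').read().splitlines(), '.')` should return an empty list if every image tile and grayscale label was written where the list expects it.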
diff --git a/examples/utils/skynet.py b/examples/utils/skynet.py
new file mode 100644
index 0000000..10ffa62
--- /dev/null
+++ b/examples/utils/skynet.py
@@ -0,0 +1,59 @@
+from os import makedirs, path as op
+from shutil import copytree
+from collections import Counter
+import csv
+
+import numpy as np
+from PIL import Image
+
+# create a greyscale folder for class labelled images
+greyscale_folder = op.join('labels', 'grayscale')
+if not op.isdir(greyscale_folder):
+    makedirs(greyscale_folder)
+labels = np.load('labels.npz')
+
+# write our numpy array labels to images
+# skip empty labels because we don't download images for them
+keys = []  # keys of non-empty labels; a list so it can be shuffled below
+class_freq = Counter()  # total pixel count per class
+image_freq = Counter()  # number of label images containing each class
+for key in labels.files:
+    label = labels[key]
+    if not np.sum(label):
+        continue
+    keys.append(key)
+    label_file = op.join(greyscale_folder, '{}.png'.format(key))
+    img = Image.fromarray(label.astype(np.uint8))
+    print('Writing {}'.format(label_file))
+    img.save(label_file)
+    # get class frequencies
+    unique, counts = np.unique(label, return_counts=True)
+    freq = dict(zip(unique, counts))
+    for k, v in freq.items():
+        class_freq[k] += v
+        image_freq[k] += 1
+
+# copy our tiles to a folder with a different name
+copytree('tiles', 'images')
+
+# shuffle the keys and split them 80/20 into training and validation lists
+np.random.shuffle(keys)
+split_index = int(len(keys) * 0.8)
+
+with open('train.txt', 'w') as train:
+    for key in keys[:split_index]:
+        train.write('/data/images/{}.png /data/labels/grayscale/{}.png\n'.format(key, key))
+
+with open('val.txt', 'w') as val:
+    for key in keys[split_index:]:
+        val.write('/data/images/{}.png /data/labels/grayscale/{}.png\n'.format(key, key))
+
+# write a csv with class frequencies
+freqs = [dict(label=k, frequency=v, image_count=image_freq[k]) for k, v in class_freq.items()]
+with open('labels/label-stats.csv', 'w') as stats:
+    fieldnames = ['label', 'frequency', 'image_count']
+    writer = csv.DictWriter(stats, fieldnames=fieldnames)
+
+    writer.writeheader()
+    for f in freqs:
+        writer.writerow(f)
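For each class, `label-stats.csv` records the total pixel count (`frequency`) and the number of label images containing that class (`image_count`). Median frequency balancing, the class-weighting scheme described in the SegNet paper, needs exactly these two quantities. The sketch below shows one way to turn them into weights. This diff does not show whether `skynet-train` computes its weights this way. The default 256 x 256 label size is an assumption, and `median_frequency_weights` is a hypothetical helper.

```python
from statistics import median


def median_frequency_weights(rows, pixels_per_image=256 * 256):
    """Compute per-class weights by median frequency balancing.

    Hypothetical helper. rows: dicts with 'label', 'frequency' (pixel count)
    and 'image_count' (images containing the class), as in label-stats.csv.
    pixels_per_image: assumed label size (256 x 256 is an assumption).
    """
    freq = {}
    for row in rows:
        pixels = int(row['frequency'])
        images = int(row['image_count'])
        # share of pixels belonging to this class, counted only over the
        # images in which the class actually appears
        freq[row['label']] = pixels / (images * pixels_per_image)
    med = median(freq.values())
    # classes rarer than the median get weights above 1, commoner ones below 1
    return {label: med / f for label, f in freq.items()}
```

Passing it a `csv.DictReader` over `labels/label-stats.csv` yields a `{label: weight}` dict. Dividing by `image_count` rather than by the total number of images keeps a class that appears in only a few tiles from being weighted as if it were absent everywhere else.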