This repository was archived by the owner on Dec 13, 2024. It is now read-only.

Commit 065435b

thisisaaronland committed
update docs for query stuff
1 parent 759655d commit 065435b

2 files changed: +119 -13 lines changed

README.md

Lines changed: 107 additions & 10 deletions
@@ -41,7 +41,7 @@ Usage of ./bin/wof-sqlite-index-features:
   -live-hard-die-fast
       Enable various performance-related pragmas at the expense of possible (unlikely) database corruption (default true)
   -mode string
-      The mode to use importing data. Valid modes are: directory,featurecollection,file,filelist,geojsonl,metafile,repo,sqlite. (default "files")
+      The mode to use importing data. Valid modes are: directory://, featurecollection://, file://, filelist://, geojsonl://, metafile://, repo://, sqlite://. (default "repo://")
   -names
       Index the 'names' table
   -optimize
@@ -76,11 +76,7 @@ $> ./bin/wof-sqlite-index-features \
 	/usr/local/data/whosonfirst-data/meta/wof-microhood-latest.csv
 ```
 
-Note that the `-live-hard-die-fast` flag is enabled by default. That is to enable a number of performance-related PRAGMA commands (described [here](https://blog.devart.com/increasing-sqlite-performance.html) and [here](https://www.gaia-gis.it/gaia-sins/spatialite-cookbook/html/system.html)) without which database indexing can be prohibitively time-consuming. There is a small chance of database corruption when this flag is enabled.
-
-Also note that the `-live-hard-die-fast` flag will cause the `PAGE_SIZE` and `CACHE_SIZE` PRAGMAs to be set to `4096` and `1000000` respectively, so the eventual cache size will require 4GB of memory. This is probably fine on most systems where you'll be indexing data but I am open to the idea that we may need to revisit those numbers or at least make them configurable.
-
-...creating databases for all the Who's On First repos:
+Or creating databases for all the Who's On First repos:
 
 ```
 #!/bin/sh
@@ -107,6 +103,77 @@ do
 done
 ```
 
+#### Inline queries
+
+You can also specify inline queries by passing a `-query` parameter, which is a string in the format of:
+
+```
+{PATH}={REGULAR EXPRESSION}
+```
+
+Paths follow the dot notation syntax used by the [tidwall/gjson](https://github.com/tidwall/gjson) package and regular expressions are any valid [Go language regular expression](https://golang.org/pkg/regexp/). Successful path lookups will be treated as a list of candidates and each candidate's string value will be tested using the regular expression's [MatchString](https://golang.org/pkg/regexp/#Regexp.MatchString) method.
+
+For example:
+
+```
+$> ./bin/wof-sqlite-index-features \
+	-all \
+	-dsn ca-region.db \
+	-query 'properties.wof:placetype=region' \
+	-mode repo:// \
+	/usr/local/data/whosonfirst-data-admin-ca
+
+$> sqlite3 ca-region.db
+
+SQLite version 3.28.0 2019-04-15 14:49:49
+Enter ".help" for usage hints.
+sqlite> SELECT id,name,placetype FROM spr;
+85682057|Ontario|region
+85682117|British Columbia|region
+85682065|New Brunswick|region
+85682123|Newfoundland and Labrador|region
+85682067|Northwest Territories|region
+85682075|Nova Scotia|region
+85682081|Prince Edward Island|region
+85682085|Manitoba|region
+85682091|Alberta|region
+85682095|Yukon|region
+85682113|Saskatchewan|region
+136251273|Quebec|region
+85682105|Nunavut|region
+sqlite>
+
+```
+
+You can pass multiple `-query` parameters. For example:
+
+```
+$> ./bin/wof-sqlite-index-features \
+	-all \
+	-dsn ca-region-new.db \
+	-query 'properties.wof:placetype=region' \
+	-query 'properties.wof:name=(?i)new.*' \
+	-mode repo:// \
+	/usr/local/data/whosonfirst-data-admin-ca
+
+$> sqlite3 ca-region-new.db
+
+SQLite version 3.28.0 2019-04-15 14:49:49
+Enter ".help" for usage hints.
+sqlite> SELECT id,name,placetype FROM spr;
+85682065|New Brunswick|region
+85682123|Newfoundland and Labrador|region
+sqlite>
+```
+
+By default every query must match, but you can require that only one or more queries match by passing the `-query-mode ANY` flag.
+
+#### SQLite performance-related PRAGMA
+
+Note that the `-live-hard-die-fast` flag is enabled by default. That is to enable a number of performance-related PRAGMA commands (described [here](https://blog.devart.com/increasing-sqlite-performance.html) and [here](https://www.gaia-gis.it/gaia-sins/spatialite-cookbook/html/system.html)) without which database indexing can be prohibitively time-consuming. There is a small chance of database corruption when this flag is enabled.
+
+Also note that the `-live-hard-die-fast` flag will cause the `PAGE_SIZE` and `CACHE_SIZE` PRAGMAs to be set to `4096` and `1000000` respectively, so the eventual cache size will require 4GB of memory. This is probably fine on most systems where you'll be indexing data but I am open to the idea that we may need to revisit those numbers or at least make them configurable.
+
 ### wof-sqlite-query-features
 
 Query a search-enabled SQLite database by name(s). Results are output as CSV encoded rows containing `id` and `(wof:)name` properties.
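
Note on the "Inline queries" section added above: the `{PATH}={REGULAR EXPRESSION}` format and the all-versus-any matching rule can be sketched in plain Go. This is only an illustration under stated assumptions, not the package's implementation. The `Query` type, `ParseQuery` and `Matches` are hypothetical names, the `"ALL"` label for the default mode is a guess, and a simple map of path to candidate strings stands in for the real gjson path lookup.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Query pairs a path with a compiled regular expression.
// Hypothetical type; the names are not taken from the package.
type Query struct {
	Path  string
	Match *regexp.Regexp
}

// ParseQuery splits "{PATH}={REGULAR EXPRESSION}" on the first "=" only,
// so the expression itself may contain "=" characters.
func ParseQuery(raw string) (*Query, error) {
	parts := strings.SplitN(raw, "=", 2)
	if len(parts) != 2 || parts[0] == "" {
		return nil, fmt.Errorf("invalid query %q", raw)
	}
	re, err := regexp.Compile(parts[1])
	if err != nil {
		return nil, err
	}
	return &Query{Path: parts[0], Match: re}, nil
}

// Matches reports whether a record satisfies the queries. The record is a
// stand-in for a real path lookup: it maps each path to its candidate
// string values. Mode "ANY" needs at least one matching query; any other
// mode requires every query to match.
func Matches(queries []*Query, mode string, record map[string][]string) bool {
	matched := 0
	for _, q := range queries {
		for _, candidate := range record[q.Path] {
			if q.Match.MatchString(candidate) {
				matched++
				break
			}
		}
	}
	if mode == "ANY" {
		return matched > 0
	}
	return matched == len(queries)
}

func main() {
	raw := []string{
		"properties.wof:placetype=region",
		"properties.wof:name=(?i)new.*",
	}
	queries := make([]*Query, 0, len(raw))
	for _, r := range raw {
		q, err := ParseQuery(r)
		if err != nil {
			panic(err)
		}
		queries = append(queries, q)
	}

	// Ontario is a region but its name does not match "(?i)new.*".
	record := map[string][]string{
		"properties.wof:placetype": {"region"},
		"properties.wof:name":      {"Ontario"},
	}
	fmt.Println(Matches(queries, "ALL", record)) // false
	fmt.Println(Matches(queries, "ANY", record)) // true
}
```

Splitting on the first `=` only keeps expressions such as `a=b` usable as the regular expression part of a query.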
@@ -153,10 +220,35 @@ Full-text search is supported using SQLite's FTS4 indexer. In order to index the
 
 ## Spatial indexes
 
-Yes, if you have the [Spatialite extension](https://www.gaia-gis.it/fossil/libspatialite/index) installed and have indexed the `geometries` table. For example:
+### RTree
+
+RTree indexes are available if SQLite has been compiled with the [R*Tree module](https://www.sqlite.org/rtree.html) and you have indexed the [rtree](https://github.com/whosonfirst/go-whosonfirst-sqlite-features#rtree), [spr](https://github.com/whosonfirst/go-whosonfirst-sqlite-features#spr) and [properties](https://github.com/whosonfirst/go-whosonfirst-sqlite-features#properties) tables. For example:
 
 ```
-$> ./bin/wof-sqlite-index-features -timings -spr -geometries -driver spatialite -mode repo:// -dsn test.db /usr/local/data/whosonfirst-data-constituency-ca/
+$> ./bin/wof-sqlite-index-features \
+	-index-alt-files \
+	-rtree \
+	-spr \
+	-properties \
+	-timings \
+	-dsn /usr/local/ca-alt.db \
+	-mode repo:// \
+	/usr/local/data/whosonfirst-data-admin-ca/
+```
+
+### Spatialite
+
+Spatial indexes are also available if you have the [Spatialite extension](https://www.gaia-gis.it/fossil/libspatialite/index) installed and have indexed the `geometries` table. For example:
+
+```
+$> ./bin/wof-sqlite-index-features \
+	-driver spatialite \
+	-timings \
+	-spr \
+	-geometries \
+	-mode repo:// \
+	-dsn test.db /usr/local/data/whosonfirst-data-constituency-ca/
+
 10:09:46.534281 [wof-sqlite-index-features] STATUS time to index geometries (87) : 21.251828704s
 10:09:46.534379 [wof-sqlite-index-features] STATUS time to index spr (87) : 3.206930799s
 10:09:46.534385 [wof-sqlite-index-features] STATUS time to index all (87) : 24.48004637s
@@ -204,8 +296,8 @@ Indexing time will vary depending on the specifics of your hardware (available R
 
 ```
 $> ./bin/wof-sqlite-index-features \
-	-all \
 	-driver spatialite \
+	-all \
 	-geometries \
 	-dsn /usr/local/data/dist/sqlite/whosonfirst-data-latest.db \
 	-timings \
@@ -228,7 +320,12 @@ $> ./bin/wof-sqlite-index-features \
 And without:
 
 ```
-$> ./bin/wof-sqlite-index-features -all -dsn /usr/local/data/dist/sqlite/whosonfirst-data-latest-nospatial.db -timings -mode repo:// /usr/local/data/whosonfirst-data
+$> ./bin/wof-sqlite-index-features \
+	-all \
+	-dsn /usr/local/data/dist/sqlite/whosonfirst-data-latest-nospatial.db \
+	-timings \
+	-mode repo:// \
+	/usr/local/data/whosonfirst-data
 ...time passes...
 10:06:13.226187 [wof-sqlite-index-features] STATUS time to index names (951541) : 12m32.359733539s
 10:06:13.226206 [wof-sqlite-index-features] STATUS time to index ancestors (951541) : 3m27.294843778s
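
Note on the `-live-hard-die-fast` paragraph moved in this README diff: the "4GB" figure follows from multiplying the two PRAGMA values, since SQLite reads a positive `CACHE_SIZE` as a number of pages. A quick check of that arithmetic, as a sketch only (`CacheBytes` is a hypothetical helper, not part of the tool):

```go
package main

import "fmt"

// CacheBytes estimates the page cache footprint as page size times page
// count, which is how SQLite interprets a positive cache_size value.
// Hypothetical helper for illustration.
func CacheBytes(pageSize, cachePages int64) int64 {
	return pageSize * cachePages
}

func main() {
	b := CacheBytes(4096, 1000000)
	fmt.Printf("%d bytes (%.2f GiB)\n", b, float64(b)/(1<<30))
	// prints "4096000000 bytes (3.81 GiB)"
}
```

So the cache can grow to roughly 4.1 GB in decimal units, or about 3.8 GiB.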

cmd/wof-sqlite-index-features/main.go

Lines changed: 12 additions & 3 deletions
@@ -28,13 +28,22 @@ import (
 
 func main() {
 
-	valid_modes := strings.Join(wof_index.Modes(), ",")
-	desc_modes := fmt.Sprintf("The mode to use importing data. Valid modes are: %s.", valid_modes)
+	valid_modes := make([]string, 0)
+
+	for _, m := range wof_index.Modes() {
+
+		m = fmt.Sprintf("%s://", m)
+		valid_modes = append(valid_modes, m)
+	}
+
+	valid_modes_str := strings.Join(valid_modes, ", ")
+
+	desc_modes := fmt.Sprintf("The mode to use importing data. Valid modes are: %s.", valid_modes_str)
 
 	dsn := flag.String("dsn", ":memory:", "")
 	driver := flag.String("driver", "sqlite3", "")
 
-	mode := flag.String("mode", "files", desc_modes)
+	mode := flag.String("mode", "repo://", desc_modes)
 
 	all := flag.Bool("all", false, "Index all tables (except the 'search' and 'geometries' tables which you need to specify explicitly)")
 	ancestors := flag.Bool("ancestors", false, "Index the 'ancestors' tables")
