This repository was archived by the owner on Dec 13, 2024. It is now read-only.

Commit 9c7e030

author
thisisaaronland
committed
make Go modules work; pull in newly pulled-apart sqlite/features/index packages
1 parent 742718e commit 9c7e030

13 files changed: +295 -222 lines changed

README.md

Lines changed: 210 additions & 0 deletions
@@ -5,3 +5,213 @@ Go package for indexing Who's On First features in SQLite databases.
## Important

This is work in progress and not ready for use yet.

## Tools

### wof-sqlite-index-features

```
./bin/wof-sqlite-index-features -h
Usage of ./bin/wof-sqlite-index-features:
  -all
        Index all tables (except the 'search' and 'geometries' tables which you need to specify explicitly)
  -ancestors
        Index the 'ancestors' tables
  -concordances
        Index the 'concordances' tables
  -driver string
        (default "sqlite3")
  -dsn string
        (default ":memory:")
  -geojson
        Index the 'geojson' table
  -geometries
        Index the 'geometries' table (requires that libspatialite already be installed)
  -live-hard-die-fast
        Enable various performance-related pragmas at the expense of possible (unlikely) database corruption
  -mode string
        The mode to use importing data. Valid modes are: directory,feature,feature-collection,files,geojson-ls,meta,path,repo,sqlite. (default "files")
  -names
        Index the 'names' table
  -processes int
        The number of concurrent processes to index data with (default 16)
  -search
        Index the 'search' table (using SQLite FTS4 full-text indexer)
  -spr
        Index the 'spr' table
  -timings
        Display timings during and after indexing
```

For example:

```
./bin/wof-sqlite-index-features -live-hard-die-fast -dsn microhoods.db -all -mode meta /usr/local/data/whosonfirst-data/meta/wof-microhood-latest.csv
```

See the way we're passing a `-live-hard-die-fast` flag? That is to enable a number of performance-related PRAGMA commands (described [here](https://blog.devart.com/increasing-sqlite-performance.html) and [here](https://www.gaia-gis.it/gaia-sins/spatialite-cookbook/html/system.html)) without which database indexing can be prohibitively time-consuming. There is a small chance of database corruption when this flag is enabled.

Also note that the `-live-hard-die-fast` flag will cause the `PAGE_SIZE` and `CACHE_SIZE` PRAGMAs to be set to `4096` and `1000000` respectively, so the eventual cache size will require about 4GB of memory. This is probably fine on most systems where you'll be indexing data but I am open to the idea that we may need to revisit those numbers or at least make them configurable.
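
The 4GB figure follows from multiplying the page size by the number of cached pages. Here is a minimal sketch of that arithmetic in Go, assuming `CACHE_SIZE` is interpreted as a count of pages (which is how SQLite treats a positive `cache_size` value). The function name `cacheBytes` is hypothetical and not part of this package.

```go
package main

import "fmt"

// cacheBytes returns the approximate memory needed by the SQLite page cache
// when the cache size is expressed as a (positive) number of pages.
// This helper is illustrative only; it is not part of the package.
func cacheBytes(pageSize, cachePages int64) int64 {
	return pageSize * cachePages
}

func main() {
	// Values quoted in the text for the -live-hard-die-fast flag.
	const pageSize, cachePages = 4096, 1000000

	b := cacheBytes(pageSize, cachePages)

	// Report the total in bytes, decimal gigabytes and binary gibibytes.
	fmt.Printf("%d bytes (%.2f GB, %.2f GiB)\n", b, float64(b)/1e9, float64(b)/(1<<30))
}
```

If those values ever become configurable, the same calculation could be used to warn when the requested cache would exceed available memory.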

You can also use `wof-sqlite-index-features` in combination with the [go-whosonfirst-api](https://github.com/whosonfirst/go-whosonfirst-api) `wof-api` tool and populate your SQLite database by piping API results on STDIN. For example, here's how you might index all the neighbourhoods in Montreal:

```
/usr/local/bin/wof-api -param method=whosonfirst.places.getDescendants -param id=101736545 \
-param placetype=neighbourhood -param api_key=mapzen-xxxxxx -geojson-ls | \
/usr/local/bin/wof-sqlite-index-features -dsn neighbourhoods.db -all -mode geojson-ls STDIN
```

Or creating databases for all the Who's On First repos:

```
#!/bin/sh

for REPO in "$@"
do

    if [ ! -d "${REPO}/data" ]
    then
        echo "${REPO} has no data directory"
        continue
    fi

    FNAME=`basename "${REPO}"`
    echo "make db for ${FNAME}"

    # Remove any existing copy of the database that is about to be rebuilt.
    if [ -f "/usr/local/data/whosonfirst-sqlite/${FNAME}-latest.db" ]
    then
        rm "/usr/local/data/whosonfirst-sqlite/${FNAME}-latest.db"
    fi

    ./bin/wof-sqlite-index-features -timings -live-hard-die-fast -all -dsn "/usr/local/data/whosonfirst-sqlite/${FNAME}-latest.db" -mode repo "${REPO}"

done
```

### wof-sqlite-query-features

Query a search-enabled SQLite database by name(s). Results are output as CSV-encoded rows containing `id` and `(wof:)name` properties.

_This assumes you have created the database using the `wof-sqlite-index-features` tool with the `-search` parameter._

```
./bin/wof-sqlite-query-features -h
Usage of ./bin/wof-sqlite-query-features:
  -column string
        The 'names_*' column to query against. Valid columns are: names_all, names_preferred, names_variant, names_colloquial. (default "names_all")
  -driver string
        (default "sqlite3")
  -dsn string
        (default ":memory:")
  -is-ceased string
        A comma-separated list of valid existential flags (-1,0,1) to filter results according to whether or not they have been marked as ceased. Multiple flags are evaluated as a nested 'OR' query.
  -is-current string
        A comma-separated list of valid existential flags (-1,0,1) to filter results according to their 'mz:is_current' property. Multiple flags are evaluated as a nested 'OR' query.
  -is-deprecated string
        A comma-separated list of valid existential flags (-1,0,1) to filter results according to whether or not they have been marked as deprecated. Multiple flags are evaluated as a nested 'OR' query.
  -is-superseded string
        A comma-separated list of valid existential flags (-1,0,1) to filter results according to whether or not they have been marked as superseded. Multiple flags are evaluated as a nested 'OR' query.
  -output string
        A valid path to write (CSV) results to. If empty results are written to STDOUT.
  -table string
        The name of the SQLite table to query against. (default "search")
```

For example:

```
./bin/wof-sqlite-query-features -dsn test2.db JFK
102534365,John F Kennedy Int'l Airport

./bin/wof-sqlite-query-features -dsn test2.db -column names_colloquial Paris
85922583,San Francisco
102027181,Shanghai
102030585,Kolkata
101751929,Tromsø
```

Full-text search is supported using SQLite's FTS4 indexer. In order to index the `search` table you must explicitly pass the `-search` flag to the `wof-sqlite-index-features` command. It is _not_ included when you set the `-all` flag (which should probably be renamed to be `-common` but that's not the case today...) because it increases the overall indexing time by a non-trivial amount.
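
The `-is-current`, `-is-ceased`, `-is-deprecated` and `-is-superseded` flags of `wof-sqlite-query-features` each take a comma-separated list of existential flags that are evaluated as a nested 'OR' query. Here is a minimal sketch in Go of how such a list might be turned into a SQL clause. The function name `existentialClause` and the `is_current` column name are assumptions made for illustration and are not taken from the tool's source.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// existentialClause turns a comma-separated list of existential flags
// (-1, 0 or 1) into a parenthesised OR clause plus its bind arguments.
// The column name is supplied by the caller and is hypothetical here;
// the real tool may build its queries differently.
func existentialClause(column, flags string) (string, []interface{}, error) {
	if strings.TrimSpace(flags) == "" {
		// No filter requested.
		return "", nil, nil
	}

	var conds []string
	var args []interface{}

	for _, raw := range strings.Split(flags, ",") {
		v, err := strconv.Atoi(strings.TrimSpace(raw))
		if err != nil || v < -1 || v > 1 {
			return "", nil, fmt.Errorf("invalid existential flag %q", raw)
		}
		conds = append(conds, column+" = ?")
		args = append(args, v)
	}

	return "(" + strings.Join(conds, " OR ") + ")", args, nil
}

func main() {
	clause, args, err := existentialClause("is_current", "1,-1")
	if err != nil {
		panic(err)
	}
	// The clause would be joined to the rest of a WHERE expression with AND.
	fmt.Println(clause, args)
}
```

Using bind arguments rather than interpolating the values keeps the flag values out of the SQL text, although the column name itself must still come from a trusted list.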

## Spatial indexes

Spatial queries are supported if you have the [Spatialite extension](https://www.gaia-gis.it/fossil/libspatialite/index) installed and have indexed the `geometries` table. For example:

```
> ./bin/wof-sqlite-index-features -timings -live-hard-die-fast -spr -geometries -driver spatialite -mode repo -dsn test.db /usr/local/data/whosonfirst-data-constituency-ca/
10:09:46.534281 [wof-sqlite-index-features] STATUS time to index geometries (87) : 21.251828704s
10:09:46.534379 [wof-sqlite-index-features] STATUS time to index spr (87) : 3.206930799s
10:09:46.534385 [wof-sqlite-index-features] STATUS time to index all (87) : 24.48004637s

> sqlite3 test.db
SQLite version 3.21.0 2017-10-24 18:55:49
Enter ".help" for usage hints.

sqlite> SELECT load_extension('mod_spatialite.dylib');
sqlite> SELECT s.id, s.name FROM spr s, geometries g WHERE ST_Intersects(g.geom, GeomFromText('POINT(-122.229137 49.450129)', 4326)) AND g.id = s.id;
1108962831|Maple Ridge-Pitt Meadows
```

Or:

```
> spatialite whosonfirst-data-latest.db
SpatiaLite version ..: 4.1.1 Supported Extensions:
...spatialite chatter goes here...
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"

spatialite> SELECT s.id, s.name FROM spr AS s, geometries AS g1, geometries AS g2 WHERE g1.id = 85834637 AND s.placetype = 'neighbourhood' AND g2.id = s.id AND ST_Touches(g1.geom, g2.geom) AND g2.ROWID IN (SELECT ROWID FROM SpatialIndex WHERE f_table_name = 'geometries' AND search_frame=g2.geom);
102112179|La Lengua
1108831803|Showplace Square

spatialite> SELECT s.id, s.name FROM spr AS s, geometries AS g1, geometries AS g2 WHERE g1.id != g2.id AND g1.id = 85865959 AND s.placetype = 'neighbourhood' AND s.is_current=1 AND g2.id = s.id AND (ST_Touches(g1.geom, g2.geom) OR ST_Intersects(g1.geom, g2.geom)) AND g2.ROWID IN (SELECT ROWID FROM SpatialIndex WHERE f_table_name = 'geometries' AND search_frame=g2.geom);
1108831807|Fairmount
85814471|Diamond Heights
85869221|Eureka Valley

SELECT s.id, s.name, s.is_current FROM spr AS s, geometries AS g1, geometries AS g2 WHERE g1.id != g2.id AND g1.id = 102061079 AND s.placetype = 'neighbourhood' AND g2.id = s.id AND (ST_Touches(g1.geom, g2.geom) OR ST_Intersects(g1.geom, g2.geom)) AND g2.ROWID IN (SELECT ROWID FROM SpatialIndex WHERE f_table_name = 'geometries' AND search_frame=g2.geom);
85892915|BoCoCa|0
85869125|Boerum Hill|1
420782915|Carroll Gardens|1
85865587|Gowanus|1
```

_Remember: When indexing geometries you will need to explicitly pass both the `-geometries` and `-driver spatialite` flags, even if you are already passing in the `-all` flag. This is so `-all` will continue to work as expected for people who don't have Spatialite installed on their computer._
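
The flag rule above can be expressed as a small validation step. This is a minimal sketch in Go using the standard `flag` package. The `options` type and `parseOptions` function are hypothetical, and nothing here claims that the real tool performs this check; it only shows one way to reject `-geometries` when the driver is not `spatialite`.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
)

// options holds the subset of flags discussed in the note above.
type options struct {
	all        bool
	geometries bool
	driver     string
}

// parseOptions is a hypothetical helper: it parses the three flags and
// enforces that -geometries is only used together with -driver spatialite.
func parseOptions(args []string) (*options, error) {
	fs := flag.NewFlagSet("wof-sqlite-index-features", flag.ContinueOnError)

	opts := &options{}
	fs.BoolVar(&opts.all, "all", false, "Index all tables (except 'search' and 'geometries')")
	fs.BoolVar(&opts.geometries, "geometries", false, "Index the 'geometries' table")
	fs.StringVar(&opts.driver, "driver", "sqlite3", "")

	if err := fs.Parse(args); err != nil {
		return nil, err
	}

	// -all deliberately does not imply -geometries, so check the pairing.
	if opts.geometries && opts.driver != "spatialite" {
		return nil, errors.New("-geometries requires -driver spatialite")
	}

	return opts, nil
}

func main() {
	opts, err := parseOptions([]string{"-all", "-geometries", "-driver", "spatialite"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("all=%t geometries=%t driver=%s\n", opts.all, opts.geometries, opts.driver)
}
```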

## Indexing

Indexing time will vary depending on the specifics of your hardware (available RAM, CPU, disk I/O) but as a rule building indexes with the `geometries` table will take longer, and create a larger database, than doing so without. For example, indexing the [whosonfirst-data](https://github.com/whosonfirst-data/whosonfirst-data) repository with spatial indexes:

```
> ./bin/wof-sqlite-index-features -all -driver spatialite -geometries -dsn /usr/local/data/dist/sqlite/whosonfirst-data-latest.db -live-hard-die-fast -timings -mode repo /usr/local/data/whosonfirst-data
...time passes...
06:12:51.274132 [wof-sqlite-index-features] STATUS time to index geojson (951541) : 13m41.994217581s
06:12:51.274158 [wof-sqlite-index-features] STATUS time to index spr (951541) : 13m0.21007633s
06:12:51.274173 [wof-sqlite-index-features] STATUS time to index names (951541) : 17m50.759093941s
06:12:51.274178 [wof-sqlite-index-features] STATUS time to index ancestors (951541) : 3m37.431723948s
06:12:51.274182 [wof-sqlite-index-features] STATUS time to index concordances (951541) : 2m36.737857568s
06:12:51.274187 [wof-sqlite-index-features] STATUS time to index geometries (951541) : 43m48.39054903s
06:12:51.274192 [wof-sqlite-index-features] STATUS time to index all (951541) : 4h41m45.492361401s

> du -h /usr/local/data/dist/sqlite/whosonfirst-data-latest.db
15G /usr/local/data/dist/sqlite/whosonfirst-data-latest.db
```

And without:

```
> ./bin/wof-sqlite-index-features -all -dsn /usr/local/data/dist/sqlite/whosonfirst-data-latest-nospatial.db -live-hard-die-fast -timings -mode repo /usr/local/data/whosonfirst-data
...time passes...
10:06:13.226187 [wof-sqlite-index-features] STATUS time to index names (951541) : 12m32.359733539s
10:06:13.226206 [wof-sqlite-index-features] STATUS time to index ancestors (951541) : 3m27.294843778s
10:06:13.226212 [wof-sqlite-index-features] STATUS time to index concordances (951541) : 2m5.947968206s
10:06:13.226220 [wof-sqlite-index-features] STATUS time to index geojson (951541) : 10m11.355455209s
10:06:13.226226 [wof-sqlite-index-features] STATUS time to index spr (951541) : 11m32.687081163s
10:06:13.226233 [wof-sqlite-index-features] STATUS time to index all (951541) : 3h43m20.687783762s

> du -h /usr/local/data/dist/sqlite/whosonfirst-data-latest-nospatial.db
12G /usr/local/data/dist/sqlite/whosonfirst-data-latest-nospatial.db
```
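
To put a number on the difference, the two "time to index all" values can be subtracted directly, since they are already in the format that Go's `time.ParseDuration` accepts. This is a small sketch. The helper name `overhead` is made up for this example, and the figures are copied from the two runs above rather than measured again.

```go
package main

import (
	"fmt"
	"time"
)

// overhead returns how much longer the first run took than the second.
// Both arguments are duration strings as printed by the -timings flag.
func overhead(with, without string) (time.Duration, error) {
	a, err := time.ParseDuration(with)
	if err != nil {
		return 0, err
	}
	b, err := time.ParseDuration(without)
	if err != nil {
		return 0, err
	}
	return a - b, nil
}

func main() {
	// "time to index all" with and without the geometries table.
	d, err := overhead("4h41m45.492361401s", "3h43m20.687783762s")
	if err != nil {
		panic(err)
	}
	fmt.Println("extra time with geometries:", d.Round(time.Second))
}
```

On this one pair of runs the spatial build took a little under an hour longer and produced a database about 3G larger (15G versus 12G).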

As of this writing, individual tables are indexed atomically. There may be some improvements to be made by indexing tables in separate goroutines, but my hunch is this will make SQLite sad and cause a lot of table lock errors. I don't need to be right about that, though...
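
For readers curious what the goroutine-per-table idea might look like, here is a minimal sketch in Go. It does not touch SQLite at all: the `table` and `indexer` types are hypothetical stand-ins, and a mutex is used to serialise the simulated write step, on the assumption that only one writer can safely modify the database at a time. Whether this would actually be faster, or avoid lock errors, with a real database is untested.

```go
package main

import (
	"fmt"
	"sync"
)

// table is a hypothetical stand-in for one SQLite table indexer.
type table struct {
	name string
	rows []string
}

// indexer fans each record out to all of its tables.
type indexer struct {
	mu     sync.Mutex // serialises the simulated write step
	tables []*table
}

// IndexRecord prepares a row for every table concurrently, then takes the
// shared lock before "writing" it, so only one write happens at a time.
func (ix *indexer) IndexRecord(record string) {
	var wg sync.WaitGroup

	for _, t := range ix.tables {
		wg.Add(1)
		go func(t *table) {
			defer wg.Done()

			// Per-table preparation can overlap across goroutines.
			row := t.name + ":" + record

			// The write itself is serialised.
			ix.mu.Lock()
			defer ix.mu.Unlock()
			t.rows = append(t.rows, row)
		}(t)
	}

	wg.Wait()
}

func main() {
	ix := &indexer{tables: []*table{{name: "spr"}, {name: "names"}, {name: "ancestors"}}}

	for _, id := range []string{"85834637", "85865959"} {
		ix.IndexRecord(id)
	}

	for _, t := range ix.tables {
		fmt.Println(t.name, len(t.rows))
	}
}
```

Because the write step is still serialised, any gain would have to come from the per-table preparation work, which is consistent with the caution expressed above.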

go.mod

Lines changed: 4 additions & 3 deletions
@@ -2,10 +2,11 @@ module github.com/whosonfirst/go-whosonfirst-sqlite-features-index
 require (
 	github.com/whosonfirst/go-whosonfirst-geojson-v2 v0.10.2
-	github.com/whosonfirst/go-whosonfirst-index v0.1.1
+	github.com/whosonfirst/go-whosonfirst-index v0.1.2
 	github.com/whosonfirst/go-whosonfirst-log v0.1.0
-	github.com/whosonfirst/go-whosonfirst-sqlite v0.0.2
-	github.com/whosonfirst/go-whosonfirst-sqlite-features v0.1.0
+	github.com/whosonfirst/go-whosonfirst-sqlite v0.1.0
+	github.com/whosonfirst/go-whosonfirst-sqlite-features v0.2.0
+	github.com/whosonfirst/go-whosonfirst-sqlite-index v0.0.1
 	github.com/whosonfirst/warning v0.1.0
 )

go.sum

Lines changed: 8 additions & 0 deletions
@@ -65,6 +65,8 @@ github.com/whosonfirst/go-whosonfirst-hash v0.1.0 h1:FpnclPIb+8M1uhSXfl3z8nYcG/3
 github.com/whosonfirst/go-whosonfirst-hash v0.1.0/go.mod h1:1ZdCFZTnQt5bwnsj2daB9yHilKOKToVh+Tyj/Z8TbUk=
 github.com/whosonfirst/go-whosonfirst-index v0.1.1 h1:AV2dVzt0F9pAupbpsl4TpBxZLeOqL4jtKZKb1gAoJT0=
 github.com/whosonfirst/go-whosonfirst-index v0.1.1/go.mod h1:vgUaNF7Y7gFrqQ67UTkkMioXdGNUY4KpSlqhGy46wfg=
+github.com/whosonfirst/go-whosonfirst-index v0.1.2 h1:pm/NY4O21sN7PPrOsZNfuv7kX2/Yi+YzrH+piE2vNgA=
+github.com/whosonfirst/go-whosonfirst-index v0.1.2/go.mod h1:SfFN3GjmpS5TQK4mhvEcH+OfTSTWGLY8B6y48SmjLmQ=
 github.com/whosonfirst/go-whosonfirst-log v0.1.0 h1:mWYI5hn16uyeLxBmPsLSvYV4rQKK/cxGVhM+bC2ZoGc=
 github.com/whosonfirst/go-whosonfirst-log v0.1.0/go.mod h1:pmgBbxZSnjGVy2nsUJBBMcFagxwIKLlmRsW7ClkXmac=
 github.com/whosonfirst/go-whosonfirst-names v0.1.0 h1:uXop/DwQqH60uDBZvHCPg1yRSQLScbm6VZyqcaED2KE=
@@ -77,8 +79,14 @@ github.com/whosonfirst/go-whosonfirst-spr v0.1.0 h1:5qE629nCiucF2upy5NjPOEl9cFat
 github.com/whosonfirst/go-whosonfirst-spr v0.1.0/go.mod h1:R8GtEVz1GVSnwwOjzcoVUd172ZK26Q7hQSLI6SGG7lM=
 github.com/whosonfirst/go-whosonfirst-sqlite v0.0.2 h1:Xwscl5pHMaPzo74j7Dp9Io/H7z8HTmWTcE82AzTXIKs=
 github.com/whosonfirst/go-whosonfirst-sqlite v0.0.2/go.mod h1:JmSK+NaXOzmZJXkzOdy2mHwMJvAbUzKw//B3dVr98H0=
+github.com/whosonfirst/go-whosonfirst-sqlite v0.1.0 h1:Wx6DHzS8i/TNqOrVvmXqbpaYttvqlNZeSs/tXBUqFjI=
+github.com/whosonfirst/go-whosonfirst-sqlite v0.1.0/go.mod h1:mm4RnFLe1ydCn1sItwU+Jfy2SYTHNp2zMSZasmM3/1M=
 github.com/whosonfirst/go-whosonfirst-sqlite-features v0.1.0 h1:jWlWehPZKgtkOxl5bzH9lXrMBwmFLWG66HOba394vlM=
 github.com/whosonfirst/go-whosonfirst-sqlite-features v0.1.0/go.mod h1:CknYgZec4AQSSZ/7juJXQ8MTIRwQ1boE/tcClou17Xg=
+github.com/whosonfirst/go-whosonfirst-sqlite-features v0.2.0 h1:vXt4A15xNqiHSL50DqrhddO2ZJdgybNcfNHQ9c3LZtw=
+github.com/whosonfirst/go-whosonfirst-sqlite-features v0.2.0/go.mod h1:CknYgZec4AQSSZ/7juJXQ8MTIRwQ1boE/tcClou17Xg=
+github.com/whosonfirst/go-whosonfirst-sqlite-index v0.0.1 h1:/GuAgxIHn5vZwAEDA4XUSWe7rYWZW30I4HLQRTpBeJY=
+github.com/whosonfirst/go-whosonfirst-sqlite-index v0.0.1/go.mod h1:5fY1ikCDhe1u0umawonTOICPzv/2CgfFT9hTAen5nsI=
 github.com/whosonfirst/go-whosonfirst-uri v0.1.0 h1:JMlpam0x1hVrFBMTAPY3edIHz7azfMK8lLI2kM9BgbI=
 github.com/whosonfirst/go-whosonfirst-uri v0.1.0/go.mod h1:8eaDVcc4v+HHHEDaRbApdmhPwM4/JQllw2PktvZcPVs=
 github.com/whosonfirst/walk v0.0.0-20160802000000-c0a349674b73681a7272f5ce6ade8ea28055059f h1:hvKIIx2IuWmRtOdpDk29quD+t7GowpHZxz8bCfIGE58=

index.go

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ import (
 	"github.com/whosonfirst/go-whosonfirst-geojson-v2/feature"
 	wof_index "github.com/whosonfirst/go-whosonfirst-index"
 	"github.com/whosonfirst/go-whosonfirst-sqlite"
-	sql_index "github.com/whosonfirst/go-whosonfirst-sqlite/index"
+	sql_index "github.com/whosonfirst/go-whosonfirst-sqlite-index"
 	"github.com/whosonfirst/warning"
 	"io"
 	"io/ioutil"

vendor/github.com/whosonfirst/go-whosonfirst-index/Makefile

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

vendor/github.com/whosonfirst/go-whosonfirst-index/go.mod

Lines changed: 1 addition & 7 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

vendor/github.com/whosonfirst/go-whosonfirst-index/go.sum

Lines changed: 3 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

vendor/github.com/whosonfirst/go-whosonfirst-sqlite/Makefile

Lines changed: 0 additions & 41 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

vendor/github.com/whosonfirst/go-whosonfirst-sqlite/go.mod

Lines changed: 11 additions & 0 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.
