This repository was archived by the owner on Dec 13, 2024. It is now read-only.

Go package for indexing Who's On First features in SQLite databases.

## Important

This is work in progress and not ready for use yet.

## Tools

### wof-sqlite-index-features

```
./bin/wof-sqlite-index-features -h
Usage of ./bin/wof-sqlite-index-features:
  -all
        Index all tables (except the 'search' and 'geometries' tables which you need to specify explicitly)
  -ancestors
        Index the 'ancestors' tables
  -concordances
        Index the 'concordances' tables
  -driver string
         (default "sqlite3")
  -dsn string
         (default ":memory:")
  -geojson
        Index the 'geojson' table
  -geometries
        Index the 'geometries' table (requires that libspatialite already be installed)
  -live-hard-die-fast
        Enable various performance-related pragmas at the expense of possible (unlikely) database corruption
  -mode string
        The mode to use importing data. Valid modes are: directory,feature,feature-collection,files,geojson-ls,meta,path,repo,sqlite. (default "files")
  -names
        Index the 'names' table
  -processes int
        The number of concurrent processes to index data with (default 16)
  -search
        Index the 'search' table (using SQLite FTS4 full-text indexer)
  -spr
        Index the 'spr' table
  -timings
        Display timings during and after indexing
```

For example:

```
./bin/wof-sqlite-index-features -live-hard-die-fast -dsn microhoods.db -all -mode meta /usr/local/data/whosonfirst-data/meta/wof-microhood-latest.csv
```

See the way we're passing a `-live-hard-die-fast` flag? That enables a number of performance-related PRAGMA commands (described [here](https://blog.devart.com/increasing-sqlite-performance.html) and [here](https://www.gaia-gis.it/gaia-sins/spatialite-cookbook/html/system.html)) without which indexing the database can be prohibitively slow. There is a small chance of database corruption when this flag is enabled.
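As a rough illustration of what a flag like `-live-hard-die-fast` does under the hood, here is a minimal Go sketch that applies tuning PRAGMAs through `database/sql`. The page and cache values are the `4096` and `1000000` this README gives for the flag; `synchronous=OFF` and `journal_mode=MEMORY` are assumptions drawn from common SQLite tuning advice and may not match what the tool actually sets. `performancePragmas` and `applyPragmas` are hypothetical names.

```go
package main

import (
	"database/sql"
	"fmt"
)

// performancePragmas returns tuning statements in the spirit of
// -live-hard-die-fast. page_size and cache_size use the values documented
// in this README; synchronous and journal_mode are assumptions.
func performancePragmas(pageSize, cacheSize int) []string {
	return []string{
		// page_size only takes effect before the database file has any
		// content (or after a VACUUM), so it goes first.
		fmt.Sprintf("PRAGMA page_size=%d", pageSize),
		fmt.Sprintf("PRAGMA cache_size=%d", cacheSize),
		// Assumption: skip fsync calls. Faster, but an OS crash or power
		// loss can corrupt the database.
		"PRAGMA synchronous=OFF",
		// Assumption: keep the rollback journal in memory. A crash in the
		// middle of a transaction can corrupt the database.
		"PRAGMA journal_mode=MEMORY",
	}
}

// applyPragmas runs each statement on an open handle. Opening the handle
// needs a registered SQLite driver (for example a blank import of
// github.com/mattn/go-sqlite3, which registers "sqlite3"); not shown here.
func applyPragmas(db *sql.DB, stmts []string) error {
	// Most of these PRAGMAs only affect the connection that ran them, so pin
	// the pool to a single connection (call this right after sql.Open).
	db.SetMaxOpenConns(1)
	for _, stmt := range stmts {
		if _, err := db.Exec(stmt); err != nil {
			return fmt.Errorf("%s: %w", stmt, err)
		}
	}
	return nil
}

func main() {
	for _, stmt := range performancePragmas(4096, 1000000) {
		fmt.Println(stmt)
	}
}
```

Pinning the pool matters because `database/sql` hands statements to whichever pooled connection is free; `page_size` is the exception, since it is a property of the database file itself.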

Also note that the `-live-hard-die-fast` flag will cause the `PAGE_SIZE` and `CACHE_SIZE` PRAGMAs to be set to `4096` and `1000000` respectively, so the cache may eventually use about 4GB of memory (4096 bytes per page × 1,000,000 pages). This is probably fine on most systems where you'll be indexing data, but I am open to the idea that we may need to revisit those numbers or at least make them configurable.
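The memory figure is simple arithmetic: with a positive `cache_size`, SQLite's page cache can hold up to that many pages of `page_size` bytes each. The sketch below works it through; `cachePagesForBudget` is a hypothetical helper showing one way the numbers could be made configurable, by deriving a page count from a memory budget.

```go
package main

import "fmt"

// cacheBytes is the approximate upper bound on page-cache memory when
// cache_size is a positive page count: page_size * cache_size.
func cacheBytes(pageSize, cachePages int64) int64 {
	return pageSize * cachePages
}

// cachePagesForBudget (hypothetical) converts a memory budget in bytes into
// a cache_size page count for a given page size.
func cachePagesForBudget(pageSize, budgetBytes int64) int64 {
	return budgetBytes / pageSize
}

func main() {
	b := cacheBytes(4096, 1000000)
	fmt.Printf("%d bytes (%.2f GB, %.2f GiB)\n", b, float64(b)/1e9, float64(b)/(1<<30))
	// Page count for a 1 GiB budget at the same page size.
	fmt.Println(cachePagesForBudget(4096, 1<<30))
}
```

4096 × 1,000,000 = 4,096,000,000 bytes, roughly 4.1 GB (about 3.8 GiB). SQLite also accepts a negative `cache_size`, which it treats as an approximate budget in KiB rather than a page count, another way to express a memory limit directly.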

You can also use `wof-sqlite-index-features` in combination with the [go-whosonfirst-api](https://github.com/whosonfirst/go-whosonfirst-api) `wof-api` tool to populate your SQLite database by piping API results on STDIN. For example, here's how you might index all the neighbourhoods in Montreal:

Query a search-enabled SQLite database by name(s). Results are output as CSV-encoded rows containing `id` and `(wof:)name` properties.

_This assumes you have created the database using the `wof-sqlite-index-features` tool with the `-search` parameter._

```
./bin/wof-sqlite-query-features -h
Usage of ./bin/wof-sqlite-query-features:
  -column string
        The 'names_*' column to query against. Valid columns are: names_all, names_preferred, names_variant, names_colloquial. (default "names_all")
  -driver string
         (default "sqlite3")
  -dsn string
         (default ":memory:")
  -is-ceased string
        A comma-separated list of valid existential flags (-1,0,1) to filter results according to whether or not they have been marked as ceased. Multiple flags are evaluated as a nested 'OR' query.
  -is-current string
        A comma-separated list of valid existential flags (-1,0,1) to filter results according to their 'mz:is_current' property. Multiple flags are evaluated as a nested 'OR' query.
  -is-deprecated string
        A comma-separated list of valid existential flags (-1,0,1) to filter results according to whether or not they have been marked as deprecated. Multiple flags are evaluated as a nested 'OR' query.
  -is-superseded string
        A comma-separated list of valid existential flags (-1,0,1) to filter results according to whether or not they have been marked as superseded. Multiple flags are evaluated as a nested 'OR' query.
  -output string
        A valid path to write (CSV) results to. If empty results are written to STDOUT.
  -table string
        The name of the SQLite table to query against. (default "search")
```
119
+
120
+
For example:
121
+
122
+
```
123
+
./bin/wof-sqlite-query-features -dsn test2.db JFK
124
+
102534365,John F Kennedy Int'l Airport
125
+
126
+
./bin/wof-sqlite-query-features -dsn test2.db -column names_colloquial Paris
127
+
85922583,San Francisco
128
+
102027181,Shanghai
129
+
102030585,Kolkata
130
+
101751929,Tromsø
131
+
```

Full-text search is supported using SQLite's FTS4 indexer. To index the `search` table you must explicitly pass the `-search` flag to the `wof-sqlite-index-features` command. It is _not_ included when you set the `-all` flag (which should probably be renamed `-common`, but that's not the case today...) because it increases the overall indexing time by a non-trivial amount.
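To make the `-column` and `-is-*` flags concrete, here is a hedged Go sketch of how such options could map onto a parameterised FTS4 query. It is not the tool's implementation: the `id`, `name` and `is_current` column names in the `search` table are assumptions, and only `-is-current` is shown (the other existential flags would follow the same pattern). FTS4 lets you name a column on the left of `MATCH` to restrict the search to that column.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// validColumns mirrors the values the -column flag accepts.
var validColumns = map[string]bool{
	"names_all": true, "names_preferred": true,
	"names_variant": true, "names_colloquial": true,
}

// parseExistentialFlags turns "1,-1" into []int{1, -1}, rejecting anything
// outside -1, 0, 1.
func parseExistentialFlags(s string) ([]int, error) {
	if s == "" {
		return nil, nil
	}
	var out []int
	for _, part := range strings.Split(s, ",") {
		v, err := strconv.Atoi(strings.TrimSpace(part))
		if err != nil || v < -1 || v > 1 {
			return nil, fmt.Errorf("invalid existential flag %q", part)
		}
		out = append(out, v)
	}
	return out, nil
}

// buildSearchQuery (hypothetical) returns SQL plus bind arguments. The
// table name is assumed trusted; the column is validated because
// identifiers cannot be bound as parameters.
func buildSearchQuery(table, column, term string, isCurrent []int) (string, []any, error) {
	if !validColumns[column] {
		return "", nil, fmt.Errorf("invalid column %q", column)
	}
	q := fmt.Sprintf("SELECT id, name FROM %s WHERE %s MATCH ?", table, column)
	args := []any{term}
	if len(isCurrent) > 0 {
		// Multiple flags become a nested OR: (is_current = ? OR is_current = ?).
		clauses := make([]string, len(isCurrent))
		for i, v := range isCurrent {
			clauses[i] = "is_current = ?"
			args = append(args, v)
		}
		q += " AND (" + strings.Join(clauses, " OR ") + ")"
	}
	return q, args, nil
}

func main() {
	flags, err := parseExistentialFlags("1,-1")
	if err != nil {
		panic(err)
	}
	q, args, err := buildSearchQuery("search", "names_all", "JFK", flags)
	if err != nil {
		panic(err)
	}
	fmt.Println(q)
	fmt.Println(args...)
}
```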

## Spatial indexes

Yes, if you have the [Spatialite extension](https://www.gaia-gis.it/fossil/libspatialite/index) installed and have indexed the `geometries` table. For example:

```
sqlite> SELECT s.id, s.name FROM spr s, geometries g WHERE ST_Intersects(g.geom, GeomFromText('POINT(-122.229137 49.450129)', 4326)) AND g.id = s.id;
1108962831|Maple Ridge-Pitt Meadows
```
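A common pitfall with point-in-polygon queries like this one is coordinate order: Well-Known Text puts longitude (X) before latitude (Y). Here is a minimal Go sketch that formats the point and binds it as a parameter rather than splicing it into the SQL string. `pointWKT` and `pointInPolygonQuery` are hypothetical names, and actually running the query assumes a `database/sql` handle with Spatialite loaded (not shown).

```go
package main

import (
	"fmt"
	"strconv"
)

// pointWKT formats a WGS84 coordinate as WKT: X (longitude) first, then
// Y (latitude). FormatFloat with precision -1 keeps the shortest exact form.
func pointWKT(lon, lat float64) string {
	return "POINT(" + strconv.FormatFloat(lon, 'f', -1, 64) + " " +
		strconv.FormatFloat(lat, 'f', -1, 64) + ")"
}

// pointInPolygonQuery is the query above with the point as a bind parameter.
const pointInPolygonQuery = `SELECT s.id, s.name FROM spr s, geometries g
WHERE ST_Intersects(g.geom, GeomFromText(?, 4326)) AND g.id = s.id`

func main() {
	wkt := pointWKT(-122.229137, 49.450129)
	fmt.Println(wkt)
	// With a Spatialite-enabled *sql.DB this would run as:
	//   rows, err := db.Query(pointInPolygonQuery, wkt)
	fmt.Println(pointInPolygonQuery)
}
```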

Or:

```
> spatialite whosonfirst-data-latest.db
SpatiaLite version ..: 4.1.1 Supported Extensions:
...spatialite chatter goes here...
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"

spatialite> SELECT s.id, s.name FROM spr AS s, geometries AS g1, geometries AS g2 WHERE g1.id = 85834637 AND s.placetype = 'neighbourhood' AND g2.id = s.id AND ST_Touches(g1.geom, g2.geom) AND g2.ROWID IN (SELECT ROWID FROM SpatialIndex WHERE f_table_name = 'geometries' AND search_frame=g2.geom);
102112179|La Lengua
1108831803|Showplace Square

spatialite> SELECT s.id, s.name FROM spr AS s, geometries AS g1, geometries AS g2 WHERE g1.id != g2.id AND g1.id = 85865959 AND s.placetype = 'neighbourhood' AND s.is_current=1 AND g2.id = s.id AND (ST_Touches(g1.geom, g2.geom) OR ST_Intersects(g1.geom, g2.geom)) AND g2.ROWID IN (SELECT ROWID FROM SpatialIndex WHERE f_table_name = 'geometries' AND search_frame=g2.geom);
1108831807|Fairmount
85814471|Diamond Heights
85869221|Eureka Valley

SELECT s.id, s.name, s.is_current FROM spr AS s, geometries AS g1, geometries AS g2 WHERE g1.id != g2.id AND g1.id = 102061079 AND s.placetype = 'neighbourhood' AND g2.id = s.id AND (ST_Touches(g1.geom, g2.geom) OR ST_Intersects(g1.geom, g2.geom)) AND g2.ROWID IN (SELECT ROWID FROM SpatialIndex WHERE f_table_name = 'geometries' AND search_frame=g2.geom);
85892915|BoCoCa|0
85869125|Boerum Hill|1
420782915|Carroll Gardens|1
85865587|Gowanus|1
```

_Remember: When indexing geometries you will need to explicitly pass both the `-geometries` and `-driver spatialite` flags, even if you are already passing the `-all` flag. This is so `-all` will continue to work as expected for people who don't have Spatialite installed on their computer._

## Indexing

Indexing time will vary depending on the specifics of your hardware (available RAM, CPU, disk I/O) but, as a rule, building indexes with the `geometries` table will take longer, and create a larger database, than doing so without. For example, indexing the [whosonfirst-data](https://github.com/whosonfirst-data/whosonfirst-data) repository with spatial indexes:

As of this writing, individual tables are indexed atomically. There may be some improvements to be made by indexing tables in separate goroutines, but my hunch is this will make SQLite sad and cause a lot of table lock errors. I don't need to be right about that, though...