Commit 6b6a885

Distributed cache with redis (#135)
* refine abstractions for hexagonal architecture of caches
* implement redis as a cache (#6)
* add timeout to each redis interaction
* Do not fail on missing entry from cache (#10)
* feat(metrics): monitor cache hit and miss after awaiting concurrently executed queries (#12)
* bump storage version
* doc: upadate README
* chore(README): Vertamedia to Contentsquare

Co-authored-by: Francois Milhem <[email protected]>
1 parent 714292d commit 6b6a885

40 files changed: +2150 -1354 lines changed

Dockerfile

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
-FROM golang:1.13-alpine AS build
+FROM golang:1.16-alpine AS build
 
 RUN apk add --update zstd-static zstd-dev make gcc musl-dev git
 RUN go get golang.org/x/lint/golint

Makefile

Lines changed: 1 addition & 1 deletion
@@ -39,8 +39,8 @@ clean:
 release-build:
 	@echo "Ver: $(BUILD_TAG), OPTS: $(BUILD_OPTS)"
 	GOOS=linux GOARCH=amd64 go build $(BUILD_OPTS)
-	rm chproxy-linux-amd64-*.tar.gz
 	tar czf chproxy-linux-amd64-$(BUILD_TAG).tar.gz chproxy
+	rm chproxy-linux-amd64-*.tar.gz
 
 release: format lint test clean release-build
 	@echo "Ver: $(BUILD_TAG), OPTS: $(BUILD_OPTS)"

README.md

Lines changed: 69 additions & 34 deletions
@@ -1,6 +1,6 @@
1-
[![Go Report Card](https://goreportcard.com/badge/github.com/Vertamedia/chproxy)](https://goreportcard.com/report/github.com/Vertamedia/chproxy)
2-
[![Build Status](https://travis-ci.org/Vertamedia/chproxy.svg?branch=master)](https://travis-ci.org/Vertamedia/chproxy?branch=master)
3-
[![Coverage](https://img.shields.io/badge/gocover.io-75.7%25-green.svg)](http://gocover.io/github.com/Vertamedia/chproxy?version=1.9)
1+
[![Go Report Card](https://goreportcard.com/badge/github.com/ContentSquare/chproxy)](https://goreportcard.com/report/github.com/ContentSquare/chproxy)
2+
[![Build Status](https://travis-ci.org/ContentSquare/chproxy.svg?branch=master)](https://travis-ci.org/ContentSquare/chproxy?branch=master)
3+
[![Coverage](https://img.shields.io/badge/gocover.io-75.7%25-green.svg)](http://gocover.io/github.com/ContentSquare/chproxy?version=1.9)
44

55
# chproxy
66

@@ -29,7 +29,7 @@ Chproxy, is an http proxy and load balancer for [ClickHouse](https://ClickHouse.
2929
- Exposes various useful [metrics](#metrics) in [prometheus text format](https://prometheus.io/docs/instrumenting/exposition_formats/).
3030
- Configuration may be updated without restart - just send `SIGHUP` signal to `chproxy` process.
3131
- Easy to manage and run - just pass config file path to a single `chproxy` binary.
32-
- Easy to [configure](https://github.com/Vertamedia/chproxy/blob/master/config/examples/simple.yml):
32+
- Easy to [configure](https://github.com/ContentSquare/chproxy/blob/master/config/examples/simple.yml):
3333
```yml
3434
server:
3535
http:
@@ -52,7 +52,7 @@ clusters:
5252

5353
### Precompiled binaries
5454

55-
Precompiled `chproxy` binaries are available [here](https://github.com/Vertamedia/chproxy/releases).
55+
Precompiled `chproxy` binaries are available [here](https://github.com/ContentSquare/chproxy/releases).
5656
Just download the latest stable binary, unpack and run it with the desired [config](#configuration):
5757

5858
```
@@ -64,7 +64,7 @@ Just download the latest stable binary, unpack and run it with the desired [conf
6464
Chproxy is written in [Go](https://golang.org/). The easiest way to install it from sources is:
6565

6666
```
67-
go get -u github.com/Vertamedia/chproxy
67+
go get -u github.com/ContentSquare/chproxy
6868
```
6969

7070
If you don't have Go installed on your system - follow [this guide](https://golang.org/doc/install).
@@ -89,7 +89,7 @@ All the `INSERT`s may be routed to a [distributed table](http://clickhouse-docs.
8989

9090
It would be better to spread `INSERT`s among available shards and to route them directly to per-shard tables instead of distributed tables. The routing logic may be embedded either directly into applications generating `INSERT`s or may be moved to a proxy. Proxy approach is better since it allows re-configuring `ClickHouse` cluster without modification of application configs and without application downtime. Multiple identical proxies may be started on distinct servers for scalability and availability purposes.
9191

92-
The following minimal `chproxy` config may be used for [this use case](https://github.com/Vertamedia/chproxy/blob/master/config/examples/spread.inserts.yml):
92+
The following minimal `chproxy` config may be used for [this use case](https://github.com/ContentSquare/chproxy/blob/master/config/examples/spread.inserts.yml):
9393
```yml
9494
server:
9595
http:
@@ -127,7 +127,7 @@ All the `SELECT`s may be routed to a [distributed table](http://clickhouse-docs.
127127

128128
It would be better to create identical distributed tables on each shard and spread `SELECT`s among all the available shards.
129129

130-
The following minimal `chproxy` config may be used for [this use case](https://github.com/Vertamedia/chproxy/blob/master/config/examples/spread.selects.yml):
130+
The following minimal `chproxy` config may be used for [this use case](https://github.com/ContentSquare/chproxy/blob/master/config/examples/spread.selects.yml):
131131
```yml
132132
server:
133133
http:
@@ -157,10 +157,10 @@ clusters:
157157
### Authorize users by passwords via HTTPS
158158
159159
Suppose you need to access `ClickHouse` cluster from anywhere by username/password.
160-
This may be used for building graphs from [ClickHouse-grafana](https://github.com/Vertamedia/ClickHouse-grafana) or [tabix](https://tabix.io/).
160+
This may be used for building graphs from [ClickHouse-grafana](https://github.com/ContentSquare/ClickHouse-grafana) or [tabix](https://tabix.io/).
161161
It is bad idea to transfer unencrypted password and data over untrusted networks.
162162
So HTTPS must be used for accessing the cluster in such cases.
163-
The following `chproxy` config may be used for [this use case](https://github.com/Vertamedia/chproxy/blob/master/config/examples/https.yml):
163+
The following `chproxy` config may be used for [this use case](https://github.com/ContentSquare/chproxy/blob/master/config/examples/https.yml):
164164
```yml
165165
server:
166166
https:
@@ -206,16 +206,18 @@ clusters:
206206

207207
caches:
208208
- name: "shortterm"
209-
dir: "/path/to/cache/dir"
210-
max_size: 150Mb
209+
mode: "file_system"
210+
file_system:
211+
dir: "/path/to/cache/dir"
212+
max_size: 150Mb
211213

212214
# Cached responses will expire in 130s.
213215
expire: 130s
214216
```
215217
216218
### All the above configs combined
217219
218-
All the above cases may be combined in a single `chproxy` [config](https://github.com/Vertamedia/chproxy/blob/master/config/examples/combined.yml):
220+
All the above cases may be combined in a single `chproxy` [config](https://github.com/ContentSquare/chproxy/blob/master/config/examples/combined.yml):
219221

220222
```yml
221223
server:
@@ -278,17 +280,19 @@ clusters:
278280
279281
caches:
280282
- name: "shortterm"
281-
dir: "/path/to/cache/dir"
282-
max_size: 150Mb
283+
mode: "file_system"
284+
file_system:
285+
dir: "/path/to/cache/dir"
286+
max_size: 150Mb
283287
expire: 130s
284288
```
285289

286290
## Configuration
287291

288292
### Server
289-
`Chproxy` may accept requests over `HTTP` and `HTTPS` protocols. [HTTPS](https://github.com/Vertamedia/chproxy/blob/master/config#https_config) must be configured with custom certificate or with automated [Let's Encrypt](https://letsencrypt.org/) certificates.
293+
`Chproxy` may accept requests over `HTTP` and `HTTPS` protocols. [HTTPS](https://github.com/ContentSquare/chproxy/blob/master/config#https_config) must be configured with custom certificate or with automated [Let's Encrypt](https://letsencrypt.org/) certificates.
290294

291-
Access to `chproxy` can be limitied by list of IPs or IP masks. This option can be applied to [HTTP](https://github.com/Vertamedia/chproxy/blob/master/config#http_config), [HTTPS](https://github.com/Vertamedia/chproxy/blob/master/config#https_config), [metrics](https://github.com/Vertamedia/chproxy/blob/master/config#metrics_config), [user](https://github.com/Vertamedia/chproxy/blob/master/config#user_config) or [cluster-user](https://github.com/Vertamedia/chproxy/blob/master/config#cluster_user_config).
295+
Access to `chproxy` can be limitied by list of IPs or IP masks. This option can be applied to [HTTP](https://github.com/ContentSquare/chproxy/blob/master/config#http_config), [HTTPS](https://github.com/ContentSquare/chproxy/blob/master/config#https_config), [metrics](https://github.com/ContentSquare/chproxy/blob/master/config#metrics_config), [user](https://github.com/ContentSquare/chproxy/blob/master/config#user_config) or [cluster-user](https://github.com/ContentSquare/chproxy/blob/master/config#cluster_user_config).
292296

293297
### Users
294298
There are two types of users: `in-users` (in global section) and `out-users` (in cluster section).
@@ -298,13 +302,13 @@ with overriding credentials.
298302
Suppose we have one ClickHouse user `web` with `read-only` permissions and `max_concurrent_queries: 4` limit.
299303
There are two distinct applications `reading` from ClickHouse. We may create two distinct `in-users` with `to_user: "web"` and `max_concurrent_queries: 2` each in order to avoid situation when a single application exhausts all the 4-request limit on the `web` user.
300304

301-
Requests to `chproxy` must be authorized with credentials from [user_config](https://github.com/Vertamedia/chproxy/blob/master/config#user_config). Credentials can be passed via [BasicAuth](https://en.wikipedia.org/wiki/Basic_access_authentication) or via `user` and `password` [query string](https://en.wikipedia.org/wiki/Query_string) args.
305+
Requests to `chproxy` must be authorized with credentials from [user_config](https://github.com/ContentSquare/chproxy/blob/master/config#user_config). Credentials can be passed via [BasicAuth](https://en.wikipedia.org/wiki/Basic_access_authentication) or via `user` and `password` [query string](https://en.wikipedia.org/wiki/Query_string) args.
302306

303307
Limits for `in-users` and `out-users` are independent.
304308

305309
### Clusters
306310
`Chproxy` can be configured with multiple `cluster`s. Each `cluster` must have a name and either a list of nodes
307-
or a list of replicas with nodes. See [cluster-config](https://github.com/Vertamedia/chproxy/tree/master/config#cluster_config) for details.
311+
or a list of replicas with nodes. See [cluster-config](https://github.com/ContentSquare/chproxy/tree/master/config#cluster_config) for details.
308312
Requests to each cluster are balanced among replicas and nodes using `round-robin` + `least-loaded` approach.
309313
The node priority is automatically decreased for a short interval if recent requests to it were unsuccessful.
310314
This means that the `chproxy` will choose the next least loaded healthy node among least loaded replica
@@ -313,14 +317,14 @@ for every new request.
313317
Additionally each node is periodically checked for availability. Unavailable nodes are automatically excluded from the cluster until they become available again. This allows performing node maintenance without removing unavailable nodes from the cluster config.
314318

315319
`Chproxy` automatically kills queries exceeding `max_execution_time` limit. By default `chproxy` tries to kill such queries
316-
under `default` user. The user may be overriden with [kill_query_user](https://github.com/Vertamedia/chproxy/blob/master/config#kill_query_user_config).
320+
under `default` user. The user may be overriden with [kill_query_user](https://github.com/ContentSquare/chproxy/blob/master/config#kill_query_user_config).
317321

318-
If `cluster`'s [users](https://github.com/Vertamedia/chproxy/blob/master/config#cluster_user_config) section isn't specified, then `default` user is used with no limits.
322+
If `cluster`'s [users](https://github.com/ContentSquare/chproxy/blob/master/config#cluster_user_config) section isn't specified, then `default` user is used with no limits.
319323

320324
### Caching
321325

322326
`Chproxy` may be configured to cache responses. It is possible to create multiple
323-
[cache-configs](https://github.com/Vertamedia/chproxy/blob/master/config/#cache_config) with various settings.
327+
cache-configs with various settings.
324328
Response caching is enabled by assigning cache name to user. Multiple users may share the same cache.
325329
Currently only `SELECT` responses are cached.
326330
Caching is disabled for request with `no_cache=1` in query string.
@@ -329,8 +333,21 @@ distinct responses for the identical query under distinct cache namespaces. Addi
 an instant cache flush may be built on top of cache namespaces - just switch to new namespace in order
 to flush the cache.
 
+Two types of cache configuration are supported:
+- local instance cache
+- distributed cache
+
+#### Local cache
+Local cache is stored on machine's file system. Therefore it is suitable for single replica deployments.
+Configuration template for local cache can be found [here](https://github.com/ContentSquare/chproxy/blob/master/config/#file_system_cache_config)
+
+#### Distributed cache
+Distributed cache relies on external database to share cache across multiple replicas. Therefore it is suitable for
+multiple replicas deployments. Currently only [redis](https://redis.io/) key value store is supported.
+Configuration template for distributed cache can be found [here](https://github.com/ContentSquare/chproxy/blob/master/config/#distributed_cache_config)
+
 ### Security
-`Chproxy` removes all the query params from input requests (except the user's [params](https://github.com/Vertamedia/chproxy/blob/master/config#param_groups_config) and listed [here](https://github.com/Vertamedia/chproxy/blob/master/scope.go#L292))
+`Chproxy` removes all the query params from input requests (except the user's [params](https://github.com/ContentSquare/chproxy/blob/master/config#param_groups_config) and listed [here](https://github.com/ContentSquare/chproxy/blob/master/scope.go#L292))
 before proxying them to `ClickHouse` nodes. This prevents from unsafe overriding
 of various `ClickHouse` [settings](http://clickhouse-docs.readthedocs.io/en/latest/interfaces/http_interface.html).
 
@@ -339,7 +356,7 @@ By default `chproxy` tries detecting the most obvious configuration errors such
339356

340357
Special option `hack_me_please: true` may be used for disabling all the security-related checks during config validation (if you are feeling lucky :) ).
341358

342-
#### Example of [full](https://github.com/Vertamedia/chproxy/blob/master/config/testdata/full.yml) configuration:
359+
#### Example of [full](https://github.com/ContentSquare/chproxy/blob/master/config/testdata/full.yml) configuration:
343360
```yml
344361
# Whether to print debug logs.
345362
#
@@ -354,18 +371,30 @@ hack_me_please: true
354371
# Optional response cache configs.
355372
#
356373
# Multiple distinct caches with different settings may be configured.
374+
375+
name: "shortterm"
376+
mode: "file_system"
377+
file_system:
378+
dir: "/path/to/cache/dir"
379+
max_size: 150Mb
380+
expire: 130s
357381
caches:
358382
# Cache name, which may be passed into `cache` option on the `user` level.
359383
#
360384
# Multiple users may share the same cache.
361385
- name: "longterm"
362386

363-
# Path to directory where cached responses will be stored.
364-
dir: "/path/to/longterm/cachedir"
365-
366-
# Maximum cache size.
367-
# `Kb`, `Mb`, `Gb` and `Tb` suffixes may be used.
368-
max_size: 100Gb
387+
# Cache mode, either [[file_system]] or [[redis]]
388+
mode: "file_system"
389+
390+
# Applicable for cache mode: file_system
391+
file_system:
392+
# Path to directory where cached responses will be stored.
393+
dir: "/path/to/longterm/cachedir"
394+
395+
# Maximum cache size.
396+
# `Kb`, `Mb`, `Gb` and `Tb` suffixes may be used.
397+
max_size: 100Gb
369398

370399
# Expiration time for cached responses.
371400
expire: 1h
@@ -381,8 +410,14 @@ caches:
381410
grace_time: 20s
382411

383412
- name: "shortterm"
384-
dir: "/path/to/shortterm/cachedir"
385-
max_size: 100Mb
413+
mode: "redis"
414+
415+
# Applicable for cache mode: redis
416+
redis:
417+
addresses:
418+
- "localhost:6379"
419+
username: "user"
420+
password: "pass"
386421
expire: 10s
387422

388423
# Optional network lists, might be used as values for `allowed_networks`.
@@ -627,7 +662,7 @@ clusters:
627662
allowed_networks: ["office"]
628663
```
629664
630-
#### Full specification is located [here](https://github.com/Vertamedia/chproxy/blob/master/config)
665+
#### Full specification is located [here](https://github.com/ContentSquare/chproxy/blob/master/config)
631666
632667
## Metrics
633668
Metrics are exposed in [prometheus text format](https://prometheus.io/docs/instrumenting/exposition_formats/) at `/metrics` path.
@@ -660,7 +695,7 @@ Metrics are exposed in [prometheus text format](https://prometheus.io/docs/instr
660695
| timeout_request_total | Counter | The number of timed out requests | `user`, `cluster`, `cluster_user`, `replica`, `cluster_node` |
661696
| user_queue_overflow_total | Counter | The number of overflows for per-user request queues | `user`, `cluster`, `cluster_user` |
662697

663-
An example of [Grafana's](https://grafana.com) dashboard for `chproxy` metrics is available [here](https://github.com/Vertamedia/chproxy/blob/master/chproxy_overview.json)
698+
An example of [Grafana's](https://grafana.com) dashboard for `chproxy` metrics is available [here](https://github.com/ContentSquare/chproxy/blob/master/chproxy_overview.json)
664699

665700
![dashboard example](https://user-images.githubusercontent.com/2902918/31392734-b2fd4a18-ade2-11e7-84a9-4aaaac4c10d7.png)
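
The README changes in this diff nest cache settings under a `mode`-specific key (`file_system` or `redis`). As a rough illustration of how such a mode-dependent config could be modeled and checked, here is a minimal sketch. The type and method names (`CacheConfig`, `FileSystemConfig`, `RedisConfig`, `Validate`) and the specific checks are assumptions made for this example; they are not taken from chproxy's `config` package.

```go
package main

import (
	"errors"
	"fmt"
)

// FileSystemConfig is a hypothetical mirror of the `file_system` YAML section.
type FileSystemConfig struct {
	Dir     string
	MaxSize string
}

// RedisConfig is a hypothetical mirror of the `redis` YAML section.
type RedisConfig struct {
	Addresses []string
	Username  string
	Password  string
}

// CacheConfig is a hypothetical mirror of one entry under `caches`.
type CacheConfig struct {
	Name       string
	Mode       string
	FileSystem FileSystemConfig
	Redis      RedisConfig
}

// Validate checks that the section matching Mode is filled in.
// The rules here are illustrative assumptions, not chproxy's real validation.
func (c CacheConfig) Validate() error {
	if c.Name == "" {
		return errors.New("cache name must not be empty")
	}
	switch c.Mode {
	case "file_system":
		if c.FileSystem.Dir == "" {
			return fmt.Errorf("cache %q: file_system.dir must not be empty", c.Name)
		}
	case "redis":
		if len(c.Redis.Addresses) == 0 {
			return fmt.Errorf("cache %q: redis.addresses must not be empty", c.Name)
		}
	default:
		return fmt.Errorf("cache %q: unknown mode %q", c.Name, c.Mode)
	}
	return nil
}

func main() {
	local := CacheConfig{
		Name:       "longterm",
		Mode:       "file_system",
		FileSystem: FileSystemConfig{Dir: "/path/to/longterm/cachedir", MaxSize: "100Gb"},
	}
	shared := CacheConfig{
		Name:  "shortterm",
		Mode:  "redis",
		Redis: RedisConfig{Addresses: []string{"localhost:6379"}},
	}
	unknown := CacheConfig{Name: "other", Mode: "memcached"}

	for _, c := range []CacheConfig{local, shared, unknown} {
		fmt.Printf("%s: %v\n", c.Name, c.Validate())
	}
}
```

Rejecting an unknown mode explicitly is a design choice of this sketch: it surfaces a misspelled `mode` value at load time instead of leaving the cache unset.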
666701

cache/async_cache.go

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+package cache
+
+import (
+	"github.com/Vertamedia/chproxy/clients"
+	"github.com/Vertamedia/chproxy/config"
+	"github.com/go-redis/redis/v8"
+	"time"
+)
+
+// AsyncCache is a transactional cache allowing the results from concurrent queries.
+// When query A is equal to query B and A arrives no more than defined graceTime, query A will await for the results of query B for the max time equal to:
+// graceTime - (arrivalB - arrivalA)
+type AsyncCache struct {
+	Cache
+	TransactionRegistry
+
+	graceTime time.Duration
+}
+
+func (c *AsyncCache) Close() error {
+	if c.TransactionRegistry != nil {
+		c.TransactionRegistry.Close()
+	}
+	if c.Cache != nil {
+		c.Cache.Close()
+	}
+	return nil
+}
+
+func (c *AsyncCache) AwaitForConcurrentTransaction(key *Key) bool {
+	startTime := time.Now()
+
+	for {
+		if time.Since(startTime) > c.graceTime {
+			// The entry didn't appear during graceTime.
+			// Let the caller creating it.
+			return false
+		}
+
+		ok := c.TransactionRegistry.IsDone(key)
+		if ok {
+			return ok
+		}
+
+		// Wait for graceTime in the hope the entry will appear
+		// in the cache.
+		//
+		// This should protect from thundering herd problem when
+		// a single slow query is executed from concurrent requests.
+		d := 100 * time.Millisecond
+		if d > c.graceTime {
+			d = c.graceTime
+		}
+		time.Sleep(d)
+	}
+}
+
+func NewAsyncCache(cfg config.Cache) (*AsyncCache, error) {
+	graceTime := time.Duration(cfg.GraceTime)
+	if graceTime == 0 {
+		// Default grace time.
+		graceTime = 5 * time.Second
+	}
+	if graceTime < 0 {
+		// Disable protection from `dogpile effect`.
+		graceTime = 0
+	}
+
+	var cache Cache
+	var transaction TransactionRegistry
+	var err error
+
+	switch cfg.Mode {
+	case "file_system":
+		cache, err = newFilesSystemCache(cfg, graceTime)
+		transaction = newInMemoryTransactionRegistry(graceTime)
+	case "redis":
+		var redisClient redis.UniversalClient
+		redisClient, err = clients.NewRedisClient(cfg.Redis)
+		cache = newRedisCache(redisClient, cfg)
+		transaction = newRedisTransactionRegistry(redisClient, time.Duration(cfg.GraceTime))
+	}
+
+	if err != nil {
+		return nil, err
+	}
+
+	return &AsyncCache{
+		Cache:               cache,
+		TransactionRegistry: transaction,
+		graceTime:           graceTime,
+	}, nil
+
+}
