doc/HLD-FOP-State-Machine.md (+6, -6)
@@ -23,7 +23,7 @@ See [4] and [5] for the description of fop architecture.
* fop state machine (fom) is a state machine [6] that represents the current state of the fop's [r.fop]ST execution on a node. fom is associated with a particular fop and implicitly includes this fop as part of its state.
* a fom state transition is executed by a handler thread [r.lib.threads]. The association between the fom and the handler thread is short-lived: a different handler thread can be selected to execute the next state transition.

-## Requirements
+## Requirements
* `[r.non-blocking.few-threads]`: Motr service should use a relatively small number of threads: a few per processor [r.lib.processors].
* `[r.non-blocking.easy]`: non-blocking infrastructure should be easy to use and non-intrusive.
* `[r.non-blocking.extensibility]`: addition of new "cross-cut" functionality (e.g., logging, reporting), potentially including blocking points and affecting multiple fop types, should not require extensive changes to the data structures for each fop type involved.
@@ -35,7 +35,7 @@ See [4] and [5] for the description of fop architecture.
## Design Highlights
A set of data structures similar to those maintained by a typical thread or process scheduler in an operating system kernel (or a user-level library thread package) is used for non-blocking fop processing: prioritized run-queues of fom-s ready for the next state transition and wait-queues of fom-s parked waiting for events to happen.

-## Functional Specification ##
+## Functional Specification
A fop belongs to a fop type. Similarly, a fom belongs to a fom type. The latter is part of the corresponding fop type. The fom type specifies the machine states as well as its transition function. A mandatory part of fom state is a phase, indicating how far the fop processing has progressed. Each fom goes through standard phases, described in [7], as well as some fop-type specific phases.

The fop-type implementation provides an enumeration of non-standard phases and a state-transition function for the fom.
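
To make the scheduler analogy concrete, here is a minimal sketch of the data structures described in the Design Highlights above. All names (`struct fom`, `struct locality`, the `FOPH_*` phases) are assumptions made for illustration only; they are not the actual Motr API.

```c
/*
 * Sketch: each locality keeps a prioritized run-queue of fom-s ready for a
 * state transition and a wait-queue of fom-s parked until an event arrives.
 */
#include <pthread.h>
#include <sys/queue.h>

struct fop;                               /* the fop this fom executes (opaque here) */

enum fom_phase {                          /* standard phases; fop types add their own */
        FOPH_INIT,
        FOPH_AUTHORIZE,
        FOPH_EXEC,
        FOPH_WAIT,
        FOPH_FAILED,
        FOPH_DONE
};

struct fom {
        enum fom_phase   fo_phase;        /* how far the fop processing progressed */
        int              fo_prio;         /* run-queue priority                     */
        struct fop      *fo_fop;          /* fop implicitly included in fom state   */
        TAILQ_ENTRY(fom) fo_link;         /* linkage into a run- or wait-queue      */
};

TAILQ_HEAD(fom_queue, fom);

struct locality {
        pthread_mutex_t  lo_lock;         /* protects both queues                    */
        pthread_cond_t   lo_runnable;     /* signalled when the run-queue fills up   */
        struct fom_queue lo_runq;         /* fom-s ready for the next transition     */
        struct fom_queue lo_waitq;        /* fom-s parked waiting for events         */
};
```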
@@ -121,13 +121,13 @@ The network request scheduler (NRS) has its queue of fop-s waiting for the execu
## Security Model
Security checks (authorization and authentication) are done in one of the standard fom phases (see [7]).

-## Refinement ##
+## Refinement
The data structures, their relationships, concurrency control, and liveness issues follow quite straightforwardly from the logical specification above.

-## State ##
+## State
See [7] for the description of the fom state machine.

-## Use Cases ##
+## Use Cases

**Scenarios**
@@ -183,7 +183,7 @@ Scenario 4
|Response| handler threads wait on a per-locality condition variable until the locality run-queue is non-empty again. |
|Response Measure|

-## Failures ##
+## Failures
- Failure of a fom state transition: this lands the fom in the standard FAILED phase;
- Dead-lock: dealing with dead-locks (including ones involving activity in multiple address spaces) is outside the scope of the present design. It is assumed that general mechanisms of dead-lock avoidance (resource ordering, &c.) are used.
- Time-out: if a fom stays on the wait-list for too long, it is forced into the FAILED state.
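
The response in Scenario 4 is the classic condition-variable idiom. The sketch below reuses the hypothetical `struct locality` from the earlier sketch; `fom_state_transition()` is likewise an assumed helper, not the actual Motr function.

```c
/* Illustrative handler-thread loop for Scenario 4 (assumed names). */
#include <pthread.h>
#include <sys/queue.h>

void fom_state_transition(struct fom *fom);   /* assumed: one non-blocking step */

static void *handler_thread(void *arg)
{
        struct locality *loc = arg;

        for (;;) {
                struct fom *fom;

                pthread_mutex_lock(&loc->lo_lock);
                /* Park until the locality run-queue is non-empty again. */
                while (TAILQ_EMPTY(&loc->lo_runq))
                        pthread_cond_wait(&loc->lo_runnable, &loc->lo_lock);
                fom = TAILQ_FIRST(&loc->lo_runq);       /* take the next runnable fom */
                TAILQ_REMOVE(&loc->lo_runq, fom, fo_link);
                pthread_mutex_unlock(&loc->lo_lock);

                /* Execute exactly one state transition, then requeue or park the fom. */
                fom_state_transition(fom);
        }
        return NULL;
}
```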
doc/HLD-Resource-Management-Interface.md (+1, -1)
@@ -59,7 +59,7 @@ Motr functionality, both internal and external, is often specified in terms of r
- `[r.resource.power]`: (electrical) power consumed by a device is a resource.

-## Design Highlights ##
+## Design Highlights ##
- hierarchical resource names. Resource name assignment can be simplified by introducing variable-length resource identifiers.
- conflict-free schedules: no observable conflicts. Before a resource usage credit is canceled, the owner must re-integrate all changes made to the local copy of the resource. Conflicting usage credits can be granted only after all changes are re-integrated. Yet, the ordering between actual re-integration network requests and cancellation requests can be arbitrary, subject to server-side NRS policy.
- resource management code is split into two parts:
doc/HLD-of-FOL.md (+18, -18)
@@ -19,7 +19,7 @@ A FOL is a central M0 data structure, maintained by every node where the M0 core

Roughly speaking, a FOL is a partially ordered collection of FOL records, each corresponding to (part of) a consistent modification of the file system state. A FOL record contains information determining the durability of the modification (how many volatile and persistent copies it has, where, etc.) and the dependencies between modifications, among other things. When a client node has to modify the file system state to serve a system call from a user, it places a record in its (possibly volatile) FOL. The record keeps track of the operation state: has it been re-integrated to servers, has it been committed on the servers, etc. A server, on receiving a request to execute an update on a client's behalf, inserts a record describing the request into its FOL. Eventually, the FOL is purged to reclaim storage, culling some of the records.

-## Definitions ##
+## Definitions
- a (file system) operation is a modification of a file system state preserving file system consistency (i.e., when applied to a file system in a consistent state it produces a consistent state). There is a limited repertoire of operation types: mkdir, link, create, write, truncate, etc. The M0 core maintains serializability of operation execution;
- an update (of an operation) is a sub-modification of a file system state that modifies the state on a single node only. For example, a typical write operation against a RAID-6 striped file includes updates that modify data blocks on a server A and updates that modify parity blocks on a server B;
- an operation or update undo is a reversal of state modification, restoring the original state. An operation can be undone only when the parts of the state it modifies are compatible with the operation having been executed. Similarly, an operation or update redo is modifying state in the "forward" direction, possibly after undo;
@@ -43,7 +43,7 @@ Roughly speaking, a FOL is a partially ordered collection of FOL records, each c
<strong>Note</strong>: It would be nice to refine the terminology to distinguish between an operation description (i.e., the intent to carry it out) and its actual execution. This would make the description of dependencies and recovery less obscure, at the expense of some additional complexity.
</p>

-## Requirements ##
+## Requirements

- `[R.FOL.EVERY-NODE]`: every node where the M0 core is deployed maintains a FOL;
- `[R.FOL.LOCAL-TXN]`: a node FOL is used to implement local transactional containers
@@ -68,23 +68,23 @@ Roughly speaking, a FOL is a partially ordered collection of FOL records, each c
- `[R.FOL.ADDB]`: FOL is integrated with ADDB. ADDB records matching a given FOL record can be found efficiently;
- `[R.FOL.FILE]`: FOL records pertaining to a given file(-set) can be found efficiently.

-## Design Highlights ##
+## Design Highlights
A FOL record is identified by its LSN. The LSN is defined and selected so as to be able to encode the various partial orders imposed on FOL records by the requirements.

-## Functional Specification ##
+## Functional Specification
The FOL manager exports two interfaces:
- main interface, used by the request handler. Through this interface FOL records can be added to the FOL and the FOL can be forced (i.e., made persistent up to a certain record);
- auxiliary interfaces, used for FOL pruning and querying.
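
As an illustration of this split, the declarations below sketch what such an interface could look like in C. All identifiers (`fol_rec_add`, `fol_force`, `fol_prune`, the cursor type) are assumed for this sketch and are not the actual Motr FOL API.

```c
/* Sketch of the main and auxiliary FOL interfaces (assumed names). */
#include <stdint.h>

typedef uint64_t lsn_t;                        /* log sequence number */

struct fol;                                    /* opaque FOL instance          */
struct fol_rec;                                /* record to be added / read    */
struct fol_cursor;                             /* iterator for the query side  */

/* Main interface, used by the request handler. */
int  fol_rec_add(struct fol *fol, struct fol_rec *rec, lsn_t *out_lsn);
int  fol_force(struct fol *fol, lsn_t upto);   /* make records persistent up to `upto` */

/* Auxiliary interfaces: pruning and querying. */
int  fol_prune(struct fol *fol, lsn_t upto);   /* drop unreferenced records below `upto` */
int  fol_cursor_init(struct fol_cursor *cur, struct fol *fol, lsn_t start);
int  fol_cursor_next(struct fol_cursor *cur, struct fol_rec **out);
void fol_cursor_fini(struct fol_cursor *cur);
```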

-## Logical Specification ##
+## Logical Specification

-### Overview ###
+### Overview
FOL is stored in a transactional container [1] populated with records indexed [2] by LSN. An LSN is used to refer to a point in the FOL from other meta-data tables (the epochs table, object index, sessions table, etc.). To make such references more flexible, a FOL, in addition to genuine records corresponding to updates, might contain pseudo-records marking points of interest in the FOL to which other file system tables might want to refer (for example, an epoch boundary, a snapshot origin, a new server secret key, etc.). By abuse of terminology, such pseudo-records will be called FOL records too. Similarly, as part of the redo-recovery implementation, DTM might populate a node FOL with records describing updates to be performed on other nodes.

[1] [R.BACK-END.TRANSACTIONAL] ST
[2] [R.BACK-END.INDEXING] ST

-### Record Structure ###
+### Record Structure
A FOL record, added via the main FOL interface, contains the following:
- an operation opcode, identifying the type of file system operation;
- LSN;
@@ -100,11 +100,11 @@ A FOL record, added via the main FOL interface, contains the following:
- distributed transaction management data, including the epoch this update and operation are parts of;
- liveness state: the number of outstanding references to this record.

-### Liveness and Pruning ###
+### Liveness and Pruning
A node FOL must be prunable, if only to function correctly on a node without persistent storage. At the same time, a variety of sub-systems, both from the M0 core and outside of it, might want to refer to FOL records. To make pruning possible and flexible, each FOL record is augmented with a reference counter, counting all outstanding references to the record. A record can be pruned if its reference count drops to 0 together with the reference counters of all earlier (in the lsn sense) unpruned records in the FOL.

-### Conformance ###
+### Conformance
- `[R.FOL.EVERY-NODE]`: on nodes with persistent storage, the M0 core runs in the user space and the FOL is stored in a database table. On a node without persistent storage, or where the M0 core runs in the kernel space, the FOL is stored in a memory-only index. The database and the memory-only index provide the same external interface, making the FOL code portable;
- `[R.FOL.LOCAL-TXN]`: the request handler inserts a record into the FOL table in the context of the same transaction where the update is executed. This guarantees the WAL property of the FOL;
- `[R.FOL]`: vacuous;
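
The reference-counting rule above can be stated compactly in code. The sketch below uses assumed field and type names (it is not the on-disk record format): a prefix of the log may be pruned only if every record in it has a zero reference count.

```c
/* Illustrative FOL record header and pruning check (assumed names). */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t lsn_t;          /* log sequence number, the primary key */

struct fol_rec_hdr {
        uint64_t fr_opcode;      /* type of file system operation           */
        lsn_t    fr_lsn;         /* this record's LSN                        */
        lsn_t    fr_prev_lsn;    /* previous operation on the same file      */
        uint32_t fr_refcount;    /* outstanding references to this record    */
        /* durability, dependency and DTM data omitted in this sketch */
};

/*
 * True iff all records with lsn <= upto can be pruned: a record is prunable
 * only when its own reference count and those of all earlier unpruned
 * records are zero.  `recs` is assumed to be sorted by lsn.
 */
static bool fol_can_prune(const struct fol_rec_hdr *recs, size_t nr, lsn_t upto)
{
        for (size_t i = 0; i < nr && recs[i].fr_lsn <= upto; i++) {
                if (recs[i].fr_refcount != 0)
                        return false;
        }
        return true;
}
```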
@@ -129,36 +129,36 @@ A node FOL must be prunable if only to function correctly on a node without pers
- `[R.FOL.FILE]`: an object index table, enumerating all files and file sets for the node, contains references to the latest FOL record for the file (or file-set). By following the previous-operation LSN references, the history of modifications of a given file can be recovered.

-### Dependencies ###
+### Dependencies
- back-end:
- `[R.BACK-END.TRANSACTIONAL] ST`: the back-end supports local transactions, so that the FOL could be populated atomically with other tables.
- `[R.BACK-END.INDEXING] ST`: the back-end supports containers with records indexed by a key.

-### Security Model ###
+### Security Model
The FOL manager by itself does not deal with security issues. It trusts its callers (request handler, DTM, etc.) to carry out the necessary authentication and authorization checks before manipulating FOL records. The FOL stores some security information as part of its records.

-### Refinement ###
+### Refinement
The FOL is organized as a single indexed table containing records with the LSN as a primary key. The structure of an individual record is as outlined above. The detailed main FOL interface is straightforward. FOL navigation and querying in the auxiliary interface are based on a FOL cursor.

-## State ##
+## State
FOL introduces no extra state.

## Use Cases
-### Scenarios ###
+### Scenarios

FOL QAS list is included here by reference.

-### Failures ###
+### Failures
Failure of the underlying storage container in which the FOL is stored is treated as any other storage failure. All other FOL-related failures are handled by DTM.

-## Analysis ##
+## Analysis

-### Other ###
+### Other
An alternative design is to store the FOL in a special data structure, instead of a standard indexed container. For example, the FOL can be stored in an append-only flat file, with the starting offset of a record serving as its lsn. The perceived advantage of this solution is avoiding the overhead of full-fledged indexing (b-tree). Indeed, general-purpose indexing is not needed, because records with an lsn less than the maximal one used in the past are never inserted into the FOL (aren't they?).

Yet another possible design is to use db4 extensible logging to store FOL records directly in a db4 transactional log. The advantage of this is that forcing the FOL up to a specific record becomes possible (and easy to implement), and the overhead of indexing is again avoided. On the other hand, it is not clear how to deal with pruning.
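
For concreteness, the append-only alternative can be sketched as below. `fol_flat_append()` and the simple length-prefixed layout are assumptions of this sketch, not a committed design; the caller is expected to have opened the log for writing and positioned it at end-of-file.

```c
/* Sketch of the append-only FOL alternative: a record's lsn is its offset. */
#include <stdint.h>
#include <stdio.h>

/* Appends one length-prefixed record; returns its lsn (starting offset) or -1. */
static long fol_flat_append(FILE *log, const void *rec, uint32_t len)
{
        long lsn = ftell(log);              /* current end-of-file offset */

        if (lsn < 0)
                return -1;
        if (fwrite(&len, sizeof len, 1, log) != 1 ||
            fwrite(rec, 1, len, log) != len)
                return -1;
        /* Flush stdio buffers; a real implementation would also fsync()
         * to actually force the log up to this record. */
        if (fflush(log) != 0)
                return -1;
        return lsn;                          /* the offset serves as the lsn */
}
```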