
Commit a5133d9

Update advanced training to current syntax (#641)
* (feat): Update advanced training to new syntax

  Updates to the advanced training to use the current syntax making it more consistent with the other guides. Changes:

  - Replace all pipe operators with method chaining
  - Where required, assigns to intermediate channel and uses in a process
  - Moves all logic into workflows

* chore: update starting code
* fix: Name all iterators in closures
* chore: remove extra files
* fix: add all named closures within workflow scope
* fix: Use def for every new variable
* fix: do not use same variable name twice
* fix: remove variables that are not used
* fix: multiMap syntax
* fix: add block to every process
* feat: adds checkIfExists to block users accidentally missing step
* fix: Add period to beginning of map to make copy+paste easier
* fix: add grouping starter back
* style: code style in grouping.md
* fix: MapReads assigning output
* fix: review fixups
* fix: Groovy training
* fix: structure training inconsistencies
* Apply suggestion from @FriederikeHanssen

  Co-authored-by: Friederike Hanssen <[email protected]>

* Update docs/advanced/operators.md

  Co-authored-by: Friederike Hanssen <[email protected]>

* Update docs/advanced/operators.md

  Co-authored-by: Friederike Hanssen <[email protected]>

* docs: update advanced grouping training to new syntax

  - Update grouping.md documentation with modern Nextflow syntax
  - Modify main.nf workflow to use current DSL2 conventions
  - Remove deprecated working_with_files/main.nf file
  - Improve code examples and explanations for better learning experience

* feat: add structure training examples with Groovy classes

  - Add Dog.groovy and Metadata.groovy classes for structure training
  - Update main.nf workflow to demonstrate class usage and imports
  - Modify cars.R script for better R integration example
  - Update structure.md documentation with new examples
  - Demonstrate proper project structure with lib/ directory

* style: Highlight code in operator tour
* style: Highlight code in metadata
* style: Line numbers and highlighting in operators
* style: Line numbers and highlighting in metadata
* style: Line numbers and highlighting in grouping
* fixup
* fixup
* style: Line numbers and highlighting in groovy
* style: Line numbers and highlighting in structure

---------

Co-authored-by: Friederike Hanssen <[email protected]>
1 parent 9ffcd98 commit a5133d9

File tree

14 files changed, +679 -606 lines changed


docs/advanced/groovy.md

Lines changed: 88 additions & 93 deletions
@@ -10,36 +10,36 @@ cd groovy
 
 Let's assume that we would like to pull in a samplesheet, parse the entries and run them through the FastP tool. So far, we have been concerned with local files, but Nextflow will handle remote files transparently:
 
-```groovy linenums="1"
+```groovy linenums="3"
+params.input = "https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.10/samplesheet_test.csv"
+
 workflow {
-    params.input = "https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.10/samplesheet_test.csv"
 
     Channel.fromPath(params.input)
-    | splitCsv(header: true)
-    | view
+        .splitCsv(header: true)
+        .view()
 }
 ```
 
 Let's write a small closure to parse each row into the now-familiar map + files shape. We might start by constructing the meta-map:
 
-```groovy linenums="1"
+```groovy linenums="5" hl_lines="5-8"
 workflow {
-    params.input = "https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/v3.10/samplesheet_test.csv"
 
-    Channel.fromPath(params.input)
-    | splitCsv(header: true)
-    | map { row ->
-        meta = row.subMap('sample', 'strandedness')
-        meta
-    }
-    | view
+    samples = Channel.fromPath(params.input)
+        .splitCsv(header: true)
+        .map { row ->
+            def meta = row.subMap('sample', 'strandedness')
+            meta
+        }
+        .view()
 }
 ```
 
-... but this precludes the possibility of adding additional columns to the samplesheet. We might to ensure the parsing will capture any extra metadata columns should they be added. Instead, let's partition the column names into those that begin with "fastq" and those that don't:
+... but this precludes the possibility of adding additional columns to the samplesheet. We might want to ensure the parsing will capture any extra metadata columns should they be added. Instead, let's partition the column names into those that begin with "fastq" and those that don't. Within the map closure, let's add an additional line to partition the column names:
 
-```groovy linenums="1"
-(readKeys, metaKeys) = row.keySet().split { it =~ /^fastq/ }
+```groovy linenums="10"
+def (readKeys, metaKeys) = row.keySet().split { key -> key =~ /^fastq/ }
 ```
 
 !!! note "New methods"
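
As an aside, the `subMap` call introduced in this hunk is plain Groovy and can be tried outside Nextflow. A minimal sketch, with a hypothetical `row` shaped like one entry produced by `splitCsv(header: true)`:

```groovy
// Hypothetical row, shaped like one parsed samplesheet entry
def row = [sample: 'GM12878', strandedness: 'reverse', fastq_1: 'a_1.fastq.gz']

// subMap returns a new map containing only the requested keys
def meta = row.subMap('sample', 'strandedness')
assert meta == [sample: 'GM12878', strandedness: 'reverse']
```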
@@ -48,26 +48,24 @@ workflow {
 
     We're also using the `.split()` method, which divides a collection based on the return value of the closure. The mrhaki blog [provides a succinct summary](https://blog.mrhaki.com/2009/12/groovy-goodness-splitting-with-closures.html).
 
-From here, let's
+From here, let's add another line to collect the values of the read keys into a list of file objects:
 
-```groovy linenums="1"
-reads = row.subMap(readKeys).values().collect { file(it) }
+```groovy linenums="11"
+def reads = row.subMap(readKeys).values().collect { value -> file(value) }
 ```
 
 ... but we run into an error:
 
-```groovy linenums="1"
+```groovy
 Argument of `file` function cannot be empty
 ```
 
-If we have a closer look at the samplesheet, we notice that not all rows have two read pairs. Let's add a condition
+If we have a closer look at the samplesheet, we notice that not all rows have two read pairs. Let's add a condition to keep only the values that are not empty:
 
-```groovy linenums="1"
-reads = row
-    .subMap(readKeys)
-    .values()
-    .findAll { it != "" } // Single-end reads will have an empty string
-    .collect { file(it) } // Turn those strings into paths
+```groovy linenums="11"
+def reads = row.subMap(readKeys).values()
+    .findAll { value -> value != "" } // Single-end reads will have an empty string
+    .collect { path -> file(path) }
 ```
 
 Now we need to construct the meta map. Let's have a quick look at the FASTP module that I've already pre-defined:
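
The partition-and-filter chain in this hunk is also testable in plain Groovy. A small sketch with made-up values, where an empty `fastq_2` stands in for a single-end sample:

```groovy
// Made-up row: fastq_2 is empty, as for a single-end sample
def row = [sample: 's1', strandedness: 'auto', fastq_1: 'r1.fq.gz', fastq_2: '']

// split partitions the keys into those matching the closure and the rest
def (readKeys, metaKeys) = row.keySet().split { key -> key =~ /^fastq/ }
assert readKeys.toList() == ['fastq_1', 'fastq_2']
assert metaKeys.toList() == ['sample', 'strandedness']

// findAll drops the empty values before they would reach file()
def reads = row.subMap(readKeys).values().findAll { value -> value != "" }
assert reads.toList() == ['r1.fq.gz']
```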
@@ -94,37 +92,40 @@ process FASTP {
 
 I can see that we require two extra keys, `id` and `single_end`:
 
-```groovy linenums="1"
-meta = row.subMap(metaKeys)
-meta.id ?= meta.sample
-meta.single_end = reads.size == 1
+```groovy linenums="14" hl_lines="1-3"
+def meta = row.subMap(metaKeys)
+meta = meta + [ id: meta.sample, single_end: reads.size == 1 ]
+[meta, reads]
 ```
 
 This is now able to be passed through to our FASTP process:
 
-```groovy linenums="1"
-Channel.fromPath(params.input)
-| splitCsv(header: true)
-| map { row ->
-    (readKeys, metaKeys) = row.keySet().split { it =~ /^fastq/ }
-    reads = row.subMap(readKeys).values()
-        .findAll { it != "" } // Single-end reads will have an empty string
-        .collect { file(it) } // Turn those strings into paths
-    meta = row.subMap(metaKeys)
-    meta.id ?= meta.sample
-    meta.single_end = reads.size == 1
-    [meta, reads]
-}
-| FASTP
+```groovy linenums="5" hl_lines="15 17"
+workflow {
+
+    samples = Channel.fromPath(params.input)
+        .splitCsv(header: true)
+        .map { row ->
+            def (readKeys, metaKeys) = row.keySet().split { key -> key =~ /^fastq/ }
+            def reads = row.subMap(readKeys).values()
+                .findAll { value -> value != "" } // Single-end reads will have an empty string
+                .collect { path -> file(path) }
+            def meta = row.subMap(metaKeys)
+            meta = meta + [ id: meta.sample, single_end: reads.size == 1 ]
+            [meta, reads]
+        }
 
-FASTP.out.json | view
+    FASTP(samples)
+
+    FASTP.out.json.view()
+}
 ```
 
 Let's assume that we want to pull some information out of these JSON files. To make our lives a little more convenient, let's "publish" these json files so that they are easier to find. We're going to discuss configuration more completely in a later chapter, but that's no reason not to dabble a bit here.
 
 We'd like to add a `publishDir` directive to our FASTP process.
 
-```groovy linenums="1"
+```groovy linenums="3"
 process {
     withName: 'FASTP' {
         publishDir = [
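
One detail of this hunk deserves a note: the updated code extends the meta map with `+` instead of mutating it. In plain Groovy, `+` on maps returns a new map, with entries on the right winning on key clashes. A short sketch with hypothetical values:

```groovy
def meta = [sample: 'GM12878', strandedness: 'reverse']

// + builds a new map; meta itself is left untouched
def extended = meta + [id: meta.sample, single_end: true]

assert extended == [sample: 'GM12878', strandedness: 'reverse', id: 'GM12878', single_end: true]
assert meta == [sample: 'GM12878', strandedness: 'reverse']
```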
@@ -155,26 +156,18 @@ This enables us to iterate quickly to test out our JSON parsing without waiting
 nextflow run . -resume
 ```
 
-Let's consider the possibility that we'd like to capture some of these metrics so that they can be used downstream. First, we'll have a quick peek at the [Groovy docs](https://groovy-lang.org/documentation.html) and I see that I need to import a `JsonSlurper`:
-
-```groovy linenums="1"
-import groovy.json.JsonSlurper
-
-// We can also import a Yaml parser just as easily:
-// import org.yaml.snakeyaml.Yaml
-// new Yaml().load(new FileReader('your/data.yml'))
-```
+Let's consider the possibility that we'd like to capture some of these metrics so that they can be used downstream. First, we'll have a quick peek at the [Groovy docs](https://groovy-lang.org/documentation.html) and I see that I need to use `JsonSlurper`.
 
 Now let's create a second entrypoint to quickly pass these JSON files through some tests:
 
 !!! note "Entrypoint developing"
 
     Using a second Entrypoint allows us to do quick debugging or development using a small section of the workflow without disturbing the main flow.
 
-```groovy linenums="1"
+```groovy linenums="5"
 workflow Jsontest {
     Channel.fromPath("results/fastp/json/*.json")
-    | view
+        .view()
 }
 ```
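`JsonSlurper` itself needs no Nextflow machinery: it turns JSON text into ordinary Groovy maps and lists. A minimal sketch, with a made-up snippet standing in for a fastp report:

```groovy
import groovy.json.JsonSlurper

// Made-up JSON standing in for a fastp report
def text = '{"summary": {"after_filtering": {"q30_rate": 0.94}}}'

// parseText returns nested maps/lists that can be navigated like any Groovy data
def report = new JsonSlurper().parseText(text)
assert report.summary.after_filtering.q30_rate == 0.94
```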
@@ -184,41 +177,43 @@ which we run with
 nextflow run . -resume -entry Jsontest
 ```
 
-Let's create a small function at the top of the workflow to take the JSON path and pull out some basic metrics:
+Let's create a small function inside the workflow to take the JSON path and pull out some basic metrics:
 
-```bash
+```groovy linenums="5"
 def getFilteringResult(json_file) {
-    fastpResult = new JsonSlurper().parseText(json_file.text)
+    return new groovy.json.JsonSlurper().parseText(json_file.text)
 }
-```
 
-!!! exercise
-
-    The `fastpResult` returned from the `parseText` method is a large Map - a class which we're already familiar with. Modify the `getFilteringResult` function to return just the `after_filtering` section of the report.
+workflow Jsontest {
+    Channel.fromPath("results/fastp/json/*.json")
+        .view()
+}
+```
 
-??? solution
+The `fastpResult` returned from the `parseText` method is a large Map - a class which we're already familiar with. Modify the `getFilteringResult` function to return just the `after_filtering` section of the report.
 
-    Here is one potential solution.
+In the interest of brevity, here is the solution to return just the `after_filtering` section of the report:
 
-    ```groovy linenums="1"
-    def getFilteringResult(json_file) {
-        new JsonSlurper().parseText(json_file.text)
-            ?.summary
-            ?.after_filtering
-    }
-    ```
+```groovy linenums="5"
+def getFilteringResult(json_file) {
+    return new groovy.json.JsonSlurper().parseText(json_file.text)
+        ?.summary
+        ?.after_filtering
+}
+```
 
-    !!! note
+!!! note
 
-        `?.` is new notation is a null-safe access operator. The `?.summary` will access the summary property if the property exists.
+    `?.` is new notation: a null-safe access operator. `?.summary` will access the summary property if the property exists.
 
 We can then join this new map back to the original reads using the `join` operator:
 
-```groovy linenums="1"
-FASTP.out.json
-    | map { meta, json -> [meta, getFilteringResult(json)] }
-    | join( FASTP.out.reads )
-    | view
+```groovy linenums="31"
+    FASTP.out.json
+        .map { meta, json -> [meta, getFilteringResult(json)] }
+        .join( FASTP.out.reads )
+        .view()
+}
 ```
 
 !!! exercise
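
The null-safe operator in this hunk is easy to probe in isolation. A tiny sketch with hypothetical maps: each `?.` yields `null` instead of throwing when its left-hand side is `null`:

```groovy
def full = [summary: [after_filtering: [q30_rate: 0.94]]]
def empty = [:]

assert full?.summary?.after_filtering?.q30_rate == 0.94
assert empty?.summary?.after_filtering == null // no NullPointerException
```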
@@ -227,17 +222,17 @@ FASTP.out.json
 
 ??? solution
 
-    ```groovy linenums="1"
-    FASTP.out.json
-        | map { meta, json -> [meta, getFilteringResult(json)] }
-        | join( FASTP.out.reads )
-        | map { meta, fastpMap, reads -> [meta + fastpMap, reads] }
-        | branch { meta, reads ->
-            pass: meta.q30_rate >= 0.935
-            fail: true
-        }
-        | set { reads }
-
-    reads.fail | view { meta, reads -> "Failed: ${meta.id}" }
-    reads.pass | view { meta, reads -> "Passed: ${meta.id}" }
+    ```groovy linenums="31"
+    reads = FASTP.out.json
+        .map { meta, json -> [meta, getFilteringResult(json)] }
+        .join( FASTP.out.reads )
+        .map { meta, fastpMap, reads -> [meta + fastpMap, reads] }
+        .branch { meta, reads ->
+            pass: meta.q30_rate >= 0.935
+            fail: true
+        }
+
+    reads.fail.view { meta, _reads -> "Failed: ${meta.id}" }
+    reads.pass.view { meta, _reads -> "Passed: ${meta.id}" }
+    }
     ```
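
Finally, the `branch` operator used in the solution can be demonstrated with a self-contained workflow. A sketch with invented items and the same illustrative 0.935 threshold; each item is routed to the channel of the first predicate that returns true:

```groovy
workflow {
    def checked = Channel
        .of([[id: 'a', q30_rate: 0.95], 'a.fastq.gz'],
            [[id: 'b', q30_rate: 0.90], 'b.fastq.gz'])
        .branch { meta, reads ->
            pass: meta.q30_rate >= 0.935
            fail: true
        }

    checked.pass.view { meta, _reads -> "Passed: ${meta.id}" } // emits item 'a'
    checked.fail.view { meta, _reads -> "Failed: ${meta.id}" } // emits item 'b'
}
```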
