-
Notifications
You must be signed in to change notification settings - Fork 14.4k
[mlir][OpenMP] inscan reduction modifier and scan op mlir support #114737
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
59ac266
1be9aa4
eb274aa
5c984a0
2adadc1
56c1105
807be08
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -178,7 +178,7 @@ def ParallelOp : OpenMP_Op<"parallel", traits = [ | |
|
||
let assemblyFormat = clausesAssemblyFormat # [{ | ||
custom<PrivateReductionRegion>($region, $private_vars, type($private_vars), | ||
$private_syms, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$private_syms, $reduction_mod, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$reduction_syms) attr-dict | ||
}]; | ||
|
||
|
@@ -223,7 +223,7 @@ def TeamsOp : OpenMP_Op<"teams", traits = [ | |
|
||
let assemblyFormat = clausesAssemblyFormat # [{ | ||
custom<PrivateReductionRegion>($region, $private_vars, type($private_vars), | ||
$private_syms, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$private_syms, $reduction_mod, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$reduction_syms) attr-dict | ||
}]; | ||
|
||
|
@@ -282,7 +282,7 @@ def SectionsOp : OpenMP_Op<"sections", traits = [ | |
|
||
let assemblyFormat = clausesAssemblyFormat # [{ | ||
custom<PrivateReductionRegion>($region, $private_vars, type($private_vars), | ||
$private_syms, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$private_syms, $reduction_mod, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$reduction_syms) attr-dict | ||
}]; | ||
|
||
|
@@ -469,7 +469,7 @@ def LoopOp : OpenMP_Op<"loop", traits = [ | |
|
||
let assemblyFormat = clausesAssemblyFormat # [{ | ||
custom<PrivateReductionRegion>($region, $private_vars, type($private_vars), | ||
$private_syms, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$private_syms, $reduction_mod, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$reduction_syms) attr-dict | ||
}]; | ||
|
||
|
@@ -521,7 +521,7 @@ def WsloopOp : OpenMP_Op<"wsloop", traits = [ | |
|
||
let assemblyFormat = clausesAssemblyFormat # [{ | ||
custom<PrivateReductionRegion>($region, $private_vars, type($private_vars), | ||
$private_syms, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$private_syms, $reduction_mod, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$reduction_syms) attr-dict | ||
}]; | ||
|
||
|
@@ -575,7 +575,7 @@ def SimdOp : OpenMP_Op<"simd", traits = [ | |
|
||
let assemblyFormat = clausesAssemblyFormat # [{ | ||
custom<PrivateReductionRegion>($region, $private_vars, type($private_vars), | ||
$private_syms, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$private_syms, $reduction_mod, $reduction_vars, type($reduction_vars), $reduction_byref, | ||
$reduction_syms) attr-dict | ||
}]; | ||
|
||
|
@@ -782,7 +782,7 @@ def TaskloopOp : OpenMP_Op<"taskloop", traits = [ | |
custom<InReductionPrivateReductionRegion>( | ||
$region, $in_reduction_vars, type($in_reduction_vars), | ||
$in_reduction_byref, $in_reduction_syms, $private_vars, | ||
type($private_vars), $private_syms, $reduction_vars, | ||
type($private_vars), $private_syms, $reduction_mod, $reduction_vars, | ||
type($reduction_vars), $reduction_byref, $reduction_syms) attr-dict | ||
}]; | ||
|
||
|
@@ -1706,6 +1706,26 @@ def CancellationPointOp : OpenMP_Op<"cancellation_point", clauses = [ | |
let hasVerifier = 1; | ||
} | ||
|
||
def ScanOp : OpenMP_Op<"scan", [ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If I understand the behavior of the corresponding OpenMP directive correctly, it basically splits a loop body at the point it appears. Everything before it is the input phase, and everything after it is the scan phase. In that case, shouldn't the corresponding MLIR operation reflect this by defining either one or two regions? // 2-region alternative.
omp.loop_nest ... {
omp.scan ...
input {
// Everything before '!$omp scan ...'
...
omp.terminator
} scan {
// Everything after '!$omp scan ...'
...
omp.terminator
}
// Nothing else other than omp.yield allowed here.
omp.yield
}
// 1-region alternative.
omp.loop_nest ... {
// Everything before '!$omp scan ...'
...
omp.scan ... {
// Everything after '!$omp scan ...'
...
omp.terminator
}
// Nothing else other than omp.yield allowed here.
omp.yield
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great question. There are a few facts to be considered here. For Simplicity, i will consider body of for-loop as
These are why the current representation is chosen. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My understanding is that clang would be able to base the implementation of However, MLIR transformation passes are able to move operations within a region in a way that would potentially break But I see the problem you point to about the naming of these regions. However, if instead of calling them omp.wsloop reduction(inscan @reduction %0 -> %red0 : !llvm.ptr) {
omp.loop_nest ... {
omp.scan exclusive(%red0 : !llvm.ptr)
pre {
// Everything before '!$omp scan ...'
...
omp.terminator
} post {
// Everything after '!$omp scan ...'
...
omp.terminator
}
omp.yield
}
} The questions that I guess would remain for this approach would be whether we can let common sub-expressions, constants etc. be hoisted out of the CC: @kiranchandramohan, @tblah. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Overall I like @skatrak's proposal The examples for inclusive scans here seem to indicate that we will need to allow yielding values from the As for whether we should allow non-terminators after (or before) the omp.scan, the language used in the standard is As for common subexpression hoisting, I think this is safe if an expression is hoisted all of the way out of the loop (because then it must not have any side effects or depend upon the loop iteration). I'm unsure about putting it before the scan inside of the loop. It would probably easier to generate the LLVMIR if this was not allowed. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The 2-region option proposed by @skatrak would be easier to interface with the OpenMPIRBuilder during translation. The 1-region option might be easier for the lowering in Flang (parse-tree to MLIR). But just want to understand whether the no-region option will definitely lead to an issue.
Are these Operations moving across the scan operation? Would these be only constants or something else? Assuming the scan operation is side-effecting will operations cross it? Generally, from a flang point of view, we typically have loading and storing from memory/variables. Since all variables are from outside the loop region these will be visible in both
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you @tblah and @kiranchandramohan for sharing your thoughts. @anchuraj and I had a call about this the other day, and at the time the 2-region approach made the most sense. But if we can somehow ensure that no MLIR transformations are able to move operations across the With regards to both approaches introducing regions, they have the disadvantage that the current to the PFT to MLIR lowering system doesn't seem well-suited to lower only parts of a block of code, so we thought that, if we followed that approach, then we'd probably want to first produce an If a common subexpression doesn't depend on the loop index, I agree that this would result in no problems for any of the approaches, since it should be hoisted out of the loop body. Same thing with constants, for example. However, one case I had in mind, which I think could potentially result in MLIR optimizations moving operations across the split point would be the subroutine foo(v1, v2, size)
implicit none
integer, intent(in) :: size, v1(size)
integer, intent(out) :: v2(size)
integer :: sum, i
sum = 0
!$omp parallel do reduction(+:sum)
do i = 0, size - 1
sum = sum + v1(i + 1)
!$omp scan inclusive(sum)
v2(i + 1) = sum
end do
!$omp end parallel do
end subroutine foo At the moment that results in separate So I'm thinking that perhaps it would indeed be enough to mark There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you for your suggestions @tblah, @skatrak and @kiranchandramohan.
I believe, the above restriction makes the examples like the one using
With yielding we need to make the updates in first for loop available in second for loop which might require additional transformations. Circling back on @kiranchandramohan's question, is it possible to prevent such optimizations inside There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I guess giving the scan memory effects could work. I can't think of a reason off the top of my head why @skatrak's suggestion to use the memory effects only on the exclusive/inclusive arguments wouldn't work. But just to be safe I would probably do the initial implementation saying it has some broadly specified memory effects. If we follow the recommendations for modelling syscalls given here, I think that should ensure that any observable effects before the scan must stay before (and vice versa). So it would be modeled a bit like
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thank you. I discussed this offline with @skatrak. Based on the discussions and inputs, I am proceeding with implementing no region approach with There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Updated the operation with |
||
AttrSizedOperandSegments, MemoryEffects<[MemWrite]> | ||
], clauses = [ | ||
OpenMP_InclusiveClause, OpenMP_ExclusiveClause]> { | ||
let summary = "scan directive"; | ||
let description = [{ | ||
The scan directive allows to specify scan reductions. It should be | ||
enclosed within a parent directive along with which a reduction clause | ||
with `inscan` modifier must be specified. The scan directive allows to | ||
split code blocks into input phase and scan phase in the region | ||
enclosed by the parent. | ||
}] # clausesDescription; | ||
|
||
let builders = [ | ||
OpBuilder<(ins CArg<"const ScanOperands &">:$clauses)> | ||
]; | ||
|
||
let hasVerifier = 1; | ||
} | ||
|
||
//===----------------------------------------------------------------------===// | ||
// 2.19.5.7 declare reduction Directive | ||
//===----------------------------------------------------------------------===// | ||
|
Uh oh!
There was an error while loading. Please reload this page.