[SYCL] Fix wrong except. raising for ALLOWLIST #2719

dm-vodopyanov · 2020-11-02T17:21:21Z

This patch fixes the following situation:

User set

SYCL_DEVICE_ALLOWLIST=DeviceName:{{DEV_NAME}},DriverVersion:{{X.Y.Z}}
SYCL_DEVICE_TYPE=GPU
SYCL_BE=PI_OPENCL // no effect

Machine has

OpenCL GPU device with name DEV_NAME and driver version
X.Y.Z
Level Zero device with name DEV_NAME and driver version A.B.C

User expects that OpenCL GPU device will be selected but instead got
cl::sycl::runtime_error Requested SYCL device not found -1
(CL_DEVICE_NOT_FOUND).

Since GPU driver 20.43.18277, OpenCL GPU and Level Zero device names are
the same. So, DPC++ RT checks OpenCL GPU device first - everything is
ok, then checks Level Zero device, and throws the exception above.

This patch changes behavior - instead of raising exception in case of
device name == device name in SYCL_DEVICE_ALLOWLIST and driver version
!= driver version in SYCL_DEVICE_ALLOWLIST - now DPC++ raise an exception
in case none devices were selected based on a value from
SYCL_DEVICE_ALLOWLIST.

Regression test: LIT test config/select_device.cpp with GPU driver
20.43.18277.

…location." This reverts commit e2fceec. This commit broke one of the bots. Reverting while I investigate.

This diff refactors the code which determines the tool type based on how llvm-objcopy is invoked (objcopy vs strip vs bitcode-strip vs install-name-tool). NFC. Test plan: make check-all Differential revision: https://reviews.llvm.org/D89713

When switching the register debug operands to $noreg in setupDebugValueUndef() also clear the sub-register indices for virtual registers. This is done when marking DBG_VALUEs undef in other cases, e.g. in LiveDebugVariables. I have not found any cases where leaving the sub-register index causes any issues, and the indices would eventually get dropped when LiveDebugVariables reinserted the undef DBG_VALUEs after register scheduling, but if nothing else it looked a bit weird in printouts to have sub-register indices on $noreg, and I don't think the sub-register index holds any meaningful information at that point. I have not been able to find any source-level reproducer for this with an upstream target, so I have just added an instrumented machine-sink test. Reviewed By: djtodoro, jmorse Differential Revision: https://reviews.llvm.org/D89941

Differential revision: https://reviews.llvm.org/D89957

Use isKnownXY comparators when one of the operands can be with scalable vectors or getFixedSize() for all the other cases. This patch also does bug fixes for getPrimitiveSizeInBits by using getFixedSize() near the places with the TypeSize comparison. Differential Revision: https://reviews.llvm.org/D89703

Differential Revision: https://reviews.llvm.org/D88955

These are all inspired by existing test coverage we have in an internal testsuite. Reviewed by: grimar, MaskRay Differential Revision: https://reviews.llvm.org/D89775

@vSK

This patch copies @vSK's fix to instcombine from D85555 over to mem2reg. The motivation and rationale are exactly the same: When mem2reg removes an alloca, it erases the dbg.{addr,declare} instructions which refer to the alloca. It would be better to instead remove all debug intrinsics which describe the contents of the dead alloca, namely all dbg.value(<dead alloca>, ..., DW_OP_deref)'s. As far as I can tell, prior to D80264 these `dbg.value+deref`s would have been silently dropped instead of being made `undef`, so we're just returning to previous behaviour with these patches. Testing: `llvm-lit llvm/test` and `ninja check-clang` gave no unexpected failures. Added 3 tests, each of which covers a dbg.value deletion path in mem2reg: mem2reg-promote-alloca-1.ll mem2reg-promote-alloca-2.ll mem2reg-promote-alloca-3.ll The first is based on the dexter test inlining.c from D89543. This patch also improves the debugging experience for loop.c from D89543, which suffers similarly after arg promotion instead of inlining.

…t iteration that didn't change the op. Before this change, we would run `maxIterations` if the first iteration changed the op. After this change, we exit the loop as soon as an iteration hasn't changed the op. Assuming that we have reached a fixed point when an iteration doesn't change the op, this doesn't affect correctness. Reviewed By: rriddle Differential Revision: https://reviews.llvm.org/D89981

…d allocations. Added optimization pass to convert heap-based allocs to stack-based allocas in buffer placement. Added the corresponding test file. Differential Revision: https://reviews.llvm.org/D89688

This patch adjusts _when_ something happens in LiveDebugValues / InstrRefBasedLDV, to make it more amenable to dealing with DBG_INSTR_REF instructions. There's no functional change. In the current InstrRefBasedLDV implementation, we collect the machine value-number transfer function for blocks at the same time as the variable-value transfer function. After solving machine value numbers, the variable-value transfer function is updated so that DBG_VALUEs of live-in registers have the correct value. The same would need to be done for DBG_INSTR_REFs, to connect instruction-references with machine value numbers. Rather than writing more code for that, this patch separates the two: we collect the (machine-value-number) transfer function and solve for machine value numbers, then step through the MachineInstrs again collecting the variable value transfer function. This simplifies things for the new few patches. Differential Revision: https://reviews.llvm.org/D85760

Split the current NetBSD watchpoint implementation for x86 into Utility, and revamp it to improve readability. This code is meant to be used as a common class for all x86 watchpoint implementation, particularly these on FreeBSD and Linux. The code uses global watchpoint enable bits, as required by the NetBSD kernel. If it ever becomes necessary for any platform to use local enable bits instead, this can be trivially abstracted out. The code also postpones clearing DR6 until a new different watchpoint is being set in place of the old one. This is necessary since LLDB repeatedly reenables watchpoints on all threads, by clearing and restoring them. When DR6 is cleared as a part of that, then pending events on other threads can no longer be associated with watchpoints correctly. Differential Revision: https://reviews.llvm.org/D89874

This patch adds a specialized implementation of getIntrinsicInstrCost and add initial cost-modeling for min/max vector intrinsics. AArch64 NEON support umin/smin/umax/smax for vectors <8 x i8>, <16 x i8>, <4 x i16>, <8 x i16>, <2 x i32> and <4 x i32>. Notably, it does not support vectors with i64 elements. This change by itself should have very little impact on codegen, but in follow-up patches I plan to teach the vectorizers to consider using those intrinsics on platforms where it is profitable, e.g. because there is no general 'select'-like instruction. The current cost returned should be better for throughput, latency and size. Reviewed By: dmgreen Differential Revision: https://reviews.llvm.org/D89953

Differential revision: https://reviews.llvm.org/D90017

…everse. NFCI. This matches bswap and bitreverse intrinsics, so we should make that clear in the function name.

…ng flags. matchBSwapOrBitReverse was hardcoded to just match bswaps - we're going to need to expose the ability to match bitreverse as well, so make this part of the function call.

This patch provides C API for MLIR affine expression. - Implement C API for methods of AffineExpr class. - Implement C API for methods of derived classes (AffineBinaryOpExpr, AffineDimExpr, AffineSymbolExpr, and AffineConstantExpr). Differential Revision: https://reviews.llvm.org/D89856

As discussed in D89952, instcombine can sometimes find a way to reduce similar patterns, but it is incomplete. InstSimplify uses the computeConstantRange() ValueTracking analysis via simplifyICmpWithConstant(), so we just need to fill in the max value of ctpop to process any "icmp pred ctpop(X), C" pattern (the min value is initialized to zero automatically). Differential Revision: https://reviews.llvm.org/D89976

…file In memory VFS cannot handle aceesssing the same file with different paths. This diff just stops using VFS for modulemap files. Fixes PR47839 Differential Revision: https://reviews.llvm.org/D89886

This allows us to check whether enum field is actually sent over the wire or missing. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D89882

Differential Revision: https://reviews.llvm.org/D90027

Differential Revision: https://reviews.llvm.org/D90030

`llvm::sys::path` is used a lot in the remote index marshalling code. We can save space by avoiding spelling it out explicitly for most functions and times. Reviewed By: kadircet Differential Revision: https://reviews.llvm.org/D90016

This is a search-and-replace of f6cb7f3.

This is a search-and-replace of f6cb7f3

As discussed in D89952, instcombine can sometimes find a way to reduce similar patterns, but it is incomplete. InstSimplify uses the computeConstantRange() ValueTracking analysis via simplifyICmpWithConstant(), so we just need to fill in the max value of ctlz to process any "icmp pred ctlz(X), C" pattern (the min value is initialized to zero automatically). Follow-up to D89976.

As discussed in D89952, instcombine can sometimes find a way to reduce similar patterns, but it is incomplete. InstSimplify uses the computeConstantRange() ValueTracking analysis via simplifyICmpWithConstant(), so we just need to fill in the max value of cttz to process any "icmp pred cttz(X), C" pattern (the min value is initialized to zero automatically). https://alive2.llvm.org/ce/z/Z_SLWZ Follow-up to D89976.

This allows it to have a separate namespace (grpc versioned service) without putting versioning info on all of the other protos (before we need it). clang-index-server is still broken (from 81e5f29). Differential Revision: https://reviews.llvm.org/D90031

s-kanaev

LGTM

These tests can be treated as ESIMD code samples

* [SYCL][NFC] Extend ABI tests to cover device code

As with [750c979], remove the already-existing archive file to ensure correct behavior of the LIT in local environments. Signed-off-by: Artem Gindinson <[email protected]>

Moving the load into the CAS loop greatly improves performance, especially on GPU. Performance improvement measured over 16 million iterations is 20% for CPU and 500% for GPU. (no impact for the host device). Signed-off-by: Chris Perkins <[email protected]>

dm-vodopyanov · 2020-11-06T10:07:34Z

There is some merge conflict, I'm going to resolve it.

This patch fixes the following situation: 1. User set * SYCL_DEVICE_ALLOWLIST=DeviceName:{{DEV_NAME}},DriverVersion:{{X.Y.Z}} * SYCL_DEVICE_TYPE=GPU * SYCL_BE=PI_OPENCL // no effect 2. Machine has * OpenCL GPU device with name DEV_NAME and driver version X.Y.Z * Level Zero device with name DEV_NAME and driver version A.B.C User expects that OpenCL GPU device will be selected but instead got cl::sycl::runtime_error Requested SYCL device not found -1 (CL_DEVICE_NOT_FOUND). Since GPU driver 20.43.18277, OpenCL GPU and Level Zero device names are the same. So, DPC++ RT checks OpenCL GPU device first - everything is ok, then checks Level Zero device, and throws the exception above. This patch changes behaviour - instead of raising exception in case of device name == device name in SYCL_DEVICE_ALLOWLIST and driver version != driver version in SYCL_DEVICE_ALLOWLIST - now DPC++ raising exception in case both device name and driver version are not equal the requested ones in SYCL_DEVICE_ALLOWLIST. Regression test: LIT test config/select_device.cpp with GPU driver 20.43.18277.

select_device test

…ps://github.com/dm-vodopyanov/llvm into private/dvodopya/fix-crash-in-allow-list-filter

dm-vodopyanov · 2020-11-06T10:19:03Z

Corrupted PR. Will re-open

dm-vodopyanov · 2020-11-06T11:18:30Z

Opened #2739

Signed-off-by: Marcos Maronas <[email protected]> Original commit: KhronosGroup/SPIRV-LLVM-Translator@52ba01c7e56cf4a

lhames and others added 30 commits October 22, 2020 23:21

Revert "[JITLink][ELF] Add support for ELF::R_X86_64_REX_GOTPCRELX re…

0f91038

…location." This reverts commit e2fceec. This commit broke one of the bots. Reverting while I investigate.

[ARM][SchedModels] Let ldm* instruction scheduling use MCSchedPredicate

7a78073

Differential revision: https://reviews.llvm.org/D89957

[llvm-mca] Add test for cortex-a57 NEON instructions

ffc0f57

[AMDGPU] Add simplification/combines for llvm.amdgcn.fmul.legacy

86a480e

Differential Revision: https://reviews.llvm.org/D88955

[lld][ELF][test] Add additional test coverage for LTO

342040b

These are all inspired by existing test coverage we have in an internal testsuite. Reviewed by: grimar, MaskRay Differential Revision: https://reviews.llvm.org/D89775

[MLIR] Added PromoteBuffersToStackPass to convert heap- to stack-base…

0d1d363

…d allocations. Added optimization pass to convert heap-based allocs to stack-based allocas in buffer placement. Added the corresponding test file. Differential Revision: https://reviews.llvm.org/D89688

[ARM][SchedModels] Convert IsR1P0AndLaterPred to MCSchedPredicate. NFC

cb86522

Differential revision: https://reviews.llvm.org/D90017

[X86] lowerShuffleWithPERMV - use MVT::changeTypeToInteger helper. NFCI.

936ef89

[InstCombine] Rename InstCombinerImpl::matchBSwap to matchBSwapOrBitR…

19a13bf

…everse. NFCI. This matches bswap and bitreverse intrinsics, so we should make that clear in the function name.

[InstCombine] matchBSwapOrBitReverse - expose bswap/bitreverse matchi…

1cab3bf

…ng flags. matchBSwapOrBitReverse was hardcoded to just match bswaps - we're going to need to expose the ability to match bitreverse as well, so make this part of the function call.

[clang-tidy] Fix redefinition of module in the same module.modulemap …

55a2dee

…file In memory VFS cannot handle aceesssing the same file with different paths. This diff just stops using VFS for modulemap files. Fixes PR47839 Differential Revision: https://reviews.llvm.org/D89886

[clangd] Migrate to proto2 syntax

421a2a0

This allows us to check whether enum field is actually sent over the wire or missing. Reviewed By: sammccall Differential Revision: https://reviews.llvm.org/D89882

[CMake] generate_grpc_protos -> generate_protos(... GRPC). NFC

81f7b2a

Differential Revision: https://reviews.llvm.org/D90027

[CMake] Fix hardcoding of protobuf output basename. NFC

2d25004

Differential Revision: https://reviews.llvm.org/D90030

[InstSimplify] add tests for ctlz constant range; NFC

9bcb437

This is a search-and-replace of f6cb7f3.

[InstSimplify] add tests for cttz constant range; NFC

0351bd9

This is a search-and-replace of f6cb7f3

s-kanaev previously approved these changes Nov 5, 2020

View reviewed changes

vladimirlaz and others added 6 commits November 5, 2020 09:31

[SYCL] Fix link to ESIMD tests (#2736)

7521fd4

These tests can be treated as ESIMD code samples

[SYCL][NFC] Extend ABI tests to cover device code (#2725)

2934aac

* [SYCL][NFC] Extend ABI tests to cover device code

[SYCL][Test] Remove leftovers for FPGA archives (#2735)

6057e93

As with [750c979], remove the already-existing archive file to ensure correct behavior of the LIT in local environments. Signed-off-by: Artem Gindinson <[email protected]>

[SYCL] Move tests with dependencies to on-device directory (#2732)

e67681b

[BuildBot] OpenCL CPU/FPGAEMU driver uplift (#2733)

e79a0c9

dm-vodopyanov added 5 commits November 6, 2020 13:14

Remove exception from filterAllowList() and update

9728d98

select_device test

Apply missing clang-format

a84fdfe

Apply CR comments

01c8340

Merge branch 'private/dvodopya/fix-crash-in-allow-list-filter' of htt…

824d977

…ps://github.com/dm-vodopyanov/llvm into private/dvodopya/fix-crash-in-allow-list-filter

dm-vodopyanov dismissed s-kanaev’s stale review via 824d977 November 6, 2020 10:18

dm-vodopyanov requested review from AGindinson, AlexeySachkov, AlexeySotkin, asavonic, bader, elizabethandrews, Fznamznon, mdtoguchi, premanandrao, smaslov-intel, vzakhari and a team as code owners November 6, 2020 10:18

dm-vodopyanov closed this Nov 6, 2020

jsji pushed a commit that referenced this pull request Sep 21, 2024

Fix typo in intrinsic name in a comment. (#2719)

22359e7

Signed-off-by: Marcos Maronas <[email protected]> Original commit: KhronosGroup/SPIRV-LLVM-Translator@52ba01c7e56cf4a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[SYCL] Fix wrong except. raising for ALLOWLIST #2719

[SYCL] Fix wrong except. raising for ALLOWLIST #2719

Uh oh!

dm-vodopyanov commented Nov 2, 2020 •

edited

Loading

Uh oh!

s-kanaev left a comment

Uh oh!

dm-vodopyanov commented Nov 6, 2020

Uh oh!

dm-vodopyanov commented Nov 6, 2020

Uh oh!

dm-vodopyanov commented Nov 6, 2020

Uh oh!

Uh oh!

[SYCL] Fix wrong except. raising for ALLOWLIST #2719

[SYCL] Fix wrong except. raising for ALLOWLIST #2719

Uh oh!

Conversation

dm-vodopyanov commented Nov 2, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

s-kanaev left a comment

Choose a reason for hiding this comment

Uh oh!

dm-vodopyanov commented Nov 6, 2020

Uh oh!

dm-vodopyanov commented Nov 6, 2020

Uh oh!

dm-vodopyanov commented Nov 6, 2020

Uh oh!

Uh oh!

dm-vodopyanov commented Nov 2, 2020 •

edited

Loading