segment fault during level zero lib exit on CentOS7.4 #675

Closed
bosheng1 opened this issue Sep 7, 2023 · 2 comments

bosheng1 commented Sep 7, 2023

I ran xpu-smi in a dGPU environment and hit a segmentation fault. The issue occurs on CentOS 7.4; Ubuntu 20.04 works fine.
Level Zero source:
repository: https://github.com/intel/compute-runtime
branch: releases/23.22
revision: e75654a

xpu-smi discovery

+-----------+--------------------------------------------------------------------------------------+
| Device ID | Device Information |
+-----------+--------------------------------------------------------------------------------------+
| 0 | Device Name: Intel(R) Data Center GPU Flex 140 |
| | Vendor Name: Intel(R) Corporation |
| | UUID: 00000000-0000-0000-9e34-11e0b30e7c0a |
| | PCI BDF Address: 0000:9e:00.0 |
| | DRM Device: /dev/dri/card0 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
| 1 | Device Name: Intel(R) Data Center GPU Flex 140 |
| | Vendor Name: Intel(R) Corporation |
| | UUID: 00000000-0000-0000-f033-d4dbd6c46f8f |
| | PCI BDF Address: 0000:a2:00.0 |
| | DRM Device: /dev/dri/card1 |
| | Function Type: physical |
+-----------+--------------------------------------------------------------------------------------+
Segmentation fault (core dumped)

gdb backtrace:
#0 0x00007ffff6873598 in __memcpy_ssse3_back () from /lib64/libc.so.6
#1 0x00007ffff72dda74 in std::string::append(std::string const&) () from /lib64/libstdc++.so.6
#2 0x00007ffff1c8eb38 in driverHandleDestructor () at /home/media/compute-runtime/level_zero/core/source/linux/driver_teardown.cpp:31

void __attribute__((destructor)) driverHandleDestructor() {
    std::string loaderLibraryName = "lib" + L0::loaderLibraryFilename + ".so.1";
    L0::setDriverTeardownHandleInLoader(loaderLibraryName);
    L0::globalDriverTeardown();
}
After triage, I found that the variable L0::loaderLibraryFilename has already been released by the time driverHandleDestructor runs, which causes the segmentation fault. It works when using std::string loaderLibraryName = "libze_loader.so.1"; instead.
Maybe registering an atexit callback would be better.
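
For anyone hitting the same crash, here is a minimal sketch of one way to avoid reading a global std::string during teardown: keep the base name in a constexpr character array, which has no destructor and so is still readable when destructor functions run. This is an illustration only and not necessarily how the driver was actually fixed. The namespace `sketch`, the functions `makeLoaderLibraryName` and `teardownSketch`, and the value "ze_loader" (inferred from the hard-coded "libze_loader.so.1" workaround above) are assumptions, and the real driver teardown calls are left out.

```cpp
#include <cstdio>
#include <string>

namespace sketch {

// Hypothetical stand-in for L0::loaderLibraryFilename. A constexpr char
// array is trivially destructible, so nothing can release it before a
// destructor function reads it.
constexpr char loaderLibraryFilename[] = "ze_loader";

// Builds "lib<name>.so.1" from a local std::string and string literals only;
// no global std::string object is involved.
std::string makeLoaderLibraryName() {
    return std::string("lib") + loaderLibraryFilename + ".so.1";
}

} // namespace sketch

// GCC/Clang extension, as in the snippet from the issue.
__attribute__((destructor)) void teardownSketch() {
    const std::string name = sketch::makeLoaderLibraryName();
    // The real driver would pass `name` to its loader teardown hook here;
    // this sketch only reports it on stderr.
    std::fprintf(stderr, "teardown would use %s\n", name.c_str());
}
```

The point of the design is that the destructor function depends only on data without a destructor of its own, so the order in which global objects are destroyed no longer matters for this code path.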

@JablonskiMateusz
Contributor

Hi @bosheng1. Thanks for reporting the issue. Could you please check whether commit 4f68822 fixes it?

@bosheng1
Author

@JablonskiMateusz Thanks for your quick fix! The segmentation fault no longer occurs after picking up commit 4f68822.
