better cython debugging #53821

Merged (3 commits) on Oct 7, 2023
2 changes: 1 addition & 1 deletion .gitignore
@@ -39,7 +39,7 @@
.mesonpy-native-file.ini
MANIFEST
compile_commands.json
debug
.debug

# Python files #
################
30 changes: 28 additions & 2 deletions doc/source/development/debugging_extensions.rst
@@ -14,8 +14,8 @@ For Python developers with limited or no C/C++ experience this can seem a daunti
2. `Fundamental Python Debugging Part 2 - Python Extensions <https://willayd.com/fundamental-python-debugging-part-2-python-extensions.html>`_
3. `Fundamental Python Debugging Part 3 - Cython Extensions <https://willayd.com/fundamental-python-debugging-part-3-cython-extensions.html>`_

Generating debug builds
-----------------------
Debugging locally
-----------------

By default building pandas from source will generate a release build. To generate a development build you can type::

@@ -27,6 +27,32 @@ By default building pandas from source will generate a release build. To generat

By specifying ``builddir="debug"`` all of the targets will be built and placed in the debug directory relative to the project root. This helps to keep your debug and release artifacts separate; you are of course able to choose a different directory name, or omit the option altogether if you do not care to separate build types.

Using Docker
------------

To simplify the debugging process, pandas has created a Docker image with a debug build of Python and the gdb/Cython debuggers pre-installed. You may either ``docker pull pandas/pandas-debug`` to get access to this image or build it locally from the ``tooling/debug`` folder.

You can then mount your pandas repository into this image via:

.. code-block:: sh

    docker run --rm -it -w /data -v ${PWD}:/data pandas/pandas-debug

Inside the image, you can use meson to build/install pandas and place the build artifacts into a ``debug`` folder using a command as follows:

.. code-block:: sh

    python -m pip install -ve . --no-build-isolation --config-settings=builddir="debug" --config-settings=setup-args="-Dbuildtype=debug"
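
A minimal sketch, in Python, of how the install command above could be assembled in a helper script. The function name ``debug_install_command`` and its parameters are hypothetical and not part of pandas; the sketch only reassembles the flags shown in the documented command and does not run anything.

```python
import shlex


def debug_install_command(builddir="debug", buildtype="debug"):
    """Return the pip argv for an editable, non-isolated meson build.

    Hypothetical helper: it only rebuilds the documented command line.
    """
    return [
        "python", "-m", "pip", "install", "-ve", ".",
        "--no-build-isolation",
        # Directory (relative to the project root) for build artifacts.
        f"--config-settings=builddir={builddir}",
        # Forwarded to meson setup; "debug" selects a debug build.
        f"--config-settings=setup-args=-Dbuildtype={buildtype}",
    ]


if __name__ == "__main__":
    # Print a shell-quoted form that can be pasted into a terminal.
    print(shlex.join(debug_install_command()))
```

Returning a list rather than a single string keeps the command safe to hand to ``subprocess.run`` without shell quoting concerns.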

If you plan to use cygdb, note that the files it requires are placed within the build folder, so you must first ``cd`` to the build folder and then start the application.
Review comment (Member Author):

I think this is a bug in Cython, which still uses optparse (deprecated since Python 3.2) instead of argparse. In theory the tool should let you specify the build folder, but in practice that doesn't work. I will push up a separate PR to Cython to fix this.

.. code-block:: sh

    cd debug
    cygdb

Within the debugger you can use `cygdb commands <https://docs.cython.org/en/latest/src/userguide/debugging.html#using-the-debugger>`_ to navigate Cython extensions.
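
Before launching cygdb it can help to confirm that the build folder actually contains Cython's debug metadata. The sketch below is illustrative only: the helper name ``has_cython_debug_info`` is hypothetical, and it assumes that a build made with ``--gdb`` leaves a ``cython_debug`` directory at the top of the build folder.

```python
from pathlib import Path


def has_cython_debug_info(build_dir):
    """Return True if build_dir holds a non-empty cython_debug folder.

    Assumption: Cython's --gdb option writes its debug metadata into a
    directory named "cython_debug" inside the build folder.
    """
    debug_dir = Path(build_dir) / "cython_debug"
    return debug_dir.is_dir() and any(debug_dir.iterdir())


if __name__ == "__main__":
    build_dir = "debug"  # the builddir used in the install command
    if has_cython_debug_info(build_dir):
        print(f"Debug info found; run cygdb from inside {build_dir!r}")
    else:
        print(f"No Cython debug info in {build_dir!r}; rebuild with -Dbuildtype=debug")
```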

Editor support
--------------

10 changes: 9 additions & 1 deletion pandas/_libs/meson.build
@@ -101,12 +101,20 @@ libs_sources = {
'writers': {'sources': ['writers.pyx']}
}

cython_args = [
    '--include-dir',
    meson.current_build_dir(),
    '-X always_allow_keywords=true'
]
if get_option('buildtype') == 'debug'
    cython_args += ['--gdb']
endif

foreach ext_name, ext_dict : libs_sources
py.extension_module(
ext_name,
ext_dict.get('sources'),
cython_args: ['--include-dir', meson.current_build_dir(), '-X always_allow_keywords=true'],
cython_args: cython_args,
include_directories: [inc_np, inc_pd],
dependencies: ext_dict.get('deps', ''),
subdir: 'pandas/_libs',
11 changes: 10 additions & 1 deletion pandas/_libs/tslibs/meson.build
@@ -19,11 +19,20 @@ tslibs_sources = {
'vectorized': {'sources': ['vectorized.pyx']},
}

cython_args = [
    '--include-dir',
    meson.current_build_dir(),
    '-X always_allow_keywords=true'
]
if get_option('buildtype') == 'debug'
    cython_args += ['--gdb']
endif

foreach ext_name, ext_dict : tslibs_sources
py.extension_module(
ext_name,
ext_dict.get('sources'),
cython_args: ['--include-dir', meson.current_build_dir(), '-X always_allow_keywords=true'],
cython_args: cython_args,
include_directories: [inc_np, inc_pd],
dependencies: ext_dict.get('deps', ''),
subdir: 'pandas/_libs/tslibs',
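
Both ``meson.build`` hunks apply the same rule: the inline argument list is replaced by a shared ``cython_args`` variable, and ``--gdb`` is appended only when the build type is ``debug``. As a rough illustration, the same logic can be written in Python; the function name ``cython_args_for`` is hypothetical and exists only for this sketch.

```python
def cython_args_for(buildtype, build_dir):
    """Mirror the meson logic: base arguments, plus --gdb for debug builds."""
    args = [
        "--include-dir",
        build_dir,
        "-X always_allow_keywords=true",
    ]
    if buildtype == "debug":
        # Only debug builds ask Cython to emit debugger metadata.
        args += ["--gdb"]
    return args


if __name__ == "__main__":
    print(cython_args_for("debug", "build/pandas/_libs"))
    print(cython_args_for("release", "build/pandas/_libs"))
```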
3 changes: 3 additions & 0 deletions setup.py
@@ -418,6 +418,9 @@ def maybe_cythonize(extensions, *args, **kwargs):

    kwargs["nthreads"] = parsed.parallel
    build_ext.render_templates(_pxifiles)
    if debugging_symbols_requested:
        kwargs["gdb_debug"] = True

    return cythonize(extensions, *args, **kwargs)
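
The ``setup.py`` change forwards ``gdb_debug=True`` to ``cythonize`` when debugging symbols are requested. A self-contained sketch of that keyword handling follows; ``cythonize_kwargs`` is a hypothetical stand-in for the relevant lines of ``maybe_cythonize`` and does not import Cython or reproduce the rest of the function.

```python
def cythonize_kwargs(parallel, debugging_symbols_requested):
    """Build the keyword arguments that would be passed to cythonize().

    Hypothetical stand-in for part of maybe_cythonize; gdb_debug is the
    cythonize() keyword that enables debugger (cygdb) support.
    """
    kwargs = {"nthreads": parallel}
    if debugging_symbols_requested:
        kwargs["gdb_debug"] = True
    return kwargs


if __name__ == "__main__":
    print(cythonize_kwargs(parallel=4, debugging_symbols_requested=True))
```

Leaving the key out entirely for non-debug builds, rather than setting it to ``False``, keeps the call identical to the pre-change behavior.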


35 changes: 35 additions & 0 deletions tooling/debug/Dockerfile.pandas-debug
@@ -0,0 +1,35 @@
FROM ubuntu:latest

RUN apt-get update && apt-get upgrade -y
RUN apt-get install -y build-essential git valgrind

# cpython dev install
RUN git clone -b 3.10 --depth 1 https://github.com/python/cpython.git /clones/cpython
RUN apt-get install -y libbz2-dev libffi-dev libssl-dev zlib1g-dev liblzma-dev libsqlite3-dev libreadline-dev
RUN cd /clones/cpython && ./configure --with-pydebug && CFLAGS="-g3" make -s -j$(nproc) && make install

# gdb installation
RUN apt-get install -y wget libgmp-dev
RUN cd /tmp && wget http://mirrors.kernel.org/sourceware/gdb/releases/gdb-12.1.tar.gz && tar -zxf gdb-12.1.tar.gz
RUN cd /tmp/gdb-12.1 && ./configure --with-python=python3 && make -j$(nproc) && make install
RUN rm -r /tmp/gdb-12.1

# pandas dependencies
RUN python3 -m pip install \
cython \
hypothesis \
ninja \
numpy \
meson \
meson-python \
pytest \
pytest-asyncio \
python-dateutil \
pytz \
versioneer[toml]

# At the time this docker image was built, there was a bug/limitation
# with meson where only having a python3 executable and not python
# would cause the build to fail. This symlink could be removed if
# users stick to always calling python3 within the container
RUN ln -s /usr/local/bin/python3 /usr/local/bin/python
19 changes: 19 additions & 0 deletions tooling/debug/README
@@ -0,0 +1,19 @@
The Docker image here helps to set up an isolated environment containing a debug version of Python and a gdb installation which the Cython debugger can work with.

If you have internet access, you can pull a pre-built image via

```sh
docker pull pandas/pandas-debug
```

To build the image locally, you can do

```sh
docker build . -t pandas/pandas-debug -f Dockerfile.pandas-debug
```

For pandas developers, you can push a new copy of the image to dockerhub via

```sh
docker push pandas/pandas-debug
```