Skip to content

fuzz-tests: add a test for handle_peer_error_or_warning() #8304

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

Chand-ra
Copy link

handle_peer_error_or_warning() in common/read_peer_message is responsible for parsing any incoming error or warning messages as defined in BOLT #1. Add a test for it.

Checklist

Before submitting the PR, ensure the following tasks are completed. If an item is not applicable to your PR, please mark it as checked:

  • The changelog has been updated in the relevant commit(s) according to the guidelines.
  • Tests have been added or modified to reflect the changes.
  • Documentation has been reviewed and updated as needed.
  • Related issues have been listed and linked, including any that this PR closes.

The FUZZ_COMMON_OBJS list roughly follows lexicographic
order. Make it adhere strictly to the order. This makes adding
and reviewing changes to the file easier.
@Chand-ra
Copy link
Author

Chand-ra commented May 22, 2025

@morehouse , I think this test has exposed a bug, although I'm not 100% confident.

We're injecting the fuzzer input as msg to handle_peer_error_or_warning(struct per_peer_state *pps, const u8 *msg). I've looked into this function's callers and I'm pretty sure msg is derived from the wire in most of them. I set up a similar test for the ping-pong message handler:

void run(const uint8_t *data, size_t size)
{
    u8 *pong;
    u8 *buf = tal_dup_arr(tmpctx, u8, data, size, 0);
    check_ping_make_pong(tmpctx, buf, &pong);

    clean_tmpctx();
}

and that works without any breakage.

@Chand-ra
Copy link
Author

Chand-ra commented Jun 2, 2025

Hey @morehouse, like we discussed earlier, I investigated the breakage caused by this test:

common/daemon_conn.c:161:18: runtime error: member access within null pointer of type 'struct daemon_conn'
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior common/daemon_conn.c:161:18 
MS: 0 ; base unit: 0000000000000000000000000000000000000000

And it looks like the breakage occurs due to common/status.c::status_send(const u8 *msg) passing an uninitialized struct daemon_conn to daemon_conn_send(struct daemon_conn *dc, u8 *msg) which then tries accessing within this pointer and causing the break.

I've faced this issue in a couple of other targets that I'm working on as well, and it doesn't take a lot of fuzzing time to pop up so I'm not sure if this is an actual bug or if I'm missing something in the target setup. I'd appreciate it if you could take a look.

@morehouse
Copy link
Contributor

And it looks like the breakage occurs due to common/status.c::status_send(const u8 *msg) passing an uninitialized struct daemon_conn to daemon_conn_send(struct daemon_conn *dc, u8 *msg) which then tries accessing within this pointer and causing the break.

I ran the fuzz target with UBSAN_OPTIONS=print_stacktrace=1 to get a stack trace:

common/daemon_conn.c:161:18: runtime error: member access within null pointer of  type 'struct daemon_conn'
    #0 in daemon_conn_send common/daemon_conn.c:161:18
    #1 in status_vfmt common/status.c:154:2                                                                            
    #2 in status_fmt common/status.c:165:2                                                                             
    #3 in handle_peer_error_or_warning common/read_peer_msg.c:29:3                                                     
    #4 in run tests/fuzz/fuzz-error-warning.c:14:5                                                                     

From this it's clear the initialized global is status_conn. This is a big hint that the fuzz target isn't initializing everything it needs to.

I think probably you need to add the following lines to the target's init:

int devnull = open("/dev/null", O_WRONLY);
assert(devnull >= 0);
status_setup_sync(devnull);

@Chand-ra Chand-ra marked this pull request as ready for review June 18, 2025 07:30
@Chand-ra
Copy link
Author

This test is now ready to review.

Copy link
Contributor

@morehouse morehouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking closer at this target, I'm not sure how much benefit it really provides.

handle_peer_error_or_warning seems like a relatively simple function. I'm not sure it's worth adding ~200 new files to test it... WDYT?

Comment on lines 15 to 16
void peer_failed_connection_lost(void)
{ longjmp(exit_jmp, 1); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't find any place where peer_failed_connection_lost is called from this fuzz target. Perhaps we can just abort if it does get called?

@Chand-ra
Copy link
Author

Looking closer at this target, I'm not sure how much benefit it really provides.

handle_peer_error_or_warning seems like a relatively simple function. I'm not sure it's worth adding ~200 new files to test it... WDYT?

Right, makes me wonder why this happens though. Shouldn't the corpus for a relatively simpler target like this be smaller as well?

Chandra Pratap added 2 commits June 19, 2025 07:42
Changelog-None: `handle_peer_error_or_warning()` in
`common/read_peer_message.{c, h}` is responsible for parsing any
incoming `error` or `warning` messages as defined in BOLT ElementsProject#1.

Add a test for it.
Add a minimal input set as a seed corpus for the newly introduced
test. This leads to discovery of interesting code paths faster.
@morehouse
Copy link
Contributor

Right, makes me wonder why this happens though. Shouldn't the corpus for a relatively simpler target like this be smaller as well?

Yes, simpler targets should generally have smaller corpora. Other fuzz targets for CLN have corpus sizes that range from ~50 inputs to ~2000 inputs, so 200 isn't abnormal.

If libFuzzer used only coverage metrics to pick corpus elements, the corpora would be smaller. But libFuzzer also takes other input "features" into consideration, which increases corpus sizes.

The one thing we can do to decrease corpus sizes is use run.py to minimize the corpus before submitting the PR.

Copy link
Contributor

@morehouse morehouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 55cd685

Looking even closer, I think this is worth fuzzing. The function sanitize_error is called, which does some string operations that are good to test.

Tested and verified that the current corpus is minimal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants