
Conversation

@rabihchrabieh commented Jun 27, 2025

Description

Added HARQ support for 5G LDPC codes.
Created a Jupyter notebook that describes and tests the functionality.

Currently, the HARQ functionality does not support pruning (TODO for RV0, etc).
Graph mode is working but should be revisited.

All (or most) existing tests related to 5G LDPC pass.
No new unit test was created (TODO).

The additional API does not impact the existing API.

Fixed a few unrelated tests that were previously broken (cn_type changed to cn_update).

Checklist

  • Detailed description
  • Added references to issues and discussions
  • Added / modified documentation as needed
  • Added / modified unit tests as needed
  • Passes all tests
  • Lint the code
  • Performed a self review
  • Ensure you Signed-off the commits. Required to accept contributions!
  • Co-authored with someone? Add Co-authored-by: user@domain and ensure they signed off their commits too.

@SebastianCa
Collaborator

Hi @rabihchrabieh,

Thank you for your contribution to Sionna. We will review the code shortly and get back to you.

@SebastianCa
Collaborator

Hi @rabihchrabieh,

I started reviewing the PR, and overall, it looks good.

However, my main concern at this stage is the API design and its compatibility with graph/XLA mode.
The current implementation likely introduces unwanted side effects in graph mode: setting rv outside the graph after compilation will have no effect unless tracing is re-triggered. Additionally, dynamic batch sizes are problematic in this context.
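
To illustrate the re-tracing pitfall with a minimal, self-contained sketch (hypothetical layer and attribute names, not the actual Sionna API):

import tensorflow as tf

class TinyDecoder(tf.Module):
    def __init__(self):
        self.rv = 0  # plain Python attribute, not a tf.Variable

    @tf.function
    def __call__(self, llr):
        # self.rv is read once at trace time and baked into the graph
        # as a constant; later assignments are silently ignored unless
        # a new input signature forces a re-trace
        return llr + float(self.rv)

dec = TinyDecoder()
dec(tf.zeros([4]))  # traces with rv=0
dec.rv = 2          # no effect on the already-compiled graph
dec(tf.zeros([4]))  # still computes with rv=0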

I think we should clarify the API before proceeding further.

Would it be an option to pass rv directly during the call?

  • Encoder: This should be straightforward since it only changes the slicing position. For example: encoder(u, rv=2)
  • Decoder: This is trickier because it should remain stateless (no internal memory in graph mode).
    One option could be to stack all received RV versions, i.e., the input shape is [bs, num_rvs, n].

Alternatively, the encoder could return a stacked version of all RVs (only if this mode is explicitly enabled), which could then be fed directly into the decoder using the same API.

What do you think?

A few additional remarks:

  • Do you plan to include the notebook as a Sionna tutorial? If yes, it should provide more background—e.g., explaining HARQ in 5G and the concept of RV. That would be a great contribution.
  • Since the effective rates differ, does it make sense to plot results in terms of Eb/N0? The current SNR offset feels somewhat ad hoc. Perhaps Es/N0 would be a better choice?
  • We need unittests before merging. Would it be possible to test against the “classical” decoder (rv=0) as a reference for the equivalent lower rate?
  • Why is there a modification in test_mimo_flat_fading.py?

@rabihchrabieh
Author

Thank you, @SebastianCa, for the helpful feedback.

We can definitely move the "rv" to the function parameters — that makes sense and ensures compatibility with graph/XLA mode. We might also allow passing "n" optionally, in case the user wants to test combinations like (rv0, n0), (rv1, n1).

Regarding stacking all RVs inside the encoder: I think that adds unnecessary complexity. In most scenarios, we want to test RVs through time-varying channels, so the stacked output would need to be unstacked anyway before transmission — which somewhat defeats the purpose.

For the decoder, I agree it should remain stateless. A simple and flexible approach is to accumulate the received RVs outside the decoder. This keeps the decoder logic clean and allows users to apply any desired accumulation method or weighting strategy.

That said, we could optionally provide a utility method for accumulation within the class (but not tied to the decoder itself), to support convenience without introducing state.
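
For illustration, a minimal sketch of such an external accumulation helper (hypothetical name; it assumes the received LLRs were already mapped to a common full-length domain):

import tensorflow as tf

def accumulate_llrs(llr_acc, llr_new, weight=1.0):
    # external chase combining: the decoder never sees this state,
    # so it stays stateless and graph-compatible; the weight is one
    # possible strategy (e.g., derived from an SNR estimate)
    return llr_acc + weight * llr_new

# usage sketch:
# llr = accumulate_llrs(llr_rv0, llr_rv0_retx, weight=0.8)
# u_hat = decoder(llr)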

Let me know if this approach works for you or if you'd prefer something more integrated.

I will extend the notebook into a tutorial with background on HARQ and RVs, and switch to Es/N₀ for plotting to better reflect varying code rates. I will also add unittests comparing the HARQ decoder to the classical (rv=0) case under equivalent redundancy.

Good catch: the change to test_mimo_flat_fading.py wasn’t intentional and must have slipped in during testing. I’ll remove it from the PR.

@rabihchrabieh
Author

As an alternative, I’d like to propose a simpler structure when HARQ mode is activated:

  • encode(u) returns the full codeword (n_cb); no rate matching inside the encoder.
  • decode(llr) takes the full-length soft input (n_cb); no rate matching or accumulation inside the decoder.
  • Rate matching (RV slicing) and accumulation are handled externally in the simulation.

This keeps encode and decode clean, stateless, and graph-compatible, and avoids repeated calls to encode(u) when testing multiple RVs — the full n_cb output is stored once and sliced as needed for retransmissions.

A harq_mode flag is provided as a function parameter to both encode and decode. This supports backward compatibility:

  • When harq_mode=False, we follow the standard RV0 behavior: the first 2*Z bits are skipped, and pruning may be applied.
  • When harq_mode=True, we start from bit 0, and pruning is disabled — suitable for HARQ retransmissions.

A first transmission can use harq_mode=False, while retransmissions typically use harq_mode=True.
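
To make this concrete, a rough sketch of the proposed external rate matching with toy sizes (the start positions are placeholders, not the 38.212 values; filler-bit handling is omitted):

import tensorflow as tf

bs, n, n_cb = 4, 100, 150       # toy dimensions
starts = {"rv0": 0, "rv2": 50}  # placeholder RV start positions

# stands in for c = encode(u, harq_mode=True), shape [bs, n_cb]
c = tf.cast(tf.random.uniform((bs, n_cb)) < 0.5, tf.float32)

def rv_slice(c, start):
    # external rate matching: read n bits from the circular buffer,
    # wrapping around its end
    idx = tf.math.floormod(start + tf.range(n), n_cb)
    return tf.gather(c, idx, axis=-1)

x_rv0 = rv_slice(c, starts["rv0"])  # first transmission, [bs, n]
x_rv2 = rv_slice(c, starts["rv2"])  # retransmission, [bs, n]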

Let me know if this direction would be acceptable.

@SebastianCa
Collaborator

SebastianCa commented Jul 28, 2025

Hi @rabihchrabieh,
my feeling is that an additional HARQ Block makes the API usage more complicated (from a user perspective); I would suggest keeping it inside the encoder/decoder.

Here are my thoughts:
If we pass a list of rvs to the encoder, the user still has the flexibility to directly generate all rv-versions during the first encoding.
Simple example:

  1.) x = encode(u) returns [bs, n_ldpc] bits
  2.) x = encode(u, rv=["rv3",]) returns [bs, n_ldpc] bits or [bs, 1, n_ldpc] bits (to be discussed)
  3.) x = encode(u, rv=["rv0", "rv1"]) returns [bs, 2, n_ldpc] bits

As the decoder has no internal state, we must provide all rv versions anyhow. Here I would suggest simply stacking the inputs. In case a user wants to do advanced experiments, e.g., only decode "rv3", one could stack with zero LLRs:

x_rv3 = encoder(u, rv=["rv3",])
llr_rv3 = channel(x_rv3, ...)
llr = tf.concat([tf.zeros((bs, 2, n_ldpc)), llr_rv3], axis=1)
x_hat = decoder(llr)

Otherwise, one could then just stack the different rv versions (e.g., from transmission over different instances of time-varying channels) and run the decoder once. For example

# initial transmission
x_rv1 = encoder(u, rv=["rv1",])
llr_1 = channel(x_rv1, ...)
x_hat1 = decoder(llr_1)
# 1st HARQ retransmission
x_rv2 = encoder(u, rv=["rv2",])
llr_2 = channel(x_rv2, ...)
llr = tf.concat((llr_1, llr_2), axis=1)
x_hat2 = decoder(llr)
...

Or alternatively, do it as a one-shot experiment (assuming the channel model supports this):

x = encoder(u, rv=["rv1", "rv2"])
# possibly reshape for compatibility with advanced channel models
# x = tf.reshape(x, (-1, n_ldpc))
llr = channel(x, ...)
# undo reshaping
# llr = tf.reshape(llr, (-1, 2, n_ldpc))
x_hat = decoder(llr)

The overhead in the decoder should be fairly small, as it boils down to gathering and adding LLRs before decoding.
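
For illustration, a sketch of what that gather-and-add step could look like (hypothetical helper; placeholder start positions, filler bits ignored):

import tensorflow as tf

def combine_rvs(llr, rv_starts, n_cb):
    # llr: [bs, num_rvs, n]; scatter each RV's LLRs back into the
    # circular buffer of length n_cb and add overlapping positions
    n = llr.shape[-1]
    acc = tf.zeros((tf.shape(llr)[0], n_cb), dtype=llr.dtype)
    for i, start in enumerate(rv_starts):
        idx = tf.math.floormod(start + tf.range(n), n_cb)
        # unsorted_segment_sum adds entries with duplicate indices,
        # which also covers wrap-around repetitions
        acc += tf.transpose(tf.math.unsorted_segment_sum(
            tf.transpose(llr[:, i, :]), idx, num_segments=n_cb))
    return acc

llr = tf.random.normal((4, 2, 100))                     # 2 stacked RVs
llr_cb = combine_rvs(llr, rv_starts=[0, 50], n_cb=150)  # [4, 150]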

I would always disable pruning if harq_mode is active.

What do you think, does that make sense?

@rabihchrabieh
Author

Hi @SebastianCa,

Thanks again for the detailed input.

Before moving forward, I wanted to clarify a few points to make sure I understand the proposed design fully — especially in the context of stacked RVs.

Suppose a user provides [rv0, rv0, rv2, rv1] to the encoder. Then:

  1. Encoder behavior: I assume the encoder runs once, producing the full n_ldpc block, and then applies RV-specific zeroing to generate 4 stacked versions, each masking out the bits not selected by its corresponding RV.

  2. Channel interface: The encoder’s output cannot be transmitted directly, since only n (not n_ldpc) bits are sent per RV. This implies a step is needed to reduce each RV-specific version to its n active bits. Is the plan to perform this reduction:

    • Inside the encoder, embedding rate matching directly into encode()?
    • Or via a separate method (e.g., get_rv_bits()), to keep encode() clean while keeping rate matching within the same LDPC class?

  3. Decoder input realignment: On the decoder side, if the user stacks received vectors (each of length n), they need to be realigned or zero-padded back to n_ldpc before decoding. This seems like another form of rate matching. Should this happen:

    • Inside decode() itself, using an RV list?
    • Or externally, through a method like accumulate_rv_bits() (of the LDPC class) that maps the n bits back into the full n_ldpc domain?

    Also, just to confirm: the decoder should not assume a fixed RV order (e.g., always [rv0, rv1, rv2, ...]). Users may transmit any RVs in any order, or repeat RVs if needed.

Alternatively, is the idea to transmit all n_ldpc bits through the channel, even though only n are meaningful per RV, and then mask the untransmitted bits at the decoder (setting them to 0)? That would deviate from standard behavior and may complicate things if the user is constrained to transmit only n bits through the various blocks (interleaving, IFFT, etc.) and the channel.

Just trying to get a clearer picture of how you see these components fitting together.

@SebastianCa
Collaborator

Hi @rabihchrabieh,

ah, my bad - sorry for the confusion; when saying n_ldpc I was actually referring to n, i.e., the codeword length after rate matching.
I forgot that n_ldpc is also an attribute of the encoder/decoder.
Short example to clarify:

enc = LDPC5GEncoder(k=64, n=128, harq=True)
dec = LDPC5GDecoder(enc)

u = source([bs, 64])
x = enc(u, rv=["rv1", "rv2"])
# shape of x is [bs, 2, 128]
y = channel(x, ...)
# shape of y is [bs, 2, 128]
x_hat = dec(y)
# shape of x_hat is [bs, 64]

Comment: Mapping/demapping is not shown here

Regarding your questions:
1.) Encoder behavior: yes, the encoder should internally generate all n_ldpc bits and produce the requested "rv versions" by gathering from the right positions (or slicing & stacking).
2.) Channel interface: see above, the encoder should produce rv versions of length n that can be directly transmitted.
3.) Decoder re-alignment: see above, it should be n LLRs per rv; rate matching must then happen inside the decoder (as currently done).

In my eyes, that's the best version to give users the flexibility to simulate realistic HARQ scenarios (i.e., re-transmission of full code blocks) without adding a lot of complexity to the API.

@rabihchrabieh
Author

Sounds good — static RV lists.

One suggestion: should the decoder also take the same rv list as input, like the encoder? This would make it easier to test cases like ["rv0", "rv0", "rv2"], and also allows passing an optional weight vector per RV (e.g., based on estimated SNR) for soft combining before decoding.

Let me know.

@SebastianCa
Collaborator

Yes, we could add rv as an input to the decoder. Would it be an option to make it optional, i.e., if None is provided, the standard order of rvs is used? I think it's a fairly exotic use case to have ["rv0", "rv0", "rv2"], right?

Why do we need weighting? Shouldn't the LLRs already be scaled by the reliability/SNR? I would try to keep the API streamlined; applying weights could also be done manually before feeding the LLRs to the decoder.

@rabihchrabieh
Author

OK, making "rv" optional in the decoder sounds good. We can default to the standard RV order (["rv0", "rv2", "rv3", "rv1"]), which aligns with typical HARQ scheduling.

Just to clarify: sequences like ["rv0", "rv2"] or ["rv0", "rv2", "rv3"] are quite common. Repeating an RV, such as ["rv0", "rv0"], can also occur — for example, if the initial transmission was interfered with, it's sometimes better to retransmit RV0 rather than move to RV2. So having the option to explicitly specify the RV list is useful — especially for comparing cases like ["rv0", "rv2"] vs. ["rv0", "rv0"].

Regarding weighting: you're right that LLR scaling is ideally handled by the user. In some practical setups, LLRs are quantized to 8 bits after each RV reception. In such cases, to combine with a new RV, the stored LLRs must be dequantized, rescaled (e.g., based on SNR), accumulated and re-quantized. But I agree — it's best to leave this under user control.
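
Purely for illustration, a minimal sketch of that dequantize/rescale/accumulate/re-quantize cycle (hypothetical scaling scheme; real receivers would track the scale per reception):

import tensorflow as tf

def quantize(llr, scale, num_bits=8):
    # symmetric rounding to signed num_bits integers
    q_max = 2.0 ** (num_bits - 1) - 1
    return tf.clip_by_value(tf.round(llr / scale), -q_max, q_max)

def combine_quantized(q_old, scale_old, llr_new, scale_new):
    llr_old = q_old * scale_old          # dequantize stored LLRs
    llr_sum = llr_old + llr_new          # accumulate with the new RV
    return quantize(llr_sum, scale_new)  # re-quantize for storage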

I will proceed with this implementation.

@SebastianCa
Collaborator

Sounds good!

@rabihchrabieh
Author

Hi @SebastianCa,

To display Es/N0: it seems it's not currently exposed in the PlotBER class. I believe I need to add a new ebno argument to the call and pass it through to plot_ber.

Do you have a different suggestion or a preferred way to handle this?

Thanks for your help!

@SebastianCa
Collaborator

The straightforward way is to not call the ebnodb2no function and instead calculate the noise variance no manually (or just set the rate to 1); see the FEC tutorial for an example.

@rabihchrabieh
Author

The issue is with displaying and printing the label "Es/N0" on graphs and in results tables. I had to make a few small changes.

@rabihchrabieh
Author

I've pushed the updated, squashed commit with the refactored HARQ implementation per feedback. Ready for re-review.

  • Squashed to one commit rebased on latest main
  • New encoder/decoder RV API per feedback
  • Removed old approach
  • Updated tutorial with introduction to 5G HARQ
  • Added unit test

- Implemented HARQ mode for LDPC encoder/decoder with RV selection
- Added HARQ tutorial with AWGN BLER performance example
- Added unit tests

Signed-off-by: Rabih Chrabieh <[email protected]>
@SebastianCa
Collaborator

Great - we'll review the code soon.

modified docstrings
removed (unneeded) public functions
minor code changes
raise ValueError("Last dimension must be of length k.")

def call(self, bits):
def call(self, bits, rv=None):
Collaborator


The problem with rv as a (Python) list is that it will re-trace the graph whenever the input signature changes, i.e., TF will trace one graph for each (used) combination of rv.
This can be OK, as we (typically) have a small number of possibilities.

An alternative would be to provide a list of tf.ints and gather the rv_starts from another tensor. This could work in a dynamic graph and avoids re-tracing. However, the code becomes less readable (and the wrap-arounds may become a bit ugly).
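
A minimal sketch of that alternative (the table values are placeholders, not the actual start positions from 38.212):

import tensorflow as tf

# placeholder start positions, one per rv; real values depend on Z and n_cb
rv_start_table = tf.constant([0, 17, 33, 50], dtype=tf.int32)

@tf.function
def rv_starts(rv_ids):
    # rv_ids is a tensor, so changing its values reuses the traced
    # graph; only a change of its shape (number of rvs) re-traces
    return tf.gather(rv_start_table, rv_ids)

rv_starts(tf.constant([0, 2]))  # traces once for shape (2,)
rv_starts(tf.constant([3, 1]))  # no re-trace: same shape and dtype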

Author

@rabihchrabieh Nov 8, 2025


This likely complicates the code and only partially solves the problem: if we have a different number of rvs, it still needs to re-trace. If we always use the max number of rvs (how many retransmissions would that be?), there is a lot of overhead and the masked rvs become complex to manage. Let me know if you have a particular solution in mind.

Collaborator

@SebastianCa left a comment


I reviewed all files except the ipynb; this can be done once the code is ready.

def encode_harq():
    bits = source([batch_size, k])
    x_ref = encoder_ref(bits)  # Shape: [batch_size, n_cb]
    x_ref = tf.roll(x_ref, shift=starts["rv0"], axis=-1)  # Undo shift of RV0
Collaborator


Why do we need to shift the reference here?

Author


encoder_ref outputs the full n_cb buffer but shifted by 2Z, as this is the original non-HARQ encoder. Here we undo this 2Z shift to recover the correct, unshifted buffer of n_cb bits; this makes it easier to compare against the HARQ mode using rv_list.

Author


I changed the logic a bit.

@SebastianCa
Collaborator

SebastianCa commented Aug 21, 2025

Hi @rabihchrabieh,
I've reviewed the code and added a lot of comments. Please have a look.

Keep in mind that the LDPC Block is one of Sionna's core components, and we should be very careful with changes that are not strictly related to HARQ. It is also important that we maintain backwards compatibility (ideally no breaking changes).

A few big picture comments:
1.) The current implementation does not support dynamic reconfiguration of the rv version; this means that for each new rv configuration (i.e., each new encoder/decoder input signature) TensorFlow will trace a new graph. For example, decoder(llr, "rv0") and decoder(llr, "rv1") will result in two different graphs (2x compilation time). One could alternatively feed tf.ints and use tf.gather to get the start_pos indices; however, if the input shape changes, re-tracing is still required. Not sure what you think?
2.) We want to keep the API doc and the user experience as clean as possible. We should only expose what is really useful for users.

Besides that, the PR looks very promising.

btw: I'm using the GitHub comment function and haven't executed the code locally, so my suggested changes may have introduced a few minor incompatibilities.

@rabihchrabieh
Author

Hi @SebastianCa, thanks for your thorough review and good feedback.

I agree with most of your comments and suggestions, but as I am currently traveling, it will take a bit of time to respond.

@SebastianCa
Collaborator

No worries - there is no deadline. We'll wait for your response.

@SebastianCa
Collaborator

Hi @rabihchrabieh,

thanks for updating the PR; please expect some delays with the review due to traveling.

@rabihchrabieh
Author

Just pushed new commits — sorry for the delay! I've aligned with all your suggestions except one, which I've explained in the thread.
