Skip to content

Commit 18f367b

Browse files
authored
feat: json/curl string operation boundary (#359)
* feat: string comparison as byte-span * feat: enforce C string semantic in CURL string options * fix: misc issues
1 parent a523a15 commit 18f367b

11 files changed

Lines changed: 474 additions & 60 deletions

File tree

docs/efuns/perform_using.md

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,14 +22,23 @@ that object already has a transfer in progress.
2222
- `"timeout_ms"` with a number value sets the libcurl timeout in milliseconds.
2323
- `"follow_location"` with a non-zero number value enables redirect following.
2424
25+
For string-based option values (`"url"`, `"headers"`, and string `"post_data"`/`"body"`),
26+
embedded NUL bytes are rejected. If binary data with embedded NUL bytes is
27+
intentional for the request body, pass a `buffer` to `"post_data"`/`"body"`.
28+
29+
This matches the driver's explicit C-string boundary behavior in related efuns:
30+
`c_str()` and `to_json()` both ensure C-string-safe contents when LPC code
31+
needs text data routed through C-string-oriented APIs.
32+
2533
Passing `0` clears `"url"`, `"headers"`, or `"post_data"`/`"body"`.
2634
2735
## ERRORS
2836
An error is raised when:
2937
- `opt` is not a string
3038
- `val` has the wrong type for the selected option
39+
- a string value for `"url"`, `"headers"`, or string `"post_data"`/`"body"` contains embedded NUL bytes
3140
- the current object already has an active `perform_to()` transfer
3241
- the option name is unknown
3342
3443
## SEE ALSO
35-
[perform_to()](perform_to.md), [in_perform()](in_perform.md)
44+
[perform_to()](perform_to.md), [in_perform()](in_perform.md), [c_str()](c_str.md), [to_json()](to_json.md)

docs/plan/counted-string.md

Lines changed: 55 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -34,25 +34,27 @@ early as possible (compile-time preferred, runtime as backstop).
3434

3535
### Near-Term Focus
3636

37-
- Prioritize byte-span readiness for comparison semantics in production paths
38-
(equality/inequality complete; next grouping/ordering and related helpers).
37+
- Keep comparison byte-span coverage stable; production comparison paths are
38+
complete for current scope (equality/inequality, ordering operators, shared
39+
helper paths).
3940
- Promote transparent boundary handles to abstract handles so boundary misuse fails at
4041
compile time.
4142
- Keep JSON boundary contract coverage stable, including CURL ingress
42-
(`buffer -> from_json`) and explicit UTF-8/error-handling behavior.
43-
- Expand regression coverage to lock in counted-string semantics across LPC,
44-
efuns, and JSON boundaries.
43+
(`buffer -> from_json`), `perform_using` C-string boundaries, and explicit
44+
UTF-8/error-handling behavior.
45+
- Maintain regression coverage for JSON/CURL efun boundaries (`perform_using`,
46+
`c_str()`, `to_json()`) as option surface evolves.
4547

4648
## Status
4749

4850
| Stage | Status |
4951
|---|---|
5052
| Foundation: `blkend` model + shared-table non-NUL lookup + initial safety/tests | complete |
5153
| Implementation: VM/operator NUL-removal and span API normalization | complete |
52-
| Implementation: comparison byte-span readiness | in progress (equality/inequality complete; next grouping/ordering semantics) |
54+
| Implementation: comparison byte-span readiness | complete |
5355
| Implementation: abstract typed handles + runtime contract enforcement | complete |
5456
| Implementation: C++ RAII wrapper adoption on exception baseline | complete |
55-
| Implementation: efun byte-span readiness | in progress (narrowed: defer broad LPC string-efun hardening; prioritize JSON/CURL ingress) |
57+
| Implementation: efun byte-span readiness | complete for narrowed JSON/CURL ingress scope (broad LPC string-efun hardening remains deferred by plan) |
5658
| Implementation: JSON boundary contract and tests | complete |
5759
| Validation: end-to-end LPC/JSON regression matrix | complete |
5860

@@ -92,12 +94,52 @@ Completed milestones since the previous handoff:
9294
- Test strategy was tightened for this milestone: equality tests now execute
9395
real operator paths (`f_eq`/`f_ne`) on stack operands instead of simulating
9496
behavior with helper-level `memcmp` assertions.
97+
- Ordering comparison migration is complete for current scope:
98+
`f_lt`/`f_le`/`f_gt`/`f_ge` use byte-span lexical comparison semantics
99+
(embedded `\0` participates; length is tie-breaker after equal prefixes),
100+
and array helper string comparison paths (`sameval`, builtin sort comparators)
101+
now route through the same span-aware compare behavior.
102+
- Regression coverage now includes direct ordering-operator tests with embedded
103+
null bytes (`OrderingOperatorsUseByteSpanForEmbeddedNul`,
104+
`OrderingOperatorsUseLengthAfterEqualPrefix`) plus array helper equality
105+
semantics (`SamevalUsesByteSpanForEmbeddedNulStrings`).
106+
- LPC-level ordering validation now includes `sort_array` with embedded-null
107+
strings (`LpcSortArrayOrdersEmbeddedNulByByteSpan`), with inputs constructed
108+
via `from_json("\"...\\u0000...\"")` to ensure full byte-span payloads are
109+
exercised through real efun/operator/runtime paths.
110+
- Directional coverage for LPC sort ordering is now explicit: both ascending
111+
and descending embedded-null cases are regression-tested
112+
(`LpcSortArrayOrdersEmbeddedNulByByteSpan`,
113+
`LpcSortArrayOrdersEmbeddedNulByByteSpanDescending`).
114+
- JSON/CURL ingress hardening advanced at the `perform_using` boundary:
115+
string values for `url`, `headers`, and string `post_data`/`body` now reject
116+
embedded NUL bytes, while binary payload intent remains supported via
117+
`buffer` for `post_data`/`body`.
118+
- C++ wrapper-style read access was expanded in `lib/curl/curl_efuns.cpp`
119+
for `perform_using`/`perform_to` validation and string-boundary checks via
120+
`lpc::const_svalue_view`, reducing direct raw `svalue_t` field access on
121+
those paths.
122+
- `perform_using` embedded-NUL rejection matrix coverage is now complete for
123+
current option scope: regression tests cover `url`, string `body`,
124+
string `headers`, and string-array `headers` entry rejection paths, plus the
125+
binary body `buffer` allow-path.
126+
- JSON-derived boundary routing coverage is now explicit for current CURL
127+
text-option scope: `perform_using(url, from_json(...))` rejects embedded-NUL
128+
strings, while `perform_using(url, c_str(from_json(...)))` succeeds via
129+
explicit C-string boundary conversion; similarly,
130+
`perform_using(body, to_json(from_json(...)))` succeeds where
131+
`perform_using(body, from_json(...))` is rejected.
95132

96133
Validation snapshot:
97134

98135
- Linux validation is green (`ctest --preset ut-linux`).
99136
- Cross-platform validation for the counted-string hardening set remains green
100137
(`vs16-x64`, `vs16-win32`, `clang-x64`).
138+
- As of 2026-04-17, full test runs on `vs16-x64` and `clang-x64` are confirmed
139+
green for this milestone.
140+
- As of 2026-04-17, post-hardening Linux CTest sweep is green after adding
141+
`perform_using` header matrix and `c_str()`/`to_json()` boundary-routing
142+
regressions.
101143

102144
Still intentional / not candidates for replacement:
103145

@@ -114,13 +156,13 @@ Notable bug fix retained for handoff context:
114156

115157
### Next Focus
116158

117-
- **Comparison byte-span readiness (next milestone)** — finish migrating and
118-
validating LPC string comparison semantics end to end under byte-span rules.
119-
Equality/inequality production behavior and regression coverage are complete;
120-
next target is grouping/ordering paths and shared comparison helpers.
121159
- **JSON/CURL boundary follow-through** — keep existing JSON/CURL ingress
122160
coverage stable as counted-string and efun refactors continue, and add
123161
targeted regression tests only when new boundary behaviors are introduced.
162+
- **Next high-priority efun hardening (JSON/CURL ingress)** — prioritize
163+
keeping `c_str()` / `to_json()` boundary-path behavior stable as future
164+
CURL option surface expands, so explicit C-string conversion remains the only
165+
truncation boundary and is behaviorally locked.
124166

125167
### Baseline and Out of Scope
126168

@@ -295,8 +337,8 @@ Migration order:
295337
| P0 | Normalize internal helper APIs | string construction/lookup boundaries | New/updated helpers accept explicit lengths/spans; touched callers stop using sentinel termination as logical length; shared-string boundaries continue to route via `make_shared_string(s,end)` / `findstring(s,end)`. |
296338
| P0 | Enforce counted-string semantic boundaries | [src/stralloc.h](../../src/stralloc.h), [lib/lpc/types.h](../../lib/lpc/types.h), typed-string boundaries | Boundary-handle mode enabled under `STRING_TYPE_SAFETY`; contract APIs require explicit typed handles or bridge helpers; runtime contract checks remain release-enabled; identifier-class shared strings remain NUL-terminated. |
297339
| P1 | C++ wrapper adoption on baseline boundaries | C++ wrappers around `svalue_t` | `lpc::svalue_view`/`lpc::svalue` introduced without C ABI layout change; no duplicate exception-boundary rewrites are introduced; wrapper move/dtor are `noexcept`; unit-test-first ownership migration for counted-string targets is complete (no `free_string_svalue` / `free_svalue` in `tests/**/*.cpp`); remaining work is production C++ efun/helper adoption and any targeted perf checks for newly touched hot paths. |
298-
| P1 | Efun byte-span readiness (deferred broad LPC string paths) | [lib/efuns/string.c](../../lib/efuns/string.c), [lib/efuns/unsorted.c](../../lib/efuns/unsorted.c), [lib/efuns/sprintf.c](../../lib/efuns/sprintf.c), [lib/efuns/sscanf.c](../../lib/efuns/sscanf.c) | Scope is intentionally narrowed for this phase: no broad LPC-side behavioral expansion unless required by JSON/CURL boundary safety. Any touched path must preserve LPC compatibility and existing tests remain green. |
299-
| P1 | LPC operator semantics vs C-string efuns | [lib/lpc/operator.c](../../lib/lpc/operator.c), [lib/lpc/array.c](../../lib/lpc/array.c), [docs/efuns/strcmp.md](../../docs/efuns/strcmp.md) | Equality/inequality byte-span semantics are complete and regression-covered; remaining work migrates grouping/ordering helpers to full byte-span behavior (including embedded null participation) while `strcmp()` remains explicitly documented and tested as C-string-oriented behavior. |
340+
| P1 | Efun byte-span readiness (deferred broad LPC string paths) | [lib/efuns/string.c](../../lib/efuns/string.c), [lib/efuns/unsorted.c](../../lib/efuns/unsorted.c), [lib/efuns/sprintf.c](../../lib/efuns/sprintf.c), [lib/efuns/sscanf.c](../../lib/efuns/sscanf.c), [lib/curl/curl_efuns.cpp](../../lib/curl/curl_efuns.cpp) | complete for narrowed JSON/CURL ingress scope: `perform_using` rejects embedded-NUL string values for `url`/`headers`/string body, preserves binary payload path via `buffer`, and regression coverage includes header single/array rejection plus explicit `c_str()`/`to_json()` routing checks for JSON-derived string inputs in CURL text options. Broad LPC string-efun hardening remains deferred by plan. |
341+
| P1 | LPC operator semantics vs C-string efuns | [lib/lpc/operator.c](../../lib/lpc/operator.c), [lib/lpc/array.c](../../lib/lpc/array.c), [docs/efuns/strcmp.md](../../docs/efuns/strcmp.md) | complete for current scope: equality/inequality and grouping/ordering helper paths use full byte-span behavior (including embedded null participation); `strcmp()` remains explicitly documented and tested as C-string-oriented boundary behavior. |
300342
| P1 | JSON boundary contract (priority: CURL buffer ingress) | JSON efuns/helpers (`from_json`, `to_json`) and CURL ingress paths | complete: contract docs state LPC byte spans vs JSON text; `from_json` rejects invalid UTF-8 and raises LPC runtime error on invalid sequences; coverage includes explicit UTF-8 pass/fail tests (`fromJsonValidUtf8StringAccepted`, `fromJsonInvalidUtf8StringError`, `fromJsonInvalidUtf8BufferError`) plus buffer ingress success/error/size paths (`fromJsonBuffer`, `fromJsonInvalidBufferError`, `fromJsonLargeBuffer`) and a CURL callback payload integration test (`CurlBufferPayloadParsesViaFromJson`); JSON strings containing embedded null bytes are stored and copied byte-for-byte (full span), not truncated at C-string boundaries. |
301343
| P1 | Unicode and escape consistency | JSON encode/decode implementation | complete: encoder/decoder symmetry is now covered for control escapes, `\\`, `\"`, `\uXXXX`, surrogate pairs, and non-BMP round-trips via `toJsonControlEscapes`, `fromJsonControlEscapes`, `fromJsonSurrogatePairAccepted`, `fromJsonLoneHighSurrogateError`, and `roundTripNonBmpCharacter`; contract docs are aligned in `docs/efuns/from_json.md` and `docs/efuns/to_json.md`. |
302344
| P2 | End-to-end regression matrix | LPC-level and efun/unit tests | complete for current hardening scope: dedicated unit suite `tests/test_string_operators` added (21 cases, discovered via CTest), and full matrix validation is passing on Linux, VS16 x64, VS16 win32, and clang x64. Future LPC/JSON round-trip and negative-matrix additions remain follow-on expansion work. |

lib/curl/curl_efuns.cpp

Lines changed: 54 additions & 17 deletions
Original file line numberDiff line numberDiff line change
@@ -260,7 +260,8 @@ static int ensure_easy_handle(curl_http_t *handle) {
260260
}
261261

262262
static int is_clear_value(const svalue_t *value) {
263-
return value->type == T_NUMBER && value->u.number == 0;
263+
auto view = lpc::const_svalue_view::from(value);
264+
return view.is_number() && view.number() == 0;
264265
}
265266

266267
static void clear_headers(curl_http_t *handle) {
@@ -270,21 +271,45 @@ static void clear_headers(curl_http_t *handle) {
270271
}
271272
}
272273

274+
static int has_embedded_nul(const svalue_t *value) {
275+
auto view = lpc::const_svalue_view::from(value);
276+
277+
if (!view.is_string()) {
278+
return 0;
279+
}
280+
281+
const char *data = view.c_str();
282+
size_t len = view.length();
283+
if (len == 0) {
284+
return 0;
285+
}
286+
287+
return std::memchr(data, '\0', len) != nullptr;
288+
}
289+
273290
static void set_headers_from_array(curl_http_t *handle, array_t *headers) {
274291
struct curl_slist *new_headers = nullptr;
275292
int index;
276293

277294
for (index = 0; index < headers->size; index++) {
278295
svalue_t *item = &headers->item[index];
296+
auto item_view = lpc::const_svalue_view::from(item);
279297

280-
if (item->type != T_STRING) {
298+
if (!item_view.is_string()) {
281299
if (new_headers) {
282300
curl_slist_free_all(new_headers);
283301
}
284302
error("perform_using(headers, value) requires a string or string array.\n");
285303
}
286304

287-
new_headers = curl_slist_append(new_headers, SVALUE_STRPTR(item));
305+
if (has_embedded_nul(item)) {
306+
if (new_headers) {
307+
curl_slist_free_all(new_headers);
308+
}
309+
error("perform_using(headers, value) string entries must not contain embedded NUL bytes.\n");
310+
}
311+
312+
new_headers = curl_slist_append(new_headers, item_view.c_str());
288313
if (!new_headers) {
289314
error("Out of memory while configuring CURL headers.\n");
290315
}
@@ -295,6 +320,8 @@ static void set_headers_from_array(curl_http_t *handle, array_t *headers) {
295320
}
296321

297322
static void configure_option(curl_http_t *handle, const char *option, svalue_t *value) {
323+
auto value_view = lpc::const_svalue_view::from(value);
324+
298325
if (!std::strcmp(option, "url")) {
299326
if (is_clear_value(value)) {
300327
if (handle->url) {
@@ -304,14 +331,18 @@ static void configure_option(curl_http_t *handle, const char *option, svalue_t *
304331
return;
305332
}
306333

307-
if (value->type != T_STRING) {
334+
if (!value_view.is_string()) {
308335
error("perform_using(url, value) requires a string.\n");
309336
}
310337

338+
if (has_embedded_nul(value)) {
339+
error("perform_using(url, value) string must not contain embedded NUL bytes.\n");
340+
}
341+
311342
if (handle->url) {
312343
FREE(handle->url);
313344
}
314-
handle->url = dup_bytes(SVALUE_STRPTR(value), SVALUE_STRLEN(value));
345+
handle->url = dup_bytes(value_view.c_str(), value_view.length());
315346
if (!handle->url) {
316347
error("Out of memory while configuring CURL url.\n");
317348
}
@@ -324,15 +355,15 @@ static void configure_option(curl_http_t *handle, const char *option, svalue_t *
324355
return;
325356
}
326357

327-
if (value->type == T_STRING) {
358+
if (value_view.is_string()) {
328359
array_t *single = allocate_empty_array(1);
329360
assign_svalue_no_free(&single->item[0], value);
330361
set_headers_from_array(handle, single);
331362
free_array(single);
332363
return;
333364
}
334365

335-
if (value->type != T_ARRAY) {
366+
if (!value_view.is_array()) {
336367
error("perform_using(headers, value) requires a string or string array.\n");
337368
}
338369

@@ -353,9 +384,12 @@ static void configure_option(curl_http_t *handle, const char *option, svalue_t *
353384
return;
354385
}
355386

356-
if (value->type == T_STRING) {
357-
size = SVALUE_STRLEN(value);
358-
handle->post_data = dup_bytes(SVALUE_STRPTR(value), size);
387+
if (value_view.is_string()) {
388+
if (has_embedded_nul(value)) {
389+
error("perform_using(post_data/body, value) string must not contain embedded NUL bytes; use a buffer for binary payloads.\n");
390+
}
391+
size = value_view.length();
392+
handle->post_data = dup_bytes(value_view.c_str(), size);
359393
handle->post_size = static_cast<int>(size);
360394
}
361395
else if (value->type == T_BUFFER) {
@@ -374,20 +408,21 @@ static void configure_option(curl_http_t *handle, const char *option, svalue_t *
374408
}
375409

376410
if (!std::strcmp(option, "timeout_ms")) {
377-
if (value->type != T_NUMBER) {
411+
if (!value_view.is_number()) {
378412
error("perform_using(timeout_ms, value) requires a number.\n");
379413
}
380414

381-
handle->timeout_ms = value->u.number > 0 ? static_cast<long>(value->u.number) : 0L;
415+
handle->timeout_ms = value_view.number() > 0 ?
416+
std::min(LONG_MAX, value_view.number()) : 0L;
382417
return;
383418
}
384419

385420
if (!std::strcmp(option, "follow_location")) {
386-
if (value->type != T_NUMBER) {
421+
if (!value_view.is_number()) {
387422
error("perform_using(follow_location, value) requires a number.\n");
388423
}
389424

390-
handle->follow_location = value->u.number != 0;
425+
handle->follow_location = value_view.number() != 0;
391426
return;
392427
}
393428

@@ -684,6 +719,7 @@ static size_t curl_response_write_callback(void *ptr, size_t size, size_t nmemb,
684719
extern "C" void f_perform_using(void) {
685720
svalue_t *option_value = sp - 1;
686721
svalue_t *setting_value = sp;
722+
auto option_view = lpc::const_svalue_view::from(option_value);
687723
const char *option;
688724
int handle_id;
689725
curl_http_t *handle;
@@ -692,7 +728,7 @@ extern "C" void f_perform_using(void) {
692728
error("CURL subsystem is not available.\n");
693729
}
694730

695-
if (option_value->type != T_STRING) {
731+
if (!option_view.is_string()) {
696732
error("perform_using() option must be a string.\n");
697733
}
698734

@@ -710,7 +746,7 @@ extern "C" void f_perform_using(void) {
710746
error("Failed to create CURL easy handle.\n");
711747
}
712748

713-
option = SVALUE_STRPTR(option_value);
749+
option = option_view.c_str();
714750
configure_option(handle, option, setting_value);
715751
handle->state = CURL_STATE_CONFIGURED;
716752
pop_n_elems(2);
@@ -734,7 +770,8 @@ extern "C" void f_perform_to(void) {
734770

735771
callback_value = sp - argc + 1;
736772
flag_value = callback_value + 1;
737-
if (flag_value->type != T_NUMBER) {
773+
auto flags_view = lpc::const_svalue_view::from(flag_value);
774+
if (!flags_view.is_number()) {
738775
error("perform_to() flags argument must be a number.\n");
739776
}
740777

lib/lpc/array.c

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -14,6 +14,7 @@
1414
#include "otable.h"
1515
#include "program.h"
1616
#include "lpc/include/origin.h"
17+
#include "svalue.h"
1718

1819
#ifdef ARRAY_STATS
1920
int num_arrays;
@@ -664,7 +665,7 @@ sameval (svalue_t * arg1, svalue_t * arg2)
664665
case T_STRING:
665666
if (string_length_differs (arg1, arg2))
666667
return 0;
667-
return !strcmp (SVALUE_STRPTR(arg1), SVALUE_STRPTR(arg2));
668+
return svalue_string_lexcmp (arg1, arg2) == 0;
668669
case T_OBJECT:
669670
return arg1->u.ob == arg2->u.ob;
670671
case T_MAPPING:
@@ -1121,7 +1122,7 @@ static int builtin_sort_array_cmp_fwd (svalue_t * p1, svalue_t * p2) {
11211122
{
11221123
case T_STRING:
11231124
{
1124-
return strcmp (SVALUE_STRPTR(p1), SVALUE_STRPTR(p2));
1125+
return svalue_string_lexcmp (p1, p2);
11251126
}
11261127

11271128
case T_NUMBER:
@@ -1145,7 +1146,7 @@ static int builtin_sort_array_cmp_fwd (svalue_t * p1, svalue_t * p2) {
11451146
{
11461147
case T_STRING:
11471148
{
1148-
return strcmp (SVALUE_STRPTR(v1->item), SVALUE_STRPTR(v2->item));
1149+
return svalue_string_lexcmp (v1->item, v2->item);
11491150
}
11501151

11511152
case T_NUMBER:
@@ -1178,7 +1179,7 @@ static int builtin_sort_array_cmp_rev (svalue_t * p1, svalue_t * p2) {
11781179
{
11791180
case T_STRING:
11801181
{
1181-
return strcmp (SVALUE_STRPTR(p2), SVALUE_STRPTR(p1));
1182+
return svalue_string_lexcmp (p2, p1);
11821183
}
11831184

11841185
case T_NUMBER:
@@ -1202,7 +1203,7 @@ static int builtin_sort_array_cmp_rev (svalue_t * p1, svalue_t * p2) {
12021203
{
12031204
case T_STRING:
12041205
{
1205-
return strcmp (SVALUE_STRPTR(v2->item), SVALUE_STRPTR(v1->item));
1206+
return svalue_string_lexcmp (v2->item, v1->item);
12061207
}
12071208

12081209
case T_NUMBER:

0 commit comments

Comments
 (0)