Skip to content

Conversation

dragoangel
Copy link

@dragoangel dragoangel commented Aug 13, 2025

Resolves #1528 by adding SELFCHECK TCP command that utilize SelfCheck functional (do reload only if it's required and provide answer if reload is triggered/running or skipped), but on demand via network instead of periodic interval which allow more fine controlled reload of clustered clamd via freshclam notify custom script.

@dragoangel
Copy link
Author

dragoangel commented Aug 14, 2025

What I think off is about naming, honestly I would see default RELOAD cmd to work that way I described SELFCHECK. SelfCheck itself has a bit unclear naming - it looks like it only verify something but not apply reload, still I think that my function which 100% reuse existing logic should reuse same or related naming. Another option to use RELOAD_ON_CHANGES and rename SelfCheck option in conf to ReloadOnChanges too. It would be clear that I verify DB and do reload periodically...

@dragoangel
Copy link
Author

@jhumlick can I ask you for a review please?

Copy link
Contributor

@val-ms val-ms left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't a full review. I haven't tested yet. I noticed a couple of things while reading the code

@val-ms
Copy link
Contributor

val-ms commented Sep 8, 2025

By the way, adding a new option and command is something that will have to wait until after 1.5 is published. So it would go towards the 1.6 release if we merge it.

@dragoangel
Copy link
Author

dragoangel commented Sep 8, 2025

@val-ms there is no way to backport it? As far I understand 1.6 is a bit far in the future?

Added comments for SELFCHECK command configuration.
Added documentation for EnableSelfCheckCommand option.
@dragoangel dragoangel requested a review from val-ms September 8, 2025 17:23
@val-ms
Copy link
Contributor

val-ms commented Sep 8, 2025

@dragoangel No sorry we're behind on 1.5 and between the release candidate and release - I'm trying my best not to merge any new features before we do. We don't put new features in patch versions (e.g., 1.5.1) either. So it will have to go into 1.6.0 which means it will make it into a release maybe 4-ish months from now, assuming we can get back on track with a 4-month feature release cadence.

@dragoangel
Copy link
Author

Okay, thanks.
I see:

clamd/session.c:101:62: error: code should be clang-formatted [-Wclang-format-violations]
    {CMD21, sizeof(CMD21) - 1, COMMAND_ALLMATCHSCAN, 1, 0, 1}

Eh that linters 😅

@dragoangel
Copy link
Author

dragoangel commented Sep 8, 2025

4: Received: 
4: RELOADING
4: 
4: Expected: 
4: DBUPTODATE

Autotests failed, I honestly expected that they will go one after another and will have a delay, so on time autotests would trigger SELFCHECK it would already pass RELOADING stage 😑. Can we write in autotests that both options DBUPTODATE or RELOADING are okay (as far I can see - no)? Or it just simpler to put this new autotest to the bottom, check only for DBUPTODATE and it will not fail?

P.s. at least autotests showing that code is working 🤣

@dragoangel
Copy link
Author

dragoangel commented Sep 8, 2025

Moved this autotest a bit lower, will see if that help to get result I expect on already up-to-date database 😊

@dragoangel
Copy link
Author

@val-ms can you please run approve once again so github runner can start build?

@val-ms
Copy link
Contributor

val-ms commented Sep 9, 2025

@val-ms can you please run approve once again so github runner can start build?

Sorry for the delay. Ran it now

@dragoangel
Copy link
Author

4: [DEBUG]: stdout: Running suite(s): clamd
4: 98%: Checks: 72, Failures: 1, Errors: 0
4: /Users/runner/work/clamav/clamav/unit_tests/check_clamd.c:286:F:clamd commands:test_basic_commands:3: Wrong reply for command nVERSIONCOMMANDS
4: .
4: Received: 
4: ClamAV 1.5.0-rc| COMMANDS: SCAN QUIT RELOAD SELFCHECK PING CONTSCAN VERSIONCOMMANDS VERSION END SHUTDOWN MULTISCAN FILDES STATS IDSESSION INSTREAM DETSTATSCLEAR DETSTATS ALLMATCHSCAN
4: 
4: Expected: 
4: ClamAV 1.5.0-rc| COMMANDS: SCAN QUIT RELOAD PING CONTSCAN VERSIONCOMMANDS VERSION END SHUTDOWN MULTISCAN FILDES STATS IDSESSION INSTREAM DETSTATSCLEAR DETSTATS ALLMATCHSCAN

Missed one test, will fix it 😊

@dragoangel
Copy link
Author

done, @val-ms can you please re-review? 😊

@dragoangel
Copy link
Author

dragoangel commented Sep 9, 2025

Eh, looks like this test should be changed to be stable. As far I understand - we can run scan on non-reloaded yet clamd due to concurrent reload (which is enabled by default) I assume, right @val-ms?

In last run - macos build - passed (which means we got DBUPTODATE), but in ubuntu build - failed due to facing RELOADING on SELFSCAN. I assume we get very slow runner 😑... I can write a real - proper test to wait to the end of reload will be finished instead of accepting both RELOADING|DBUPTODATE as a valid replies, what you think @val-ms - what test fits better from your view?

Here is potential patches to achieve both options (requires testing & review):

simple - accept both options:

diff --git a/unit_tests/check_clamd.c b/unit_tests/check_clamd.c
index 0f0f0f0..1a1a1a1 100644
--- a/unit_tests/check_clamd.c
+++ b/unit_tests/check_clamd.c
@@ -35,6 +35,44 @@
 #include <errno.h>
 #include <time.h>
 
+/* Accept one-of expected replies by separating with '|' in the expect field.
+ * Example: "DBUPTODATE|RELOADING"
+ */
+static int expected_matches(const char *actual, const char *expect)
+{
+    /* fast path: no alternation */
+    if (!expect || !strchr(expect, '|'))
+        return (expect && actual) ? (strcmp(actual, expect) == 0) : (actual == expect);
+
+    /* copy to a scratch buffer so we can strtok it safely */
+    char tmp[256];
+    size_t elen = strlen(expect);
+    if (elen >= sizeof(tmp)) {
+        /* fallback: if expect is absurdly long, do strict compare */
+        return strcmp(actual, expect) == 0;
+    }
+    memcpy(tmp, expect, elen + 1);
+
+    for (char *tok = strtok(tmp, "|"); tok; tok = strtok(NULL, "|")) {
+        if (strcmp(actual, tok) == 0)
+            return 1;
+    }
+    return 0;
+}
+
@@ -280,7 +318,7 @@ static void run_one_test(const struct req *r)
     /* ... code that sends r->cmd and reads reply into 'buf' ... */
-    if (strcmp(buf, r->expect) != 0) {
+    if (!expected_matches(buf, r->expect)) {
         ck_abort_msg("Wrong reply for command %s\nReceived: \n%s\n\nExpected: \n%s\n",
                      r->cmd, buf, r->expect ? r->expect : "(null)");
     }
 }
@@ -612,9 +650,9 @@ static struct req tests[] = {
     /* ping */
     {"PING",        NULL, "PONG",        1, 0, IDS_REJECT},
 
-    /* selfcheck on freshly loaded database */
-    {"SELFCHECK",   NULL, "DBUPTODATE",  1, 0, IDS_REJECT},
+    /* SELFCHECK can legitimately be RELOADING while the db worker finishes. */
+    {"SELFCHECK",   NULL, "DBUPTODATE|RELOADING", 1, 0, IDS_REJECT},
 
    /* unknown commands */
 };

verify that SELFCHECK return or RELOADING or DBUPTODATE, fail immediately in other case. Fail test if we not get DBUPTODATE after timeout is reached. Pass test right after getting DBUPTODATE

diff --git a/unit_tests/check_clamd.c b/unit_tests/check_clamd.c
index 1234567..89abcde 100644
--- a/unit_tests/check_clamd.c
+++ b/unit_tests/check_clamd.c
@@ -610,9 +610,6 @@ static struct req tests[] = {
     /* ping */
     {"PING",        NULL, "PONG",        1, 0, IDS_REJECT},
 
-    /* selfcheck on freshly loaded database */
-    {"SELFCHECK",   NULL, "DBUPTODATE",  1, 0, IDS_REJECT},
-
     /* unknown commands */
 };
 
@@ -750,6 +747,63 @@ START_TEST(test_stats)
 }
 END_TEST
 
+/* Robust SELFCHECK: tolerate RELOADING for a short while, then require DBUPTODATE */
+#define SELFCHECK_EXPECT    "DBUPTODATE"
+#define SELFCHECK_RELOADING "RELOADING"
+
+START_TEST(test_selfcheck)
+{
+    char *recvdata = NULL;
+    size_t len;
+    int rc;
+    int attempts = 0;
+    const int max_attempts = 60; /* timeout ~3m with check each 3s */
+    const int sleep_ms = 3000;
+
+    conn_setup();
+
+    do {
+        const char *cmd = "nSELFCHECK\n";
+        len = strlen(cmd);
+        rc = send(sockd, cmd, len, 0);
+        ck_assert_msg((size_t)rc == len, "Unable to send(): %s\n", strerror(errno));
+
+        recvdata = (char *)recvfull(sockd, &len);
+        ck_assert_msg(recvdata != NULL, "recvfull() returned NULL");
+
+        /* Trim trailing newlines */
+        while (len > 0 && (recvdata[len - 1] == '\n' || recvdata[len - 1] == '\r')) {
+            recvdata[--len] = '\0';
+        }
+
+        if (strcmp(recvdata, SELFCHECK_EXPECT) == 0) {
+            /* success */
+            free(recvdata);
+            conn_teardown();
+            return;
+        }
+
+        if (strcmp(recvdata, SELFCHECK_RELOADING) != 0) {
+            ck_abort_msg("Wrong reply for SELFCHECK: '%s' (expected DBUPTODATE or RELOADING)", recvdata);
+        }
+
+        /* still reloading, wait then retry */
+        free(recvdata);
+        recvdata = NULL;
+
+#if defined(_WIN32)
+        Sleep(sleep_ms);
+#else
+        struct timespec ts = { .tv_sec = sleep_ms / 1000,
+                               .tv_nsec = (sleep_ms % 1000) * 1000000L };
+        nanosleep(&ts, NULL);
+#endif
+    } while (++attempts < max_attempts);
+
+    ck_abort_msg("SELFCHECK did not reach DBUPTODATE within timeout");
+
+    conn_teardown();
+}
+END_TEST
+
 Suite *make_clamd_suite(void)
 {
     Suite *s;
@@ -780,6 +834,7 @@ Suite *make_clamd_suite(void)
     tcase_add_test(tc_clamd, test_stats);
+    tcase_add_test(tc_clamd, test_selfcheck);
 
     suite_add_tcase(s, tc_clamd);
     return s;

@val-ms
Copy link
Contributor

val-ms commented Sep 9, 2025

Eh, looks like this test should be changed to be stable. As far I understand - we can run scan on non-reloaded yet clamd due to concurrent reload (which is enabled by default) I assume, right @val-ms?

Yes you can scan while reloading with the concurrent reload feature enabled.

In last run - macos build - passed (which means we got DBUPTODATE), but in ubuntu build - failed due to facing RELOADING on SELFSCAN. I assume we get very slow runner 😑... I can write a real - proper test to wait to the end of reload will be finished instead of accepting both RELOADING|DBUPTODATE as a valid replies, what you think @val-ms - what test fits better from your view?

I kind of like the second option more where it retries waiting for the reload to complete. Do you agree?

@dragoangel
Copy link
Author

Yes, same for me @val-ms 😊, I don't want to mess with random commits, I will try running this tests locally to be sure that they would work as expected and then will add it to PR. Meanwhile it would be nice if you were able to test clamd manually on your local env ☺️, thanks.

@dragoangel
Copy link
Author

dragoangel commented Sep 10, 2025

P.s. on my k8s env I disabled concurrent reload of dbs to economy ram memory (2x time less requests for a pod) and have liveness & readiness probes, so when clamd of 1 pod reloading it's just not getting incoming traffic and it's routed to another pod, forgot that concurrent reload is enabled by default 😅. There is minimum of 5 pods and maximum of 15 via horizontal pod auto scaling 😊, so there is always some pods to respond

Added robust SELFCHECK test to handle RELOADING state.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ClamAV CHECKDB command missing
2 participants