Improve authorizer performance #48

@sbernauer

Description

In Discord it was reported:

We are running HDFS inside Stackable with Kerberos+OPA. The CPU usage looks strange to me and the performance, even for simple file operations (find, ls, du), is awful. While running Spark jobs with data access we see consistently high CPU load on the NameNode and OPA:
(197.0% opa, 130.1% NameNode) in top on the host machine.
Even setting "default allow = true" in OPA and removing everything else (basically disabling OPA) changes nothing.

I can trigger this with a simple hdfs dfs -du hdfs://our-stackable-hdfs/data/year=2024/month=10
This takes 50 seconds to run (while the cluster is idle), for 9200 files in 2300 dirs.
And it takes ~50 seconds every time, which leads me to believe that there is zero caching anywhere...
For comparison: our seven-year-old server in the old cluster does this query in 2 seconds.

We believe there are multiple ways this can be improved, and we implemented them in these PRs:

Make requests to OPA smaller

One reason for the slowness can be that the JSON requests sent between HDFS and OPA are gigantic. We need to reduce their size somehow.

Without having thought too long about this, one solution could be to not send the inodeAttrs, inodes, pathByNameArr and callerContext fields by default, while still having an opt-in config, so that users who care about these fields can still get them.
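
A minimal sketch of what that could look like, assuming the authorizer assembles the OPA input as a Jackson ObjectNode. The class name and config flag are made up for illustration; only the four field names come from the issue:

```java
import com.fasterxml.jackson.databind.node.ObjectNode;

/** Hypothetical helper; the field list matches the issue, everything else is illustrative. */
final class OpaRequestTrimmer {
    // Would be read from an opt-in config key (name not decided in this issue).
    private final boolean includeExtendedFields;

    OpaRequestTrimmer(boolean includeExtendedFields) {
        this.includeExtendedFields = includeExtendedFields;
    }

    /** Drops the large fields from the OPA input unless the user opted in. */
    ObjectNode trim(ObjectNode opaInput) {
        if (!includeExtendedFields) {
            opaInput.remove("inodeAttrs");
            opaInput.remove("inodes");
            opaInput.remove("pathByNameArr");
            opaInput.remove("callerContext");
        }
        return opaInput;
    }
}
```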

Misc improvements

The other problem we found (well, @iAlex97 did, thank you 📯) was that the enforcer would be created over and over again, creating thread pools, unnecessary Jackson mappers etc.
In other words: there are some easy wins to be made by improving the Java code.
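
A minimal sketch of that kind of fix, assuming the expensive objects are a Jackson ObjectMapper and an HTTP client (the holder class is illustrative, not the actual code): build them once and share them across authorization checks instead of per call.

```java
import java.net.http.HttpClient;
import com.fasterxml.jackson.databind.ObjectMapper;

/** Illustrative holder; both objects are thread-safe, so one shared instance suffices. */
final class OpaAuthorizerResources {
    // Constructing these per request allocates thread pools and parser state;
    // creating them once removes that overhead from every authorization check.
    static final ObjectMapper MAPPER = new ObjectMapper();
    static final HttpClient HTTP_CLIENT = HttpClient.newHttpClient();

    private OpaAuthorizerResources() {}
}
```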
