Commit 25b5faa

docs: Document users in e2e-security demo (#49)
* docs: Document users in e2e-security demo
* Add some more information to the e2e demo docs.
* Add textual representation of the image in the documentation for accessibility reasons.
* Fix Spark mention

---------

Co-authored-by: Sönke Liebau <[email protected]>
1 parent 98754ae commit 25b5faa

4 files changed: +89 -3 lines changed
docs/modules/demos/pages/end-to-end-security.adoc

Lines changed: 89 additions & 3 deletions
@@ -2,7 +2,27 @@
 
 :k8s-cpu: https://kubernetes.io/docs/tasks/debug/debug-cluster/resource-metrics-pipeline/#cpu
 
-Install this demo on an existing Kubernetes cluster:
+This is a demo to showcase what can be done with Open Policy Agent around authorization in the Stackable Data Platform.
+It covers the following aspects of security:
+
+This demo will:
+
+* Install the Stackable operators
+* Spin up the following data products
+** *Trino*: A fast distributed SQL query engine for big data analytics that helps you explore your data universe. This demo uses it to enable SQL access to the data.
+** *Spark*: A multi-language engine for executing data engineering, data science, and machine learning. This demo uses it to create a (rather simple) report and write the results back to the persistence layer.
+** *HDFS*: A distributed file system that is designed to scale up from single servers to thousands of machines, each offering local computation and storage.
+** *Hive metastore*: A service that stores metadata related to Apache Hive and other services. This demo uses it as metadata storage for Trino and Spark.
+** *Open Policy Agent (OPA)*: An open-source, general-purpose policy engine that unifies policy enforcement across the stack. This demo uses it as the authorizer for Trino, which decides which user can query which data.
+** *Superset*: A modern data exploration and visualization platform. This demo utilizes Superset to retrieve data from Trino via SQL queries and build dashboards on top of that data.
+* Configure security to showcase the following features
+** Column- and row-level filtering
+** OIDC support across the board
+** Kerberos on Kubernetes
+** Keycloak and flexible group lookup
+** Open Policy Agent for the utmost flexibility in building access rules
+
+The following figure gives an overview of how the components interact with each other:
 
 [source,console]
 ----
@@ -25,8 +45,8 @@ To run this demo, your system needs at least:
 
 == Recording
 
-// We don't embed the video but only link it becuase of privacy concerns.
-*On 2024-05-16 our collegue Sönke Liebau held a Stackable TechTalk - Mastering Data Platform Security.
+// We don't embed the video but only link it because of privacy concerns.
+*On 2024-05-16 our colleague Sönke Liebau held a Stackable TechTalk - Mastering Data Platform Security.
 You can find the recording on https://www.youtube.com/watch?v=ATlq_l3WNiA[Youtube].*
 
 == Overview
@@ -35,7 +55,73 @@ You can see the deployed products and their relationship in the following diagra
 
 image::end-to-end-security/overview.png[Architectural overview]
 
+Please note the different types of arrows used to connect the technologies here, which symbolize
+how authentication happens along that route and whether impersonation is used for the executed queries.
+
 The Trino schema (with schemas, tables and views) is shown below.
 
 // the svg does not have a specified size, so we need to size it here or it will be 0x0
 image::end-to-end-security/trino-schema.svg[Trino schema,700]
+
+=== User credentials
+
+The following user accounts are configured in Keycloak:
+
+[cols="1,1,2"]
+|===
+|Username|Password|Team member
+
+|sophia.clarke
+|sophia.clarke
+|Head of Compliance Analytics
+
+|william.lewis
+|william.lewis
+|Team member of Compliance Analytics
+
+|daniel.king
+|daniel.king
+|Team member of Compliance Analytics
+
+|pamela.scott
+|pamela.scott
+|Head of Customer Analytics
+
+|justin.martin
+|justin.martin
+|Team member of Customer Analytics
+
+|isla.williams
+|isla.williams
+|Team member of Customer Analytics
+
+|mark.ketting
+|mark.ketting
+|Head of Marketing
+|===
+
+=== Ruleset
+
+The rules that are configured in this demo show different options for granting full or restricted access to data with OPA.
+
+==== General Access Control
+At the highest level, everybody is allowed to see data from the schema of the department they are a member of.
+So in the following example, Justin Martin, who is a member of the Customer Service department, will only be
+able to see tables from the Customer Service schema.
+
+image::e2e-justin.png[]
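
The department-level rule described above can be illustrated with a minimal Python sketch. This is not the demo's actual policy, which is evaluated by OPA. The team assignments are taken from the user table in this document, while the schema identifiers (`compliance_analytics`, `customer_analytics`, `marketing`) and the function name `can_access_schema` are hypothetical and chosen only for illustration.

```python
# Hypothetical mapping from username to the schema of their team.
# Usernames and teams come from the user table; the schema identifiers
# are assumptions made for this sketch.
USER_SCHEMA = {
    "sophia.clarke": "compliance_analytics",
    "william.lewis": "compliance_analytics",
    "daniel.king": "compliance_analytics",
    "pamela.scott": "customer_analytics",
    "justin.martin": "customer_analytics",
    "isla.williams": "customer_analytics",
    "mark.ketting": "marketing",
}


def can_access_schema(user: str, schema: str) -> bool:
    """Allow access only when the schema belongs to the user's own team."""
    own_schema = USER_SCHEMA.get(user)
    # Unknown users have no team and are therefore denied everywhere.
    return own_schema is not None and own_schema == schema
```

With this sketch, `can_access_schema("justin.martin", "customer_analytics")` is allowed, while the same user is denied for `"marketing"`. Additional grants, such as the restricted access described in the next section, would need extra rules on top of this default.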
+
+==== Column-based Access Control
+
+Sophia Clarke from the Compliance department can see tables for the Compliance department, but has also been given
+restricted access to the customers table.
+
+The following diagram shows which rules are in place. You can easily test these with a SQL editor of your choice.
+
+image::e2e-sophia.png[]
+
+==== Row-level Access Control
+Access control at the row level has been implemented on the employee table, where everybody can see information
+about themselves, as well as about people who report to them.
+
+image::e2e-sophia-employee.png[]
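
The row-level rule can be sketched in Python as a filter over employee rows. This is only an illustration of the idea, not the OPA policy used by the demo. The `Employee` type, its `manager` field, and the assumption that team members report directly to the head of their team are all hypothetical, and only direct reports are considered.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Employee:
    username: str
    manager: Optional[str]  # hypothetical column: username of the direct manager


# Assumed reporting lines: team members report to the head of their team.
EMPLOYEES = [
    Employee("sophia.clarke", None),
    Employee("william.lewis", "sophia.clarke"),
    Employee("daniel.king", "sophia.clarke"),
    Employee("pamela.scott", None),
    Employee("justin.martin", "pamela.scott"),
]


def visible_rows(user: str, rows: List[Employee]) -> List[Employee]:
    """Keep a row if it describes the user or one of their direct reports."""
    return [r for r in rows if r.username == user or r.manager == user]
```

Under these assumptions, `visible_rows("sophia.clarke", EMPLOYEES)` yields her own row plus the rows of William Lewis and Daniel King, whereas `visible_rows("william.lewis", EMPLOYEES)` yields only his own row.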
