From 334f0accebe7f5e4c77c0d3f571156ab7ad055b3 Mon Sep 17 00:00:00 2001 From: joshua-spacetime Date: Mon, 24 Feb 2025 12:08:43 -0800 Subject: [PATCH 1/3] Add best practices for Spacetime SQL Closes #179 --- docs/sql/index.md | 57 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 57 insertions(+) diff --git a/docs/sql/index.md b/docs/sql/index.md index b1f70d9c..5c8066f8 100644 --- a/docs/sql/index.md +++ b/docs/sql/index.md @@ -479,6 +479,63 @@ SELECT * FROM "Order" SELECT * FROM "Balance$" ``` +## Best Practices for Performance and Scalability + +When designing your schema or crafting your queries, +consider the following best practices to ensure optimal performance: + +- **Define Unique Constraints:** + If a column's values are guaranteed to be unique, + define a unique constraint on that column. + The query planner can further optimize joins if it knows the join values to be unique. + +- **Index Filtered Columns:** + Define indexes on columns frequently used in a `WHERE` clause. + Indexes reduce the number of rows scanned by the query engine. + +- **Index Join Columns:** + Define indexes on columns whose values are frequently used as join keys. + These are columns that are used in the `ON` condition of a `JOIN`. + Again, this reduces the number of rows that must be scanned to answer a query. + It is also critical for the performance of subscription updates -- + so much so that it is a compiler-enforced requirement, + as mentioned in the [subscription](#from) section. + +- **Optimize Join Order:** + Place tables with the most selective filters first in your `FROM` clause. + This minimizes intermediate result sizes and improves query efficiency. + +### Example + +Take the following query that was used in a previous example: +```sql +-- Find all customers who ordered a particular product and when they ordered it +SELECT customer.first_name, customer.last_name, o.date +FROM Customers customer +JOIN Orders o ON customer.id = o.customer_id +JOIN Inventory product ON o.product_id = product.id +WHERE product.name = {product_name} +``` + +In order to conform with the best practices for optimizing performance and scalability: + +- An index should be defined on `Inventory.name` because we are filtering on that column. +- Unique constraints should be defined on `Inventory.id` and `Customers.id` since they are join columns. +- Additionally non-unique indexes should be defined on `Orders.product_id` and `Orders.customer_id`. +- `Inventory` should appear first in the `FROM` clause because it is the only table mentioned in the `WHERE` clause. +- `Orders` should come next because it joins directly with `Inventory`. +- `Customers` should come next because it joins directly with `Orders`. + +As such, the query should be rewritten as follows: +```sql +-- Find all customers who ordered a particular product and when they ordered it +SELECT c.first_name, c.last_name, o.date +FROM Inventory product +JOIN Orders o ON product.id = o.product_id +JOIN Customers c ON c.id = o.customer_id +WHERE product.name = {product_name}; +``` + ## Appendix Common production rules that have been used throughout this document. From 804a65bcdaee04482537debce5a7f33c6827266c Mon Sep 17 00:00:00 2001 From: joshua-spacetime Date: Mon, 24 Feb 2025 15:03:43 -0800 Subject: [PATCH 2/3] changes from review --- docs/sql/index.md | 86 ++++++++++++++++++++++++++++++++++++++++++++--- 1 file changed, 81 insertions(+), 5 deletions(-) diff --git a/docs/sql/index.md b/docs/sql/index.md index 5c8066f8..39762d1d 100644 --- a/docs/sql/index.md +++ b/docs/sql/index.md @@ -484,9 +484,9 @@ SELECT * FROM "Balance$" When designing your schema or crafting your queries, consider the following best practices to ensure optimal performance: -- **Define Unique Constraints:** - If a column's values are guaranteed to be unique, - define a unique constraint on that column. +- **Define Primary Key and Unique Constraints:** + If a column is a primary key or its values are guaranteed to be unique, + define a primary key or unique constraint on that column. The query planner can further optimize joins if it knows the join values to be unique. - **Index Filtered Columns:** @@ -520,13 +520,89 @@ WHERE product.name = {product_name} In order to conform with the best practices for optimizing performance and scalability: - An index should be defined on `Inventory.name` because we are filtering on that column. -- Unique constraints should be defined on `Inventory.id` and `Customers.id` since they are join columns. +- `Inventory.id` and `Customers.id` should be defined as primary keys. - Additionally non-unique indexes should be defined on `Orders.product_id` and `Orders.customer_id`. - `Inventory` should appear first in the `FROM` clause because it is the only table mentioned in the `WHERE` clause. - `Orders` should come next because it joins directly with `Inventory`. - `Customers` should come next because it joins directly with `Orders`. -As such, the query should be rewritten as follows: +:::server-rust +```rust +#[table( + name = Inventory, + index(name = product_name, btree = [name]), + public +)] +struct Inventory { + #[primary_key] + id: u64, + name: String, + .. +} + +#[table( + name = Customers, + public +)] +struct Customers { + #[primary_key] + id: u64, + first_name: String, + last_name: String, + .. +} + +#[table( + name = Orders, + public +)] +struct Orders { + #[primary_key] + id: u64, + #[unique] + product_id: u64, + #[unique] + customer_id: u64, + .. +} +``` +::: +:::server-csharp +```cs +[SpacetimeDB.Table(Name = "Inventory")] +[SpacetimeDB.Index(Name = "product_name", BTree = ["name"])] +public partial struct Inventory +{ + [SpacetimeDB.PrimaryKey] + public long id; + public string name; + .. +} + +[SpacetimeDB.Table(Name = "Customers")] +public partial struct Customers +{ + [SpacetimeDB.PrimaryKey] + public long id; + public string first_name; + public string last_name; + .. +} + +[SpacetimeDB.Table(Name = "Orders")] +public partial struct Orders +{ + [SpacetimeDB.PrimaryKey] + public long id; + [SpacetimeDB.Unique] + public long product_id; + [SpacetimeDB.Unique] + public long customer_id; + .. +} +``` +::: + ```sql -- Find all customers who ordered a particular product and when they ordered it SELECT c.first_name, c.last_name, o.date From b761d6a22bfb4855031c2f7725f1cea8856474ac Mon Sep 17 00:00:00 2001 From: joshua-spacetime Date: Mon, 24 Feb 2025 21:13:09 -0800 Subject: [PATCH 3/3] changes from review --- docs/sql/index.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/docs/sql/index.md b/docs/sql/index.md index 39762d1d..59e90ca7 100644 --- a/docs/sql/index.md +++ b/docs/sql/index.md @@ -484,23 +484,27 @@ SELECT * FROM "Balance$" When designing your schema or crafting your queries, consider the following best practices to ensure optimal performance: -- **Define Primary Key and Unique Constraints:** - If a column is a primary key or its values are guaranteed to be unique, - define a primary key or unique constraint on that column. +- **Add Primary Key and/or Unique Constraints:** + Constrain columns whose values are guaranteed to be distinct as either unique or primary keys. The query planner can further optimize joins if it knows the join values to be unique. - **Index Filtered Columns:** - Define indexes on columns frequently used in a `WHERE` clause. + Index columns frequently used in a `WHERE` clause. Indexes reduce the number of rows scanned by the query engine. - **Index Join Columns:** - Define indexes on columns whose values are frequently used as join keys. + Index columns whose values are frequently used as join keys. These are columns that are used in the `ON` condition of a `JOIN`. + Again, this reduces the number of rows that must be scanned to answer a query. It is also critical for the performance of subscription updates -- so much so that it is a compiler-enforced requirement, as mentioned in the [subscription](#from) section. + If a column that has already been constrained as unique or a primary key, + it is not necessary to explicitly index it as well, + since these constraints automatically index the column in question. + - **Optimize Join Order:** Place tables with the most selective filters first in your `FROM` clause. This minimizes intermediate result sizes and improves query efficiency.