Documents using different column types, including string length
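
The new warning box says SQLite does not enforce `VARCHAR` lengths. That can be checked with the standard library alone. The snippet below is a minimal sketch, not part of the tutorial sources: the in-memory database and the `villain` table are hypothetical and used only for illustration.

```Python
import sqlite3

# Assumption: an in-memory SQLite database and a hypothetical table, used only
# to illustrate the SQLite bullet in the warning box.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE villain (country_code VARCHAR(2))")

# SQLite accepts a value far longer than the declared VARCHAR(2).
conn.execute("INSERT INTO villain (country_code) VALUES (?)", ("UNITED STATES",))
stored = conn.execute("SELECT country_code FROM villain").fetchone()[0]

print(len(stored))  # 13
conn.close()
```

This is also why the test for `tutorial001.py` cannot cover an oversize value without a different database engine.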

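The docs describe `Path` values as being stored in their plain string form and UUIDs as falling back to `CHAR(32)`. The sketch below shows both string forms using only the standard library. It assumes the fallback stores the 32 hexadecimal digits of the UUID, and it uses `PurePosixPath` (not used in the tutorial) so the output is the same on every platform.

```Python
from pathlib import PurePosixPath
from uuid import UUID

# Assumption: PurePosixPath keeps the result identical across platforms.
relative = PurePosixPath("../path/to/file")
stored_path = str(relative)  # the relative form is kept as-is, not resolved

# A UUID has 32 hexadecimal digits, which is what a CHAR(32) column can hold.
example_id = UUID("12345678-1234-5678-1234-567812345678")
stored_id = example_id.hex

print(stored_path)     # ../path/to/file
print(len(stored_id))  # 32
```
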
rmasters · rmasters · commit 0cdc1b12f488 · 2023-12-27T01:46:53.000Z
diff --git a/docs/advanced/column-types.md b/docs/advanced/column-types.md
@@ -0,0 +1,152 @@
+# Column Types
+
+In the tutorial, we stored scalar data types in our tables, like strings, numbers and timestamps. In practice, we often
+work with more complicated types that need to be converted to a data type our database supports.
+
+## Customising String Field Lengths
+
+As we discussed in [`TEXT` or `VARCHAR`](../tutorial/create-db-and-table.md#text-or-varchar), a `str` field is created
+as a `VARCHAR` column, whose maximum length depends on the database engine you are using.
+
+When you know the text will never exceed a certain length, you can set the column's maximum length with the
+`max_length` validation argument to `Field()`:
+
+```Python hl_lines="11"
+{!./docs_src/advanced/column_types/tutorial001.py[ln:1-12]!}
+```
+
+/// details | 👀 Full file preview
+
+```Python
+{!./docs_src/advanced/column_types/tutorial001.py!}
+```
+
+///
+
+/// warning
+
+Database engines behave differently when you attempt to store longer text than the character length of the `VARCHAR`
+column. Notably:
+
+* SQLite does not enforce the length of a `VARCHAR`. It will happily store up to 500-million characters of text.
+* MySQL rejects the value with an error in strict SQL mode (the default since MySQL 5.7). With strict mode disabled, it emits a warning and truncates your text to fit the size of the `VARCHAR`.
+* PostgreSQL will respond with an error code, and your query will not be executed.
+
+///
+
+However, if you need to store much longer strings than `VARCHAR` allows, databases provide `TEXT` or `CLOB`
+(**c**haracter **l**arge **ob**ject) column types. We can use these by passing an SQLAlchemy column type to the field
+with the `sa_type` keyword argument:
+
+```Python hl_lines="12"
+{!./docs_src/advanced/column_types/tutorial001.py[ln:1-12]!}
+```
+
+/// tip
+
+`Text` also accepts a character length argument, which some databases use to choose how the field is stored.
+MySQL, for example, supports `TINYTEXT`, `TEXT`, `MEDIUMTEXT` and `LONGTEXT` column types, ranging from 255 bytes to
+4 gigabytes. If you know the maximum length of your data, specifying it like `Text(1000)` lets such engines select the
+smallest type that fits.
+
+///
+
+
+With this approach, we can use [any kind of SQLAlchemy type](https://docs.sqlalchemy.org/en/20/core/type_basics.html).
+For example, if we were building a mapping application, we could store spatial information:
+
+```Python
+{!./docs_src/advanced/column_types/tutorial002.py!}
+```
+
+## Supported Types
+
+Python types are mapped to column types as follows:
+
+<table>
+<tr>
+<th>Python type</th><th>SQLAlchemy type</th><th>Database column types</th>
+</tr>
+<tr>
+<td>str</td><td>String</td><td>VARCHAR</td>
+</tr>
+<tr>
+<td>int</td><td>Integer</td><td>INTEGER</td>
+</tr>
+<tr>
+<td>float</td><td>Float</td><td>FLOAT, REAL, DOUBLE</td>
+</tr>
+<tr>
+<td>bool</td><td>Boolean</td><td>BOOL or TINYINT</td>
+</tr>
+<tr>
+<td>datetime.datetime</td><td>DateTime</td><td>DATETIME, TIMESTAMP, DATE</td>
+</tr>
+<tr>
+<td>datetime.date</td><td>Date</td><td>DATE</td>
+</tr>
+<tr>
+<td>datetime.timedelta</td><td>Interval</td><td>INTERVAL, INT</td>
+</tr>
+<tr>
+<td>datetime.time</td><td>Time</td><td>TIME, DATETIME</td>
+</tr>
+<tr>
+<td>bytes</td><td>LargeBinary</td><td>BLOB, BYTEA</td>
+</tr>
+<tr>
+<td>Decimal</td><td>Numeric</td><td>DECIMAL, FLOAT</td>
+</tr>
+<tr>
+<td>enum.Enum</td><td>Enum</td><td>ENUM, VARCHAR</td>
+</tr>
+<tr>
+<td>uuid.UUID</td><td>GUID</td><td>UUID, CHAR(32)</td>
+</tr>
+</table>
+
+In addition, the following types are stored as `VARCHAR`:
+
+* ipaddress.IPv4Address
+* ipaddress.IPv4Network
+* ipaddress.IPv6Address
+* ipaddress.IPv6Network
+* pathlib.Path
+
+### IP Addresses
+
+IP Addresses from the <a href="https://docs.python.org/3/library/ipaddress.html" class="external-link" target="_blank">Python `ipaddress` module</a> are stored as text.
+
+```Python hl_lines="1 11"
+{!./docs_src/advanced/column_types/tutorial003.py[ln:1-13]!}
+```
+
+### Filesystem Paths
+
+Paths to files and directories using the <a href="https://docs.python.org/3/library/pathlib.html" class="external-link" target="_blank">Python `pathlib` module</a> are stored as text.
+
+```Python hl_lines="3 12"
+{!./docs_src/advanced/column_types/tutorial003.py[ln:1-13]!}
+```
+
+/// tip
+
+The stored value of a `Path` is its plain string form: `str(Path('../path/to/file'))`. If you need to store the full
+path, call `absolute()` on the path before setting it in your model.
+
+///
+
+### UUIDs
+
+UUIDs from the <a href="https://docs.python.org/3/library/uuid.html" class="external-link" target="_blank">Python `uuid`
+module</a> are stored as `UUID` types in supported databases (just PostgreSQL at the moment), otherwise as a `CHAR(32)`.
+
+```Python hl_lines="4 10"
+{!./docs_src/advanced/column_types/tutorial003.py[ln:1-13]!}
+```
+
+## Custom Pydantic types
+
+As SQLModel is built on Pydantic, you can use any custom type as long as it would work in a Pydantic model. However, if
+the type is not a subclass of [a type from the table above](#supported-types), you will need to specify an SQLAlchemy
+type to use.
diff --git a/docs/tutorial/create-db-and-table.md b/docs/tutorial/create-db-and-table.md
@@ -500,7 +500,7 @@ To make it easier to start using **SQLModel** right away independent of the data
 
 /// tip
 
-You will learn how to change the maximum length of string columns later in the Advanced Tutorial - User Guide.
+You can learn how to change the maximum length of string columns later in the [Advanced Tutorial - User Guide](../advanced/column-types.md){.internal-link target=_blank}.
 
 ///
 
diff --git a/docs_src/advanced/column_types/__init__.py b/docs_src/advanced/column_types/__init__.py
diff --git a/docs_src/advanced/column_types/tutorial001.py b/docs_src/advanced/column_types/tutorial001.py
@@ -0,0 +1,80 @@
+from typing import Optional
+
+from sqlalchemy import Text
+from sqlmodel import Field, Session, SQLModel, create_engine, select
+from wonderwords import RandomWord
+
+
+class Villian(SQLModel, table=True):
+    id: Optional[int] = Field(default=None, primary_key=True)
+    name: str = Field(index=True)
+    country_code: str = Field(max_length=2)
+    backstory: str = Field(sa_type=Text())
+
+
+sqlite_file_name = "database.db"
+sqlite_url = f"sqlite:///{sqlite_file_name}"
+
+engine = create_engine(sqlite_url, echo=True)
+
+
+def create_db_and_tables():
+    SQLModel.metadata.create_all(engine)
+
+
+def generate_backstory(words: int) -> str:
+    return " ".join(RandomWord().random_words(words, regex=r"\S+"))
+
+
+def create_villains():
+    villian_1 = Villian(
+        name="Green Gobbler", country_code="US", backstory=generate_backstory(500)
+    )
+    villian_2 = Villian(
+        name="Arnim Zozza", country_code="DE", backstory=generate_backstory(500)
+    )
+    villian_3 = Villian(
+        name="Low-key", country_code="AS", backstory=generate_backstory(500)
+    )
+
+    with Session(engine) as session:
+        session.add(villian_1)
+        session.add(villian_2)
+        session.add(villian_3)
+
+        session.commit()
+
+
+def count_words(sentence: str) -> int:
+    return sentence.count(" ") + 1
+
+
+def select_villians():
+    with Session(engine) as session:
+        statement = select(Villian).where(Villian.name == "Green Gobbler")
+        results = session.exec(statement)
+        villian_1 = results.one()
+        print(
+            "Villian 1:",
+            {"name": villian_1.name, "country_code": villian_1.country_code},
+            count_words(villian_1.backstory),
+        )
+
+        statement = select(Villian).where(Villian.name == "Low-key")
+        results = session.exec(statement)
+        villian_2 = results.one()
+        print(
+            "Villian 2:",
+            {"name": villian_2.name, "country_code": villian_2.country_code},
+            count_words(villian_2.backstory),
+        )
+
+
+def main():
+    create_db_and_tables()
+    create_villains()
+    select_villians()
+
+
+if __name__ == "__main__":
+    main()
diff --git a/docs_src/advanced/column_types/tutorial002.py b/docs_src/advanced/column_types/tutorial002.py
@@ -0,0 +1,17 @@
+from typing import Optional
+
+from geoalchemy2.types import Geography
+from sqlmodel import Field, SQLModel, create_engine
+
+
+class BusStop(SQLModel, table=True):
+    id: Optional[int] = Field(default=None, primary_key=True)
+    latlng: Geography = Field(sa_type=Geography(geometry_type="POINT", srid=4326))
+
+
+sqlite_file_name = "database.db"
+sqlite_url = f"sqlite:///{sqlite_file_name}"
+
+engine = create_engine(sqlite_url, echo=True)
+
+SQLModel.metadata.create_all(engine)
diff --git a/docs_src/advanced/column_types/tutorial003.py b/docs_src/advanced/column_types/tutorial003.py
@@ -0,0 +1,21 @@
+import ipaddress
+from datetime import UTC, datetime
+from pathlib import Path
+from uuid import UUID, uuid4
+
+from sqlmodel import Field, SQLModel, create_engine
+
+
+class Avatar(SQLModel, table=True):
+    id: UUID = Field(default_factory=uuid4, primary_key=True)
+    source_ip_address: ipaddress.IPv4Address
+    upload_location: Path
+    uploaded_at: datetime = Field(default_factory=lambda: datetime.now(tz=UTC))
+
+
+sqlite_file_name = "database.db"
+sqlite_url = f"sqlite:///{sqlite_file_name}"
+
+engine = create_engine(sqlite_url, echo=True)
+
+SQLModel.metadata.create_all(engine)
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -98,6 +98,7 @@ nav:
   - Advanced User Guide:
     - advanced/index.md
     - advanced/decimal.md
+    - advanced/column-types.md
   - alternatives.md
   - help.md
   - contributing.md
diff --git a/pyproject.toml b/pyproject.toml
@@ -54,6 +54,9 @@ httpx = "0.24.1"
 dirty-equals = "^0.6.0"
 typer-cli = "^0.0.13"
 mkdocs-markdownextradata-plugin = ">=0.1.7,<0.3.0"
+# For column type tests
+wonderwords = "^2.2.0"
+geoalchemy2 = "^0.14.3"
 
 [build-system]
 requires = ["poetry-core"]
diff --git a/tests/test_advanced/test_column_types/__init__.py b/tests/test_advanced/test_column_types/__init__.py
diff --git a/tests/test_advanced/test_column_types/test_tutorial001.py b/tests/test_advanced/test_column_types/test_tutorial001.py
@@ -0,0 +1,44 @@
+from unittest.mock import patch
+
+from sqlmodel import create_engine
+
+from ...conftest import get_testing_print_function
+
+expected_calls = [
+    [
+        "Villian 1:",
+        {
+            "name": "Green Gobbler",
+            "country_code": "US",
+        },
+        500,
+    ],
+    [
+        "Villian 2:",
+        {
+            "name": "Low-key",
+            "country_code": "AS",
+        },
+        500,
+    ],
+]
+
+
+def test_tutorial(clear_sqlmodel):
+    """
+    Unfortunately, SQLite does not enforce varchar lengths, so we can't test an oversize case without spinning up a
+    database engine.
+
+    """
+
+    from docs_src.advanced.column_types import tutorial001 as mod
+
+    mod.sqlite_url = "sqlite://"
+    mod.engine = create_engine(mod.sqlite_url)
+    calls = []
+
+    new_print = get_testing_print_function(calls)
+
+    with patch("builtins.print", new=new_print):
+        mod.main()
+    assert calls == expected_calls