chanzuckerberg
diff --git a/‎docs/api/quickstart2d.md‎
Lines changed: 1 addition & 1 deletion b/‎docs/api/quickstart2d.md‎
diff --git a/‎docs/assets/saber_gui.png‎
-966 KB b/‎docs/assets/saber_gui.png‎
diff --git a/‎docs/getting-started/import-tomos.md‎
Lines changed: 7 additions & 8 deletions b/‎docs/getting-started/import-tomos.md‎
diff --git a/‎docs/getting-started/quickstart.md‎
Lines changed: 26 additions & 16 deletions b/‎docs/getting-started/quickstart.md‎
diff --git a/‎docs/tutorials/preprocessing.md‎
Lines changed: 34 additions & 17 deletions b/‎docs/tutorials/preprocessing.md‎
diff --git a/‎saber/analysis/organelle_statistics.py‎
Lines changed: 27 additions & 24 deletions b/‎saber/analysis/organelle_statistics.py‎
diff --git a/‎saber/classifier/cli.py‎
Lines changed: 2 additions & 0 deletions b/‎saber/classifier/cli.py‎
diff --git a/‎saber/classifier/datasets/singleZarrDataset.py‎
Lines changed: 9 additions & 8 deletions b/‎saber/classifier/datasets/singleZarrDataset.py‎
@@ -15,8 +15,8 @@ This quickstart guide shows you how to use SABER's API to segment 2D micrographs
 Before starting, ensure you have SABER installed and import the necessary modules: SABER supports various file formats commonly used in microscopy:
 ```python
 from saber.segmenters.micro import cryoMicroSegmenter
-from saber.classifier.models import common
 from saber.visualization import classifier as viz
+from saber.classifier.models import common
 from saber.utils import io
 import numpy as np
 import torch
 
@@ -77,16 +77,15 @@ The `copick config filesystem` command assumes local paths, but you can edit the
 
 </details>
 
-<details markdown="1">
-<summary><strong> 💡 Understanding the `--objects` flag</strong></summary>
+</details>
 
-The `--objects` flag accepts 2-4 elements separated by commas:
+!!! info
+    The `--objects` flag accepts 2-4 elements separated by commas:
 
-1. **Particle name** (required): e.g., `ribosome`
-2. **Is pickable** (required): `True` for particles, `False` for continuous segmentations
-3. **Particle radius** (optional): in Ångströms, e.g., `130`
-4. **PDB ID** (optional): reference structure, e.g., `6QZP`
-</details>
+    1. **Particle name** (required): e.g., `ribosome`
+    2. **Is pickable** (required): `True` for particles, `False` for continuous segmentations
+    3. **Particle radius** (optional): in Ångströms, e.g., `130`
+    4. **PDB ID** (optional): reference structure, e.g., `6QZP`
 
 This structure supports both particle picking for sub-tomogram averaging and broader 3D segmentation tasks. Our deep learning platform [Octopi 🐙](https://github.com/chanzuckerberg/octopi) is designed to train models from copick projects for:
 
 
@@ -15,62 +15,72 @@ For reference, you can skip steps 1 and 2 to visualize the raw SAM2 segmentation
 ## 🧩 Phase 1: Curating Training Labels and Training a Domain Expert Classifier
 
 ### Producing Initial SAM2 Segmentations
-Use `prepare-tomogram-training` to generate 2D segmentations from a tomogram using SAM2-style slab-based inference. These masks act as a rough initialization for downstream curation and model training.
+Use `prep3d` to generate 2D segmentations from a tomogram using SAM2-style slab-based inference. These masks act as a rough initialization for downstream curation and model training.
 
 #### For tomogram data:
 ```bash
-saber classifier prepare-tomogram-training \
+saber classifier prep3d \
     --config config.json \
     --voxel-size 10 --tomo-alg denoised \
-    --num-slabs 3 --output training_data.zarr \
+    --num-slabs 3 --output training.zarr
 ```
 This will save slab-wise segmentations in a Zarr volume that can be reviewed or refined further.
 
 #### For electron micrograph/single-particle data:
 ```bash
-saber classifier prepare-micrograph-training \
+saber classifier prep2d \
     --input path/to/folder/*.mrc \
-    --ouput training_data.zarr \
+    --output training.zarr \
     --target-resolution 10 
 ```
 
-In the case of referencing MRC files from single particle datasets use `prepare-micrograph-training` instead. 
+For MRC files from single-particle datasets, use `prep2d` instead of `prep3d`.
 
 ### 🎨 Annotating Segmentations for the Classifier with the Interactive GUI
 
 Launch an interactive labeling session to annotate the initial SAM2 segmentations and assign class labels.
 ```
-saber gui \
-    --input output_zarr_fname.zarr \
-    --output curated_labels.zarr \
-    --class-names carbon,lysosome,artifacts
+saber gui --input training.zarr 
 ```
 
-For transfering the data between machines, its recommended ziping (compressing) the zarr file prior to data transfer (e.g. `zip -r curated_labels.zarr.zip curated_labels.zarr`).
+When transferring data between machines, we recommend zipping (compressing) the Zarr file first (e.g. `zip -r training.zarr.zip training.zarr`).
 
-Once annotations are complete, split the dataset into training and validation sets:
+After you download the annotated JSON file, apply the annotations to the original Zarr file.
+
+```bash
+saber classifier labeler \
+    --input training.zarr \
+    --labels labels.json \
+    --classes class1,class2,class3 \
+    --output labeled.zarr
+```
+
+Once the labeled zarr is available, split the dataset into training and validation sets:
 
 ```
 saber classifier split-data \
-    --input curated_labels.zarr \
+    --input labeled.zarr \
     --ratio 0.8
 ```
-This generates `curated_labels_train.zarr` and `curated_labels_val.zarr` for use in model training.
+This generates `labeled_train.zarr` and `labeled_val.zarr` for use in model training.
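
As a rough illustration of what `--ratio 0.8` means, the sketch below partitions a list of run IDs into training and validation subsets. It is not SABER's implementation: the helper name `split_run_ids`, the shuffling, and the fixed seed are assumptions made for this example only.

```python
import random

def split_run_ids(run_ids, ratio=0.8, seed=0):
    """Hypothetical helper: shuffle run IDs and cut the list at `ratio`."""
    if not 0.0 < ratio < 1.0:
        raise ValueError("ratio must be strictly between 0 and 1")
    shuffled = list(run_ids)
    # A seeded generator keeps the split reproducible between calls.
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * ratio))
    return shuffled[:cut], shuffled[cut:]

runs = [f"run_{i:03d}" for i in range(10)]
train_ids, val_ids = split_run_ids(runs, ratio=0.8)
print(len(train_ids), len(val_ids))
```

Splitting by run rather than by individual mask keeps every mask from one image in the same subset, which is the usual way to avoid leakage between training and validation data.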
+
+!!! info "Learn More"
+    For detailed annotation instructions, see the [Annotation and Labeling](../tutorials/preprocessing.md#-annotation-with-the-saber-gui) section.
 
 ## 🧠 Phase 2: Train a Domain Expert Classifier
 
 Train a classifier using your curated annotations. This model improves segmentation accuracy beyond zero-shot results by learning from expert-provided labels.
 ```
 saber classifier train \
-    --train curated_labels_train.zarr --validate curated_labels_val.zarr \
+    --train labeled_train.zarr --validate labeled_val.zarr \
     --num-epochs 75 --num-classes 4 
 ```
 The number of classes should be 1 greater than the number of class names provided during annotation (to account for background).
 Training logs, model weights, and evaluation metrics will be saved under `results/`.
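
To make the "one more than the number of class names" rule concrete, here is a minimal sketch that derives the `--num-classes` value from a comma-separated class list. The helper `num_classes_for` is hypothetical and not part of SABER.

```python
def num_classes_for(class_names):
    """Hypothetical helper: classifier outputs needed for a comma-separated class list."""
    names = [name.strip() for name in class_names.split(",") if name.strip()]
    if len(set(names)) != len(names):
        raise ValueError("class names must be unique")
    # One extra output accounts for the background class.
    return len(names) + 1

print(num_classes_for("class1,class2,class3"))
```

With the three classes used in the labeling step, this gives the `--num-classes 4` shown in the training command.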
 
 ## 🔍 Phase 3: Inference
 
-### 🖼️ Producting 2D Segmentations with SABER
+### 🖼️ Producing 2D Segmentations with SABER
 
 SABER operates in two modes depending on your input: interactive mode when processing a single image, and batch processing mode when you provide a file path pattern (like `--input 'path/to/*.mrc'`) to process entire datasets automatically.
 
 
@@ -69,10 +69,10 @@ This preview helps you understand what structures SAM2 naturally identifies in y
 
 ## 🧬 Pre-processing Electron Micrographs
 
-For single-particle datasets, ADF/BF signals from S/TEM, or FIB-SEM micrographs -- use the `saber classifier prepare-micrograph-training` command:
+For single-particle datasets, ADF/BF signals from S/TEM, or FIB-SEM micrographs -- use the `saber classifier prep2d` command:
 
 ```bash
-saber classifier prepare-micrograph-training \
+saber classifier prep2d \
     --input 'path/to/*.mrc' \
     --output training.zarr
 ```
@@ -92,7 +92,7 @@ Traditional workflows require you to manually draw every mask from scratch. SABE
 
 Generate comprehensive slab-based segmentations that maintain 3D context:
 ```bash
-saber classifier prepare-tomogram-training \
+saber classifier prep3d \
     --config config.json \
     --zarr-path output_zarr_fname.zarr \
     --num-slabs 3
@@ -113,31 +113,48 @@ Small objects or sparse structures might not be present in a single slab project
 
 ---
 
-## 🎨 Next Step: Annotation with the SABER GUI
+## 🎨 Annotation with the SABER GUI
 
+Launch the GUI to begin annotating your pre-processed data: 
+```bash
+saber gui --input output_zarr_fname.zarr 
+```
 Once preprocessing is complete, SABER's unique annotation workflow begins. Instead of drawing masks from scratch, you simply:
 
-1. **Point and Click** on the precomputed segmentations.
-2. **Assign Class Labels** using the dropdown menu.
+!!! info "How the GUI works:"
+    1. **Point and Click** on the precomputed SAM2 segmentations.
+    2. **Assign Class Labels** using the menu on the right.
+    3. **Save the Annotations** to a JSON file with the bottom-right button.
 
 ![SABER GUI](../assets/saber_gui.png)
 
-```bash
-saber gui \
-    --input output_zarr_fname.zarr \
-    --output curated_labels.zarr \
-    --class-names carbon,lysosome,artifacts
-```
-
-**Class Configuration**: The `--class-names` flag defines the biological classes present in your data. For binary classification (object vs. background), you can omit this flag for a simple two-class system.
-
-**💡 How Many Micrographs / Tomograms Should I Annotate?** In general we recommend annotating 20-40 runs per dataset. In cases where there are several objects per image/slab the lower range should be sufficient. If only a few instances are available, the higher range is recommended.  
+!!! tip "Annotation Guidelines - How Many Images to Annotate?"
+    - We recommend 20-40 runs per dataset
+    - Lower range (20): When multiple objects appear per image/slab
+    - Higher range (40): When only a few instances are available
+    - Consistency is key: Maintain uniform criteria across all annotations
+    - Handle ambiguous segments: When uncertain, prefer skipping over mislabeling
 
 **Tip:** For transferring data between machines, it's recommended to compress your Zarr files:
 ```bash
 zip -r curated_labels.zarr.zip curated_labels.zarr
 ```
 
+## 🏷️ Applying Annotations for Classifier Training
+
+Once you've completed annotations in the GUI, use the `labeler` command to apply your JSON annotations to the SAM2 masks, creating a training-ready dataset. The labeler converts your point-and-click annotations into properly indexed training data, handling class ordering automatically or according to your specifications.
+
+!!! example "Basic Usage"
+    ```bash
+    saber classifier labeler \
+        --input training.zarr \
+        --labels labels.json \
+        --classes lysosome,carbon,edge \
+        --output labeled.zarr
+    ```
+
+Use the `--classes` flag either to control the ordering of the labels or to apply only a subset of them. If the flag is omitted, all classes are used in alphabetical order.
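
The ordering rule for `--classes` can be sketched as follows. This is an illustration under stated assumptions, not the `labeler` source: `resolve_class_order` is a hypothetical helper, and raising an error for an unknown class name is a design choice made for the example.

```python
def resolve_class_order(annotated, classes=None):
    """Hypothetical helper: decide which classes are applied, and in what order.

    annotated: class names found in the annotation JSON.
    classes:   comma-separated value of --classes, or None when omitted.
    """
    annotated = set(annotated)
    if classes is None:
        # Flag omitted: use every annotated class, alphabetically.
        return sorted(annotated)
    requested = [name.strip() for name in classes.split(",") if name.strip()]
    missing = [name for name in requested if name not in annotated]
    if missing:
        raise ValueError(f"classes not found in annotations: {missing}")
    # Flag given: keep the user's order, which may be a subset.
    return requested

found = {"lysosome", "carbon", "edge"}
print(resolve_class_order(found))
print(resolve_class_order(found, "lysosome,carbon"))
```

Passing an explicit order is useful when class indices need to stay stable across datasets that were annotated separately.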
+
 ---
 
-_Ready to move on? Check out the [Training a Classifier](training.md) tutorial!_
+_Ready to move on? Check out the [Training a Classifier](training.md) tutorial!_
@@ -1,13 +1,22 @@
 from skimage.measure import regionprops
+from copick_utils.io import writers
 import numpy as np
 
-def extract_organelle_statistics(run, mask, organelle_name, session_id, user_id, voxel_size, save_copick = True, zfile=None, xyz_order=True):
+def extract_organelle_statistics(
+    run, mask, organelle_name, session_id, user_id, 
+    voxel_size, save_copick = True, save_statistics=True, xyz_order=True):
+    """
+    Extract per-organelle statistics and return them as CSV rows.
+
+    Returns:
+        List of [run name, label, volume (nm^3), diameter (nm)] rows if
+        save_statistics is True, empty list otherwise.
+    """
 
     unique_labels = np.unique(mask)
     unique_labels = unique_labels[unique_labels > 0]  # Ignore background (label 0)
 
     coordinates = {}
-    results = {}
+    csv_rows = []
     for label in unique_labels:
 
         component_mask = (mask == label).astype("int")
@@ -19,45 +28,39 @@ def extract_organelle_statistics(run, mask, organelle_name, session_id, user_id,
             centroid = centroid[::-1]
         coordinates[str(label)] = centroid
 
-        if zfile is not None:
+        if save_statistics:
 
             # Compute Volume in nm^3
             volume = np.sum(component_mask) * (voxel_size/10)**3 # Convert voxel count to nm^3 (voxel_size is in Angstroms)
 
             # Sort axes to identify the first (Z-biased) and two in-plane dimensions
-            axes_lengths = sorted([rprops.axis_major_length, rprops.axis_minor_length, rprops.axis_minor_length])
+            axes_lengths = sorted([rprops.axis_major_length, rprops.axis_minor_length, 
+                                   rprops.axis_minor_length])
 
             # Convert to physical units (nm)
             axis_x = axes_lengths[1] * (voxel_size/10)  # Likely an in-plane axis
             axis_y = axes_lengths[2] * (voxel_size/10)  # Likely an in-plane axis
             diameter = (axis_x + axis_y) / 2
 
-            # Save Statistics in a structured dictionary
-            results[str(label)] = {'volume': volume, 'diameter': diameter, 'coordinates': centroid}
+            # Prepare row for CSV
+            csv_row = [
+                run.name,
+                int(label),
+                volume,
+                diameter,
+            ]
+            csv_rows.append(csv_row)
 
-    # Save to Copick
+    # Statistics are returned to the caller as CSV rows; only coordinates are saved here
     if len(coordinates) > 0:
-
-        # Save to Copick
+        # Save Coordinates to Copick
         if save_copick:
-            save_coordinates_to_copick(run, coordinates, organelle_name, session_id, user_id, voxel_size)
-
-        # Save Statistics into Zarr File
-        if zfile is not None:
-            group = zfile.create_group(run.name)
-            # Save metadata as an array
-            labels = np.array(list(results.keys()), dtype=int)
-            volumes = np.array([r["volume"] for r in results.values()], dtype=float)
-            diameters = np.array([r["diameter"] for r in results.values()], dtype=float)
-            coordinates = np.array([r["coordinates"] for r in results.values()], dtype=float)
-
-            group.create_dataset("labels", data=labels, overwrite=True)
-            group.create_dataset("volumes", data=volumes, overwrite=True)
-            group.create_dataset("diameters", data=diameters, overwrite=True)
-            group.create_dataset("coordinates", data=coordinates, overwrite=True)
+            save_coordinates_to_copick(run, coordinates, organelle_name, 
+                                      session_id, user_id, voxel_size)
     else:
         print(f"{run.name} didn't have any organelles present!")
 
+    return csv_rows
 
 def save_coordinates_to_copick(run, coordinates, organelle_name, session_id, user_id, voxel_size):
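
Because `extract_organelle_statistics` now returns rows instead of writing to Zarr, the caller is responsible for persisting them. The sketch below mirrors the unit conversion from the hunk above and writes the rows with Python's `csv` module. The helper names and the header column names are assumptions; the diff does not show how the caller writes the file.

```python
import csv
import io

def organelle_row(run_name, label, n_voxels, axis_lengths_vox, voxel_size):
    """Build one row like the function above; voxel_size is in Angstroms."""
    nm_per_voxel = voxel_size / 10            # 10 Angstroms = 1 nm
    volume = n_voxels * nm_per_voxel ** 3     # voxel count -> nm^3
    axes = sorted(axis_lengths_vox)
    # Mean of the two longest axes, converted to nm.
    diameter = (axes[1] + axes[2]) * nm_per_voxel / 2
    return [run_name, int(label), volume, diameter]

def write_rows(rows, handle):
    """Write a header (assumed column names) followed by the statistics rows."""
    writer = csv.writer(handle)
    writer.writerow(["run", "label", "volume_nm3", "diameter_nm"])
    writer.writerows(rows)

rows = [organelle_row("run_001", 3, n_voxels=1000,
                      axis_lengths_vox=[12, 8, 8], voxel_size=10)]
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

Collecting rows from every run and writing them once at the end gives a single table for the whole dataset, rather than one group per run as in the previous Zarr layout.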
 
 
@@ -3,6 +3,7 @@
 from saber.classifier.preprocess.tomogram_training_prep import prepare_tomogram_training
 from saber.classifier.preprocess.split_merge_data import split_data, merge_data
 from saber.classifier.preprocess.training_data_info import class_info
+from saber.classifier.preprocess.apply_labels import labeler
 from saber.classifier.inference import predict, predict_slurm
 from saber.classifier.train import train, train_slurm
 from saber.classifier.evaluator import evaluate
@@ -22,6 +23,7 @@ def classifier_routines():
 classifier_routines.add_command(prepare_micrograph_training)
 classifier_routines.add_command(evaluate)
 classifier_routines.add_command(class_info)
+classifier_routines.add_command(labeler)
 
 @click.group(name="classifier")
 def slurm_classifier_routines():
 
@@ -32,19 +32,20 @@ def __init__(self, zarr_path, mode='train', transform=None, min_area = 250):
         self.samples = []
         for run_id in tqdm(self.run_ids):
             group = self.zfile[run_id]
-            image = group['image'][:]
-
-            if 'masks' in group:
-                # Process candidate masks
-                candidate_masks = group['masks'][:] # [Nclass, Nx, Ny]
+            image = group['0'][:]
+            labels = group['labels']
+            
+            # Process candidate masks
+            if '0' in labels:
+                candidate_masks = labels['0'][:] # [Nclass, Nx, Ny]
                 self._process_masks(candidate_masks, image)
             else:
                 continue
 
             # Check if "rejected_masks" exists before accessing
-            if 'rejected_masks' in group:
+            if 'rejected' in labels:
                 # Process rejected masks
-                rejected_masks = group['rejected_masks'][::negative_class_reduction]
+                rejected_masks = labels['rejected'][::negative_class_reduction]
                 self._process_masks(rejected_masks, image, is_negative_mask=True)  
 
     def _process_masks(self, masks, image, is_negative_mask = False):
@@ -66,7 +67,7 @@ def _process_masks(self, masks, image, is_negative_mask = False):
                         self.samples.append({
                             'image': image,
                             'mask': component_mask,
-                            'label': 0 if is_negative_mask else class_idx + 1  # Assign labels properly
+                            'label': 0 if is_negative_mask else class_idx  # Assign labels properly
                         })
 
     def __len__(self):