|
1440 | 1440 | "outputs": [], |
1441 | 1441 | "source": [ |
1442 | 1442 | "from scipy.stats import zscore\n", |
| 1443 | + "from voxelwise_tutorials.utils import zscore_runs\n", |
1443 | 1444 | "\n", |
1444 | 1445 | "# indices of the first sample of each run\n", |
1445 | 1446 | "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
1446 | 1447 | "print(run_onsets)\n", |
1447 | 1448 | "\n", |
1448 | 1449 | "# zscore each training run separately\n", |
1449 | | - "Y_train = np.split(Y_train, run_onsets[1:])\n", |
1450 | | - "Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n", |
| 1450 | + "Y_train = zscore_runs(Y_train, run_onsets)\n", |
1451 | 1451 | "# zscore each test run separately\n", |
1452 | 1452 | "Y_test = zscore(Y_test, axis=1)" |
1453 | 1453 | ] |
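For reference, here is a minimal sketch of what a helper like `zscore_runs` presumably does, matching the per-run z-scoring that the removed lines performed by hand (the actual implementation in `voxelwise_tutorials.utils` may differ):

```python
import numpy as np
from scipy.stats import zscore

def zscore_runs_sketch(data, run_onsets):
    """Z-score each run separately over time (illustrative sketch, not the library code)."""
    # split the samples at each run onset (the first onset is 0, so we drop it)
    runs = np.split(data, run_onsets[1:])
    # z-score each run along the time axis, then stitch the runs back together
    return np.concatenate([zscore(run, axis=0) for run in runs], axis=0)

# toy example: two runs of 5 samples each, 3 voxels
data = np.random.randn(10, 3)
print(zscore_runs_sketch(data, np.array([0, 5])).shape)  # (10, 3)
```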
|
1474 | 1474 | "source": [ |
1475 | 1475 | "Y_test = Y_test.mean(0)\n", |
1476 | 1476 | "# We need to zscore the test data again, because we took the mean across repetitions.\n", |
1477 | | - "# This averaging step makes the standard deviation approximately equal to 1/sqrt(n_repeats)\n", |
1478 | 1477 | "Y_test = zscore(Y_test, axis=0)\n", |
1479 | 1478 | "\n", |
1480 | 1479 | "print(\"(n_samples_test, n_voxels) =\", Y_test.shape)" |
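A quick numerical illustration of why this second z-scoring is needed: averaging z-scored repetitions shrinks the standard deviation, roughly by $1/\sqrt{n_\mathrm{repeats}}$ when the repetitions behave like independent noise. The snippet below is a toy check with made-up numbers, not part of the dataset:

```python
import numpy as np
from scipy.stats import zscore

rng = np.random.RandomState(0)
n_repeats, n_samples = 10, 1000
# independent noise repetitions, z-scored over time like Y_test
repeats = zscore(rng.randn(n_repeats, n_samples), axis=1)
averaged = repeats.mean(0)
print(averaged.std())          # about 1 / sqrt(10) ~ 0.32
print(zscore(averaged).std())  # back to 1 after re-z-scoring
```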
|
2117 | 2116 | "Similarly to {cite:t}`huth2012`, we correct the coefficients of features linked by a\n", |
2118 | 2117 | "semantic relationship. When building the wordnet features, if a frame was\n", |
2119 | 2118 | "labeled with `wolf`, the authors automatically added the semantically linked\n", |
2120 | | - "categories `canine`, `carnivore`, `placental mammal`, `mamma`, `vertebrate`,\n", |
| 2119 | + "categories `canine`, `carnivore`, `placental mammal`, `mammal`, `vertebrate`,\n", |
2121 | 2120 | "`chordate`, `organism`, and `whole`. The authors thus argue that the same\n", |
2122 | 2121 | "correction needs to be done on the coefficients.\n", |
2123 | 2122 | "\n" |
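As an illustration of this correction, here is a minimal sketch assuming a hypothetical hypernym map: each category's coefficient receives the summed coefficients of its hypernyms, mirroring how the hypernym labels were added when the features were built. The helper name, the map, and the coefficient values are all made up; the tutorial's actual correction code may differ:

```python
import numpy as np

# hypothetical hypernym map: each category lists its WordNet ancestors
hypernyms = {
    "wolf": ["canine", "carnivore", "placental mammal", "mammal",
             "vertebrate", "chordate", "organism", "whole"],
    "canine": ["carnivore", "placental mammal", "mammal",
               "vertebrate", "chordate", "organism", "whole"],
}

def correct_coefficients_sketch(coef, categories, hypernyms):
    """Add each category's hypernym coefficients to its own (illustrative sketch)."""
    index = {name: ii for ii, name in enumerate(categories)}
    corrected = coef.copy()
    for name, parents in hypernyms.items():
        for parent in parents:
            if name in index and parent in index:
                corrected[index[name]] += coef[index[parent]]
    return corrected

categories = ["wolf", "canine", "carnivore", "placental mammal", "mammal",
              "vertebrate", "chordate", "organism", "whole"]
coef = np.random.randn(len(categories))  # one toy coefficient per category
print(correct_coefficients_sketch(coef, categories, hypernyms))
```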
|
2413 | 2412 | "import numpy as np\n", |
2414 | 2413 | "from scipy.stats import zscore\n", |
2415 | 2414 | "from voxelwise_tutorials.io import load_hdf5_array\n", |
| 2415 | + "from voxelwise_tutorials.utils import zscore_runs\n", |
2416 | 2416 | "\n", |
2417 | 2417 | "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n", |
2418 | 2418 | "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n", |
|
2425 | 2425 | "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
2426 | 2426 | "\n", |
2427 | 2427 | "# zscore each training run separately\n", |
2428 | | - "Y_train = np.split(Y_train, run_onsets[1:])\n", |
2429 | | - "Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n", |
| 2428 | + "Y_train = zscore_runs(Y_train, run_onsets)\n", |
2430 | 2429 | "# zscore each test run separately\n", |
2431 | 2430 | "Y_test = zscore(Y_test, axis=1)" |
2432 | 2431 | ] |
|
2616 | 2615 | "cell_type": "markdown", |
2617 | 2616 | "metadata": {}, |
2618 | 2617 | "source": [ |
2619 | | - "## Intermission: understanding delays\n", |
| 2618 | + "## Understanding delays\n", |
2620 | 2619 | "\n", |
2621 | 2620 | "To have an intuitive understanding of what we accomplish by delaying the\n", |
2622 | 2621 | "features before model fitting, we will simulate one voxel and a single\n", |
2623 | 2622 | "feature. We will then create a ``Delayer`` object (which was used in the\n", |
2624 | | - "previous pipeline) and visualize its effect on our single feature. Let's\n", |
2625 | | - "start by simulating the data.\n", |
2626 | | - "\n" |
| 2623 | + "previous pipeline) and visualize its effect on our single feature. \n", |
| 2624 | + "\n", |
| 2625 | + "Let's start by simulating the data. We assume a simple scenario in which an event in\n", |
| 2626 | + "our experiment occurred at $t = 20$ seconds and lasted for 10 seconds. For each timepoint, our simulated feature\n", |
| 2627 | + "is a simple variable that indicates whether the event occurred or not." |
2627 | 2628 | ] |
2628 | 2629 | }, |
2629 | 2630 | { |
|
2634 | 2635 | }, |
2635 | 2636 | "outputs": [], |
2636 | 2637 | "source": [ |
2637 | | - "# number of total trs\n", |
2638 | | - "n_trs = 50\n", |
2639 | | - "# repetition time for the simulated data\n", |
2640 | | - "TR = 2.0\n", |
2641 | | - "rng = np.random.RandomState(42)\n", |
2642 | | - "y = rng.randn(n_trs)\n", |
2643 | | - "x = np.zeros(n_trs)\n", |
2644 | | - "# add some arbitrary value to our feature\n", |
2645 | | - "x[15:20] = 0.5\n", |
2646 | | - "x += rng.randn(n_trs) * 0.1 # add some noise\n", |
| 2638 | + "from voxelwise_tutorials.delays_toy import create_voxel_data\n", |
2647 | 2639 | "\n", |
2648 | | - "# create a delayer object and delay the features\n", |
2649 | | - "delayer = Delayer(delays=[0, 1, 2, 3, 4])\n", |
2650 | | - "x_delayed = delayer.fit_transform(x[:, None])" |
| 2640 | + "# simulate an activation pulse at 20 s for 10 s of duration\n", |
| 2641 | + "simulated_X, simulated_Y, times = create_voxel_data(onset=20, duration=10)" |
| 2642 | + ] |
| 2643 | + }, |
| 2644 | + { |
| 2645 | + "cell_type": "markdown", |
| 2646 | + "metadata": {}, |
| 2647 | + "source": [ |
| 2648 | + "We next plot the simulated data. In this toy example, we assumed a \"canonical\" \n", |
| 2649 | + "hemodynamic response function (HRF) (a double gamma function). This is an idealized\n", |
| 2650 | + "HRF that is often used in the literature to model the BOLD response. In practice, \n", |
| 2651 | + "however, the HRF can vary significantly across brain areas.\n", |
| 2652 | + "\n", |
| 2653 | + "Because of the HRF, notice that even though the event occurred at $t = 20$ seconds, \n", |
| 2654 | + "the BOLD response is delayed in time. " |
| 2655 | + ] |
| 2656 | + }, |
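As a side note, a common parameterization of such a double gamma HRF combines two gamma densities, one for the peak (around 5-6 s) and a smaller one for the late undershoot (around 15-16 s). The sketch below uses SPM-like parameters as an assumption; the exact shape used by `create_voxel_data` may differ:

```python
import numpy as np
from scipy.stats import gamma

def double_gamma_hrf(times, peak=6.0, undershoot=16.0, ratio=1.0 / 6.0):
    """Canonical-style double gamma HRF (sketch with assumed SPM-like parameters)."""
    hrf = gamma.pdf(times, peak) - ratio * gamma.pdf(times, undershoot)
    return hrf / np.abs(hrf).max()  # normalize the peak to 1

times_hrf = np.arange(0, 32, 2.0)  # one value per TR of 2 s
print(np.round(double_gamma_hrf(times_hrf), 2))
```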
| 2657 | + { |
| 2658 | + "cell_type": "code", |
| 2659 | + "execution_count": null, |
| 2660 | + "metadata": {}, |
| 2661 | + "outputs": [], |
| 2662 | + "source": [ |
| 2663 | + "import matplotlib.pyplot as plt\n", |
| 2664 | + "from voxelwise_tutorials.delays_toy import plot_delays_toy\n", |
| 2665 | + "\n", |
| 2666 | + "plot_delays_toy(simulated_X, simulated_Y, times)\n", |
| 2667 | + "plt.show()" |
2651 | 2668 | ] |
2652 | 2669 | }, |
2653 | 2670 | { |
2654 | 2671 | "cell_type": "markdown", |
2655 | 2672 | "metadata": {}, |
2656 | 2673 | "source": [ |
2657 | | - "In the next cell we are plotting six lines. The subplot at the top shows the\n", |
2658 | | - "simulated BOLD response, while the other subplots show the simulated feature\n", |
2659 | | - "at different delays. The effect of the delayer is clear: it creates multiple\n", |
| 2674 | + "We next create a `Delayer` object and use it to delay the simulated feature. \n", |
| 2675 | + "The effect of the delayer is clear: it creates multiple\n", |
2660 | 2676 | "copies of the original feature shifted forward in time by how many samples we\n", |
2661 | 2677 | "requested (in this case, from 0 to 4 samples, which correspond to 0, 2, 4, 6,\n", |
2662 | 2678 | "and 8 s in time with a 2 s TR).\n", |
2663 | 2679 | "\n", |
2664 | 2680 | "When these delayed features are used to fit a voxelwise encoding model, the\n", |
2665 | 2681 | "brain response $y$ at time $t$ is simultaneously modeled by the\n", |
2666 | | - "feature $x$ at times $t-0, t-2, t-4, t-6, t-8$. In the remaining\n", |
2667 | | - "of this example we will see that this method improves model prediction\n", |
2668 | | - "accuracy and it allows to account for the underlying shape of the hemodynamic\n", |
2669 | | - "response function.\n", |
2670 | | - "\n" |
| 2682 | + "feature $x$ at times $t-0, t-2, t-4, t-6, t-8$. For example, the time sample highlighted\n", |
| 2683 | + "in the plot below ($t = 30$ seconds) is modeled by the features at \n", |
| 2684 | + "$t = 30, 28, 26, 24, 22$ seconds." |
2671 | 2685 | ] |
2672 | 2686 | }, |
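Written out for one voxel, this delayed regression is a small finite impulse response model: with a 2 s TR and delays of 0 to 4 samples, the learned weights $\beta_0, \dots, \beta_4$ combine the current and past values of the feature (with a noise term $\epsilon_t$):

$$
y(t) = \sum_{d=0}^{4} \beta_d \, x(t - 2d) + \epsilon_t
= \beta_0 x(t) + \beta_1 x(t-2) + \beta_2 x(t-4) + \beta_3 x(t-6) + \beta_4 x(t-8) + \epsilon_t
$$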
2673 | 2687 | { |
2674 | 2688 | "cell_type": "code", |
2675 | 2689 | "execution_count": null, |
2676 | | - "metadata": { |
2677 | | - "collapsed": false |
2678 | | - }, |
| 2690 | + "metadata": {}, |
2679 | 2691 | "outputs": [], |
2680 | 2692 | "source": [ |
2681 | | - "import matplotlib.pyplot as plt\n", |
| 2693 | + "# create a delayer object and delay the features\n", |
| 2694 | + "delayer = Delayer(delays=[0, 1, 2, 3, 4])\n", |
| 2695 | + "simulated_X_delayed = delayer.fit_transform(simulated_X[:, None])\n", |
2682 | 2696 | "\n", |
2683 | | - "fig, axs = plt.subplots(6, 1, figsize=(6, 6), constrained_layout=True, sharex=True)\n", |
2684 | | - "times = np.arange(n_trs) * TR\n", |
2685 | | - "\n", |
2686 | | - "axs[0].plot(times, y, color=\"r\")\n", |
2687 | | - "axs[0].set_title(\"BOLD response\")\n", |
2688 | | - "for i, (ax, xx) in enumerate(zip(axs.flat[1:], x_delayed.T)):\n", |
2689 | | - " ax.plot(times, xx, color=\"k\")\n", |
2690 | | - " ax.set_title(\n", |
2691 | | - " \"$x(t - {0:.0f})$ (feature delayed by {1} sample{2})\".format(\n", |
2692 | | - " i * TR, i, \"\" if i == 1 else \"s\"\n", |
2693 | | - " )\n", |
2694 | | - " )\n", |
2695 | | - "for ax in axs.flat:\n", |
2696 | | - " ax.axvline(40, color=\"gray\")\n", |
2697 | | - " ax.set_yticks([])\n", |
2698 | | - "_ = axs[-1].set_xlabel(\"Time [s]\")\n", |
| 2697 | + "# plot the simulated data and highlight t = 30\n", |
| 2698 | + "plot_delays_toy(simulated_X_delayed, simulated_Y, times, highlight=30)\n", |
2699 | 2699 | "plt.show()" |
2700 | 2700 | ] |
2701 | 2701 | }, |
| 2702 | + { |
| 2703 | + "cell_type": "markdown", |
| 2704 | + "metadata": {}, |
| 2705 | + "source": [ |
| 2706 | + "This simple example shows how the delayed features take into account of the HRF. \n", |
| 2707 | + "This approach is often referred to as a \"finite impulse response\" (FIR) model.\n", |
| 2708 | + "By delaying the features, the regression model learns the weights for each voxel \n", |
| 2709 | + "separately. Therefore, the FIR approach is able to adapt to the shape of the HRF in each \n", |
| 2710 | + "voxel, without assuming a fixed canonical HRF shape. \n", |
| 2711 | + "As we will see in the remaining of this notebook, this approach improves model \n", |
| 2712 | + "prediction accuracy significantly." |
| 2713 | + ] |
| 2714 | + }, |
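To make this concrete, the small sketch below reuses the toy variables defined above, fits a plain ridge regression on the delayed feature, and prints the learned weight per delay; for a single feature, these weights trace an FIR estimate of the voxel's HRF sampled every TR. The scikit-learn `Ridge` estimator is used here only for illustration; it is not the pipeline fitted later in this notebook:

```python
from sklearn.linear_model import Ridge

# fit the toy voxel from the delayed copies of the single feature
ridge = Ridge(alpha=1.0)
ridge.fit(simulated_X_delayed, simulated_Y)

# one weight per delay (0, 2, 4, 6, 8 s): an FIR estimate of this voxel's HRF
for delay, weight in zip([0, 2, 4, 6, 8], ridge.coef_):
    print(f"delay {delay} s: weight {weight:.2f}")
```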
2702 | 2715 | { |
2703 | 2716 | "cell_type": "markdown", |
2704 | 2717 | "metadata": {}, |
|
2988 | 3001 | "import numpy as np\n", |
2989 | 3002 | "from scipy.stats import zscore\n", |
2990 | 3003 | "from voxelwise_tutorials.io import load_hdf5_array\n", |
| 3004 | + "from voxelwise_tutorials.utils import zscore_runs\n", |
2991 | 3005 | "\n", |
2992 | 3006 | "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n", |
2993 | 3007 | "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n", |
|
3000 | 3014 | "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
3001 | 3015 | "\n", |
3002 | 3016 | "# zscore each training run separately\n", |
3003 | | - "Y_train = np.split(Y_train, run_onsets[1:])\n", |
3004 | | - "Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n", |
| 3017 | + "Y_train = zscore_runs(Y_train, run_onsets)\n", |
3005 | 3018 | "# zscore each test run separately\n", |
3006 | 3019 | "Y_test = zscore(Y_test, axis=1)" |
3007 | 3020 | ] |
|
3383 | 3396 | "semantic information.\n", |
3384 | 3397 | "\n", |
3385 | 3398 | "To better disentangle the two feature spaces, we developed a joint model\n", |
3386 | | - "called `banded ridge regression` {cite}`nunez2019,dupre2022`, which fits multiple feature spaces\n", |
| 3399 | + "called **banded ridge regression** {cite}`nunez2019,dupre2022`, which fits multiple feature spaces\n", |
3387 | 3400 | "simultaneously with optimal regularization for each feature space. This model\n", |
3388 | 3401 | "is described in the next example.\n", |
3389 | 3402 | "\n" |
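As a preview of the idea, here is a minimal numpy sketch of banded ridge regression with two feature spaces: each band gets its own regularization strength, chosen on a held-out split. This is only a conceptual illustration with made-up data, not the himalaya solvers used in the next example:

```python
import numpy as np

rng = np.random.RandomState(0)
n_train, n_val, n_a, n_b = 200, 100, 20, 30
X_a = rng.randn(n_train + n_val, n_a)  # feature space A (e.g., motion energy)
X_b = rng.randn(n_train + n_val, n_b)  # feature space B (e.g., wordnet)
y = X_a @ rng.randn(n_a) + 0.1 * X_b @ rng.randn(n_b) + rng.randn(n_train + n_val)

def banded_ridge_fit(X_a, X_b, y, alpha_a, alpha_b):
    """Closed-form ridge with a separate penalty for each feature space."""
    X = np.hstack([X_a, X_b])
    penalties = np.concatenate([np.full(X_a.shape[1], alpha_a),
                                np.full(X_b.shape[1], alpha_b)])
    return np.linalg.solve(X.T @ X + np.diag(penalties), X.T @ y)

# grid search over one regularization per band, scored on a validation split
best = None
for alpha_a in [0.1, 1, 10, 100]:
    for alpha_b in [0.1, 1, 10, 100]:
        coef = banded_ridge_fit(X_a[:n_train], X_b[:n_train], y[:n_train],
                                alpha_a, alpha_b)
        pred = np.hstack([X_a[n_train:], X_b[n_train:]]) @ coef
        score = np.corrcoef(pred, y[n_train:])[0, 1]
        if best is None or score > best[0]:
            best = (score, alpha_a, alpha_b)
print("best validation correlation %.2f with alphas %s" % (best[0], best[1:]))
```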
|
3488 | 3501 | "import numpy as np\n", |
3489 | 3502 | "from scipy.stats import zscore\n", |
3490 | 3503 | "from voxelwise_tutorials.io import load_hdf5_array\n", |
| 3504 | + "from voxelwise_tutorials.utils import zscore_runs\n", |
3491 | 3505 | "\n", |
3492 | 3506 | "file_name = os.path.join(directory, \"responses\", f\"{subject}_responses.hdf\")\n", |
3493 | 3507 | "Y_train = load_hdf5_array(file_name, key=\"Y_train\")\n", |
|
3500 | 3514 | "run_onsets = load_hdf5_array(file_name, key=\"run_onsets\")\n", |
3501 | 3515 | "\n", |
3502 | 3516 | "# zscore each training run separately\n", |
3503 | | - "Y_train = np.split(Y_train, run_onsets[1:])\n", |
3504 | | - "Y_train = np.concatenate([zscore(run, axis=0) for run in Y_train], axis=0)\n", |
| 3517 | + "Y_train = zscore_runs(Y_train, run_onsets)\n", |
3505 | 3518 | "# zscore each test run separately\n", |
3506 | 3519 | "Y_test = zscore(Y_test, axis=1)" |
3507 | 3520 | ] |
|