From bf62b1fc22750f896c4deaabf90ca6a1e51099c9 Mon Sep 17 00:00:00 2001
From: Pham Nguyen Hung <97870091+HangenYuu@users.noreply.github.com>
Date: Mon, 22 Jul 2024 02:35:06 +0000
Subject: [PATCH 1/3] Fixed dead wiki links

---
 doc/introduction.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/doc/introduction.rst b/doc/introduction.rst
index cfbfeaf90f..06f69aceac 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -157,7 +157,7 @@ to extend PyTensor, please feel free to ask.
    install
    tutorial/index
 
-.. _LISA: https://mila.umontreal.ca/
+.. _LISA: https://mila.quebec/en
 .. _Greek mathematician: http://en.wikipedia.org/wiki/Theano_(mathematician)
 .. _numpy: http://numpy.scipy.org/
 .. _BLAS: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms

From 89f418c64cd483fbb881def072aac96e6178c7ac Mon Sep 17 00:00:00 2001
From: Pham Nguyen Hung <97870091+HangenYuu@users.noreply.github.com>
Date: Mon, 22 Jul 2024 02:41:41 +0000
Subject: [PATCH 2/3] Fixed dead wiki links

---
 doc/introduction.rst |  2 +-
 doc/links.rst        | 14 +++++++-------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/introduction.rst b/doc/introduction.rst
index 06f69aceac..5c7a837fa9 100644
--- a/doc/introduction.rst
+++ b/doc/introduction.rst
@@ -159,7 +159,7 @@ to extend PyTensor, please feel free to ask.
 
 .. _LISA: https://mila.quebec/en
 .. _Greek mathematician: http://en.wikipedia.org/wiki/Theano_(mathematician)
-.. _numpy: http://numpy.scipy.org/
+.. _numpy: https://numpy.org/
 .. _BLAS: http://en.wikipedia.org/wiki/Basic_Linear_Algebra_Subprograms
 .. _sympy: http://www.sympy.org/
 
diff --git a/doc/links.rst b/doc/links.rst
index 8d2689fed1..ec22e14f12 100644
--- a/doc/links.rst
+++ b/doc/links.rst
@@ -39,18 +39,18 @@ This is a sort of memo for developers and would-be developers.
 .. _git: http://git-scm.com/
 .. _pytest: http://docs.pytest.org/en/latest/
-.. _numpy: http://numpy.scipy.org/
+.. _numpy: https://numpy.org/
 .. _python: http://www.python.org
 .. _scipy: http://scipy.org/
 
 .. _autodiff: http://www.autodiff.org
 
-.. _boost.python: http://www.boost.org/doc/libs/1_38_0/libs/python/doc/index.html
+.. _boost.python: https://www.boost.org/doc/libs/1_85_0/libs/python/doc/html/index.html
 .. _cython: http://www.cython.org/
 .. _liboil: http://liboil.freedesktop.org/wiki/
 .. _llvm: http://llvm.org/
-.. _networkx: http://networkx.lanl.gov/
-.. _pypy: http://codespeak.net/pypy/dist/pypy/doc/
+.. _networkx: https://networkx.org/
+.. _pypy: https://doc.pypy.org/en/latest/
 .. _swig: http://www.swig.org/
-.. _unpython: http://code.google.com/p/unpython/
-.. _pycppad: http://www.seanet.com/~bradbell/pycppad/index.xml
-.. _shedskin: http://shed-skin.blogspot.com/
+.. _unpython: https://code.google.com/archive/p/unpython/
+.. _pycppad: https://github.com/Simple-Robotics/pycppad
+.. _shedskin: https://shedskin.github.io/shedskin/

From 864dfd2d2caad5306d123898a4cb7b739403310a Mon Sep 17 00:00:00 2001
From: Pham Nguyen Hung <97870091+HangenYuu@users.noreply.github.com>
Date: Thu, 25 Jul 2024 01:57:31 +0000
Subject: [PATCH 3/3] Deleted old documentation at doc/sandbox.
--- doc/sandbox/ccodegen.rst | 255 ----------------- doc/sandbox/compilation.rst | 18 -- doc/sandbox/debugging_with_stepmode.rst | 75 ----- doc/sandbox/elemwise_compiler.rst | 86 ------ doc/sandbox/function.rst | 9 - doc/sandbox/functional.rst | 7 - doc/sandbox/how_to_make_ops.rst | 295 -------------------- doc/sandbox/index.rst | 11 - doc/sandbox/index2.rst | 15 - doc/sandbox/interactive_debugger.rst | 56 ---- doc/sandbox/logistic_regression_example.rst | 77 ----- doc/sandbox/performance.rst | 23 -- doc/sandbox/randomnumbers.rst | 245 ---------------- doc/sandbox/rethinkccodegen.rst | 124 -------- doc/sandbox/sandbox.rst | 161 ----------- doc/sandbox/software.rst | 19 -- doc/sandbox/sparse.rst | 147 ---------- doc/sandbox/tensoroptools.rst | 9 - 18 files changed, 1632 deletions(-) delete mode 100644 doc/sandbox/ccodegen.rst delete mode 100644 doc/sandbox/compilation.rst delete mode 100644 doc/sandbox/debugging_with_stepmode.rst delete mode 100644 doc/sandbox/elemwise_compiler.rst delete mode 100644 doc/sandbox/function.rst delete mode 100644 doc/sandbox/functional.rst delete mode 100644 doc/sandbox/how_to_make_ops.rst delete mode 100644 doc/sandbox/index.rst delete mode 100644 doc/sandbox/index2.rst delete mode 100644 doc/sandbox/interactive_debugger.rst delete mode 100644 doc/sandbox/logistic_regression_example.rst delete mode 100644 doc/sandbox/performance.rst delete mode 100644 doc/sandbox/randomnumbers.rst delete mode 100644 doc/sandbox/rethinkccodegen.rst delete mode 100644 doc/sandbox/sandbox.rst delete mode 100644 doc/sandbox/software.rst delete mode 100644 doc/sandbox/sparse.rst delete mode 100644 doc/sandbox/tensoroptools.rst diff --git a/doc/sandbox/ccodegen.rst b/doc/sandbox/ccodegen.rst deleted file mode 100644 index 1d9730b97d..0000000000 --- a/doc/sandbox/ccodegen.rst +++ /dev/null @@ -1,255 +0,0 @@ -'''C code is actually generated this way. Could be refreshed as developer documentation. Olivier to review. 20080904.''' - -Here is a proposal on the interface to generate C code: - -What will be passed to C -======================== - -For each ResultBase, the C code gets a variable called storage_ which contains a PyObject* pointing to a 1-element list (a sort of cell). That is the "channel" via which C and Python can communicate data. Of course, the C code will not manipulate that directly. At every execution of the C function, the PyObject* inside the storage is extracted and given the name py_ (its reference count is handled automatically). - - -Extracting the data for use with C -================================== - -In ResultBase, we have several methods to generate C code for particular purposes. They should return templated strings of C code (see below) but should not actually fill the template. The caller will fill it. - -List of template variables you can use: - * '''%(name)s:''' Will be filled in by a mangled name representing this ResultBase. - * '''%(fail)s:''' This can be inserted in the code to make the current function fail. It will proceed to cleanup everything that needs to be cleaned up. This cannot be used in any cleanup routine (and hence it is forbidden for a cleanup routine to fail!) If a code block uses %(fail)s, its corresponding cleanup block will be called first, so make sure that the cleanup can be done properly at any point where you use %(fail)s, even if you didn't allocate or INCREF everything yet. - -List of methods in ResultBase: - -'''c_declare:''' This method returns code that declares one or more variables ''without'' initializing them. 
These are the variables that all C code using this ResultBase will use to manipulate the data. The code should ''only'' declare variables and typedefs (no #defines, but a future extension might address this). Example: if we have a ResultBase representing a double, c_declare may simply return "double %(name)s;". ''All'' variables declared should contain the %(name)s template, but they may prefix or suffix it. - -'''c_init:''' This method returns code that initializes (zeros/sets to NULL, typically) the variables declared in c_declare. - -'''c_extract:''' This method should manipulate py_ to set the values of the variables declared by c_declare. For example, if we have a ResultBase representing a double, c_extract might return "%(name)s = PyFloat_AsDouble(py_%(name)s);" (plus error checking!). If something is wrong with the data provided from Python, c_extract should set an informative error message and insert %(fail)s. - -'''c_sync:''' This method should adjust the py_ variable using the values of the variables declared by c_declare. For example, if we have a ResultBase representing a double, c_sync might return "Py_XDECREF(py_%(name)s); py_%(name)s = PyFloat_FromDouble(%(name)s);". The result will then be made accessible from Python. c_sync is not allowed to fail, though it is not really cleanup code. - -'''c_cleanup:''' This method should clean up all the variables declared by c_declare. - -.. warning:: - - This page describes usage of c_init and c_extract as of version 0.4.0 (and - previous versions). This will change in the future, to allow c_code to - use preallocated memory buffers of the outputs. - -Important notes: - * ''Either'' c_init or c_extract will be called. The former for temporary variables and outputs, the latter for inputs. If the former is used, py_ will be set to Py_None regardless of what is in storage_. - * c_sync will only be called on the outputs, not on inputs or temporaries. - * c_cleanup will ''always'' be called. If c_sync decides to relay some data to Python (thus ousting it from the op's scope), it should NULL any pointers that c_cleanup is not allowed to free. - - -Manipulating the data from C -============================ - -The Op class has in turn several methods that generate C code. As for ResultBase, they should return templated strings of C code (see below) but should not actually fill the template. The caller will fill it. - -List of template variables you can use: - * '''%()s:''' See c_var_names. These will be substituted for mangled names. - * '''%(fail)s:''' This can be inserted in the code to make the current function fail. It will proceed to cleanup everything that needs to be cleaned up. This cannot be used in any cleanup routine (and hence it is forbidden for a cleanup routine to fail!). If a code block uses %(fail)s, its corresponding cleanup block will be called first, so make sure that the cleanup can be done properly at any point where you use %(fail)s, even if you didn't allocate or INCREF everything yet. - -'''c_var_names''': This method should return two lists, one list of strings representing the input names and one list of strings representing the output names. The actual names might be mangled by the compiler. In the template strings returned by the next few methods, you can use the names defined here. For example, if op.c_var_names() returns [['x', 'y'], ['z']], then "%(x)s" in op's templates will be the same as "%(name)s" in op.inputs[0]'s templates. 
This means that all the variables declared by the inputs and outputs can easily be used in the op's templates. - -'''c_validate_update''': This method should return code that ensures that the inputs are valid for processing by this Op (checking shapes, bounds, etc.). If anything is invalid, it should set an informative error message and use %(fail)s. Then, it should prepare the outputs: for example, if the output is a tensor, allocate a tensor, resize it appropriately and place it in the appropriate variable (see c_var_names). - -'''c_validate_update_cleanup''': This method should clean up any temporary storage used by c_validate_update. It is not forbidden to do it in c_validate_update itself, but this can come in handy. - -'''c_code''': This is the meat of the Op that actually calculates the function. If an error occurs in the process, it may use %(fail)s. It should work in place on the variables declared by its inputs and outputs and rely on their c_sync routines to relay the results to Python. - -'''c_code_cleanup''': This cleans up any temporary structures allocated by c_code. - -'''c_is_simple (field)''': Class field. Defaults to False. It is basically a compiler hint that this class represents a builtin C type or a small struct, so we can optimize its access. - - -Important notes: - * There might be provisions in the future to skip the validate_update step if the Op can guarantee that the inputs are valid and the outputs are set up properly. - * It is not forbidden to just put the validate_update code in c_code. Some situations might require it, but it helps organization to segregate them. - - -Failure -======= - -Besides cleanup code, all code has access to the %(fail)s template. For three code blocks, the generated C code will pretty much look like this: - -.. code-block:: cpp - - int failure = 0; - { - - { - - { - - label3: - - } - label2: - - } - label1: - - } - return failure; - -And %(fail)s in the nth code block will take the value "{failure = n; goto label;}". This means only the blocks executed up to the failure point are cleaned up and the return value indicates which block failed, which is handy for debugging. - -When compiling an Op, we want to sync the outputs so we can get the results from Python. In case of failure, we will not necessarily want to sync. Because of that, typical code will look like this: - -.. code-block:: cpp - - int failure = 0; - - - { - - { - - { - - label3: - - } - label2: - if (!failure) - - - } - label1: - - } - return failure; - -Furthermore, is not necessary to extract the output because we mean to overwrite it anyway. In that case, will be a no-op, but of course we may still need to clean up or sync what will put in the declared outputs. - - -Example ResultBase -================== - -The following ResultBase represents a double (we only care about the C part). - -.. code-block:: python - - class Double(ResultBase): - # - def c_declare(self): - return "double %(name)s;" - def c_init(self): - return "%(name)s = 0.0;" - def c_extract(self): - return "%(name)s = PyFloat_AsDouble(py_%(name)s);" - def c_cleanup(self): - return "" # nothing to do - def c_sync(self): - return "Py_XDECREF(py_%(name)s); py_%(name)s = PyFloat_FromDouble(%(name)s);" - - -Example Op -========== - -The following ResultBase represents addition of two nonnegative doubles (we only care about the C part). - -.. 
code-block:: python - - class Add(COp): - # - def c_var_names(self): - return "[['x', 'y'], ['z']]" - def c_validate_update(self): - return "if (%(x)s < 0 || %(y)s < 0) %(fail)s" # fail if x or y is negative - def c_validate_update_cleanup(self): - return "" # nothing to do - def c_code(self): - return "%(z)s = %(x)s + %(y)s;" - def c_code_cleanup(self): - return "" # nothing to do - -Generating a C function -======================= - -For the example Op, the generated C function will typically look like this: - -.. code-block:: cpp - - void add(PyObject* storage_x, PyObject* storage_y, PyObject* storage_z) { - PyObject* py_x = PyList_GET_ITEM(storage_x, 0); Py_XINCREF(py_x); // automatic - PyObject* py_y = PyList_GET_ITEM(storage_y, 0); Py_XINCREF(py_y); // automatic - PyObject* py_z = Py_None; // we don't care what's currently in storage_z - - failure = 0 - double x; // x.c_declare - double y; // y.c_declare - double z; // z.c_declare - { - x = PyFloat_AsDouble(py_x); // x.c_extract - { - y = PyFloat_AsDouble(py_y); // y.c_extract - { - # we don't need to use z.c_extract - { - if (x < 0 || y < 0) { // add.validate_update - // This is automatically inserted in place of %(fail)s - failure = 4; - goto label_add_validate_update_cleanup; - } - { - z = x + y; // add.c_code - label_add_code_cleanup: - } - label_add_validate_update_cleanup: - } - label_z_sync_or_cleanup: - if (!failure) { - Py_XDECREF(py_z); // z.c_sync - py_z = PyFloat_FromDouble(z); // z.c_sync, the result is now available from Python! - PyList_SET_ITEM(storage_z, 0, py_z); // always done after _.c_sync - } - Py_XDECREF(py_z); // always done after _.c_cleanup - } - label_y_cleanup: - Py_XDECREF(py_y); // always done after _.c_cleanup - } - label_x_cleanup: - Py_XDECREF(py_x); // always done after _.c_cleanup - } - return failure; - } - -Generating a C struct -===================== - -To accelerate processing a tad, a struct can be generated instead of a function. The struct will keep pointers to the storage where to fetch inputs and store outputs, but it will also store fields declared by outputs and temporaries' c_declare methods. - -Here is a sketch of the struct equivalent of the previous function: - -.. code-block:: cpp - - struct add { - PyObject* storage_x; - PyObject* storage_y; - PyObject* storage_z; - double z; // z.c_declare - - void init(PyObject* storage_x, PyObject* storage_y, PyObject* storage_z) { - // - // - } - - void cleanup(void) { - // - } - - void run(void) { - // - } - - add() { this->init(); } - ~add() { this->cleanup(); } - }; - -Advantages of using a struct: - * Can be run several times even if we provide the storage only once. - * Output variables or temporary variables can reuse what they allocated the last time. This is not particularly useful with doubles (in fact it might be detrimental), but if z was a large tensor it might be interesting to recycle the memory over thousands of runs of the Op. - -No struct members will be made if a result's c_is_simple field is True. They will be allocated on the stack instead. diff --git a/doc/sandbox/compilation.rst b/doc/sandbox/compilation.rst deleted file mode 100644 index fad7d71ef9..0000000000 --- a/doc/sandbox/compilation.rst +++ /dev/null @@ -1,18 +0,0 @@ - -.. _compilation: - -======================= -Compilation and Linking -======================= - -.. index:: - single: Linker - -.. 
_linker: - -Linker -====== - -WRITEME - - diff --git a/doc/sandbox/debugging_with_stepmode.rst b/doc/sandbox/debugging_with_stepmode.rst deleted file mode 100644 index fba3a63e71..0000000000 --- a/doc/sandbox/debugging_with_stepmode.rst +++ /dev/null @@ -1,75 +0,0 @@ - -.. _sandbox_debugging_step_mode: - -Debugging with a customized so-called StepMode -============================================== - -One convenient trick I've found for debugging my programs that are running with pytensor is to -use what I call a 'StepMode'. There is no such StepMode in the standard library because the -purpose of it is to hack it to investigate what your own particular program is doing. - - -.. code-block:: python - - from pytensor.link import WrapLinkerMany - from pytensor.configdefaults import config - from pytensor.compile.mode import (Mode, register_mode, predefined_modes, predefined_linkers, - predefined_optimizers) - - class StepMode(Mode): - def __init__(self, linker=None, optimizer='default'): - if linker is None: - linker = config.linker - if optimizer is 'default': - optimizer = config.optimizer - def blah(i, node, th): - # This function will be run for each node in your compiled program. - # here you can inspect all the values as they are computed, - # ... you can even change them ! - - # 'i' is the execution position in the serialized graph - # node is the symbolic Apply instance - # th is a callable thing that will compute the node. - - print i, node, len(th.inputs) - - # the symbolic inputs of the node are in node.inputs - # the j'th non-symbolic input of the node is in th.inputs[j][0] - - th() # call the function to actually 'run' the graph - - # the symbolic outputs of the node are in node.outputs - # the j'th non-symbolic output of the node is in th.outputs[j][0] - - print type(th.outputs[0][0]) - - if i == 39: - print 'this node is weird...', th.outputs[0][0] - - - self.provided_linker = linker - self.provided_optimizer = optimizer - if isinstance(linker, basestring) or linker is None: - linker = predefined_linkers[linker] - - self.linker = WrapLinkerMany([linker], [blah]) - - if isinstance(optimizer, basestring) or optimizer is None: - optimizer = predefined_optimizers[optimizer] - self._optimizer = optimizer - - - -The way to use it is like this: - -.. code-block:: python - - fn = function(inputs, outputs, mode=StepMode()) - -When you call fn, your function in the stepmode will be called for each node in the compiled -program. You can print out some or all of the values, you can change them in mid-execution. -You can see where bizarre values are first occurring in your computations. It's a very -powerful way to understand your program's execution. - -Remember, if you give names your variables then printing nodes will give you a better idea of -where in the calculations you are. diff --git a/doc/sandbox/elemwise_compiler.rst b/doc/sandbox/elemwise_compiler.rst deleted file mode 100644 index 8c7825b7c4..0000000000 --- a/doc/sandbox/elemwise_compiler.rst +++ /dev/null @@ -1,86 +0,0 @@ -.. _sandbox_elemwise: - -========================== -:class:`Elemwise` compiler -========================== - -.. todo:: Stale specification page. Upgrade this to provide useful developer doc. 
2008.09.04 - -Definitions -=========== - -The element-wise compiler takes inputs {{{(in0, in1, in2, ...)}}}, outputs {{{(out0, out1, out2, ...)}}}, broadcast modes {{{(mod0, mod1, mod2, ...)}}} where each mode corresponds to an output as well as {{{order}}} which determines if we broadcast/accumulate over the first or last dimensions (the looping order, basically, but some operations are only valid for one particular order!). - -The broadcast mode serves to calculate the rank of the corresponding output and how to map each input element to an output element: - - * {{{broadcast}}} - * output.rank = max(input.rank) - * the inputs of lesser rank are broadcasted over missing dimensions - * if {{{order == f}}} ([3, 5], [5]) => [3, 5] or ([7, 8, 9], [8, 9]) => [7, 8, 9] - * if {{{order == c}}} ([3, 5], [3]) => [3, 5] or ([7, 8, 9], [7, 8]) => [7, 8, 9] - * {{{(accumulate, Accumulator)}}} - * output.rank = min(input.rank) - * for the inputs of greater rank, we use Accumulator (sum, product, etc.) to accumulate over the first dimensions - - * e.g. {{{if Accumulator == sum, order == c, x.rank == 2, y.rank == 1 and z = f(x, y) then z[i] = f(sum_j(x[i, j]), y[i])}}} - - * if {{{order == f}}} ([3, 5], [5]) => [5] or ([7, 8, 9], [8, 9]) => [8, 9] - * if {{{order == c}}} ([3, 5], [3]) => [3] or ([7, 8, 9], [7, 8]) => [7, 8] - -{{{order == c}}} is equivalent to transposing the outputs of an {{{order == f}}} operation on transposed inputs. - -This does not cover all cases of broadcasting, but I believe they cover enough. Other cases of broadcasting can be emulated with proper transposition and/or slicing. - * Could you give some examples of what kinds of broadcasting are and are not covered by your proposed implementation? - - * For rank <= 2, I think only operations of the form {{{add(ones(3,1), ones(1,3)))}}} are missing. I actually didn't think of that one before now. - * In general, it only handles f(shape(head, ...), shape(head, ...), ...) and f(shape(..., tail), shape(..., tail), ...) - * Maybe I could add a general case later... the thing is that I think the ones I am considering here are easier to streamline. - -Point of clarification: the order discussed here corresponds to a set of broadcasting rules, and is independent from the storage order. The 'f' order corresponds to numpy's broadcasting rules, while the 'c' order is something new and different (TODO VERIFY!) - -Question: does it make sense to apply the order to the loop, or is this broadcast order something which will be local to each input argument. What happens when the elemwise compiler deals with more complex subgraphs with multiple inputs and outputs? - -The loop -======== - -Here is the loop for {{{order == c}}}. Check for errors! - -.. code-block:: cpp - - - - i1 = -1 - while (++i1 < dim1) { - i2 = -1 - rank_N-1_accumulator = init - while (++i2 < dim2) { - ... - iN = -1 - while (++iN < dimN) { - - - - } - ... - } - - - } - -When {{{order == f}}}, the iterators ''ideally'' (but not necessarily) iterate in FORTRAN order, i.e. the while loops are on {{{dimN..dim1}}} instead of {{{dim1..dimN}}}. - -{{{order}}} does __not__ represent the {{{C/F_CONTIGUOUS}}} flags of the inputs or outputs. Depending on combinations of those parameters, different loops will be used. If {{{order == f and C_CONTIGUOUS(array)}}}, for example, the loop will be on {{{dim1..dimN}}} and the matrices of lesser rank will need to be looped over several times. 
- -An rewrite should look at the operations in the graph and figure out whether to allocate C_CONTIGUOUS (ideal for {{{order == c}}}) or F_CONTIGUOUS (ideal for {{{order == f}}}) arrays. - -Gradient -======== - -The input ranks become the output ranks and gradients of the same rank as the outputs are added to the input list. If an output was given mode {{{broadcast}}}, then all inputs used to calculate it had to be broadcasted to that shape, so we must sum over the broadcasted dimensions on the gradient. The mode that we give to those inputs is therefore {{{(accumulate, sum)}}}. Inversely, if an output was given mode {{{(accumulate, sum)}}}, then all inputs used to calculate it had to be summed over those dimensions. Therefore, we give them mode {{{broadcast}}} in grad. Other accumulators than sum might prove more difficult. For example, the ith gradient for product is grad*product/x_i. Not sure how to handle that automatically. - * I don't exactly follow this paragraph, but I think I catch the general idea and it seems to me like it will work very well. - - * In a nutshell for {{{broadcast}}} I calculate the gradient as normal assuming the shape is broadcasted and then I sum over what I had to broadcast. - - * Could you explain why the accumulator gradient (e.g. product) can be trickier? - - * I thought about it and I figured that the general case is {{{g_accum[N-i+1], g_m[i] = grad_fn(accum[i-1], m[i], g_accum[N-i])}}} where {{{g_accum}}} is the accumulated gradient wrt the accumulator {{{accum}}}. It can be short-circuited in sum and product's case: for sum, grad_fn is the identity on its last argument so {{{g_m[i] == g_accum[i] == g_accum[0] == g_z for all i}}}. In product's case, {{{accum[i-1] == product(m[1:i-1]) and g_accum[N-i] == g_z * product(m[i+1:N])}}}, multiply them together and you obtain {{{g_z * product(m)/m[i]}}} where obviously we only need to compute {{{product(m)}}} once. It's worth handling those two special cases, for the general case I don't know. diff --git a/doc/sandbox/function.rst b/doc/sandbox/function.rst deleted file mode 100644 index f5a0a29f0d..0000000000 --- a/doc/sandbox/function.rst +++ /dev/null @@ -1,9 +0,0 @@ - -.. _function: - -================== -function interface -================== - -WRITEME - diff --git a/doc/sandbox/functional.rst b/doc/sandbox/functional.rst deleted file mode 100644 index 97d4d65b52..0000000000 --- a/doc/sandbox/functional.rst +++ /dev/null @@ -1,7 +0,0 @@ - -========== -Functional -========== - -Want to know about PyTensor's `function design -`? diff --git a/doc/sandbox/how_to_make_ops.rst b/doc/sandbox/how_to_make_ops.rst deleted file mode 100644 index 9fd92e0d04..0000000000 --- a/doc/sandbox/how_to_make_ops.rst +++ /dev/null @@ -1,295 +0,0 @@ -.. _how_to_make_ops: - -################# -How to Make Ops -################# - - -Parametrization -=============== - -An Op class can represent one or a wide variety of functions depending on how you choose to parametrize it. The parameters of an Op do not show up in the structure of the computation graph - they are local to the Op. [*What does the last sentence mean? What is its effect?*] When an Op's ``make_node`` function is called on an Op instance with a list of inputs, the computation that is performed depends on the type and value of those inputs and on the internal parameters of the Op. - -It is not always obvious what should be a parameter and what should be an input. 
For example, a generic indexing Op could take a list and an index as graph inputs, whereas a specific indexing Op could have an index parameter, so you could have a specialized Op instance to fetch the nth element of a list, where n is known statically. [*Could you give some advice about the relative tradeoffs of having something as a parameter and something as an input?*] - -Examples of parameterized Ops in pytensor: - ``Broadcast(, )`` - upgrades an op that works on scalars so it works on tensors. Can work inplace or not. - ``Reduce(, )`` - reduces the specified axes using the provided scalar op. - ``Add()`` - adds scalars and puts the variable in a scalar whose type is inferred from the input types using ``output_type_inferrer(*inputs)`` - ``Composite()`` - makes a single Op out of a graph of scalar operations. - -[*These examples are a little abstract. I'm not sure what are the inputs and what are the parameters. Maybe also give like something that has a random seed.*] - -Ideas: - ``MyOp()`` - prints debugging information in perform or the C implementation if debug is True. - ``MyOp()`` - always use the python implementation if allow C is False (raise an exception in c_code) - -``__eq__``, ``__ne__`` and ``__hash__`` ---------------------------------------------- - -In order for certain rewrites to apply (such as the merging of duplicate -calculations by `MergeOptimizer`), it is necessary for `Op`\s that do the same -thing to compare equal. If `Op` instances are generated by a function call -(for example) then it can happen that several different `Op` instances do the -same thing; in that case you will have to override `Op.__eq__`, `Op.__ne__`, and -`Op.__hash__` for the `MergeOptimizer` to recognize them as equal. - -Recall: the contract for any ``__hash__`` is that ``a == b`` implies ``hash(a) == hash(b)``. - -:meth:`Op.make_node` -==================== - -The :meth:`Op.make_node` method is expected to have the following signature: - -.. code-block:: python - - make_node(self, *inputs) - -``inputs`` may be a list of anything that the user wants to provide as symbolic -input (symbolic: standing for the actual values that will be passed when the -graph is compiled into an executable function). [*The PyTensor intro should -describe symbolic in greater depth, and we should link to that from here.*] This -may or may not include Variable instances (but if you want the inputs of this Op -to sometimes be outputs of another Op, then the inputs should be Variable -instances). [*What else could they be? Constant, Values, ...*] The return value -should be an instance of [GraphStructures Apply] (see the example below). Here -are the tasks typically handled in ``make_node``. - - * Check that the inputs are valid (type checking, etc.). [*Since we don't actually have values, what can we do besides type checking?*] - * If needed, wrap the inputs in Variable instances with the proper type. - * Make the Variable instances that will serve as the outputs of the node. - * ``return Apply(self, , )`` - -The ``inputs`` and ``outputs`` arguments to ``Apply`` must be lists of -`Variable` instances (or instances of subclasses of ``Variable``). The inputs -given to `Apply` do not have to be the same as the inputs passed to -`make_node`, but it is recommended that the order corresponds. 
[*why?*] The -behavior of `make_node` should not depend on the structure of the graph of -[*or?*] its inputs: it may look at the type and type fields of its inputs, but -not at their owner field, because modifications to the graph structure do not -use `make_node`. - -Example: - -.. code-block:: python - - from pytensor.scalar import * - - class Add(Op): - #... - def make_node(self, x, y): - # note 1: constant, int64 and ScalarType are defined in pytensor.scalar - # note 2: constant(x) is equivalent to Constant(type=int64, data=x) - # note 3: the call int64() is equivalent to Variable(type=int64, None) or Variable(type=ScalarType(dtype = 'int64'), None) - if isinstance(x, int): - x = constant(x) - elif not isinstance(x, Variable) or not x.type == int64: - raise TypeError("expected an int64 ScalarType") - if isinstance(y, int): - y = constant(y) - elif not isinstance(y, Variable) or not x.type == int64: - raise TypeError("expected an int64 ScalarType") - inputs = [x, y] - outputs = [int64()] - node = Apply(op = self, inputs = inputs, outputs = outputs) - return node - #... - - add = Add() # I make an instance of Add - node1 = add.make_node(int64(), int64()) # I make a node with two Variable inputs - node2 = add.make_node(1, 2) # this works too - node3 = add.make_node(int64(), 79) # this works three - node4 = add.make_node(float64(), int64()) # this raises a TypeError - -[*What type is an instance of Add? It's an Apply? But that's not a Variable, and cannot be used as input for another Op.*] - -Two Apply nodes ``node1`` and ``node2`` are *assumed* by the compiler to represent the same behavior if: - 1. ``node1.op == node2.op`` - 1. ``all(input1.type == input2.type for input1, input2 in zip(node1.inputs, node2.inputs))`` - 1. ``all(output1.type == output2.type for output1, output2 in zip(node1.outputs, node2.outputs))`` - -It is considered an *error* to have conditions 1 and 2 but not condition 3. A corollary to those conditions is that repeated calls to ``make_node`` with the same inputs should produce equivalent nodes. - -``__call__`` ----------------- - -In ``Op``, ``__call__`` is defined in terms of ``make_node``. Instead of returning a node, it returns the output Variables directly, which is practical from a UI standpoint. Here is pseudocode: - -.. code-block:: python - - if len(outputs) is 1: - __call__(*inputs) <=> make_node(*inputs).outputs[0] - else: - __call__(*inputs) <=> make_node(*inputs).outputs - -It is not necessary or recommended to override ``__call__`` unless you want to hide some outputs from view (see hidden outputs section). - -perform -======= - -The ``perform`` method is expected to have the following signature: - -`` -perform(self, node, inputs, output_storage) -`` - -Where: - * *node*: a pointer to an Apply instance - ``node`` is assumed to be produced by a previous call to ``self.make_node``. - * *inputs*: *not* the same as ``node.inputs`` - it is a list of values. [*i.e. actually data, not just symbolic stuff?*] - * *output_storage*: *not* the same as ``node.outputs`` - it is a list of lists of length 1 where the variables of the computation must be put. - -[*Can you explain better how inputs is not node.inputs and output_storage is not node.outputs?*] - -[*Would it be better to call inputs as 'inputs_storage'?*] - -Here is an example of a properly defined ``perform``: - -.. code-block:: python - - class Add(Op): - ... 
- def perform(self, node, inputs, output_storage): - # this does z = x + y - x, y = inputs # extract the two inputs - z, = output_storage # extract the one storage (the comma after z is not optional) - z[0] = x + y # we must put the variable in z[0] - ... - - add = Add() # I make an instance of Add - node = add.make_node(int64(), int64()) # I make a node with two integer inputs - storage = [None] # I make my storage as a 1-element list with None - add.perform(node, (3, 7), (storage, )) # I provide the node, two inputs and storage for one output - print storage[0] # prints 10 - -[*Why is node never used in the perform function? Why is self never used?*] - -[*What does the comma after z do? Why is it not optional?*] - -The ``node`` parameter is not always needed, but might come in handy sometimes [*when?*]. There are as many entries in ``output_storage`` as there are in ``node.outputs`` and each entry is a list of length 1. The outputs must be computed from the inputs and put in those lists. The lists in ``output_storage`` must not be resized - the only allowed operation is to set or read their first element. [*Since these instructions correspond to more general principles, could you state the principles of the contract more generally and put it __above__ the example?*] - -reusing outputs ---------------- - -The output storage in ``output_storage`` might not be empty. In fact, whatever the op allocates to store the computation and puts in the storage *might* still be there the second time around. [*huh?*] This is an intended feature and it is acceptable for ``perform`` to *reuse* what is in the output storage if it is worth it. For example, if ``perform`` must add two ``1000x1000`` matrices into a new matrix of the same size and that there is already a ``1000x1000`` matrix in the corresponding output storage, it may reuse it and thus save a lot in memory and allocation time. It may also freely discard what is already there. - -Note that it is not *guaranteed* that the outputs will stick around. Indeed, the linker may, at its discretion, clean them up. It is not guaranteed either (though it will usually be the case) that the contents of the output storage was allocated by a previous call to ``perform``. It *is* however guaranteed that the contents are either ``None`` or a structure of the proper type which it can use. - -If the contents of the storage are ``None``, *new* storage is expected for that output (typical case is that we "gave" the output to the user so we don't own it anymore). Therefore, it is not acceptable to have a private cache of previously allocated storage unless you know what you are doing. - -Advanced note: for an Op with multiple outputs, it is possible that some of them can be reused and some others not. If an Op with multiple outputs shares storage between them, e.g. the first output is a view of the second, if the first output is reset to ``None``, the second should *not* be reused, even if it's available, because a fresh output is expected for the first. It is not recommended in general to share storage between outputs unless one of them is hidden (see hidden outputs section), because the engine does not know how to handle that situation safely. - -grad -==== - -``grad`` is an PyTensor-specific [*as opposed to?*] function - it does not interface with core rewrite and compilation facilities, but it provides a useful interface to differentiation. Its expected signature is: - -.. 
code-block:: python - - grad(self, inputs, output_gradients) - - -where: - * ``inputs`` is a list of Variable instances. It is assumed to be the ``inputs`` field of a node produced by ``make_node``. - * ``output_gradients`` is a list of Variable instances. They have the same properties as the outputs of the node, but are filled with gradient values. - -Essentially, the semantics are: - -.. code-block:: python - - # Not completely sure about this, James should doublecheck -jpt and ob - def grad(self, (x, ), (gz, )): - return [gz * (dz/dx)] - def grad(self, (x, y), (gz, )): - return gz*(dz/dx), gz*(dz/dy) - def grad(self, (x, y), (gz, gw)): - # In this situation you want two return values that have the shape of x and y respectively - return gz*dz/dx + gw*dw/dx, gz*dz/dy + gw*dw/dy - -More specifically, -``grad`` must return a list or tuple of input gradients, as many as there are inputs. Let C be a Variable (currently assumed to be a scalar) that depends through an PyTensor symbolic expression on the node outputs. Then each output_gradients[i] represents symbolically dC/doutputs[i]. The returned input gradients should represent symbolically dC/dinputs[i]. - -Example: - -.. code-block:: python - - class Mul(Op): - ... - def grad(self, inputs, output_gradients): - x, y = inputs - gz, = output_gradients # here again, the comma is not optional - return mul(gz, y), mul(gz, x) - ... - mul = Mul() - -If the op is not differentiable wrt one of its inputs, the gradient for that input should be ``None``; if the op is not differentiable with respect to any of its inputs, it should return something equivalent to -``[None] * len(inputs)``. If ``grad`` is not implemented for any op in a graph, then the symbolic gradient engine will complain (with an attribute exception). - - - -If the op only has one input, be careful to still return a list or tuple: - * fine: ``return gx,`` - * fine: ``return [gx]`` - * not fine: ``return gx`` - -The [http://www.iro.umontreal.ca/~pift6266/A06/cours/gradient.pdf principle] behide this is explaned in section 2. - -Destroyers and viewers -====================== - -Destroyers ----------- - -An Op may change the contents of its inputs. For example, ``z = add_inplace(x, y)`` will increment ``x`` with ``y``, erasing the previous contents of ``x``. ``z`` represents ``x`` after it was incremented. However, the engine needs to be told about all this so it can guarantee that ``add_inplace`` will only be executed as soon as we don't need ``x`` anywhere else. - -This is done by setting the ``destroy_map`` field of the op. ``destroy_map`` must be a dictionary which associates an output index or ``None`` to a list of input indices that are destroyed by that output. For example, ``add_inplace.destroy_map == {0: [0]``} because the first input is overwritten by the first output. If it was ``y`` that was overwritten, then ``destroy_map`` would be ``{0: [1]``}, because the second input is overwritten by the first output. In a nutshell, to each output must correspond the list of inputs that were changed and share storage with that output. Use ``None`` if the inputs were only destroyed to do temporary calculations, etc. and are not reused as the output storage. - -Viewers -------- - -Similarly, an Op might not modify the inputs, but return an output which shares state with one or several of its inputs. For example, ``transpose`` can be done efficiently by viewing the same data as the original with modified dimensions and strides. That is fine, but the compiler needs to be told. 
- -This is done by setting the ``view_map`` field of the op. It works like the ``destroy_map`` field: to an output index is associated the list of inputs that it shares state with. For example, ``transpose.view_map == {0: [0]``} because its first output uses the same data as its first input. ``view_map`` is conservative: if there is any probability that an output will be the view of an input, that input must be in the view list of that output. - -Important note: currently, an output can only be the view of one input. This is limiting, as an 'if' or 'switch' op would need to declare its output as a view of both its then and else branches, but for the time being the framework is not powerful enough to handle it. A future version should address this issue. - -Hidden outputs (as a form of op state) -====================================== - -For performance purposes, an ``op`` might want to have a hidden internal state. - -Example: if we expect to call the op repeatedly on incrementally bigger inputs, we might want private output storage that's a lot bigger than needed and take incrementally bigger views on it, to save allocation overhead. In order to do this, we can have two outputs: one that we will return normally and will contain the answer and the other that will be the (larger) container. In this case, the advanced note in the 'reusing outputs' section applies. Furthermore, ``__call__`` should be overridden to only return the first output instead of both of them. Here is what the example's ``perform`` and ``__call__`` would look like: - -.. code-block:: python - - class Add(Op): - """ - Use a hidden buffer to prevent unnecessary reallocation of memory. - """ - default_output = 0 - def make_node(self, x, y): - return Apply(self, [x,y], [x.type.make_variable(), x.type.make_variable()]) - - def perform(self, node, (x, y), (z, stor)): - if z[0] is None or stor[0] is None: - stor[0] = numpy.ndarray(x.size * 2) - else: - if x.size > stor[0].size: - stor[0].resize(x.size * 2, refcheck = 0) - z[0] = stor[0][:x.size] - numpy.add(x, y, z[0]) - ... - -Another example: for a FFTW Op, we would like to cache FFTW's plan along -with the inputs it was computed on, so we can reuse it if the inputs -are similar to the previous ones. - -It is also possible but potentially more complicated to use "private -inputs" to do the same thing: inputs cannot be set, though their contents -can be modified, so a wrapper would be needed and the input must be -marked as 'destroyed' by the Op using the 'destroy_map' field. diff --git a/doc/sandbox/index.rst b/doc/sandbox/index.rst deleted file mode 100644 index afbff2cb5e..0000000000 --- a/doc/sandbox/index.rst +++ /dev/null @@ -1,11 +0,0 @@ -:orphan: - -========================================================= -Sandbox, this documentation may or may not be out-of-date -========================================================= - -.. toctree:: - :glob: - - * - diff --git a/doc/sandbox/index2.rst b/doc/sandbox/index2.rst deleted file mode 100644 index 8b1c02b948..0000000000 --- a/doc/sandbox/index2.rst +++ /dev/null @@ -1,15 +0,0 @@ - -.. _advanced: - -==================================== -Advanced Topics (under construction) -==================================== - -.. 
toctree:: - :maxdepth: 2 - - compilation - ccodegen - function - debugging_with_stepmode - diff --git a/doc/sandbox/interactive_debugger.rst b/doc/sandbox/interactive_debugger.rst deleted file mode 100644 index c72fd3f206..0000000000 --- a/doc/sandbox/interactive_debugger.rst +++ /dev/null @@ -1,56 +0,0 @@ -==================== -Interactive Debugger -==================== - -'''Seed of discussion for what an interactive debugging tool might look like. 2009.03.27.''' - -== Interactive debugger ( #352 ) == - -The interactive debugger should allow the user to go step by step in a graph to debug it. It should allow setting breakpoints on arbitrary Ops or subgraphs. If we can group ops by the user's function that defined them, we could have a logical grouping of the graph into subgraphs. - -The debugger should save the inputs at each step so the user loses no info through inplace operations. Ideally, the debugger should be a normal python shell enriched with commands to control the flow and all the inputs should be made available so the user can use numpy interactively on them. - -Command wishlist - * py_perform (perform the current operation using the python implementation) - * c_perform (perform the current operation using the C implementation) - * perform (use the Linker's preference) - * get_inputs (get the inputs of the current op) - * set_inputs (set the inputs of the current op) - * get_outputs (get the outputs of the current op) - * set_outputs (set the outputs of the current op (bypasses its perform)) - * next (perform and go to the next breakpoint) - * breakpoint (set a breakpoint on the current Op or subgraph) - * step (perform and go to the next Op or subgraph) - * step_in (go to the first Op inside the current subgraph) - * step_out (exit the subgraph containing this Op) - * Of course, normal python shell functionality! - * The global context where the debugger was called (so the user can define his own helper functions, etc.) - -A good, simple way to do it would be to have those commands as methods of a structure that would be returned by a DebugLinker. This would allow an interactive session like the following: - -{{{ ->>> a, b, c = Tensor(), Tensor(), Tensor() ->>> d = b * c ->>> e = a + d ->>> debug = make_function(DebugLinker(FunctionGraph([a, b, c], [e]))) ->>> debug.set_breakpoint(d) ->>> debug.debug(10, 20, 30) # a, b, c = 10, 20, 30 -Now at: Mul(b, c) -Context: d = b * c ->>> debug.get_inputs() # we are at the node d = b * c -[20, 30] ->>> debug.get_outputs() -[None] ->>> debug.py_perform() ->>> debug.get_outputs() -[600] ->>> debug.step() -Now at: Add(a, Mul) -Context: e = a + d ->>> debug.get_inputs() -[30, 600] ->>> debug.step() -Finished. -[630] ->>> -}}} diff --git a/doc/sandbox/logistic_regression_example.rst b/doc/sandbox/logistic_regression_example.rst deleted file mode 100644 index 1631dcce1e..0000000000 --- a/doc/sandbox/logistic_regression_example.rst +++ /dev/null @@ -1,77 +0,0 @@ -.. _logistic_regression_example: - -State example -============= - -In this example, we'll look at a complete logistic regression model, with -training by gradient descent. - -BUT, YOU GOTTA RUN THIS CODE AND MAKE SURE IT STILL WORKS NICELY, HEY? - -.. 
code-block:: python - - def build_logistic_regression_model(n_in, n_out, l2_coef=30.0) - # DECLARE SOME VARIABLES - - import pytensor.tensor as pt - - x = pt.matrix() #our points, one point per row - y = pt.matrix() #store our labels as place codes (label 3 of 5 is vector [00100]) - - w = pt.matrix() #the linear transform to apply to our input points - b = pt.vector() #a vector of biases, which make our transform affine instead of linear - - stepsize = pt.scalar('stepsize') # a stepsize for gradient descent - - # REGRESSION MODEL AND COSTS TO MINIMIZE - - prediction = pt.softmax(pt.dot(x, w) + b) - cross_entropy = pt.sum(y * pt.log(prediction), axis=1) - cost = pt.sum(cross_entropy) + l2_coef * pt.sum(pt.sum(w*w)) - - # GET THE GRADIENTS NECESSARY TO FIT OUR PARAMETERS - - grad_w, grad_b = pt.grad(cost, [w, b]) - - # - # GET THE GRADIENTS NECESSARY TO FIT OUR PARAMETERS - - update_fn = pytensor.function( - inputs = [x, y, stepsize, - In(w, - name='w', - value=numpy.zeros((n_in, n_out)), - update=w - stepsize * grad_w, - mutable=True, - strict=True) - In(b, - name='b', - value=numpy.zeros(n_out), - update=b - lr * grad_b, - mutable=True, - strict=True) - ], - outputs = cost, - mode = 'EXPENSIVE_OPTIMIZATIONS') - - apply_fn = pytensor.function( - inputs = [x, In(w, value=update_fn.storage[w]), In(b, value=update_fn.storage[b])], - outputs = [prediction]) - - return update_fn, apply_fn - - #USUALLY THIS WOULD BE IN A DIFFERENT FUNCTION/CLASS - #FIT SOME DUMMY DATA: 100 points with 10 attributes and 3 potential labels - - up_fn, app_fn = build_logistic_regression_model(n_in=10, n_out=3, l2_coef=30.0) - - x_data = numpy.random.standard_normal((100, 10)) - y_data = numpy.random.standard_normal((100, 3)) - y_data = _asarray(y_data == numpy.max(y_data, axis=1), dtype='int64') - - print "Model Training ..." - for iteration in range(1000): - print " iter", iteration, "cost", update_fn(x_data, y_data, stepsize=0.0001) - - print "Model Predictions" - print apply_fn(x_data) diff --git a/doc/sandbox/performance.rst b/doc/sandbox/performance.rst deleted file mode 100644 index a62b00b345..0000000000 --- a/doc/sandbox/performance.rst +++ /dev/null @@ -1,23 +0,0 @@ - -=========== -Performance -=========== - -PyTensor uses several tricks to obtain good performance: - * common sub-expression elimination - * [custom generated] C code for many operations - * pre-allocation of temporary storage - * loop fusion (which gcc normally can't do) - -On my neural net experiments for my course projects, I was getting around 10x -speed improvements over basic numpy by using pytensor. -[More specific speed tests would be nice.] - - -With a little work, PyTensor could also implement more sophisticated -rewrites: - - * automatic ordering of matrix multiplications - * profile-based memory layout decisions (e.g. row-major vs. col-major) - * gcc intrinsics to use MMX, SSE2 parallelism for faster element-wise arithmetic - * conditional expressions diff --git a/doc/sandbox/randomnumbers.rst b/doc/sandbox/randomnumbers.rst deleted file mode 100644 index fcdded1c2b..0000000000 --- a/doc/sandbox/randomnumbers.rst +++ /dev/null @@ -1,245 +0,0 @@ -.. _sandbox_randnb: - -============== -Random Numbers -============== - -''' This has been implemented (#182). 20090327.''' - -= Random Numbers = - -== Requirements == - - -PyTensor functions sometimes need random numbers. -Random operations are not as simple as other operations such as ones_like, or pow(), because the output must be different when we call the same function repeatedly. 
CompileFunction's new default-valued, updatable input variables make this possible. At the same time we need random streams to be repeatable, and easy to work with. So the basic requirements of our random number mechanism are: - - 1. Internal random number generators must be used in a clear manner, and be accessible to the caller after a function has been compiled. - 1. A random-number-producing Op (from now on: {{{RandomOp}}}) should generally produce exactly the same stream of random numbers regardless of any other {{{RandomOp}}} instances in its own graph, and any other times the graph was compiled. - 1. A {{{RandomOp}}}'s stream should be isolated from other {{{RandomOp}}} instances in a compiled graph, so that it is possible to adjust any one {{{RandomOp}}} independently from the others. - 1. It should be easy to put the {{{RandomOp}}}s in a graph into a state from which their outputs are all independent. - 1. It should be easy to save the current state of the {{{RandomOp}}}s in a graph. - 1. It should be easy to re-instate a previous state of the {{{RandomOp}}}s in a graph. - -== Basic Technical Spec == - -One option would be to skirt the issue by requiring users to pass all the random numbers we might need as input. -However, it is not always simple to know how many random numbers will be required because the shape of a random matrix might be computed within the graph. -The solution proposed here is to pass one or more random number generators as input to {{{pytensor.function}}}. - -Sharing a random number generator between different {{{RandomOp}}} instances makes it difficult to producing the same stream regardless of other ops in graph, and to keep {{{RandomOps}}} isolated. -Therefore, each {{{RandomOp}}} instance in a graph will have its very own random number generator. -That random number generator is an input to the function. -In typical usage, we will use the new features of function inputs ({{{value}}}, {{{update}}}) to pass and update the rng for each {{{RandomOp}}}. -By passing RNGs as inputs, it is possible to use the normal methods of accessing function inputs to access each {{{RandomOp}}}'s rng. -In this approach it there is no pre-existing mechanism to work with the combined random number state of an entire graph. -So the proposal is to provide the missing functionality (the last three requirements) via auxiliary functions: {{{seed, getstate, setstate}}}. - -== Syntax == - -.. code-block:: python - - #!python - # create a random generator, providing a default seed to condition how RandomOp instances are produced. - from pytensor.compile.function import function - - - r = MetaRandom(metaseed=872364) - - # create a different random generator - rr = MetaRandom(metaseed=99) - - # create an Op to produce a stream of random numbers. - # This generates random numbers uniformly between 0.0 and 1.0 excluded - # u will remember that it was made from r. - u = r.uniform(shape=(3,4,5), low=0.0, high=1.0) - - # create a second Op for more random numbers - # v will remember that it was made from r. - v = r.uniform(shape=(8,), low=-1.0, high=0.0) - - # create a third Op with a different underlying random state - # w will remember that it was made from rr. - w = rr.uniform(shape=(), low=-10., high=10.) - - # compile a function to draw random numbers - # note: un-named state inputs will be added automatically. - # note: it is not necessary to draw samples for u, even though - # u was created by r before v. 
- fn_v = function([], [v]) - - # this prints some representation of v's rng in fn_v. - # The .rng property works for Result instances produced by MetaRandom. - print fn_v.state[v.rng] - - # compile a function to draw each of u, v, w - # note: un-named state inputs will be added automatically - # note: This function (especially its internal state) is independent from fn_v. - fn_uvw = function([], [u,v,w]) - - # N.B. The random number streams of fn_v and fn_uvw are independent. - assert fn_v.state[v.rng] != fn_uvw.state[v.rng] - - fn_v() # returns random numbers A (according to metaseed 872364) - fn_v() # returns different random numbers B - - # note that v's stream here is identical to the one in fn_v() - fn_uvw() # returns random numbers C, A, E - - #explicitly re-seed v's random stream in fn_v - r.seed(fn_v, 872364) - fn_v() # returns random numbers A (as above) - fn_v() # returns random numbers B (as above) - - #re-seed w's random stream in fn_uvw, but not u's or v's - rr.seed(fn_uvw, 99) - fn_uvw() # returns random numbers D, B, E - - -== {{{MetaRandom}}} == - -The {{{MetaRandom}}} class is the proposed interface for getting {{{RandomOp}}} instances. -There are some syntactic similarities in the way {{{MetaRandom}}} is used to construct graphs, and the way {{{numpy.RandomState}}} appears in a corresponding procedural implementation. But since pytensor is symbolic the meaning of {{{MetaRandom}}} is quite different. - -As with {{{numpy.RandomState}}} though, a global instance of {{{MetaRandom}}} will be instantiated at import time for the scripter's convenience. - -A {{{MetaRandom}}} instance will remember every {{{Result}}} that it returns during its lifetime. -When calling functions like {{{seed, setstate}}}, this list is consulted so that only the streams associated with Results returned by {{{self}}} are modified. -The use of multiple {{{MetaRandom}}} objects in a single function is mostly for debugging (e.g., when you want to synchronize two sets of random number streams). - -The typical case is that only one (global) {{{MetaRandom}}} object is used to produce all the random streams in a function, so seeding (once) will reset the entire function. - -.. code-block:: python - - class MetaRandom(obj): - def __init__(self, metaseed=): ... # new functions will be initialized so that seed(fn, ) has no effect on output. - - def __contains__(self, Result): ... # True if Result was returned by a call to self. - def results(self): ... # Iterate over returned Result instances in creation order. - - def seed(self, fn, bits): ... # See below. - def getstate(self, fn): ... # See below. - def setstate(self, fn, state): ... # See below. - - def uniform(...): ... # return a Result of an Apply of a RandomOp. - # The return value is also stored internally for __contains__ and results(). - def normal(...): ... - def bernoulli(...): ... - ... - - -=== {{{MetaRandom.getstate}}} === - -.. code-block:: python - - def getstate(self, fn): ... - - ''return'':: - list, set, dict, instance... something to store the random number generators associated with every one of {{{self}}}'s members in {{{fn}}} - -=== {{{MetaRandom.setstate}}} === - -Re-install the random number generators in {{{rstates}}} to the {{{randomobj}}} members in {{{fn}} - -.. code-block:: python - - def setstate(self, fn, rstates): .... - - ''fn:: - a CompileFunction instance, generally with some Apply instances inside that are members of {{{self}}}. 
- ''rstates'':: - a structure returned by a previous call to {{{getstate}}} - ''return'':: - nothing - - -=== {{{MetaRandom.seed}}} === - -.. code-block:: python - - def seed(self, fn, bits): .... - - ''fn:: - a CompileFunction instance, generally with some Apply instances inside that are members of {{{self}}}. - ''bits'':: - Something to use as a seed. Typically an integer or list of integers. - ''return'':: - None - -Set the states of self's members in fn in a deterministic way based on bits. -Each member of self should generate independent samples after this call. - -Seed is like a dynamically-computed setstate. If the user runs -.. code-block:: python - - r.seed(fn, 99) - state_99 = r.getstate(fn) - -then any time afterward both {{{r.setstate(fn, state_99)}}} and {{{r.seed(fn, 99)}}} will put {{{fn}}} into the same state. - - - -= Potential Other syntax = - - -.. code-block:: python - - #!python - # create a random state - from pytensor.compile.function import function - - - r = RandomState(name = 'r') - - # create a different random state - rr = RandomState(name = 'rr') - - # create an Op to produce a stream of random numbers. - # That stream is a function of r's seed. - # This generates random numbers uniformly between 0.0 and 1.0 excluded - u = r.uniform(shape=(3,4,5), 0.0, 1.0) - - # create a second Op for more random numbers - # This stream is seeded using a different function of r's seed. - # u and v should be independent - v = r.uniform(shape=(8,), -1.0, 0.0) - - # create a third Op with a different underlying random state - w = rr.uniform(shape=(), -10., 10.) - - # compile a function to draw random numbers - # note: it is not necessary to draw samples for u. - # we provide the seed for the RandomState r in the inputs list as a "Type 4" input - fn_v = function([(r, 872364)], [v]) - - # compile a function to draw each of u, v, w - # we provide the seeds for the RandomStates r and rr in the inputs list as "Type 4" inputs - # note: the random state for r here is seeded independently from the one in fn_v, which means - # random number generation of fn_v and fn_uvw will not interfere. Since the seed is the - # same, it means they will produce the same sequence of tensors for the output v. - fn_uvw = function([(r, 872364), (rr, 99)], [u,v,w]) - - - fn_v() # returns random numbers A - fn_v() # returns different random numbers B - - # note that v's stream here is identical to the one in fn_v() - fn_uvw() # returns random numbers C, A, E - - #re-seed v's random stream in fn - fn_v.r = 872364 - - ### Is this state readable? What should we do here: - print fn_v.r - - fn() # returns random numbers A - - ### Is this state well-defined? - ### Does there even exist a number such that fn_v.r = N would have no effect on the rng states? - print fn_v.r - - fn() # returns random numbers B - - #re-seed w's random stream, but not u's or v's - fn_uvw.rr = 99 - fn_uvw() # returns random numbers D, B, E diff --git a/doc/sandbox/rethinkccodegen.rst b/doc/sandbox/rethinkccodegen.rst deleted file mode 100644 index 462f424452..0000000000 --- a/doc/sandbox/rethinkccodegen.rst +++ /dev/null @@ -1,124 +0,0 @@ -'''An open proposal. This is still relevant. 20080904''' - -====================== -New C code generation? -====================== - -Issues -====== - -There are several issues with the current way C code is generated: - * Ops cannot declare their own persistent variables. - * Reliance on weave, but most of weave's features go unused. 
- * There could easily be conflicts between support code from different Ops/Results. - * It is currently impossible to specialize support code based on the self. - * Caching of the generated code for graphs is greatly suboptimal. - -Structure -========= - -Currently, the general structure of the generated C code is approximately as follows: - -.. code-block:: c - - - - - - struct my_computation { - - - init() { } - cleanup { } - run { } - }; - - - PyObject* instantiate(PyObject* args) { - - - - } - - -The module produced via that method then has to be used as such:: - - obj = module.instantiate(error_storage, input_storage, output_storage, orphan_storage) - cutils.run_cthunk(obj) - - -We would like to get rid of weave dependencies, avoid name conflicts with the support code and have a nicer user interface for the produced module. The proposed new structure is as follows: - -.. code-block:: c - - - - struct op1 { - - - init() { } - cleanup { } - run() { } - }; - - struct op2 { }; - ... - struct opN { }; - - struct driver { - op1 o1; op2 o2; ... opN oN; - - - init() { } - cleanup() { } - run() { - - o1.run(input1, input2); - o2.run(o1.output1); - ... - oN.run(...); - - } - } - - PyObject* (PyObject* inputs) { - - - driver.run() - - - } - - PyObject* _driver(PyObject* storage) { - - - } - - and _driver> - -Gains: - * support code can be put inside a struct and become private to the Op - * we can export several functions that can be used directly, eg ``z = module.add(1, 2)`` - * this won't do filtering like ``Result.filter`` so the usefulness is limited by that - * the sequence of operations might be clearer to read - * we can use more descriptive names in each Op struct representing its input names (if we can find them using the inspect module) without worrying about name conflicts - -Losses: - * maybe gcc can't optimize it as well? - * make functions static and inline as much as possible - - -Caching -======= - -The current way of caching is from a hash of the generated code. That is inefficient because code has to be generated each time, which might be a costly process. Furthermore, usage of hashing in sets make it difficult to ensure a consistent ordering of Ops in graphs where several orderings are valid, so the generated C code is potentially different each time. Here is a proposal for a better way to compute the hash: - * Result_hash = Result version + Result desc - * Op_hash = Op version + Op desc + input/output hashes - * FunctionGraph_hash = FunctionGraph version + combination of the Op hashes and their traversal order wrt a consistent traversal method - -The version could be set explicitly via a ``__version__`` field or it could simply be equal to the file's last modification date. We could also have a ``__nocache__`` field indicating that code produced by the Op or Result cannot be cached. - -It should also be easier to bypass the cache (eg an option to CLinker to regenerate the code). - - - diff --git a/doc/sandbox/sandbox.rst b/doc/sandbox/sandbox.rst deleted file mode 100644 index 4ab3e78182..0000000000 --- a/doc/sandbox/sandbox.rst +++ /dev/null @@ -1,161 +0,0 @@ -Basically, this file contains stuff that should be documented, but is not. - -Feel free to contribute things that you want documented, as well as to add -or correct documentation. - - -====================================== -How do you define the grad function? -====================================== - -Let's talk about defining the :meth:`Op.grad` function in an :class:`Op`, using an -illustrative example. 
- -In Poisson regression (Ranzato and Szummer, 2008), the target *t* is -integer data, which we predict using a continuous output *o*. -In the negative log likelihood of the Poisson regressor, there is a term: - -.. math:: - - \log(t!) - -Let's say we write a logfactorial :class:`Op`. We then compute the gradient - -You should define gradient, even if it is undefined. -[give log factorial example] - -If an :class:`Op` does not define ``grad``, but this :class:`Op` does not appear in the path when -you compute the gradient, then there is no problem. - -If an :class:`Op` does not define ``grad``, and this :class:`Op` *does* appear in the path when -you compute the gradient, **WRITEME**. - -Gradients for a particular variable can be one of four kinds: -1) forgot to implement it - -You will get an exception of the following form:: - - pytensor.graph.utils.MethodNotDefined: ('grad', , 'LogFactorial') - -2) a symbolic variable -3) None / zero -4) undefined mathematically - -currently, there is no way for a ``grad()`` method to distinguish between cases 3 -and 4 -but the distinction is important because graphs with type-3 gradients are ok -to run, whereas graphs with type-4 gradients are not. -so I suggested that Joseph return a type-4 gradient by defining an :class:`Op` with no -perform method. -the idea would be that this would suit the graph-construction phase, but would -prevent linking. -how does that sound to you? - -**This documentation is useful when we show users how to write :class:`Op`\s.** - -====================================== -What is staticmethod, st_impl? -====================================== - -``st_impl`` is an optional method in an :class:`Op`. -``@staticmethod`` is a Python decorator for a class method that does not -implicitly take the class instance as a first argument. Hence, st_impl -can be used for :class:`Op` implementations when no information from the :class:`Op` -instance is needed. This can be useful for testing an implementation. -See the ``XlogX`` class below for an example. - -**This documentation is useful when we show users how to write :class:`Op`\s. -Olivier says this behavior should be discouraged but I feel that st_impl -should be encouraged where possible.** - -============================================================ -how do we write scalar ops and upgrade them to tensor ops? -============================================================ - -**Olivier says that** :class:`~pytensor.tensor.xlogx.XlogX` **gives a good example. In fact, I would -like to beef xlogx up into our running example for demonstrating how to -write an :class:`Op`:** - -.. code-block:: python - - class XlogX(scalar.UnaryScalarOp): - """ - Compute X * log(X), with special case 0 log(0) = 0. - """ - @staticmethod - def st_impl(x): - if x == 0.0: - return 0.0 - return x * numpy.log(x) - def impl(self, x): - return XlogX.st_impl(x) - def grad(self, inp, grads): - x, = inp - gz, = grads - return [gz * (1 + scalar.log(x))] - def c_code(self, node, name, inp, out, sub): - x, = inp - z, = out - if node.inputs[0].type in [scalar.float32, scalar.float64]: - return """%(z)s = - %(x)s == 0.0 - ? 0.0 - : %(x)s * log(%(x)s);""" % locals() - raise NotImplementedError('only floatingpoint is implemented') - scalar_xlogx = XlogX(scalar.upgrade_to_float, name='scalar_xlogx') - xlogx = pytensor.tensor.elemwise.Elemwise(scalar_xlogx, name='xlogx') - -**It is also necessary to talk about UnaryScalarOp vs. BinaryOp.** - -UnaryScalarOp is the same as scalar.ScalarOp with member variable nin=1. 
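For illustration, a minimal, untested sketch of that equivalence, following the ``XlogX`` pattern above. ``XTimesTwo`` is a made-up op used only as an example; the only assumption is that subclassing ``scalar.ScalarOp`` directly with ``nin = 1`` behaves like subclassing ``scalar.UnaryScalarOp``, as the sentence above states:

.. code-block:: python

    from pytensor import scalar
    from pytensor.tensor.elemwise import Elemwise


    class XTimesTwo(scalar.ScalarOp):
        """Compute 2 * x, written against ScalarOp directly with nin = 1."""

        nin = 1   # one input; this is what UnaryScalarOp would set for us
        nout = 1  # one output (the ScalarOp default)

        def impl(self, x):
            return 2.0 * x

        def grad(self, inp, grads):
            (x,) = inp
            (gz,) = grads
            # d(2x)/dx = 2, so the gradient wrt x is gz * 2
            return [gz * 2.0]


    # Upgrade the scalar op to an elementwise tensor op, as done for xlogx above.
    scalar_xtimestwo = XTimesTwo(scalar.upgrade_to_float, name="scalar_xtimestwo")
    xtimestwo = Elemwise(scalar_xtimestwo, name="xtimestwo")

Per the sentence above, the two spellings should produce the same op; ``UnaryScalarOp`` just saves the ``nin`` boilerplate.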
-**give an example of this** - -======================================================= -How to use the `PrintOp` -======================================================= - -** This is also useful in the How to write an :class:`Op` tutorial. ** - -======================================================= -Mammouth -======================================================= - -**This is internal documentation. Guillaume can you make sure to hit these points:** - -export PYTENSOR_BLAS_LDFLAGS='-lmkl -liomp5 -fopenmp' - -**Do we want the following:** - -export OMP_NUM_THREADS=2 - -======================================================= -Type checking -======================================================= - - * Are there functions for doing type checking? - like dtype of this matrix is an int-type (not just int32 - or int64) - "if isinstance(item, int):" is the preferred way to do it in - python now, so mimic this - If the type is wrong, what exception should be raised? - -====================================== -More simple numpy stuff -====================================== - - * If we have a matrix with only one row, how do we convert it to a vector? - ``x.reshape(x.size)`` - You can also use ``resize`` but there is not reason to ''resize'' - * How do you convert the type of a numpy array? - ``pytensor._asarray(x, dtype = 'int32')`` - Note that using ``numpy.asarray`` is potentially dangerous, due to - a problem in numpy where the type may not be properly set (see - numpy's Track ticket #870). - - -========================================= -How to reuse (overwrite) a storage tensor -========================================= - -``pytensor.compile.io.Out(gw1, borrow = True)`` for that value in -``pytensor.compile.function.function`` diff --git a/doc/sandbox/software.rst b/doc/sandbox/software.rst deleted file mode 100644 index 12ccc68108..0000000000 --- a/doc/sandbox/software.rst +++ /dev/null @@ -1,19 +0,0 @@ -=============== -Others software -=============== - -Other software to look at and maybe recommend to users: - -* [http://www.pytables.org/moin PyTables] - This is looking really - promising for dataset storage and experiment logging... This might - actually be useful for large data sets. -* [http://matplotlib.sourceforge.net/ MatPlotLib] - visualization tools - (plot curves interactively, like matlab's figure window) -* [http://www.pythonware.com/products/pil/ PIL] - Python Image Library: - write your matrices out in png! (Kinda a weird recommendation, I think) -* [http://www.logilab.org/857 pylint] - Syntax checker for python to - help beautify your code. (We'd be hypocrites to recommend this :) -* [http://www.winpdb.org/ Winpdb] - A Platform Independent Python - Debugger. (Except it doesn't really help you debug PyTensor graphs) -* [http://wiki.python.org/moin/IntegratedDevelopmentEnvironments Python - Integrated Development Environments] - for all your coding needs diff --git a/doc/sandbox/sparse.rst b/doc/sandbox/sparse.rst deleted file mode 100644 index 27ccb8c449..0000000000 --- a/doc/sandbox/sparse.rst +++ /dev/null @@ -1,147 +0,0 @@ -.. _sparse: - -=============== -Sparse matrices -=============== - -scipy.sparse ------------- - -Note that you want SciPy >= 0.7.2 - -.. warning:: - - In SciPy 0.6, `scipy.csc_matrix.dot` has a bug with singleton - dimensions. There may be more bugs. It also has inconsistent - implementation of sparse matrices. - - We do not test against SciPy versions below 0.7.2. - -We describe the details of the compressed sparse matrix types. 
- `scipy.sparse.csc_matrix` - should be used if there are more rows than column (``shape[0] > shape[1]``). - `scipy.sparse.csr_matrix` - should be used if there are more columns than rows (``shape[0] < shape[1]``). - `scipy.sparse.lil_matrix` - is faster if we are modifying the array. After initial inserts, - we can then convert to the appropriate sparse matrix format. - -The following types also exist: - `dok_matrix` - Dictionary of Keys format. From their doc: This is an efficient structure for constructing sparse matrices incrementally. - `coo_matrix` - Coordinate format. From their lil doc: consider using the COO format when constructing large matrices. - -There seems to be a new format planned for SciPy 0.7.x: - `bsr_matrix` - Block Compressed Row (BSR). From their doc: The Block Compressed Row - (BSR) format is very similar to the Compressed Sparse Row (CSR) - format. BSR is appropriate for sparse matrices with dense sub matrices - like the last example below. Block matrices often arise in vector-valued - finite element discretizations. In such cases, BSR is considerably more - efficient than CSR and CSC for many sparse arithmetic operations. - `dia_matrix` - Sparse matrix with DIAgonal storage - -There are four member variables that comprise a compressed matrix ``sp`` (for at least csc, csr and bsr): - - ``sp.shape`` - gives the shape of the matrix. - ``sp.data`` - gives the values of the non-zero entries. For CSC, these should - be in order from (I think, not sure) reading down in columns, - starting at the leftmost column until we reach the rightmost - column. - ``sp.indices`` - gives the location of the non-zero entry. For CSC, this is the - row location. - ``sp.indptr`` - gives the other location of the non-zero entry. For CSC, there are - as many values of indptr as there are ``columns + 1`` in the matrix. - ``sp.indptr[k] = x`` and ``indptr[k+1] = y`` means that column - ``k`` contains ``sp.data[x:y]``, i.e. the ``x``-th through the y-1th non-zero values. - -See the example below for details. - -.. code-block:: python - - >>> import scipy.sparse - >>> sp = scipy.sparse.csc_matrix((5, 10)) - >>> sp[4, 0] = 20 - SparseEfficiencyWarning: changing the sparsity structure of a csc_matrix is expensive. lil_matrix is more efficient. - SparseEfficiencyWarning) - >>> sp[0, 0] = 10 - >>> sp[2, 3] = 30 - >>> sp.todense() - matrix([[ 10., 0., 0., 0., 0., 0., 0., 0., 0., 0.], - [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], - [ 0., 0., 0., 30., 0., 0., 0., 0., 0., 0.], - [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], - [ 20., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]) - >>> print sp - (0, 0) 10.0 - (4, 0) 20.0 - (2, 3) 30.0 - >>> sp.shape - (5, 10) - >>> sp.data - array([ 10., 20., 30.]) - >>> sp.indices - array([0, 4, 2], dtype=int32) - >>> sp.indptr - array([0, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3], dtype=int32) - -Several things should be learned from the above example: - -* We actually use the wrong sparse matrix type. In fact, it is the - *rows* that are sparse, not the columns. So, it would have been - better to use ``sp = scipy.sparse.csr_matrix((5, 10))``. -* We should have actually created the matrix as a `lil_matrix`, - which is more efficient for inserts. Afterwards, we should convert - to the appropriate compressed format. -* ``sp.indptr[0] = 0`` and ``sp.indptr[1] = 2``, which means that - column 0 contains ``sp.data[0:2]``, i.e. the first two non-zero values. -* ``sp.indptr[3] = 2`` and ``sp.indptr[4] = 3``, which means that column - three contains ``sp.data[2:3]``, i.e. the third non-zero value. 
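Putting those observations together, the construction could look roughly like this (an illustrative, untested sketch; exact array ``repr`` output and index dtypes vary across NumPy/SciPy versions):

.. code-block:: python

    >>> import scipy.sparse
    >>> m = scipy.sparse.lil_matrix((5, 10))  # LIL: cheap incremental inserts
    >>> m[0, 0] = 10
    >>> m[4, 0] = 20
    >>> m[2, 3] = 30
    >>> sp = m.tocsr()  # the *rows* are the sparse axis here, so CSR fits better
    >>> sp.data         # non-zero values, read row by row
    array([10., 30., 20.])
    >>> sp.indices      # column index of each stored value
    array([0, 3, 0], dtype=int32)
    >>> sp.indptr       # row k holds sp.data[indptr[k]:indptr[k+1]]
    array([0, 1, 1, 2, 2, 3], dtype=int32)

Note that in CSR the stored values come out ordered by row rather than by column, which is the row-wise counterpart of the CSC ``indptr`` walk described above.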
- -TODO: Rewrite this documentation to do things in a smarter way. - -Speed ------ - -For faster sparse code: - * Construction: lil_format is fast for many inserts. - * Operators: "Since conversions to and from the COO format are - quite fast, you can use this approach to efficiently implement lots - computations on sparse matrices." (Nathan Bell on scipy mailing list) - -Misc ----- -The sparse equivalent of `dmatrix` is `csc_matrix` and `csr_matrix`. - -:class:`~pytensor.sparse.basic.Dot` vs. :class:`~pytensor.sparse.basic.StructuredDot` -------------------------------------------------------------------------------------- - -Often when you use a sparse matrix it is because there is a meaning to the -structure of non-zeros. The gradient on terms outside that structure -has no meaning, so it is computationally efficient not to compute them. - -`StructuredDot` is when you want the gradient to have zeroes corresponding to -the sparse entries in the matrix. - -`TrueDot` and `Structured` dot have different gradients -but their perform functions should be the same. - -The gradient of `TrueDot` can have non-zeros where the sparse matrix had zeros. -The gradient of `StructuredDot` can't. - -Suppose you have ``dot(x,w)`` where ``x`` and ``w`` are square matrices. -If ``w`` is dense, like ``standard_normal((5,5))`` and ``x`` is of full rank (though -potentially sparse, like a diagonal matrix of ones) then the output will -be dense too. -What's important is the density of the gradient on the output. -If the gradient on the output is dense, and ``w`` is dense (as we said it was) -then the ``True`` gradient on ``x`` will be dense. -If our dot is a `TrueDot`, then it will say that the gradient on ``x`` is dense. -If our dot is a `StructuredDot`, then it will say the gradient on ``x`` is only -defined on the diagonal and ignore the gradients on the off-diagonal. diff --git a/doc/sandbox/tensoroptools.rst b/doc/sandbox/tensoroptools.rst deleted file mode 100644 index 132924142f..0000000000 --- a/doc/sandbox/tensoroptools.rst +++ /dev/null @@ -1,9 +0,0 @@ - -.. _tensoroptools: - -================ -Tensor Op Tools -================ - -WRITEME - describe how to use Elemwise here -
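As a placeholder until that is written properly, here is a minimal, untested sketch of wrapping an existing scalar op with ``Elemwise``, reusing the constructor pattern shown with ``xlogx`` earlier in these notes; it assumes ``pytensor.scalar`` exposes the ``add`` scalar op used by the built-in elementwise addition:

.. code-block:: python

    import numpy as np

    import pytensor
    import pytensor.tensor as pt
    from pytensor import scalar
    from pytensor.tensor.elemwise import Elemwise

    # Wrap an existing scalar op so that it maps elementwise over tensors,
    # exactly as xlogx is built from scalar_xlogx in the sandbox notes above.
    my_add = Elemwise(scalar.add, name="my_add")

    x = pt.matrix("x")
    y = pt.matrix("y")
    z = my_add(x, y)

    f = pytensor.function([x, y], z)
    print(f(np.ones((2, 2)), np.ones((2, 2))))  # expected: a 2x2 matrix of 2.0

The same pattern applies to any scalar op: build or pick the scalar op, wrap it in ``Elemwise``, and use the result like any other tensor op.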