Merge pull request #27 from Nonannet/dev

Dev
Added tensor support and type hints for math functions
2026-01-05 14:13:04 +01:00 · 2026-01-05 13:39:53 +01:00 · 2026-01-05 13:39:01 +01:00 · 2026-01-05 10:58:57 +01:00 · 2026-01-05 10:49:47 +01:00 · 2026-01-02 15:24:21 +01:00
35 changed files with 536 additions and 217 deletions
--- a/.github/workflows/build_wheels.yml
+++ b/.github/workflows/build_wheels.yml
@@ -22,6 +22,11 @@ jobs:
          name: stencil-object-files
          path: src/copapy/obj/*.o

+      - uses: actions/upload-artifact@v4
+        with:
+          name: musl-object-files
+          path: /object_files/*
+
  build_wheels:
    if: contains(github.ref, '-beta') == false
    needs: [build_stencils]
@@ -33,6 +38,9 @@ jobs:

    steps:
      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+          fetch-tags: true

      - uses: actions/download-artifact@v4
        with:
--- a/.github/workflows/ci.yml
+++ b/.github/workflows/ci.yml
@@ -41,8 +41,6 @@ jobs:
    steps:
    - name: Check out code
      uses: actions/checkout@v4
-      with:
-        fetch-tags: true

    - uses: actions/download-artifact@v4
      with:
@@ -89,8 +87,6 @@ jobs:
    steps:
    - name: Check out code
      uses: actions/checkout@v4
-      with:
-        fetch-tags: true

    - uses: actions/download-artifact@v4
      with:
@@ -144,8 +140,6 @@ jobs:
    continue-on-error: true
    steps:
    - uses: actions/checkout@v4
-      with:
-        fetch-tags: true
    - uses: actions/download-artifact@v4
      with:
        name: stencil-object-files
@@ -175,8 +169,6 @@ jobs:
    continue-on-error: true
    steps:
    - uses: actions/checkout@v4
-      with:
-        fetch-tags: true
    - uses: actions/download-artifact@v4
      with:
        name: stencil-object-files
@@ -206,8 +198,6 @@ jobs:
    continue-on-error: true
    steps:
    - uses: actions/checkout@v4
-      with:
-        fetch-tags: true
    - uses: actions/download-artifact@v4
      with:
        name: stencil-object-files
@@ -242,8 +232,6 @@ jobs:
    steps:
    - name: Check out code
      uses: actions/checkout@v4
-      with:
-        fetch-tags: true

    - uses: actions/download-artifact@v4
      with:
@@ -290,11 +278,8 @@ jobs:
    steps:
      - uses: actions/checkout@v4
        with:
-          fetch-depth: 1
+          fetch-depth: 0
          fetch-tags: true
-          sparse-checkout: |
-            pyproject.toml
-            tools/get_tag.sh

      - name: Download artifacts
        uses: actions/download-artifact@v4
--- a/.gitignore
+++ b/.gitignore
@@ -29,3 +29,4 @@ docs/source/api
 core
 *.log
 docs/source/start.md
+/src/copapy/_version.py
--- a/README.md
+++ b/README.md
@@ -1,6 +1,6 @@
 # Copapy

-Copapy is a Python framework for deterministic, low-latency realtime computation with automatic differentiation support, targeting hardware applications - for example in the fields of robotics, aerospace, embedded systems and control systems in general.
+Copapy is a Python framework for deterministic, low-latency realtime computation with automatic differentiation support, targeting hardware applications - for example in the fields of robotics, aerospace, software-defined radio (SDR), embedded systems and control systems in general.

 GPU frameworks like PyTorch, JAX and TensorFlow jump-started the development in the field of AI. With the right balance of flexibility and performance, they allow for fast iteration of new ideas while still being performant enough to test or even use them in production.

@@ -187,11 +187,11 @@ For more complex operations - where inlining is less useful - stencils call a no
            e: R_X86_64_PLT32    result_float-0x4
 ```

-Unlike stencils, non-stencil functions are not stripped and do not need to be tail-call-optimizable.
+Unlike stencils, non-stencil functions like `sinf` are not stripped and do not need to be tail-call-optimizable. Such functions can either be provided as C code and compiled together with the stencils, or be supplied as object files that are merged into the stencil object files. The latter is the case for `sinf`, which is compiled from C and assembly sources. Math functions like `sinf` are currently provided by the musl C library, which includes architecture-specific optimizations.

-Non-stencil functions and constants are stored together with the stencils in an ELF object file for each supported CPU architecture. The required non-stencil functions and constants are bundled during compilation. The compiler includes only the data and code required for the specific program.
+Non-stencil functions and constants are stored together with the stencils in an ELF object file for each supported CPU architecture. The required non-stencil functions and constants are bundled during compilation. The compiler includes only the data and code required for a specific Copapy program.

-The whole compilation process is independent of the actual instruction set. It relies purely on relocation entries and symbol metadata from the ELF file generated by the C compiler.
+The Copapy compilation process is independent of the actual instruction set. It relies purely on relocation entries and symbol metadata from the ELF file generated by the C compiler.
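To make the relocation-only approach concrete, here is a minimal Python sketch of resolving a single x86-64 call relocation once the stencil and its callee have been placed in memory. It is not Copapy's `patch_entry`; the function name `patch_pc32` and all addresses are hypothetical. It treats `R_X86_64_PLT32` like a direct PC-relative reference (S + A - P), which is how a call to a function placed in the same code block is commonly resolved; other architectures use other relocation types and field encodings.

```python
import struct


def patch_pc32(code: bytearray, offset: int, symbol_addr: int, addend: int, section_addr: int) -> None:
    """Patch a signed 32-bit PC-relative field with S + A - P.

    S: address of the referenced symbol (e.g. where result_float was placed)
    A: addend stored in the relocation entry (e.g. -0x4)
    P: address of the field being patched (section_addr + offset)
    """
    place = section_addr + offset
    value = symbol_addr + addend - place
    if not -2**31 <= value < 2**31:
        raise OverflowError("target out of range for a 32-bit displacement")
    # x86-64 is little-endian; the displacement is a signed 32-bit integer
    code[offset:offset + 4] = struct.pack('<i', value)


# Mirrors "e: R_X86_64_PLT32 result_float-0x4": field at offset 0xe, addend -4,
# assuming (hypothetically) the stencil sits at 0x0 and result_float at 0x1000.
code = bytearray(32)
patch_pc32(code, offset=0xe, symbol_addr=0x1000, addend=-0x4, section_addr=0x0)
print(struct.unpack('<i', code[0xe:0x12])[0])  # 4078
```

Only the symbol address, the addend and the patch location enter the computation, so the same routine works no matter which instruction the patched field belongs to.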

 ## Developer Guide

--- a/docs/source/compiler.md
+++ b/docs/source/compiler.md
@@ -1,4 +1,4 @@
-# Compiler
+# How it works
 ```{toctree}
 :maxdepth: 1
 :hidden:
--- a/docs/source/generate_class_list.py
+++ b/docs/source/generate_class_list.py
@@ -97,7 +97,7 @@ if __name__ == "__main__":

        write_functions(f, ['*'], 'copapy', title='Vector functions', path_patterns=['*_vectors*'], api_dir=api_dir)

-        write_functions(f, ['*'], 'copapy', title='Matrix functions', path_patterns=['*_matrices*'], api_dir=api_dir)
+        write_functions(f, ['*'], 'copapy', title='Tensor/Matrix functions', path_patterns=['*_tensors*'], api_dir=api_dir)

        #write_manual(f, ['NumLike'], title='Types')

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -31,7 +31,7 @@ copapy = ["obj/*.o", "py.typed"]

 [tool.setuptools_scm]
 version_scheme = "post-release"
-local_scheme = "node-and-date"
+local_scheme = "no-local-version"
 tag_regex = "^v(?P<version>\\d+\\.\\d+\\.\\d+(?:-beta)?)$"
 fallback_version = "0.0.0"
 write_to = "src/copapy/_version.py"
--- a/src/copapy/init.py
+++ b/src/copapy/init.py
@@ -40,7 +40,7 @@ from ._tensors import tensor, zeros, ones, arange, eye, identity, diagonal
 from ._math import sqrt, abs, sign, sin, cos, tan, asin, acos, atan, atan2, log, exp, pow, get_42, clamp, min, max, relu
 from ._autograd import grad
 from ._tensors import tensor as matrix
-from ._version import __version__
+from ._version import __version__  # Run "pip install -e ." to generate _version.py


 __all__ = [
--- a/src/copapy/_autograd.py
+++ b/src/copapy/_autograd.py
@@ -89,8 +89,11 @@ def grad(x: Any, y: value[Any] | Sequence[value[Any]] | vector[Any] | tensor[Any
            elif opn == 'sqrt':
                add_grad(a, g * (0.5 / cp.sqrt(a)))

-            #elif opn == 'abs':
-            #    add_grad(x, g * cp.sign(x))
+            elif opn == 'abs':
+                add_grad(a, g * cp.sign(a))
+
+            elif opn == 'neg':
+                add_grad(a, -g)

            elif opn == 'sin':
                add_grad(a, g * cp.cos(a))
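The `abs` and `neg` cases apply the chain rule: the upstream gradient `g` is multiplied by the local derivative, which is `sign(a)` for `abs(a)` and `-1` for `-a`, so the contribution for `neg` is simply `-g`. The self-contained scalar sketch below illustrates those two rules; `Node`, `leaf`, `abs_` and `backward` are hypothetical names, and this is not Copapy's `grad` implementation:

```python
import math
from dataclasses import dataclass


@dataclass
class Node:
    """Scalar node of a tiny expression tree (illustrative, not copapy's Node)."""
    op: str
    args: tuple
    val: float
    grad: float = 0.0


def leaf(x: float) -> Node:
    return Node('leaf', (), x)


def neg(a: Node) -> Node:
    return Node('neg', (a,), -a.val)


def abs_(a: Node) -> Node:
    return Node('abs', (a,), math.fabs(a.val))


def sign(x: float) -> float:
    return float((x > 0) - (x < 0))


def backward(out: Node) -> None:
    """Reverse-mode pass; assumes every node feeds exactly one parent (a tree)."""
    out.grad = 1.0
    stack = [out]
    while stack:
        node = stack.pop()
        g = node.grad  # upstream gradient dL/d(node)
        if node.op == 'neg':
            (a,) = node.args
            a.grad += -g               # d(-a)/da = -1
            stack.append(a)
        elif node.op == 'abs':
            (a,) = node.args
            a.grad += g * sign(a.val)  # d|a|/da = sign(a), taken as 0 at a == 0
            stack.append(a)


x = leaf(-3.0)
y = abs_(neg(x))  # y = |-x| = |x|
backward(y)
print(y.val, x.grad)  # 3.0 -1.0
```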
--- a/src/copapy/_basic_types.py
+++ b/src/copapy/_basic_types.py
@@ -1,5 +1,5 @@
 import pkgutil
-from typing import Any, Sequence, TypeVar, overload, TypeAlias, Generic, cast, Callable
+from typing import Any, Sequence, TypeVar, overload, TypeAlias, Generic, Callable
 from ._stencils import stencil_database, detect_process_arch
 import copapy as cp
 from ._helper_types import TNum
@@ -230,13 +230,11 @@ class value(Generic[TNum]):
    def __rfloordiv__(self, other: NumLike) -> Any:
        return add_op('floordiv', [other, self])

-    def __abs__(self: TCPNum) -> TCPNum:
-        return cp.abs(self)  # type: ignore
+    def __abs__(self: 'value[TNum]') -> 'value[TNum]':
+        return cp.abs(self)

-    def __neg__(self: TCPNum) -> TCPNum:
-        if self.dtype == 'float':
-            return cast(TCPNum, add_op('sub', [value(0.0), self]))
-        return cast(TCPNum, add_op('sub', [value(0), self]))
+    def __neg__(self: 'value[TNum]') -> 'value[TNum]':
+        return add_op('neg', [self])

    def __gt__(self, other: TVarNumb) -> 'value[int]':
        return add_op('gt', [self, other], dtype='bool')
@@ -362,7 +360,7 @@ class CPConstant(Node):
        return self.node_hash


-class Write(Node):
+class Store(Node):
    def __init__(self, input: value[Any] | Net | int | float):
        if isinstance(input, value):
            net = input.net
@@ -372,7 +370,7 @@ class Write(Node):
            node = CPConstant(input)
            net = Net(node.dtype, node)

-        self.name = 'write_' + transl_type(net.dtype)
+        self.name = 'store_' + transl_type(net.dtype)
        self.args = (net,)
        self.node_hash = hash(self.name) ^ hash(net.source.node_hash)

--- a/src/copapy/_compiler.py
+++ b/src/copapy/_compiler.py
@@ -2,7 +2,7 @@ from typing import Generator, Iterable, Any
 from . import _binwrite as binw
 from ._stencils import stencil_database, patch_entry
 from collections import defaultdict, deque
-from ._basic_types import Net, Node, Write, CPConstant, Op, transl_type
+from ._basic_types import Net, Node, Store, CPConstant, Op, transl_type


 def stable_toposort(edges: Iterable[tuple[Node, Node]]) -> list[Node]:
@@ -132,7 +132,7 @@ def get_const_nets(nodes: list[Node]) -> list[Net]:
    return [net_lookup[node] for node in nodes if isinstance(node, CPConstant)]


-def add_read_ops(node_list: list[Node]) -> Generator[tuple[Net | None, Node], None, None]:
+def add_load_ops(node_list: list[Node]) -> Generator[tuple[Net | None, Node], None, None]:
    """Add a load node before each op whose arguments are not already positioned
    correctly in the registers.

@@ -156,7 +156,7 @@ def add_read_ops(node_list: list[Node]) -> Generator[tuple[Net | None, Node], No
                    #if net in registers:
                    #    print('x  swap registers')
                    type_list = ['int' if r is None else transl_type(r.dtype) for r in registers]
-                    new_node = Op(f"read_{transl_type(net.dtype)}_reg{i}_" + '_'.join(type_list), [])
+                    new_node = Op(f"load_{transl_type(net.dtype)}_reg{i}_" + '_'.join(type_list), [])
                    yield net, new_node
                    registers[i] = net

@@ -170,7 +170,7 @@ def add_read_ops(node_list: list[Node]) -> Generator[tuple[Net | None, Node], No
                yield None, node


-def add_write_ops(net_node_list: list[tuple[Net | None, Node]], const_nets: list[Net]) -> Generator[tuple[Net | None, Node], None, None]:
+def add_store_ops(net_node_list: list[tuple[Net | None, Node]], const_nets: list[Net]) -> Generator[tuple[Net | None, Node], None, None]:
    """Add a store operation for each newly defined net that is later read back by a load operation

    Returns:
@@ -181,19 +181,19 @@ def add_write_ops(net_node_list: list[tuple[Net | None, Node]], const_nets: list
    # Initialize set of nets with constants
    stored_nets = set(const_nets)

-    #assert all(node.name.startswith('read_') for net, node in net_node_list if net)
+    #assert all(node.name.startswith('load_') for net, node in net_node_list if net)
    read_back_nets = {
        net for net, node in net_node_list
-        if net and node.name.startswith('read_')}
+        if net and node.name.startswith('load_')}

    registers: list[Net | None] = [None, None]

    for net, node in net_node_list:
-        if isinstance(node, Write):
+        if isinstance(node, Store):
            assert len(registers) == 2
            type_list = [transl_type(r.dtype) if r else 'int' for r in registers]
-            yield node.args[0], Op(f"write_{type_list[0]}_reg0_" + '_'.join(type_list), node.args)
-        elif node.name.startswith('read_'):
+            yield node.args[0], Op(f"store_{type_list[0]}_reg0_" + '_'.join(type_list), node.args)
+        elif node.name.startswith('load_'):
            yield net, node
        else:
            yield None, node
@@ -207,7 +207,7 @@ def add_write_ops(net_node_list: list[tuple[Net | None, Node]], const_nets: list

            if net in read_back_nets and net not in stored_nets:
                type_list = [transl_type(r.dtype) if r else 'int' for r in registers]
-                yield net, Op(f"write_{type_list[0]}_reg0_" + '_'.join(type_list), [])
+                yield net, Op(f"store_{type_list[0]}_reg0_" + '_'.join(type_list), [])
                stored_nets.add(net)


@@ -344,8 +344,8 @@ def compile_to_dag(node_list: Iterable[Node], sdb: stencil_database) -> tuple[bi

    ordered_ops = list(stable_toposort(get_all_dag_edges(node_list)))
    const_net_list = get_const_nets(ordered_ops)
-    output_ops = list(add_read_ops(ordered_ops))
-    extended_output_ops = list(add_write_ops(output_ops, const_net_list))
+    output_ops = list(add_load_ops(ordered_ops))
+    extended_output_ops = list(add_store_ops(output_ops, const_net_list))

    dw = binw.data_writer(sdb.byteorder)
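The renamed passes follow a load/store scheme over two registers: `add_load_ops` emits a `load_*` op whenever an argument is not already in the register it is expected in, and `add_store_ops` emits a `store_*` op after a net is produced if that net is loaded again later. Below is a simplified, hypothetical sketch of those two passes on string-named nets. It is not the Copapy implementation, which also encodes dtypes and register contents in the stencil names and handles explicit `Store` nodes:

```python
from typing import Iterator

# Hypothetical simplified IR: (result_net, [arg_nets]); real ops also carry dtypes.
IrOp = tuple[str, list[str]]
Entry = tuple[str, str]  # ('load' | 'op' | 'store', net)


def insert_loads(ops: list[IrOp]) -> Iterator[Entry]:
    """Emit a load for argument i whenever register i does not already hold it.

    Each op is assumed to leave its result in register 0.
    """
    registers: list[str | None] = [None, None]
    for result, args in ops:
        for i, net in enumerate(args):
            if registers[i] != net:
                yield ('load', net)
                registers[i] = net
        yield ('op', result)
        registers[0] = result


def insert_stores(entries: list[Entry], in_memory: set[str]) -> list[Entry]:
    """Store a computed net right after it is produced if it is loaded again later.

    Nets in in_memory (inputs, constants) never need a store.
    """
    loaded = {net for kind, net in entries if kind == 'load'}
    stored = set(in_memory)
    out: list[Entry] = []
    for kind, net in entries:
        out.append((kind, net))
        if kind == 'op' and net in loaded and net not in stored:
            out.append(('store', net))
            stored.add(net)
    return out


ops = [('c', ['a', 'b']), ('d', ['c', 'a']), ('e', ['d', 'c'])]
plan = insert_stores(list(insert_loads(ops)), in_memory={'a', 'b'})
print(plan)
# [('load', 'a'), ('load', 'b'), ('op', 'c'), ('store', 'c'), ('load', 'a'), ('op', 'd'), ('load', 'c'), ('op', 'e')]
```

Only `c` is stored, because it is the only computed net that is later read back from memory; `d` is consumed straight from register 0.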

--- a/src/copapy/_math.py
+++ b/src/copapy/_math.py
@@ -1,5 +1,7 @@
 from . import vector
+from . import tensor
 from ._vectors import VecNumLike
+from ._tensors import TensorNumLike
 from . import value, NumLike
 from typing import TypeVar, Any, overload, Callable
 from ._basic_types import add_op, unifloat
@@ -15,6 +17,8 @@ def exp(x: float | int) -> float: ...
 def exp(x: value[Any]) -> value[float]: ...
 @overload
 def exp(x: vector[Any]) -> vector[float]: ...
+@overload
+def exp(x: tensor[Any]) -> tensor[float]: ...
 def exp(x: Any) -> Any:
    """Exponential function to basis e

@@ -26,7 +30,7 @@ def exp(x: Any) -> Any:
    """
    if isinstance(x, value):
        return add_op('exp', [x])
-    if isinstance(x, vector):
+    if isinstance(x, vector | tensor):
        return x.map(exp)
    return float(math.exp(x))
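All unary functions in `_math.py` now share one dispatch: a Copapy `value` becomes a graph op via `add_op`, a `vector` or `tensor` is mapped element-wise through its `map` method, and a plain number is evaluated immediately with `math`. Passing an `X | Y` union to `isinstance`, as the new code does, requires Python 3.10 or newer. The sketch below reproduces the dispatch with hypothetical stand-in classes (`Traced`, `Vec`, `make_unary`), not the real Copapy types:

```python
import math
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Traced:
    """Stand-in for a traced scalar: records the op instead of computing it."""
    op: str
    args: tuple


@dataclass
class Vec:
    """Stand-in for an element container with a map method."""
    values: list

    def map(self, func: Callable[[Any], Any]) -> 'Vec':
        return Vec([func(v) for v in self.values])


def make_unary(name: str, eager: Callable[[float], float]) -> Callable[[Any], Any]:
    def fn(x: Any) -> Any:
        if isinstance(x, Traced):
            return Traced(name, (x,))  # deferred: becomes a node in the graph
        if isinstance(x, Vec):
            return x.map(fn)           # element-wise, recursing into fn
        return float(eager(x))         # plain Python number: compute now
    return fn


sqrt = make_unary('sqrt', math.sqrt)
print(sqrt(4), sqrt(Vec([1, 9])))  # 2.0 Vec(values=[1.0, 3.0])
```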

@@ -37,6 +41,8 @@ def log(x: float | int) -> float: ...
 def log(x: value[Any]) -> value[float]: ...
 @overload
 def log(x: vector[Any]) -> vector[float]: ...
+@overload
+def log(x: tensor[Any]) -> tensor[float]: ...
 def log(x: Any) -> Any:
    """Logarithm to basis e

@@ -48,7 +54,7 @@ def log(x: Any) -> Any:
    """
    if isinstance(x, value):
        return add_op('log', [x])
-    if isinstance(x, vector):
+    if isinstance(x, vector | tensor):
        return x.map(log)
    return float(math.log(x))

@@ -61,7 +67,13 @@ def pow(x: value[Any], y: NumLike) -> value[float]: ...
 def pow(x: NumLike, y: value[Any]) -> value[float]: ...
 @overload
 def pow(x: vector[Any], y: Any) -> vector[float]: ...
-def pow(x: VecNumLike, y: VecNumLike) -> Any:
+@overload
+def pow(x: Any, y: vector[Any]) -> vector[float]: ...
+@overload
+def pow(x: tensor[Any], y: Any) -> tensor[float]: ...
+@overload
+def pow(x: Any, y: tensor[Any]) -> tensor[float]: ...
+def pow(x: TensorNumLike, y: TensorNumLike) -> Any:
    """x to the power of y

    Arguments:
@@ -70,8 +82,10 @@ def pow(x: VecNumLike, y: VecNumLike) -> Any:
    Returns:
        result of x**y
    """
+    if isinstance(x, tensor) or isinstance(y, tensor):
+        return _map2_tensor(x, y, pow)
    if isinstance(x, vector) or isinstance(y, vector):
-        return _map2(x, y, pow)
+        return _map2_vector(x, y, pow)
    if isinstance(y, int) and 0 <= y < 8:
        if y == 0:
            return 1
@@ -93,6 +107,8 @@ def sqrt(x: float | int) -> float: ...
 def sqrt(x: value[Any]) -> value[float]: ...
 @overload
 def sqrt(x: vector[Any]) -> vector[float]: ...
+@overload
+def sqrt(x: tensor[Any]) -> tensor[float]: ...
 def sqrt(x: Any) -> Any:
    """Square root function

@@ -104,7 +120,7 @@ def sqrt(x: Any) -> Any:
    """
    if isinstance(x, value):
        return add_op('sqrt', [x])
-    if isinstance(x, vector):
+    if isinstance(x, vector | tensor):
        return x.map(sqrt)
    return float(math.sqrt(x))

@@ -115,6 +131,8 @@ def sin(x: float | int) -> float: ...
 def sin(x: value[Any]) -> value[float]: ...
 @overload
 def sin(x: vector[Any]) -> vector[float]: ...
+@overload
+def sin(x: tensor[Any]) -> tensor[float]: ...
 def sin(x: Any) -> Any:
    """Sine function

@@ -126,7 +144,7 @@ def sin(x: Any) -> Any:
    """
    if isinstance(x, value):
        return add_op('sin', [x])
-    if isinstance(x, vector):
+    if isinstance(x, vector | tensor):
        return x.map(sin)
    return math.sin(x)

@@ -137,6 +155,8 @@ def cos(x: float | int) -> float: ...
 def cos(x: value[Any]) -> value[float]: ...
 @overload
 def cos(x: vector[Any]) -> vector[float]: ...
+@overload
+def cos(x: tensor[Any]) -> tensor[float]: ...
 def cos(x: Any) -> Any:
    """Cosine function

@@ -148,7 +168,7 @@ def cos(x: Any) -> Any:
    """
    if isinstance(x, value):
        return add_op('cos', [x])
-    if isinstance(x, vector):
+    if isinstance(x, vector | tensor):
        return x.map(cos)
    return math.cos(x)

@@ -159,6 +179,8 @@ def tan(x: float | int) -> float: ...
 def tan(x: value[Any]) -> value[float]: ...
 @overload
 def tan(x: vector[Any]) -> vector[float]: ...
+@overload
+def tan(x: tensor[Any]) -> tensor[float]: ...
 def tan(x: Any) -> Any:
    """Tangent function

@@ -170,8 +192,7 @@ def tan(x: Any) -> Any:
    """
    if isinstance(x, value):
        return add_op('tan', [x])
-    if isinstance(x, vector):
-        #return x.map(tan)
+    if isinstance(x, vector | tensor):
        return x.map(tan)
    return math.tan(x)

@@ -182,6 +203,8 @@ def atan(x: float | int) -> float: ...
 def atan(x: value[Any]) -> value[float]: ...
 @overload
 def atan(x: vector[Any]) -> vector[float]: ...
+@overload
+def atan(x: tensor[Any]) -> tensor[float]: ...
 def atan(x: Any) -> Any:
    """Inverse tangent function

@@ -193,7 +216,7 @@ def atan(x: Any) -> Any:
    """
    if isinstance(x, value):
        return add_op('atan', [x])
-    if isinstance(x, vector):
+    if isinstance(x, vector | tensor):
        return x.map(atan)
    return math.atan(x)

@@ -208,7 +231,11 @@ def atan2(x: NumLike, y: value[Any]) -> value[float]: ...
 def atan2(x: vector[float], y: VecNumLike) -> vector[float]: ...
 @overload
 def atan2(x: VecNumLike, y: vector[float]) -> vector[float]: ...
-def atan2(x: VecNumLike, y: VecNumLike) -> Any:
+@overload
+def atan2(x: tensor[float], y: TensorNumLike) -> tensor[float]: ...
+@overload
+def atan2(x: TensorNumLike, y: tensor[float]) -> tensor[float]: ...
+def atan2(x: TensorNumLike, y: TensorNumLike) -> Any:
    """2-argument arctangent

    Arguments:
@@ -218,8 +245,10 @@ def atan2(x: VecNumLike, y: VecNumLike) -> Any:
    Returns:
        Result in radian
    """
+    if isinstance(x, tensor) or isinstance(y, tensor):
+        return _map2_tensor(x, y, atan2)
    if isinstance(x, vector) or isinstance(y, vector):
-        return _map2(x, y, atan2)
+        return _map2_vector(x, y, atan2)
    if isinstance(x, value) or isinstance(y, value):
        return add_op('atan2', [x, y])
    return math.atan2(x, y)
@@ -231,6 +260,8 @@ def asin(x: float | int) -> float: ...
 def asin(x: value[Any]) -> value[float]: ...
 @overload
 def asin(x: vector[Any]) -> vector[float]: ...
+@overload
+def asin(x: tensor[Any]) -> tensor[float]: ...
 def asin(x: Any) -> Any:
    """Inverse sine function

@@ -242,7 +273,7 @@ def asin(x: Any) -> Any:
    """
    if isinstance(x, value):
        return add_op('asin', [x])
-    if isinstance(x, vector):
+    if isinstance(x, vector | tensor):
        return x.map(asin)
    return math.asin(x)

@@ -253,6 +284,8 @@ def acos(x: float | int) -> float: ...
 def acos(x: value[Any]) -> value[float]: ...
 @overload
 def acos(x: vector[Any]) -> vector[float]: ...
+@overload
+def acos(x: tensor[Any]) -> tensor[float]: ...
 def acos(x: Any) -> Any:
    """Inverse cosine function

@@ -264,11 +297,12 @@ def acos(x: Any) -> Any:
    """
    if isinstance(x, value):
        return add_op('acos', [x])
-    if isinstance(x, vector):
+    if isinstance(x, vector | tensor):
        return x.map(acos)
    return math.acos(x)


+# Debug test function
 @overload
 def get_42(x: float | int) -> float: ...
 @overload
@@ -286,7 +320,9 @@ def abs(x: U) -> U: ...
 def abs(x: value[U]) -> value[U]: ...
 @overload
 def abs(x: vector[U]) -> vector[U]: ...
-def abs(x: U | value[U] | vector[U]) -> Any:
+@overload
+def abs(x: tensor[U]) -> tensor[U]: ...
+def abs(x: U | value[U] | vector[U] | tensor[U]) -> Any:
    """Absolute value function

    Arguments:
@@ -297,18 +333,20 @@ def abs(x: U | value[U] | vector[U]) -> Any:
    """
    if isinstance(x, value):
        return add_op('abs', [x])
-    if isinstance(x, vector):
+    if isinstance(x, vector | tensor):
        return x.map(abs)
    return (x < 0) * -x + (x >= 0) * x


 @overload
-def sign(x: U) -> U: ...
+def sign(x: U) -> int: ...
 @overload
-def sign(x: value[U]) -> value[U]: ...
+def sign(x: value[U]) -> value[int]: ...
 @overload
-def sign(x: vector[U]) -> vector[U]: ...
-def sign(x: U | value[U] | vector[U]) -> Any:
+def sign(x: vector[U]) -> vector[int]: ...
+@overload
+def sign(x: tensor[U]) -> tensor[int]: ...
+def sign(x: U | value[U] | vector[U] | tensor[U]) -> Any:
    """Return 1 for positive numbers and -1 for negative numbers.
    For an input of 0 the return value is 0.

@@ -318,8 +356,11 @@ def sign(x: U | value[U] | vector[U]) -> Any:
    Returns:
        -1, 0 or 1
    """
-    ret = (x > 0) - (x < 0)
-    return ret
+    if isinstance(x, value):
+        return add_op('sign', [x])
+    if isinstance(x, vector | tensor):
+        return x.map(sign)
+    return (x > 0) - (x < 0)


 @overload
@@ -367,7 +408,13 @@ def min(x: U | value[U], y: U | value[U]) -> Any:
    Returns:
        Minimum of x and y
    """
-    return (x < y) * x + (x >= y) * y
+    if isinstance(x, tensor) or isinstance(y, tensor):
+        return _map2_tensor(x, y, min)
+    if isinstance(x, vector) or isinstance(y, vector):
+        return _map2_vector(x, y, min)
+    if isinstance(x, value) or isinstance(y, value):
+        return add_op('min', [x, y])
+    return x if x < y else y


 @overload
@@ -386,7 +433,13 @@ def max(x: U | value[U], y: U | value[U]) -> Any:
    Returns:
        Maximum of x and y
    """
-    return (x > y) * x + (x <= y) * y
+    if isinstance(x, tensor) or isinstance(y, tensor):
+        return _map2_tensor(x, y, max)
+    if isinstance(x, vector) or isinstance(y, vector):
+        return _map2_vector(x, y, max)
+    if isinstance(x, value) or isinstance(y, value):
+        return add_op('max', [x, y])
+    return x if x > y else y
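For plain numbers, the conditional `x if x < y else y` is also more robust than the previous branchless expression `(x < y) * x + (x >= y) * y`: in Python, multiplying `False` (that is, 0) by an infinite float yields `nan`, which then propagates through the sum. The PR does not state this as its motivation; the comparison below only illustrates the numeric difference:

```python
import math


def min_branchless(x: float, y: float) -> float:
    # Previous formulation: select via boolean arithmetic
    return (x < y) * x + (x >= y) * y


def min_conditional(x: float, y: float) -> float:
    # New scalar formulation
    return x if x < y else y


print(min_branchless(1.0, math.inf))   # nan, because False * inf == nan
print(min_conditional(1.0, math.inf))  # 1.0
```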


 @overload
@@ -400,7 +453,16 @@ def lerp(v1: U, v2: U, t: float) -> U: ...
 @overload
 def lerp(v1: vector[U], v2: vector[U], t: unifloat) -> vector[U]: ...
 def lerp(v1: U | value[U] | vector[U], v2: U | value[U] | vector[U], t:  unifloat) -> Any:
-    """Linearly interpolate between two values or vectors v1 and v2 by a factor t."""
+    """Linearly interpolate between two values or vectors v1 and v2 by a factor t.
+
+    Arguments:
+        v1: First value or vector
+        v2: Second value or vector
+        t: Interpolation factor (0.0 to 1.0)
+
+    Returns:
+        Interpolated value or vector
+    """
    if isinstance(v1, vector) or isinstance(v2, vector):
        assert isinstance(v1, vector) and isinstance(v2, vector), "Either both or neither of v1 and v2 must be vectors."
        assert len(v1.values) == len(v2.values), "Vectors must be of the same length."
@@ -414,13 +476,15 @@ def relu(x: U) -> U: ...
 def relu(x: value[U]) -> value[U]: ...
 @overload
 def relu(x: vector[U]) -> vector[U]: ...
-def relu(x: U | value[U] | vector[U]) -> Any:
+@overload
+def relu(x: tensor[U]) -> tensor[U]: ...
+def relu(x: U | value[U] | vector[U] | tensor[U]) -> Any:
    """Returns x for x > 0 and otherwise 0."""
-    ret = (x > 0) * x
+    ret = x * (x > 0)
    return ret


-def _map2(self: VecNumLike, other: VecNumLike, func: Callable[[Any, Any], value[U] | U]) -> vector[U]:
+def _map2_vector(self: VecNumLike, other: VecNumLike, func: Callable[[Any, Any], value[U] | U]) -> vector[U]:
    """Applies a function to each element of the vector and a second vector or scalar."""
    if isinstance(self, vector) and isinstance(other, vector):
        return vector(func(x, y) for x, y in zip(self.values, other.values))
@@ -430,3 +494,20 @@ def _map2(self: VecNumLike, other: VecNumLike, func: Callable[[Any, Any], value[
        return vector(func(self, x) for x in other.values)
    else:
        return vector([func(self, other)])
+
+
+def _map2_tensor(self: TensorNumLike, other: TensorNumLike, func: Callable[[Any, Any], value[U] | U]) -> tensor[U]:
+    """Applies a function element-wise to a tensor or vector and a second tensor, vector or scalar."""
+    if isinstance(self, vector):
+        self = tensor(self.values, (len(self.values),))
+    if isinstance(other, vector):
+        other = tensor(other.values, (len(other.values),))
+    if isinstance(self, tensor) and isinstance(other, tensor):
+        assert self.shape == other.shape, "Tensors must have the same shape"
+        return tensor([func(x, y) for x, y in zip(self.values, other.values)], self.shape)
+    elif isinstance(self, tensor):
+        return tensor([func(x, other) for x in self.values], self.shape)
+    elif isinstance(other, tensor):
+        return tensor([func(self, x) for x in other.values], other.shape)
+    else:
+        return tensor(func(self, other))
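`_map2_tensor` handles three cases: two tensors of identical shape (element-wise), a tensor paired with a scalar (the scalar is reused for every element), and vectors, which are first promoted to 1-D tensors. It does not broadcast between different shapes; mismatched shapes fail the assertion. Below is a self-contained sketch of the same rules over a flat `values` list plus a `shape` tuple. `FlatTensor` and `map2` are hypothetical stand-ins, and the sketch raises `ValueError` rather than using `assert`, since assertions are skipped when Python runs with `-O`:

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FlatTensor:
    """Row-major values plus a shape, mirroring how tensor stores its data."""
    values: list
    shape: tuple


def map2(a: Any, b: Any, func: Callable[[Any, Any], Any]) -> FlatTensor:
    if isinstance(a, FlatTensor) and isinstance(b, FlatTensor):
        # Same shape only: no broadcasting between different shapes
        if a.shape != b.shape:
            raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
        return FlatTensor([func(x, y) for x, y in zip(a.values, b.values)], a.shape)
    if isinstance(a, FlatTensor):
        return FlatTensor([func(x, b) for x in a.values], a.shape)
    if isinstance(b, FlatTensor):
        return FlatTensor([func(a, y) for y in b.values], b.shape)
    # Two scalars: wrap the single result in a 0-d container
    return FlatTensor([func(a, b)], ())


t = FlatTensor([1.0, 2.0, 3.0, 4.0], (2, 2))
print(map2(t, 2, pow).values)  # [1.0, 4.0, 9.0, 16.0]
print(map2(t, t, max).shape)   # (2, 2)
```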
--- a/src/copapy/_target.py
+++ b/src/copapy/_target.py
@@ -2,7 +2,7 @@ from typing import Iterable, overload, TypeVar, Any, Callable, TypeAlias
 from . import _binwrite as binw
 from coparun_module import coparun, read_data_mem, create_target, clear_target
 import struct
-from ._basic_types import value, Net, Node, Write, NumLike, ArrayType, stencil_db_from_package
+from ._basic_types import value, Net, Node, Store, NumLike, ArrayType, stencil_db_from_package
 from ._compiler import compile_to_dag

 T = TypeVar("T", int, float)
@@ -76,13 +76,13 @@ class Target():
            if isinstance(input, ArrayType):
                for v in input.values:
                    if isinstance(v, value):
-                        nodes.append(Write(v))
+                        nodes.append(Store(v))
            elif isinstance(input, Iterable):
                for v in input:
                    if isinstance(v, value):
-                        nodes.append(Write(v))
+                        nodes.append(Store(v))
            elif isinstance(input, value):
-                nodes.append(Write(input))
+                nodes.append(Store(input))

        dw, self._values = compile_to_dag(nodes, self.sdb)
        dw.write_com(binw.Command.END_COM)
--- a/src/copapy/_tensors.py
+++ b/src/copapy/_tensors.py
@@ -1,13 +1,13 @@
 from copapy._basic_types import NumLike, ArrayType
 from . import value
-from ._vectors import vector
+from ._vectors import vector, VecFloatLike, VecIntLike, VecNumLike
 from ._mixed import mixed_sum
 from typing import TypeVar, Any, overload, TypeAlias, Callable, Iterator, Sequence
 from ._helper_types import TNum

 TensorNumLike: TypeAlias = 'tensor[Any] | vector[Any] | value[Any] | int | float | bool'
-TensorIntLike: TypeAlias = 'tensor[int] | value[int] | int'
-TensorFloatLike: TypeAlias = 'tensor[float] | value[float] | float'
+TensorIntLike: TypeAlias = 'tensor[int] | vector[int] | value[int] | int | bool'
+TensorFloatLike: TypeAlias = 'tensor[float] | vector[float] | value[float] | float'
 TensorSequence: TypeAlias = 'Sequence[TNum | value[TNum]] | Sequence[Sequence[TNum | value[TNum]]] | Sequence[Sequence[Sequence[TNum | value[TNum]]]]'
 U = TypeVar("U", int, float)

@@ -26,6 +26,7 @@ class tensor(ArrayType[TNum]):
            values: Nested iterables of constant values or copapy values.
                    Can be a scalar, 1D iterable (vector),
                    or n-dimensional nested structure.
+            shape: Optional shape of the tensor. If not provided, inferred from values.
        """
        if shape:
            self.shape: tuple[int, ...] = tuple(shape)
@@ -264,15 +265,19 @@ class tensor(ArrayType[TNum]):
    @overload
    def __add__(self: 'tensor[float]', other: TensorNumLike) -> 'tensor[float]': ...
    @overload
-    def __add__(self, other: TensorNumLike) -> 'tensor[int] | tensor[float]': ...
+    def __add__(self, other: TensorNumLike) -> 'tensor[Any]': ...
    def __add__(self, other: TensorNumLike) -> Any:
        """Element-wise addition."""
        return self._binary_op(other, lambda a, b: a + b)

    @overload
-    def __radd__(self: 'tensor[float]', other: TensorNumLike) -> 'tensor[float]': ...
+    def __radd__(self: 'tensor[int]', other: VecFloatLike) -> 'tensor[float]': ...
    @overload
-    def __radd__(self: 'tensor[int]', other: value[int] | int) -> 'tensor[int]': ...
+    def __radd__(self: 'tensor[int]', other: VecIntLike) -> 'tensor[int]': ...
+    @overload
+    def __radd__(self: 'tensor[float]', other: VecNumLike) -> 'tensor[float]': ...
+    @overload
+    def __radd__(self, other: VecNumLike) -> 'tensor[Any]': ...
    def __radd__(self, other: Any) -> Any:
        return self + other

@@ -283,15 +288,19 @@ class tensor(ArrayType[TNum]):
    @overload
    def __sub__(self: 'tensor[float]', other: TensorNumLike) -> 'tensor[float]': ...
    @overload
-    def __sub__(self, other: TensorNumLike) -> 'tensor[int] | tensor[float]': ...
+    def __sub__(self, other: TensorNumLike) -> 'tensor[Any]': ...
    def __sub__(self, other: TensorNumLike) -> Any:
        """Element-wise subtraction."""
        return self._binary_op(other, lambda a, b: a - b, commutative=False)

    @overload
-    def __rsub__(self: 'tensor[float]', other: TensorNumLike) -> 'tensor[float]': ...
+    def __rsub__(self: 'tensor[int]', other: VecFloatLike) -> 'tensor[float]': ...
    @overload
-    def __rsub__(self: 'tensor[int]', other: value[int] | int) -> 'tensor[int]': ...
+    def __rsub__(self: 'tensor[int]', other: VecIntLike) -> 'tensor[int]': ...
+    @overload
+    def __rsub__(self: 'tensor[float]', other: VecNumLike) -> 'tensor[float]': ...
+    @overload
+    def __rsub__(self, other: VecNumLike) -> 'tensor[Any]': ...
    def __rsub__(self, other: TensorNumLike) -> Any:
        return self._binary_op(other, lambda a, b: b - a, commutative=False, reversed=True)

@@ -302,15 +311,19 @@ class tensor(ArrayType[TNum]):
    @overload
    def __mul__(self: 'tensor[float]', other: TensorNumLike) -> 'tensor[float]': ...
    @overload
-    def __mul__(self, other: TensorNumLike) -> 'tensor[int] | tensor[float]': ...
+    def __mul__(self, other: TensorNumLike) -> 'tensor[Any]': ...
    def __mul__(self, other: TensorNumLike) -> Any:
        """Element-wise multiplication."""
        return self._binary_op(other, lambda a, b: a * b)

    @overload
-    def __rmul__(self: 'tensor[float]', other: TensorNumLike) -> 'tensor[float]': ...
+    def __rmul__(self: 'tensor[int]', other: VecFloatLike) -> 'tensor[float]': ...
    @overload
-    def __rmul__(self: 'tensor[int]', other: value[int] | int) -> 'tensor[int]': ...
+    def __rmul__(self: 'tensor[int]', other: VecIntLike) -> 'tensor[int]': ...
+    @overload
+    def __rmul__(self: 'tensor[float]', other: VecNumLike) -> 'tensor[float]': ...
+    @overload
+    def __rmul__(self, other: VecNumLike) -> 'tensor[Any]': ...
    def __rmul__(self, other: TensorNumLike) -> Any:
        return self * other

@@ -329,15 +342,19 @@ class tensor(ArrayType[TNum]):
    @overload
    def __pow__(self: 'tensor[float]', other: TensorNumLike) -> 'tensor[float]': ...
    @overload
-    def __pow__(self, other: TensorNumLike) -> 'tensor[int] | tensor[float]': ...
+    def __pow__(self, other: TensorNumLike) -> 'tensor[Any]': ...
    def __pow__(self, other: TensorNumLike) -> Any:
        """Element-wise power."""
        return self._binary_op(other, lambda a, b: a ** b, commutative=False)

    @overload
-    def __rpow__(self: 'tensor[float]', other: TensorNumLike) -> 'tensor[float]': ...
+    def __rpow__(self: 'tensor[int]', other: VecFloatLike) -> 'tensor[float]': ...
    @overload
-    def __rpow__(self: 'tensor[int]', other: value[int] | int) -> 'tensor[int]': ...
+    def __rpow__(self: 'tensor[int]', other: VecIntLike) -> 'tensor[int]': ...
+    @overload
+    def __rpow__(self: 'tensor[float]', other: VecNumLike) -> 'tensor[float]': ...
+    @overload
+    def __rpow__(self, other: VecNumLike) -> 'tensor[Any]': ...
    def __rpow__(self, other: TensorNumLike) -> Any:
        return self._binary_op(other, lambda a, b: b ** a, commutative=False, reversed=True)
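
The reflected `__rmul__`/`__rpow__` overloads above declare a result-type rule:
- An `int` tensor combined with an int-like operand stays `int`.
- A `float` on either side gives `float`.
- The unconstrained overload falls back to `tensor[Any]`.

The arithmetic ops in `stencils/generate_stencils.py` use the same promotion: `t_out = t1 if t1 == t2 else 'float'`. Below is a minimal sketch of that rule as a plain function, restricted to `int` and `float`. `result_dtype` is a hypothetical helper and not part of copapy.

```python
from typing import Literal

DType = Literal["int", "float"]


def result_dtype(self_dtype: DType, other_dtype: DType) -> DType:
    """Hypothetical helper mirroring the promotion declared by the overloads.

    int with int stays int; any float operand promotes the result to float.
    """
    # Same shape as `t_out = t1 if t1 == t2 else 'float'` in generate_stencils.py
    return self_dtype if self_dtype == other_dtype else "float"


# e.g. 3 * tensor[int] -> tensor[int], 2.5 * tensor[int] -> tensor[float]
print(result_dtype("int", "int"))      # int
print(result_dtype("int", "float"))    # float
print(result_dtype("float", "int"))    # float
print(result_dtype("float", "float"))  # float
```

Type checkers try overloads in declaration order and use the first match. An `int` argument is accepted where `float` is annotated. So an overload with a float-like operand placed before the int-like one may also capture `int` operands. Whether that happens here depends on how `VecFloatLike` and `VecIntLike` are defined, which this diff does not show.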

--- a/src/copapy/_version.py
+++ b/src/copapy/_version.py
@ -1,2 +0,0 @@
-# generated by setuptools_scm - do not edit
-__version__ = "0.0.0"
--- a/src/copapy/backend.py
+++ b/src/copapy/backend.py
@ -4,10 +4,10 @@ and give access to compiler internals and debugging tools.
 """

 from ._target import add_read_command
-from ._basic_types import Net, Op, Node, CPConstant, Write, stencil_db_from_package
+from ._basic_types import Net, Op, Node, CPConstant, Store, stencil_db_from_package
 from ._compiler import compile_to_dag, \
-    stable_toposort, get_const_nets, get_all_dag_edges, add_read_ops, get_all_dag_edges_between, \
-    add_write_ops, get_dag_stats
+    stable_toposort, get_const_nets, get_all_dag_edges, add_load_ops, get_all_dag_edges_between, \
+    add_store_ops, get_dag_stats

 __all__ = [
    "add_read_command",
@ -15,14 +15,14 @@ __all__ = [
    "Op",
    "Node",
    "CPConstant",
-    "Write",
+    "Store",
    "compile_to_dag",
    "stable_toposort",
    "get_const_nets",
    "get_all_dag_edges",
    "get_all_dag_edges_between",
-    "add_read_ops",
-    "add_write_ops",
+    "add_load_ops",
+    "add_store_ops",
    "stencil_db_from_package",
    "get_dag_stats"
 ]
--- a/stencils/generate_stencils.py
+++ b/stencils/generate_stencils.py
@ -166,8 +166,42 @@ def get_floordiv(op: str, type1: str, type2: str) -> str:
        """
    else:
        return f"""
-        STENCIL void {op}_{type1}_{type2}({type1} arg1, {type2} arg2) {{
-            result_float_{type2}(floorf((float)arg1 / (float)arg2), arg2);
+        STENCIL void {op}_{type1}_{type2}({type1} a, {type2} b) {{
+            result_float_{type2}(floorf((float)a / (float)b), b);
+        }}
+        """
+
+
+@norm_indent
+def get_min(type1: str, type2: str) -> str:
+    if type1 == 'int' and type2 == 'int':
+        return f"""
+        STENCIL void min_{type1}_{type2}({type1} a, {type2} b) {{
+            result_int_{type2}(a < b ? a : b, b);
+        }}
+        """
+    else:
+        return f"""
+        STENCIL void min_{type1}_{type2}({type1} a, {type2} b) {{
+            float _a = (float)a; float _b = (float)b;
+            result_float_{type2}(_a < _b ? _a : _b, b);
+        }}
+        """
+
+
+@norm_indent
+def get_max(type1: str, type2: str) -> str:
+    if type1 == 'int' and type2 == 'int':
+        return f"""
+        STENCIL void max_{type1}_{type2}({type1} a, {type2} b) {{
+            result_int_{type2}(a > b ? a : b, b);
+        }}
+        """
+    else:
+        return f"""
+        STENCIL void max_{type1}_{type2}({type1} a, {type2} b) {{
+            float _a = (float)a; float _b = (float)b;
+            result_float_{type2}(_a > _b ? _a : _b, b);
        }}
        """

@ -187,27 +221,27 @@ def get_result_stubs2(type1: str, type2: str) -> str:


@norm_indent
-def get_read_reg0_code(type1: str, type2: str, type_out: str) -> str:
+def get_load_reg0_code(type1: str, type2: str, type_out: str) -> str:
    return f"""
-    STENCIL void read_{type_out}_reg0_{type1}_{type2}({type1} arg1, {type2} arg2) {{
+    STENCIL void load_{type_out}_reg0_{type1}_{type2}({type1} arg1, {type2} arg2) {{
        result_{type_out}_{type2}(dummy_{type_out}, arg2);
    }}
    """


@norm_indent
-def get_read_reg1_code(type1: str, type2: str, type_out: str) -> str:
+def get_load_reg1_code(type1: str, type2: str, type_out: str) -> str:
    return f"""
-    STENCIL void read_{type_out}_reg1_{type1}_{type2}({type1} arg1, {type2} arg2) {{
+    STENCIL void load_{type_out}_reg1_{type1}_{type2}({type1} arg1, {type2} arg2) {{
        result_{type1}_{type_out}(arg1, dummy_{type_out});
    }}
    """


@norm_indent
-def get_write_code(type1: str, type2: str) -> str:
+def get_store_code(type1: str, type2: str) -> str:
    return f"""
-    STENCIL void write_{type1}_reg0_{type1}_{type2}({type1} arg1, {type2} arg2) {{
+    STENCIL void store_{type1}_reg0_{type1}_{type2}({type1} arg1, {type2} arg2) {{
        dummy_{type1} = arg1;
        result_{type1}_{type2}(arg1, arg2);
    }}
@ -268,10 +302,17 @@ if __name__ == "__main__":
    code += get_math_func1('fabsf', 'float', 'abs')
    code += get_custom_stencil('abs_int(int arg1)', 'result_int(__builtin_abs(arg1));')

+    for t in types:
+        code += get_custom_stencil(f"sign_{t}({t} arg1)", f"result_int((arg1 > 0) - (arg1 < 0));")
+
    fnames = ['atan2', 'pow']
    for fn, t1, t2 in permutate(fnames, types, types):
        code += get_math_func2(fn, t1, t2)

+    for t1, t2 in permutate(types, types):
+        code += get_min(t1, t2)
+        code += get_max(t1, t2)
+
    for op, t1, t2 in permutate(ops, types, types):
        t_out = t1 if t1 == t2 else 'float'
        if op == 'floordiv':
@ -289,11 +330,11 @@ if __name__ == "__main__":
    code += get_op_code('mod', 'int', 'int', 'int')

    for t1, t2, t_out in permutate(types, types, types):
-        code += get_read_reg0_code(t1, t2, t_out)
-        code += get_read_reg1_code(t1, t2, t_out)
+        code += get_load_reg0_code(t1, t2, t_out)
+        code += get_load_reg1_code(t1, t2, t_out)

    for t1, t2 in permutate(types, types):
-        code += get_write_code(t1, t2)
+        code += get_store_code(t1, t2)

    print(f"Write file {args.path}...")
    with open(args.path, 'w') as f:
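
The new `min`, `max` and `sign` stencils use plain C conditional expressions:
- For `int`/`int` inputs, `min` and `max` keep `int`.
- Every other type combination is cast to `float` before comparing.
- Judging from the generated C (`result_int(...)` for every type), `sign` reports an `int` for both input types.

Below is a small pure-Python reference model of that behaviour, in the same spirit as the reference tuple in `tests/test_math.py`. `sign_ref`, `min_ref` and `max_ref` are hypothetical names used only for illustration.

```python
from typing import Union

Number = Union[int, float]


def sign_ref(x: Number) -> int:
    # Mirrors result_int((arg1 > 0) - (arg1 < 0)): the comparisons give -1, 0 or 1.
    return int(x > 0) - int(x < 0)


def min_ref(a: Number, b: Number) -> Number:
    # int/int keeps int (a < b ? a : b); other combinations compare as float.
    if isinstance(a, int) and isinstance(b, int):
        return a if a < b else b
    fa, fb = float(a), float(b)
    return fa if fa < fb else fb


def max_ref(a: Number, b: Number) -> Number:
    # int/int keeps int (a > b ? a : b); other combinations compare as float.
    if isinstance(a, int) and isinstance(b, int):
        return a if a > b else b
    fa, fb = float(a), float(b)
    return fa if fa > fb else fb


print(sign_ref(-2.5), sign_ref(0), sign_ref(7))  # -1 0 1
print(min_ref(3, 5), max_ref(3, 5.0))            # 3 5.0
```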
--- a/tests/test_ast_gen.py
+++ b/tests/test_ast_gen.py
@ -1,5 +1,5 @@
 from copapy import value
-from copapy.backend import Write
+from copapy.backend import Store
 import copapy.backend as cpb


@ -19,16 +19,16 @@ def test_ast_generation():
    #i1 = c1 * 2
    #r1 = i1 + 7
    #r2 = i1 + 9
-    #out = [Write(r1), Write(r2)]
+    #out = [Store(r1), Store(r2)]

    c1 = value(4)
    c2 = value(2)
    #i1 = c1 * 2
    #r1 = i1 + 7 + (c2 + 7 * 9)
    #r2 = i1 + 9
-    #out = [Write(r1), Write(r2)]
+    #out = [Store(r1), Store(r2)]
    r1 = c1 * 5 + 8 + c2 * 3
-    out = [Write(r1)]
+    out = [Store(r1)]

    print(out)
    print('-- get_edges:')
@ -48,12 +48,12 @@ def test_ast_generation():
        print('#', p)

    print('-- add_read_ops:')
-    output_ops = list(cpb.add_read_ops(ordered_ops))
+    output_ops = list(cpb.add_load_ops(ordered_ops))
    for p in output_ops:
        print('#', p)

    print('-- add_write_ops:')
-    extended_output_ops = list(cpb.add_write_ops(output_ops, const_list))
+    extended_output_ops = list(cpb.add_store_ops(output_ops, const_list))
    for p in extended_output_ops:
        print('#', p)
    print('--')
--- a/tests/test_autograd.py
+++ b/tests/test_autograd.py
@ -13,7 +13,7 @@ def test_autograd():
    c += c + 1
    c += 1 + c + (-a)
    d += d * 2 + cp.relu(b + a)
-    d += 3 * d + cp.relu(b - a)
+    d += 3 * d + cp.relu(-a + b)
    e = c - d
    f = e**2
    g = f / 2.0
@ -34,5 +34,26 @@ def test_autograd():
    assert pytest.approx(dg[1], abs=1e-4) == 645.57725  # pyright: ignore[reportUnknownMemberType]


+def test_autograd_extended():
+    a = value(-4.0)
+    b = value(2.0)
+    c = a + b
+    d = a * b + b**3
+    c += c + 1
+    c += 1 + c + (-a)
+    d += d * 2 + cp.relu(b + a)
+    d += 3 * d + cp.relu(b - a)
+    e = c - cp.sin(-d)
+    f = cp.abs(e**2)
+    g = f / 2.0
+    g += 10.0 / f
+
+    dg = grad(g, (a, b))
+
+    tg = cp.Target()
+    tg.compile(g, dg)
+    tg.run()
+
+
 if __name__ == "__main__":
    test_autograd()
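
`test_autograd_extended` compiles and runs the gradient but does not assert any values. One way to obtain reference numbers is to evaluate the same expression on plain Python floats and differentiate it numerically with central differences. The sketch below shows how such a check could look, under these assumptions:
- `g_ref` and `numeric_grad` are hypothetical helpers.
- `relu` is taken to be `max(0.0, x)`.
- The step size `h` is an arbitrary choice.

```python
import math
from typing import Callable, Sequence


def relu(x: float) -> float:
    # Assumed to match cp.relu for scalar inputs.
    return max(0.0, x)


def g_ref(a: float, b: float) -> float:
    # Same sequence of operations as test_autograd_extended, on plain floats.
    c = a + b
    d = a * b + b**3
    c += c + 1
    c += 1 + c + (-a)
    d += d * 2 + relu(b + a)
    d += 3 * d + relu(b - a)
    e = c - math.sin(-d)
    f = abs(e**2)
    g = f / 2.0
    g += 10.0 / f
    return g


def numeric_grad(fn: Callable[..., float], point: Sequence[float],
                 h: float = 1e-6) -> list[float]:
    """Central-difference estimate of each partial derivative of fn at point."""
    grads = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += h
        down[i] -= h
        grads.append((fn(*up) - fn(*down)) / (2 * h))
    return grads


ref = numeric_grad(g_ref, (-4.0, 2.0))
print(ref)
```

These estimates could then be compared against the gradient values read back from the target, using the same `pytest.approx` style as `test_autograd`. At `a = -4.0, b = 2.0`, neither `relu` argument is near its kink, so the central difference is well defined there.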
--- a/tests/test_branching_stencils.py
+++ b/tests/test_branching_stencils.py
@ -1,5 +1,5 @@
 from copapy import value
-from copapy.backend import Write, compile_to_dag, add_read_command
+from copapy.backend import Store, compile_to_dag, add_read_command
 import copapy as cp
 import subprocess
 from copapy import _binwrite
@ -22,7 +22,7 @@ def test_compile():
    # Function with no passing-on-jump as last instruction:
    ret_test = [r for v in test_vals for r in (cp.tan(value(v)),)]

-    out = [Write(r) for r in ret_test]
+    out = [Store(r) for r in ret_test]

    il, variables = compile_to_dag(out, copapy.generic_sdb)

--- a/tests/test_comp_timing.py
+++ b/tests/test_comp_timing.py
@ -1,6 +1,6 @@
 import time
 from copapy import backend
-from copapy.backend import Write, stencil_db_from_package
+from copapy.backend import Store, stencil_db_from_package
 import copapy.backend as cpb
 import copapy as cp
 import copapy._binwrite as binw
@ -13,7 +13,7 @@ def test_timing_compiler():
    #t2 = t1.sum()
    t3 = cp.vector(cp.value(1 / (v + 1)) for v in range(256))
    t5 = ((t3 * t1) * 2).magnitude()
-    out = [Write(t5)]
+    out = [Store(t5)]

    print(out)

@ -45,7 +45,7 @@ def test_timing_compiler():

    print('-- add_read_ops:')
    t0 = time.time()
-    output_ops = list(cpb.add_read_ops(ordered_ops))
+    output_ops = list(cpb.add_load_ops(ordered_ops))
    t1 = time.time()
    #for p in output_ops:
    #    print('#', p)
@ -53,7 +53,7 @@ def test_timing_compiler():

    print('-- add_write_ops:')
    t0 = time.time()
-    extended_output_ops = list(cpb.add_write_ops(output_ops, const_net_list))
+    extended_output_ops = list(cpb.add_store_ops(output_ops, const_net_list))
    t1 = time.time()
    #for p in extended_output_ops:
    #    print('#', p)
--- a/tests/test_compile.py
+++ b/tests/test_compile.py
@ -1,5 +1,5 @@
 from copapy import NumLike
-from copapy.backend import Write, compile_to_dag, add_read_command
+from copapy.backend import Store, compile_to_dag, add_read_command
 import copapy as cp
 import subprocess
 import struct
@ -58,7 +58,7 @@ def test_compile():

    ret = (t2, t4, t5)

-    out = [Write(r) for r in ret]
+    out = [Store(r) for r in ret]

    il, variables = compile_to_dag(out, copapy.generic_sdb)

--- a/tests/test_compile_aarch64.py
+++ b/tests/test_compile_aarch64.py
@ -1,5 +1,5 @@
 from copapy import NumLike
-from copapy.backend import Write, compile_to_dag, add_read_command
+from copapy.backend import Store, compile_to_dag, add_read_command
 import subprocess
 from copapy import _binwrite
 import copapy.backend as backend
@ -52,7 +52,7 @@ def test_compile():

    ret = (t2, t4, t5)

-    out = [Write(r) for r in ret]
+    out = [Store(r) for r in ret]

    sdb = backend.stencil_db_from_package('arm64')
    il, variables = compile_to_dag(out, sdb)
--- a/tests/test_compile_armv7.py
+++ b/tests/test_compile_armv7.py
@ -1,5 +1,5 @@
 from copapy import NumLike
-from copapy.backend import Write, compile_to_dag, add_read_command
+from copapy.backend import Store, compile_to_dag, add_read_command
 import subprocess
 from copapy import _binwrite
 import copapy.backend as backend
@ -52,7 +52,7 @@ def test_compile():

    ret = (t2, t4, t5)

-    out = [Write(r) for r in ret]
+    out = [Store(r) for r in ret]

    sdb = backend.stencil_db_from_package('armv7')
    il, variables = compile_to_dag(out, sdb)
--- a/tests/test_compile_div.py
+++ b/tests/test_compile_div.py
@ -1,5 +1,5 @@
 from copapy import value, NumLike
-from copapy.backend import Write, compile_to_dag, add_read_command, Net
+from copapy.backend import Store, compile_to_dag, add_read_command
 import copapy
 import subprocess
 from copapy import _binwrite
@ -26,7 +26,7 @@ def test_compile():

    ret = function(c1)

-    out = [Write(r) for r in ret]
+    out = [Store(r) for r in ret]

    il, vars = compile_to_dag(out, copapy.generic_sdb)

--- a/tests/test_compile_math.py
+++ b/tests/test_compile_math.py
@ -1,5 +1,5 @@
 from copapy import value
-from copapy.backend import Write, compile_to_dag, add_read_command
+from copapy.backend import Store, compile_to_dag, add_read_command
 import copapy as cp
 import subprocess
 from copapy import _binwrite
@ -21,7 +21,7 @@ def test_compile_sqrt():
    ret = [r for v in test_vals for r in (cp.sqrt(value(v)),)]


-    out = [Write(r) for r in ret]
+    out = [Store(r) for r in ret]

    il, variables = compile_to_dag(out, copapy.generic_sdb)

@ -55,7 +55,7 @@ def test_compile_log():
    ret = [r for v in test_vals for r in (cp.log(value(v)),)]


-    out = [Write(r) for r in ret]
+    out = [Store(r) for r in ret]

    il, variables = compile_to_dag(out, copapy.generic_sdb)

@ -89,7 +89,7 @@ def test_compile_sin():
    ret = [r for v in test_vals for r in (cp.sin(value(v)),)]


-    out = [Write(r) for r in ret]
+    out = [Store(r) for r in ret]

    il, variables = compile_to_dag(out, copapy.generic_sdb)

--- a/tests/test_dag_optimization.py
+++ b/tests/test_dag_optimization.py
@ -1,12 +1,12 @@
 import copapy as cp
 from copapy import value
-from copapy.backend import get_dag_stats, Write
+from copapy.backend import get_dag_stats, Store
 import copapy.backend as cpb
 from typing import Any


 def show_dag(val: value[Any]):
-    out = [Write(val.net)]
+    out = [Store(val.net)]

    print(out)
    print('-- get_edges:')
@ -26,12 +26,12 @@ def show_dag(val: value[Any]):
        print('#', p)

    print('-- add_read_ops:')
-    output_ops = list(cpb.add_read_ops(ordered_ops))
+    output_ops = list(cpb.add_load_ops(ordered_ops))
    for p in output_ops:
        print('#', p)

    print('-- add_write_ops:')
-    extended_output_ops = list(cpb.add_write_ops(output_ops, const_list))
+    extended_output_ops = list(cpb.add_store_ops(output_ops, const_list))
    for p in extended_output_ops:
        print('#', p)
    print('--')
--- a/tests/test_math.py
+++ b/tests/test_math.py
@ -20,7 +20,11 @@ def test_fine():
                cp.cos(c_f),
                cp.tan(c_f),
                cp.abs(-c_i),
-                cp.abs(-c_f))
+                cp.abs(-c_f),
+                cp.sign(c_i),
+                cp.sign(-c_f),
+                cp.min(c_i, 5),
+                cp.max(c_f, 5))

    re2_test = (a_f ** 2,
                a_i ** -1,
@ -32,7 +36,11 @@ def test_fine():
                cp.cos(a_f),
                cp.tan(a_f),
                cp.abs(-a_i),
-                cp.abs(-a_f))
+                cp.abs(-a_f),
+                cp.sign(a_i),
+                cp.sign(-a_f),
+                cp.min(a_i, 5),
+                cp.max(a_f, 5))

    ret_refe = (a_f ** 2,
                a_i ** -1,
@ -43,8 +51,12 @@ def test_fine():
                ma.sin(a_f),
                ma.cos(a_f),
                ma.tan(a_f),
-                cp.abs(-a_i),
-                cp.abs(-a_f))
+                abs(-a_i),
+                abs(-a_f),
+                (a_i > 0) - (a_i < 0),
+                (-a_f > 0) - (-a_f < 0),
+                min(a_i, 5),
+                max(a_f, 5))

    tg = Target()
    print('* compile and copy ...')
@ -53,10 +65,10 @@ def test_fine():
    tg.run()
    print('* finished')

-    for test, val2, ref, name in zip(ret_test, re2_test, ret_refe, ('^2', '**-1', 'sqrt_int', 'sqrt_float', 'sin', 'cos', 'tan')):
+    for test, val2, ref, name in zip(ret_test, re2_test, ret_refe, ['^2', '**-1', 'sqrt_int', 'sqrt_float', 'sin', 'cos', 'tan'] + ['other']*10):
        assert isinstance(test, cp.value)
        val = tg.read_value(test)
-        print('+', val, ref, type(val), test.dtype)
+        print('+', name, val, ref, type(val), test.dtype)
        #for t in (int, float, bool):
        #    assert isinstance(val, t) == isinstance(ref, t), f"Result type does not match for {val} and {ref}"
        assert val == pytest.approx(ref, abs=1e-3), f"Result for {name} does not match: {val} and reference: {ref}"  # pyright: ignore[reportUnknownMemberType]
--- a/tests/test_ops_aarch64.py
+++ b/tests/test_ops_aarch64.py
@ -1,5 +1,5 @@
 from copapy import NumLike, iif, value
-from copapy.backend import Write, compile_to_dag, add_read_command
+from copapy.backend import Store, compile_to_dag, add_read_command
 import subprocess
 from copapy import _binwrite
 import copapy.backend as backend
@ -91,7 +91,7 @@ def test_compile():
    ret_test = function1(c_i) + function1(c_f) + function2(c_i) + function2(c_f) + function3(c_i) + function4(c_i) + function5(c_b) + [value(9) % 2] + iiftests(c_i) + iiftests(c_f) + [cp.asin(c_i/10)]
    ret_ref = function1(9) + function1(1.111) + function2(9) + function2(1.111) + function3(9) + function4(9) + function5(True) + [9 % 2] + iiftests(9) + iiftests(1.111) + [cp.asin(9/10)]

-    out = [Write(r) for r in ret_test]
+    out = [Store(r) for r in ret_test]

    #ret_test += [c_i, v2]
    #ret_ref += [9, 4.44, -4.44]
--- a/tests/test_ops_armv6.py
+++ b/tests/test_ops_armv6.py
@ -1,5 +1,5 @@
 from copapy import NumLike, iif, value
-from copapy.backend import Write, compile_to_dag, add_read_command
+from copapy.backend import Store, compile_to_dag, add_read_command
 import subprocess
 from copapy import _binwrite
 import copapy.backend as backend
@ -96,7 +96,7 @@ def test_compile():
    #ret_test = (c_i * 100 // 5, c_f * 10 // 5)
    #ret_ref = (9 * 100 // 5, 1.111 * 10 // 5)

-    out = [Write(r) for r in ret_test]
+    out = [Store(r) for r in ret_test]

    sdb = backend.stencil_db_from_package('armv6')
    dw, variables = compile_to_dag(out, sdb)
--- a/tests/test_ops_armv7.py
+++ b/tests/test_ops_armv7.py
@ -1,5 +1,5 @@
 from copapy import NumLike, iif, value
-from copapy.backend import Write, compile_to_dag, add_read_command
+from copapy.backend import Store, compile_to_dag, add_read_command
 import subprocess
 from copapy import _binwrite
 import copapy.backend as backend
@ -96,7 +96,7 @@ def test_compile():
    #ret_test = (c_i * 100 // 5, c_f * 10 // 5)
    #ret_ref = (9 * 100 // 5, 1.111 * 10 // 5)

-    out = [Write(r) for r in ret_test]
+    out = [Store(r) for r in ret_test]

    sdb = backend.stencil_db_from_package('armv7')
    dw, variables = compile_to_dag(out, sdb)
--- a/tests/test_ops_x86.py
+++ b/tests/test_ops_x86.py
@ -1,5 +1,5 @@
 from copapy import NumLike, iif, value
-from copapy.backend import Write, compile_to_dag, add_read_command
+from copapy.backend import Store, compile_to_dag, add_read_command
 import subprocess
 from copapy import _binwrite
 import copapy.backend as backend
@ -104,7 +104,7 @@ def test_compile():
    #ret_test = [cp.get_42(c_i)]
    #ret_ref = [cp.get_42(9)]

-    out = [Write(r) for r in ret_test]
+    out = [Store(r) for r in ret_test]

    #ret_test += [c_i, v2]
    #ret_ref += [9, 4.44, -4.44]
@ -185,7 +185,7 @@ def test_vector_compile():

    ret = (t2, t4, t5)

-    out = [Write(r) for r in ret]
+    out = [Store(r) for r in ret]

    sdb = backend.stencil_db_from_package('x86')
    il, variables = compile_to_dag(out, sdb)
@ -243,7 +243,7 @@ def test_sinus():
    ret_test = [si, e]
    ret_ref = [cp.sin(a_val), (a_val + 0.87 * 2.0) ** 2 + cp.sin(a_val) + cp.sqrt(0.87)]

-    out = [Write(r) for r in ret_test]
+    out = [Store(r) for r in ret_test]

    sdb = backend.stencil_db_from_package('x86')
    dw, variables = compile_to_dag(out, sdb)
--- a/tools/build.bat
+++ b/tools/build.bat
@ -1,68 +1,156 @@
+@echo off
+setlocal ENABLEDELAYEDEXPANSION
+
+set ARCH=%1
+if "%ARCH%"=="" set ARCH=x86_64
+
+if not "%ARCH%"=="x86_64" ^
+if not "%ARCH%"=="x86" ^
+if not "%ARCH%"=="arm64" ^
+if not "%ARCH%"=="arm-v6" ^
+if not "%ARCH%"=="arm-v7" ^
+if not "%ARCH%"=="all" (
+    echo Usage: %0 [x86_64^|x86^|arm64^|arm-v6^|arm-v7^|all]
+    exit /b 1
+)
+
 mkdir build\stencils
 mkdir build\runner
-python stencils/generate_stencils.py build/stencils/stencils.c

+python stencils/generate_stencils.py build\stencils\stencils.c
+
+REM ============================================================
+REM x86_64
+REM ============================================================
+if "%ARCH%"=="x86_64" goto BUILD_X86_64
+if "%ARCH%"=="all"    goto BUILD_X86_64
+goto SKIP_X86_64
+
+:BUILD_X86_64
 echo -------------x86_64 - 64 bit-----------------
+
 call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x64
+
 echo - Compile stencil test...
 cl /Zi /Od stencils\test.c /Fe:build\stencils\test.exe

 echo - Build runner for Windows 64 bit...
-cl /Zi /Od /DENABLE_BASIC_LOGGING src\coparun\runmem.c src\coparun\coparun.c src\coparun\mem_man.c /Fe:build\runner\coparun.exe
+cl /Zi /Od /DENABLE_BASIC_LOGGING ^
+    src\coparun\runmem.c ^
+    src\coparun\coparun.c ^
+    src\coparun\mem_man.c ^
+    /Fe:build\runner\coparun.exe

-REM Optimized:
-REM cl /O2 src\coparun\runmem.c src\coparun\coparun.c src\coparun\mem_man.c /Fe:build\runner\coparun.exe
-
-echo - Build stencils for 64 bit...
-REM ../copapy/tools/cross_compiler_unix/packobjs.sh gcc ld ../copapy/build/musl/musl_objects_x86_64.o
+echo - Build stencils for x86_64...
 wsl gcc -fno-pic -ffunction-sections -c build/stencils/stencils.c -O3 -o build/stencils/stencils.o
 wsl ld -r build/stencils/stencils.o build/musl/musl_objects_x86_64.o -o src/copapy/obj/stencils_x86_64_O3.o
 wsl objdump -d -x src/copapy/obj/stencils_x86_64_O3.o > build/stencils/stencils_x86_64_O3.asm

-echo ---------------x86 - 32 bit---------------
+:SKIP_X86_64
+
+REM ============================================================
+REM x86 32-bit
+REM ============================================================
+if "%ARCH%"=="x86" goto BUILD_X86
+if "%ARCH%"=="all" goto BUILD_X86
+goto SKIP_X86
+
+:BUILD_X86
+echo ---------------x86 - 32 bit----------------
+
 call "C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\VC\Auxiliary\Build\vcvarsall.bat" x86

 echo - Build runner for Windows 32 bit...
-cl /Zi /Od /DENABLE_LOGGING src\coparun\runmem.c src\coparun\coparun.c src\coparun\mem_man.c /Fe:build\runner\coparun-x86.exe
+cl /Zi /Od /DENABLE_LOGGING ^
+    src\coparun\runmem.c ^
+    src\coparun\coparun.c ^
+    src\coparun\mem_man.c ^
+    /Fe:build\runner\coparun-x86.exe

-echo - Build runner for linux x86 32 bit...
-wsl i686-linux-gnu-gcc-12 -static -Wall -Wextra -Wconversion -Wsign-conversion -Wshadow -Wstrict-overflow -O3 -DENABLE_LOGGING src/coparun/runmem.c src/coparun/coparun.c src/coparun/mem_man.c -o build/runner/coparun-x86
+echo - Build runner for Linux x86 32 bit...
+wsl i686-linux-gnu-gcc-12 -static -O3 -DENABLE_LOGGING ^
+    src/coparun/runmem.c ^
+    src/coparun/coparun.c ^
+    src/coparun/mem_man.c ^
+    -o build/runner/coparun-x86

 echo - Build stencils x86 32 bit...
-REM  sh ../copapy/tools/cross_compiler_unix/packobjs.sh i686-linux-gnu-gcc-12 i686-linux-gnu-ld ../copapy/build/musl/musl_objects_x86.o  -fno-pic
 wsl i686-linux-gnu-gcc-12 -fno-pic -ffunction-sections -c build/stencils/stencils.c -O3 -o build/stencils/stencils.o
 wsl i686-linux-gnu-ld -r build/stencils/stencils.o build/musl/musl_objects_x86.o -o src/copapy/obj/stencils_x86_O3.o
 wsl i686-linux-gnu-objdump -d -x src/copapy/obj/stencils_x86_O3.o > build/stencils/stencils_x86_O3.asm

+:SKIP_X86
+
+REM ============================================================
+REM ARM64
+REM ============================================================
+if "%ARCH%"=="arm64" goto BUILD_ARM64
+if "%ARCH%"=="all"   goto BUILD_ARM64
+goto SKIP_ARM64
+
+:BUILD_ARM64
+echo --------------arm64 64 bit----------------

-echo --------------arm64  64 bit----------------
 wsl aarch64-linux-gnu-gcc-12 -fno-pic -ffunction-sections -c build/stencils/stencils.c -O3 -o build/stencils/stencils.o
 wsl aarch64-linux-gnu-ld -r build/stencils/stencils.o build/musl/musl_objects_arm64.o -o src/copapy/obj/stencils_arm64_O3.o
 wsl aarch64-linux-gnu-objdump -d -x src/copapy/obj/stencils_arm64_O3.o > build/stencils/stencils_arm64_O3.asm
-echo ------------------------------
-echo - Build runner for Aarch64...
-wsl aarch64-linux-gnu-gcc-12 -static -Wall -Wextra -Wconversion -Wsign-conversion -Wshadow -Wstrict-overflow -O3 -DENABLE_LOGGING src/coparun/runmem.c src/coparun/coparun.c src/coparun/mem_man.c -o build/runner/coparun-aarch64

+echo - Build runner for AArch64...
+wsl aarch64-linux-gnu-gcc-12 -static -O3 -DENABLE_LOGGING ^
+    src/coparun/runmem.c ^
+    src/coparun/coparun.c ^
+    src/coparun/mem_man.c ^
+    -o build/runner/coparun-aarch64
+
+:SKIP_ARM64
+
+REM ============================================================
+REM ARM v6
+REM ============================================================
+if "%ARCH%"=="arm-v6" goto BUILD_ARMV6
+if "%ARCH%"=="all"    goto BUILD_ARMV6
+goto SKIP_ARMV6
+
+:BUILD_ARMV6
+echo --------------arm-v6 32 bit----------------
+
+wsl arm-none-eabi-gcc -fno-pic -ffunction-sections -march=armv6 -mfpu=vfp -mfloat-abi=hard -marm ^
+    -c build/stencils/stencils.c -O3 -o build/stencils/stencils.o
+
+wsl arm-none-eabi-ld -r build/stencils/stencils.o build/musl/musl_objects_armv6.o ^
+    $(arm-none-eabi-gcc -print-libgcc-file-name) ^
+    -o src/copapy/obj/stencils_armv6_O3.o

-echo --------------arm-v6  32 bit----------------
-REM  sh ../copapy/tools/cross_compiler_unix/packobjs.sh arm-none-eabi-gcc arm-none-eabi-ld ../copapy/build/musl/musl_objects_armv6.o "-march=armv6 -mfpu=vfp -marm"
-wsl arm-none-eabi-gcc -fno-pic -ffunction-sections -march=armv6 -mfpu=vfp -mfloat-abi=hard -marm -c build/stencils/stencils.c -O3 -o build/stencils/stencils.o
-wsl arm-none-eabi-ld -r build/stencils/stencils.o build/musl/musl_objects_armv6.o $(arm-none-eabi-gcc -print-libgcc-file-name) -o src/copapy/obj/stencils_armv6_O3.o
 wsl arm-none-eabi-objdump -d -x src/copapy/obj/stencils_armv6_O3.o > build/stencils/stencils_armv6_O3.asm
-echo ------------------------------
-REM echo - Build runner
-REM wsl arm-linux-gnueabihf-gcc -march=armv6 -mfpu=vfp -marm -static -Wall -Wextra -Wconversion -Wsign-conversion -Wshadow -Wstrict-overflow -O3 -DENABLE_LOGGING src/coparun/runmem.c src/coparun/coparun.c src/coparun/mem_man.c -o build/runner/coparun-armv6

+:SKIP_ARMV6

+REM ============================================================
+REM ARM v7
+REM ============================================================
+if "%ARCH%"=="arm-v7" goto BUILD_ARMV7
+if "%ARCH%"=="all"    goto BUILD_ARMV7
+goto END
+
+:BUILD_ARMV7
+echo --------------arm-v7 32 bit----------------
+
+wsl arm-none-eabi-gcc -fno-pic -ffunction-sections -march=armv7-a -mfpu=neon-vfpv3 -mfloat-abi=hard -marm ^
+    -c build/stencils/stencils.c -O3 -o build/stencils/stencils.o
+
+wsl arm-none-eabi-ld -r build/stencils/stencils.o build/musl/musl_objects_armv7.o ^
+    $(arm-none-eabi-gcc -print-libgcc-file-name) ^
+    -o src/copapy/obj/stencils_armv7_O3.o

-echo --------------arm-v7  32 bit----------------
-REM  sh ../copapy/tools/cross_compiler_unix/packobjs.sh arm-none-eabi-gcc arm-none-eabi-ld ../copapy/build/musl/musl_objects_armv7.o "-march=armv7-a -mfpu=neon-vfpv3 -marm"
-wsl arm-none-eabi-gcc -fno-pic -ffunction-sections -march=armv7-a -mfpu=neon-vfpv3 -mfloat-abi=hard -marm -c build/stencils/stencils.c -O3 -o build/stencils/stencils.o
-wsl arm-none-eabi-ld -r build/stencils/stencils.o build/musl/musl_objects_armv7.o $(arm-none-eabi-gcc -print-libgcc-file-name) -o src/copapy/obj/stencils_armv7_O3.o
 wsl arm-none-eabi-objdump -d -x src/copapy/obj/stencils_armv7_O3.o > build/stencils/stencils_armv7_O3.asm

+echo - Build runner for ARM v7...
+wsl arm-linux-gnueabihf-gcc -static -O3 -DENABLE_LOGGING ^
+    src/coparun/runmem.c ^
+    src/coparun/coparun.c ^
+    src/coparun/mem_man.c ^
+    -o build/runner/coparun-armv7

-echo ------------------------------
-echo - Build runner
-wsl arm-linux-gnueabihf-gcc -march=armv7-a -mfpu=neon-vfpv3 -marm -static -Wall -Wextra -Wconversion -Wsign-conversion -Wshadow -Wstrict-overflow -O3 -DENABLE_LOGGING src/coparun/runmem.c src/coparun/coparun.c src/coparun/mem_man.c -o build/runner/coparun-armv7
-
+:END
+echo Build completed for %ARCH%
+endlocal
--- a/tools/build.sh
+++ b/tools/build.sh
@ -1,6 +1,16 @@
 #!/bin/bash
-set -e
-set -v
+set -eux
+
+ARCH=${1:-x86_64}
+
+case "$ARCH" in
+    (x86_64|arm-v6|arm-v7|all)
+        ;;
+    (*)
+        echo "Usage: $0 [x86_64|arm-v6|arm-v7|all]"
+        exit 1
+        ;;
+esac

 mkdir -p build/stencils
 mkdir -p build/runner
@ -10,34 +20,90 @@ DEST=src/copapy/obj
 python3 stencils/generate_stencils.py $SRC
 mkdir -p $DEST

-gcc -fno-pic -ffunction-sections -c $SRC -O3 -o build/stencils/stencils.o
-ld -r build/stencils/stencils.o build/musl/musl_objects_x86_64.o -o $DEST/stencils_x86_64_O3.o
-objdump -d -x $DEST/stencils_x86_64_O3.o > build/stencils/stencils_x86_64_O3.asm
+#######################################
+# x86_64
+#######################################
+if [[ "$ARCH" == "x86_64" || "$ARCH" == "all" ]]; then
+    echo "--------------x86_64----------------"

-mkdir bin -p
-gcc -Wall -Wextra -Wconversion -Wsign-conversion \
-    -Wshadow -Wstrict-overflow -Werror -g -O3 \
-    -DENABLE_LOGGING \
-    src/coparun/runmem.c src/coparun/coparun.c src/coparun/mem_man.c -o build/runner/coparun
+    gcc -fno-pic -ffunction-sections -c $SRC -O3 -o build/stencils/stencils.o
+    ld -r build/stencils/stencils.o build/musl/musl_objects_x86_64.o \
+        -o $DEST/stencils_x86_64_O3.o
+    objdump -d -x $DEST/stencils_x86_64_O3.o \
+        > build/stencils/stencils_x86_64_O3.asm

+    mkdir -p bin
+    gcc -Wall -Wextra -Wconversion -Wsign-conversion \
+        -Wshadow -Wstrict-overflow -Werror -g -O3 \
+        -DENABLE_LOGGING \
+        src/coparun/runmem.c \
+        src/coparun/coparun.c \
+        src/coparun/mem_man.c \
+        -o build/runner/coparun
+fi

-echo "--------------arm-v6  32 bit----------------"
-LIBGCC=$(arm-none-eabi-gcc -print-libgcc-file-name)
-#LIBM=$(arm-none-eabi-gcc -print-file-name=libm.a)
-#LIBC=$(arm-none-eabi-gcc -print-file-name=libc.a)
+#######################################
+# ARM v6
+#######################################
+if [[ "$ARCH" == "arm-v6" || "$ARCH" == "all" ]]; then
+    echo "--------------arm-v6 32 bit----------------"

-arm-none-eabi-gcc -fno-pic -ffunction-sections -march=armv6 -mfpu=vfp -mfloat-abi=hard -marm -c $SRC -O3 -o build/stencils/stencils.o
-arm-none-eabi-ld -r build/stencils/stencils.o build/musl/musl_objects_armv6.o $LIBGCC -o $DEST/stencils_armv6_O3.o
-arm-none-eabi-objdump -d -x $DEST/stencils_armv6_O3.o > build/stencils/stencils_armv6_O3.asm
-arm-linux-gnueabihf-gcc -march=armv6 -mfpu=vfp -mfloat-abi=hard -marm -static -Wall -Wextra -Wconversion -Wsign-conversion -Wshadow -Wstrict-overflow -O3 -DENABLE_LOGGING src/coparun/runmem.c src/coparun/coparun.c src/coparun/mem_man.c -o build/runner/coparun-armv6
+    LIBGCC=$(arm-none-eabi-gcc -print-libgcc-file-name)

+    arm-none-eabi-gcc -fno-pic -ffunction-sections \
+        -march=armv6 -mfpu=vfp -mfloat-abi=hard -marm \
+        -c $SRC -O3 -o build/stencils/stencils.o

-echo "--------------arm-v7  32 bit----------------"
-LIBGCC=$(arm-none-eabi-gcc -print-libgcc-file-name)
-#LIBM=$(arm-none-eabi-gcc -print-file-name=libm.a)
-#LIBC=$(arm-none-eabi-gcc -print-file-name=libc.a)
+    arm-none-eabi-ld -r \
+        build/stencils/stencils.o \
+        build/musl/musl_objects_armv6.o \
+        $LIBGCC \
+        -o $DEST/stencils_armv6_O3.o

-arm-none-eabi-gcc -fno-pic -ffunction-sections -march=armv7-a -mfpu=neon-vfpv3 -mfloat-abi=hard -marm -c $SRC -O3 -o build/stencils/stencils.o
-arm-none-eabi-ld -r build/stencils/stencils.o build/musl/musl_objects_armv7.o $LIBGCC -o $DEST/stencils_armv7_O3.o
-arm-none-eabi-objdump -d -x $DEST/stencils_armv7_O3.o > build/stencils/stencils_armv7_O3.asm
-arm-linux-gnueabihf-gcc -march=armv7-a -mfpu=neon-vfpv3 -mfloat-abi=hard -marm -static -Wall -Wextra -Wconversion -Wsign-conversion -Wshadow -Wstrict-overflow -O3 -DENABLE_LOGGING src/coparun/runmem.c src/coparun/coparun.c src/coparun/mem_man.c -o build/runner/coparun-armv7
+    arm-none-eabi-objdump -d -x \
+        $DEST/stencils_armv6_O3.o \
+        > build/stencils/stencils_armv6_O3.asm
+
+    arm-linux-gnueabihf-gcc \
+        -march=armv6 -mfpu=vfp -mfloat-abi=hard -marm -static \
+        -Wall -Wextra -Wconversion -Wsign-conversion \
+        -Wshadow -Wstrict-overflow -O3 \
+        -DENABLE_LOGGING \
+        src/coparun/runmem.c \
+        src/coparun/coparun.c \
+        src/coparun/mem_man.c \
+        -o build/runner/coparun-armv6
+fi
+
+#######################################
+# ARM v7
+#######################################
+if [[ "$ARCH" == "arm-v7" || "$ARCH" == "all" ]]; then
+    echo "--------------arm-v7 32 bit----------------"
+
+    LIBGCC=$(arm-none-eabi-gcc -print-libgcc-file-name)
+
+    arm-none-eabi-gcc -fno-pic -ffunction-sections \
+        -march=armv7-a -mfpu=neon-vfpv3 -mfloat-abi=hard -marm \
+        -c $SRC -O3 -o build/stencils/stencils.o
+
+    arm-none-eabi-ld -r \
+        build/stencils/stencils.o \
+        build/musl/musl_objects_armv7.o \
+        $LIBGCC \
+        -o $DEST/stencils_armv7_O3.o
+
+    arm-none-eabi-objdump -d -x \
+        $DEST/stencils_armv7_O3.o \
+        > build/stencils/stencils_armv7_O3.asm
+
+    arm-linux-gnueabihf-gcc \
+        -march=armv7-a -mfpu=neon-vfpv3 -mfloat-abi=hard -marm -static \
+        -Wall -Wextra -Wconversion -Wsign-conversion \
+        -Wshadow -Wstrict-overflow -O3 \
+        -DENABLE_LOGGING \
+        src/coparun/runmem.c \
+        src/coparun/coparun.c \
+        src/coparun/mem_man.c \
+        -o build/runner/coparun-armv7
+fi
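The ARM v6 and ARM v7 blocks in this diff are identical except for the `-march`/`-mfpu` values and the `armv6`/`armv7` suffix in file names. A possible follow-up is to move the shared steps into one function. The following is a sketch under assumptions, not the project's script:

- `want_arch`, `run` and `build_arm_target` are hypothetical names.
- `ARCH`, `SRC` and `DEST` are assumed to be set by the surrounding script, as in the original blocks.
- The `DRY_RUN` switch is an addition. It only prints the commands, so the sketch can be checked without the cross toolchains installed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Sketch only: want_arch, run and build_arm_target are hypothetical helpers.
# ARCH, SRC and DEST are assumed to be set by the surrounding build script.

# Succeeds when the requested ARCH is this target or "all".
want_arch() {
    [[ "$ARCH" == "$1" || "$ARCH" == "all" ]]
}

# Executes a command, or only prints it when DRY_RUN is non-empty.
run() {
    if [[ -n "${DRY_RUN:-}" ]]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Builds the stencil object and the runner for one 32-bit ARM variant.
#   $1  suffix used in file names (armv6, armv7)
#   $2  value for -march
#   $3  value for -mfpu
build_arm_target() {
    local name="$1" march="$2" mfpu="$3"
    # Target flags shared by the stencil compiler and the runner compiler.
    local flags=(-march="$march" -mfpu="$mfpu" -mfloat-abi=hard -marm)
    local libgcc

    echo "-------------- ${name} 32 bit ----------------"

    if [[ -n "${DRY_RUN:-}" ]]; then
        libgcc='$LIBGCC'   # placeholder so dry runs need no toolchain
    else
        libgcc=$(arm-none-eabi-gcc -print-libgcc-file-name)
    fi

    run arm-none-eabi-gcc -fno-pic -ffunction-sections "${flags[@]}" \
        -c "$SRC" -O3 -o build/stencils/stencils.o

    # Relocatable link: stencils + musl objects + libgcc into one object.
    run arm-none-eabi-ld -r build/stencils/stencils.o \
        "build/musl/musl_objects_${name}.o" "$libgcc" \
        -o "$DEST/stencils_${name}_O3.o"

    # objdump writes through a redirect, so it is handled here instead of run().
    if [[ -n "${DRY_RUN:-}" ]]; then
        echo "arm-none-eabi-objdump -d -x $DEST/stencils_${name}_O3.o > build/stencils/stencils_${name}_O3.asm"
    else
        arm-none-eabi-objdump -d -x "$DEST/stencils_${name}_O3.o" \
            > "build/stencils/stencils_${name}_O3.asm"
    fi

    run arm-linux-gnueabihf-gcc "${flags[@]}" -static \
        -Wall -Wextra -Wconversion -Wsign-conversion \
        -Wshadow -Wstrict-overflow -O3 -DENABLE_LOGGING \
        src/coparun/runmem.c src/coparun/coparun.c src/coparun/mem_man.c \
        -o "build/runner/coparun-${name}"
}

# Usage, mirroring the two original blocks:
#   if want_arch arm-v6; then build_arm_target armv6 armv6 vfp; fi
#   if want_arch arm-v7; then build_arm_target armv7 armv7-a neon-vfpv3; fi
```

Passing the target flags as a bash array keeps each option a separate word. The stencil build and the runner build therefore cannot drift apart when another ARM variant is added.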
--- a/tools/make_example.py
+++ b/tools/make_example.py
@ -1,5 +1,5 @@
 from copapy import value
-from copapy.backend import Write, compile_to_dag, stencil_db_from_package
+from copapy.backend import Store, compile_to_dag, stencil_db_from_package
 from copapy._binwrite import Command

 input = value(9.0)
@ -8,7 +8,7 @@ result = input ** 2 / 3.3 + 5

 arch = 'native'
 sdb = stencil_db_from_package(arch)
-dw, _ = compile_to_dag([Write(result)], sdb)
+dw, _ = compile_to_dag([Store(result)], sdb)

 # Instruct runner to dump patched code to a file:
 dw.write_com(Command.DUMP_CODE)
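In `tools/make_example.py`, only the import and the call site change. This follows commit 884fc3affd, which renamed `Write` to `Store` and `Read` to `Load`. For other scripts that still use the old names, the migration could be sketched as below.

Some assumptions and limits apply:

- `migrate_store_load` is a hypothetical helper, not part of copapy.
- It assumes GNU sed, for `-i` without a suffix and for `\b` word boundaries.
- It rewrites every standalone `Write`/`Read` word, including ones in comments or strings. Review the result with `git diff`.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of copapy): rewrites the old backend names
# Write -> Store and Read -> Load in the given files, in place.
# Requires GNU sed: \b matches word boundaries, so e.g. "Readme" is untouched.
migrate_store_load() {
    local f
    for f in "$@"; do
        sed -i -E 's/\bWrite\b/Store/g; s/\bRead\b/Load/g' "$f"
    done
}
```

For example, `migrate_store_load tools/make_example.py` would produce the two `+` lines shown in this diff.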
Author	SHA1	Message	Date
Nicolas Kruse	8d77ee3a25	Merge pull request #27 from Nonannet/dev Dev	2026-01-05 14:13:04 +01:00
Nicolas	0f5bb86bd4	Added tensor support and type hints for math functions	2026-01-05 13:39:53 +01:00
Nicolas	32aad5cafd	min, max and sign stencil added	2026-01-05 13:39:01 +01:00
Nicolas Kruse	c88a409b6a	Merge pull request #26 from Nonannet/dev Docs: Tensor/Matrix functions added again	2026-01-05 10:58:57 +01:00
Nicolas	d71922769f	Docs: Tensor/Matrix functions added again	2026-01-05 10:49:47 +01:00
Nicolas Kruse	3e728f8d7c	Merge pull request #25 from Nonannet/dev Dev	2026-01-02 15:24:21 +01:00
Nicolas Kruse	81e93892d6	updated setuptools_scm version_scheme	2026-01-02 15:17:19 +01:00
Nicolas Kruse	515eba4a4a	Readme updated	2026-01-02 14:54:55 +01:00
Nicolas Kruse	72adcfc874	Merge pull request #24 from Nonannet/dev CI/CD pipeline fixed and updated	2026-01-02 12:07:39 +01:00
Nicolas Kruse	6ba18358d1	CI pipeline updated	2026-01-02 11:58:14 +01:00
Nicolas Kruse	af56c84a05	Merge pull request #23 from Nonannet/dev ci.yml fixed for release-stencils	2026-01-02 11:16:41 +01:00
Nicolas Kruse	92cd0425de	ci.yml fixed for release-stencils	2026-01-02 11:10:09 +01:00
Nicolas Kruse	f9280b5cd8	Merge pull request #22 from Nonannet/dev Dev	2026-01-01 16:03:51 +01:00
Nicolas	2f5b5156c5	docs updated	2026-01-01 15:35:57 +01:00
Nicolas	884fc3affd	Renamed classes and ops from Write to Store and Read to Load	2026-01-01 15:34:56 +01:00
Nicolas	df5b4c19f1	manual stencil build scripts updated with cl arguments and x86_64 default	2026-01-01 15:20:09 +01:00
Nicolas	43465a690c	CI/CD: fixed build_wheels.yaml	2026-01-01 14:58:54 +01:00
Nicolas	2287a181da	neg() and abs() stencil added in copapy	2026-01-01 14:57:47 +01:00