Compare commits

...

4 Commits

5 changed files with 94 additions and 15 deletions

View File

@ -299,7 +299,7 @@ jobs:
build-windows: build-windows:
needs: [build_stencils] needs: [build_stencils]
runs-on: windows-latest runs-on: windows-2022
strategy: strategy:
matrix: matrix:

View File

@ -39,7 +39,7 @@ Despite missing SIMD-optimization, benchmark performance shows promising numbers
For the benchmark (`tests/benchmark.py`) the timing of 30000 iterations for calculating the therm `sum((v1 + i) @ v2 for i in range(10))` where measured on an Ryzen 5 3400G. Where the vectors `v1` and `v2` both have a lengths of `v_size` which was varied according to the chart from 10 to 500. For the NumPy case the "i in range(10)" loop was vectorized like this: `np.sum((v1 + i) @ v2)` with i being here a `NDArray` with a dimension of `[10, 1]`. The number of calculated scalar operations is the same for both contenders. Obviously Copapy profits from less overheat by calling a single function from python per iteration, where the NumPy variant requires 3. Interestingly there is no indication visible in the chart that for increasing `v_size` the calling overhead for NumPy will be compensated by using faster SIMD instructions. It is to note that in this benchmark the Copapy case does not move any data between python and the compiled code. For the benchmark (`tests/benchmark.py`) the timing of 30000 iterations for calculating the therm `sum((v1 + i) @ v2 for i in range(10))` where measured on an Ryzen 5 3400G. Where the vectors `v1` and `v2` both have a lengths of `v_size` which was varied according to the chart from 10 to 500. For the NumPy case the "i in range(10)" loop was vectorized like this: `np.sum((v1 + i) @ v2)` with i being here a `NDArray` with a dimension of `[10, 1]`. The number of calculated scalar operations is the same for both contenders. Obviously Copapy profits from less overheat by calling a single function from python per iteration, where the NumPy variant requires 3. Interestingly there is no indication visible in the chart that for increasing `v_size` the calling overhead for NumPy will be compensated by using faster SIMD instructions. It is to note that in this benchmark the Copapy case does not move any data between python and the compiled code.
Furthermore for many applications copypy will benefit by reducing the actual number of operations significantly compared to a NumPy implementation, by precompute constant values know at compile time and benefiting from sparcity. Multiplying by zero (e.g. in a diagonal matrix) eliminate a hole branch in the computation graph. Operations without effect, like multiplications by 1 oder additions with zero gets eliminated at compile time. Furthermore for many applications Copapy performance will benefit by reducing the actual number of operations significantly compared to a NumPy implementation, by precompute constant values know at compile time and benefiting from sparcity. Multiplying by zero (e.g. in a diagonal matrix) eliminate a hole branch in the computation graph. Operations without effect, like multiplications by 1 oder additions with zero gets eliminated at compile time.
For Testing and using Copapy to speed up computations in conventional Python programs there is also the `@cp.jit` decorator available, to compile functions on first use and cache the compiled version for later calls: For Testing and using Copapy to speed up computations in conventional Python programs there is also the `@cp.jit` decorator available, to compile functions on first use and cache the compiled version for later calls:
@ -188,7 +188,7 @@ For more complex operations - where inlining is less useful - stencils call a no
Unlike stencils, non-stencil functions like `sinf` are not stripped and do not need to be tail-call-optimizable. These functions can be provided as C code and compiled together with the stencils or can be object files like in the case of `sinf` compiled from C and assembly code and merged into the stencil object files. Math functions like `sinf` are currently provided by the MUSL C library, with architecture-specific optimizations. Unlike stencils, non-stencil functions like `sinf` are not stripped and do not need to be tail-call-optimizable. These functions can be provided as C code and compiled together with the stencils or can be object files like in the case of `sinf` compiled from C and assembly code and merged into the stencil object files. Math functions like `sinf` are currently provided by the MUSL C library, with architecture-specific optimizations.
Non-stencil functions and constants are stored together with the stencils in an ELF object file for each supported CPU architecture. The required non-stencil functions and constants are bundled during compilation. The compiler includes only the data and code required for a specific Copapy program. Non-stencil functions and constants used by them are stored together with the stencils in ELF object files for each supported CPU architecture. The required non-stencil functions and constants are bundled during compilation. The compiler includes only the data and code required for a specific Copapy program.
The Copapy compilation process is independent of the actual instruction set. It relies purely on relocation entries and symbol metadata from the ELF file generated by the C compiler. The Copapy compilation process is independent of the actual instruction set. It relies purely on relocation entries and symbol metadata from the ELF file generated by the C compiler.

View File

@ -99,6 +99,8 @@ if __name__ == "__main__":
write_functions(f, ['*'], 'copapy', title='Tensor/Matrix functions', path_patterns=['*_tensors*'], api_dir=api_dir) write_functions(f, ['*'], 'copapy', title='Tensor/Matrix functions', path_patterns=['*_tensors*'], api_dir=api_dir)
write_functions(f, ['*'], 'copapy', title='ANN functions', path_patterns=['*_nn*'], api_dir=api_dir)
#write_manual(f, ['NumLike'], title='Types') #write_manual(f, ['NumLike'], title='Types')
with open(f'{api_dir}/backend.md', 'w') as f: with open(f'{api_dir}/backend.md', 'w') as f:

View File

@ -128,7 +128,14 @@ class tensor(ArrayType[TNum]):
return self.shape[0] return self.shape[0]
def get_scalar(self: 'tensor[TNum]', *key: int) -> TNum | value[TNum]: def get_scalar(self: 'tensor[TNum]', *key: int) -> TNum | value[TNum]:
"""Get a single scalar value from the tensor given multi-dimensional indices.""" """Get a single scalar value from the tensor given multi-dimensional indices.
Arguments:
*key: Variable number of indices for each dimension.
Returns:
The scalar value at the specified indices.
"""
assert len(key) == self.ndim, f"Expected {self.ndim} indices, got {len(key)}" assert len(key) == self.ndim, f"Expected {self.ndim} indices, got {len(key)}"
flat_idx = self._get_flat_index(key) flat_idx = self._get_flat_index(key)
return self.values[flat_idx] return self.values[flat_idx]
@ -567,7 +574,11 @@ class tensor(ArrayType[TNum]):
@overload @overload
def trace(self: 'tensor[float]') -> float | value[float]: ... def trace(self: 'tensor[float]') -> float | value[float]: ...
def trace(self) -> Any: def trace(self) -> Any:
"""Calculate the trace (sum of diagonal elements).""" """Calculate the trace (sum of diagonal elements).
Returns:
The sum of diagonal elements.
"""
assert self.ndim == 2 and self.shape[0] == self.shape[1], "Trace is only defined for square matrices" assert self.ndim == 2 and self.shape[0] == self.shape[1], "Trace is only defined for square matrices"
return mixed_sum(self.get_scalar(i, i) for i in range(self.shape[0])) return mixed_sum(self.get_scalar(i, i) for i in range(self.shape[0]))
@ -615,7 +626,11 @@ class tensor(ArrayType[TNum]):
return self.reshape(-1) return self.reshape(-1)
def size(self) -> int: def size(self) -> int:
"""Return total number of elements.""" """Count number of elements over all dimensions.
Returns:
Total number of elements.
"""
size = 1 size = 1
for dim in self.shape: for dim in self.shape:
size *= dim size *= dim
@ -820,7 +835,11 @@ class tensor(ArrayType[TNum]):
return tensor(result_vals, self.shape) return tensor(result_vals, self.shape)
def homogenize(self) -> 'tensor[TNum]': def homogenize(self) -> 'tensor[TNum]':
"""Convert all elements to copapy values if any element is a copapy value.""" """Convert all elements to Copapy values if any element is a Copapy value.
Returns:
Tensor with all elements converted to Copapy values, or the input tensor if no value is a Copapy value.
"""
if any(isinstance(val, value) for val in self.values): if any(isinstance(val, value) for val in self.values):
homogenized: tuple[value[Any], ...] = tuple(value(val) if not isinstance(val, value) else val for val in self.values) homogenized: tuple[value[Any], ...] = tuple(value(val) if not isinstance(val, value) else val for val in self.values)
return tensor(homogenized, self.shape) return tensor(homogenized, self.shape)
@ -833,7 +852,13 @@ class tensor(ArrayType[TNum]):
def zeros(shape: Sequence[int] | int) -> tensor[int]: def zeros(shape: Sequence[int] | int) -> tensor[int]:
"""Create a zero tensor of given shape.""" """Create a zero tensor of given shape.
Arguments:
shape: shape of the tensor to create.
Returns:
New tensor of given shape, initialized with zeros."""
if isinstance(shape, int): if isinstance(shape, int):
shape = (shape,) shape = (shape,)
@ -845,7 +870,13 @@ def zeros(shape: Sequence[int] | int) -> tensor[int]:
def ones(shape: Sequence[int] | int) -> tensor[int]: def ones(shape: Sequence[int] | int) -> tensor[int]:
"""Create a tensor of ones with given shape.""" """Create a tensor of ones with given shape.
Arguments:
shape: shape of the tensor to create.
Returns:
New tensor of given shape, initialized with ones."""
if isinstance(shape, int): if isinstance(shape, int):
shape = (shape,) shape = (shape,)
@ -930,7 +961,14 @@ def concat(tensors: Sequence[tensor[U] | vector[U]], axis: int = 0) -> tensor[U]
def flatten(t: tensor[U]) -> tensor[U]: def flatten(t: tensor[U]) -> tensor[U]:
"""Flatten a tensor to a 1D tensor.""" """Flatten a tensor to a 1D tensor.
Arguments:
t: n-dimensional tensor.
Returns:
A 1D tensor containing all elements from the input tensor.
"""
return t.flatten() return t.flatten()

View File

@ -54,7 +54,14 @@ class vector(ArrayType[TNum]):
return iter(self.values) return iter(self.values)
def get_scalar(self, index: int) -> TNum | value[TNum]: def get_scalar(self, index: int) -> TNum | value[TNum]:
"""Get a single scalar value from the vector.""" """Get a single scalar value from the vector.
Arguments:
index: The index of the element to retrieve.
Returns:
The scalar value at the specified index.
"""
return self.values[index] return self.values[index]
@overload @overload
@ -200,6 +207,14 @@ class vector(ArrayType[TNum]):
@overload @overload
def dot(self, other: 'vector[int] | vector[float]') -> float | int | value[float] | value[int]: ... def dot(self, other: 'vector[int] | vector[float]') -> float | int | value[float] | value[int]: ...
def dot(self, other: 'vector[int] | vector[float]') -> Any: def dot(self, other: 'vector[int] | vector[float]') -> Any:
"""Calculate the dot product of this vector with another.
Arguments:
other: Another vector of the same length.
Returns:
The dot product as a scalar value.
"""
assert len(self.values) == len(other.values), "Vectors must be of same length." assert len(self.values) == len(other.values), "Vectors must be of same length."
return mixed_sum(a * b for a, b in zip(self.values, other.values)) return mixed_sum(a * b for a, b in zip(self.values, other.values))
@ -216,7 +231,14 @@ class vector(ArrayType[TNum]):
return self.dot(other) return self.dot(other)
def cross(self: 'vector[float]', other: 'vector[float]') -> 'vector[float]': def cross(self: 'vector[float]', other: 'vector[float]') -> 'vector[float]':
"""3D cross product""" """Calculate the cross product of this vector with another 3D vector.
Arguments:
other: Another 3D vector.
Returns:
A new vector perpendicular to both vectors.
"""
assert len(self.values) == 3 and len(other.values) == 3, "Both vectors must be 3-dimensional." assert len(self.values) == 3 and len(other.values) == 3, "Both vectors must be 3-dimensional."
a1, a2, a3 = self.values a1, a2, a3 = self.values
b1, b2, b3 = other.values b1, b2, b3 = other.values
@ -286,20 +308,37 @@ class vector(ArrayType[TNum]):
@overload @overload
def sum(self: 'vector[float]') -> float | value[float]: ... def sum(self: 'vector[float]') -> float | value[float]: ...
def sum(self) -> Any: def sum(self) -> Any:
"""Sum of all vector elements.""" """Sum of all vector elements.
Returns:
The sum of all vector elements as a scalar value.
"""
return mixed_sum(self.values) return mixed_sum(self.values)
def magnitude(self) -> 'float | value[float]': def magnitude(self) -> 'float | value[float]':
"""Magnitude (length) of the vector.""" """Magnitude (length) of the vector.
Returns:
The magnitude of the vector as a scalar value.
"""
s = mixed_sum(a * a for a in self.values) s = mixed_sum(a * a for a in self.values)
return cp.sqrt(s) return cp.sqrt(s)
def normalize(self) -> 'vector[float]': def normalize(self) -> 'vector[float]':
"""Returns a normalized (unit length) version of the vector.""" """Calculate a normalized (unit length) version of the vector.
Returns:
A normalized (unit length) vector.
"""
mag = self.magnitude() + epsilon mag = self.magnitude() + epsilon
return self / mag return self / mag
def homogenize(self) -> 'vector[TNum]': def homogenize(self) -> 'vector[TNum]':
"""Convert all elements to Copapy values if any element is a Copapy value.
Returns:
Vector with homogeneous element type.
"""
if any(isinstance(val, value) for val in self.values): if any(isinstance(val, value) for val in self.values):
return vector(mixed_homogenize(self)) return vector(mixed_homogenize(self))
else: else: