Merge pull request #33 from Nonannet/dev

Dev
2026-03-07 12:12:25 +01:00 · 2026-03-07 12:12:25 +01:00 · 4309983213
parent ee771e3ac1 15ea733d5b
commit 4309983213
2 changed files with 10 additions and 5 deletions
--- a/README.md
+++ b/README.md
@@ -27,17 +27,17 @@ The Copapy framework also includes a runner as Python module build from the same

 ## Current state

-While hardware I/O is obviously a core aspect of the project, it is not yet available. Therefore, this package is currently a proof of concept with limited direct use. However, the computation engine is fully functional and available for testing and experimentation simply by installing the package. The project is now close to being ready for integration into its first demonstration hardware platform.
+While hardware I/O is a core aspect of the project, it is not yet available. However, the computation engine is already fully functional for all of the target architectures mentioned above and is available for testing and experimentation simply by installing the package. The project now focuses on integration into the first demonstration hardware platform.

-Currently in development:
- Array stencils for handling very large arrays and generating SIMD-optimized code - e.g., for machine vision and neural network applications
+Also currently in development:
+- Array stencils for handling large arrays and generating SIMD-optimized code - e.g., for machine vision and neural network applications
 - Constant regrouping for further symbolic optimization of the computation graph

 Despite missing SIMD-optimization, benchmark performance shows promising numbers. The following chart plots the results in comparison to NumPy 2.3.5:

 ![Copapy architecture](docs/source/media/benchmark_results_001.svg)

-For the benchmark (`tests/benchmark.py`) the timing of 30000 iterations for calculating the therm `sum((v1 + i) @ v2 for i in range(10))` where measured on an Ryzen 5 3400G. Where the vectors `v1` and `v2` both have a lengths of `v_size` which was varied according to the chart from 10 to 600. For the NumPy case the "i in range(10)" loop was vectorized like this: `np.sum((v1 + i) @ v2)` with i being here a `NDArray` with a dimension of `[10, 1]`. The number of calculated scalar operations is the same for both contenders. Obviously copapy profits from less overheat by calling a single function from python per iteration, where the NumPy variant requires 3. Interestingly there is no indication visible in the chart that for increasing `v_size` the calling overhead for NumPy will be compensated by using faster SIMD instructions. It is to note that in this benchmark the copapy case does not move any data between python and the compiled code.
+For the benchmark (`tests/benchmark.py`), the time for 30000 iterations of calculating the term `sum((v1 + i) @ v2 for i in range(10))` was measured on a Ryzen 5 3400G. The vectors `v1` and `v2` both have a length of `v_size`, which was varied from 10 to 500 as shown in the chart. For the NumPy case, the `i in range(10)` loop was vectorized as `np.sum((v1 + i) @ v2)`, with `i` being an `NDArray` of shape `[10, 1]`. The number of scalar operations is the same for both contenders. Copapy profits from lower overhead because it calls a single function from Python per iteration, whereas the NumPy variant requires 3. Interestingly, the chart gives no indication that, for increasing `v_size`, NumPy's calling overhead is compensated by faster SIMD instructions. Note that in this benchmark the Copapy case does not move any data between Python and the compiled code.
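
The two formulations compared in the benchmark can be checked against each other in plain NumPy. The following is only a minimal sketch, not the contents of `tests/benchmark.py`; the function names `loop_form` and `vectorized_form` and the random test vectors are assumptions made for illustration.

```python
import numpy as np


def loop_form(v1: np.ndarray, v2: np.ndarray) -> float:
    # Python-level loop: ten dot products, summed with the builtin sum().
    return sum((v1 + i) @ v2 for i in range(10))


def vectorized_form(v1: np.ndarray, v2: np.ndarray) -> float:
    # i has shape (10, 1), so v1 + i broadcasts to shape (10, v_size).
    i = np.arange(10).reshape(10, 1)
    # Three NumPy calls: the addition, the matrix product, and the sum.
    return np.sum((v1 + i) @ v2)


# Hypothetical inputs; v_size = 10 is the smallest size mentioned in the text.
v_size = 10
rng = np.random.default_rng(0)
v1 = rng.random(v_size)
v2 = rng.random(v_size)

# Both forms compute the same scalar, up to floating-point rounding.
assert np.isclose(loop_form(v1, v2), vectorized_form(v1, v2))
```

The matrix product of the `(10, v_size)` array with `v2` yields ten dot products, which `np.sum` then reduces to a scalar, matching the loop term.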

 Furthermore, for many applications Copapy will benefit from significantly reducing the actual number of operations compared to a NumPy implementation, by precomputing constant values known at compile time and by exploiting sparsity. Multiplying by zero (e.g. in a diagonal matrix) eliminates a whole branch in the computation graph. Operations without effect, like multiplications by 1 or additions of zero, are eliminated at compile time.
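
The kind of compile-time elimination described above can be illustrated with a toy expression tree. This is a sketch of the general technique only and does not reflect Copapy's actual implementation; the classes `Var`, `Const`, `Op` and the function `simplify` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str  # value only known at run time


@dataclass(frozen=True)
class Const:
    value: float  # value known at compile time


@dataclass(frozen=True)
class Op:
    kind: str  # "add" or "mul"
    left: "Node"
    right: "Node"


Node = Union[Var, Const, Op]


def simplify(node: Node) -> Node:
    """Fold constants and drop operations without effect (toy version)."""
    if not isinstance(node, Op):
        return node
    left, right = simplify(node.left), simplify(node.right)
    # Both operands known at compile time: precompute the result.
    if isinstance(left, Const) and isinstance(right, Const):
        if node.kind == "add":
            return Const(left.value + right.value)
        return Const(left.value * right.value)
    # One constant operand: check for x*0, x*1 and x+0 in either order.
    for a, b in ((left, right), (right, left)):
        if isinstance(a, Const):
            if node.kind == "mul" and a.value == 0:
                return Const(0.0)  # the whole branch b disappears
            if node.kind == "mul" and a.value == 1:
                return b
            if node.kind == "add" and a.value == 0:
                return b
    return Op(node.kind, left, right)


# One row of a diagonal matrix [2, 0] applied to the vector [x, y]: 2*x + 0*y
row = Op("add",
         Op("mul", Const(2.0), Var("x")),
         Op("mul", Const(0.0), Var("y")))
reduced = simplify(row)
```

Here `reduced` is just `2*x`: the `0*y` branch is removed and the remaining addition of zero is dropped. Note that this toy version treats `x * 0` as `0` unconditionally, which ignores NaN and infinity; whether Copapy makes the same simplification is not stated in the text.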

--- a/pyproject.toml
+++ b/pyproject.toml
@@ -77,7 +77,12 @@ markers = [
    "runner: requires a compiled runner executable"]

 [tool.ruff]
-lint.ignore = ["E501", "E226", "E265"]
+ignore = [
+    "E501",
+    "E226",
+    "E265",
+    "C901"
+]

 # Equivalent to Flake8's "exclude"
 exclude = [