some fixes in the readme

This commit is contained in:
Nicolas 2026-06-17 09:48:29 +02:00
parent 4a71db0e38
commit bbb8eca678
1 changed files with 2 additions and 2 deletions

View File

@ -39,7 +39,7 @@ Despite missing SIMD-optimization, benchmark performance shows promising numbers
For the benchmark (`tests/benchmark.py`) the timing of 30000 iterations for calculating the therm `sum((v1 + i) @ v2 for i in range(10))` where measured on an Ryzen 5 3400G. Where the vectors `v1` and `v2` both have a lengths of `v_size` which was varied according to the chart from 10 to 500. For the NumPy case the "i in range(10)" loop was vectorized like this: `np.sum((v1 + i) @ v2)` with i being here a `NDArray` with a dimension of `[10, 1]`. The number of calculated scalar operations is the same for both contenders. Obviously Copapy profits from less overheat by calling a single function from python per iteration, where the NumPy variant requires 3. Interestingly there is no indication visible in the chart that for increasing `v_size` the calling overhead for NumPy will be compensated by using faster SIMD instructions. It is to note that in this benchmark the Copapy case does not move any data between python and the compiled code.
Furthermore for many applications copypy will benefit by reducing the actual number of operations significantly compared to a NumPy implementation, by precompute constant values know at compile time and benefiting from sparcity. Multiplying by zero (e.g. in a diagonal matrix) eliminate a hole branch in the computation graph. Operations without effect, like multiplications by 1 oder additions with zero gets eliminated at compile time.
Furthermore for many applications Copapy performance will benefit by reducing the actual number of operations significantly compared to a NumPy implementation, by precompute constant values know at compile time and benefiting from sparcity. Multiplying by zero (e.g. in a diagonal matrix) eliminate a hole branch in the computation graph. Operations without effect, like multiplications by 1 oder additions with zero gets eliminated at compile time.
For Testing and using Copapy to speed up computations in conventional Python programs there is also the `@cp.jit` decorator available, to compile functions on first use and cache the compiled version for later calls:
@ -188,7 +188,7 @@ For more complex operations - where inlining is less useful - stencils call a no
Unlike stencils, non-stencil functions like `sinf` are not stripped and do not need to be tail-call-optimizable. These functions can be provided as C code and compiled together with the stencils or can be object files like in the case of `sinf` compiled from C and assembly code and merged into the stencil object files. Math functions like `sinf` are currently provided by the MUSL C library, with architecture-specific optimizations.
Non-stencil functions and constants are stored together with the stencils in an ELF object file for each supported CPU architecture. The required non-stencil functions and constants are bundled during compilation. The compiler includes only the data and code required for a specific Copapy program.
Non-stencil functions and constants used by them are stored together with the stencils in ELF object files for each supported CPU architecture. The required non-stencil functions and constants are bundled during compilation. The compiler includes only the data and code required for a specific Copapy program.
The Copapy compilation process is independent of the actual instruction set. It relies purely on relocation entries and symbol metadata from the ELF file generated by the C compiler.