From d9f361a6d60df9b7807931aa6efbcdd862df5320 Mon Sep 17 00:00:00 2001
From: Nicolas <Nicolas@nonan.net>
Date: Tue, 16 Dec 2025 12:34:40 +0100
Subject: [PATCH] Readme updated with benchmark result

---
 README.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/README.md b/README.md
index 7286182..62b4d51 100644
--- a/README.md
+++ b/README.md
@@ -34,9 +34,14 @@ Currently in development:
 - Support for Thumb instructions required by ARM*-M targets (for MCUs)
 - Constant regrouping for further symbolic optimization of the computation graph
 
+Despite the lack of SIMD optimization, the benchmark results are promising. The following chart compares the results against NumPy 2.3.5:
 
 ![Copapy architecture](docs/source/media/benchmark_results_001.svg)
 
+The benchmark (`tests/benchmark.py`) measures the time for 30000 iterations of calculating the term `sum((v1 + i) @ v2 for i in range(10))` on a Ryzen 5 3400G. The vectors `v1` and `v2` both have a length of `v_size`, which was varied from 10 to 600 as shown in the chart. For the NumPy case, the `i in range(10)` loop was vectorized like this: `np.sum((v1 + i) @ v2)`, with `i` being an `NDArray` of shape `[10, 1]`. The number of scalar operations is the same for both contenders. Copapy benefits from lower overhead, since it calls a single function from Python per iteration, whereas the NumPy variant requires three. Interestingly, the chart gives no indication that NumPy's calling overhead is compensated by faster SIMD instructions as `v_size` increases.
+
+Furthermore, many applications will benefit because Copapy can significantly reduce the actual number of operations compared to a NumPy implementation, by precomputing constant values known at compile time and by exploiting sparsity. Multiplying by zero (e.g. in a diagonal matrix) eliminates a whole branch of the computation graph. Operations without effect, like multiplications by 1 or additions of zero, are eliminated at compile time.
+
 ## Install
 
 To install Copapy, you can use pip. Precompiled wheels are available for Linux (x86_64, AArch64, ARMv7), Windows (x86_64) and macOS (x86_64, AArch64):