SPO600 Project Selection
Introduction
This blog post covers the first stage of the
SPO600 project. The main objective is to optimize an open-source library for ARMv9
hardware using the
Scalable Vector Extension 2 (SVE2).
The first part consists of steps to find and select a suitable package
that ideally involves significant amounts of data processing and has
existing SIMD implementations:
- Identify some candidate open-source packages for optimization.
- Find the SIMD implementations in these packages.
- Select a part you want to enhance.
- Create a strategy for changes.
- Note how the community accepts contributions, and engage with the community to discuss your proposed work.
Open-source Package
As I am taking a course in Data Science, I have used several
packages to work with data and build learning models, including
Pandas,
NumPy,
Scikit-Learn, and
TensorFlow. These open-source libraries fit our criteria well: they
process large amounts of data and probably use some SIMD implementation. However, a
library should ideally be written in C/C++, which allows straightforward use of
inline assembly and GCC compiler intrinsics.
Upon examining the repository of each, I found that Pandas uses
the Cython extension, which lets you write C-level code with Python-like syntax and
integrate with existing low-level libraries. However, I did not find any
mentions of SIMD or architecture-specific optimizations; they may be located
in the C dependencies. The same was true of Scikit-Learn.
On the other hand, I found references to the <arm_neon.h> NEON intrinsics header
in both the NumPy and TensorFlow packages. Both packages use
NEON intrinsics and make extensive use of other SIMD CPU capabilities for
optimization.
Function using NEON from TensorFlow:
inline static float GetSum(const float32x4_t& values) {
  static float32_t summed_values[4];
  vst1q_f32(summed_values, values);
  return summed_values[0]
       + summed_values[1]
       + summed_values[2]
       + summed_values[3];
}
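To make clear what this function computes: GetSum is a horizontal sum, adding the four lanes of one NEON register into a single float. The sketch below is not TensorFlow code. HorizontalSum4 is a hypothetical name, and it takes a pointer to four floats (an assumption made so that it also builds on non-ARM machines through a scalar fallback). On AArch64, the vaddvq_f32 intrinsic performs the same reduction without the intermediate store.

```cpp
#if defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper, not part of TensorFlow: sums exactly four floats.
inline float HorizontalSum4(const float* lanes) {
#if defined(__ARM_NEON) && defined(__aarch64__)
  // Load four lanes and reduce them with a single across-vector add.
  return vaddvq_f32(vld1q_f32(lanes));
#else
  // Scalar fallback so the sketch compiles on any target.
  return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#endif
}
```

The two paths may add the lanes in a different order, so for general floating-point input the results can differ in the last bits.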
Additionally, the TensorFlow repository contains some inline assembly, but nothing
related to SVE2.
Inline assembly from NumPy (the instruction shown is a Power VSX one, not ARM):
#else // gcc
npyv_s32 lo_odd, hi_odd;
__asm__ ("xvcvdpsxws %x0,%x1" : "=wa" (lo_odd) : "wa" (a));
__asm__ ("xvcvdpsxws %x0,%x1" : "=wa" (hi_odd) : "wa" (b));
As for NumPy, I found files that include the NEON header, but they
are just test programs located in the checks folder. There is also
some inline assembly and x86 intrinsics, but nothing related to SVE2.
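When searching a code base for SIMD paths, it helps to know which preprocessor macros guard them. The sketch below is an illustration, not NumPy's code: it reports which SIMD target the current file was compiled for, using the macros that the ARM C Language Extensions and GCC/Clang define. The function name CompiledSimdTarget is hypothetical.

```cpp
// Reports which SIMD target this translation unit was compiled for.
// The macro names follow the ACLE (ARM) and GCC/Clang (x86) conventions;
// the function itself is a hypothetical helper for illustration.
inline const char* CompiledSimdTarget() {
#if defined(__ARM_FEATURE_SVE2)
  return "sve2";
#elif defined(__ARM_FEATURE_SVE)
  return "sve";
#elif defined(__ARM_NEON)
  return "neon";
#elif defined(__SSE2__)
  return "sse2";
#else
  return "scalar";
#endif
}
```

With GCC on AArch64, a flag such as -march=armv8-a+sve2 should be what makes the first branch active. Grepping a repository for these macro names is one way to locate its architecture-specific code.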
Selection
Both packages are great candidates, and I will probably create an issue for
both projects. However, I am leaning towards TensorFlow, as it already
has a robust NEON implementation. That will help me determine the SVE2 equivalents
of the NEON instructions used in calculations and data processing. Moreover,
TensorFlow is a machine learning library, and porting it to
ARMv9 embedded devices is a natural direction for the project.
Strategy
I plan to fork the code and explore the NEON SIMD parts to see what I can update. I will also explore the build process by trying to compile the test code into an executable. After that, I will research SVE2 and look for counterparts to the existing NEON code, so I can start implementing each function one by one using compiler intrinsics, an approach that is compatible with the existing SIMD code.
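As a rough idea of what such a port could look like, here is a minimal sketch that sums an array of floats with an SVE path next to a NEON path and a scalar fallback. It is not taken from TensorFlow; SumArray is a hypothetical function, and the SVE branch uses only base SVE intrinsics from <arm_sve.h>, which are also available on SVE2 hardware.

```cpp
#include <cstddef>
#include <cstdint>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>
#elif defined(__ARM_NEON) && defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical example function, not taken from TensorFlow: sums n floats.
inline float SumArray(const float* data, std::size_t n) {
#if defined(__ARM_FEATURE_SVE)
  // Vector-length-agnostic loop: svcntw() is the number of 32-bit lanes.
  svfloat32_t acc = svdup_n_f32(0.0f);
  const std::uint64_t count = static_cast<std::uint64_t>(n);
  for (std::uint64_t i = 0; i < count; i += svcntw()) {
    // The predicate enables only lanes with index < count, so no tail loop.
    const svbool_t pg = svwhilelt_b32(i, count);
    // Merging add: inactive lanes keep their previous accumulator value.
    acc = svadd_f32_m(pg, acc, svld1_f32(pg, data + i));
  }
  return svaddv_f32(svptrue_b32(), acc);
#elif defined(__ARM_NEON) && defined(__aarch64__)
  // Fixed 128-bit vectors: four floats per step, then a scalar tail.
  float32x4_t acc = vdupq_n_f32(0.0f);
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    acc = vaddq_f32(acc, vld1q_f32(data + i));
  }
  float sum = vaddvq_f32(acc);
  for (; i < n; ++i) sum += data[i];
  return sum;
#else
  // Scalar fallback so the sketch builds on any target.
  float sum = 0.0f;
  for (std::size_t i = 0; i < n; ++i) sum += data[i];
  return sum;
#endif
}
```

The main difference from NEON is that the SVE loop does not assume a vector width: the predicate from svwhilelt_b32 handles the last partial vector, whereas the NEON path needs a separate scalar tail. Keeping all three branches behind the same function signature is what lets the new code sit alongside the existing SIMD code.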
Project Contribution Rules
I have read the TensorFlow Documentation and
Contribution Rules. I will research the contribution process more thoroughly and create
an issue on GitHub for both packages. Conveniently, both projects
are hosted on GitHub, meaning I can simply open pull requests for
the code to be tested.
Email: deezzir@gmail.com
GitHub: github.com