101010.pl is one of the many independent Mastodon servers you can use to participate in the fediverse.
101010.pl, the oldest Polish Mastodon server. We support posts of up to 2048 characters.


#simd


SIMD blog series: @folkertdev shows examples of using SIMD in the zlib-rs project.

Part 2 explains what to do when the compiler cannot make effective use of the SIMD capabilities of modern CPUs. We end up with a basic but very effective example of a custom SIMD implementation beating the compiler.

tweedegolf.nl/en/blog/155/simd

@trifectatech

tweedegolf.nl: SIMD in zlib-rs (part 2): compare256 - Blog - Tweede golf
In part 1 of the "SIMD in zlib-rs" series, we've seen that, with a bit of nudging, autovectorization can produce optimal code for some problems. But that does not always work: with SIMD clever pr ...
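As we understand the post's title, `compare256` returns how many leading bytes two 256-byte buffers have in common, which is the match length that deflate's matcher needs. The sketch below is a minimal C++ illustration of the idea, not the zlib-rs code (which is Rust). It places a byte-at-a-time reference next to an SSE2 variant that compares 16 bytes per step and finds the first mismatch from a movemask. The function names and the scalar fallback for non-SSE2 targets are our own assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics (baseline on x86-64)
#endif

// Reference version: number of equal leading bytes of two 256-byte buffers (0..256).
std::size_t compare256_scalar(const std::uint8_t* a, const std::uint8_t* b) {
    std::size_t i = 0;
    while (i < 256 && a[i] == b[i]) ++i;
    return i;
}

// SIMD sketch: compare 16 bytes per iteration and locate the first mismatch via a bitmask.
std::size_t compare256_simd(const std::uint8_t* a, const std::uint8_t* b) {
#if defined(__SSE2__)
    for (std::size_t i = 0; i < 256; i += 16) {
        __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b + i));
        // Bit k of `equal` is set when byte k of both 16-byte chunks matches.
        unsigned equal = static_cast<unsigned>(_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)));
        unsigned diff = ~equal & 0xFFFFu;  // bit k set when byte k differs
        if (diff != 0) {
            // Index of the lowest set bit; real code would use a count-trailing-zeros instruction.
            std::size_t k = 0;
            while ((diff & 1u) == 0) { diff >>= 1; ++k; }
            return i + k;
        }
    }
    return 256;
#else
    return compare256_scalar(a, b);  // no SSE2: fall back to the reference loop
#endif
}
```

The whole-chunk comparison replaces up to 16 data-dependent branches with one, which is where the speedup over the byte loop comes from.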

New blog series: @folkertdev shows how we use SIMD in the zlib-rs project.

SIMD is crucial to good performance, but learning how to use it can be daunting. In this series we'll show concrete examples of using SIMD in a real world project.

Part 1 explains how the compiler already uses SIMD for us, how to evaluate whether it's doing a good job, and how to switch to a faster version when the current CPU supports it.

tweedegolf.nl/en/blog/153/simd

@trifectatech

tweedegolf.nl: SIMD in zlib-rs (part 1): Autovectorization and target features - Blog - Tweede golf
I'm fascinated by the creative use of SIMD instructions. When you first learn about SIMD, it is clear that doing more multiplications in a single instruction is useful for speeding up matrix multi ...
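Here is a sketch of the "use a faster version when the CPU supports it" pattern from part 1, written in C++ with GCC/Clang extensions. The same plain loop is compiled twice, once with AVX2 enabled through `__attribute__((target("avx2")))`, and `__builtin_cpu_supports` picks a copy at run time. The summing loop and function names are hypothetical. In Rust, the corresponding standard tools are `#[target_feature(enable = "avx2")]` and `is_x86_feature_detected!`, though the exact dispatch in zlib-rs may differ.

```cpp
#include <cstddef>
#include <cstdint>

// Portable version: compiled for the baseline target; the compiler may still autovectorize it.
static std::uint32_t sum_bytes_generic(const std::uint8_t* p, std::size_t n) {
    std::uint32_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}

#if defined(__x86_64__) && defined(__GNUC__)
// Same source, but this copy is compiled with AVX2 enabled (GCC/Clang `target` attribute),
// so the autovectorizer may use 256-bit instructions for it.
__attribute__((target("avx2")))
static std::uint32_t sum_bytes_avx2(const std::uint8_t* p, std::size_t n) {
    std::uint32_t s = 0;
    for (std::size_t i = 0; i < n; ++i) s += p[i];
    return s;
}
#endif

// Runtime dispatch: pick the AVX2 copy only if the CPU we are running on supports it.
std::uint32_t sum_bytes(const std::uint8_t* p, std::size_t n) {
#if defined(__x86_64__) && defined(__GNUC__)
    if (__builtin_cpu_supports("avx2")) return sum_bytes_avx2(p, n);
#endif
    return sum_bytes_generic(p, n);
}
```

Whether the AVX2 copy is actually faster depends on what the autovectorizer makes of it. Inspecting the generated assembly, as the post suggests, is the way to check.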

While implementing complex numbers for #simd, I tripped over failures involving negative zero. After multiple re-readings of C23 Annex G, and after considering the meaning of infinite infinities on a 2D plane (with zeros simply being their inverse), I believe #C and #CPlusPlus should ignore the sign of zeros and infinities in their x+iy representations of complex numbers. compiler-explorer.com/z/YavE4M provides some motivation.
Am I missing something?

compiler-explorer.com: Compiler Explorer - C++

```cpp
#include <complex>
#include <iostream>

int main() {
    using C = std::complex<double>;
    std::cout << C() * -C() << '\n';
    std::cout << 0. * -C() << '\n';
}
```
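To make the sign issue concrete without depending on any particular `std::complex` implementation, here is a small sketch with a hand-rolled complex type. `Cx`, `mul_complex` and `mul_real` are our own names. The first function uses the textbook product formula. The second multiplies a real by a complex componentwise, which is how we read Annex G's rule that a real operand is not converted to complex. Both give the mathematical value zero for 0 times (-0 - 0i), but the sign bit of the real part differs.

```cpp
#include <cmath>

struct Cx { double re, im; };

// Textbook product (a + ib)(c + id) = (ac - bd) + i(ad + bc).
Cx mul_complex(Cx x, Cx y) {
    return {x.re * y.re - x.im * y.im, x.re * y.im + x.im * y.re};
}

// Real times complex, componentwise: the real operand gets no (signed) imaginary part.
Cx mul_real(double s, Cx y) {
    return {s * y.re, s * y.im};
}
```

With x = 0 + 0i and y = -0 - 0i (like `-C()` in the snippet above), `mul_complex(x, y)` yields (+0, -0): the real part is (-0) - (-0), which IEEE 754 rounds to +0. By contrast, `mul_real(0.0, y)` yields (-0, -0). This assumes a build without `-ffast-math`, which allows the compiler to disregard the sign of zero.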

Forget the AI hype - FFT is the real unsung hero of computing...

The Fast Fourier Transform (FFT) is everywhere: multiplying large numbers, audio and video compression, high-frequency trading, weather prediction - you name it. It’s also the foundation of other key transforms: DCT for image compression, MDCT for audio compression, MFCC for machine learning, and more.

FFT is the most underrated algorithm of the 20th and 21st centuries. Change my mind.

The first time I saw the Fourier matrix and finally understood the Cooley-Tukey FFT, I was hooked. There’s something beautiful and elegant about its tree-like structure. Someday, I will probably write about what happens when you unravel the FFT's recursion, and how it is related to the `rbit` instruction on ARM CPUs. And sometimes, I just sit at my computer and code away to make FFT run faster. It's relaxing...
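One well-known consequence of unravelling the recursion is that an iterative radix-2 Cooley-Tukey FFT reads its input in bit-reversed index order. ARM's `rbit` reverses all 32 bits of a register in a single instruction, so the index for an N = 2^bits transform can be obtained as `rbit(i) >> (32 - bits)`. We are guessing this is the connection the post alludes to. Below is a portable C++ sketch; `bit_reverse` is a hypothetical helper name.

```cpp
#include <cstdint>

// Reverse the lowest `bits` bits of index i. For an 8-point FFT (bits = 3): 1 (001) -> 4 (100).
std::uint32_t bit_reverse(std::uint32_t i, unsigned bits) {
    std::uint32_t r = 0;
    for (unsigned b = 0; b < bits; ++b) {
        r = (r << 1) | (i & 1u);  // move the lowest remaining bit of i into r
        i >>= 1;
    }
    return r;
}
```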

Here’s one of my little achievements: a 4-point complex-to-complex FFT in just **11** AVX2 instructions. By itself, a 4-point FFT isn’t much, but as a kernel, it helps build higher-order FFTs with blazing efficiency.
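For orientation, here is a plain scalar C++ sketch of what a 4-point complex FFT computes: X[k] = sum over n of x[n] * exp(-2*pi*i*n*k/4), written as two stages of radix-2 butterflies. This is a readable reference, not the AVX2 kernel from the linked gist, and `fft4` is a hypothetical name.

```cpp
#include <array>
#include <complex>

using cd = std::complex<double>;

// 4-point DFT via radix-2 decimation in time.
std::array<cd, 4> fft4(const std::array<cd, 4>& x) {
    // Stage 1: butterflies on the pairs (x0, x2) and (x1, x3).
    cd s02 = x[0] + x[2], d02 = x[0] - x[2];
    cd s13 = x[1] + x[3], d13 = x[1] - x[3];
    // Twiddle W_4^1 = exp(-i*pi/2) = -i; multiplying by -i maps (re, im) to (im, -re).
    cd t = cd(d13.imag(), -d13.real());
    // Stage 2: combine; outputs are in natural order.
    return {s02 + s13, d02 + t, s02 - s13, d02 - t};
}
```

The only non-trivial twiddle is -i, which is a swap plus a sign flip rather than a full complex multiply. That is part of why such a small kernel can get by with so few instructions.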

The full demo implementation is on GitHub; it computes a 256-point FFT in under 1 microsecond on 12th-gen Intel processors.

gist.github.com/ashafq/eef8ef3

Andrew Krapivin's discovery about hash tables and tiny pointers?
Purely hypothetically it may be relevant, but only in pure, bare computer-science theory.
In practice, there are plenty of implementation nuances that come down to optimizations for specific hardware platforms.

For example, there are the #SwissTable hash tables, known since 2018; #Golang recently switched to them (as of version 1.24), and #Rust made the switch to SwissTable before it.

Google's SwissTable and Facebook's F14 hash tables are roughly the same; one is just a variant of the other.

The optimization idea revolves around using #SIMD instructions to find occupied slots and check keys. In the overwhelming majority of cases, checking a single block of eight elements is enough.
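In SwissTable-style tables, each slot has a one-byte control entry that holds 7 bits of the key's hash (or an empty/deleted marker). A single wide comparison can therefore check a whole group of slots for candidates, and only those candidates need a full key comparison. The following is a portable sketch of that group probe, using SWAR (SIMD within a register) on a 64-bit word for the eight-slot groups mentioned above. The byte layout, the 0x80 empty marker and the function names are assumptions for illustration, not Abseil's or Go's actual code.

```cpp
#include <cstdint>

constexpr std::uint64_t kLsbs = 0x0101010101010101ull;  // 0x01 in every byte
constexpr std::uint64_t kLow7 = 0x7F7F7F7F7F7F7F7Full;  // low 7 bits of every byte

// Pack 8 control bytes into one word, byte i -> bits [8i, 8i + 8), independent of endianness.
std::uint64_t load_group(const std::uint8_t ctrl[8]) {
    std::uint64_t w = 0;
    for (int i = 0; i < 8; ++i) w |= std::uint64_t(ctrl[i]) << (8 * i);
    return w;
}

// Mask with 0x80 set in every byte of `group` equal to h2 (exact, no false positives).
std::uint64_t match_byte(std::uint64_t group, std::uint8_t h2) {
    std::uint64_t x = group ^ (kLsbs * h2);  // matching bytes become 0x00
    std::uint64_t y = (x & kLow7) + kLow7;   // high bit set if the low 7 bits are non-zero (no carries)
    return ~(y | x | kLow7);                 // 0x80 remains only in bytes that were zero
}

// Slot index of the first candidate in the group, or -1 if none.
// Real implementations use a count-trailing-zeros instruction here.
int first_match(std::uint64_t mask) {
    for (int i = 0; i < 8; ++i)
        if (mask & (std::uint64_t(0x80) << (8 * i))) return i;
    return -1;
}
```

A lookup hashes the key once, uses the high bits to pick a group and the low 7 bits as `h2`, and then compares full keys only at the candidate slots.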

Any idea taken from pure computer science needs to be tried in many implementation variants, looking at how it maps onto a hardware platform such as x86-64.

  1. There is memory prefetching, and RAM is accessed by loading an entire cache line into the CPU, even when you only read a single value of a couple of bytes.

  2. The previous point is not only about cache misses but also about "data locality", which can both improve performance and cause false sharing when the data structure is used from multiple threads.

  3. You also need to take the virtual memory page size into account, to reduce "pressure" on the TLB and avoid TLB misses.
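Point 2 can be made concrete with a small C++ sketch: two counters written by different threads are forced onto separate cache lines with `alignas`. The 64-byte line size is an assumption (typical for current x86-64 CPUs). C++17 also defines `std::hardware_destructive_interference_size` in `<new>` for this purpose, although not every standard library provides it. `Counters` is a hypothetical example type.

```cpp
#include <atomic>
#include <cstddef>

// Assumption: 64-byte cache lines.
constexpr std::size_t kCacheLine = 64;

// A producer thread bumps `produced` while a consumer thread bumps `consumed`.
// Without the alignment both would likely share one cache line, and every write by one
// thread would invalidate the other core's copy of that line (false sharing).
struct Counters {
    alignas(kCacheLine) std::atomic<long> produced{0};
    alignas(kCacheLine) std::atomic<long> consumed{0};
};
```

The price is memory: the struct grows from 16 bytes to at least 128. This is the same locality trade-off the list describes.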

For example, heavily loaded systems are tuned to use huge pages, e.g. everyone who uses the trendy #DPDK on its own or with something like #Seastar:

  • Those who chose #RedPanda, a higher-performance alternative, over the original #Kafka.
  • Those using the higher-performance #ScyllaDB instead of Apache #Cassandra.

Bare computer-science theory is all well and good, but practice is disgustingly down-to-earth. A plain linear scan over a small array turns out to be faster than a binary search tree, and it doesn't matter at all whether it is a red-black tree or an AVL tree.
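Here is a sketch of that comparison in C++, using a sorted array with `std::lower_bound` (binary search) as a stand-in for the tree. The linear version counts the elements smaller than the key. It has no data-dependent branches and reads memory sequentially, which is why it can win for small n, and compilers can often vectorize it. Which version is faster depends on the CPU and the array size, so measure before choosing. Both function names are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>

// Index of the first element >= key in a small sorted array, by a branch-free linear scan.
std::size_t lower_bound_linear(const int* a, std::size_t n, int key) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) count += (a[i] < key);  // every element is visited
    return count;
}

// Same result via binary search.
std::size_t lower_bound_binary(const int* a, std::size_t n, int key) {
    return static_cast<std::size_t>(std::lower_bound(a, a + n, key) - a);
}
```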

This is not about being retrograde or challenging 40-year-old theory :)

#software #SoftwareDevelop #программирование #разработка #programming @russian_mastodon @ru @Russia

idealists.su (Akkoma)

I landed some improvements and small optimizations to #pixman's AltiVec code. See gitlab.freedesktop.org/pixman/

It was fun working with a new (to me) instruction set and trying to figure out how to puzzle together the pieces into something that improved the `pix_multiply()` function (which is kind of the core primitive of most fast paths).
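`pix_multiply()` multiplies 8-bit channels as if they were fractions in [0, 1], i.e. it computes x * a / 255 per channel. Below is a scalar C++ sketch of that per-channel arithmetic. It uses the divide-free rounding trick that, to our understanding, pixman's generic `MUL_UN8` macro also uses. The function names are ours, and the AltiVec code performs this on whole vectors at once.

```cpp
#include <cstdint>

// x * a / 255, rounded to nearest, without a division.
inline std::uint8_t mul_un8(std::uint8_t x, std::uint8_t a) {
    std::uint32_t t = std::uint32_t(x) * a + 0x80;  // product plus one half (in 1/256 units)
    return static_cast<std::uint8_t>((t + (t >> 8)) >> 8);
}

// Multiply each 8-bit channel of packed pixel p by the matching channel of a.
std::uint32_t pix_multiply_scalar(std::uint32_t p, std::uint32_t a) {
    std::uint32_t r = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        std::uint8_t c = mul_un8(std::uint8_t(p >> shift), std::uint8_t(a >> shift));
        r |= std::uint32_t(c) << shift;
    }
    return r;
}
```

For all 8-bit inputs, the `(t + (t >> 8)) >> 8` form matches x * a / 255 rounded to the nearest integer. That exactness is why SIMD versions can use shifts and adds instead of a divide.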

I couldn't figure out a way to use the `vec_mradds`/`vmhraddshs` instruction. Maybe you can? (see gitlab.freedesktop.org/pixman/)

GitLab: vmx: Many improvements (!136) · Merge requests · Pixman / pixman · GitLab
Matt Turner (19):
vmx: Remove unnecessary variable
vmx: Remove unpack_565_to_8888() and associated constants
vmx: Remove unpack_128_2x128_16()
vmx: Remove...

I fixed an issue in pixman's Altivec code the other day -- cgit.freedesktop.org/pixman/co

And in the process, I read through the Altivec docs and discovered that there are vector instructions that pack and unpack between the a8r8g8b8 and a1r5g5b5 formats (but nothing for r5g6b5).
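For reference, the two formats differ only in how many bits each channel gets. The following is a generic scalar C++ sketch of the conversion. Packing keeps the top bit of alpha and the top five bits of each color channel. Unpacking widens the 5-bit channels back to 8 bits by bit replication and turns the 1-bit alpha into 0x00 or 0xFF. The function names are ours, and we do not claim this matches the exact bit handling of AltiVec's `vpkpx`/`vupkhpx`/`vupklpx` instructions.

```cpp
#include <cstdint>

// a8r8g8b8 -> a1r5g5b5: top bit of alpha, top 5 bits of red, green and blue.
std::uint16_t pack_1555(std::uint32_t p) {
    std::uint32_t a = (p >> 31) & 0x1;
    std::uint32_t r = (p >> 19) & 0x1F;  // bits 23..19 (top of the red byte)
    std::uint32_t g = (p >> 11) & 0x1F;  // bits 15..11 (top of the green byte)
    std::uint32_t b = (p >> 3) & 0x1F;   // bits 7..3 (top of the blue byte)
    return static_cast<std::uint16_t>((a << 15) | (r << 10) | (g << 5) | b);
}

// a1r5g5b5 -> a8r8g8b8, replicating the top bits so that 31 maps to 255.
std::uint32_t unpack_1555(std::uint16_t v) {
    auto expand = [](std::uint32_t c) { return (c << 3) | (c >> 2); };  // 5 -> 8 bits
    std::uint32_t a = (v >> 15) ? 0xFFu : 0x00u;
    std::uint32_t r = expand((v >> 10) & 0x1F);
    std::uint32_t g = expand((v >> 5) & 0x1F);
    std::uint32_t b = expand(v & 0x1F);
    return (a << 24) | (r << 16) | (g << 8) | b;
}
```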

Any clues why? Was a1r5g5b5 really common on Mac OS or something? I don't think I've seen a1r5g5b5 used anywhere.

cgit.freedesktop.org: vmx: Fix is_opaque, is_zero, is_transparent functions - pixman - Pixman: The pixel-manipulation library for X and cairo. (mirrored from https://gitlab.freedesktop.org/pixman/pixman)

Hey friends!
For folks interested in #RISCV, and especially #RVV, here's some information on the #tenstorrent in-house designed CPU!

At a high level, the vector unit is 2x256, with full RVV 1.0 support as well as a fair few of the optional extras to RVV 1.0!

Phoronix article here: phoronix.com/news/LLVM-20-Tens

LLVM patches here: github.com/llvm/llvm-project/p

One Pager: cdn.sanity.io/files/jpb4ed5r/p

www.phoronix.com: LLVM Merges Support The For Tenstorrent TT-Ascalon-D8 RISC-V CPU

Yesterday, one year ago... (Still wondering how many people have actually read or tried out any of these)

mastodon.thi.ng/@toxi/11134859

Mastodon Glitch Edition: Karsten Schmidt (@toxi@mastodon.thi.ng)
#HowToThing #Epilogue #LongRead: After 66 days of addressing 30 wildly varied use cases and building ~20 new example projects of varying complexity to illustrate how #ThingUmbrella libraries can be used & combined, I'm taking a break to concentrate on other important thi.ngs...

With this overall selection I tried shining a light on common architectural patterns, but also some underexposed, yet interesting niche topics. Since there were many different techniques involved, it's natural not everything resonated with everyone. That's fine! Though, my hope always is that readers take an interest in a wide range of topics, and so many of these new examples were purposefully multi-faceted and hopefully provided insights for at least some parts, plus (in)directly communicated a core essence of the larger project: Only individual packages (or small clusters) are designed & optimized for a set of particular use cases. At large, though, thi.ng explicitly does NOT offer any such guidance or even opinion. All I can offer are possibilities, nudges and cross-references, how these constructs & techniques can be (and have been) useful and/or the theory underpinning them.

For some topics, thi.ng libs provide multiple approaches to achieve certain goals. This again is by design (not lack of it!) and stems from hard-learned experience, showing that many (esp. larger) projects highly benefit from more nuanced (sometimes conflicting approaches) compared to popular defacto "catch-all" framework solutions. To avid users (incl. myself) this approach has become a somewhat unique offering and advantage, yet in itself seems to be the hardest and most confusing aspect of the entire project to communicate to newcomers.

So seeing this list of new projects together, to me really is a celebration (and confirmation/testament) of the overall #BottomUpDesign #ThingUmbrella approach (which I've been building on since ~2006): From the wide spectrum/flexibility of use cases, the expressiveness, concision, the data-first approach, the undogmatic mix of complementary paradigms, the separation of concerns, no hidden magic state, only minimal build tooling requirements (a bundler is optional, but recommended for tree shaking, no more): these are all aspects I think are key to building better (incl. more maintainable & reason-able) software. IMO they are worth embracing & exposing more people to and this is what I've partially attempted to do with this series of posts...

ICYMI here's a summary of the 10 most recent posts (full list in the https://thi.ng/umbrella readme). Many of those examples have more comments than code...

021: Iterative animated polygon subdivision & heat map viz https://mastodon.thi.ng/@toxi/111221943333023306
022: Quasi-random voronoi lattice generator https://mastodon.thi.ng/@toxi/111244412425832657
023: Tag-based Jaccard similarity ranking using bitfields https://mastodon.thi.ng/@toxi/111256960928934577
024: 2.5D hidden line visualization of DEM files https://mastodon.thi.ng/@toxi/111269505611983570
025: Transforming & plotting 10k data points using SIMD https://mastodon.thi.ng/@toxi/111283262419126958
026: Shader meta-programming to generate 16 animated function plots https://mastodon.thi.ng/@toxi/111295842650216136
027: Flocking sim w/ neighborhood queries to visualize proximity https://mastodon.thi.ng/@toxi/111308439597090930
028: Randomized, space-filling, nested 2D grid layout generator https://mastodon.thi.ng/@toxi/111324566926701431
029: Forth-like DSL & livecoding playground for 2D geometry https://mastodon.thi.ng/@toxi/111335025037332972
030: Procedural text generation via custom DSL & parse grammar https://mastodon.thi.ng/@toxi/111347074558293056
#ThingUmbrella #OpenSource #TypeScript #JavaScript #Tutorial

European #GNURadio Days this week. (It's just a few steps from my regular office at #GSI_Helmholtzzentrum_für_Schwerionenforschung.) This week has a focus on GNURadio 4, which was developed by colleagues at #FAIR/#GSI. I'm happy that I was able to contribute a small part to the design and implementation of the new core. This new core makes use of `stdx::simd` and github.com/mattkretz/vir-simd. I will talk about the #SIMD parts later today (1:30 pm CEST), and you can tune in at
youtube.com/watch?v=8xnPsPdy5A
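`stdx::simd` is presumably `std::experimental::simd` from the Parallelism TS v2, which libstdc++ has shipped since GCC 11. As we understand its README, vir-simd adds utilities around it and a fallback for standard libraries that lack it. Below is a minimal dot-product sketch using that interface, assuming a GCC 11+ toolchain. `dot` is a hypothetical example, not GNU Radio code.

```cpp
#include <cstddef>
#include <experimental/simd>

namespace stdx = std::experimental;

// Dot product over n floats: full SIMD-width chunks first, then a scalar tail.
float dot(const float* a, const float* b, std::size_t n) {
    using V = stdx::native_simd<float>;  // widest float vector for the compile target
    V acc = 0.0f;                        // broadcast 0 to every lane
    std::size_t i = 0;
    for (; i + V::size() <= n; i += V::size()) {
        V va(a + i, stdx::element_aligned);  // loads that only require float alignment
        V vb(b + i, stdx::element_aligned);
        acc += va * vb;
    }
    float sum = stdx::reduce(acc);  // horizontal add of all lanes
    for (; i < n; ++i) sum += a[i] * b[i];
    return sum;
}
```

The same source adapts to SSE, AVX or NEON widths through `native_simd`, which is the portability argument for a type-based SIMD API. Note that the summation order, and therefore floating-point rounding, differs from a plain scalar loop.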

wide, github.com/Lokathor/wide.

> [it] has portable "wide" data types that do their best to be SIMD when possible.

> On x86, x86_64, wasm32 and aarch64 neon this is done with explicit intrinsic usage (via safe_arch), and on other architectures this is done by carefully writing functions so that LLVM hopefully does the right thing. When Rust stabilizes more explicit intrinsics then they can go into safe_arch and then they can get used here.

GitHub: GitHub - Lokathor/wide: A crate to help you go wide. By which I mean use SIMD stuff.
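The fallback strategy quoted above (plain code that the optimizer hopefully maps onto SIMD instructions) can be illustrated with a rough C++ analogue. `f32x4` is a hypothetical type for illustration and does not mirror wide's actual API. Each operator is a fixed-length per-lane loop, a pattern optimizers can often turn into a single vector instruction, although nothing guarantees it.

```cpp
#include <array>

// Four float lanes stored in a plain array; no intrinsics.
struct f32x4 {
    std::array<float, 4> lanes;

    friend f32x4 operator+(const f32x4& a, const f32x4& b) {
        f32x4 r{};
        for (int i = 0; i < 4; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];  // lane-wise add
        return r;
    }

    friend f32x4 operator*(const f32x4& a, const f32x4& b) {
        f32x4 r{};
        for (int i = 0; i < 4; ++i) r.lanes[i] = a.lanes[i] * b.lanes[i];  // lane-wise multiply
        return r;
    }
};
```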