101010.pl is one of the many independent Mastodon servers you can use to participate in the fediverse.
101010.pl is the oldest Polish Mastodon server. Posts can be up to 2048 characters long.

#simd

Tweede golf<p>SIMD blog series: <span class="h-card" translate="no"><a href="https://hachyderm.io/@folkertdev" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>folkertdev</span></a></span> shows examples of using SIMD in the zlib-rs project. </p><p>Part 2 explains what to do when the compiler is not capable of using the SIMD capabilities of modern CPUs effectively. We end up with a basic, but very effective, example of a custom SIMD implementation beating the compiler. </p><p><a href="https://tweedegolf.nl/en/blog/155/simd-in-zlib-rs-part-2-compare256" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">tweedegolf.nl/en/blog/155/simd</span><span class="invisible">-in-zlib-rs-part-2-compare256</span></a> </p><p><span class="h-card" translate="no"><a href="https://fosstodon.org/@trifectatech" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>trifectatech</span></a></span></p><p><a href="https://fosstodon.org/tags/rustlang" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>rustlang</span></a> <a href="https://fosstodon.org/tags/datacompression" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datacompression</span></a> <a href="https://fosstodon.org/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a></p>
nietras 👾<p>New blog post "Sep 0.10.0 - 21 GB/s CSV Parsing Using SIMD on AMD 9950X 🚀"</p><p>📈 Sep <a href="https://mastodon.social/tags/performance" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>performance</span></a> from 7 GB/s to 21 GB/s over the last two years<br>🧑‍💻 <a href="https://mastodon.social/tags/csharp" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>csharp</span></a> <a href="https://mastodon.social/tags/SIMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SIMD</span></a> and <a href="https://mastodon.social/tags/x64" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>x64</span></a> assembly on <a href="https://mastodon.social/tags/dotnet" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>dotnet</span></a> 9.0<br>🛠️ Tweaks and new <a href="https://mastodon.social/tags/AVX512" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>AVX512</span></a>-to-256 parser<br>🔢 Lots of benchmarks</p><p>👇<br><a href="https://nietras.com/2025/05/09/sep-0-10-0/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">nietras.com/2025/05/09/sep-0-1</span><span class="invisible">0-0/</span></a></p>
Tweede golf<p>New blog series: <span class="h-card" translate="no"><a href="https://hachyderm.io/@folkertdev" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>folkertdev</span></a></span> shows how we use SIMD in the zlib-rs project.</p><p>SIMD is crucial to good performance, but learning how to use it can be daunting. In this series we'll show concrete examples of using SIMD in a real world project.</p><p>Part 1 explains how the compiler already uses SIMD for us, how to evaluate whether it's doing a good job, and how to use a more optimal version when the current CPU supports it. </p><p><a href="https://tweedegolf.nl/en/blog/153/simd-in-zlib-rs-part-1-autovectorization-and-target-features" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">tweedegolf.nl/en/blog/153/simd</span><span class="invisible">-in-zlib-rs-part-1-autovectorization-and-target-features</span></a></p><p><span class="h-card" translate="no"><a href="https://fosstodon.org/@trifectatech" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>trifectatech</span></a></span></p><p> <a href="https://fosstodon.org/tags/rustlang" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>rustlang</span></a> <a href="https://fosstodon.org/tags/datacompression" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>datacompression</span></a> <a href="https://fosstodon.org/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a></p>
mkretz<p>While implementing complex numbers for <a href="https://floss.social/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a> I tripped over failures wrt. negative zero. After multiple re-readings of C23 Annex G and considering the meaning of infinite infinities on a 2D plane (with zeros simply being their inverse) I believe <a href="https://floss.social/tags/C" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>C</span></a> and <a href="https://floss.social/tags/CPlusPlus" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>CPlusPlus</span></a> should ignore the sign of zeros and infinities in their x+iy representations of complex numbers. <a href="https://compiler-explorer.com/z/YavE4MnMj" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">compiler-explorer.com/z/YavE4M</span><span class="invisible">nMj</span></a> provides some motivation.<br>Am I missing something?</p>
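<p>To make the negative-zero problem in the post above concrete, here is a minimal Python sketch. It is not the code behind the Compiler Explorer link; the helper names <code>mul_full</code>, <code>mul_real</code> and <code>sign_of</code> are made up for illustration. It multiplies 1+0i by -1 in two mathematically equivalent ways and shows that the results compare equal but carry different zero signs, which flips the argument from +π to -π.</p>

```python
import math

def mul_full(a, b):
    """Textbook complex product of two (re, im) pairs."""
    (x1, y1), (x2, y2) = a, b
    return (x1 * x2 - y1 * y2, x1 * y2 + y1 * x2)

def mul_real(r, b):
    """Scale a complex (re, im) pair by a real number, component-wise."""
    x, y = b
    return (r * x, r * y)

def sign_of(v):
    """+1.0 or -1.0, also for zeros (copysign sees the sign bit)."""
    return math.copysign(1.0, v)

z = (1.0, 0.0)

# Treat -1 as the complex number -1 + 0i: the imaginary part is (-0.0) + (+0.0) = +0.0.
full = mul_full((-1.0, 0.0), z)

# Treat -1 as a plain real factor: the imaginary part is -1.0 * 0.0 = -0.0.
scaled = mul_real(-1.0, z)

# The values compare equal, yet the zero signs differ ...
same_value = full == scaled
zero_signs = (sign_of(full[1]), sign_of(scaled[1]))

# ... and the sign of zero decides which side of the branch cut the angle lands on.
arg_full = math.atan2(full[1], full[0])
arg_scaled = math.atan2(scaled[1], scaled[0])
```

<p>A lane-wise SIMD implementation can end up on either path depending on how the product is formed, so tests that compare sign bits or arguments may disagree even though both results represent the same point.</p>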
Ayan Shafqat<p>Forget the AI hype - FFT is the real unsung hero of computing...</p><p>The Fast Fourier Transform (FFT) is everywhere: multiplying large numbers, audio and video compression, high-frequency trading, weather prediction - you name it. It’s also the foundation of other key transforms: DCT for image compression, MDCT for audio compression, MFCC for machine learning, and more.</p><p>FFT is the most underrated algorithm of the 20th and 21st centuries - change my mind.</p><p>The first time I saw the Fourier Matrix and finally understood the Cooley-Tukey FFT, I was hooked. There’s something beautiful and elegant about its tree-like structure. Someday, I will probably write about what happens when you unravel FFT's recursion, and how it is related to the `rbit` instruction on ARM CPUs. And sometimes, I just sit at my computer and code away to make FFT run faster. It's relaxing...</p><p>Here’s one of my little achievements: a 4-point complex-to-complex FFT in just **11** AVX2 instructions. By itself, a 4-point FFT isn’t much, but as a kernel, it helps build higher-order FFTs with blazing efficiency.</p><p>The full demo implementation is on GitHub; it computes a 256-point FFT in under 1 microsecond on 12th-gen Intel processors.</p><p><a href="https://gist.github.com/ashafq/eef8ef391fb58be85b325c259ce591e3" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">gist.github.com/ashafq/eef8ef3</span><span class="invisible">91fb58be85b325c259ce591e3</span></a></p><p><a href="https://hachyderm.io/tags/signalprocessing" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>signalprocessing</span></a> <a href="https://hachyderm.io/tags/programming" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>programming</span></a> <a href="https://hachyderm.io/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a> <a href="https://hachyderm.io/tags/optimization" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>optimization</span></a></p>
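<p>For readers who have not seen a 4-point FFT written out, the following is a scalar Python sketch of the two radix-2 butterfly stages. It is not the AVX2 kernel from the gist and makes no claim about its instruction sequence; <code>fft4</code> and <code>dft</code> are names chosen here for illustration. The point it shows is that the only twiddle factor needed is -j, which is a swap and a negation rather than a real multiplication.</p>

```python
import cmath

def fft4(x):
    """4-point complex FFT built from two radix-2 butterfly stages."""
    x0, x1, x2, x3 = (complex(v) for v in x)
    # Stage 1: butterflies on the even pair (x0, x2) and the odd pair (x1, x3).
    a, b = x0 + x2, x0 - x2
    c, d = x1 + x3, x1 - x3
    # Twiddle factor -j applied to d: (re, im) -> (im, -re). No multiply needed.
    d = complex(d.imag, -d.real)
    # Stage 2: combine even and odd halves.
    return [a + c, b + d, a - c, b - d]

def dft(x):
    """Direct O(n^2) DFT, used only as a reference to check fft4."""
    n = len(x)
    return [
        sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
        for k in range(n)
    ]

sample = [1 + 2j, -0.5j, 3, 2 - 1j]
max_error = max(abs(p - q) for p, q in zip(fft4(sample), dft(sample)))
```

<p>In a larger transform, a kernel like this would be applied repeatedly to groups of four values, with extra twiddle multiplications between passes.</p>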
Ayan Shafqat<p>SIMD and IIR filters are like oil and water, hard to mix! But with some clever math tricks, we can make IIR filters parallel utilizing SIMD instructions. Check out my new (or not so new) post!</p><p><a href="https://shafq.at/vectorizing-iir-filters.html" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">shafq.at/vectorizing-iir-filte</span><span class="invisible">rs.html</span></a></p><p><a href="https://hachyderm.io/tags/signalprocessing" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>signalprocessing</span></a> <a href="https://hachyderm.io/tags/C" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>C</span></a> <a href="https://hachyderm.io/tags/programming" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>programming</span></a> <a href="https://hachyderm.io/tags/vector" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>vector</span></a> <a href="https://hachyderm.io/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a></p>
Несерьёзный Выдумщик<p>Andrew Krapivin's discovery about hash tables and tiny pointers?<br>Purely hypothetically, it may well be relevant, but only in pure, bare computer science theory.<br>In practice there are plenty of implementation nuances that come down to optimizing for specific hardware platforms.</p><p>For example, there is <a class="hashtag" href="https://idealists.su/tag/swisstable" rel="nofollow noopener" target="_blank">#SwissTable</a>, known since 2018; <a class="hashtag" href="https://idealists.su/tag/golang" rel="nofollow noopener" target="_blank">#Golang</a> recently switched to it (as of version 1.24). And <a class="hashtag" href="https://idealists.su/tag/rust" rel="nofollow noopener" target="_blank">#Rust</a> had already moved to SwissTable before that.</p><p>Google's <a href="https://abseil.io/about/design/swisstables" rel="nofollow noopener" target="_blank">SwissTable</a> and Facebook's <a href="https://github.com/facebook/folly/blob/main/folly/container/F14.md" rel="nofollow noopener" target="_blank">F14</a> hash tables are roughly the same; one is just a variant of the other.</p><p>The optimization centers on using <a class="hashtag" href="https://idealists.su/tag/simd" rel="nofollow noopener" target="_blank">#SIMD</a> instructions to find occupied slots and check the key. In the overwhelming majority of cases, a single check of one block of eight elements is enough.</p><p>Any idea from pure computer science still needs many rounds of experimenting with implementation variants, to see how it maps onto a hardware platform such as x86-64.</p><ol><li><p>There is memory <a href="https://en.wikipedia.org/wiki/Cache_prefetching" rel="nofollow noopener" target="_blank">prefetching</a>, and RAM is accessed by loading an entire <a href="https://en.wikipedia.org/wiki/CPU_cache#Cache_entries" rel="nofollow noopener" target="_blank">cache line</a> into the CPU, even when <strong>reading</strong> just a single value a couple of bytes long.</p></li><li><p>The previous point is not only about cache misses but also about data locality, which can improve performance but can also lead to false sharing when a data structure is used from multiple threads.</p></li><li><p>The virtual memory page size must also be taken into account, to reduce pressure on the TLB and avoid <a href="https://en.wikipedia.org/wiki/Translation_lookaside_buffer#TLB-miss_handling" rel="nofollow noopener" target="_blank">TLB misses</a>.</p></li></ol><p>For example, heavily loaded systems are additionally tuned to use huge pages, as is done by everyone using the fashionable <a class="hashtag" href="https://idealists.su/tag/dpdk" rel="nofollow noopener" target="_blank">#DPDK</a>, either on its own or with something like <a class="hashtag" href="https://idealists.su/tag/seastar" rel="nofollow noopener" target="_blank">#Seastar</a>:</p><ul><li>Those who chose not the original <a class="hashtag" href="https://idealists.su/tag/kafka" rel="nofollow noopener" target="_blank">#Kafka</a> but its higher-performance counterpart <a class="hashtag" href="https://idealists.su/tag/redpanda" rel="nofollow noopener" target="_blank">#RedPanda</a>.</li><li>Those who use the higher-performance <a class="hashtag" href="https://idealists.su/tag/scylladb" rel="nofollow noopener" target="_blank">#ScyllaDB</a> instead of Apache <a class="hashtag" href="https://idealists.su/tag/cassandra" rel="nofollow noopener" target="_blank">#Cassandra</a>.</li></ul><p>Bare computer science theory is all well and good, but practice is disgustingly down-to-earth. A straight linear scan over a small array turns out to be faster than using a binary search tree, and it makes no difference whether the tree is red-black or AVL.</p><p>This is not a matter of being reactionary or of challenging 40-year-old theory :)</p><p><a class="hashtag" href="https://idealists.su/tag/software" rel="nofollow noopener" target="_blank">#software</a> <a class="hashtag" href="https://idealists.su/tag/softwaredevelop" rel="nofollow noopener" target="_blank">#SoftwareDevelop</a> <a class="hashtag" href="https://idealists.su/tag/программирование" rel="nofollow noopener" target="_blank">#программирование</a> <a class="hashtag" href="https://idealists.su/tag/разработка" rel="nofollow noopener" target="_blank">#разработка</a> <a class="hashtag" href="https://idealists.su/tag/programming" rel="nofollow noopener" target="_blank">#programming</a> <span class="h-card"><a class="u-url mention" href="https://mastodon.social/@russian_mastodon" rel="nofollow noopener" target="_blank">@<span>russian_mastodon</span></a></span> <span class="h-card"><a class="u-url mention" href="https://lor.sh/@ru" rel="nofollow noopener" target="_blank">@<span>ru</span></a></span> <span class="h-card"><a class="u-url mention" href="https://3zi.ru/@Russia" rel="nofollow noopener" target="_blank">@<span>Russia</span></a></span></p>
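<p>The SwissTable idea mentioned in the post above (compare a whole block of control bytes at once, then verify the key only on the few candidates) can be sketched in plain Python. This is a toy model under stated assumptions, not the Abseil, Go, Rust or F14 implementation: the names <code>match_mask</code> and <code>TinySwissTable</code> are invented here, the group size of eight follows the post, there is no deletion or resizing, and the per-byte loop in <code>match_mask</code> merely stands in for a single SIMD compare plus movemask.</p>

```python
EMPTY = 0x80   # control byte for a free slot (high bit set, so it never equals a 7-bit hash)
GROUP = 8      # slots inspected per probe step, as in the post

def match_mask(ctrl, byte):
    """Bit i is set when ctrl[i] == byte; stand-in for SIMD compare + movemask."""
    return sum(1 << i for i, c in enumerate(ctrl) if c == byte)

def lowest_bit(mask):
    """Index of the lowest set bit of a non-zero mask."""
    return (mask & -mask).bit_length() - 1

class TinySwissTable:
    def __init__(self, groups=4):
        self.groups = groups
        self.ctrl = [[EMPTY] * GROUP for _ in range(groups)]
        self.slots = [[None] * GROUP for _ in range(groups)]

    def _split(self, key):
        h = hash(key) & 0xFFFFFFFF
        return (h >> 7) % self.groups, h & 0x7F   # (start group, 7-bit tag)

    def _find(self, key):
        g, tag = self._split(key)
        for step in range(self.groups):
            gi = (g + step) % self.groups
            mask = match_mask(self.ctrl[gi], tag)
            while mask:                            # usually zero or one candidate
                i = lowest_bit(mask)
                if self.slots[gi][i][0] == key:    # full key check only on tag hits
                    return gi, i
                mask &= mask - 1                   # clear the bit just examined
            if match_mask(self.ctrl[gi], EMPTY):
                return None                        # a free slot ends the probe sequence
        return None

    def get(self, key, default=None):
        pos = self._find(key)
        return default if pos is None else self.slots[pos[0]][pos[1]][1]

    def insert(self, key, value):
        pos = self._find(key)
        if pos is not None:                        # overwrite an existing entry
            self.slots[pos[0]][pos[1]] = (key, value)
            return
        g, tag = self._split(key)
        for step in range(self.groups):
            gi = (g + step) % self.groups
            empty = match_mask(self.ctrl[gi], EMPTY)
            if empty:
                i = lowest_bit(empty)
                self.ctrl[gi][i] = tag
                self.slots[gi][i] = (key, value)
                return
        raise RuntimeError("table full (this sketch never resizes)")
```

<p>The control bytes of one group sit next to each other, so a lookup that ends in the first group touches a single small contiguous region before it ever reads a key, which is the cache-line argument the post makes.</p>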
IT News<p>Faster Integer Division with Floating Point - Multiplication on a common microcontroller is easy. But division is much more diff... - <a href="https://hackaday.com/2024/12/22/faster-integer-division-with-floating-point/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">hackaday.com/2024/12/22/faster</span><span class="invisible">-integer-division-with-floating-point/</span></a> <a href="https://schleuss.online/tags/softwaredevelopment" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>softwaredevelopment</span></a> <a href="https://schleuss.online/tags/softwarehacks" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>softwarehacks</span></a> <a href="https://schleuss.online/tags/optimization" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>optimization</span></a> <a href="https://schleuss.online/tags/assembly" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>assembly</span></a> <a href="https://schleuss.online/tags/avx" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>avx</span></a>-512 <a href="https://schleuss.online/tags/x86_64" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>x86_64</span></a> <a href="https://schleuss.online/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a> <a href="https://schleuss.online/tags/x86" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>x86</span></a></p>
mattst88 :gentoo:<p>I landed some improvements and small optimizations to <a href="https://fosstodon.org/tags/pixman" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>pixman</span></a>'s AltiVec code. See <a href="https://gitlab.freedesktop.org/pixman/pixman/-/merge_requests/136" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">gitlab.freedesktop.org/pixman/</span><span class="invisible">pixman/-/merge_requests/136</span></a></p><p>It was fun working with a new (to me) instruction set and trying to figure out how to puzzle together the pieces into something that improved the `pix_multiply()` function (which is kind of the core primitive of most fast paths).</p><p>I couldn't figure out a way to use the `vec_mradds`/`vmhraddshs` instruction. Maybe you can? (see <a href="https://gitlab.freedesktop.org/pixman/pixman/-/merge_requests/136#note_2699795" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">gitlab.freedesktop.org/pixman/</span><span class="invisible">pixman/-/merge_requests/136#note_2699795</span></a>)</p><p><a href="https://fosstodon.org/tags/altivec" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>altivec</span></a> <a href="https://fosstodon.org/tags/powerpc" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>powerpc</span></a> <a href="https://fosstodon.org/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a></p>
FCLC<p>Channeling my inner <span class="h-card" translate="no"><a href="https://hachyderm.io/@shafik" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>shafik</span></a></span>, assuming a standard, compliant <a href="https://mast.hpc.social/tags/riscv" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>riscv</span></a> processor, what kind of float instructions can be executed on the vector unit of a processor that advertises </p><p>"RV32IMFDZve64f"</p><p><a href="https://mast.hpc.social/tags/HPC" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HPC</span></a> <a href="https://mast.hpc.social/tags/IEEE754" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>IEEE754</span></a> <a href="https://mast.hpc.social/tags/SIMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SIMD</span></a> <a href="https://mast.hpc.social/tags/RISCV" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RISCV</span></a> <a href="https://mast.hpc.social/tags/RVV" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RVV</span></a> </p><p><a href="https://github.com/riscvarchive/riscv-v-spec/releases/download/v1.0/riscv-v-spec-1.0.pdf" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/riscvarchive/riscv-</span><span class="invisible">v-spec/releases/download/v1.0/riscv-v-spec-1.0.pdf</span></a></p>
mattst88 :gentoo:<p>I fixed an issue in pixman's Altivec code the other day -- <a href="https://cgit.freedesktop.org/pixman/commit/?id=207626180d0282bb14a50f2e494174f54ac8a6ce" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">cgit.freedesktop.org/pixman/co</span><span class="invisible">mmit/?id=207626180d0282bb14a50f2e494174f54ac8a6ce</span></a></p><p>And in the process, I read through the Altivec docs and discovered that there are vector instructions that pack and unpack between a8r8g8b8 and a1r5g5b5 formats (but nothing for r5g6b5).</p><p>Any clues why? Was a1r5g5b5 really common on Mac OS or something? I don't think I've seen a1r5g5b5 used anywhere.</p><p><a href="https://fosstodon.org/tags/powerpc" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>powerpc</span></a> <a href="https://fosstodon.org/tags/altivec" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>altivec</span></a> <a href="https://fosstodon.org/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a> <a href="https://fosstodon.org/tags/macos9" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>macos9</span></a> <a href="https://fosstodon.org/tags/pixman" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>pixman</span></a></p>
FCLC<p>Hey friends! <br>For folks interested in <a href="https://mast.hpc.social/tags/RISCV" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RISCV</span></a>, and especially <a href="https://mast.hpc.social/tags/RVV" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RVV</span></a>, here's some information on the <a href="https://mast.hpc.social/tags/tenstorrent" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>tenstorrent</span></a> CPU, designed in-house!</p><p>At a high level, the vector unit is 2x256, with full RVV1.0 as well as a fair few of the optional extras to RVV1.0! </p><p>Phoronix article here: <a href="https://www.phoronix.com/news/LLVM-20-Tenstorrent-Ascalon" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">phoronix.com/news/LLVM-20-Tens</span><span class="invisible">torrent-Ascalon</span></a></p><p>LLVM patches here: <a href="https://github.com/llvm/llvm-project/pull/115100" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/llvm/llvm-project/p</span><span class="invisible">ull/115100</span></a></p><p>One Pager: <a href="https://cdn.sanity.io/files/jpb4ed5r/production/6a28f7d59b6d1300fccdbdd394e192a4fd5f54c6.pdf" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">cdn.sanity.io/files/jpb4ed5r/p</span><span class="invisible">roduction/6a28f7d59b6d1300fccdbdd394e192a4fd5f54c6.pdf</span></a></p><p><a href="https://mast.hpc.social/tags/HPC" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HPC</span></a> <a href="https://mast.hpc.social/tags/SIMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SIMD</span></a></p>
mkretz<p>C++26 will have data-parallel types (or std::simd as it came to be known; unless we rename it next meeting — don't settle in for the name just yet) 🎉 :cpp_language: <a href="https://floss.social/tags/cpp" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cpp</span></a> <a href="https://floss.social/tags/cplusplus" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cplusplus</span></a> <a href="https://floss.social/tags/cpp26" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>cpp26</span></a> <a href="https://floss.social/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a></p>
Karsten Schmidt<p>Yesterday, one year ago... (Still wondering how many people actually have read or tried out any of these)</p><p><a href="https://mastodon.thi.ng/@toxi/111348591236791838" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">mastodon.thi.ng/@toxi/11134859</span><span class="invisible">1236791838</span></a></p><p><a href="https://mastodon.thi.ng/tags/ThingUmbrella" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ThingUmbrella</span></a> <a href="https://mastodon.thi.ng/tags/HowToThing" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>HowToThing</span></a> <a href="https://mastodon.thi.ng/tags/TypeScript" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>TypeScript</span></a> <a href="https://mastodon.thi.ng/tags/Tutorial" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Tutorial</span></a> <a href="https://mastodon.thi.ng/tags/Shader" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Shader</span></a> <a href="https://mastodon.thi.ng/tags/GIS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GIS</span></a> <a href="https://mastodon.thi.ng/tags/SIMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SIMD</span></a> <a href="https://mastodon.thi.ng/tags/Forth" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>Forth</span></a> <a href="https://mastodon.thi.ng/tags/ProcGen" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>ProcGen</span></a></p>
Marcin Juszkiewicz 🙃<p>If you work with SIMD and wonder how it looks on other architectures, VectorCamp has launched a website that helps.</p><p>On <a href="https://simd.info/" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">simd.info/</span><span class="invisible"></span></a> you can look up which intrinsics are available on Arm, Power and x86-64 (RISC-V RVV will be there too), compare them, and so on.</p><p>There is a search function, a tree of operations, and links to the official documentation.</p><p><a href="https://society.oftrolls.com/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a> <a href="https://society.oftrolls.com/tags/neon" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>neon</span></a> <a href="https://society.oftrolls.com/tags/avx" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>avx</span></a> <a href="https://society.oftrolls.com/tags/sve" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>sve</span></a> <a href="https://society.oftrolls.com/tags/vsx" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>vsx</span></a> <a href="https://society.oftrolls.com/tags/sse" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>sse</span></a></p>
NLnet Labs<p><span class="h-card" translate="no"><a href="https://infosec.exchange/@resingm" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>resingm</span></a></span> <span class="h-card" translate="no"><a href="https://fosstodon.org/@ximon18" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>ximon18</span></a></span> Meanwhile, it's day 4 and <span class="h-card" translate="no"><a href="https://tech.lgbt/@bal4e" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>bal4e</span></a></span> is seriously on a mission with making the `domain` zone file parser lightning fast. ⚡️ <a href="https://fosstodon.org/tags/DNS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DNS</span></a> <a href="https://fosstodon.org/tags/SIMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SIMD</span></a> <a href="https://fosstodon.org/tags/rustlang" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>rustlang</span></a>⚡️ <a href="https://github.com/NLnetLabs/domain/pull/388" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">github.com/NLnetLabs/domain/pu</span><span class="invisible">ll/388</span></a></p>
NLnet Labs<p>Jeroen Koekkoek, one of our lead developers, has collaborated with <span class="h-card" translate="no"><a href="https://mastodon.social/@lemire" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>lemire</span></a></span> to create a blazingly fast <a href="https://fosstodon.org/tags/DNS" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DNS</span></a> zone file parser that is now part of our authoritative nameserver NSD. </p><p>They have now published a paper outlining how they enhanced parsing throughput using data parallelism, specifically Single Instruction Multiple Data (<a href="https://fosstodon.org/tags/SIMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SIMD</span></a>) instructions available on commodity processors. <a href="https://fosstodon.org/tags/programming" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>programming</span></a> <a href="https://www.authorea.com/1222979" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="">authorea.com/1222979</span><span class="invisible"></span></a></p>
mkretz<p>European <a href="https://floss.social/tags/GNURadio" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GNURadio</span></a> Days this week. (It's just a few steps from my regular office at <a href="https://floss.social/tags/GSI_Helmholtzzentrum_f%C3%BCr_Schwerionenforschung" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>GSI_Helmholtzzentrum_für_Schwerionenforschung</span></a>.) This week has a focus on GNURadio 4, which was developed by colleagues at <a href="https://floss.social/tags/FAIR" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>FAIR</span></a>/#GSI. I'm happy that I was able to contribute a small part in design and implementation of the new core. And this new core makes use of `stdx::simd` and <a href="https://github.com/mattkretz/vir-simd" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">github.com/mattkretz/vir-simd</span><span class="invisible"></span></a>. I will talk about the <a href="https://floss.social/tags/SIMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SIMD</span></a> parts later today (1:30 pm CEST) and you can tune in at<br><a href="https://www.youtube.com/watch?v=8xnPsPdy5AQ" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://www.</span><span class="ellipsis">youtube.com/watch?v=8xnPsPdy5A</span><span class="invisible">Q</span></a></p>
Librecast<p>It's a new release of lcrq!</p><p>lcrq now makes use of a CPU dispatcher to detect the available SIMD instruction sets at runtime, ensuring that the code runs as fast as possible on the target machine.</p><p>Thanks to <span class="h-card" translate="no"><a href="https://social.nlnet.nl/@nlnet" class="u-url mention" rel="nofollow noopener" target="_blank">@<span>nlnet</span></a></span> and <a href="https://chaos.social/tags/NGIAssure" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>NGIAssure</span></a> for funding this work.</p><p><a href="https://codeberg.org/librecast/lcrq/releases/tag/v0.2.0" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="ellipsis">codeberg.org/librecast/lcrq/re</span><span class="invisible">leases/tag/v0.2.0</span></a></p><p><a href="https://chaos.social/tags/simd" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>simd</span></a> <a href="https://chaos.social/tags/RaptorQ" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RaptorQ</span></a> <a href="https://chaos.social/tags/lcrq" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>lcrq</span></a></p>
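<p>The runtime CPU dispatch described in the release note above follows a common pattern: detect the available features once, then bind the call to the best matching implementation. The Python sketch below only illustrates that pattern and is not lcrq's dispatcher (lcrq is written in C). All names are hypothetical, the feature strings <code>avx2</code> and <code>Features</code> assume the Linux <code>/proc/cpuinfo</code> layout, and <code>xor_wide</code> is just a placeholder for a vectorized kernel. Both kernels assume inputs of equal length.</p>

```python
def detect_features(path="/proc/cpuinfo"):
    """Best-effort read of CPU feature flags (Linux layout assumed; empty set elsewhere)."""
    try:
        with open(path) as f:
            for line in f:
                name, _, value = line.partition(":")
                if name.strip() in ("flags", "Features"):
                    return set(value.split())
    except OSError:
        pass
    return set()

def xor_scalar(a, b):
    """Baseline kernel: XOR two equal-length byte strings one byte at a time."""
    return bytes(x ^ y for x, y in zip(a, b))

def xor_wide(a, b):
    """Placeholder for a vectorized kernel: XOR the whole buffer as one big integer."""
    n = len(a)
    return (int.from_bytes(a, "little") ^ int.from_bytes(b, "little")).to_bytes(n, "little")

# Ordered from most to least specialized; the last entry requires nothing,
# so the dispatcher always finds a kernel.
KERNELS = [
    ({"avx2"}, xor_wide),
    (set(), xor_scalar),
]

def pick_kernel(features):
    """Return the first kernel whose required features are all present."""
    for required, kernel in KERNELS:
        if required <= features:
            return kernel
    raise RuntimeError("no kernel available")

# Resolve once at start-up, then call through the chosen function.
xor_bytes = pick_kernel(detect_features())
```

<p>Resolving the kernel once keeps the feature check out of the hot loop; every later call goes straight to the selected function.</p>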
Ivan Enderlin 🦀<p>wide, <a href="https://github.com/Lokathor/wide" rel="nofollow noopener" translate="no" target="_blank"><span class="invisible">https://</span><span class="">github.com/Lokathor/wide</span><span class="invisible"></span></a>.</p><p>&gt; [it] has portable "wide" data types that do their best to be SIMD when possible.</p><p>&gt; On x86, x86_64, wasm32 and aarch64 neon this is done with explicit intrinsic usage (via safe_arch), and on other architectures this is done by carefully writing functions so that LLVM hopefully does the right thing. When Rust stabilizes more explicit intrinsics then they can go into safe_arch and then they can get used here.</p><p><a href="https://fosstodon.org/tags/RustLang" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>RustLang</span></a> <a href="https://fosstodon.org/tags/SIMD" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>SIMD</span></a> <a href="https://fosstodon.org/tags/DataType" class="mention hashtag" rel="nofollow noopener" target="_blank">#<span>DataType</span></a></p>