c++ source #2
Source code
// Type your code here, or load an example.
#include <cinttypes>
#if !defined( __aarch64__) && !defined(__arm__)
#include <xmmintrin.h>
#include <tmmintrin.h>
#include <pmmintrin.h>
#include <smmintrin.h>
#else
#include <arm_neon.h>
#endif
#include <array>
#include <limits>

typedef uint8_t v16ui __attribute__((vector_size(16))) __attribute__((aligned(16)));
typedef uint16_t v8ui __attribute__((vector_size(16))) __attribute__((aligned(16)));
typedef uint32_t v4ui __attribute__((vector_size(16))) __attribute__((aligned(16)));
typedef uint64_t v2ui __attribute__((vector_size(16))) __attribute__((aligned(16)));
typedef int8_t v16si __attribute__((vector_size(16))) __attribute__((aligned(16)));
typedef int16_t v8si __attribute__((vector_size(16))) __attribute__((aligned(16)));
typedef int32_t v4si __attribute__((vector_size(16))) __attribute__((aligned(16)));
typedef int64_t v2si __attribute__((vector_size(16))) __attribute__((aligned(16)));

v2ui expand64_u(v16ui a, uint8_t imm) {
    switch(imm) {
        case 0: return {a[0], a[1]};
        case 1: return {a[2], a[3]};
        case 2: return {a[4], a[5]};
        case 3: return {a[6], a[7]};
        case 4: return {a[8], a[9]};
        case 5: return {a[10], a[11]};
        case 6: return {a[12],a[13]};
        case 7: return {a[14],a[15]};
    }
    return {0};
}

v2ui expand64_u(v8ui a, uint8_t imm) {
    switch(imm) {
        case 0: return {a[0], a[1]};
        case 1: return {a[2], a[3]};
        case 2: return {a[4], a[5]};
        case 3: return {a[6], a[7]};
    }
    return {0};
}

v2si expand64_s(v16si a, uint8_t imm) {
    switch(imm) {
        case 0: return {a[0], a[1]};
        case 1: return {a[2], a[3]};
        case 2: return {a[4], a[5]};
        case 3: return {a[6], a[7]};
        case 4: return {a[8], a[9]};
        case 5: return {a[10], a[11]};
        case 6: return {a[12],a[13]};
        case 7: return {a[14],a[15]};
    }
    return {0};
}

v2si expand64_s(v8si a, uint8_t imm) {
    switch(imm) {
        case 0: return {a[0], a[1]};
        case 1: return {a[2], a[3]};
        case 2: return {a[4], a[5]};
        case 3: return {a[6], a[7]};
    }
    return {0};
}

v4ui expand32_u(v16ui a, uint8_t imm) {
    switch(imm) {
        case 0:
            return {a[0], a[1], a[2], a[3]};
        case 1: return {a[4], a[5], a[6], a[7]};
        case 2: return {a[8], a[9], a[10], a[11]};
        case 3: return {a[12], a[13], a[14], a[15]};
    }
    return {0};
}

v4si expand32_s(v16si a, uint8_t imm) {
    switch(imm) {
        case 0: return {a[0], a[1], a[2], a[3]};
        case 1: return {a[4], a[5], a[6], a[7]};
        case 2: return {a[8], a[9], a[10], a[11]};
        case 3: return {a[12], a[13], a[14], a[15]};
    }
    return {0};
}

#if defined(__aarch64__)
v4ui expand64_u_insr(v16ui a, uint8_t imm) {
    const v2ui adder{2,2};
    const v2ui tblMask1 = {0xffffffffffffff00,0xffffffffffffff01};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    auto tblMask5 = tblMask4 + adder;
    auto tblMask6 = tblMask5 + adder;
    auto tblMask7 = tblMask6 + adder;
    auto tblMask8 = tblMask7 + adder;
    switch(imm) {
        case 0: return vqtbl1q_u8(a,tblMask1);
        case 1: return vqtbl1q_u8(a,tblMask2);
        case 2: return vqtbl1q_u8(a,tblMask3);
        case 3: return vqtbl1q_u8(a,tblMask4);
        case 4: return vqtbl1q_u8(a,tblMask5);
        case 5: return vqtbl1q_u8(a,tblMask6);
        case 6: return vqtbl1q_u8(a,tblMask7);
        case 7: return vqtbl1q_u8(a,tblMask8);
    }
    return {0};
}

v4si expand64_s_insr(v16si a, uint8_t imm) {
    const v2ui adder{0x02ffffffffffffff,0x02ffffffffffffff};
    const v2ui tblMask1 = {0x00ffffffffffffff,0x01ffffffffffffff};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    auto tblMask5 = tblMask4 + adder;
    auto tblMask6 = tblMask5 + adder;
    auto tblMask7 = tblMask6 + adder;
    auto tblMask8 = tblMask7 + adder;
    switch(imm) {
        case 0: return ((v2si)vqtbl1q_u8(a,tblMask1)) >> 56;
        case 1: return ((v2si)vqtbl1q_u8(a,tblMask2)) >> 56;
        case 2: return ((v2si)vqtbl1q_u8(a,tblMask3)) >> 56;
        case 3: return ((v2si)vqtbl1q_u8(a,tblMask4)) >> 56;
        case 4: return ((v2si)vqtbl1q_u8(a,tblMask5)) >> 56;
        case 5: return ((v2si)vqtbl1q_u8(a,tblMask6)) >> 56;
        case 6: return ((v2si)vqtbl1q_u8(a,tblMask7)) >> 56;
        case 7: return ((v2si)vqtbl1q_u8(a,tblMask8)) >> 56;
    }
    return {0};
}

v4ui expand64_u_insr(v8ui a, uint8_t imm) {
    const v2ui adder{0x0202,0x0202};
    const v2ui tblMask1 = {0xffffffffffff0100,0xffffffffffff0302};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    switch(imm) {
        case 0: return vqtbl1q_u8(a,tblMask1);
        case 1: return vqtbl1q_u8(a,tblMask2);
        case 2: return vqtbl1q_u8(a,tblMask3);
        case 3: return vqtbl1q_u8(a,tblMask4);
    }
    return {0};
}

v4si expand64_s_insr(v8si a, uint8_t imm) {
    uint64_t adderval1 = 0x0202l << 48;
    const v2ui adder{adderval1,adderval1};
    const v2ui tblMask1 = {0x0100ffffffffffff,0x0302ffffffffffff};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    switch(imm) {
        case 0: return ((v2si)vqtbl1q_u8(a,tblMask1)) >> 48;
        case 1: return ((v2si)vqtbl1q_u8(a,tblMask2)) >> 48;
        case 2: return ((v2si)vqtbl1q_u8(a,tblMask3)) >> 48;
        case 3: return ((v2si)vqtbl1q_u8(a,tblMask4)) >> 48;
    }
    return {0};
}

v4ui expand32_u_insr(v16ui a, uint8_t imm) {
    const v4ui tblMask1 = {0xffffff00,0xffffff01,0xffffff02, 0xffffff03},
               tblMask2 = {0xffffff04,0xffffff05,0xffffff06, 0xffffff07},
               tblMask3 = {0xffffff08,0xffffff09,0xffffff0a, 0xffffff0b},
               tblMask4 = {0xffffff0c,0xffffff0d,0xffffff0e, 0xffffff0f};
    switch(imm) {
        case 0: return vqtbl1q_u8(a,tblMask1);
        case 1: return vqtbl1q_u8(a,tblMask2);
        case 2: return vqtbl1q_u8(a,tblMask3);
        case 3: return vqtbl1q_u8(a,tblMask4);
    }
    return {0};
}

v4si expand32_s_insr(v16si a, uint8_t imm) {
    const v4ui tblMask1 = {0x00ffffff,0x01ffffff,0x02ffffff, 0x03ffffff},
               tblMask2 = {0x04ffffff,0x05ffffff,0x06ffffff, 0x07ffffff},
               tblMask3 = {0x08ffffff,0x09ffffff,0x0affffff, 0x0bffffff},
               tblMask4 = {0x0cffffff,0x0dffffff,0x0effffff, 0x0fffffff};
    switch(imm) {
        case 0: return ((v4si)vqtbl1q_u8(a,tblMask1))>>24;
        case 1: return ((v4si)vqtbl1q_u8(a,tblMask2))>>24;
        case 2: return ((v4si)vqtbl1q_u8(a,tblMask3))>>24;
        case 3: return ((v4si)vqtbl1q_u8(a,tblMask4))>>24;
    }
    return {0};
}

#elif defined(__x86_64__)
__attribute__((target("ssse3")))
v4ui expand32_u_insr(v16ui a, uint8_t imm) {
    const v4ui tblMask1 = {0xffffff00,0xffffff01,0xffffff02, 0xffffff03},
               tblMask2 = {0xffffff04,0xffffff05,0xffffff06, 0xffffff07},
               tblMask3 = {0xffffff08,0xffffff09,0xffffff0a, 0xffffff0b},
               tblMask4 = {0xffffff0c,0xffffff0d,0xffffff0e, 0xffffff0f};
    switch(imm) {
        case 0: return _mm_shuffle_epi8(a,tblMask1);
        case 1: return _mm_shuffle_epi8(a,tblMask2);
        case 2: return _mm_shuffle_epi8(a,tblMask3);
        case 3: return _mm_shuffle_epi8(a,tblMask4);
    }
    return {0};
}

__attribute__((target("ssse3")))
v4si expand32_s_insr(v16ui a, uint8_t imm) {
    const v4ui tblMask1 = {0x00ffffff,0x01ffffff,0x02ffffff, 0x03ffffff},
               tblMask2 = {0x04ffffff,0x05ffffff,0x06ffffff, 0x07ffffff},
               tblMask3 = {0x08ffffff,0x09ffffff,0x0affffff, 0x0bffffff},
               tblMask4 = {0x0cffffff,0x0dffffff,0x0effffff, 0x0fffffff};
    switch(imm) {
        case 0: return ((v4si)_mm_shuffle_epi8((__m128i)a,(__m128i)tblMask1))>>24;
        case 1: return ((v4si)_mm_shuffle_epi8((__m128i)a,(__m128i)tblMask2))>>24;
        case 2: return ((v4si)_mm_shuffle_epi8((__m128i)a,(__m128i)tblMask3))>>24;
        case 3: return ((v4si)_mm_shuffle_epi8((__m128i)a,(__m128i)tblMask4))>>24;
    }
    return {0};
}

__attribute__((target("ssse3")))
v4ui expand64_u_insr(v16ui a, uint8_t imm) {
    const v2ui adder{2,2};
    const v2ui tblMask1 = {0xffffffffffffff00,0xffffffffffffff01};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    auto tblMask5 = tblMask4 + adder;
    auto tblMask6 = tblMask5 + adder;
    auto tblMask7 = tblMask6 + adder;
    auto tblMask8 = tblMask7 + adder;
    switch(imm) {
        case 0: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask1);
        case 1: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask2);
        case 2: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask3);
        case 3: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask4);
        case 4: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask5);
        case 5: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask6);
        case 6: return
            _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask7);
        case 7: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask8);
    }
    return {0};
}

__attribute__((target("ssse3")))
inline v2si expand64_s_insr_old(v16si a, uint8_t imm) {
    const v2ui adder{0x02ffffff02ffffff,0x02ffffff02ffffff};
    const v2ui tblMask1 = {0x01ffffff00ffffff,0x03ffffff02ffffff};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    auto tblMask5 = tblMask4 + adder;
    auto tblMask6 = tblMask5 + adder;
    auto tblMask7 = tblMask6 + adder;
    auto tblMask8 = tblMask7 + adder;
    __m128i shuffleOut;
#if 0
    switch(imm) {
        case 0: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask1); break;
        case 1: shuffleOut = (_mm_shuffle_epi8((__m128i)a,(__m128i)tblMask2)); break;
        case 2: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask3); break;
        case 3: shuffleOut = (_mm_shuffle_epi8((__m128i)a,(__m128i)tblMask4)); break;
        case 4: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask5); break;
        case 5: shuffleOut = (_mm_shuffle_epi8((__m128i)a,(__m128i)tblMask6)); break;
        case 6: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask7); break;
        case 7: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask8); break;
        default: return {0};
    }
#else
    switch(imm) {
        case 0:
        case 1: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask1); break;
        case 2:
        case 3: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask3); break;
        case 4:
        case 5: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask5); break;
        case 6:
        case 7: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask7); break;
        default: return {0};
    }
#endif
    auto lowerPart = shuffleOut;
    auto upperPart = lowerPart;
    lowerPart = _mm_srai_epi32(lowerPart,24);
    upperPart = _mm_srai_epi32(upperPart,31);
    return (imm % 2 == 1) ?
        (v2si) _mm_unpackhi_epi32(upperPart,lowerPart) :
        (v2si) _mm_unpacklo_epi32(upperPart,lowerPart);
}

__attribute__((target("ssse3")))
inline std::array<v2si,2> expand64_s_insr_old_return2(v16si a, uint8_t imm) {
    const v2ui adder{0x0400000004000000,0x0400000004000000};
    const v2ui tblMask1 = {0x01ffffff00ffffff,0x03ffffff02ffffff};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    auto tblMask5 = tblMask4 + adder;
    auto tblMask6 = tblMask5 + adder;
    auto tblMask7 = tblMask6 + adder;
    auto tblMask8 = tblMask7 + adder;
    __m128i shuffleOut;
#if 0
    switch(imm) {
        case 0: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask1); break;
        case 1: shuffleOut = (_mm_shuffle_epi8((__m128i)a,(__m128i)tblMask2)); break;
        case 2: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask3); break;
        case 3: shuffleOut = (_mm_shuffle_epi8((__m128i)a,(__m128i)tblMask4)); break;
        case 4: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask5); break;
        case 5: shuffleOut = (_mm_shuffle_epi8((__m128i)a,(__m128i)tblMask6)); break;
        case 6: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask7); break;
        case 7: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask8); break;
        default: return {0};
    }
#else
    switch(imm) {
        case 0:
        case 1: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask1); break;
        case 2:
        case 3: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask2); break;
        case 4:
        case 5: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask3); break;
        case 6:
        case 7: shuffleOut = _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask4); break;
        default: return {0};
    }
#endif
    auto lowerPart = shuffleOut;
    auto upperPart = lowerPart;
    lowerPart = _mm_srai_epi32(lowerPart,24);
    upperPart = _mm_srai_epi32(upperPart,31);
    return std::array<v2si,2>{_mm_unpacklo_epi32(upperPart,lowerPart),_mm_unpackhi_epi32(upperPart,lowerPart)};
}

__attribute__((target("default")))
v2ui expand64_u_insr(v8ui a, uint8_t imm) {
    return {a[imm*2],a[imm*2+1]};
}
__attribute__((target("ssse3")))
v2ui expand64_u_insr(v8ui a, uint8_t imm) {
    const v2ui adder{0x0202,0x0202};
    const v2ui tblMask1 = {0xffffffffffff0100,0xffffffffffff0302};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    switch(imm) {
        case 0: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask1);
        case 1: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask2);
        case 2: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask3);
        case 3: return _mm_shuffle_epi8((__m128i)a,(__m128i)tblMask4);
    }
    return {0};
}

inline v8ui genMask64(v8si a, uint8_t imm) {
    return {uint16_t(imm*2),uint16_t(imm*2+1),0xffff,0xffff,uint16_t(imm*2+2),uint16_t(imm*2+3),0xffff,0xffff};
}

inline v16ui genMask64(v16si a, uint8_t imm) {
    return {uint8_t(imm*2),0xff,0xff,0xff,0xff,0xff,0xff,0xff,
            uint8_t(imm*2+1),0xff,0xff,0xff,0xff,0xff,0xff,0xff};
}

__attribute__((target("sse4")))
v4si expand64_s_insr(v16si const & a, uint8_t const & imm) {
    __m128i *pPtr = (__m128i*)(((int16_t*)&a)+imm);
    return _mm_cvtepi8_epi64(*pPtr);
}

__attribute__((target("sse4")))
v4si expand64_s_insr(v8si const & a, uint8_t const & imm) {
    __m128i *pPtr = (__m128i*)(((int32_t*)&a)+imm);
    return _mm_cvtepi16_epi64(*pPtr);
}
#endif

#if defined(__arm__)
v2ui expand64_u_insr(v16ui a, uint8_t imm) {
    const v2ui adder{2,2};
    const v2ui tblMask1 = {0xffffffffffffff00,0xffffffffffffff01};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    auto tblMask5 = tblMask4 + adder;
    auto tblMask6 = tblMask5 + adder;
    auto tblMask7 = tblMask6 + adder;
    auto tblMask8 = tblMask7 + adder;
    switch(imm) {
        case 0: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask1)),
                                   vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask1)));
        case 1: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask2)),
                                   vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask2)));
        case 2: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask3)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask3)));
    case 3: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask4)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask4)));
    case 4: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask5)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask5)));
    case 5: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask6)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask6)));
    case 6: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask7)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask7)));
    case 7: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask8)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask8)));
    }
    return {0};
}

v2si expand64_s_insr(v16si a, uint8_t imm) {
    // Advance the index in the top byte of each 64-bit lane by 2 per step;
    // the 0xff filler bytes below it stay untouched.
    const v2ui adder{0x0200000000000000,0x0200000000000000};
    const v2ui tblMask1 = {0x00ffffffffffffff,0x01ffffffffffffff};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    auto tblMask5 = tblMask4 + adder;
    auto tblMask6 = tblMask5 + adder;
    auto tblMask7 = tblMask6 + adder;
    auto tblMask8 = tblMask7 + adder;
    switch(imm) {
    case 0: return ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask1)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask1))))>>56;
    case 1: return ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask2)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask2))))>>56;
    case 2: return ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask3)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask3))))>>56;
    case 3: return ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask4)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask4))))>>56;
    case 4: return ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask5)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask5))))>>56;
    case 5: return ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask6)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask6))))>>56;
    case 6: return
        ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask7)),
                           vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask7))))>>56;
    case 7: return((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask8)),
                                     vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask8))))>>56;
    }
    return {0};
}

v2ui expand64_u_insr(v8ui a, uint8_t imm) {
    const v2ui adder{0x0202,0x0202};
    const v2ui tblMask1 = {0xffffffffffff0100,0xffffffffffff0302};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    switch(imm) {
    case 0: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask1)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask1)));
    case 1: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask2)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask2)));
    case 2: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask3)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask3)));
    case 3: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask4)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask4)));
    }
    return {0};
}

v2si expand64_s_insr(v8si a, uint8_t imm) {
    // 64-bit literal: long is only 32 bits wide on 32-bit ARM, so shifting it by 48 would be undefined.
    uint64_t adderval1 = 0x0202ull << 48;
    const v2ui adder{adderval1,adderval1};
    const v2ui tblMask1 = {0x0100ffffffffffff,0x0302ffffffffffff};
    auto tblMask2 = tblMask1 + adder;
    auto tblMask3 = tblMask2 + adder;
    auto tblMask4 = tblMask3 + adder;
    switch(imm) {
    case 0: return ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask1)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask1))))>>48;
    case 1: return ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask2)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask2))))>>48;
    case 2: return ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask3)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask3))))>>48;
    case 3: return ((v2si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask4)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask4))))>>48;
    }
    return {0};
}

v4ui expand32_u_insr(v16ui a, uint8_t imm) {
    const v4ui tblMask1 =
        {0xffffff00,0xffffff01,0xffffff02, 0xffffff03},
        tblMask2 = {0xffffff04,0xffffff05,0xffffff06, 0xffffff07},
        tblMask3 = {0xffffff08,0xffffff09,0xffffff0a, 0xffffff0b},
        tblMask4 = {0xffffff0c,0xffffff0d,0xffffff0e, 0xffffff0f};
    switch(imm) {
    case 0: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask1)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask1)));
    case 1: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask2)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask2)));
    case 2: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask3)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask3)));
    case 3: return vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask4)),
                               vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask4)));
    }
    return {0};
}

v4si expand32_s_insr(v16si a, uint8_t imm) {
    const v4ui tblMask1 = {0x00ffffff,0x01ffffff,0x02ffffff, 0x03ffffff},
        tblMask2 = {0x04ffffff,0x05ffffff,0x06ffffff, 0x07ffffff},
        tblMask3 = {0x08ffffff,0x09ffffff,0x0affffff, 0x0bffffff},
        tblMask4 = {0x0cffffff,0x0dffffff,0x0effffff, 0x0fffffff};
    switch(imm) {
    case 0: return ((v4si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask1)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask1))))>>24;
    case 1: return ((v4si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask2)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask2))))>>24;
    case 2: return ((v4si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask3)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask3))))>>24;
    case 3: return ((v4si)vcombine_u8(vtbl1_u8(vget_low_u8(a),vget_low_u8(tblMask4)),
                                      vtbl1_u8(vget_high_u8(a),vget_high_u8(tblMask4))))>>24;
    }
    return {0};
}
#endif

v4ui expand32_u(v8ui a, uint8_t imm) {
    switch(imm) {
    case 0: return {a[0], a[1], a[2], a[3]};
    case 1: return {a[4], a[5], a[6], a[7]};
    }
    return {0};
}

void test100_expand32_u(v16ui* pIn, size_t pLength, v4ui *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand32_u(pIn[i],0);
        pOut[i*4+1] = expand32_u(pIn[i],1);
        pOut[i*4+2] =
            expand32_u(pIn[i],2);
        pOut[i*4+3] = expand32_u(pIn[i],3);
    }
}

void test100_expand32_s(v16ui* pIn, size_t pLength, v4ui *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand32_s(pIn[i],0);
        pOut[i*4+1] = expand32_s(pIn[i],1);
        pOut[i*4+2] = expand32_s(pIn[i],2);
        pOut[i*4+3] = expand32_s(pIn[i],3);
    }
}

void test100_expand32_u_insr(v16ui* pIn, size_t pLength, v4ui *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand32_u_insr(pIn[i],0);
        pOut[i*4+1] = expand32_u_insr(pIn[i],1);
        pOut[i*4+2] = expand32_u_insr(pIn[i],2);
        pOut[i*4+3] = expand32_u_insr(pIn[i],3);
    }
}

void test100_expand32_s_insr(v16ui* pIn, size_t pLength, v4ui *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand32_s_insr(pIn[i],0);
        pOut[i*4+1] = expand32_s_insr(pIn[i],1);
        pOut[i*4+2] = expand32_s_insr(pIn[i],2);
        pOut[i*4+3] = expand32_s_insr(pIn[i],3);
    }
}

void test100_expand64_u(v8ui* pIn, size_t pLength, v4ui *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand64_u(pIn[i],0);
        pOut[i*4+1] = expand64_u(pIn[i],1);
        pOut[i*4+2] = expand64_u(pIn[i],2);
        pOut[i*4+3] = expand64_u(pIn[i],3);
    }
}

void test100_expand64_u(v16ui* pIn, size_t pLength, v4ui *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand64_u(pIn[i],0);
        pOut[i*4+1] = expand64_u(pIn[i],1);
        pOut[i*4+2] = expand64_u(pIn[i],2);
        pOut[i*4+3] = expand64_u(pIn[i],3);
        pOut[i*4+4] = expand64_u(pIn[i],4);
        pOut[i*4+5] = expand64_u(pIn[i],5);
        pOut[i*4+6] = expand64_u(pIn[i],6);
        pOut[i*4+7] = expand64_u(pIn[i],7);
    }
}

void test100_expand64_s(v8si* pIn, size_t pLength, v4si *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand64_s(pIn[i],0);
        pOut[i*4+1] = expand64_s(pIn[i],1);
        pOut[i*4+2] = expand64_s(pIn[i],2);
        pOut[i*4+3] = expand64_s(pIn[i],3);
    }
}

void test100_expand64_s(v16si* pIn, size_t pLength, v4si *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand64_s(pIn[i],0);
        pOut[i*4+1] = expand64_s(pIn[i],1);
        pOut[i*4+2] = expand64_s(pIn[i],2);
        pOut[i*4+3] =
            expand64_s(pIn[i],3);
        pOut[i*4+4] = expand64_s(pIn[i],4);
        pOut[i*4+5] = expand64_s(pIn[i],5);
        pOut[i*4+6] = expand64_s(pIn[i],6);
        pOut[i*4+7] = expand64_s(pIn[i],7);
    }
}

void test100_expand64_u_insr(v8ui* pIn, size_t pLength, v4ui *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand64_u_insr(pIn[i],0);
        pOut[i*4+1] = expand64_u_insr(pIn[i],1);
        pOut[i*4+2] = expand64_u_insr(pIn[i],2);
        pOut[i*4+3] = expand64_u_insr(pIn[i],3);
    }
}

void test100_expand64_u_insr(v16ui* pIn, size_t pLength, v4ui *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand64_u_insr(pIn[i],0);
        pOut[i*4+1] = expand64_u_insr(pIn[i],1);
        pOut[i*4+2] = expand64_u_insr(pIn[i],2);
        pOut[i*4+3] = expand64_u_insr(pIn[i],3);
        pOut[i*4+4] = expand64_u_insr(pIn[i],4);
        pOut[i*4+5] = expand64_u_insr(pIn[i],5);
        pOut[i*4+6] = expand64_u_insr(pIn[i],6);
        pOut[i*4+7] = expand64_u_insr(pIn[i],7);
    }
}

void test100_expand64_s_insr(v8si* pIn, size_t pLength, v4si *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand64_s_insr(pIn[i],0);
        pOut[i*4+1] = expand64_s_insr(pIn[i],1);
        pOut[i*4+2] = expand64_s_insr(pIn[i],2);
        pOut[i*4+3] = expand64_s_insr(pIn[i],3);
    }
}

void test100_expand64_s_insr(v16si* pIn, size_t pLength, v4si *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand64_s_insr(pIn[i],0);
        pOut[i*4+1] = expand64_s_insr(pIn[i],1);
        pOut[i*4+2] = expand64_s_insr(pIn[i],2);
        pOut[i*4+3] = expand64_s_insr(pIn[i],3);
        pOut[i*4+4] = expand64_s_insr(pIn[i],4);
        pOut[i*4+5] = expand64_s_insr(pIn[i],5);
        pOut[i*4+6] = expand64_s_insr(pIn[i],6);
        pOut[i*4+7] = expand64_s_insr(pIn[i],7);
    }
}

#if defined(__x86_64__)
void test100_expand64_s_insr_old(v16si* pIn, size_t pLength, v4si *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        pOut[i*4] = expand64_s_insr_old(pIn[i],0);
        pOut[i*4+1] = expand64_s_insr_old(pIn[i],1);
        pOut[i*4+2] = expand64_s_insr_old(pIn[i],2);
        pOut[i*4+3] = expand64_s_insr_old(pIn[i],3);
        pOut[i*4+4] = expand64_s_insr_old(pIn[i],4);
        pOut[i*4+5] =
            expand64_s_insr_old(pIn[i],5);
        pOut[i*4+6] = expand64_s_insr_old(pIn[i],6);
        pOut[i*4+7] = expand64_s_insr_old(pIn[i],7);
    }
}

//expand64_s_insr_old_return2
void test100_expand64_s_insr_old_return2(v16si* pIn, size_t pLength, v4si *pOut) {
    for (size_t i = 0; i < pLength; ++i) {
        auto a = expand64_s_insr_old_return2(pIn[i],0);
        pOut[i*4] = a[0];
        pOut[i*4+1] = a[1];
        auto b = expand64_s_insr_old_return2(pIn[i],2);
        pOut[i*4+2] = b[0];
        pOut[i*4+3] = b[1];
        auto c = expand64_s_insr_old_return2(pIn[i],4);
        pOut[i*4+4] = c[0];
        pOut[i*4+5] = c[1];
        auto d = expand64_s_insr_old_return2(pIn[i],6);
        pOut[i*4+6] = d[0];
        pOut[i*4+7] = d[1];
    }
}
#endif
analysis source #7
        push {r4, lr}
        cmp r1, #0
        popeq {r4, pc}
        vldr d16, .LCPI24_0
        vldr d18, .LCPI24_1
        vldr d19, .LCPI24_2
        vldr d20, .LCPI24_3
        vldr d21, .LCPI24_4
        vldr d22, .LCPI24_5
        vldr d23, .LCPI24_6
        vldr d24, .LCPI24_7
        vldr d25, .LCPI24_8
        vldr d26, .LCPI24_9
        vldr d27, .LCPI24_10
        vldr d28, .LCPI24_11
        vldr d29, .LCPI24_12
        vldr d30, .LCPI24_13
        vldr d31, .LCPI24_14
        vmov.i64 d17, #0xffffffffffffff
        mov r12, #112
.LBB24_1: @ =>This Inner Loop Header: Depth=1
        vld1.64 {d0, d1}, [r0:128]
        mov lr, r2
        add r3, r2, #16
        add r4, r2, #80
        subs r1, r1, #1
        vtbl.8 d3, {d1}, d16
        vtbl.8 d2, {d0}, d17
        vshr.s64 q0, q1, #56
        vst1.64 {d0, d1}, [lr:128], r12
        vld1.64 {d0, d1}, [r0:128]
        vtbl.8 d3, {d1}, d18
        vtbl.8 d2, {d0}, d19
        vshr.s64 q0, q1, #56
        vst1.64 {d0, d1}, [r3:128]
        add r3, r2, #32
        vld1.64 {d0, d1}, [r0:128]
        vtbl.8 d3, {d1}, d20
        vtbl.8 d2, {d0}, d21
        vshr.s64 q0, q1, #56
        vst1.64 {d0, d1}, [r3:128]
        add r3, r2, #48
        vld1.64 {d0, d1}, [r0:128]
        vtbl.8 d3, {d1}, d22
        vtbl.8 d2, {d0}, d23
        vshr.s64 q0, q1, #56
        vst1.64 {d0, d1}, [r3:128]
        add r3, r2, #64
        add r2, r2, #96
        vld1.64 {d0, d1}, [r0:128]
        vtbl.8 d3, {d1}, d24
        vtbl.8 d2, {d0}, d25
        vshr.s64 q0, q1, #56
        vst1.64 {d0, d1}, [r3:128]
        vld1.64 {d0, d1}, [r0:128]
        vtbl.8 d3, {d1}, d26
        vtbl.8 d2, {d0}, d27
        vshr.s64 q0, q1, #56
        vst1.64 {d0, d1}, [r4:128]
        vld1.64 {d0, d1}, [r0:128]
        vtbl.8 d3, {d1}, d28
        vtbl.8 d2, {d0}, d29
        vshr.s64 q0, q1, #56
        vst1.64 {d0, d1}, [r2:128]
        mov r2, r3
        vld1.8 {d0, d1}, [r0:128]!
        vtbl.8 d3, {d1}, d30
        vtbl.8 d2, {d0}, d31
        vshr.s64 q0, q1, #56
        vst1.64 {d0, d1}, [lr:128]
        bne .LBB24_1
analysis source #1
# Transpose variant
/*
        stp d9, d8, [sp, #-16]! // 16-byte Folded Spill
        adrp x9, .LCPI8_0
        ldr q16, [x9, :lo12:.LCPI8_0]
        adrp x9, .LCPI8_1
        ldr q17, [x9, :lo12:.LCPI8_1]
        adrp x9, .LCPI8_2
        ldr q18, [x9, :lo12:.LCPI8_2]
        adrp x9, .LCPI8_3
        ldr q19, [x9, :lo12:.LCPI8_3]
        adrp x9, .LCPI8_4
        ldr q20, [x9, :lo12:.LCPI8_4]
        adrp x9, .LCPI8_5
        ldr q21, [x9, :lo12:.LCPI8_5]
        ldr q22, [sp, #16]
        mov x8, xzr
        movi v23.8h, #128
        movi v24.16b, #128
*/
        tbl v31.16b, { v25.16b, v26.16b, v27.16b, v28.16b }, v18.16b
        sshr v31.2D, v31.2D, #48
        sshll v24.4S, v24.4H, #0
        sshll v24.2D, v24.2S, #0
/*
.LBB8_1: // =>This Inner Loop Header: Depth=1
        ldr q25, [x0, x8]
        ldr q26, [x1, x8]
        ldr q27, [x2, x8]
        ldr q28, [x3, x8]
        tbl v31.16b, { v25.16b, v26.16b, v27.16b, v28.16b }, v18.16b
        tbl v29.16b, { v25.16b, v26.16b, v27.16b, v28.16b }, v16.16b
        tbl v30.16b, { v25.16b, v26.16b, v27.16b, v28.16b }, v17.16b
        tbl v8.16b, { v25.16b, v26.16b, v27.16b, v28.16b }, v19.16b
        tbl v9.16b, { v25.16b, v26.16b, v27.16b, v28.16b }, v20.16b
        tbl v25.16b, { v25.16b, v26.16b, v27.16b, v28.16b }, v21.16b
        mul v26.8h, v31.8h, v1.8h
        mul v28.8h, v31.8h, v4.8h
        mul v31.8h, v31.8h, v7.8h
        mul v27.8h, v8.8h, v1.8h
        mla v26.8h, v29.8h, v0.8h
        mla v28.8h, v29.8h, v3.8h
        mla v31.8h, v29.8h, v6.8h
        mul v29.8h, v8.8h, v4.8h
        mul v8.8h, v8.8h, v7.8h
        mla v27.8h, v30.8h, v0.8h
        mla v29.8h, v30.8h, v3.8h
        mla v8.8h, v30.8h, v6.8h
        mla v26.8h, v9.8h, v2.8h
        mla v28.8h, v9.8h, v5.8h
        mla v31.8h, v9.8h, v22.8h
        mla v27.8h, v25.8h, v2.8h
        mla v29.8h, v25.8h, v5.8h
        mla v8.8h, v25.8h, v22.8h
        addhn v25.8b, v26.8h, v23.8h
        addhn v26.8b, v28.8h, v23.8h
        addhn v28.8b, v31.8h, v23.8h
        addhn2 v25.16b, v27.8h, v23.8h
        addhn2 v26.16b, v29.8h, v23.8h
        addhn2 v28.16b, v8.8h, v23.8h
        str q25, [x4, x8]
        eor v25.16b, v26.16b, v24.16b
        eor v26.16b, v28.16b, v24.16b
        str q25, [x5, x8]
        str q26, [x6, x8]
        add x8, x8, #16 // =16
        cmp x8, #1600 // =1600
        b.ne .LBB8_1
        ldp d9, d8, [sp], #16 // 16-byte Folded Reload
*/
analysis source #3
# Expand 8 to 64
#       test rsi, rsi
#       je .LBB27_3
#       mov eax, 14
.LBB27_2: # =>This Inner Loop Header: Depth=1
        vmovdqa xmmword ptr [rdi + rax - 14], xmm0
        vpmovsxbq xmm0, word ptr [rdi + rax - 14]
        vmovdqa xmmword ptr [rdx + 4*rax - 56], xmm0
        vpmovsxbq xmm0, word ptr [rdi + rax - 12]
        vmovdqa xmmword ptr [rdx + 4*rax - 40], xmm0
        vpmovsxbq xmm0, word ptr [rdi + rax - 10]
        vmovdqa xmmword ptr [rdx + 4*rax - 24], xmm0
        vpmovsxbq xmm0, word ptr [rdi + rax - 8]
        vmovdqa xmmword ptr [rdx + 4*rax - 8], xmm0
        vpmovsxbq xmm0, word ptr [rdi + rax - 6]
        vmovdqa xmmword ptr [rdx + 4*rax + 8], xmm0
        vpmovsxbq xmm0, word ptr [rdi + rax - 4]
        vmovdqa xmmword ptr [rdx + 4*rax + 24], xmm0
        vpmovsxbq xmm0, word ptr [rdi + rax - 2]
        vmovdqa xmmword ptr [rdx + 4*rax + 40], xmm0
        vpmovsxbq xmm0, word ptr [rdi + rax]
        vmovdqa xmmword ptr [rdx + 4*rax + 56], xmm0
        add rax, 16
        dec rsi
        jne .LBB27_2
analysis source #4
/*
        test rsi, rsi
        je .LBB28_3
        xor eax, eax
        vmovdqa xmm0, xmmword ptr [rip + .LCPI28_0] # xmm0 = [128,128,128,0,128,128,128,1,128,128,128,2,128,128,128,3]
        vmovdqa xmm1, xmmword ptr [rip + .LCPI28_1] # xmm1 = [128,128,128,6,128,128,128,7,128,128,128,8,128,128,128,9]
        vmovdqa xmm2, xmmword ptr [rip + .LCPI28_2] # xmm2 = [128,128,128,12,128,128,128,13,128,128,128,14,128,128,128,15]
        vmovdqa xmm3, xmmword ptr [rip + .LCPI28_3] # xmm3 = [128,128,128,2,128,128,128,3,128,128,128,4,128,128,128,5]
*/
.LBB28_2: # =>This Inner Loop Header: Depth=1
        vmovdqa xmm4, xmmword ptr [rdi + rax]
        vpshufb xmm4, xmm4, xmm0
        vpsrad xmm5, xmm4, 24
        vpsrad xmm4, xmm4, 31
        vpunpckldq xmm6, xmm4, xmm5 # xmm6 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
        vpunpckhdq xmm4, xmm4, xmm5 # xmm4 = xmm4[2],xmm5[2],xmm4[3],xmm5[3]
        vmovdqa xmmword ptr [rdx + 4*rax], xmm6
        vmovdqa xmmword ptr [rdx + 4*rax + 16], xmm4
        vmovdqa xmm4, xmmword ptr [rdi + rax]
        vpshufb xmm4, xmm4, xmm1
        vpsrad xmm5, xmm4, 24
        vpsrad xmm4, xmm4, 31
        vpunpckldq xmm6, xmm4, xmm5 # xmm6 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
        vpunpckhdq xmm4, xmm4, xmm5 # xmm4 = xmm4[2],xmm5[2],xmm4[3],xmm5[3]
        vmovdqa xmmword ptr [rdx + 4*rax + 32], xmm6
        vmovdqa xmmword ptr [rdx + 4*rax + 48], xmm4
        vmovdqa xmm4, xmmword ptr [rdi + rax]
        vpshufb xmm4, xmm4, xmm2
        vpsrad xmm5, xmm4, 24
        vpsrad xmm4, xmm4, 31
        vpunpckldq xmm6, xmm4, xmm5 # xmm6 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
        vpunpckhdq xmm4, xmm4, xmm5 # xmm4 = xmm4[2],xmm5[2],xmm4[3],xmm5[3]
        vmovdqa xmmword ptr [rdx + 4*rax + 64], xmm6
        vmovdqa xmmword ptr [rdx + 4*rax + 80], xmm4
        vmovdqa xmm4, xmmword ptr [rdi + rax]
        vpshufb xmm4, xmm4, xmm3
        vpsrad xmm5, xmm4, 24
        vpsrad xmm4, xmm4, 31
        vpunpckldq xmm6, xmm4, xmm5 # xmm6 = xmm4[0],xmm5[0],xmm4[1],xmm5[1]
        vpunpckhdq xmm4, xmm4, xmm5 # xmm4 = xmm4[2],xmm5[2],xmm4[3],xmm5[3]
        vmovdqa xmmword ptr [rdx + 4*rax + 96], xmm6
        vmovdqa xmmword ptr [rdx + 4*rax + 112], xmm4
        add rax, 16
        add rsi, -1
        jne .LBB28_2
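A scalar sketch of the idea behind the shuffle / arithmetic-shift / interleave sequence, for one byte. The function name `sign_extend_8_to_64` is hypothetical and does not appear in the session. It assumes a little-endian 64-bit result, so the sketch places the value word in the low half and the sign word in the high half. It also assumes `>>` on a negative `int32_t` is an arithmetic shift, which C++20 guarantees and mainstream compilers provide earlier.

```cpp
#include <cstdint>

// Hypothetical scalar model (not part of the original session) of widening
// one signed byte to 64 bits using only 32-bit lane operations.
inline int64_t sign_extend_8_to_64(uint8_t byte) {
    // Shuffle step: the byte lands in the top byte of a 32-bit lane and the
    // three lower bytes are zeroed (mask bytes with the high bit set).
    uint32_t lane = static_cast<uint32_t>(byte) << 24;

    // Arithmetic shift by 24: the byte moves back down, sign-extended to 32 bits.
    int32_t value_word = static_cast<int32_t>(lane) >> 24;

    // Arithmetic shift by 31: every bit becomes a copy of the sign bit.
    int32_t sign_word = static_cast<int32_t>(lane) >> 31;

    // Interleave step: assumed layout is value in the low 32 bits,
    // sign fill in the high 32 bits.
    uint64_t bits =
        (static_cast<uint64_t>(static_cast<uint32_t>(sign_word)) << 32) |
        static_cast<uint32_t>(value_word);
    return static_cast<int64_t>(bits);
}
```

Calling `sign_extend_8_to_64(0x80)` yields the 64-bit value -128, the same result a single `vpmovsxbq` lane would give for that byte.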