Commit dcd1712

jackmott authored and BurntSushi committed
Start of AVX2 functions (#2)
start adding avx2
1 parent 16d848d commit dcd1712

3 files changed: +1092 -3 lines changed

TODO.md (+389)
@@ -5,6 +5,7 @@ Intel intrinsics. Replace `SSE4.2` with the intended type.
 rg '^<intrinsic' intel-intrinsics-3.3.15.xml | rg "'SSE4.2'" | rg '^.*name=\x27([^\x27]+)\x27.*$' -r '* [ ] `$1`' >> TODO.md
 ```
 
+rg calls the ripgrep tool, which can be installed with `cargo install ripgrep`
 
 sse
 ---
@@ -535,3 +536,391 @@ sse4.2
 * [ ] `_mm_crc32_u16`
 * [ ] `_mm_crc32_u32`
 * [ ] `_mm_crc32_u64`
+
+
+avx
+---
+* [ ] `_mm256_add_pd`
+* [ ] `_mm256_add_ps`
+* [ ] `_mm256_addsub_pd`
+* [ ] `_mm256_addsub_ps`
+* [ ] `_mm256_and_pd`
+* [ ] `_mm256_and_ps`
+* [ ] `_mm256_andnot_pd`
+* [ ] `_mm256_andnot_ps`
+* [ ] `_mm256_blend_pd`
+* [ ] `_mm256_blend_ps`
+* [ ] `_mm256_blendv_pd`
+* [ ] `_mm256_blendv_ps`
+* [ ] `_mm256_div_pd`
+* [ ] `_mm256_div_ps`
+* [ ] `_mm256_dp_ps`
+* [ ] `_mm256_hadd_pd`
+* [ ] `_mm256_hadd_ps`
+* [ ] `_mm256_hsub_pd`
+* [ ] `_mm256_hsub_ps`
+* [ ] `_mm256_max_pd`
+* [ ] `_mm256_max_ps`
+* [ ] `_mm256_min_pd`
+* [ ] `_mm256_min_ps`
+* [ ] `_mm256_mul_pd`
+* [ ] `_mm256_mul_ps`
+* [ ] `_mm256_or_pd`
+* [ ] `_mm256_or_ps`
+* [ ] `_mm256_shuffle_pd`
+* [ ] `_mm256_shuffle_ps`
+* [ ] `_mm256_sub_pd`
+* [ ] `_mm256_sub_ps`
+* [ ] `_mm256_xor_pd`
+* [ ] `_mm256_xor_ps`
+* [ ] `_mm_cmp_pd`
+* [ ] `_mm256_cmp_pd`
+* [ ] `_mm_cmp_ps`
+* [ ] `_mm256_cmp_ps`
+* [ ] `_mm_cmp_sd`
+* [ ] `_mm_cmp_ss`
+* [ ] `_mm256_cvtepi32_pd`
+* [ ] `_mm256_cvtepi32_ps`
+* [ ] `_mm256_cvtpd_ps`
+* [ ] `_mm256_cvtps_epi32`
+* [ ] `_mm256_cvtps_pd`
+* [ ] `_mm256_cvttpd_epi32`
+* [ ] `_mm256_cvtpd_epi32`
+* [ ] `_mm256_cvttps_epi32`
+* [ ] `_mm256_extractf128_ps`
+* [ ] `_mm256_extractf128_pd`
+* [ ] `_mm256_extractf128_si256`
+* [ ] `_mm256_extract_epi8`
+* [ ] `_mm256_extract_epi16`
+* [ ] `_mm256_extract_epi32`
+* [ ] `_mm256_extract_epi64`
+* [ ] `_mm256_zeroall`
+* [ ] `_mm256_zeroupper`
+* [ ] `_mm256_permutevar_ps`
+* [ ] `_mm_permutevar_ps`
+* [ ] `_mm256_permute_ps`
+* [ ] `_mm_permute_ps`
+* [ ] `_mm256_permutevar_pd`
+* [ ] `_mm_permutevar_pd`
+* [ ] `_mm256_permute_pd`
+* [ ] `_mm_permute_pd`
+* [ ] `_mm256_permute2f128_ps`
+* [ ] `_mm256_permute2f128_pd`
+* [ ] `_mm256_permute2f128_si256`
+* [ ] `_mm256_broadcast_ss`
+* [ ] `_mm_broadcast_ss`
+* [ ] `_mm256_broadcast_sd`
+* [ ] `_mm256_broadcast_ps`
+* [ ] `_mm256_broadcast_pd`
+* [ ] `_mm256_insertf128_ps`
+* [ ] `_mm256_insertf128_pd`
+* [ ] `_mm256_insertf128_si256`
+* [ ] `_mm256_insert_epi8`
+* [ ] `_mm256_insert_epi16`
+* [ ] `_mm256_insert_epi32`
+* [ ] `_mm256_insert_epi64`
+* [ ] `_mm256_load_pd`
+* [ ] `_mm256_store_pd`
+* [ ] `_mm256_load_ps`
+* [ ] `_mm256_store_ps`
+* [ ] `_mm256_loadu_pd`
+* [ ] `_mm256_storeu_pd`
+* [ ] `_mm256_loadu_ps`
+* [ ] `_mm256_storeu_ps`
+* [ ] `_mm256_load_si256`
+* [ ] `_mm256_store_si256`
+* [ ] `_mm256_loadu_si256`
+* [ ] `_mm256_storeu_si256`
+* [ ] `_mm256_maskload_pd`
+* [ ] `_mm256_maskstore_pd`
+* [ ] `_mm_maskload_pd`
+* [ ] `_mm_maskstore_pd`
+* [ ] `_mm256_maskload_ps`
+* [ ] `_mm256_maskstore_ps`
+* [ ] `_mm_maskload_ps`
+* [ ] `_mm_maskstore_ps`
+* [ ] `_mm256_movehdup_ps`
+* [ ] `_mm256_moveldup_ps`
+* [ ] `_mm256_movedup_pd`
+* [ ] `_mm256_lddqu_si256`
+* [ ] `_mm256_stream_si256`
+* [ ] `_mm256_stream_pd`
+* [ ] `_mm256_stream_ps`
+* [ ] `_mm256_rcp_ps`
+* [ ] `_mm256_rsqrt_ps`
+* [ ] `_mm256_sqrt_pd`
+* [ ] `_mm256_sqrt_ps`
+* [ ] `_mm256_round_pd`
+* [ ] `_mm256_round_ps`
+* [ ] `_mm256_unpackhi_pd`
+* [ ] `_mm256_unpackhi_ps`
+* [ ] `_mm256_unpacklo_pd`
+* [ ] `_mm256_unpacklo_ps`
+* [ ] `_mm256_testz_si256`
+* [ ] `_mm256_testc_si256`
+* [ ] `_mm256_testnzc_si256`
+* [ ] `_mm256_testz_pd`
+* [ ] `_mm256_testc_pd`
+* [ ] `_mm256_testnzc_pd`
+* [ ] `_mm_testz_pd`
+* [ ] `_mm_testc_pd`
+* [ ] `_mm_testnzc_pd`
+* [ ] `_mm256_testz_ps`
+* [ ] `_mm256_testc_ps`
+* [ ] `_mm256_testnzc_ps`
+* [ ] `_mm_testz_ps`
+* [ ] `_mm_testc_ps`
+* [ ] `_mm_testnzc_ps`
+* [ ] `_mm256_movemask_pd`
+* [ ] `_mm256_movemask_ps`
+* [ ] `_mm256_setzero_pd`
+* [ ] `_mm256_setzero_ps`
+* [ ] `_mm256_setzero_si256`
+* [ ] `_mm256_set_pd`
+* [ ] `_mm256_set_ps`
+* [ ] `_mm256_set_epi8`
+* [ ] `_mm256_set_epi16`
+* [ ] `_mm256_set_epi32`
+* [ ] `_mm256_set_epi64x`
+* [ ] `_mm256_setr_pd`
+* [ ] `_mm256_setr_ps`
+* [ ] `_mm256_setr_epi8`
+* [ ] `_mm256_setr_epi16`
+* [ ] `_mm256_setr_epi32`
+* [ ] `_mm256_setr_epi64x`
+* [ ] `_mm256_set1_pd`
+* [ ] `_mm256_set1_ps`
+* [ ] `_mm256_set1_epi8`
+* [ ] `_mm256_set1_epi16`
+* [ ] `_mm256_set1_epi32`
+* [ ] `_mm256_set1_epi64x`
+* [ ] `_mm256_castpd_ps`
+* [ ] `_mm256_castps_pd`
+* [ ] `_mm256_castps_si256`
+* [ ] `_mm256_castpd_si256`
+* [ ] `_mm256_castsi256_ps`
+* [ ] `_mm256_castsi256_pd`
+* [ ] `_mm256_castps256_ps128`
+* [ ] `_mm256_castpd256_pd128`
+* [ ] `_mm256_castsi256_si128`
+* [ ] `_mm256_castps128_ps256`
+* [ ] `_mm256_castpd128_pd256`
+* [ ] `_mm256_castsi128_si256`
+* [ ] `_mm256_zextps128_ps256`
+* [ ] `_mm256_zextpd128_pd256`
+* [ ] `_mm256_zextsi128_si256`
+* [ ] `_mm256_floor_ps`
+* [ ] `_mm256_ceil_ps`
+* [ ] `_mm256_floor_pd`
+* [ ] `_mm256_ceil_pd`
+* [ ] `_mm256_undefined_ps`
+* [ ] `_mm256_undefined_pd`
+* [ ] `_mm256_undefined_si256`
+* [ ] `_mm256_set_m128`
+* [ ] `_mm256_set_m128d`
+* [ ] `_mm256_set_m128i`
+* [ ] `_mm256_setr_m128`
+* [ ] `_mm256_setr_m128d`
+* [ ] `_mm256_setr_m128i`
+* [ ] `_mm256_loadu2_m128`
+* [ ] `_mm256_loadu2_m128d`
+* [ ] `_mm256_loadu2_m128i`
+* [ ] `_mm256_storeu2_m128`
+* [ ] `_mm256_storeu2_m128d`
+* [ ] `_mm256_storeu2_m128i`
+
+
+
+avx2
+----
+* [x] `_mm256_abs_epi8`
+* [x] `_mm256_abs_epi16`
+* [x] `_mm256_abs_epi32`
+* [x] `_mm256_add_epi8`
+* [x] `_mm256_add_epi16`
+* [x] `_mm256_add_epi32`
+* [x] `_mm256_add_epi64`
+* [x] `_mm256_adds_epi8`
+* [x] `_mm256_adds_epi16`
+* [x] `_mm256_adds_epu8`
+* [x] `_mm256_adds_epu16`
+* [ ] `_mm256_alignr_epi8`
+* [x] `_mm256_and_si256`
+* [x] `_mm256_andnot_si256`
+* [x] `_mm256_avg_epu8`
+* [x] `_mm256_avg_epu16`
+* [ ] `_mm256_blend_epi16`
+* [ ] `_mm_blend_epi32`
+* [ ] `_mm256_blend_epi32`
+* [x] `_mm256_blendv_epi8`
+* [ ] `_mm_broadcastb_epi8`
+* [ ] `_mm256_broadcastb_epi8`
+* [ ] `_mm_broadcastd_epi32`
+* [ ] `_mm256_broadcastd_epi32`
+* [ ] `_mm_broadcastq_epi64`
+* [ ] `_mm256_broadcastq_epi64`
+* [ ] `_mm_broadcastsd_pd`
+* [ ] `_mm256_broadcastsd_pd`
+* [ ] `_mm_broadcastsi128_si256`
+* [ ] `_mm256_broadcastsi128_si256`
+* [ ] `_mm_broadcastss_ps`
+* [ ] `_mm256_broadcastss_ps`
+* [ ] `_mm_broadcastw_epi16`
+* [ ] `_mm256_broadcastw_epi16`
+* [x] `_mm256_cmpeq_epi8`
+* [x] `_mm256_cmpeq_epi16`
+* [x] `_mm256_cmpeq_epi32`
+* [x] `_mm256_cmpeq_epi64`
+* [x] `_mm256_cmpgt_epi8`
+* [x] `_mm256_cmpgt_epi16`
+* [x] `_mm256_cmpgt_epi32`
+* [x] `_mm256_cmpgt_epi64`
+* [ ] `_mm256_cvtepi16_epi32`
+* [ ] `_mm256_cvtepi16_epi64`
+* [ ] `_mm256_cvtepi32_epi64`
+* [ ] `_mm256_cvtepi8_epi16`
+* [ ] `_mm256_cvtepi8_epi32`
+* [ ] `_mm256_cvtepi8_epi64`
+* [ ] `_mm256_cvtepu16_epi32`
+* [ ] `_mm256_cvtepu16_epi64`
+* [ ] `_mm256_cvtepu32_epi64`
+* [ ] `_mm256_cvtepu8_epi16`
+* [ ] `_mm256_cvtepu8_epi32`
+* [ ] `_mm256_cvtepu8_epi64`
+* [ ] `_mm256_extracti128_si256`
+* [x] `_mm256_hadd_epi16`
+* [x] `_mm256_hadd_epi32`
+* [x] `_mm256_hadds_epi16`
+* [x] `_mm256_hsub_epi16`
+* [x] `_mm256_hsub_epi32`
+* [x] `_mm256_hsubs_epi16`
+* [ ] `_mm_i32gather_pd`
+* [ ] `_mm256_i32gather_pd`
+* [ ] `_mm_i32gather_ps`
+* [ ] `_mm256_i32gather_ps`
+* [ ] `_mm_i32gather_epi32`
+* [ ] `_mm256_i32gather_epi32`
+* [ ] `_mm_i32gather_epi64`
+* [ ] `_mm256_i32gather_epi64`
+* [ ] `_mm_i64gather_pd`
+* [ ] `_mm256_i64gather_pd`
+* [ ] `_mm_i64gather_ps`
+* [ ] `_mm256_i64gather_ps`
+* [ ] `_mm_i64gather_epi32`
+* [ ] `_mm256_i64gather_epi32`
+* [ ] `_mm_i64gather_epi64`
+* [ ] `_mm256_i64gather_epi64`
+* [ ] `_mm256_inserti128_si256`
+* [ ] `_mm256_madd_epi16`
+* [ ] `_mm256_maddubs_epi16`
+* [ ] `_mm_mask_i32gather_pd`
+* [ ] `_mm256_mask_i32gather_pd`
+* [ ] `_mm_mask_i32gather_ps`
+* [ ] `_mm256_mask_i32gather_ps`
+* [ ] `_mm_mask_i32gather_epi32`
+* [ ] `_mm256_mask_i32gather_epi32`
+* [ ] `_mm_mask_i32gather_epi64`
+* [ ] `_mm256_mask_i32gather_epi64`
+* [ ] `_mm_mask_i64gather_pd`
+* [ ] `_mm256_mask_i64gather_pd`
+* [ ] `_mm_mask_i64gather_ps`
+* [ ] `_mm256_mask_i64gather_ps`
+* [ ] `_mm_mask_i64gather_epi32`
+* [ ] `_mm256_mask_i64gather_epi32`
+* [ ] `_mm_mask_i64gather_epi64`
+* [ ] `_mm256_mask_i64gather_epi64`
+* [ ] `_mm_maskload_epi32`
+* [ ] `_mm256_maskload_epi32`
+* [ ] `_mm_maskload_epi64`
+* [ ] `_mm256_maskload_epi64`
+* [ ] `_mm_maskstore_epi32`
+* [ ] `_mm256_maskstore_epi32`
+* [ ] `_mm_maskstore_epi64`
+* [ ] `_mm256_maskstore_epi64`
+* [ ] `_mm256_max_epi8`
+* [ ] `_mm256_max_epi16`
+* [ ] `_mm256_max_epi32`
+* [ ] `_mm256_max_epu8`
+* [ ] `_mm256_max_epu16`
+* [ ] `_mm256_max_epu32`
+* [ ] `_mm256_min_epi8`
+* [ ] `_mm256_min_epi16`
+* [ ] `_mm256_min_epi32`
+* [ ] `_mm256_min_epu8`
+* [ ] `_mm256_min_epu16`
+* [ ] `_mm256_min_epu32`
+* [ ] `_mm256_movemask_epi8`
+* [ ] `_mm256_mpsadbw_epu8`
+* [ ] `_mm256_mul_epi32`
+* [ ] `_mm256_mul_epu32`
+* [ ] `_mm256_mulhi_epi16`
+* [ ] `_mm256_mulhi_epu16`
+* [ ] `_mm256_mulhrs_epi16`
+* [ ] `_mm256_mullo_epi16`
+* [ ] `_mm256_mullo_epi32`
+* [ ] `_mm256_or_si256`
+* [ ] `_mm256_packs_epi16`
+* [ ] `_mm256_packs_epi32`
+* [ ] `_mm256_packus_epi16`
+* [ ] `_mm256_packus_epi32`
+* [ ] `_mm256_permute2x128_si256`
+* [ ] `_mm256_permute4x64_epi64`
+* [ ] `_mm256_permute4x64_pd`
+* [ ] `_mm256_permutevar8x32_epi32`
+* [ ] `_mm256_permutevar8x32_ps`
+* [ ] `_mm256_sad_epu8`
+* [ ] `_mm256_shuffle_epi32`
+* [ ] `_mm256_shuffle_epi8`
+* [ ] `_mm256_shufflehi_epi16`
+* [ ] `_mm256_shufflelo_epi16`
+* [ ] `_mm256_sign_epi8`
+* [ ] `_mm256_sign_epi16`
+* [ ] `_mm256_sign_epi32`
+* [ ] `_mm256_slli_si256`
+* [ ] `_mm256_bslli_epi128`
+* [ ] `_mm256_sll_epi16`
+* [ ] `_mm256_slli_epi16`
+* [ ] `_mm256_sll_epi32`
+* [ ] `_mm256_slli_epi32`
+* [ ] `_mm256_sll_epi64`
+* [ ] `_mm256_slli_epi64`
+* [ ] `_mm_sllv_epi32`
+* [ ] `_mm256_sllv_epi32`
+* [ ] `_mm_sllv_epi64`
+* [ ] `_mm256_sllv_epi64`
+* [ ] `_mm256_sra_epi16`
+* [ ] `_mm256_srai_epi16`
+* [ ] `_mm256_sra_epi32`
+* [ ] `_mm256_srai_epi32`
+* [ ] `_mm_srav_epi32`
+* [ ] `_mm256_srav_epi32`
+* [ ] `_mm256_srli_si256`
+* [ ] `_mm256_bsrli_epi128`
+* [ ] `_mm256_srl_epi16`
+* [ ] `_mm256_srli_epi16`
+* [ ] `_mm256_srl_epi32`
+* [ ] `_mm256_srli_epi32`
+* [ ] `_mm256_srl_epi64`
+* [ ] `_mm256_srli_epi64`
+* [ ] `_mm_srlv_epi32`
+* [ ] `_mm256_srlv_epi32`
+* [ ] `_mm_srlv_epi64`
+* [ ] `_mm256_srlv_epi64`
+* [ ] `_mm256_stream_load_si256`
+* [ ] `_mm256_sub_epi8`
+* [ ] `_mm256_sub_epi16`
+* [ ] `_mm256_sub_epi32`
+* [ ] `_mm256_sub_epi64`
+* [ ] `_mm256_subs_epi8`
+* [ ] `_mm256_subs_epi16`
+* [ ] `_mm256_subs_epu8`
+* [ ] `_mm256_subs_epu16`
+* [ ] `_mm256_xor_si256`
+* [ ] `_mm256_unpackhi_epi8`
+* [ ] `_mm256_unpackhi_epi16`
+* [ ] `_mm256_unpackhi_epi32`
+* [ ] `_mm256_unpackhi_epi64`
+* [ ] `_mm256_unpacklo_epi8`
+* [ ] `_mm256_unpacklo_epi16`
+* [ ] `_mm256_unpacklo_epi32`
+* [ ] `_mm256_unpacklo_epi64`
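
Note on how checklist entries like these are produced: the `rg` pipeline in the first TODO.md hunk keeps lines that start with `<intrinsic`, keeps those mentioning the quoted instruction set, and rewrites each survivor into an unchecked checklist item using the captured `name` attribute. The Python sketch below mirrors those three stages for readers without ripgrep at hand. It is not part of this commit. The function name `todo_entries` and the sample XML lines are hypothetical, and the attribute layout (single-quoted `name='...'`) is assumed from the regex in the command, not taken from the real `intel-intrinsics-3.3.15.xml`.

```python
import re

# Mirrors the third `rg` stage: capture the text between name='...'.
# (\x27 in the original command is just a single quote.)
NAME_RE = re.compile(r"^.*name='([^']+)'.*$")


def todo_entries(lines, tech):
    """Yield one unchecked Markdown checklist item per matching intrinsic."""
    # Literal substring match; the original `rg "'SSE4.2'"` treats the dot
    # as a regex wildcard, which is slightly more permissive.
    needle = f"'{tech}'"
    for line in lines:
        if not line.startswith("<intrinsic"):  # stage 1: rg '^<intrinsic'
            continue
        if needle not in line:                 # stage 2: rg "'SSE4.2'"
            continue
        match = NAME_RE.match(line)            # stage 3: rg ... -r '* [ ] `$1`'
        if match:
            yield f"* [ ] `{match.group(1)}`"


# Hypothetical input lines, shaped only to satisfy the regexes above.
SAMPLE = [
    "<intrinsic rettype='unsigned int' name='_mm_crc32_u16' tech='SSE4.2'>",
    "<intrinsic rettype='__m256d' name='_mm256_add_pd' tech='AVX'>",
    "    <parameter varname='a' type='__m256d'/>",
]

if __name__ == "__main__":
    for entry in todo_entries(SAMPLE, "SSE4.2"):
        print(entry)
```

Passing `"AVX"` or `"AVX2"` as `tech` corresponds to the "Replace `SSE4.2` with the intended type" instruction in the hunk header context.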
