<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="description" content="Intel® ISPC User's Guide - Complete documentation for the high-performance SIMD compiler">
<link rel="icon" type="image/png" href="favicon.png">
<link rel="stylesheet" href="css/style.css">
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>Intel® ISPC User's Guide</title>
</head>
<body>
<div class="container">
<header class="site-header">
<nav class="main-nav">
<div class="nav-brand">
<h1>Intel® ISPC</h1>
</div>
<ul class="nav-menu">
<li class="nav-item"><a href="index.html">Overview</a></li>
<li class="nav-item"><a href="features.html">Features</a></li>
<li class="nav-item"><a href="downloads.html">Downloads</a></li>
<li class="nav-item active"><a href="documentation.html">Documentation</a></li>
<li class="nav-item"><a href="perf.html">Performance</a></li>
<li class="nav-item"><a href="contrib.html">Contributors</a></li>
</ul>
</nav>
</header>
<main class="main-content">
<div class="content-grid">
<article class="content-main">
<h1 class="title">Intel® ISPC User's Guide</h1>
<p>The Intel® Implicit SPMD Program Compiler (Intel® ISPC) is a compiler for
writing SPMD (single program multiple data) programs to run on the CPU and GPU.
The SPMD
programming approach is widely known to graphics and GPGPU programmers; it
is used for GPU shaders and CUDA* and OpenCL* kernels, for example. The
main idea behind SPMD is that one writes programs as if they were operating
on a single data element (a pixel for a pixel shader, for example), but
then the underlying hardware and runtime system executes multiple
invocations of the program in parallel with different inputs (the values
for different pixels, for example).</p>
<p>The main goals behind <tt class="docutils literal">ispc</tt> are to:</p>
<ul class="simple">
<li>Build a variant of the C programming language that delivers good
performance to performance-oriented programmers who want to run SPMD
programs on CPUs and GPUs.</li>
<li>Provide a thin abstraction layer between the programmer and the
hardware--in particular, to follow the lesson from C for serial programs
of having an execution and data model where the programmer can cleanly
reason about the mapping of their source program to compiled assembly
language and the underlying hardware.</li>
<li>Harness the computational power of Single Instruction, Multiple Data (SIMD) vector
units without the extremely low-productivity task of directly writing
intrinsics.</li>
<li>Explore opportunities enabled by tight coupling between C/C++ application code
and SPMD <tt class="docutils literal">ispc</tt> code running on the same processor—lightweight function
calls between the two languages, direct data sharing via pointers without
copying or reformatting, etc.</li>
</ul>
<p><strong>We are very interested in your feedback and comments about ispc and
in hearing your experiences using the system. We are especially interested
in hearing if you try using ispc but see results that are not as you
were expecting or hoping for.</strong> We encourage you to send a note with your
experiences or comments to the <a class="reference external" href="https://github.com/ispc/ispc/discussions">GitHub Discussions</a> forum or to file bug reports or
feature requests with the <tt class="docutils literal">ispc</tt> <a class="reference external" href="https://github.com/ispc/ispc/issues?state=open">bug tracker</a>. (Thanks!)</p>
<p>Contents:</p>
<ul class="simple">
<li><a class="reference internal" href="#recent-changes-to-ispc">Recent Changes to ISPC</a><ul>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-30-0">Updating ISPC Programs For Changes In ISPC 1.30.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-29-0">Updating ISPC Programs For Changes In ISPC 1.29.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-28-0">Updating ISPC Programs For Changes In ISPC 1.28.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-27-0">Updating ISPC Programs For Changes In ISPC 1.27.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-26-0">Updating ISPC Programs For Changes In ISPC 1.26.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-25-0">Updating ISPC Programs For Changes In ISPC 1.25.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-24-0">Updating ISPC Programs For Changes In ISPC 1.24.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-23-0">Updating ISPC Programs For Changes In ISPC 1.23.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-22-0">Updating ISPC Programs For Changes In ISPC 1.22.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-21-0">Updating ISPC Programs For Changes In ISPC 1.21.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-20-0">Updating ISPC Programs For Changes In ISPC 1.20.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-19-0">Updating ISPC Programs For Changes In ISPC 1.19.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-18-0">Updating ISPC Programs For Changes In ISPC 1.18.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-17-0">Updating ISPC Programs For Changes In ISPC 1.17.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-16-0">Updating ISPC Programs For Changes In ISPC 1.16.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-15-0">Updating ISPC Programs For Changes In ISPC 1.15.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-14-1">Updating ISPC Programs For Changes In ISPC 1.14.1</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-14-0">Updating ISPC Programs For Changes In ISPC 1.14.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-13-0">Updating ISPC Programs For Changes In ISPC 1.13.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-12-0">Updating ISPC Programs For Changes In ISPC 1.12.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-11-0">Updating ISPC Programs For Changes In ISPC 1.11.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-10-0">Updating ISPC Programs For Changes In ISPC 1.10.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-9-2">Updating ISPC Programs For Changes In ISPC 1.9.2</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-9-1">Updating ISPC Programs For Changes In ISPC 1.9.1</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-9-0">Updating ISPC Programs For Changes In ISPC 1.9.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-8-2">Updating ISPC Programs For Changes In ISPC 1.8.2</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-7-0">Updating ISPC Programs For Changes In ISPC 1.7.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-6-0">Updating ISPC Programs For Changes In ISPC 1.6.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-5-0">Updating ISPC Programs For Changes In ISPC 1.5.0</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-3">Updating ISPC Programs For Changes In ISPC 1.3</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-2">Updating ISPC Programs For Changes In ISPC 1.2</a></li>
<li><a class="reference internal" href="#updating-ispc-programs-for-changes-in-ispc-1-1">Updating ISPC Programs For Changes In ISPC 1.1</a></li>
</ul>
</li>
<li><a class="reference internal" href="#getting-started-with-ispc">Getting Started with ISPC</a><ul>
<li><a class="reference internal" href="#installing-ispc">Installing ISPC</a></li>
<li><a class="reference internal" href="#compiling-and-running-a-simple-ispc-program">Compiling and Running a Simple ISPC Program</a></li>
</ul>
</li>
<li><a class="reference internal" href="#using-the-ispc-compiler">Using The ISPC Compiler</a><ul>
<li><a class="reference internal" href="#basic-command-line-options">Basic Command-line Options</a></li>
<li><a class="reference internal" href="#selecting-the-compilation-target">Selecting The Compilation Target</a></li>
<li><a class="reference internal" href="#selecting-32-or-64-bit-addressing">Selecting 32 or 64 Bit Addressing</a></li>
<li><a class="reference internal" href="#the-preprocessor">The Preprocessor</a></li>
<li><a class="reference internal" href="#pragma-directives">Pragma Directives</a></li>
<li><a class="reference internal" href="#debugging">Debugging</a></li>
<li><a class="reference internal" href="#optimization-settings">Optimization Settings</a></li>
<li><a class="reference internal" href="#other-ways-of-passing-arguments-to-ispc">Other ways of passing arguments to ISPC</a></li>
<li><a class="reference internal" href="#sample-based-profile-guided-optimization">Sample-Based Profile-Guided Optimization</a></li>
</ul>
</li>
<li><a class="reference internal" href="#using-ispc-as-a-library">Using ISPC as a Library</a><ul>
<li><a class="reference internal" href="#library-initialization">Library Initialization</a></li>
<li><a class="reference internal" href="#simple-compilation-interface">Simple Compilation Interface</a></li>
<li><a class="reference internal" href="#advanced-interface-with-ispcengine">Advanced Interface with ISPCEngine</a></li>
<li><a class="reference internal" href="#just-in-time-jit-compilation-interface">Just-In-Time (JIT) Compilation Interface</a></li>
<li><a class="reference internal" href="#compatibility">Compatibility</a></li>
<li><a class="reference internal" href="#cmake-integration">CMake Integration</a><ul>
<li><a class="reference internal" href="#basic-usage">Basic Usage</a></li>
<li><a class="reference internal" href="#cmake-variables">CMake Variables</a></li>
<li><a class="reference internal" href="#cmake-example">CMake Example</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#the-ispc-parallel-execution-model">The ISPC Parallel Execution Model</a><ul>
<li><a class="reference internal" href="#basic-concepts-program-instances-and-gangs-of-program-instances">Basic Concepts: Program Instances and Gangs of Program Instances</a></li>
<li><a class="reference internal" href="#control-flow-within-a-gang">Control Flow Within A Gang</a><ul>
<li><a class="reference internal" href="#control-flow-example-if-statements">Control Flow Example: If Statements</a></li>
<li><a class="reference internal" href="#control-flow-example-loops">Control Flow Example: Loops</a></li>
<li><a class="reference internal" href="#gang-convergence-guarantees">Gang Convergence Guarantees</a></li>
</ul>
</li>
<li><a class="reference internal" href="#uniform-data">Uniform Data</a><ul>
<li><a class="reference internal" href="#uniform-control-flow">Uniform Control Flow</a></li>
<li><a class="reference internal" href="#uniform-variables-and-varying-control-flow">Uniform Variables and Varying Control Flow</a></li>
</ul>
</li>
<li><a class="reference internal" href="#data-races-within-a-gang">Data Races Within a Gang</a></li>
<li><a class="reference internal" href="#tasking-model">Tasking Model</a></li>
</ul>
</li>
<li><a class="reference internal" href="#the-ispc-language">The ISPC Language</a><ul>
<li><a class="reference internal" href="#relationship-to-the-c-programming-language">Relationship To The C Programming Language</a></li>
<li><a class="reference internal" href="#lexical-structure">Lexical Structure</a><ul>
<li><a class="reference internal" href="#integer-literals">Integer Literals</a></li>
<li><a class="reference internal" href="#floating-point-literals">Floating Point Literals</a></li>
<li><a class="reference internal" href="#string-literals">String Literals</a></li>
</ul>
</li>
<li><a class="reference internal" href="#types">Types</a><ul>
<li><a class="reference internal" href="#basic-types-and-type-qualifiers">Basic Types and Type Qualifiers</a></li>
<li><a class="reference internal" href="#signed-and-unsigned-integer-types">Signed and Unsigned Integer Types</a></li>
<li><a class="reference internal" href="#uniform-and-varying-qualifiers">"uniform" and "varying" Qualifiers</a></li>
<li><a class="reference internal" href="#defining-new-names-for-types">Defining New Names For Types</a></li>
<li><a class="reference internal" href="#pointer-types">Pointer Types</a></li>
<li><a class="reference internal" href="#function-pointer-types">Function Pointer Types</a></li>
<li><a class="reference internal" href="#reference-types">Reference Types</a></li>
<li><a class="reference internal" href="#enumeration-types">Enumeration Types</a></li>
<li><a class="reference internal" href="#short-vector-types">Short Vector Types</a></li>
<li><a class="reference internal" href="#array-types">Array Types</a></li>
<li><a class="reference internal" href="#struct-types">Struct Types</a><ul>
<li><a class="reference internal" href="#operators-overloading">Operators Overloading</a></li>
</ul>
</li>
<li><a class="reference internal" href="#structure-of-array-types">Structure of Array Types</a></li>
</ul>
</li>
<li><a class="reference internal" href="#declarations-and-initializers">Declarations and Initializers</a></li>
<li><a class="reference internal" href="#attributes">Attributes</a><ul>
<li><a class="reference internal" href="#noescape">noescape</a></li>
<li><a class="reference internal" href="#address-space">address_space</a></li>
<li><a class="reference internal" href="#unmangled">unmangled</a></li>
<li><a class="reference internal" href="#external-only">external_only</a></li>
<li><a class="reference internal" href="#deprecated">deprecated</a></li>
<li><a class="reference internal" href="#aligned">aligned</a></li>
</ul>
</li>
<li><a class="reference internal" href="#expressions">Expressions</a><ul>
<li><a class="reference internal" href="#dynamic-memory-allocation">Dynamic Memory Allocation</a></li>
<li><a class="reference internal" href="#type-casting">Type Casting</a></li>
</ul>
</li>
<li><a class="reference internal" href="#control-flow">Control Flow</a><ul>
<li><a class="reference internal" href="#conditional-statements-if">Conditional Statements: "if"</a></li>
<li><a class="reference internal" href="#conditional-statements-switch">Conditional Statements: "switch"</a></li>
<li><a class="reference internal" href="#iteration-statements">Iteration Statements</a><ul>
<li><a class="reference internal" href="#basic-iteration-statements-for-while-and-do">Basic Iteration Statements: "for", "while", and "do"</a></li>
<li><a class="reference internal" href="#iteration-over-active-program-instances-foreach-active">Iteration over active program instances: "foreach_active"</a></li>
<li><a class="reference internal" href="#iteration-over-unique-elements-foreach-unique">Iteration over unique elements: "foreach_unique"</a></li>
<li><a class="reference internal" href="#parallel-iteration-statements-foreach-and-foreach-tiled">Parallel Iteration Statements: "foreach" and "foreach_tiled"</a></li>
<li><a class="reference internal" href="#parallel-iteration-with-programindex-and-programcount">Parallel Iteration with "programIndex" and "programCount"</a></li>
</ul>
</li>
<li><a class="reference internal" href="#unstructured-control-flow-goto">Unstructured Control Flow: "goto"</a></li>
<li><a class="reference internal" href="#coherent-control-flow-statements-cif-and-friends">"Coherent" Control Flow Statements: "cif" and Friends</a></li>
<li><a class="reference internal" href="#functions-and-function-calls">Functions and Function Calls</a><ul>
<li><a class="reference internal" href="#function-overloading">Function Overloading</a></li>
</ul>
</li>
<li><a class="reference internal" href="#re-establishing-the-execution-mask">Re-establishing The Execution Mask</a></li>
<li><a class="reference internal" href="#task-parallel-execution">Task Parallel Execution</a><ul>
<li><a class="reference internal" href="#task-parallelism-launch-and-sync-statements">Task Parallelism: "launch" and "sync" Statements</a></li>
<li><a class="reference internal" href="#task-parallelism-runtime-requirements">Task Parallelism: Runtime Requirements</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#llvm-intrinsic-functions">LLVM Intrinsic Functions</a></li>
<li><a class="reference internal" href="#function-templates">Function Templates</a></li>
</ul>
</li>
<li><a class="reference internal" href="#the-ispc-standard-library">The ISPC Standard Library</a><ul>
<li><a class="reference internal" href="#basic-operations-on-data">Basic Operations On Data</a><ul>
<li><a class="reference internal" href="#logical-and-selection-operations">Logical and Selection Operations</a></li>
<li><a class="reference internal" href="#bit-operations">Bit Operations</a></li>
</ul>
</li>
<li><a class="reference internal" href="#math-functions">Math Functions</a><ul>
<li><a class="reference internal" href="#basic-math-functions">Basic Math Functions</a></li>
<li><a class="reference internal" href="#transcendental-functions">Transcendental Functions</a></li>
<li><a class="reference internal" href="#saturating-arithmetic">Saturating Arithmetic</a></li>
<li><a class="reference internal" href="#dot-product">Dot product</a></li>
<li><a class="reference internal" href="#intel-amx-advanced-matrix-extensions">Intel AMX (Advanced Matrix Extensions)</a></li>
<li><a class="reference internal" href="#pseudo-random-numbers">Pseudo-Random Numbers</a></li>
<li><a class="reference internal" href="#random-numbers">Random Numbers</a></li>
</ul>
</li>
<li><a class="reference internal" href="#output-functions">Output Functions</a></li>
<li><a class="reference internal" href="#assertions">Assertions</a></li>
<li><a class="reference internal" href="#compiler-optimization-hints">Compiler Optimization Hints</a></li>
<li><a class="reference internal" href="#cross-program-instance-operations">Cross-Program Instance Operations</a><ul>
<li><a class="reference internal" href="#reductions">Reductions</a></li>
</ul>
</li>
<li><a class="reference internal" href="#stack-memory-allocation">Stack Memory Allocation</a></li>
<li><a class="reference internal" href="#data-movement">Data Movement</a><ul>
<li><a class="reference internal" href="#setting-and-copying-values-in-memory">Setting and Copying Values In Memory</a></li>
<li><a class="reference internal" href="#packed-load-and-store-operations">Packed Load and Store Operations</a></li>
<li><a class="reference internal" href="#streaming-load-and-store-operations">Streaming Load and Store Operations</a></li>
</ul>
</li>
<li><a class="reference internal" href="#data-conversions">Data Conversions</a><ul>
<li><a class="reference internal" href="#converting-between-array-of-structures-and-structure-of-arrays-layout">Converting Between Array-of-Structures and Structure-of-Arrays Layout</a></li>
<li><a class="reference internal" href="#conversions-to-and-from-half-precision-floats">Conversions To and From Half-Precision Floats</a></li>
<li><a class="reference internal" href="#converting-from-to-srgb8">Converting from/to sRGB8</a></li>
</ul>
</li>
<li><a class="reference internal" href="#systems-programming-support">Systems Programming Support</a><ul>
<li><a class="reference internal" href="#atomic-operations-and-memory-fences">Atomic Operations and Memory Fences</a></li>
<li><a class="reference internal" href="#prefetches">Prefetches</a></li>
<li><a class="reference internal" href="#system-information">System Information</a></li>
</ul>
</li>
</ul>
</li>
<li><a class="reference internal" href="#interoperability-with-the-application">Interoperability with the Application</a><ul>
<li><a class="reference internal" href="#interoperability-overview">Interoperability Overview</a></li>
<li><a class="reference internal" href="#data-layout">Data Layout</a></li>
<li><a class="reference internal" href="#data-alignment-and-aliasing">Data Alignment and Aliasing</a></li>
<li><a class="reference internal" href="#restructuring-existing-programs-to-use-ispc">Restructuring Existing Programs to Use ISPC</a></li>
</ul>
</li>
<li><a class="reference internal" href="#notices-disclaimers">Notices &amp; Disclaimers</a></li>
</ul>
<div class="section" id="recent-changes-to-ispc">
<h1>Recent Changes to ISPC</h1>
<p>See the file <a class="reference external" href="https://raw.github.com/ispc/ispc/main/docs/ReleaseNotes.txt">ReleaseNotes.txt</a> in the <tt class="docutils literal">ispc</tt> distribution for a list
of recent changes to the compiler.</p>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-30-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.30.0</h2>
<p>New Features:</p>
<ul class="simple">
<li>Intel AMX (Advanced Matrix Extensions) support has been added to the standard
library. AMX provides hardware acceleration for matrix operations, particularly
useful for machine learning workloads. The new <tt class="docutils literal">&lt;amx.isph&gt;</tt> header provides
functions for tile configuration, data loading/storing, and matrix dot products
for INT8, BF16, and FP16 data types. AMX is supported on <tt class="docutils literal">avx512spr</tt>,
<tt class="docutils literal">avx512gnr</tt>, and <tt class="docutils literal">avx10.2dmr</tt> targets. Please refer to
<a class="reference internal" href="#intel-amx-advanced-matrix-extensions">Intel AMX (Advanced Matrix Extensions)</a> for more details.</li>
</ul>
<p>Language Changes:</p>
<ul class="simple">
<li>Integral type aliases (<tt class="docutils literal">size_t</tt>, <tt class="docutils literal">ptrdiff_t</tt>, <tt class="docutils literal">intptr_t</tt>, <tt class="docutils literal">uintptr_t</tt>)
can now be used as non-type template parameters.</li>
</ul>
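<p>As a minimal sketch of the language change above, the function template below
takes a <tt class="docutils literal">size_t</tt> non-type parameter. The names <tt class="docutils literal">sum_first</tt> and
<tt class="docutils literal">sum_four</tt> are hypothetical and used only for illustration; the example
assumes the usual function template syntax described in
<a class="reference internal" href="#function-templates">Function Templates</a>.</p>
<pre class="literal-block">
// Hypothetical helper: sums the first N elements of an array, where N is
// a size_t non-type template parameter known at compile time.
template &lt;size_t N&gt;
uniform float sum_first(uniform float values[]) {
    uniform float total = 0;
    for (uniform int i = 0; i &lt; N; i++)
        total += values[i];
    return total;
}

// Hypothetical caller: instantiates the template with N = 4.
uniform float sum_four(uniform float values[]) {
    return sum_first&lt;4&gt;(values);
}
</pre>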
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-29-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.29.0</h2>
<p>Language Changes:</p>
<p>The compiler now assumes that all loops with non-constant conditions will make
forward progress and eventually terminate. This enables optimizations based
on the assumption that loops will not execute indefinitely. Infinite loops
with constant conditions like <tt class="docutils literal">for <span class="pre">(;;)</span></tt> or <tt class="docutils literal">while (1)</tt> are treated
specially and do not have this assumption applied.</p>
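<p>The following sketch contrasts the two kinds of loops described above. The
function names are hypothetical and chosen only for illustration; which
optimizations the compiler actually applies is not specified here.</p>
<pre class="literal-block">
// Non-constant condition: the compiler may assume that this loop makes
// forward progress and eventually terminates.
uniform int collatz_steps(uniform int n) {
    uniform int steps = 0;
    while (n &gt; 1) {
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps++;
    }
    return steps;
}

// Constant condition: "for (;;)" is treated specially, so the
// termination assumption is not applied to this loop.
uniform int count_until(uniform int limit) {
    uniform int i = 0;
    for (;;) {
        if (i &gt;= limit)
            break;
        i++;
    }
    return i;
}
</pre>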
<p>Compiler Switches:</p>
<ul class="simple">
<li>Added <tt class="docutils literal"><span class="pre">--profile-sample-use=&lt;file&gt;</span></tt> to enable profile-guided optimization
using sample profile data. When provided, the ISPC compiler loads the
specified sample profile data file and uses it to guide optimization decisions
during compilation. Use it together with <tt class="docutils literal"><span class="pre">--sample-profiling-debug-info</span></tt> to generate
debug information optimized for sample-based profiling.</li>
<li>Added <tt class="docutils literal"><span class="pre">--[no-]internal-export-functions</span></tt> to control generation of internal
(ISPC-callable) versions of exported functions. The flag is enabled by default.
When disabled (<tt class="docutils literal"><span class="pre">--no-internal-export-functions</span></tt>), only external versions are
generated and calling exported functions from ISPC code will result in a
compilation error.</li>
<li>Added <tt class="docutils literal"><span class="pre">--stack-protector[=&lt;level&gt;]</span></tt> flag to enable Stack Smash Protection (SSP)
for ISPC functions, providing runtime detection of stack buffer overflows.
<tt class="docutils literal"><span class="pre">--stack-protector</span></tt> (equivalent to <tt class="docutils literal"><span class="pre">--stack-protector=on</span></tt>) enables stack
protectors for functions vulnerable to stack smashing. <tt class="docutils literal"><span class="pre">--stack-protector=strong</span></tt>
enables stack protectors for functions that contain arrays of any size or take
addresses of local variables. <tt class="docutils literal"><span class="pre">--stack-protector=all</span></tt> enables stack protectors
for all functions. <tt class="docutils literal"><span class="pre">--stack-protector=none</span></tt> disables stack protectors (default).</li>
<li>The default DWARF version has been updated to DWARF 5, matching the LLVM
default. If your debugging tools do not support DWARF 5, you can use the
<tt class="docutils literal"><span class="pre">--dwarf-version=&lt;N&gt;</span></tt> flag to specify an earlier version.</li>
</ul>
<p>Warning for Exported Function Calls:</p>
<p>A new warning has been introduced to prepare for an upcoming behavior change in
exported functions. Currently, <tt class="docutils literal">export</tt> functions generate both internal
(ISPC-callable) and external (C/C++-callable) versions. Starting in ISPC 1.30,
the default behavior will change to generate only external versions, matching
the behavior of the <tt class="docutils literal">external_only</tt> attribute.</p>
<p>In ISPC 1.29, the compiler issues a <strong>warning</strong> when an exported function
without the <tt class="docutils literal">external_only</tt> attribute is called from ISPC code. This warning
helps identify code that may be affected by the upcoming change. To address
this warning, you can:</p>
<ul class="simple">
<li>Use a non-exported function for ISPC-to-ISPC calls</li>
<li>Add the <tt class="docutils literal">external_only</tt> attribute (see <a class="reference internal" href="#external-only">external_only</a>) to the function
(note: calls to that function from ISPC code will then produce an <strong>error</strong> instead of a warning)</li>
<li>Use the <tt class="docutils literal"><span class="pre">--no-internal-export-functions</span></tt> command-line flag to suppress
internal version generation (note: all calls to exported functions from ISPC code will then
produce <strong>errors</strong> instead of warnings)</li>
</ul>
<p>Note: The compiler can only detect calls to exported functions within the same
compilation unit. Cross-module calls to exported functions cannot be detected.</p>
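<p>One way to apply the first option above is to move the shared logic into a
non-exported function and keep the exported function as a thin entry point
for C/C++ callers. This is only a sketch; <tt class="docutils literal">scale_value</tt>, <tt class="docutils literal">scale_array</tt>,
and <tt class="docutils literal">scale_twice</tt> are hypothetical names.</p>
<pre class="literal-block">
// Non-exported helper: safe to call from other ISPC code.
static inline float scale_value(float x, uniform float factor) {
    return x * factor;
}

// Exported entry point for C/C++ callers; it only forwards to the helper.
export void scale_array(uniform float data[], uniform int count,
                        uniform float factor) {
    foreach (i = 0 ... count) {
        data[i] = scale_value(data[i], factor);
    }
}

// Other ISPC code calls the helper rather than the exported function,
// so no warning about calling an exported function is expected here.
void scale_twice(uniform float data[], uniform int count,
                 uniform float factor) {
    foreach (i = 0 ... count) {
        data[i] = scale_value(scale_value(data[i], factor), factor);
    }
}
</pre>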
<p>ISPC Targets:</p>
<ul class="simple">
<li>New <tt class="docutils literal"><span class="pre">avx512gnr-x4</span></tt>, <tt class="docutils literal"><span class="pre">avx512gnr-x8</span></tt>, <tt class="docutils literal"><span class="pre">avx512gnr-x16</span></tt>, <tt class="docutils literal"><span class="pre">avx512gnr-x32</span></tt>,
and <tt class="docutils literal"><span class="pre">avx512gnr-x64</span></tt> targets have been added for Intel Granite Rapids processors.
These targets support AVX-512 with AMX-FP16 capabilities.</li>
<li><tt class="docutils literal">avx10.2</tt> family of targets has been renamed to <tt class="docutils literal">avx10.2dmr</tt>. The macro
<tt class="docutils literal">ISPC_TARGET_AVX10_2</tt> has been replaced with <tt class="docutils literal">ISPC_TARGET_AVX10_2DMR</tt>.</li>
<li><tt class="docutils literal"><span class="pre">sse2-i32x4</span></tt> and <tt class="docutils literal"><span class="pre">sse2-i32x8</span></tt> targets are deprecated and will be removed in
future releases.</li>
<li><tt class="docutils literal"><span class="pre">gen9-x8</span></tt> and <tt class="docutils literal"><span class="pre">gen9-x16</span></tt> targets have been removed.</li>
</ul>
<p>Predefined Macros:</p>
<ul class="simple">
<li>New predefined macros <tt class="docutils literal">ISPC_TARGET_HAS_FP16_SUPPORT</tt> and
<tt class="docutils literal">ISPC_TARGET_HAS_FP64_SUPPORT</tt> have been added following the consistent
naming convention used by other target capability macros. The old macro names
<tt class="docutils literal">ISPC_FP16_SUPPORTED</tt> and <tt class="docutils literal">ISPC_FP64_SUPPORTED</tt> remain available for
backward compatibility but are now deprecated.</li>
<li>The <tt class="docutils literal">ISPC_TARGET_AVX512GNR</tt> macro has been added.</li>
<li>The <tt class="docutils literal">ISPC_TARGET_AVX10_2</tt> macro has been replaced with
<tt class="docutils literal">ISPC_TARGET_AVX10_2DMR</tt> to match the target renaming.</li>
</ul>
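<p>Code that has to build with both older and newer compilers might test the new
and the deprecated macro names together. The sketch below assumes that both
macros are simply defined when the capability is present; the alias name
<tt class="docutils literal">sample_t</tt> is hypothetical.</p>
<pre class="literal-block">
#if defined(ISPC_TARGET_HAS_FP16_SUPPORT) || defined(ISPC_FP16_SUPPORTED)
typedef float16 sample_t; // hypothetical alias: half precision is available
#else
typedef float sample_t;   // hypothetical alias: fall back to single precision
#endif
</pre>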
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-28-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.28.0</h2>
<p>New Features:</p>
<ul class="simple">
<li>Added a new command-line option <tt class="docutils literal"><span class="pre">--include-float16-conversions</span></tt>. This
option makes the compiler include float16 conversion functions in the
compiled module. This is useful for targets that do not have native
instructions for float16 conversions, such as x86 targets prior to AVX2.
This option is disabled by default.</li>
<li>ISPC can now generate nanobind wrappers for ISPC modules, allowing easy
and lightweight integration of ISPC code with Python. The generated wrappers
can be built into native Python modules and imported into Python code. The
<tt class="docutils literal"><span class="pre">--nanobind-wrapper=<filename></span></tt> command-line option enables this feature.</li>
<li>Struct operator overloading has been extended. Struct types now support
overloading of unary (<tt class="docutils literal">++</tt>, <tt class="docutils literal"><span class="pre">--</span></tt>, <tt class="docutils literal">-</tt>, <tt class="docutils literal">!</tt>, <tt class="docutils literal">~</tt>), binary (<tt class="docutils literal">*</tt>,
<tt class="docutils literal">/</tt>, <tt class="docutils literal">%</tt>, <tt class="docutils literal">+</tt>, <tt class="docutils literal">-</tt>, <tt class="docutils literal">>></tt>, <tt class="docutils literal"><<</tt>, <tt class="docutils literal">==</tt>, <tt class="docutils literal">!=</tt>, <tt class="docutils literal"><</tt>, <tt class="docutils literal">></tt>,
<tt class="docutils literal"><=</tt>, <tt class="docutils literal">>=</tt>, <tt class="docutils literal">&</tt>, <tt class="docutils literal">|</tt>, <tt class="docutils literal">^</tt>, <tt class="docutils literal">&&</tt>, <tt class="docutils literal">||</tt>), and assignment
(<tt class="docutils literal">=</tt>, <tt class="docutils literal">+=</tt>, <tt class="docutils literal"><span class="pre">-=</span></tt>, <tt class="docutils literal">*=</tt>, <tt class="docutils literal">/=</tt>, <tt class="docutils literal">%=</tt>, <tt class="docutils literal"><<=</tt>, <tt class="docutils literal">>>=</tt>, <tt class="docutils literal">&=</tt>,
<tt class="docutils literal">|=</tt>, <tt class="docutils literal">^=</tt>) operators.</li>
<li>ISPC can now be used as a C++ library (<tt class="docutils literal">libispc</tt>) for embedding ISPC
compilation directly into applications. It now also provides CMake
configuration files for easy integration into other CMake projects. The
library includes experimental Just-In-Time (JIT) compilation capabilities
for runtime code generation and execution. See the section
<a class="reference internal" href="#using-ispc-as-a-library">Using ISPC as a Library</a> for more details.</li>
<li>Added a new <tt class="docutils literal">include/intrinsics</tt> directory containing header files that
implement selected SSE intrinsics in ISPC. If you're porting existing code
from intrinsics to ISPC, you can use these headers as a reference.</li>
</ul>
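<p>As a reminder of the general shape of a struct operator overload, here is a
minimal sketch that uses binary <tt class="docutils literal">+</tt> only. The struct <tt class="docutils literal">Vec2</tt> and the function
<tt class="docutils literal">sum</tt> are hypothetical, and the exact signatures expected for the unary and
assignment forms are not shown here.</p>
<pre class="literal-block">
struct Vec2 {
    float x, y;
};

// Binary operator overload for a struct type.
struct Vec2 operator+(struct Vec2 a, struct Vec2 b) {
    struct Vec2 r;
    r.x = a.x + b.x;
    r.y = a.y + b.y;
    return r;
}

// The overload is selected by ordinary expression syntax.
struct Vec2 sum(struct Vec2 a, struct Vec2 b) {
    return a + b;
}
</pre>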
<p>Language and Syntax Changes:</p>
<ul class="simple">
<li>Integer literals are now stricter:<ul>
<li>The number of occurrences of <tt class="docutils literal">[uUlL]</tt> symbols is now limited (e.g., <tt class="docutils literal">ulll</tt>,
<tt class="docutils literal">uul</tt>, and <tt class="docutils literal">lulu</tt> are no longer valid).</li>
<li>The value modification suffix (i.e., <tt class="docutils literal">[kMG]</tt>) must precede the type
modification suffix (i.e., <tt class="docutils literal">[uUlL]</tt> symbols).</li>
<li>Like C/C++, <tt class="docutils literal">lL</tt> and <tt class="docutils literal">Ll</tt> suffixes are no longer allowed (i.e., mixing
lower- and upper-case <tt class="docutils literal">L</tt> to form a <tt class="docutils literal">LL</tt> suffix).</li>
</ul>
</li>
</ul>
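<p>Read literally, the rules above suggest the following classification. This is
a sketch derived from the rules rather than from compiler output, and the
variable names are hypothetical.</p>
<pre class="literal-block">
uniform int kib = 16k;           // value suffix only: 16 * 1024
uniform unsigned int mib = 2Mu;  // value suffix first, then the type suffix
uniform int64 gib = 4GLL;        // value suffix first, then LL

// Expected to be rejected under the stricter rules:
//   2uM    type suffix placed before the value suffix
//   1lL    mixed-case long-long suffix
//   1uul   repeated 'u'
</pre>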
<p>Standard Library Changes:</p>
<ul>
<li><p class="first"><tt class="docutils literal">select</tt> functions now support unsigned integer types <tt class="docutils literal">uint8</tt>,
<tt class="docutils literal">uint16</tt>, <tt class="docutils literal">uint32</tt>, and <tt class="docutils literal">uint64</tt> as well as uniform short vectors.</p>
</li>
<li><p class="first">Added new functions: <tt class="docutils literal">isinf</tt>, <tt class="docutils literal">isfinite</tt>, and <tt class="docutils literal">srgb8_to_float</tt>.</p>
</li>
<li><p class="first">Standard library functions for short vectors have been moved to a separate
header file <tt class="docutils literal">short_vec.isph</tt>. They are no longer defined implicitly for
every file compiled with ISPC. Code using such functions should now include
this file with:</p>
<pre class="literal-block">
#include "short_vec.isph"
</pre>
</li>
<li><p class="first">Support for short vector types has been added to the following element-wise
functions: <tt class="docutils literal">fmod</tt>, <tt class="docutils literal">isnan</tt>, <tt class="docutils literal">rsqrt_fast</tt>, and <tt class="docutils literal">clamp</tt>.</p>
</li>
</ul>
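<p>For example, code that calls a short vector overload now needs the explicit
include. The sketch below assumes that the short vector <tt class="docutils literal">max</tt> overload is among
the functions provided by the header; the function name <tt class="docutils literal">largest</tt> is
hypothetical.</p>
<pre class="literal-block">
#include "short_vec.isph"

// Element-wise maximum of two uniform short vectors.
uniform int<3> largest(uniform int<3> a, uniform int<3> b) {
    return max(a, b);
}
</pre>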
<p>Debugging:</p>
<p>The default DWARF version was updated to DWARF 5. You can still
specify any supported DWARF version with the <tt class="docutils literal"><span class="pre">--dwarf-version=</span></tt> switch.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-27-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.27.0</h2>
<p>New targets:</p>
<p>New targets have been added for platforms supporting Intel® Advanced Vector
Extensions 10.2: <tt class="docutils literal"><span class="pre">avx10.2-x4</span></tt>, <tt class="docutils literal"><span class="pre">avx10.2-x8</span></tt>, <tt class="docutils literal"><span class="pre">avx10.2-x16</span></tt>,
<tt class="docutils literal"><span class="pre">avx10.2-x32</span></tt>, and <tt class="docutils literal"><span class="pre">avx10.2-x64</span></tt>. Additionally, a new macro
<tt class="docutils literal">ISPC_TARGET_AVX10_2</tt> has been introduced.</p>
<p>Standard library:</p>
<ul class="simple">
<li>Cross-lane operations - <tt class="docutils literal">broadcast</tt>, <tt class="docutils literal">rotate</tt>, <tt class="docutils literal">shift</tt>, and
<tt class="docutils literal">shuffle</tt> - are now supported for unsigned types.</li>
<li>The reduction functions now also support signed and unsigned <tt class="docutils literal">int8</tt> and
<tt class="docutils literal">int16</tt> types.</li>
<li>Support for <tt class="docutils literal">packed_load</tt> and <tt class="docutils literal">packed_store</tt> operations has also been
expanded to include: <tt class="docutils literal">int8</tt>, <tt class="docutils literal">int16</tt> (signed/unsigned), <tt class="docutils literal">float16</tt>,
<tt class="docutils literal">float</tt>, and <tt class="docutils literal">double</tt>.</li>
<li>The cube root function <tt class="docutils literal">cbrt</tt> has been added to the standard library for
<tt class="docutils literal">float</tt> and <tt class="docutils literal">double</tt> types.</li>
<li>Dot product functionality has been enhanced with mixed signedness support for
16-bit integers. The following input combinations are now supported: u16 x u16
(unsigned x unsigned), i16 x i16 (signed x signed), u16 x i16 (mixed
signedness). For consistency with other naming conventions, the function
<tt class="docutils literal">dot2add_i16_packed</tt> has been renamed to <tt class="docutils literal">dot2add_i16i16_packed</tt>.</li>
</ul>
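<p>As a small illustration of one of these additions, the sketch below applies
<tt class="docutils literal">cbrt</tt> to every element of an array. The function name <tt class="docutils literal">cube_roots</tt> and its
parameters are hypothetical.</p>
<pre class="literal-block">
export void cube_roots(uniform float in[], uniform float out[], uniform int n) {
    foreach (i = 0 ... n) {
        // cbrt is applied per program instance to a varying float.
        out[i] = cbrt(in[i]);
    }
}
</pre>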
<p>New standard library functions for short vectors:</p>
<p>The <tt class="docutils literal">max</tt>, <tt class="docutils literal">min</tt> and <tt class="docutils literal">abs</tt> functions for short vectors of basic types
have been added to the standard library. They support both uniform and varying
short vector types for all basic types supported by the corresponding standard
functions, i.e., signed and unsigned integer types and floating-point types.</p>
<p>This makes it possible, for example, to find the maximum value between two short
vectors:</p>
<pre class="literal-block">
uniform int<3> a = {1, 2, 3};
uniform int<3> b = {3, -2, 1};
uniform int<3> c = max(a, b); // c = {3, 2, 3}
varying float<4> x, y;
varying float<4> z = max(x, y);
</pre>
<p>Support for short vector types has also been added for the following
floating-point element-wise functions: <tt class="docutils literal">round</tt>, <tt class="docutils literal">floor</tt>, <tt class="docutils literal">ceil</tt>,
<tt class="docutils literal">trunc</tt>, <tt class="docutils literal">rcp</tt>, <tt class="docutils literal">rcp_fast</tt>, <tt class="docutils literal">sqrt</tt>, <tt class="docutils literal">rsqrt</tt>, <tt class="docutils literal">sin</tt>, <tt class="docutils literal">asin</tt>,
<tt class="docutils literal">cos</tt>, <tt class="docutils literal">acos</tt>, <tt class="docutils literal">tan</tt>, <tt class="docutils literal">atan</tt>, <tt class="docutils literal">exp</tt>, <tt class="docutils literal">log</tt>, <tt class="docutils literal">atan2</tt>, <tt class="docutils literal">pow</tt> and
<tt class="docutils literal">cbrt</tt>.</p>
<p>Language changes:</p>
<ul class="simple">
<li>The <tt class="docutils literal">aligned(N)</tt> attribute is now available to specify the alignment of
variables and struct types.</li>
<li>A bug was fixed where unsigned array indices or pointer arithmetic with
unsigned offsets could result in overflow due to sign extension when promoting
to pointer size. This issue is now resolved, and the compiler correctly
handles unsigned integer indexing and pointer arithmetic.</li>
</ul>
<p>Compiler flags changes:</p>
<ul class="simple">
<li>The <tt class="docutils literal"><span class="pre">-dD</span></tt> and <tt class="docutils literal"><span class="pre">-dM</span></tt> flags are now supported. They are useful for debugging the
preprocessor and checking the macros defined by the compiler.</li>
</ul>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-26-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.26.0</h2>
<p>There are breaking changes to ARM support:</p>
<ul class="simple">
<li>The <tt class="docutils literal"><span class="pre">--arch=arm</span></tt> flag, which previously mapped to ARMv7 (32-bit), now maps
to ARMv8 (32-bit). There are no changes to <tt class="docutils literal"><span class="pre">--arch=aarch64</span></tt>, which continues
to map to ARMv8 (64-bit).</li>
<li>The CPU definitions for the ARMv7 architecture have been removed:
<tt class="docutils literal"><span class="pre">cortex-a9</span></tt> and <tt class="docutils literal"><span class="pre">cortex-a15</span></tt>.</li>
<li>New CPU definitions have been introduced, including <tt class="docutils literal"><span class="pre">cortex-a55</span></tt>,
<tt class="docutils literal"><span class="pre">cortex-a78</span></tt>, <tt class="docutils literal"><span class="pre">cortex-a510</span></tt>, and <tt class="docutils literal"><span class="pre">cortex-a520</span></tt>, along with support for
new Apple devices.</li>
<li>New double-pumped targets have been introduced: <tt class="docutils literal"><span class="pre">neon-i16x16</span></tt> and
<tt class="docutils literal"><span class="pre">neon-i8x32</span></tt>.</li>
</ul>
<p>Language Updates:</p>
<ul class="simple">
<li>Macro definitions for the LLVM version that ISPC is based on have been added.
Please refer to <a class="reference internal" href="#the-preprocessor">The Preprocessor</a> for more details.</li>
<li>The <tt class="docutils literal"><span class="pre">__attribute__((deprecated))</span></tt> attribute can now be applied to a function
to mark it as deprecated, generating a warning when the function is called.</li>
</ul>
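<p>A deprecated function might be declared as in the sketch below. The placement
of the attribute before the declaration is an assumption here (see the
Attributes section for the authoritative syntax), and the function names
<tt class="docutils literal">old_scale</tt> and <tt class="docutils literal">caller</tt> are hypothetical.</p>
<pre class="literal-block">
// Assumed placement: the attribute precedes the function declaration.
__attribute__((deprecated)) void old_scale(uniform float data[], uniform int n) {
    foreach (i = 0 ... n) {
        data[i] *= 2.0f;
    }
}

void caller(uniform float data[], uniform int n) {
    old_scale(data, n); // this call is expected to produce a deprecation warning
}
</pre>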
<p>Compiler flags changes:</p>
<ul class="simple">
<li>The <tt class="docutils literal"><span class="pre">--nocpp</span></tt> command-line flag is deprecated and will be removed in a
future release.</li>
<li>The target <tt class="docutils literal"><span class="pre">avx512knl-x16</span></tt> has been removed.</li>
<li>The <tt class="docutils literal"><span class="pre">--darwin-version-min</span></tt> option has been added to specify the minimum
deployment target version for macOS and iOS applications. This addresses a new
linker behavior introduced in Xcode 15.0, which issues a warning when no
version is provided.</li>
</ul>
<p>The behavior of user programs when no supported ISA is detected in the
auto-dispatch code has changed. Instead of raising the <tt class="docutils literal">SIGABRT</tt> signal, the
system will now raise <tt class="docutils literal">SIGILL</tt>. This affects users who rely on <tt class="docutils literal">SIGABRT</tt> in
their signal handlers for error handling or recovery. Such users must update
their code to handle <tt class="docutils literal">SIGILL</tt> instead. This change improves predictability and
removes the dispatcher's reliance on the C standard library.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-25-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.25.0</h2>
<p>The ISPC language has been extended to support the <tt class="docutils literal"><span class="pre">__attribute__(())</span></tt> syntax
for variable and function declarations. The following attributes are now
supported: <tt class="docutils literal">noescape</tt>, <tt class="docutils literal">address_space(N)</tt>, <tt class="docutils literal">external_only</tt>, and
<tt class="docutils literal">unmangled</tt>. The macro <tt class="docutils literal">ISPC_ATTRIBUTE_SUPPORTED</tt> is defined if the ISPC
compiler supports attribute syntax. Please refer to the <a class="reference internal" href="#attributes">Attributes</a> section
for more details and the full list of supported attributes.</p>
<p>This release introduces support for the <tt class="docutils literal"><span class="pre">-ffunction-sections</span></tt> command-line
flag, which generates each function in a separate section. This flag is useful
for reducing the size of the final executable by removing unused functions.
Please refer to the <a class="reference internal" href="#basic-command-line-options">Basic Command-line Options</a> section for more details.</p>
<p>In some cases, such as shared libraries, the <tt class="docutils literal"><span class="pre">-ffunction-sections</span></tt> flag alone
may not be sufficient to remove unused ISPC copies of exported functions. To
address this, you can use the <tt class="docutils literal">external_only</tt> function attribute. This
attribute can only be applied to exported functions and instructs the compiler
to remove the ISPC version of the function. For more information, please refer
to the <a class="reference internal" href="#attributes">Attributes</a> and <a class="reference internal" href="#functions-and-function-calls">Functions and Function Calls</a> sections.</p>
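<p>Because <tt class="docutils literal">ISPC_ATTRIBUTE_SUPPORTED</tt> is only defined by compilers that understand
the attribute syntax, a source file can stay compatible with older compilers by
hiding the attribute behind a macro. In this sketch the macro name
<tt class="docutils literal">EXTERNAL_ONLY</tt> and the function <tt class="docutils literal">clear_buffer</tt> are hypothetical, and the
attribute placement is an assumption.</p>
<pre class="literal-block">
#ifdef ISPC_ATTRIBUTE_SUPPORTED
#define EXTERNAL_ONLY __attribute__((external_only))
#else
#define EXTERNAL_ONLY
#endif

// Only the externally callable version is needed; no ISPC-side copy is wanted.
EXTERNAL_ONLY export void clear_buffer(uniform float data[], uniform int n) {
    foreach (i = 0 ... n) {
        data[i] = 0.0f;
    }
}
</pre>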
<p>Template support for short vectors and array declarations has been extended.
You can now use both type and non-type parameters to specify the type and
dimensions of these types.</p>
<p>For ARM targets, IEEE 754-compliant instructions (<tt class="docutils literal">fminnm</tt> and <tt class="docutils literal">vminnm</tt>) are
now generated for min/max operations, replacing the previous use of <tt class="docutils literal">fmin</tt> and
<tt class="docutils literal">vmin</tt>.</p>
<p>The <tt class="docutils literal"><span class="pre">avx512knl-x16</span></tt>, <tt class="docutils literal"><span class="pre">gen9-x8</span></tt>, and <tt class="docutils literal"><span class="pre">gen9-x16</span></tt> targets are deprecated and
will be removed in future releases.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-24-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.24.0</h2>
<p>This release extends the standard library with new functions performing dot
product operations. These functions utilize specific hardware instructions from
AVX-VNNI and AVX512-VNNI. The ISPC targets that support native VNNI
instructions are <tt class="docutils literal"><span class="pre">avx2vnni-i32x*</span></tt>, <tt class="docutils literal"><span class="pre">avx512icl-*</span></tt> and <tt class="docutils literal"><span class="pre">avx512spr-*</span></tt>. The
first two targets (<tt class="docutils literal"><span class="pre">avx2vnni-*</span></tt> and <tt class="docutils literal"><span class="pre">avx512icl-*</span></tt>) were introduced in this
release. Please refer to <a class="reference internal" href="#dot-product">Dot product</a> for more details.</p>
<p>Now, uniform integers and enums can be used as non-type template parameters.
Please refer to <a class="reference internal" href="#function-templates">Function Templates</a> for more details.</p>
<p>The release contains the following changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li>The <tt class="docutils literal"><span class="pre">--pic</span></tt> command-line flag now corresponds to the <tt class="docutils literal"><span class="pre">-fpic</span></tt> flag of Clang
and GCC, whereas the newly introduced <tt class="docutils literal"><span class="pre">--PIC</span></tt> corresponds to <tt class="docutils literal"><span class="pre">-fPIC</span></tt>.
Previously, the <tt class="docutils literal"><span class="pre">--pic</span></tt> flag corresponded to the <tt class="docutils literal"><span class="pre">-fPIC</span></tt> flag. In
some cases, users may need to switch to <tt class="docutils literal"><span class="pre">--PIC</span></tt> to preserve the previous
behavior.</li>
<li>Newly introduced macro definitions for numeric limits can conflict
with user-defined macros that have the same names. When this happens, ISPC emits
warnings about macro redefinition. Please refer to <a class="reference internal" href="#the-preprocessor">The Preprocessor</a> for
the full list of macro definitions.</li>
<li>The implementation of the <tt class="docutils literal">round</tt> standard library function was aligned across
all targets. This may affect the results of code that uses this
function on the following targets: <tt class="docutils literal"><span class="pre">avx2-i16x16</span></tt>, <tt class="docutils literal"><span class="pre">avx2-i8x32</span></tt>, and all
AVX-512 targets. Please refer to <a class="reference internal" href="#basic-math-functions">Basic Math Functions</a> for more details.</li>
</ul>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-23-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.23.0</h2>
<p>This release contains the following changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li><cite>true</cite> <cite>bool</cite> values in storage were changed from <cite>-1</cite> to <cite>1</cite> to match C/C++ ABI.
Previously, ISPC treated <cite>bool</cite> values similarly to C/C++ in terms of size, but
incorrectly interpreted their actual values. This meant that <cite>true</cite> in ISPC
might not have translated correctly to true in C/C++. This issue was introduced
in version 1.13.0. Starting now, ISPC correctly stores and interprets <cite>true</cite>
values in a way that aligns with C/C++ expectations.</li>
</ul>
<p>A couple of improvements have been made to variable initialization:</p>
<ul class="simple">
<li>Variables with const qualifiers can be initialized using the values of
previously initialized const variables, including arithmetic expressions over
them. This now also works with varying types.</li>
<li>Enumeration type values can be used as constants.</li>
</ul>
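<p>The first improvement can be illustrated with a short sketch. The variable
names are hypothetical, and only the uniform case is shown.</p>
<pre class="literal-block">
const uniform int tile_width = 16;
const uniform int tile_height = 8;

// Initialized from previously initialized const variables.
const uniform int tile_area = tile_width * tile_height;
</pre>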
<p>The result of the selection operator can now be used as an lvalue if it has a
suitable type.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-22-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.22.0</h2>
<p>Template operators with explicit specializations and instantiations were introduced to
the language. The usage of different function specifiers with templates was fixed and
aligned; please refer to the <a class="reference internal" href="#function-templates">Function Templates</a> section for more details.</p>
<p>The command-line switch <cite>--dwarf-version=<n></cite> now forces generation of DWARF-format debug info
on Windows. This makes it possible to debug ISPC code linked with MinGW-generated code.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-21-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.21.0</h2>
<p>Now, in case of signed integer overflow, <cite>ispc</cite> will assume undefined behavior similar to
C and C++. This change may cause compatibility issues. You can manage this behavior by
using the <cite>--[no-]wrap-signed-int</cite> compiler switch. The default behavior (before version
1.21.0) can be preserved by using <cite>--wrap-signed-int</cite>, which maintains defined wraparound
behavior for signed integers, though it may limit some compiler optimizations.</p>
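<p>The sketch below contrasts code that depends on signed wraparound with an
unsigned alternative. The function names are hypothetical, and the comment on
unsigned arithmetic assumes C-like modular semantics.</p>
<pre class="literal-block">
// Depends on signed wraparound when id is the largest int value. Since 1.21.0
// this is undefined behavior unless the code is built with --wrap-signed-int.
uniform int next_id_signed(uniform int id) {
    return id + 1;
}

// Unsigned arithmetic, assumed here to wrap modulo 2^32 as in C and C++.
uniform unsigned int next_id(uniform unsigned int id) {
    return id + 1u;
}
</pre>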
<p>Template function specializations with explicit template arguments were introduced to the
language. Please refer to the <a class="reference internal" href="#function-templates">Function Templates</a> section for more details.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-20-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.20.0</h2>
<p>New versions of the <cite>sse4</cite> targets were added: you can now specify either <cite>sse4.1</cite>
or <cite>sse4.2</cite>, for example <cite>sse4.2-i32x4</cite>. The changes are fully backward
compatible, meaning that <cite>sse4</cite> versions are still accepted and aliased to
<cite>sse4.2</cite>. Multi-target compilation accepts only one of <cite>sse4</cite>/<cite>sse4.1</cite>/<cite>sse4.2</cite>
targets. All of these targets will produce an object file with <cite>sse4</cite> suffix in
multi-target compilation.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-19-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.19.0</h2>
<p>New targets were added:</p>
<ul class="simple">
<li>avx512spr-x4, avx512spr-x8, avx512spr-x16, avx512spr-x32, avx512spr-x64 for
4th generation Intel® Xeon® Scalable (codename Sapphire Rapids) CPUs. A macro
<tt class="docutils literal">ISPC_TARGET_AVX512SPR</tt> was added.</li>
<li>xehpc-x16 and xehpc-x32 for Intel® Data Center GPU Max (codename Ponte Vecchio).</li>
</ul>
<p>Function templates were introduced to the language. Please refer to the <a class="reference internal" href="#function-templates">Function
Templates</a> section for more details. Two new keywords were introduced: <tt class="docutils literal">template</tt>
and <tt class="docutils literal">typename</tt>.</p>
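<p>A minimal function template might look like the sketch below. The names
<tt class="docutils literal">mix_t</tt> and <tt class="docutils literal">blend</tt> are hypothetical.</p>
<pre class="literal-block">
// Generic linear interpolation over any arithmetic type T.
template <typename T> T mix_t(T a, T b, T t) {
    return a + (b - a) * t;
}

export void blend(uniform float a[], uniform float b[], uniform float out[],
                  uniform int n, uniform float t) {
    foreach (i = 0 ... n) {
        // T is given explicitly; the uniform t is converted to varying float.
        out[i] = mix_t<float>(a[i], b[i], t);
    }
}
</pre>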
<p>The <tt class="docutils literal">ISPC_FP16_SUPPORTED</tt> macro was introduced for targets supporting FP16.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-18-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.18.0</h2>
<p>AVX-512 targets were renamed to drop the "base type" (or "mask size"); the old names are still accepted for
compatibility. The new names are avx512skx-x4, avx512skx-x8, avx512skx-x16,
avx512skx-x32, avx512skx-x64, and avx512knl-x16.</p>
<p>The standard library gained full support for the <tt class="docutils literal">float16</tt> type. Note that it is
fully supported only on targets with native hardware support.
On other targets, emulation is still not guaranteed but may work in some cases.</p>
<p>The compiler gained support for the <tt class="docutils literal"><span class="pre">-E</span></tt> switch for running the preprocessor only,
which is similar to the corresponding switch of C/C++ compilers. Also, as a result of a bug fix,
the compiler now crashes on a preprocessor error. Previously, it did not crash and
produced some output (sometimes correct!). Because this was a convenient feature for some
users running experiments in an isolated environment (for example, ignoring missing includes
when compiling on <a class="reference external" href="https://godbolt.org/">Compiler Explorer</a>), the <tt class="docutils literal"><span class="pre">--ignore-preprocessor-errors</span></tt> switch
was added to preserve this behavior.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-17-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.17.0</h2>
<p>The release introduces a new data type, <tt class="docutils literal">float16</tt>, and floating-point literals
with the <tt class="docutils literal">f16</tt> suffix.</p>
<p>For the sake of unification with C/C++, capital letter X may be used in
hexadecimal prefix (<tt class="docutils literal">0X</tt>) and capital letter P as a separator for exponent in
hexadecimal floating point. For example: <tt class="docutils literal">0X1P16</tt>.</p>
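<p>The new literal forms could be used as in the sketch below. The variable names
are hypothetical, and the <tt class="docutils literal">float16</tt> line assumes a target on which that type is
available.</p>
<pre class="literal-block">
uniform int mask = 0XFF;            // capital X in the hexadecimal prefix: 255
uniform float scale = 0X1P16;       // hexadecimal floating point: 1 * 2^16
uniform float16 half_one = 1.0f16;  // float16 literal with the f16 suffix
</pre>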
<p>The naming of Xe targets, architectures, and device names has changed.</p>
<p>The standard library gained new <tt class="docutils literal"><span class="pre">prefetchw_{l1,l2,l3}()</span></tt> intrinsics for
prefetching in anticipation of a write.</p>
<p>The algorithms used for implementation of <tt class="docutils literal">rsqrt(double)</tt> and <tt class="docutils literal">rcp(double)</tt>
standard library functions have changed on AVX-512 and may affect the existing
code.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-16-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.16.0</h2>
<p>The release has several new functions in the standard library that may
affect compatibility:</p>
<ul class="simple">
<li><tt class="docutils literal">alloca()</tt> - refer to <a class="reference internal" href="#stack-memory-allocation">Stack Memory Allocation</a> for more details.</li>
<li><tt class="docutils literal">assume()</tt> - refer to <a class="reference internal" href="#compiler-optimization-hints">Compiler Optimization Hints</a> for more details.</li>
<li><tt class="docutils literal">trunc()</tt> - refer to <a class="reference internal" href="#basic-math-functions">Basic Math Functions</a> for more details.</li>
</ul>
<p>The language gained an experimental feature for calling LLVM intrinsics. This
should not affect compatibility with existing programs.
See <a class="reference internal" href="#llvm-intrinsic-functions">LLVM Intrinsic Functions</a> for more details.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-15-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.15.0</h2>
<p>The release has several new language features, which do not affect compatibility.
Namely, packed_[load|store]_active() stdlib functions for 64-bit types, and loop
unroll pragmas: "#pragma unroll" and "#pragma nounroll".</p>
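<p>The pragmas are placed immediately before the loop they apply to. In the sketch
below, the function name <tt class="docutils literal">sum_rows</tt> and the fixed row length of 8 are
hypothetical.</p>
<pre class="literal-block">
export void sum_rows(uniform float m[], uniform float out[], uniform int rows) {
    // Ask the compiler not to unroll the outer loop.
    #pragma nounroll
    for (uniform int r = 0; r < rows; ++r) {
        uniform float s = 0;
        // Ask the compiler to unroll the short inner loop.
        #pragma unroll
        for (uniform int c = 0; c < 8; ++c) {
            s += m[r * 8 + c];
        }
        out[r] = s;
    }
}
</pre>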
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-14-1">
<h2>Updating ISPC Programs For Changes In ISPC 1.14.1</h2>
<p>The release doesn't contain language changes that may affect compatibility with
older versions.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-14-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.14.0</h2>
<p>This release contains the following changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li>"generic" targets were removed. Please use native targets instead.</li>
</ul>
<p>New i8 and i16 targets were introduced: avx2-i8x32, avx2-i16x16, avx512skx-i8x64,
and avx512skx-i16x32.</p>
<p>The Windows x86_64 target now supports the <tt class="docutils literal">__vectorcall</tt> calling convention.
It is off by default and can be enabled with the <tt class="docutils literal"><span class="pre">--vectorcall</span></tt> command-line switch.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-13-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.13.0</h2>
<p>This release contains the following changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li>The representation of the <tt class="docutils literal">bool</tt> type in storage was changed from target-specific to
one byte per boolean value. So the size of <tt class="docutils literal">varying bool</tt> is the target width (in
bytes), and the size of <tt class="docutils literal">uniform bool</tt> is one. This definition is compatible
with C/C++ and hence improves interoperability.</li>
<li>Type aliases for unsigned types were added: <tt class="docutils literal">uint8</tt>, <tt class="docutils literal">uint16</tt>, <tt class="docutils literal">uint32</tt>,
<tt class="docutils literal">uint64</tt>, and <tt class="docutils literal">uint</tt>. To detect whether these types are supported, you can
check whether the ISPC_UINT_IS_DEFINED macro is defined. This is handy for writing code
that works with older versions of <tt class="docutils literal">ispc</tt>.</li>
<li><tt class="docutils literal">extract()</tt>/<tt class="docutils literal">insert()</tt> for boolean arguments, and <tt class="docutils literal">abs()</tt> for all integer and
FP types were added to the standard library.</li>
</ul>
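<p>A source file that must also compile with older versions of <tt class="docutils literal">ispc</tt> could
provide the aliases itself when the macro is absent. This is a sketch of that
idea, not code taken from the ISPC distribution.</p>
<pre class="literal-block">
#ifndef ISPC_UINT_IS_DEFINED
// Older compiler: define the aliases by hand.
typedef unsigned int8 uint8;
typedef unsigned int16 uint16;
typedef unsigned int32 uint32;
typedef unsigned int64 uint64;
typedef unsigned int uint;
#endif
</pre>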
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-12-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.12.0</h2>
<p>This release contains the following changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li>The <tt class="docutils literal">noinline</tt> keyword was added.</li>
<li>Standard library functions <tt class="docutils literal">rsqrt_fast()</tt> and <tt class="docutils literal">rcp_fast()</tt> were added.</li>
<li>AVX1.1 (IvyBridge) targets and generic KNC and KNL targets were removed.
Note that KNL is still supported through avx512knl-i32x16.</li>
</ul>
<p>The release also introduces static initialization for varying variables, which
should not affect compatibility.</p>
<p>This release introduces experimental cross OS compilation support and ARM/AARCH64
support. It also contains a new 128-bit AVX2 target (avx2-i32x4) and a CPU
definition for Ice Lake client (--device=icl).</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-11-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.11.0</h2>
<p>This release redefined the -O1 compiler option to optimize for size, so it may require
adjusting your build system accordingly.</p>
<p>Starting with version 1.11.0, auto-generated headers use <tt class="docutils literal">#pragma once</tt>. In the unlikely
case that your C/C++ compiler does not support that, please use the <tt class="docutils literal"><span class="pre">--no-pragma-once</span></tt>
<tt class="docutils literal">ispc</tt> switch.</p>
<p>This release also introduces a new AVX-512 target, avx512skx-i32x8. It produces code
that does not use ZMM registers.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-10-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.10.0</h2>
<p>The release has several new language features, which do not affect compatibility.
Namely, new streaming stores, aos_to_soa/soa_to_aos intrinsics for 64-bit types,
and a "#pragma ignore".</p>
<p>One change that may affect compatibility is the changed size of short vector
types. If you pass short vector types between C/C++ and ISPC, check that the
sizes assumed on both sides still match.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-9-2">
<h2>Updating ISPC Programs For Changes In ISPC 1.9.2</h2>
<p>This release doesn't contain any language changes that may affect compatibility with
older versions.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-9-1">
<h2>Updating ISPC Programs For Changes In ISPC 1.9.1</h2>
<p>This release doesn't contain any language changes that may affect compatibility with
older versions. It introduces a new AVX-512 target: avx512skx-i32x16.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-9-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.9.0</h2>
<p>This release doesn't contain any language changes that may affect compatibility with
older versions. It introduces a new AVX-512 target: avx512knl-i32x16.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-8-2">
<h2>Updating ISPC Programs For Changes In ISPC 1.8.2</h2>
<p>This release doesn't contain any language changes that may affect compatibility with
older versions. However, you may want to be aware of the following:</p>
<ul class="simple">
<li>Mangling of uniform types was changed to not include varying width, so now you
may use uniform structures and pointers to uniform types as return types in
export functions in multi-target compilation.</li>
</ul>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-7-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.7.0</h2>
<p>This release contains several changes that may affect compatibility with
older versions:</p>
<ul class="simple">
<li>The algorithm for selecting overloaded functions was extended to cover more
types of overloading, and handling of reference types was fixed. At the same
time, the old scheme, which blindly used the function with "the best score"
summed over all arguments, was replaced by the C++ approach, which requires
"the best score" for each argument. If no best function exists, a
warning is issued in this version; it will become an error in the
next version. A simple example: suppose we have two functions, max(int, int)
and max(unsigned int, unsigned int). Under the new rules, a call to
max(int, unsigned int) is flagged as ambiguous, because neither function is the
best choice for both arguments.</li>
<li>Implicit casts from a pointer to a const type to void* are no longer allowed.
Use an explicit cast if needed.</li>
<li>A bug that prevented "const" qualifiers from appearing in emitted .h files
was fixed. Consequently, "const" qualifiers that now properly appear in emitted
.h files may cause compile errors in existing code.</li>
<li>get_ProgramCount() was moved from stdlib to the examples/util/util.isph file. You
need to include this file to be able to use this function.</li>
</ul>
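<p>Because the list above describes the new rule as the C++ approach, the <tt class="docutils literal">max</tt>
example can be tried directly in C++. The following is a minimal sketch in C++, not <tt class="docutils literal">ispc</tt>
code; the namespace <tt class="docutils literal">demo</tt> and the helpers <tt class="docutils literal">max_mixed</tt> and <tt class="docutils literal">max_mixed_example</tt>
are hypothetical names introduced only for illustration.</p>

```cpp
#include <cassert>

namespace demo {  // hypothetical namespace, keeps these overloads apart from std::max

int max(int a, int b) { return a > b ? a : b; }
unsigned int max(unsigned int a, unsigned int b) { return a > b ? a : b; }

// A direct call max(a, b) with a: int and b: unsigned int is ambiguous:
// the first overload is the better match for a, the second for b, so
// neither is best for every argument. An explicit cast picks one overload.
inline int max_mixed(int a, unsigned int b) {
    return max(a, static_cast<int>(b));
}

// Small usage example.
inline void max_mixed_example() {
    assert(max_mixed(3, 5u) == 5);
    assert(max_mixed(-1, 2u) == 2);
}

}  // namespace demo
```

<p>Which overload to select with the cast depends on the value range the caller expects;
the sketch picks the signed one only as an example.</p>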
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-6-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.6.0</h2>
<p>This release adds support for <a class="reference internal" href="#operators-overloading">Operators Overloading</a>, so the word <tt class="docutils literal">operator</tt>
becomes a keyword, which may conflict with existing user code that uses it
as an identifier. A new library function, packed_store_active2(), was also introduced,
which may likewise conflict with existing user functions.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-5-0">
<h2>Updating ISPC Programs For Changes In ISPC 1.5.0</h2>
<p>This release adds support for double precision floating point constants.
A double precision floating point constant is a floating point number with a
<tt class="docutils literal">d</tt> suffix and an optional exponent part. Here are some examples: 3.14d,
31.4d-1, 1.d, 1.0d, 1d-2. Note that a floating point number without a suffix is
treated as a single precision constant.</p>
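<p>Why the suffix matters can be illustrated from the C++ side, where the defaults are
reversed (an unsuffixed literal is double precision and <tt class="docutils literal">f</tt> selects single precision).
This is a small C++ sketch rather than <tt class="docutils literal">ispc</tt> code, and the constant names are
hypothetical: a value rounded to single precision is not the same number as its double
precision counterpart.</p>

```cpp
// C++ literal rules (the reverse of the ispc default described above):
// an f suffix gives a float, no suffix gives a double.
constexpr float  kSingle = 0.1f;  // rounded to single precision
constexpr double kDouble = 0.1;   // rounded to double precision

// Widening the float back to double exposes the rounding error it
// picked up at single precision, so the two constants differ.
static_assert(static_cast<double>(kSingle) != kDouble,
              "0.1 rounds differently in single and double precision");
```

<p>In an <tt class="docutils literal">ispc</tt> program the same consideration applies when a constant feeds a
double precision computation: without the <tt class="docutils literal">d</tt> suffix it is a single precision constant.</p>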
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-3">
<h2>Updating ISPC Programs For Changes In ISPC 1.3</h2>
<p>This release adds a number of new iteration constructs, which in turn use
new reserved words: <tt class="docutils literal">unmasked</tt>, <tt class="docutils literal">foreach_unique</tt>, <tt class="docutils literal">foreach_active</tt>,
and <tt class="docutils literal">in</tt>. Any program that happens to have a variable or function with
one of these names must be modified to rename that symbol.</p>
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-2">
<h2>Updating ISPC Programs For Changes In ISPC 1.2</h2>
<p>The following changes were made to the language syntax and semantics for
the <tt class="docutils literal">ispc</tt> 1.2 release:</p>
<ul class="simple">
<li>Syntax for the "launch" keyword has been cleaned up; it's now no longer
necessary to bracket the launched function call with angle brackets. (In
other words, now use <tt class="docutils literal">launch <span class="pre">foo();</span></tt>, rather than <tt class="docutils literal">launch < foo() >;</tt>.)</li>
<li>When using pointers, the pointed-to data type is now "uniform" by
default. Use the varying keyword to specify varying pointed-to types
when needed. (i.e. <tt class="docutils literal">float *ptr</tt> is a varying pointer to uniform float
data, whereas previously it was a varying pointer to varying float
values.) Use <tt class="docutils literal">varying float *</tt> to specify a varying pointer to varying
float data, and so forth.</li>
<li>The details of "uniform" and "varying" and how they interact with struct
types have been cleaned up. Now, when a struct type is declared, if the
struct elements don't have explicit "uniform" or "varying" qualifiers,
they are said to have "unbound" variability. When a struct type is
instantiated, any unbound variability elements inherit the variability of
the parent struct type. See <a class="reference internal" href="#struct-types">Struct Types</a> for more details.</li>
<li><tt class="docutils literal">ispc</tt> has a new language feature that makes it much easier to use the
efficient "(array of) structure of arrays" (AoSoA, or SoA) memory layout
of data. A new <tt class="docutils literal">soa<n></tt> qualifier can be applied to structure types to
specify an n-wide SoA version of the corresponding type. Array indexing
and pointer operations on arrays of SoA types automatically handle the
two-stage indexing calculation to access the data. See <a class="reference internal" href="#structure-of-array-types">Structure of
Array Types</a> for more details.</li>
</ul>
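<p>The two-stage indexing calculation mentioned in the last item can be sketched by hand
in C++. This is an illustration under assumed names and an assumed block layout
(<tt class="docutils literal">kWidth</tt>, <tt class="docutils literal">PointBlock</tt>, and <tt class="docutils literal">point_x</tt> are hypothetical); the layout <tt class="docutils literal">ispc</tt>
actually uses is described in the section referenced above.</p>

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t kWidth = 4;  // plays the role of n in soa<n>; the value is arbitrary

// One SoA block: kWidth x values, then kWidth y values, then kWidth z values.
struct PointBlock {
    float x[kWidth];
    float y[kWidth];
    float z[kWidth];
};

// Two-stage indexing: first select the block, then the slot inside the block.
inline float &point_x(PointBlock *blocks, std::size_t i) {
    return blocks[i / kWidth].x[i % kWidth];
}

// Small usage example: logical element 5 lives in block 1, slot 1.
inline void point_x_example() {
    PointBlock blocks[2] = {};
    point_x(blocks, 5) = 7.0f;
    assert(blocks[1].x[1] == 7.0f);
}
```

<p>With the <tt class="docutils literal">soa<n></tt> qualifier this arithmetic does not have to be written out;
the sketch only shows what an index into such an array has to resolve to.</p>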
</div>
<div class="section" id="updating-ispc-programs-for-changes-in-ispc-1-1">
<h2>Updating ISPC Programs For Changes In ISPC 1.1</h2>
<p>The major changes introduced in the 1.1 release of <tt class="docutils literal">ispc</tt> are first-class
support for pointers in the language and new parallel loop constructs.
Adding this functionality required a number of syntactic changes to the
language. These changes should generally lead to straightforward minor
modifications of existing <tt class="docutils literal">ispc</tt> programs.</p>
<p>These are the relevant changes to the language:</p>
<ul class="simple">
<li>The syntax for reference types has been changed to match C++'s syntax for
references and the <tt class="docutils literal">reference</tt> keyword has been removed. (A diagnostic
message is issued if <tt class="docutils literal">reference</tt> is used.)<ul>
<li>Declarations like <tt class="docutils literal">reference float foo</tt> should be changed to <tt class="docutils literal">float &foo</tt>.</li>
<li>Any array parameters in function declaration with a <tt class="docutils literal">reference</tt>
qualifier should just have <tt class="docutils literal">reference</tt> removed: <tt class="docutils literal">void foo(reference
float <span class="pre">bar[])</span></tt> can just be <tt class="docutils literal">void foo(float <span class="pre">bar[])</span></tt>.</li>
</ul>
</li>
<li>It is now a compile-time error to assign an entire array to another
array.</li>
<li>A number of standard library routines have been updated to take
pointer-typed parameters, rather than references or arrays and index
offsets, as appropriate. For example, the <tt class="docutils literal">atomic_add_global()</tt>
function previously took a reference to the variable to be updated
atomically but now takes a pointer. In a similar fashion,
<tt class="docutils literal">packed_store_active()</tt> takes a pointer to a <tt class="docutils literal">uniform unsigned int</tt>
as its first parameter rather than taking a <tt class="docutils literal">uniform unsigned int[]</tt> as
its first parameter and a <tt class="docutils literal">uniform int</tt> offset as its second parameter.</li>
<li>It is no longer legal to pass a varying lvalue to a function that takes a
reference parameter; references can only be to uniform lvalue types. In
this case, the function should be rewritten to take a varying pointer
parameter.</li>
<li>There are new iteration constructs for looping over computation domains,
<tt class="docutils literal">foreach</tt> and <tt class="docutils literal">foreach_tiled</tt>. In addition to being syntactically
cleaner than regular <tt class="docutils literal">for</tt> loops, these can provide performance
benefits in many cases when iterating over data and mapping it to program
instances. See the Section <a class="reference internal" href="#parallel-iteration-statements-foreach-and-foreach-tiled">Parallel Iteration Statements: "foreach" and
"foreach_tiled"</a> for more information about these.</li>
</ul>
</div>
</div>
<div class="section" id="getting-started-with-ispc">
<h1>Getting Started with ISPC</h1>
<div class="section" id="installing-ispc">
<h2>Installing ISPC</h2>
<p>The <a class="reference external" href="http://ispc.github.io/downloads.html">ispc downloads web page</a> has prebuilt executables for Windows*,
Linux* and macOS* available for download. Alternatively, you can
download the source code from that page and build it yourself; see the
<a class="reference external" href="http://github.com/ispc/ispc/wiki">ispc wiki</a> for instructions about building <tt class="docutils literal">ispc</tt> from source.</p>
<p>Once you have an executable for your system, copy it into a directory
that's in your <tt class="docutils literal">PATH</tt>. Congratulations--you've now installed <tt class="docutils literal">ispc</tt>.</p>
</div>
<div class="section" id="compiling-and-running-a-simple-ispc-program">
<h2>Compiling and Running a Simple ISPC Program</h2>
<p>The directory <tt class="docutils literal">examples/cpu/simple</tt> in the <tt class="docutils literal">ispc</tt> distribution includes a
simple example of how to use <tt class="docutils literal">ispc</tt> with a short C++ program. See the
file <tt class="docutils literal">simple.ispc</tt> in that directory (also reproduced here).</p>
<pre class="literal-block">
export void simple(uniform float vin[], uniform float vout[],
                   uniform int count) {
    foreach (index = 0 ... count) {
        float v = vin[index];
        if (v < 3.)
            v = v * v;
        else
            v = sqrt(v);
        vout[index] = v;
    }
}
</pre>
<p>This program loops over an array of values in <tt class="docutils literal">vin</tt> and computes an
output value for each one. For each value in <tt class="docutils literal">vin</tt>, if its value is less
than three, the output is the value squared, otherwise it's the square root
of the value.</p>
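<p>For comparison, the same computation can be written as a plain serial C++ loop. This is
a reference sketch only, not part of the <tt class="docutils literal">ispc</tt> distribution; the names
<tt class="docutils literal">simple_reference</tt> and <tt class="docutils literal">simple_reference_example</tt> are hypothetical. It can be handy for
checking the output of the <tt class="docutils literal">ispc</tt> version.</p>

```cpp
#include <cassert>
#include <cmath>

// Serial reference: one element per loop iteration.
void simple_reference(const float vin[], float vout[], int count) {
    for (int i = 0; i < count; ++i) {
        float v = vin[i];
        if (v < 3.0f)
            v = v * v;         // values below three are squared
        else
            v = std::sqrt(v);  // all other values get their square root
        vout[i] = v;
    }
}

// Small usage example with values whose results are exactly representable.
inline void simple_reference_example() {
    const float in[3] = {2.0f, 4.0f, 9.0f};
    float out[3] = {0.0f, 0.0f, 0.0f};
    simple_reference(in, out, 3);
    assert(out[0] == 4.0f);  // 2 squared
    assert(out[1] == 2.0f);  // square root of 4
    assert(out[2] == 3.0f);  // square root of 9
}
```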
<p>The first thing to notice in this program is the presence of the <tt class="docutils literal">export</tt>
keyword in the function definition; this indicates that the function should
be made available to be called from application code. The <tt class="docutils literal">uniform</tt>
qualifiers on the parameters to <tt class="docutils literal">simple</tt> indicate that the corresponding
variables are non-vector quantities--this concept is discussed in detail in the
<a class="reference internal" href="#uniform-and-varying-qualifiers">"uniform" and "varying" Qualifiers</a> section.</p>
<p>Each iteration of the <tt class="docutils literal">foreach</tt> loop works on a number of input values in
parallel--depending on the compilation target chosen, it may be 4, 8, 16, 32, or
even 64 elements of the <tt class="docutils literal">vin</tt> array, processed efficiently with the CPU's or
GPU's SIMD hardware. Here, the variable <tt class="docutils literal">index</tt> takes all values from 0 to
<tt class="docutils literal"><span class="pre">count-1</span></tt>. After the load from the array to the variable <tt class="docutils literal">v</tt>, the
program can then proceed, doing computation and control flow based on the
values loaded. The result from the running program instances is written to
the <tt class="docutils literal">vout</tt> array before the next iteration of the <tt class="docutils literal">foreach</tt> loop runs.</p>
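<p>One way to picture this is as a loop that advances by a whole group of elements at a
time, with the lanes that fall past <tt class="docutils literal">count</tt> switched off in the final, partial group.
The C++ below is a conceptual model only, not what the compiler emits: the inner loop
stands in for work that the SIMD hardware performs at once, and the group size
<tt class="docutils literal">kGangSize</tt> and the function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cmath>

constexpr int kGangSize = 8;  // hypothetical; the real width depends on the target

// Conceptual model of the foreach loop: each outer iteration covers
// kGangSize consecutive elements.
void simple_gang_model(const float vin[], float vout[], int count) {
    for (int base = 0; base < count; base += kGangSize) {
        for (int lane = 0; lane < kGangSize; ++lane) {
            const int index = base + lane;
            if (index >= count)
                continue;  // lane is inactive in the last, partial group
            float v = vin[index];
            v = (v < 3.0f) ? v * v : std::sqrt(v);
            vout[index] = v;
        }
    }
}

// Small usage example: 10 elements need one full and one partial group.
inline void simple_gang_model_example() {
    float in[10], out[11];
    for (int i = 0; i < 10; ++i)
        in[i] = static_cast<float>(i);
    out[10] = -1.0f;  // sentinel just past the end of the processed range
    simple_gang_model(in, out, 10);
    assert(out[2] == 4.0f);    // 2 squared
    assert(out[9] == 3.0f);    // square root of 9
    assert(out[10] == -1.0f);  // inactive lanes wrote nothing
}
```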
<p>To build and run the examples, go to the <tt class="docutils literal">examples</tt> directory, create a <tt class="docutils literal">build</tt> folder, and from it
run <tt class="docutils literal">cmake <span class="pre">-DISPC_EXECUTABLE=<path_to_ispc_binary></span> ../</tt>. On Linux* and
macOS*, a makefile will be generated in that directory. On Windows*, a
Microsoft Visual Studio solution, <tt class="docutils literal">ispc_examples.sln</tt>, will be created. In
either case, build it now! (We'll walk through the details of the compilation
steps in the following section, <a class="reference internal" href="#using-the-ispc-compiler">Using The ISPC Compiler</a>.) In addition to
compiling the <tt class="docutils literal">ispc</tt> program, in this case the <tt class="docutils literal">ispc</tt> compiler also
generates a small header file, <tt class="docutils literal">simple.h</tt>. This header file includes the
declaration for the C-callable function that the above <tt class="docutils literal">ispc</tt> program is
compiled to. The relevant parts of this file are:</p>
<pre class="literal-block">
#ifdef __cplusplus
extern "C" {
#endif // __cplusplus
    extern void simple(float vin[], float vout[], int32_t count);
#ifdef __cplusplus
}
#endif // __cplusplus
</pre>
<p>It's not mandatory to <tt class="docutils literal">#include</tt> the generated header file in your C/C++
code (you can alternatively use a manually-written <tt class="docutils literal">extern</tt> declaration
of the <tt class="docutils literal">ispc</tt> functions you use), but it's a helpful check to ensure that
the function signatures are as expected on both sides.</p>
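<p>A hand-written declaration of this kind could look like the sketch below. It is based
only on the generated declaration shown above; the <tt class="docutils literal">static_assert</tt> is an optional extra
(not something <tt class="docutils literal">ispc</tt> generates) that records the signature the calling code expects.</p>

```cpp
#include <stdint.h>
#include <type_traits>

// Hand-written equivalent of the declaration in the generated simple.h.
extern "C" void simple(float vin[], float vout[], int32_t count);

// Optional compile-time check that the declaration above has the
// signature the calling code expects.
static_assert(std::is_same<decltype(&simple),
                           void (*)(float *, float *, int32_t)>::value,
              "simple() declaration does not match the expected signature");
```

<p>A manual declaration like this is not checked against the <tt class="docutils literal">ispc</tt> source, so it has to
be updated by hand whenever the exported function's parameters change.</p>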
<p>Here is the main program, <tt class="docutils literal">simple.cpp</tt>, which calls the <tt class="docutils literal">ispc</tt> function
above.</p>
<pre class="literal-block">
#include <stdio.h>
#include "simple.h"

int main() {
    float vin[16], vout[16];
    for (int i = 0; i < 16; ++i)
        vin[i] = i;

    simple(vin, vout, 16);

    for (int i = 0; i < 16; ++i)
        printf("%d: simple(%f) = %f\n", i, vin[i], vout[i]);
}
</pre>
<p>Note that the call to the <tt class="docutils literal">ispc</tt> function in the middle of <tt class="docutils literal">main()</tt> is
a regular function call. (And it has the same overhead as a C/C++ function