(Disclaimer: This is not a full-Julia solution for using the Phi, and instead is a tutorial on how to link OpenMP/C code for the Xeon Phi to Julia. There may be a future update where some of these functions are specified in Julia, and Intel's compilertools.jl looks like a viable solution, but for now it's not possible.)

It's an instant cluster in your computer, right? It turns out it's not quite that easy. For one, the installation process itself is quite tricky, and the device has stringent requirements for motherboard choices. Also, maxing out at over a teraflop is good, but not quite as high as NVIDIA's GPU acceleration cards. However, there are a few big reasons why I think our interest in the Xeon Phi should be renewed. For one, Intel will be releasing its next version, Knights Landing, in Q3, which promises up to 8 teraflops and 16 GB of RAM. Intel has also been saying that this next platform will be much more user friendly and have improved bandwidth to allow for quicker offloading of data. Lastly, since the Xeon Phi uses x86 cores which one interfaces with via standard tools such as OpenMP and MPI, high-performance parallel codes naturally transfer over to the Xeon Phi with little work (if you've already parallelized your code). For this reason many major HPCs such as Stampede and SuperMIC have been incorporating a Xeon Phi into every compute node. These details tell us that, for high-performance computing, using Xeon Phis to their full potential is the way forward.
I am going to detail some of my advances in interfacing with the Xeon Phi via Julia. First, let's talk about automatic offloading. Automatic offloading allows you to offload all of your MKL calls to the Xeon Phi automatically. This means that if you are doing lots of linear algebra on large matrices, standard operations from BLAS and Linpack like matrix multiplication * will automatically be done on the acceleration card. Details for setting up automatic offload are given by MATLAB; for Julia, you compile it with MKL, set up a few environment variables, and it will do it automatically. However, automatic offloading is a mixed blessing. First of all, there is no data persistence. If you are repeatedly using the same matrices, like in solving an evolution equation (i.e. a parabolic PDE), this adds a large overhead since you'll be sending that data back and forth with every multiplication. Also, one major downside is that it does not apply to vectorized arithmetic such as element-wise multiplication. Sure, you could hack it to be matrix multiplication by a sparse diagonal matrix, but these types of hacks really only tend to give you speedups when your vectors are large, since you still incur the costs of transferring the arrays every time.
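As a rough sketch of that setup: the flag and variable names below are Intel's standard MKL Automatic Offload controls as I understand them, not something spelled out in this post, so treat them as assumptions.

```bash
# Hypothetical setup for MKL automatic offload from Julia; names may vary by version.

# 1) Build Julia against MKL (flag goes in Make.user at the top of the Julia source tree).
echo "USE_INTEL_MKL = 1" >> Make.user
make

# 2) Before launching Julia, tell MKL it is allowed to offload BLAS calls to the Phi.
export MKL_MIC_ENABLE=1
export MKL_MIC_WORKDIVISION=0.5   # optional: fraction of each offloaded call sent to the card

# A sufficiently large A*B inside Julia is then dispatched to the Xeon Phi by MKL itself.
```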
You can also compile code to natively execute on the Xeon Phi. However, you need to copy the files (and libraries) over to the Phi via ssh and run the job from there. Thus, while this is really good for C code, it's not as easy to use when you wish to control the Phi from the computer itself as a "side job".
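For concreteness, a minimal sketch of that native workflow might look like the following; the compiler flags, library path, and the mic0 hostname are the usual Intel/MPSS conventions rather than details taken from this post:

```bash
# Hypothetical native-execution workflow: cross-compile for the card, copy the
# binary and the OpenMP runtime over, and run the job on the Phi itself.
icc -mmic -qopenmp myjob.c -o myjob.mic
scp myjob.mic /opt/intel/lib/mic/libiomp5.so mic0:~/
ssh mic0 'export LD_LIBRARY_PATH=$HOME && ./myjob.mic'
```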
Pragmas are a type of syntax from OpenMP where one specifies segments of the code to be parallelized. If you're familiar with using parallel constructs from MATLAB/Julia like parallel loops, OpenMP's pragmas are pretty much the C version of that. For the Xeon Phi, there exist extra pragmas telling the Phi to offload.
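To make the offload pragma concrete, here is a minimal sketch of the kind of kernel this enables; the function name, clauses, and sizes are illustrative (and assume Intel's compiler with offload support), not code from the post:

```c
/* Sketch of an OpenMP loop wrapped in Intel's offload pragma for the Xeon Phi.
   The in/inout clauses copy the arrays to the card and bring the result back. */
void axpy_offload(int n, double a, double *x, double *y)
{
    #pragma offload target(mic) in(x:length(n)) inout(y:length(n))
    {
        #pragma omp parallel for
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }
}
```

A routine like this gets compiled into a shared library and called from Julia (e.g. via ccall), which is the linking that the disclaimer above refers to.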