PS3 Programming: libspe vs. libspe2 with Multi-Threaded Hello World in C

July 6th, 2007 by Ozzy

This little guide covers a multi-threaded Hello World tutorial for the Cell BE found in the PlayStation 3. First we’ll step through the
code using the deprecated libspe 1.2 and the new libspe 2.1, and finally look at the output we get from both examples.


OK, let’s take a look at the SPU part.

This is the SPU Code for the Hello World Program:

#include <stdio.h>

int main(unsigned long long speid, unsigned long long argp, unsigned long long envp)
{
    printf("Hello World!\n");
    return 0;
}

Not much going on here. The main function takes three arguments of type unsigned long long.
The first one is the SPE ID, which the library hands to the SPU program when the PPU code starts it. The second is an argument pointer.
Usually, you would use this as the main-memory address to DMA data from. Since we don’t need that, we’ll pass NULL for
it, as well as for the last argument, which is an environment pointer. All the main function does is print “Hello World!”
to the console and exit.

Now for the hard(not) part.

The PPU Code with libspe:

#include <stdio.h>
#include <libspe.h>
#include <sys/wait.h>
#include <time.h>

#define SPU_THREADS 100

extern spe_program_handle_t hello_spu;

int main (int argc, char **argv)
{
    clock_t start_time, end_time;
    start_time = clock();

    speid_t speid[SPU_THREADS];
    int status[SPU_THREADS];
    int i;

    for (i = 0; i < SPU_THREADS; i++)
        speid[i] = spe_create_thread (0, &hello_spu, NULL, NULL, -1, 0);

    for (i = 0; i < SPU_THREADS; i++) {
        spe_wait(speid[i], &status[i], 0);
        printf ("status = %d\n", WEXITSTATUS(status[i]));
    }

    end_time = clock();
    printf("Total seconds elapsed %.2f\n",
           (float)(end_time - start_time) / (float)CLOCKS_PER_SEC);

    return 0;
}

Again, nothing here to be afraid of. We include the libspe header (libspe.h) and the standard time header file (time.h) for the time measurement. Next we define how many threads we want to create, which is 100 in this case.
The extern declaration says that our SPU program has already been embedded and that we want to reference it as hello_spu. We embed it by using the embedspu tool. Now to the main function.
We get our start time with the first two lines. Next we create placeholders for the SPE IDs and for the status that each of the SPU threads will return. The first for-loop creates the actual threads. The first argument is the thread group identifier; 0 is the default group. Second comes the pointer to the actual SPU program. The third and fourth arguments are the argument and environment pointers passed to the main function of the SPU code. Next comes the processor affinity mask, which tells the program which particular SPEs we do or do not want to use; -1 means any available SPE. The last argument holds the flags we want to use. Since we don’t need any, we pass 0.
The next for-loop waits for the threads to finish and prints their corresponding exit status to the console. spe_wait blocks until the thread whose ID we passed to it exits. And finally we print the time elapsed in seconds.
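
The listing above decodes the status word from spe_wait with WEXITSTATUS, the macro from sys/wait.h that is normally used on the status of a child process. The sketch below shows that decoding step on an ordinary child process, so it runs without a Cell or libspe. The helper name decoded_exit_code is made up for this illustration, and checking WIFEXITED first is an addition that the original listing does not make.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of libspe): start a child process that
 * exits with `code` and return the exit code decoded from the raw
 * wait status. Returns -1 if anything goes wrong. */
int decoded_exit_code(int code)
{
    int status = 0;
    pid_t pid = fork();

    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0)
        _exit(code);            /* child: terminate right away with `code` */

    if (waitpid(pid, &status, 0) < 0)
        return -1;

    /* The raw status packs several fields into one int, so check how
     * the child ended before extracting the exit code. */
    if (!WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling decoded_exit_code(7) returns 7; printing the raw status instead of WEXITSTATUS(status) would show a different number, which is why the listing wraps status[i] in the macro.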

Next up, we have the libspe2 version of our Hello World masterpiece. Don’t be afraid of the code length. The part where we actually do something is pretty much the same as in the libspe example.

The PPU Code with libspe2 and using posix threads:

#include <stdio.h>
#include <stdlib.h>
#include <libspe2.h>
#include <pthread.h>
#include <time.h>

#define SPU_THREADS 100

extern spe_program_handle_t hello_spu;

typedef struct {
    spe_context_ptr_t spe;
    unsigned long long *args;
} thread_arg_t;

void *run_hello_spu(void *thread_arg)
{
    int ret;
    thread_arg_t *arg = (thread_arg_t *) thread_arg;
    unsigned int entry;
    entry = SPE_DEFAULT_ENTRY;
    spe_stop_info_t stop_info;

    ret = spe_context_run(arg->spe, &entry, 0, arg->args, NULL, &stop_info);
    if (ret < 0) {
        perror("spe_context_run");
        return NULL;
    }

    printf("status = %d\n", spe_stop_info_read(arg->spe, &stop_info));

    return NULL;
}

int main(int argc, char **argv)
{
    clock_t start_time, end_time;
    start_time = clock();

    int i;
    int ret;
    spe_context_ptr_t spe[SPU_THREADS];
    pthread_t thread[SPU_THREADS];
    thread_arg_t arg[SPU_THREADS];

    for (i = 0; i < SPU_THREADS; i++) {
        spe[i] = spe_context_create(SPE_EVENTS_ENABLE, NULL);
        if (!spe[i]) {
            perror("spe_context_create");
            exit(1);
        }

        ret = spe_program_load(spe[i], &hello_spu);
        if (ret) {
            perror("spe_program_load");
            exit(1);
        }

        arg[i].spe = spe[i];
        arg[i].args = NULL;

        ret = pthread_create(&thread[i], NULL, run_hello_spu, &arg[i]);
        if (ret) {
            perror("pthread_create");
            exit(1);
        }
    }

    for (i = 0; i < SPU_THREADS; i++) {
        pthread_join(thread[i], NULL);
        ret = spe_context_destroy(spe[i]);
        if (ret) {
            perror("spe_context_destroy");
            exit(1);
        }
    }

    end_time = clock();
    printf("Total seconds elapsed %.2f\n",
           (float)(end_time - start_time) / (float)CLOCKS_PER_SEC);

    return 0;
}

Instead of libspe.h we use libspe2.h this time, and we include pthread.h since we are going to use POSIX threads for this.

If you are new to POSIX threads, I recommend reading Daniel Robbins’ POSIX threads explained articles on IBM’s developerWorks. They cover pretty much everything you’ll probably ever need to know about pthreads (Part One, Part Two, Part Three).

Let’s jump right into the main function. After the usual time and variable setup, we have our placeholders.
Remember the thread ID construct we used in the old libspe example? Well, it’s gone. The replacement is called an SPE context.
Next we define the pthreads that will run the SPE contexts we create, and finally the arguments we want to pass along to each thread (the SPE context and the argument pointer), which we have defined in a struct called thread_arg_t.
The first for-loop creates the context for each thread. The spe_context_create function takes two arguments: flags and an SPE gang context pointer, which is NULL here. We use SPE_EVENTS_ENABLE as a flag in this case because we want to be able to read the status of each thread after it has executed. Then we load our embedded SPU program into each context, set up the thread arguments and create a POSIX thread for each context we set up. Now let’s jump up to the POSIX thread main function (run_hello_spu).

We define a spe_stop_info_t to hold the status of each thread after it has executed. The spe_context_run function is similar to the spe_create_thread we already used.
The first argument is the context pointer, followed by the entry point for the SPU instruction pointer (the default in this case). The third parameter holds the run flags, 0 in this case. Next come the argument and environment pointers for the SPU main function, both NULL in this example, and finally a pointer to an SPE stop info structure.
spe_context_run blocks until the SPU program has stopped. This is when we read the status of the SPU using spe_stop_info_read and print it out. Note that the value printed here is the return code of spe_stop_info_read (0 on success); the exit code of the SPU program itself is stored in the stop info structure, in stop_info.result.spe_exit_code.
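
Because the run call blocks, each context gets its own POSIX thread, and each thread receives its own argument struct. Here is a minimal sketch of that pattern that runs without libspe2. The names worker_arg_t, blocking_run, worker_main and run_workers are made up for this illustration, and blocking_run is only a stand-in for a blocking call such as spe_context_run, with an invented exit code.

```c
#include <pthread.h>
#include <time.h>

#define WORKERS 8

/* Per-thread argument bundle, playing the role of thread_arg_t. */
typedef struct {
    int id;         /* stands in for the SPE context */
    int exit_code;  /* filled in by the thread once the "run" returns */
} worker_arg_t;

/* Hypothetical stand-in for a blocking call such as spe_context_run. */
static int blocking_run(int id)
{
    struct timespec work = { 0, 10000000L };  /* pretend to run for 10 ms */
    nanosleep(&work, NULL);
    return id % 3;                            /* made-up exit code */
}

/* Thread entry point: unpack the argument struct, block, store the result. */
static void *worker_main(void *p)
{
    worker_arg_t *arg = p;
    arg->exit_code = blocking_run(arg->id);
    return NULL;
}

/* Start WORKERS threads, wait for all of them, return how many ran. */
int run_workers(worker_arg_t args[WORKERS])
{
    pthread_t thread[WORKERS];
    int started = 0;
    int i;

    for (i = 0; i < WORKERS; i++) {
        args[i].id = i;
        args[i].exit_code = -1;
        /* Every thread gets a pointer to its own array element, so the
         * argument stays valid and unshared while the thread runs. */
        if (pthread_create(&thread[i], NULL, worker_main, &args[i]) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(thread[i], NULL);
    return started;
}
```

The argument array lives in the caller, as arg[] does in main above, so every thread’s struct outlives the thread that uses it.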

Finally, back in main, we wait for the threads to exit, destroy the contexts and print the time elapsed in seconds.

Now let’s take a look at some example output. I used IBM’s System Simulator which is included in the CellSDK, but of course you can run this example on your PS3 too, if you like.

If everything turns out to be OK, we should see 100 Hello World’s mixed with 100 status = 0’s.

Output for libspe code:
SystemSim Output for libspe example

Output for libspe2 code:
SystemSim Output for libspe2 example

Something seems to be wrong with the first example, doesn’t it? Well, no, actually that’s what I wanted you to see.
The deprecated libspe version is asynchronous. That means that if you try to create more threads than you have SPEs available (6 on your PS3), the program tries to create all 100 threads but sometimes fails, because the SPEs are busy doing something else. That doesn’t happen with libspe2 and POSIX threads: they block and queue the incoming thread until the requested resources are free again, without the overhead you would get from trying this with the old libspe.
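
To picture the “block and queue” behaviour, here is a small analogy in plain POSIX threads: 24 workers compete for 6 slots, and a worker that finds all slots taken simply waits until one is released. This only illustrates the idea. It is not how libspe2 or the kernel schedule SPE contexts, and all names in it (SLOTS, THREADS, worker, run_queued) are made up for the sketch.

```c
#include <pthread.h>
#include <time.h>

#define SLOTS   6    /* SPEs available on a PS3, per the article */
#define THREADS 24   /* more workers than slots */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t slot_free = PTHREAD_COND_INITIALIZER;
static int in_use;      /* slots currently taken */
static int peak;        /* highest value in_use ever reached */
static int completed;   /* workers that finished */

static void *worker(void *unused)
{
    struct timespec work = { 0, 2000000L };   /* 2 ms of pretend work */
    (void)unused;

    pthread_mutex_lock(&lock);
    while (in_use == SLOTS)                   /* all slots busy: wait in line */
        pthread_cond_wait(&slot_free, &lock);
    in_use++;
    if (in_use > peak)
        peak = in_use;
    pthread_mutex_unlock(&lock);

    nanosleep(&work, NULL);                   /* "run" while holding a slot */

    pthread_mutex_lock(&lock);
    in_use--;
    completed++;
    pthread_cond_signal(&slot_free);          /* wake one waiting worker */
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Run THREADS workers over SLOTS slots. Returns the number of workers
 * that completed and stores the peak slot usage in *peak_out. */
int run_queued(int *peak_out)
{
    pthread_t t[THREADS];
    int started = 0;
    int i;

    in_use = peak = completed = 0;
    for (i = 0; i < THREADS; i++) {
        if (pthread_create(&t[i], NULL, worker, NULL) != 0)
            break;
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(t[i], NULL);

    *peak_out = peak;
    return completed;
}
```

All 24 workers finish, yet the peak never exceeds 6: nothing is dropped, the surplus workers just wait their turn.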

Questions?

12 Responses

  1. Santhosh Kumar . R Wrote:

    very useful guide to know the differences between libspe1 and libspe2.

    I have one doubt with respect to libspe2:
    How can I pass multiple arguments to the spu program from ppu?

    How can I retrieve them in the spu program?

    please explain with example
    *** Using libspe2 only*****

  2. Ozzy Wrote:

    Thanks, will do.

    The main idea is that you can pass the addresses of the arguments you would like the SPEs to have to the DMA controller.

    I’ll write a couple of lines on this matter and post a new article, and maybe do a series of articles on DMA transfers, if someone would like to see that.

  3. Santhosh Kumar . R Wrote:

    Thanks for your reply .

    I am very much interested in this. so please send me the article ASAP.

    And I tried with structure, where I could pass multiple arguments in a bundle. I am passing the address of structure. But while retrieving them i am facing some problem.
    saying spe_context_run: Bad address. can u help me out?

  4. EnneKappa Wrote:

    Great! I read this http://ww2.cs.fsu.edu/~west/cell/index.php?tutorial=start but there are some problems with libspe.h
    Thanks very much!

  5. Jimc Wrote:

    Ozzy,

    Thanks for the tutorial. I am trying to figure out if it is possible to (and do we typically want to) specify which of the 6 SPEs performs a particular task? If most of the code is run on the PPU and the SPEs get used for occasional big jobs that are easily parallelized, I assume the strategy is to break the big job up into 6 equal chunks and send them to the SPUs as 6 threads. Is it any more complicated than this?

  6. Ozzy Wrote:

    Hi Jimc.

    It depends on what you’re trying to solve. Not every problem or algorithm, for that matter, is parallelizable, but you can always assign each SPU its own unique thread (e.g. speaking in game development terms: one does collision detection, one does audio processing, one does A.I., …).

    The PPU is NOT where most of your code should run. The ideal use for the PPU is as a control unit that dispatches and gathers the SPU threads.
    You can always run another thread on the PPU, if you feel like it isn’t stretched enough, but the PPU is not made for intense computing sessions and will become a bottleneck for the whole application.

    Don’t take the SPUs as optional.
    The SPUs are the power horses of the Cell BE, use them wherever possible.

  7. Santhosh Kumar . R Wrote:

    Hi Ozzy,

    Thanks for the information.
    I have some problem with mounting spufs on cell-linux.

    Actually I am doing porting of Game OS library test cases to cell-linux. In cell-linux I need to mount the spufs on /spu. The problem is that, When I reboot the PlayStation3, the mounting of spufs is not happening at the bootup, I need to do it manually after login by using mount -a (which takes all the entries from /etc/fstab). How do I make spufs to mount at the boot-up time?
    Please suggest me the solution

  8. Ozzy Wrote:

    Hi Santhosh,

    Check your /etc/fstab, it should have an entry like this:

    none /spu spufs gid=spu 0 0

    Although, I guess it already does, if you can do a mount -a.

    My next suggestion would be to check your log files for errors or warnings. If you can’t find anything there either, you can always add the full mount command (not mount -a) to your rc.local file.

  9. Santhosh Kumar . R Wrote:

    Hi Ozzy,

    Thanks for your guidelines,
    Right now I have an entry as shown below in /etc/fstab

    spufs /spu spufs defaults 0 0

    and I am getting an error message at the reboot time that block device /spu cannot be unmounted.

    when we install fedora 7 freshly into other PS3 the content of /etc/fstab has an entry as like you mentioned.

    When I locate for spufs the output is as follows,
    /lib/modules/2.6.21-1.3194.fc7/kernel/arch/powerpc/platforms/cell/spufs
    /lib/modules/2.6.21-1.3194.fc7/kernel/arch/powerpc/platforms/cell/spufs/spufs.ko
    /usr/share/man/man2/spufs.2.gz

    1.Can You suggest me the full mount command to be placed at the rc.local file.

    2. May I know the reason why I am getting that umount error.

  10. Ozzy Wrote:

    1. “mount /spu” should do the trick

    2. Sorry, I have no idea, but you should check the log-files in your /var/log/ folder

  11. Dusmant Kumar Sahoo Wrote:

    Hi,

    I am new to CBE.
    I want to access ppu-variable from multiple spu threads and change value of ppu-variable .
    Please explain me.

    Thanks
    Dusmant

