Author Topic: Linux Clusters/Parallel Computing  (Read 899 times)

voidmain

  • VIP
  • Member
  • ***
  • Posts: 5,605
  • Kudos: 184
    • http://voidmain.is-a-geek.net/
Linux Clusters/Parallel Computing
« on: 31 January 2002, 11:32 »
Anyone into programming and interested in Linux clusters? I just set up PVM (Parallel Virtual Machine) on four of my Linux machines and compiled and ran some of the example programs. It's pretty slick stuff. Just curious whether anyone has any experience with Linux clusters and might have some advice. It appears that PVM may be on its way out in favor of MPI and other technologies...
Someone please remove this account. Thanks...

runkpock

  • Guest
Linux Clusters/Parallel Computing
« Reply #1 on: 31 January 2002, 17:30 »
I am going to do the same thing with a bunch of
SGI boxes I have. Can you point me to some good
docs?

voidmain

Linux Clusters/Parallel Computing
« Reply #2 on: 31 January 2002, 21:23 »
That's a good question. I just picked up the interest two days ago, although I've been reading about some of the world's fastest computers and how they are using Linux clusters because they are so inexpensive to build. Then I did a couple of Google searches on "Linux Cluster HOWTO" and noticed two technologies that popped out, "PVM" and "MPI", both of which are freely available.

Then I noticed that the "pvm" and "xpvm" RPMs are on the RedHat 7.2 CD (I think they've been included for several releases). I installed the RPMs two nights ago but didn't have much luck getting them running, as the documents I found on Google didn't really explain how the master spawns and communicates with PVM on the nodes. Last night I figured that part out, and it's really easy. If you start PVM on any one of your machines, it automatically starts "pvmd" on the local machine and then tries to start "pvmd" on all the other nodes in your PVM hosts file using "rsh" (I set an environment variable to make it use ssh instead). Before doing that, set up ssh so you can log into all of the other nodes without being asked for a password.
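A rough sketch of the ssh side of this, for anyone following along. It assumes the environment variable in question is PVM_RSH (the one PVM consults for its remote shell command), that ssh lives at /usr/bin/ssh, and that OpenSSH's ssh-keygen and ssh-copy-id are available. The function names and node names are made up for illustration. The key-setup function only prints the commands, so nothing runs by accident.

```shell
#!/bin/sh
# Sketch only. Node names are placeholders; the ssh path is an assumption.

# Point PVM at ssh instead of rsh for starting remote pvmd processes.
# Assumption: PVM_RSH is the variable meant in the post above.
use_ssh_for_pvm() {
    PVM_RSH=${1:-/usr/bin/ssh}
    export PVM_RSH
}

# Print (do not run) the commands for password-less logins to each node.
print_key_setup() {
    echo "ssh-keygen -t rsa"
    for node in "$@"; do
        echo "ssh-copy-id $node"
    done
}

# Example:
#   use_ssh_for_pvm
#   print_key_setup mach2 mach3 mach4
```

Run the printed commands by hand once you are happy with them, then confirm that "ssh mach2 true" returns without prompting before starting PVM.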

Another key thing is that every node host name you list in your PVM hosts file must also be in your local /etc/hosts file (it appears that having the names in DNS is not enough). Once you get this far, it should spawn "pvmd" on all nodes automatically if it is not already running there.
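One way to check the /etc/hosts requirement before starting PVM is a small helper like the one below. It is only a sketch: the function name missing_hosts is invented here, and it assumes a simple PVM hosts file with one host name at the start of each line and "#" for comment lines. It prints every host name from the PVM hosts file that does not appear as a name in the hosts database.

```shell
#!/bin/sh
# usage: missing_hosts PVM_HOSTFILE [HOSTS_DB]
# Prints each host listed in PVM_HOSTFILE that HOSTS_DB does not name.
# Assumption: one host name per line, lines starting with '#' are comments.
missing_hosts() {
    pvm_hostfile=$1
    hosts_db=${2:-/etc/hosts}
    # The first field of each non-comment, non-blank line is the host name.
    awk '!/^#/ && NF { print $1 }' "$pvm_hostfile" |
    while read -r name; do
        # Strip comments, then look for the name in any column after the IP.
        awk -v n="$name" '
            { sub(/#.*/, "")
              for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }' "$hosts_db" || echo "$name"
    done
}

# Example:
#   missing_hosts "$HOME/pvmhosts"
```

If it prints nothing, every node in the PVM hosts file already has an entry in /etc/hosts.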

Of course, you have to have the PVM RPM installed on all nodes, and you need your compiler installed, etc. Then there are some example programs in /usr/share/pvm3/examples. Read the "Readme" file in that directory for how to compile them. The binaries will go to "/usr/share/pvm3/bin/LINUX", and they have to be in that directory on every node.

I didn't set it up this way, but I think most large-scale Linux clusters would have the "code" directories NFS mounted so you wouldn't have to distribute everything. I just did a little shell script with a "for machine in mach1 mach2 mach3 mach4" loop that compiles the programs on every machine using "ssh"; it takes the name of the program to be compiled as a parameter. Again, I'm sure this is not the optimal method.
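Something along these lines would do the job. This is a reconstruction, not the actual script: the function name, the DRY_RUN switch, and the use of "aimk" as the build command are assumptions (check the Readme in the examples directory for the real compile step). The machine names and examples path come from the description above.

```shell
#!/bin/sh
# Sketch of a "compile on every node" loop.
MACHINES="mach1 mach2 mach3 mach4"
EXAMPLES_DIR=/usr/share/pvm3/examples
BUILD_CMD=${BUILD_CMD:-aimk}   # assumption: the Readme's build command

# usage: build_everywhere PROGRAM
# Set DRY_RUN=1 to print the ssh commands instead of running them.
build_everywhere() {
    program=$1
    if [ -z "$program" ]; then
        echo "usage: build_everywhere PROGRAM" >&2
        return 2
    fi
    for machine in $MACHINES; do
        remote="cd $EXAMPLES_DIR && $BUILD_CMD $program"
        if [ -n "$DRY_RUN" ]; then
            echo "ssh $machine \"$remote\""
        else
            ssh "$machine" "$remote" || echo "build failed on $machine" >&2
        fi
    done
}

# Example:
#   DRY_RUN=1 build_everywhere hello
```

With the code directory NFS mounted on every node, the loop would shrink to a single build on one machine.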

Apparently O'Reilly has a couple of good books on clustering that I should probably get.

Here is another piece of software I would like to try. I saw some universities using it, and I think you can download and use it at no cost:

http://www.openpbs.org

and the commercial version:

http://www.pbspro.com

Also, "xpvm" will graphically show your nodes and how they are communicating with each other, and it graphs the jobs/load over time, etc...

Let me know what you come up with...

[ January 31, 2002: Message edited by: VoidMain ]


runkpock

  • Guest
Linux Clusters/Parallel Computing
« Reply #3 on: 2 February 2002, 06:24 »
Do you think NFS is really the best network
filesystem for clusters? I would really like to try
out Coda or something else, but Coda is evil and
refuses to work correctly. Have you had any luck
with anything geared toward distributed systems? It
seems that this topic is somewhat of a mystery. I
will try to set up my SGI boxes as a cluster;
perhaps I'll give shells to interested people. Bit
of a project, methinks, though. Perhaps a couple of
weeks away.

voidmain

Linux Clusters/Parallel Computing
« Reply #4 on: 2 February 2002, 07:45 »
I don't have any real-world experience or training with supercomputer clusters. However, I believe that NFS is probably more than sufficient for most applications run on them. The reason for building a cluster is to harness the power of many CPUs to solve complex problems, not to do disk I/O. In fact, many applications might only read or write a very small amount of data. NFS would be for nothing more than distributing the code. And all of the results are collected on the master node anyhow, so as long as that node has local disk it shouldn't matter.

The nodes do not communicate with each other through NFS for processing but through their own daemons and network sockets. But I have no idea what I am talking about, just guessing, so.... I wish I knew someone with some experience. I'll just keep learning from info on the net, I guess.