February 23, 2007
This page documents my work toward fulfilling the requirements for my Master of Science degree in Computer Science.
With the assistance and direction of Dr. Scott Spetka, I will be creating a computational grid system in
the test bay. The system will be using PCs with Pentium 4 processors and a Linux operating system.
Eventually the Globus Toolkit will be loaded on eight machines and the Condor batch processing system will be
loaded on three.
On this page you will find my paper and a list of tasks that I need to complete. These are provided for anyone
who would like to follow my progress or who is interested in grid computing. The links below lead to my paper
and task list, both in Microsoft Word format. In addition, I have
provided links to the Globus Alliance and the Condor research group.
Once I begin working on the grid in the test bay, I would like to photograph the area so that readers can follow
its construction. Currently the machines are scattered and in disarray. Eventually I would like to change that
and have it look and act like a fully functioning grid.
February 15, 2007
I have been trying to install Globus and Condor on the lab machines in CS102 and CS107. To do so I need to modify the
host tables on those machines, and I will check with Dr. Spetka and Nick Meranty about changing them. The loopback entries in the host tables
are not formatted correctly, so Globus cannot recognize either the machine itself or the other machines.
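A small check along these lines could flag a bad loopback entry before Globus is started. This is a minimal Python sketch that assumes the problem is the common case of a machine's own hostname being listed on the 127.0.0.1 (or ::1) line; the set of acceptable loopback names and the sample host name are assumptions for illustration.

```python
import ipaddress

# Assumed set of names that legitimately belong on a loopback line.
LOOPBACK_NAMES = {"localhost", "localhost.localdomain",
                  "localhost6", "localhost6.localdomain6"}

def loopback_problems(hosts_text):
    """Return (line_number, name) pairs where a real host name is bound
    to a loopback address in /etc/hosts-style text."""
    problems = []
    for lineno, raw in enumerate(hosts_text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # not an address line; ignore it
        if addr.is_loopback:
            for name in fields[1:]:
                if name not in LOOPBACK_NAMES:
                    problems.append((lineno, name))
    return problems

if __name__ == "__main__":
    with open("/etc/hosts") as f:
        for lineno, name in loopback_problems(f.read()):
            print(f"line {lineno}: {name} is mapped to a loopback address")
```

A name reported here would normally be moved to a line carrying the machine's real network address so that other hosts, and Globus itself, resolve it correctly.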
March 23, 2007
I have been allowed to use six Dell PCs to set up the SUNYIT Grid and to write my thesis. These machines are in the test bay, behind the SUNYIT gateway. Each
machine has an Intel processor and will run Fedora Core 6. Fedora Core 6 provides a reliable, free operating system for the Globus Toolkit and the
Condor batch processing system, both of which are UNIX based. I should qualify that statement, because both also have versions that run
on Windows machines. Because we are using Linux rather than UNIX, there might be some porting issues to contend with.
March 30, 2007
The operating systems are now all installed and I have configured their host tables. With the assistance of Nick Merante I was
able to "ping" each machine from every other machine, and I was able to access the Internet from each machine. This
validated that every machine can talk to the others, which matters because communication will play a major role in this thesis.
What has to be installed and configured:
It has taken some time, but I have learned the art of tunneling. I have set up my home computer to transfer downloads to the grid
machines, which allows me to work from home. All of the software installed on the grid machines, except the
operating system, was downloaded at home and then transferred to a grid machine.
The grid machines' IP addresses are as follows:
They all reside behind the SUNYIT gateway which is gw.cs.sunyit.edu (150.156.195.10). A person trying to access the grid from the outside needs an account to log into the gateway.
Setting up this network was extremely rewarding. It gave me firsthand experience in building my own network cluster, and I was able to experiment with host tables and machines. There were trouble spots such as loopback addressing, naming and configuration. Once those hurdles were overcome, I was able to watch the machines talk to one another.
April 16, 2007
I have been installing MPICH2 on the machines and testing them to ensure proper communication exists. Currently ruth, mantle, dimagio, gehrig, stengal and jeter
all have MPICH2. I am able to boot all the mpd processes from ruth and shut them down using mpdallexit.
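The boot and shutdown sequence from ruth can be sketched as follows. This Python helper only writes out an mpd.hosts file and the command lines; it does not run them. mpdboot, mpdtrace and mpdallexit are MPICH2 commands, while the file name mpd.hosts is the conventional one from the MPICH2 documentation and may not be exactly what I used on the grid.

```python
import shlex

# Machines that had MPICH2 installed as of April 16, 2007.
MACHINES = ["ruth", "mantle", "dimagio", "gehrig", "stengal", "jeter"]

def mpd_hosts_file(machines, boot_host):
    """Contents of an mpd.hosts file: one host per line, excluding the
    machine that runs mpdboot (it always starts its own local mpd)."""
    return "".join(f"{m}\n" for m in machines if m != boot_host)

def mpd_commands(machines):
    """Command lines to boot the mpd ring, list its members, and shut it down."""
    total = len(machines)  # mpdboot's -n count includes the local mpd
    return [
        ["mpdboot", "-n", str(total), "-f", "mpd.hosts"],
        ["mpdtrace"],    # prints the hosts currently in the ring
        ["mpdallexit"],  # stops every mpd in the ring
    ]

if __name__ == "__main__":
    print(mpd_hosts_file(MACHINES, "ruth"), end="")
    for cmd in mpd_commands(MACHINES):
        print(shlex.join(cmd))
```

Keeping the host list in one place means adding a machine to the ring is a one-line change to both the hosts file and the -n count.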
May 11, 2007
Most of the software is now installed. Globus is on ruth, mantle, gehrig and stengal, and Condor is installed on stengal, dimagio
and jeter. The interface between Globus and Condor still needs to be worked out: I do not yet see jobs being submitted from Condor to Globus
or from Globus to Condor.
May 16, 2007
I have been working overtime at my current job. This will continue until final acceptance testing is over. Unfortunately
progress will slip. But I must continue to do well at my job so as to pay my mortgage and afford the opportunity to eat.
June 22, 2007
I will be leaving for Hawaii in a week. Progress has slowed because of overtime at work and planning for the trip. Some things
I will need to do once I get back:
July 21, 2007
After catching up on my sleep from my trip, I was able to fix the Condor compiler problem I had been seeing. At first I was told on the Condor
mailing list that an older gcc compiler was needed in order to use condor_compile. I later found out that was not the case, so I removed
the older compiler and went back to the version of gcc that came with Fedora Core 6.
MPICH was installed on stengal again. MPICH2 was then tested, using several of the samples that come with the distribution, and all the
machines were verified to be working. I am confident that MPI is installed on all of the machines and that Condor and Globus should be able to find it. The
key to Condor and Globus finding it is the environment: one must make sure the environment is set correctly, or Condor and Globus will not
find the MPI distribution or each other.
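A quick environment check of this kind could be run on each machine before submitting jobs. This is a minimal Python sketch; the variable names and command list below are assumptions about this particular setup (GLOBUS_LOCATION is the Globus Toolkit 4 install variable, and CONDOR_CONFIG is only needed when Condor's configuration is not in a default location).

```python
import os
import shutil

# Assumed requirements for this grid; adjust to match the actual installs.
REQUIRED_VARS = ["GLOBUS_LOCATION", "CONDOR_CONFIG"]
REQUIRED_COMMANDS = ["mpiexec", "mpdboot", "condor_submit", "globusrun-ws"]

def check_environment(env=None, which=shutil.which):
    """Return (variables that are unset, commands not found on env's PATH)."""
    env = os.environ if env is None else env
    missing_vars = [v for v in REQUIRED_VARS if not env.get(v)]
    search_path = env.get("PATH", "")
    missing_cmds = [c for c in REQUIRED_COMMANDS
                    if which(c, path=search_path) is None]
    return missing_vars, missing_cmds

if __name__ == "__main__":
    missing_vars, missing_cmds = check_environment()
    for v in missing_vars:
        print(f"environment variable not set: {v}")
    for c in missing_cmds:
        print(f"not found on PATH: {c}")
    if not (missing_vars or missing_cmds):
        print("environment looks complete")
```

The which parameter is there only so the check can be exercised without the real tools installed.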
In addition, I reinstalled Globus and Condor with the newer Globus 4.0.5 and Condor 6.8.5. During installation I noticed that their configuration scripts
found the correct environment variables, not just for each other but for MPICH2 too. I might have to revisit ruth, mantle and gehrig to ensure they
can also see MPICH2. If they cannot, I believe MPI jobs sent to them would fail.
August 2, 2007
SUNYIT has shut the power off for upgrades to the labs. Unfortunately they did not do a graceful shutdown. Only two of the six machines would
boot. I went to the lab to restart all the machines and routers that were needed. Work can now proceed.
August 10, 2007
I was able to complete the connection between the Globus and Condor interfaces so that jobs can be sent from Globus to Condor or from
Condor to Globus. The way the job is formulated in Globus is very important, because Globus converts the XML job description into a form that
Condor can understand. This is performed by a Perl script. I found on the Internet that another student had a similar
problem. He modified the Condor.pl script so that "should_transfer_files" and "when_to_transfer_output" were generated when Globus
generated the Condor job. Remember, these machines do not share an NFS file system, so all the executables have to be transferred to the execution
machine by gridftp.
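To illustrate what the generated Condor job needs to contain, here is a minimal Python sketch that builds a Condor submit description with the two file-transfer settings. It is not the Globus job manager script itself (that is Perl); the function name and the output, error and log file names are hypothetical.

```python
def condor_submit_description(executable, arguments="", universe="vanilla",
                              input_files=()):
    """Build a Condor submit description for machines without a shared
    file system, so Condor itself copies the executable and outputs."""
    lines = [
        f"universe = {universe}",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments = {arguments}")
    lines += [
        "output = job.out",   # hypothetical file names
        "error = job.err",
        "log = job.log",
        # The two settings the modified job manager script had to emit:
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
    ]
    if input_files:
        lines.append("transfer_input_files = " + ", ".join(input_files))
    lines.append("queue")
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(condor_submit_description("/bin/hostname"), end="")
```

With should_transfer_files set to YES, Condor copies the executable to the execute machine and brings the output back when the job exits, instead of assuming the files are already visible there.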
August 14, 2007
Progress continues to be made. It may be slow progress, but it is progress nonetheless!
I am happy to report that the system is in place and stress testing will start shortly.
August 19, 2007
Dr. Ralph Butler of Middle Tennessee State University has been guiding me through the Python script changes that are necessary for communication between globusrun-ws and
mpiexec. Currently the script is looking for a 'PATH' string that the Toolkit is not providing.
August 20, 2007
Restarted the grid after a strong thunderstorm on August 16, 2007. Going to the lab was actually a good thing,
because I met with Scott and we discussed the grid and how to continue with its design.
I added another machine, rizzuto (192.168.0.80), to the grid. Currently there is nothing on the machine, but perhaps I will
add Condor and make it both an execute and a submit machine.
We discussed whether the gateway should be aliased or another NIC installed. Matt Haas is a proponent of
installing the NIC, and Scott believes aliasing would be sufficient. In addition, Matt advises that it will be
necessary to modify our host table once the VPN and NIC/alias question is resolved, because the 192.168."0" addresses cause havoc with
Corning's system.
August 21, 2007
I was able to submit an MPI job through Globus. There are two different types of MPI jobs: MPICH-G2 and plain MPI. I don't fully understand
the difference between the two yet, but I do know that an MPICH-G2 program is compiled with a makefile header that links the Globus libraries into
the executable, whereas a plain MPI executable takes its libraries only from MPI. Since I used Globus Toolkit 4.0.5 on stengal, jeter and
dimagio, it is necessary to update mantle, ruth and gehrig so that they have the same Globus linking libraries. Will this have to be the same throughout
a heterogeneous grid? I don't know, but it would seem to be a problem if the Globus linking libraries differed in such a manner. This means that I will
have to reinstall those machines.
Received an e-mail from Scott. He is trying to get a server to put BITS material on. Most important, and something I had forgotten about, are the configuration files
and backups. I will write up several papers on how to configure Globus, Postgres and Condor. Much of the configuration comes from the online
manuals, but I might be able to provide some quick references.
I will also create links to pages containing my tests that demonstrate Globus and Condor. These will be valuable to anyone who wants to
experiment with these tools or demonstrate Globus and Condor to students.
August 26, 2007
I tried to access the grid today, but the storm last night must have knocked the machines out. Stengal was the only machine up and running.
I will stop by tomorrow to restart the machines and test my changes. If the test is successful, the last remaining test is "Submit a RSL job from Globus to Condor (STANDARD, VANILLA, JAVA, PARALLEL)". I know the standard, vanilla and parallel jobs work fine; I still need to validate Java job submission. Success is coming!
August 26, 2007
Restarted the grid tonight. Re-ran the tests and put them on this web page. Added logging and e-mail output from the Condor Job Manager.
Writing the test that allows a job to be created from the Globus Toolkit and run on a Condor machine. Started to put
information on the web page that will eventually be brought into the paper.
August 27, 2007
Updating the web page with history of HTC and background of Condor.
August 28, 2007
Another night of writing. Updated my Condor implementation web page, which now gives background information on Condor's
architecture, its system functionality and the history of HTC. Reading the material reinforced my understanding of that history and of what HTC is all about.
Met Kurt today; he will be helping us with BITS. He was able to log on remotely, and he said he might be able to look at the ssh problem on ruth and
fix it.
Copyright © 2007, Jeffrey Wells
Revised --August 26, 2007
URL: Jeff Wells Web Page