Wednesday, May 9, 2007

Discussion of environment variables on the Apple-Xgrid mailing list

I am hoping someone can point me in the right direction. Recently I set up a cluster of 6 Xserve nodes. I am trying to perform a series of Monte Carlo simulations on the cluster, submitting the jobs via Xgrid (and I am rather new to Xgrid). The code requires certain user-defined environment variables to be set at run time. I actually set these manually in /etc/profile and /etc/bashrc on each node. The executable I am trying to run was compiled with g++ 3.3 and takes a series of values on the command line as input. Every time I submit the job, the code throws an exception saying that an environment variable is not set. I am at a loss as to what to do.
 
Recently I began using GridStuffer for job submission. In an attempt to bypass the environment variable problem, I wrote a simple shell script that first sets the variable and then calls the program with the command-line input. Here it is:
 
----------------------
#!/bin/sh
 
export G4LEDATA=/usr/local/Geant4.7.0/DataFiles/G4EMLOW2.3

echo "$G4LEDATA"
 
./proton true true false ICRU-49p false false monoenergetic pencil 400 400.0 154.0 150.0 0.0 30.0 20.0 water water 1.0 1.0 1000
----------------------
 
The name of the script is then listed in the first and only line of a text file that I use as input to GridStuffer. I do indeed get the value of $G4LEDATA sent to stdout, but the program will not run beyond a certain point. Nothing is written to stderr. If I don't set $G4LEDATA, the exception I mentioned above is sent to stderr.
 
Another thing I tried was to copy the executable (proton) to each node, all in the same directory. Then in the script I have /directory/proton true true ..... instead of ./proton true true.... However, as far as I can tell, the program never executes.
 
My apologies for being rather verbose, but I am really stuck at this point. I openly acknowledge my lack of knowledge of Xgrid and GridStuffer, and I think the problem lies in my not fully understanding how either one works. I have completely turned off all password authentication between the controller and the agents, since the cluster is not online (it is completely stand-alone). A couple of questions:
 
(1) How does the controller log into the agent to transfer files? I am assuming it does so as a generic user. Shouldn't that user have full access to the environment variables defined in /etc/profile?
(2) With GridStuffer, is there a better way than what I am doing to submit the job? For instance, using the directives -dirs and -files to force certain files to be copied to the agent?
 
Any help will be greatly appreciated.
 
P.S. Charles, I hope you see this, because your help would be very beneficial.
 
Thanks,
 
Dan
 
--------
 
 
Dr. Dan J. Fry
Physicist
Henry M. Jackson Foundation For The Advancement of Military Medicine
Walter Reed Army Medical Center
6900 Georgia Avenue, NW
Washington, DC 20307

I did some testing with environment variables to make sure, and it seems quite certain that Xgrid will not load environment variables, or more precisely that the shell won't, even if /bin/sh is explicitly invoked somewhere in the job. This is not too surprising: a shell that is neither interactive nor a login shell does not read /etc/profile.

The shell script approach you propose should work better for this purpose. If you really need to set up the environment from information specific to the agent, you could alternatively read /etc/profile manually in your script to load the environment variables defined there (not sure how to do that _exactly_).
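A minimal sketch of that idea in plain sh, assuming the agent runs the job script with /bin/sh as a non-interactive, non-login shell. The function names `source_profile` and `run_with_env` are hypothetical helpers, not part of Xgrid or GridStuffer. The key point is that the profile is sourced with `.`, so its exports land in the script's own shell; running it as a separate program (`sh /etc/profile`) would only set the variables in a child shell that exits immediately.

```shell
#!/bin/sh
# Minimal sketch of a job wrapper. Assumption: the Xgrid agent starts this
# script with /bin/sh as a non-interactive, non-login shell, so
# /etc/profile has NOT been read before the script runs.

# source_profile [FILE]: run FILE's commands in *this* shell with ".", so
# any variables it exports are inherited by programs started afterwards.
# Defaults to /etc/profile (hypothetical helper name).
source_profile() {
    sp_file=${1:-/etc/profile}
    if [ -r "$sp_file" ]; then
        . "$sp_file"
    else
        echo "warning: cannot read $sp_file" >&2
        return 1
    fi
}

# run_with_env VAR CMD [ARGS...]: refuse to start CMD unless the
# environment variable VAR is set and non-empty; otherwise run CMD.
run_with_env() {
    rw_var=$1
    shift
    # Only accept plain variable names, since the name is passed to eval.
    case $rw_var in
        '' | [0-9]* | *[!A-Za-z0-9_]*)
            echo "error: invalid variable name '$rw_var'" >&2
            return 2 ;;
    esac
    # Indirect lookup: expands to the value of the variable named in rw_var.
    eval "rw_val=\${$rw_var:-}"
    if [ -z "$rw_val" ]; then
        echo "error: $rw_var is not set" >&2
        return 1
    fi
    "$@"
}
```

In the job script one would call `source_profile` first (or `source_profile /etc/bashrc` if the variables live there) and then `run_with_env G4LEDATA /directory/proton true true ...`, with the program arguments as in the original script. Whether a given agent's /etc/profile can be sourced cleanly by /bin/sh is an assumption worth checking on one node first.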

Now, it seems your program still won't run "beyond a certain point" within the script. What exactly happens then? Does your program need only $G4LEDATA, or is another environment variable missing? Look for messages in the agent's Console. Anyway, I would definitely go ahead with the script-wrapping approach and then iron out the remaining problems, which might be unrelated.
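One way to find out what "beyond a certain point" means is to wrap the program so that the job's stderr always records how it ended. This is a hypothetical helper (`run_and_report` is not an Xgrid facility). It assumes, as bash and dash do, that a command killed by a signal yields an exit status of 128 plus the signal number, so a silent crash would show up as, for example, status 139 (signal 11, a segmentation fault).

```shell
#!/bin/sh
# Hypothetical helper for diagnosing a job that "stops" without writing
# anything to stderr: run the command, then describe how it ended.
# Assumption: the shell reports death-by-signal as 128 + signal number,
# which is what common /bin/sh implementations (bash, dash) do.

run_and_report() {
    # Record where the command is being launched from.
    echo "host: $(uname -n)  dir: $(pwd)" >&2
    "$@"
    rr_status=$?
    if [ "$rr_status" -gt 128 ]; then
        # A program can also exit with >128 on purpose, hence "probably".
        echo "$1 ended with status $rr_status (probably signal $((rr_status - 128)))" >&2
    elif [ "$rr_status" -ne 0 ]; then
        echo "$1 ended with status $rr_status" >&2
    else
        echo "$1 finished normally" >&2
    fi
    return "$rr_status"
}
```

For example, `run_and_report /directory/proton true true ...` in the job script; the report lines go to stderr, so they should appear with the job's results even when the program itself prints nothing.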

Regarding the GridStuffer format, you only need -files to explicitly force the addition of a file to the job (and you probably don't need -dirs). If 'proton' is in the same directory as the input file, and you don't need any other files to run the program, then you are fine: no -files needed, GridStuffer will figure it out. If you can have everything set up on the agent, even better; then use only full paths for the program and files in the job submission.
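A sketch of the full-path approach, with a pre-flight check that explains why a copied executable might silently fail to start. `/directory/proton` is the placeholder path from the thread and `check_executable` is a hypothetical helper. Because the agent usually runs as 'nobody', the file has to be executable by that user, not just by the account that copied it.

```shell
#!/bin/sh
# Sketch: call the program by absolute path so the job does not depend on
# the agent's working directory. /directory/proton is the placeholder path
# from the thread (hypothetical); substitute the real location on the agents.
PROTON=/directory/proton

# check_executable PATH: return 0 if PATH can be run by the current user,
# otherwise print the reason to stderr and return 1.
check_executable() {
    case $1 in
        /*) ;;  # absolute path: fine
        *) echo "warning: '$1' is relative to $(pwd)" >&2 ;;
    esac
    # -e also fails when a parent directory is not searchable by this user.
    if [ ! -e "$1" ]; then
        echo "error: $1 not found on $(uname -n) (as user $(id -un))" >&2
        return 1
    fi
    if [ ! -x "$1" ]; then
        echo "error: $1 is not executable by $(id -un)" >&2
        return 1
    fi
    return 0
}
```

In the job script, `check_executable "$PROTON" || exit 1` followed by `"$PROTON" true true ...` keeps a missing or unreadable executable visible in the job's stderr instead of producing no output at all.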

Finally, the Xgrid agent will usually run as user 'nobody' (unless you are using Kerberos authentication or you manually start the agent as a different user).
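To confirm which account a task really runs under, and which variables it sees, a tiny job like the following could be submitted on its own. It is a sketch, with `report_identity` as a hypothetical function name; its output lands in the task's stdout.

```shell
#!/bin/sh
# Sketch: print the identity and basic environment a job actually gets.
# Submitted as an Xgrid task, this shows whether it runs as 'nobody'
# and whether variables such as G4LEDATA reach it at all.
report_identity() {
    echo "user: $(id -un) (uid $(id -u))"
    echo "HOME: ${HOME:-<unset>}"
    echo "PATH: ${PATH:-<unset>}"
    echo "G4LEDATA: ${G4LEDATA:-<unset>}"
}

report_identity
```

If it reports `nobody` together with `G4LEDATA: <unset>`, the variables from /etc/profile are not reaching the job, which matches the behavior described in this thread.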

hope that helps,

charles
