Tuesday, January 08, 2008
My First Book -- Linux作業系統之奧義
This book actually came about by chance. Originally I simply felt that no Linux book on the market was suitable for helping readers at the basic level understand what Linux actually is; most of them discuss servers, configuration, site hosting, the X Window System and so on rather than concepts. That is why I decided to publish what I personally consider more of a conceptual book introducing what Linux really is. I certainly never expected that so many readers would respond positively after reading it -- I am touched~~~~
What I do owe readers an apology for is that, this being my first publication and done without experience, the first printing had quite a few errors that needed correcting. The publisher fixed them as quickly as possible, but I still feel sorry toward readers who had already bought the book. An errata sheet is now available for download on the 悅知 website, which I hope resolves many readers' problems, and I thank the many readers who wrote in to point out mistakes.
While reading this book, I am sure many readers will have plenty of questions, because it was designed very differently from most operating-system books: although it does not have many pages, it actually covers a great deal, including concepts I have accumulated myself. So if there is anything you cannot understand (the book tries to explain things as simply as possible), feel free to ask directly in the comments on this post, or contact me through the email address given in the book (though then other readers will not be able to see the problem you ran into). I will reply to every reader as quickly as I can.
-------------------------------------------------------------------------------
Below are some reader comments I found at the end of 2008. Good or bad, I am posting every one I could find (some earlier articles seem to have disappeared already), simply so that anyone who wants a reference, or I myself, can look back at what people said at the time. If anyone would like a link removed, please mail me. Thanks.
http://www.wretch.cc/blog/sclin0323/25986650
http://www.dbanotes.net/review/linux_hardware.html
http://bbs.phpchina.com/thread-61806-1-3.html
http://www.hiadmin.com/%E4%B9%A6%E8%AF%84-linux%E6%93%8D%E4%BD%9C%E7%B3%BB%E7%BB%9F%E4%B9%8B%E5%A5%A5%E7%A7%98/
http://www.anobii.com/books/Linux%E4%BD%9C%E6%A5%AD%E7%B3%BB%E7%B5%B1%E4%B9%8B%E5%A5%A7%E7%BE%A9/9789866761065/017b696892e1d8208b/
http://dango-akachan.appspot.com/?p=40002
http://www.seo-space.net/blog/129-Linux-Book-System-Directory.html
All my reference documents
"Power Management -- ACPI under Linux", published on the 悅知 website on 2007/09/12
Everyone knows about powering on and shutting down, but these one or two actions actually touch a surprisingly broad range of issues: how much power the CPU should draw, how far a shutdown should go, what happens after shutdown, and how the previous state is restored -- all of it falls under the ACPI specification. Take the simplest example: before Windows 95 and the Linux 2.4 kernel, pressing the system's power button cut power immediately, whereas from Windows 98 and the Linux 2.6 kernel onward the machine enters a proper shutdown phase that powers things off step by step. Where does the difference lie -- hardware or software? It lies in the whole power-management model being updated to the ACPI scheme.
For system-wide power management, ACPI divides power handling into four main global states, G0 through G3, introduced below:
1. G0: the normal "working" state. Once the user boots into the operating system, the machine is in this state. Even in this normal-power state, the user can still make finer adjustments, such as the per-device power-saving modes introduced shortly.
2. G1: the more commonly heard "sleeping" state, and the mode ordinary users are most likely to adjust from within the operating system. G1 is subdivided into four sub-states, S1 through S4:
S1: the most power-hungry of the sleep states; the "Standby" option sometimes used on notebooks works this way, although the better practice nowadays is to standardize on S3. The CPU caches remain powered but instruction execution stops. The CPU and memory keep receiving power, while other devices have no hard requirement and may be powered or not. S1 is generally only seen on older machines, since many newer boards either drop the G1/S1 capability entirely or let the user choose between S1 and S3 in the BIOS.
S2: so-called Deeper Sleep, in which even the CPU is powered down; very few machines support G1/S2.
S3: certainly the most familiar one, because it is what the "Standby" option in Windows actually is; its technical name is Suspend to RAM (STR). In this state only main memory is entitled to power. Note, however, that although most data is written back to memory, the hard disk's own buffer may not be flushed back to the disk in time, which can lose data; so when using S3 it is best to disable the disk's buffer and cache first to avoid this problem. Also, because ACPI is not a universal world standard but a feature pushed by a handful of companies, Linux does not support it completely: none of the S1, S2, or S3 states discussed so far is part of the default Linux feature set, and only S4, described next, has become standard in Fedora 7, RHEL 5, and SLES 10.
S4: commonly called hibernation (Hibernate); its technical name is Suspend to Disk (STD). In this stage all running data is written out to the hard disk, precisely so that power can be cut completely; the trade-off is that returning to the working state takes longer than from S3. Fedora 7 includes S4 among its shutdown options, making it the only G1 "sleep" mode usable under Linux.
3. G2: also called Soft Off or S5. With the machine fully shut down, a trickle of power is still supplied to devices with a "wake" capability, such as network cards, keyboards, or USB devices, so that when the machine needs to start, it can be switched on remotely or locally through the network or keyboard alone. There is one restriction: if, say, wake-on-LAN is configured and the user unplugs the network card after shutdown, the G2 state is broken and the machine drops into G3. Under Linux you can put a specific network card into a waiting state, so that after shutdown it listens for whether it should "wake" the machine.
Once that is configured, any computer on the same network segment can run the "wake" command, targeting the waiting card's MAC address, to "wake" the machine -- that is, to start the boot sequence.
4. G3: Mechanical Off, where the power draw is close to zero. This is in fact the ordinary shutdown every user knows; the machine enters G3 when a normal shutdown sequence completes. Why say "close to zero"? Because a small amount of power is still consumed waiting for the user to switch the machine on, so if you want to save electricity at home, pull the plug -- otherwise the computer is drawing power forever.
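The network "wake" described under G2 works by sending a so-called magic packet to the listening card. As a minimal sketch (the MAC address below is only a placeholder), the payload is simply six 0xFF bytes followed by the target MAC repeated 16 times:

```shell
# Build the hex payload of a Wake-on-LAN "magic packet":
# 6 bytes of 0xFF followed by the target MAC repeated 16 times.
wol_payload() {
    mac=$(echo "$1" | tr -d ':' | tr 'a-f' 'A-F')
    payload=FFFFFFFFFFFF
    i=0
    while [ "$i" -lt 16 ]; do
        payload="${payload}${mac}"
        i=$((i + 1))
    done
    printf '%s\n' "$payload"
}

wol_payload "00:a0:d1:e1:a5:3d"
```

A tool such as ether-wake, or any program that can broadcast these bytes as a UDP datagram on the segment, then delivers the packet to the waiting card.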
The global states just described cover whole-system power management; ACPI additionally defines power and performance states for the CPU itself, although so far Linux largely does not follow the ACPI scheme exactly.
In terms of power states, the CPU is divided into four states, C0 through C3, and devices likewise into four, D0 through D3. They all work much the same way: the smaller the number, the more power is consumed; the larger, the more is saved. Under Linux, though, you can see that these state definitions are not followed strictly -- parts of the functionality are implemented, but not exactly to the ACPI definition.
For CPU performance, ACPI defines 17 states called P-states, from P0 through P16. P0 gives the highest performance and P16 the lowest; correspondingly, P0 draws the most power and P16 the least. Under the Linux 2.6 kernel, the CPU performance states (P-states) that Linux derives from ACPI are only visible under the /sys directory. As the figure showed, although 17 states are defined, in practice only two or three speeds are used to track changes in CPU load -- which, in real operation, is enough.
Of course ACPI defines much more than this; there are many additional tables to consult, and it is a genuinely complex mechanism, especially since ACPI is not controlled purely by the operating system -- each state is reached only through constant negotiation between the OS and the BIOS. Still, for the energy-saving push now popular worldwide, ACPI is very practical, and for long-time notebook users in particular it makes the battery last longer. In some special situations, however, this kind of power saving is unsuitable and backfires (it can even waste more power); this is explained in the book 【Linux作業系統之奧義】, and the cause lies in differences in processing rhythm. Whether it suits a given system is something the user must evaluate carefully before deciding, but as a system administrator you arguably carry an extra share of responsibility to save the planet a little electricity.
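A quick way to see which of the G1 sub-states your own kernel actually enables is the sysfs interface; a sketch, assuming the 2.6 kernel's /sys layout ("mem" is typically S3 and "disk" is S4):

```shell
# List the ACPI sleep states the running kernel exposes; writing one
# of the listed names back into the file enters that state (root only).
cat /sys/power/state 2>/dev/null || echo "no ACPI sleep support"
# echo disk > /sys/power/state    # enter S4 / suspend-to-disk
```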
#hostname pxeserver
Create DHCP tables in /var/dhcp
#dhcpconfig -D -r SUNWfiles -p /var/dhcp
Start and Stop DHCP service
# dhcpconfig -S -d    <-- stop
# dhcpconfig -S -e    <-- start
PXE default settings
# dhtadm -A -m PXEClient:Arch:00000:UNDI:002001 -d ':BootSrvA=192.1.1.254:'
Create options
# dhtadm -A -s SrootIP4 -d 'Vendor=SUNW.i86pc,2,IP,1,1'
# dhtadm -A -s SrootNM -d 'Vendor=SUNW.i86pc,3,ASCII,1,0'
# dhtadm -A -s SrootPTH -d 'Vendor=SUNW.i86pc,4,ASCII,1,0'
# dhtadm -A -s SinstIP4 -d 'Vendor=SUNW.i86pc,10,IP,1,1'
# dhtadm -A -s SinstNM -d 'Vendor=SUNW.i86pc,11,ASCII,1,0'
# dhtadm -A -s SinstPTH -d 'Vendor=SUNW.i86pc,12,ASCII,1,0'
# dhtadm -A -s SsysidCF -d 'Vendor=SUNW.i86pc,13,ASCII,1,0'
# dhtadm -A -s SjumpsCF -d 'Vendor=SUNW.i86pc,14,ASCII,1,0'
# dhtadm -A -s SbootURI -d 'Vendor=SUNW.i86pc,16,ASCII,1,0'
Add a PXE client
# ./add_install_client -d -e "00:a0:d1:e1:a5:3d" \    <-- client's MAC address
# > -s 192.1.1.1:/export/home/sol10 \
# > i86pc
Create private macro for MAC
# dhtadm -A -m 0100A0D1E1173C -d ':SinstNM=192.1.1.254:SinstIP4=192.1.1.254:SinstPTH=/export/home/sol10_ga:SrootNM=client:SrootIP4=192.1.1.254:SrootPTH=/export/home/sol10_ga/Solaris_10/Tools/Boot:BootFile=nbp.0100A0D1E1173C:SbootURI=tftp\://192.1.1.254/0100A0D1E1173C:'
# dhtadm -A -m 192.1.1.0 -d ':Subnet=255.255.255.0:RDiscvyF=1:Broadcst=192.1.1.255:'
Create an IP address to match the macro
# pntadm -C 192.1.1.0
# pntadm -A 192.1.1.100 192.1.1.0
# pntadm -M 192.1.1.100 -m 0100A0D1E1173C 192.1.1.0
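For reference, the per-client macro name used above (0100A0D1E1173C) is just the DHCP client identifier: hardware type "01" followed by the MAC address in upper-case hex with the colons removed. A small sketch to derive it:

```shell
# Derive the Solaris DHCP per-client macro name from a MAC address:
# prefix "01" (Ethernet hardware type) + upper-case MAC without colons.
macro_name() {
    printf '01%s\n' "$(echo "$1" | tr -d ':' | tr 'a-f' 'A-F')"
}

macro_name "00:a0:d1:e1:17:3c"   # → 0100A0D1E1173C
```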
2. Install XP, then boot into the Linux system with the Linux DVD.
3. Run "dd if=/dev/hda1 of=/boot.img bs=512 count=1" to dump the bootloader from the MBR or boot sector into a single file named boot.img.
4. Copy the boot.img file onto the XP system's C: drive (saved here as c:\boot.lnx, so the name matches the entry below) and append one line to the end of c:\boot.ini as follows:
c:\boot.lnx="Linux system name"
5. Reboot and you will see the multi-boot menu; you should then be able to get into that Linux OS.
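One sanity check worth doing on the dumped file before adding it to boot.ini: a valid boot sector is 512 bytes and ends with the 0x55AA signature. A sketch (the file name follows step 3):

```shell
# Print the last two bytes of a dumped boot sector as hex;
# a valid boot sector ends with the 0x55AA signature.
check_bootsig() {
    tail -c 2 "$1" | od -An -tx1 | tr -d ' \n'
}

# e.g. [ "$(check_bootsig /boot.img)" = "55aa" ] && echo "signature OK"
```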
DHCP
1. Run /usr/sadm/admin/bin/dhcpmgr under the GUI
2. Finish configuring the DHCP service there
3. You can start and stop the service with this tool
NFS Image files
1. Create the /export/home/sol10 directory
2. Insert the first CD into the CD-ROM drive and go to /cdrom/cdrom0/s2/Solaris_10/Tools
3. Execute the following command:
# ./setup_install_server /export/home/sol10
4. After finishing CD1, swap in the remaining discs one by one (CD2, 3, 4) and go into the same directory
5. Execute the following command:
# ./add_to_install_server /export/home/sol10
NFS
1. Edit the /etc/dfs/dfstab file and add the following line:
share -F nfs -o ro,anon=0 -d "install server directory" /export/home/sol10
2. Run the "shareall" command
3. Type the "share" command to see if it worked.
4. Run "svcadm enable network/nfs/server"
5. Run "showmount -e" to check it, or check with "ps -ef | grep nfs".
6. If the daemons are not running, start them yourself as below:
/usr/lib/nfs/nfsd -a
/usr/lib/nfs/statd
/usr/lib/nfs/nfsmapid
/usr/lib/nfs/lockd
/usr/lib/nfs/mountd
7. It should work now. If not, call the police.....:)
PXE and DHCP
1. Get into /export/home/sol10/Solaris_10/Tools
2.
# ./add_install_client -d -e "00:a0:d1:e1:a5:3d" \    <-- client's MAC address
# > -s 192.1.1.1:/export/home/sol10 \
# > i86pc
$$$$ It prints some messages that are VERY IMPORTANT for the PXE function.
3.
Run the following commands to create the matching options for DHCP:
# dhtadm -A -s SrootIP4 -d 'Vendor=SUNW.i86pc,2,IP,1,1'
# dhtadm -A -s SrootNM -d 'Vendor=SUNW.i86pc,3,ASCII,1,0'
# dhtadm -A -s SrootPTH -d 'Vendor=SUNW.i86pc,4,ASCII,1,0'
# dhtadm -A -s SinstIP4 -d 'Vendor=SUNW.i86pc,10,IP,1,1'
# dhtadm -A -s SinstNM -d 'Vendor=SUNW.i86pc,11,ASCII,1,0'
# dhtadm -A -s SinstPTH -d 'Vendor=SUNW.i86pc,12,ASCII,1,0'
# dhtadm -A -s SsysidCF -d 'Vendor=SUNW.i86pc,13,ASCII,1,0'
# dhtadm -A -s SjumpsCF -d 'Vendor=SUNW.i86pc,14,ASCII,1,0'
# dhtadm -A -s SbootURI -d 'Vendor=SUNW.i86pc,16,ASCII,1,0'
4. Then use "/usr/sadm/admin/bin/dhcpmgr &" to start the GUI program and configure the following items.
5. Create the 2 macros as instructed (0100xxxxxx and PXEClientxxxxx).
6. Create the address for the macro.
7. DHCP with PXE is now configured.
8. You can start and stop the service with this tool.
Trouble Shooting
1. If the client cannot boot and the cause is the driver:
a. Modify path
/export/home/sol10/Solaris_10/Tools/Boot/boot/solaris/devicedb/master
Change the driver name from none to "bge.bef" or something like that.
b. Modify path
/export/home/sol10/Solaris_10/Tools/Boot/etc/driver_aliases
Add a line: bge "pci14e4,16a8"    <-- shown in your client's error message
2. If the CDs cannot be mounted, run the following command in CLI mode:
# svcadm enable smserver
3. The Solaris equivalent of the "lspci" command:
# prtconf -pv
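The driver_aliases entry above can be derived mechanically from the vendor:device pair reported by lspci (or by the client's error message); a sketch:

```shell
# Convert a PCI vendor:device pair (lspci style, e.g. "14e4:16a8")
# into the Solaris /etc/driver_aliases form ("pci14e4,16a8").
pci_alias() {
    printf 'pci%s\n' "$(echo "$1" | tr ':' ',')"
}

pci_alias "14e4:16a8"   # → pci14e4,16a8
```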
PXE Server configuration tips
chkconfig tftp on
chkconfig xinetd on
chkconfig nfs on
chkconfig dhcpd on
##Copy all DVD source into NFS folder
cp -a /mnt/* /var/ftp/fedora7
##Copy needed files into tftp folder
mkdir /tftpboot/linux-install/fedora7
cp /var/ftp/fedora7/images/pxeboot/vmlinuz /tftpboot/linux-install/fedora7
cp /var/ftp/fedora7/images/pxeboot/initrd.img /tftpboot/linux-install/fedora7
##Edit PXE config file
vi /tftpboot/linux-install/pxelinux.cfg/default
--------------------------------------------------------
default 1
timeout 2000
prompt 1
display msgs/boot.msg
label 1
kernel fedora7/vmlinuz
append initrd=fedora7/initrd.img ramdisk_size=65536
## Edit message config file
vi /tftpboot/linux-install/msgs/boot.msg
--------------------------------------------------------
1. Install Fedora Core 7
## NFS has to share /var/ftp/fedora7 folder
##DHCP config file notice
ddns-update-style none;
ignore client-updates;
allow booting;
allow bootp;
class "pxeclients" {
match if substring(option vendor-class-identifier, 0, 9) = "PXEClient";
next-server 192.10.0.1;
filename "linux-install/pxelinux.0";
}
subnet 192.10.0.0 netmask 255.255.255.0 {
range 192.10.0.150 192.10.0.180;
option broadcast-address 192.10.0.255;
option routers 192.10.0.1;
option subnet-mask 255.255.255.0;
}
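Before (re)starting dhcpd it is worth letting it parse the file first; assuming the stock ISC dhcpd and config path, the syntax check is:

```shell
# Ask ISC dhcpd to parse the configuration without starting the
# daemon; it exits non-zero and prints the offending line on error.
dhcpd -t -cf /etc/dhcpd.conf
```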
Boot System by redirect function via serial port
[Test steps]
1. Complete the Linux installation.
2. Change the BIOS setting Server/Serial Console Features/BIOS Redirection Port to [Serial port 1 or 2]
3. Power on System.
4. Edit the file /boot/grub/grub.conf and change the kernel arguments to redirect the Linux console to ttyS0 (or ttyS1), as follows:
"kernel /vmlinuz ro root=/dev/hda3 console=ttyS0,19200,vt100"
5. Edit the file /etc/inittab and add the following lines:
c0:2345:respawn:/sbin/agetty ttyS0 19200 vt100
c1:2345:respawn:/sbin/agetty ttyS1 19200 vt100
6. Modify /etc/securetty and add these entries (one per line) at the end of the file:
ttyS0
ttyS1
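The inittab entries in step 5 follow a fixed pattern; a small sketch that generates them for any id/port/speed (the values are the same ones used above):

```shell
# Emit an /etc/inittab respawn entry for a serial-console getty.
getty_line() {
    # $1 = inittab id field, $2 = serial device, $3 = baud rate
    printf '%s:2345:respawn:/sbin/agetty %s %s vt100\n' "$1" "$2" "$3"
}

getty_line c0 ttyS0 19200
getty_line c1 ttyS1 19200
```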
Nagios installation and configuration tips
### Environment###
Client IP address: 10.6.116.59
Node1 (Server) IP address:
10.6.116.64
192.1.1.1
Node2 IP address: 192.1.1.2
################
[ipvsadm section]
tar zxvf ipvsadm-1.24.tar.gz
cd ipvsadm-1.24
Modify Makefile and libipvs/Makefile, changing /usr/src/linux/include to /usr/src/kernels/2.6.15xxx/include
make;make install
ipvsadm -C
ipvsadm -A -t 10.6.116.64:80 -s rr
ipvsadm -a -t 10.6.116.64:80 -r 192.1.1.2:80 -w 1
ipvsadm -a -t 10.6.116.64:80 -r 192.1.1.3:80 -w 2
ipvsadm -A -t 10.6.116.64:23
ipvsadm -a -t 10.6.116.64:23 -r 192.1.1.2:23 -w 1
ipvsadm -a -t 10.6.116.64:23 -r 192.1.1.3:23 -w 2
ipvsadm
[ab section]
cd /usr/local/apache2/bin
./ab -n 100 http://node3/
[webmin section]
tar zxvf webmin-1.300.tar.gz
cd webmin-1.300
./setup.sh /usr/local/webmin
[nagios section]
tar zxvf nagios-1.0.tar.gz
adduser nagios
passwd nagios
./configure
make all;make install
make install-init
make install-config
Modify /usr/local/apache2/conf/httpd.conf:
######
## For Nagios Use Only##
######
Try it… http://localhost/nagios/
[nagiosplus section]
tar zxvf nagios-plugins-1.4.3.tar.gz
cd nagios-plugins-1.4.3/
./configure
make;make install
[nagios configuration]
cd /usr/local/nagios/etc
Copy every sample file to a .cfg file; the other cfg files serve as reference notes, and minimal.cfg is used as the default config
(cgi.cfg, minimal.cfg, nagios.cfg, checkcommands.cfg, misccommands.cfg, resource.cfg)
Only the minimal.cfg file needs to be modified, as follows
contact area
####
define contact{
        contact_name                    juergen
        alias                           Juergen Chiu
        service_notification_period     24x7
        host_notification_period        24x7
        service_notification_options    w,u,c,r
        host_notification_options       d,r
        service_notification_commands   notify-by-email
        host_notification_commands      host-notify-by-email
        email                           juergen@localhost.localdomain
        }
####
contactgroup area
####
define contactgroup{
        contactgroup_name       Cluster-Manager
        alias                   Cluster Administrators
        members                 juergen
        }
####
host area
####
define host{
        host_name               node2
        alias                   Cluster Server 2
        address                 192.1.1.2
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   120
        notification_period     24x7
        notification_options    d,r
        contact_groups          Cluster-Manager
        }
####
hostgroup area
####
define hostgroup{
        hostgroup_name  Juergen
        alias           Cluster Servers
        members         node2,node3
        }
####
service area
####
define service{
        host_name               node2
        service_description     HTTP
        is_volatile             0
        check_period            24x7
        max_check_attempts      3
        normal_check_interval   3
        retry_check_interval    1
        contact_groups          Cluster-Manager
        notification_interval   120
        notification_period     24x7
        notification_options    w,u,c,r
        check_command           check_http
        }
####
Modify the cgi.cfg file to disable authentication (use_authentication=0)
Comment out the checkcommands.cfg and misccommands.cfg lines in the nagios.cfg file
Add the following block to minimal.cfg to add the check_http function
####
define command{
        command_name    check_http
        command_line    $USER1$/check_http -H $HOSTADDRESS$
        }
####
[nagios startup section]
cd /usr/local/nagios/bin
Run "./nagios -v ../etc/nagios.cfg" to check the config status
./nagios ../etc/nagios.cfg
NetPIPE (Network Protocol Independent Performance Evaluator) installation tips with mpirun
Needed files
1. mpich.tar.gz (http://www-unix.mcs.anl.gov/mpi/mpich2/downloads/mpich2-1.0.tar.gz)
2. NetPIPE_3.6.2.tar.gz (http://www.scl.ameslab.gov/netpipe/code/NetPIPE_3.6.2.tar.gz)
Installation
tar zxvf mpich.tar.gz
tar zxvf NetPIPE.tar.gz
######### NetPIPE ##################
Get into the NetPIPE directory
In the makefile, point MP_Lite_home at the mpich directory
and mpicc at mpich/bin/mpicc
make mpi
*******************************************************
example command
Get into NetPIPE directory first.
mpirun -np 4 ./NPmpi
HPL (Linpack) installation and how to start testing
1. mpich.tar.gz (http://www-unix.mcs.anl.gov/mpi/mpich2/downloads/mpich2-1.0.tar.gz)
2. tvcpp0p8.tar.gz (http://www.vsipl.org/software/tvcpp0p8.tar.gz)
3. hpl.tar.gz (http://www.netlib.org/benchmark/hpl/hpl.tgz)
Installation
tar zxvf mpich.tar.gz
tar zxvf tvcpp0p8.tar.gz
tar zxvf hpl.tar.gz
######### tvcpp0p8 ##################
Get into tvcpp0p8 directory
make all
######### hpl ##################
Get into hpl directory
cp ./setup/Make.Linux_PII_VSIPL ./ (the VSIPL variant suits the tvcpp0p8 program)
Modify the Make.Linux_PII_VSIPL file:
1. TOPdir
2. MPdir
3. LAdir
4. LAlib --> libvsip_c.a to libvsip.a
make all arch=Linux_PII_VSIPL
*********************************************
example command
mpirun -np 4 xhpl (at least 4 processes)
*Run this command in X Window mode.
PMB utility howto (since renamed IMB by Intel)
1. mpich.tar.gz (http://www-unix.mcs.anl.gov/mpi/mpich2/downloads/mpich2-1.0.tar.gz)
2. PMB2.2.1.tar.gz (ftp://ftp.pallas.com/pub/PALLAS/PMB/PMB2.2.1.tar.gz)
Installation
tar zxvf mpich.tar.gz
tar zxvf PMB2.2.1.tar.gz
######### PMB ##################
Get into PMB2.2.1 directory
modify make_Linux file. MPI_HOME --> mpich directory
modify Makefile to enable "include make_Linux".
make (do not use parameter "all")
chmod 755 PMB2.2.1
chmod 755 SRC_PMB
*******************************************************
example command
Get into mpich/bin directory.
./mpirun -nolocal -np 4 PMB-MPI1 PingPong PingPing Sendrecv
MPICH2 brief installation tips and how to check it
Needed Packages
1. mpich.tar.gz (http://www-unix.mcs.anl.gov/mpi/mpich2/downloads/mpich2-1.0.tar.gz)
2. RSH
3. NIS (ypserv)
Needed Daemon
1. nfs
2. netfs
3. network
4. nfs (server)
5. rstatd (for RPC)
6. portmap
7. rsh
8. xinetd
9. ypserv (server)
10. yppasswdd (server)
11. ypbind (client) <-- chkconfig ypbind on
Installation
tar zxvf mpich2.tar.gz
######### HOSTS ##################
The /etc/hosts file on every client should contain entries for all nodes, or it will fail.
######### mpich ##################
Log in as another (non-root) user and use these tools.
Get into the mpich directory
./configure --prefix=/home/hpcuser/mpich2    <-- not the same as the source folder
make; make all; make install
add "export PATH=/home/hpcuser/mpich2/bin/:$PATH" to ~/.bash_profile
######### mpd ##################
cd $HOME
touch .mpd.conf
chmod 600 .mpd.conf
add "secretword=111111" into this file
# mpd &    --> start the mpd daemon for the mpiexec command
You can test by "mpiexec -n 1 /bin/hostname" to see if you can run mpi2.
Or use "mpdallexit" to exit the mpd daemon.
######### RSH ##################
It is used on the client OS.
Enable the rsh daemon via the xinetd service
Modify the PAM rules under /etc/pam.d/rsh
**remove the "pam_rhosts_auth" option
DON'T USE RSH AS ROOT!!!!!! (permission denied)
######### yp (NIS server) ##################
common settings
add "NISDOMAIN=hpcdomain" to /etc/sysconfig/network file
modify the passwd entry in /etc/nsswitch.conf to "nis files"
Server
add following items to /etc/ypserv.conf
127.0.0.0/255.255.255.0 :* :* :none
192.1.0.0/255.255.255.0 :* :* :none
add "/usr/lib/yp/ypinit -m" to /etc/rc.d/rc.local
add "/home (rw,sync)" to /etc/exports and do "/etc/rc.d/init.d/nfs reload"
add "/usr/local/src (rw,sync)" to /etc/exports and do "/etc/rc.d/init.d/nfs reload"
Client
add "domain pxe server 192.1.0.254" to /etc/yp.conf
add "/usr/lib/yp/ypinit -s pxe" to /etc/rc.d/rc.local
add "192.168.0.1:/home" to /etc/fstab
add "192.168.0.1:/usr/local/src" to /etc/fstab
add the mpi user's GID to /etc/group
*******************************************************
example command
CPU stress
~/mpich/bin/mpirun -np 4 ~/mpich/mpe/contrib/life/life_g    <-- "make" first, and type 100,1000 after enter
mpirun -np 2 ~/mpich/mpe/contrib/mandel/pmandel    <-- X Window demo, "make" first (see the README file)
mpirun -np 4 PMB-MPI1 PingPong PingPing Sendrecv
********* Demo Used **********
cd ~/mpich/mpe
./configure --disable-checkMPI --disable-slog2
make all
~/mpich/examples/basic
make
mpirun -np 4 ./cpilog    --> You will see where all the processes are.
~/mpich/examples/perftest
./configure
make all
MPICH installation and setup tips on the Linux platform
Needed Packages
1. mpich.tar.gz (http://www-unix.mcs.anl.gov/mpi/mpich2/downloads/mpich2-1.0.tar.gz)
2. RSH
3. NIS (ypserv)
Needed Daemon
1. nfs
2. netfs
3. network
4. nfs (server)
5. rstatd (for RPC)
6. portmap
7. rsh
8. xinetd
9. ypserv (server)
10. yppasswdd (server)
11. ypbind (client) <-- chkconfig ypbind on
Installation
tar zxvf mpich.tar.gz
######### HOSTS ##################
The /etc/hosts file on every client should contain entries for all nodes, or it will fail.
######### mpich ##################
Log in as another (non-root) user and use these tools.
Get into the mpich directory
./configure
make
modify ($MPI_HOME)/util/machines/machines.LINUX to suit your clients' hostnames or FQDNs
(these hostnames should be included in the /etc/hosts file)
add "export PATH=/home/hpcuser/mpich/bin/:$PATH" to ~/.bash_profile
######### RSH ##################
It is used on the client OS.
Enable the rsh daemon via the xinetd service
Modify the PAM rules under /etc/pam.d/rsh
**remove the "pam_rhosts_auth" option
DON'T USE RSH AS ROOT!!!!!! (permission denied)
######### yp (NIS server) ##################
common settings
add "NISDOMAIN=hpcdomain" to the /etc/sysconfig/network file
modify the passwd entry in /etc/nsswitch.conf to "nis files"
Server
add the following items to /etc/ypserv.conf
127.0.0.0/255.255.255.0 :* :* :none
192.1.0.0/255.255.255.0 :* :* :none
add "/usr/lib/yp/ypinit -m" to /etc/rc.d/rc.local
add "/home (rw,sync)" to /etc/exports and do "/etc/rc.d/init.d/nfs reload"
add "/usr/local/src (rw,sync)" to /etc/exports and do "/etc/rc.d/init.d/nfs reload"
Client
add "domain pxe server 192.1.0.254" to /etc/yp.conf
add "/usr/lib/yp/ypinit -s pxe" to /etc/rc.d/rc.local
add "192.168.0.1:/home" to /etc/fstab
add "192.168.0.1:/usr/local/src" to /etc/fstab
add the mpi user's GID to /etc/group
*******************************************************
example command
CPU stress
~/mpich/bin/mpirun -np 4 ~/mpich/mpe/contrib/life/life_g    <-- "make" first, and type 100,1000 after enter
mpirun -np 2 ~/mpich/mpe/contrib/mandel/pmandel    <-- X Window demo, "make" first (see the README file)
mpirun -np 4 PMB-MPI1 PingPong PingPing Sendrecv
********* Demo Used **********
cd ~/mpich/mpe
./configure --disable-checkMPI --disable-slog2
make all
~/mpich/examples/basic
make
mpirun -np 4 ./cpilog    --> You will see where all the processes are.
~/mpich/examples/perftest
./configure
make all
Cluster manager tool usage and samba HA howto
>service add
Service name: clusamba
Preferred member [None]: node2
Relocate when the preferred member joins the cluster (yes/no/?) [no]: yes
User script (e.g., /usr/foo/script or None) [None]:
Status check interval [0]: 90
Do you want to add an IP address to the service (yes/no/?) [no]: yes
IP Address Information
IP address: 10.1.1.254
Netmask (e.g. 255.255.255.0 or None) [None]:
Broadcast (e.g. X.Y.Z.255 or None) [None]:
Do you want to (a)dd, (m)odify, (d)elete or (s)how an IP address, or
are you (f)inished adding IP addresses [f]:
Do you want to add a disk device to the service (yes/no/?) [no]: yes
Disk Device Information
Device special file (e.g., /dev/sdb4): /dev/sdb1
Filesystem type (e.g., ext2, ext3 or None): ext2
Mount point (e.g., /usr/mnt/service1) [None]: /mnt
Mount options (e.g., rw,nosuid,sync): rw,nosuid,sync
Forced unmount support (yes/no/?) [yes]:
Would you like to allow NFS access to this filesystem (yes/no/?)\
[no]: no
Would you like to share to Windows clients (yes/no/?) [no]: yes
You will now be prompted for the Samba configuration:
Samba share name: clushare
The samba config file /etc/samba/smb.conf.clushare does not exist.
Would you like a default config file created (yes/no/?) [no]: yes
Successfully created /etc/samba/smb.conf.clushare.
Please remember to make necessary customizations and then copy the file
over to the other cluster member.
Do you want to (a)dd, (m)odify, (d)elete or (s)how DEVICES, or
are you (f)inished adding DEVICES [f]: f
name: clusamba
preferred node: node2
relocate: yes
user script: None
monitor interval: 90
IP address 0: 10.1.1.254
netmask 0: None
broadcast 0: None
device 0: /dev/sdb1
mount point, device 0: /mnt
mount fstype, device 0: ext2
mount options, device 0: rw,nosuid,sync
force unmount, device 0: yes
samba share, device 0: clushare
Add clusamba service as shown? (yes/no/?) yes
Test procedure:
You can reboot or shut down one of the two nodes and observe the smb daemon on the other node.
#smbclient -L 10.1.1.254
You should see a share named "clushare" whose comment is "High Availability Samba Service"; then you are done.
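To script the check instead of eyeballing it, the smbclient listing can be grepped for the share name; the sample line used in testing this sketch is only an illustration of the listing format:

```shell
# Succeed if the share listing piped in on stdin contains the named
# share in its first column (smbclient -L prints one share per line).
has_share() {
    grep -q "^[[:space:]]*$1[[:space:]]"
}

# usage: smbclient -L 10.1.1.254 -N | has_share clushare && echo "HA samba share is up"
```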
Reference site:
http://www.redhat.com/docs/manuals/enterprise/RHEL-AS-2.1-Manual/cluster-manager/s1-service-samba.html
Use iscsi driver to integrate NAS storage in Linux system
OS: RedHat AS2.1 u5
Kernel: 2.4.9-e.49smp
rpm file: clumanager-1.0.27-1
node1 --> 10.1.1.101
node2 --> 10.1.1.102
Service IP --> 10.1.1.254 (Public IP)
################################
#### iSCSI settings with NAS Storage #########
Install NAS with linux-iscsi-
uncompress the file and go into the iscsi directory
make
make install
Do the next 3 steps
#vi /etc/iscsi.conf
----------------------------------------------------------------------
DiscoveryAddress=10.1.1.1:3260 --> your NAS's IP address and port number
----------------------------------------------------------------------
#vi /etc/initiatorname.iscsi
----------------------------------------------------------------------------
InitiatorName=host1 --> your NAS's share host name
---------------------------------------------------------------------------
#/etc/init.d/iscsi start
You should find your NAS share disk listed in /proc/partitions as /dev/sda
##########################################
####### raw devices settings ##########
Use fdisk on /dev/sda to create /dev/sda1 and /dev/sda2
***Don't use mke2fs or anything else to format these 2 partitions.***
Modify /etc/sysconfig/rawdevices file to add:
/dev/raw/raw1 /dev/sda1
/dev/raw/raw2 /dev/sda2
/etc/rc.d/init.d/rawdevices start
Your 2 raw devices are now ready to use.
##########################################
########## Cluster HA settings #########
You can use related software such as the clu* tools to manage cluster members.
*** The sample settings below are from node1's point of view; on node2 they are reversed.
#cluconfig
Enter cluster name: ha1
Enter IP address for cluster alias: 10.1.1.254 --> This is your service IP, NOT a real NIC IP.
---- The next step is to configure Member 0 --> yourself ----
Enter name of cluster member: node1
Enter number of heartbeat channels: 1 --> We only use 1 NIC to connect to the other node
Channel type: net
Enter hostname of the cluster member on heartbeat channel 0: node1
---- The next step is to configure your raw devices ----
Enter Primary Quorum Partition: /dev/raw/raw1 --> as listed in /etc/sysconfig/rawdevices
Enter Shadow Quorum Partition: /dev/raw/raw2 --> as listed in /etc/sysconfig/rawdevices
Power switch: NONE
---- The next step is to configure Member 1 --> node2 ----
Enter name of cluster member: node2
Enter hostname of the cluster member on heartbeat channel 0: node2
---- The next step is to configure its raw devices ----
Enter Primary Quorum Partition: /dev/raw/raw1 --> as listed in its /etc/sysconfig/rawdevices
Enter Shadow Quorum Partition: /dev/raw/raw2 --> as listed in its /etc/sysconfig/rawdevices
Power switch: NONE
*** It will save your configuration in /etc/cluster.conf
Remember: if you want to remove the cluster rpm package, delete /etc/cluster.conf after removing it.
########################################
Now you can use ifconfig to show your NIC information.
It should have added a virtual IP alias eth0:0 with address 10.1.1.254 --> your service IP lives on eth0:0.
Test procedure
1. Shut down node2 directly (power off).
2. Use ifconfig on node1 to observe whether it adds an eth0:0 alias directly.
3. It should take less than 30 seconds; in my testing it took 15 seconds.
Reference site: