We need the ability to configure the Node Isolation Script through iManager via the Cluster Plugin.

It would be great if this could be added to the node properties, along with a function to select multiple nodes at once.

(In general, all nodes should have the same isolation script, so allowing multiple selection would reduce the risk of administrator mistakes.)



Here is the discussion with Changju so far:

Thanks, Martin. I will let you know when the information is available online.

Catherine

>>> Martin Weiss 3/31/2010 3:07 AM >>>
Hi Changju, hi Catherine,

I love this functionality!

I configured

echo o > /proc/sysrq-trigger

on the NCS:Node Isolation Script attribute of all cluster node objects - and removed the "panic" in that attribute.

Now - when NCS kills a node - it gets powered off - which is the best way to fence a node!

I am adding Catherine to this thread - so that she can add this to the documentation (and remove http://www.novell.com/documentation/oes2/clus_admin_lx/data/buqvr91.html) - with this solution - it is NOT required to modify the ldncs, anymore!

@Catherine: one important point we have to mention for this feature is that multiple commands can be used in this attribute - but they must be separated by semicolons instead of CR/LF.

I had some strange effects with "special characters", too:

Working:
echo o > /proc/sysrq-trigger
or
echo c > /proc/sysrq-trigger

Not working:
echo "o" > /proc/sysrq-trigger

Currently it is required to use the eDirectory "Modify Object" Plugin to modify this attribute.

@Changju
- what would be required to get this into the iManager Plugin? Should I create a FATE entry for this?
- is it required to have a cluster down, node reboot or cluster..py -init to activate a modified setting?

Thanks,
Martin

>>> Changju Gao wrote on Tuesday, March 30, 2010 at 16:46:
> Hi Martin,
>
> We tested with emergency reboot (echo b > /proc/sysrq-trigger).
>
> We run the script as a command. If you have multiple commands, you need to
> make them one command and separate them with ";".
>
> Personally, I am still searching for the best way to trigger a kernel panic from
> user space.
>
> If you like, you can enter a bug to ask NCS to provide a way to trigger
> kernel panic for you.
>
> Best regards,
>
> Changju
>
> >>> Martin Weiss 3/30/2010 8:00 AM >>>
> Hi Changju,
>
> I gave different versions a try - but none of my scripts work.
>
> Currently I have only this in the attribute:
>
> echo o > /proc/sysrq-trigger
>
> --> if I use this on the bash prompt - this switches the power off
>
> But as part of the NCS:Node Isolation Script attribute - this does nothing.
>
> I also tried
>
> echo c > /proc/sysrq-trigger
>
> which did not do anything.
>
> With which script in this attribute did you test the node isolation?
>
> Thanks,
> Martin
>
> >>> Changju Gao wrote on Monday, March 29, 2010 at 18:51:
>> Hi Martin,
>>
>> What you need is a command/tool to trigger kernel panic. The following (from
>> the kdump document, lines 362-383,
>> http://www.mjmwired.net/kernel/Documentation/kdump/#362 ) explains ways to do it.
>>
>> Kernel Panic
>> ============
>>
>> After successfully loading the dump-capture kernel as previously
>> described, the system will reboot into the dump-capture kernel if a
>> system crash is triggered. Trigger points are located in panic(),
>> die(), die_nmi() and in the sysrq handler (ALT-SysRq-c).
>>
>> The following conditions will execute a crash trigger point:
>>
>> If a hard lockup is detected and "NMI watchdog" is configured, the system
>> will boot into the dump-capture kernel ( die_nmi() ).
>>
>> If die() is called, and it happens to be a thread with pid 0 or 1, or die()
>> is called inside interrupt context or die() is called and panic_on_oops is set,
>> the system will boot into the dump-capture kernel.
>>
>> On powerpc systems when a soft-reset is generated, die() is called by all cpus
>> and the system will boot into the dump-capture kernel.
>>
>> For testing purposes, you can trigger a crash by using "ALT-SysRq-c",
>> "echo c > /proc/sysrq-trigger" or write a module to force the panic.
>>
>> Regards,
>>
>> Changju
>>
>> >>> Martin Weiss 3/29/2010 6:58 AM >>>
>> Hi Changju,
>>
>> today I configured this script on all cluster nodes:
>>
>> # Stop instead of reboot
>> echo -n 0 > /proc/sys/kernel/panic
>> # Deactivate Core-Dump writing
>> /sbin/kexec -p -u
>> panic
>>
>> Do you think this is the right way?
>> What will happen with the "panic" at the end?
>> (could not find the script "panic" anywhere)
>>
>> Should we send this to Catherine to get it into the documentation at
>> http://www.novell.com/documentation/oes2/clus_admin_lx/data/buqvr91.html ?
>>
>> Should we create an RFE to allow this to be configured by the NCS iManager
>> Plugin (in the node-configuration)?
>>
>> Thanks,
>> Martin
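
Since the thread states that the attribute is run as a single command and that multiple commands must be separated by semicolons instead of CR/LF, a small helper could prepare the value before it is pasted into the attribute. The following is only a sketch under assumptions: the function name join_isolation_script is hypothetical and not part of NCS or iManager, and the three input commands are taken from different messages in the thread and combined purely for illustration. The script only builds and prints the string; it does not write to /proc.

```shell
#!/bin/sh
# Hypothetical helper (not part of NCS): turn a multi-line script into a
# single line whose commands are separated by semicolons.
join_isolation_script() {
    # Trim each line, drop blank lines and comment lines, then join the
    # remaining commands with "; ".
    sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' -e '/^#/d' |
        awk 'NR > 1 { printf "; " } { printf "%s", $0 } END { if (NR) print "" }'
}

# Illustrative input only: commands mentioned in the thread, one per line.
joined=$(printf '%s\n' \
    '# Stop instead of reboot' \
    'echo -n 0 > /proc/sys/kernel/panic' \
    '# Deactivate Core-Dump writing' \
    '/sbin/kexec -p -u' \
    'echo o > /proc/sysrq-trigger' | join_isolation_script)

# The thread reports that a quoted argument (echo "o") did not work in the
# attribute, so flag double quotes for manual review.
case $joined in
    *\"*) echo 'warning: value contains double quotes' >&2 ;;
esac

printf '%s\n' "$joined"
```

The printed line is what would then be entered as the attribute value, currently through the eDirectory "Modify Object" Plugin as described above. Whether a modified value needs a cluster down or node reboot to take effect is still an open question in the thread.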

Comments