This is something that tends to happen at the worst of times, in the most awkward situations, and of course when you need the password the most. But the truth is that root passwords get lost in production environments every now and then, so you had better know what to do when the moment arrives.
This is based on my personal experience from a few days ago.
A domain inside a Sun 25K had its IP address taken by another server, which in turn caused all of its IP-related services to go down. Yup, including ssh.
Luckily, the 25K has a System Controller (SC), from which you can reach the system console of any of its domains. What I had forgotten at the time was that, even though I was logged in to the SC, I still needed the domain's root password to log in on its console.
After a few failed login attempts, I decided it was time to recover the lost password. This is what I did:
Send a STOP-A (break) to the domain to drop it to the OBP (OpenBoot PROM) prompt. If, as in my case, you reached the console through an ssh session to another server, you can send the break with ~~#. (ATTENTION: typing ~# instead of ~~# may cause the first server you're connected to to go down, so check carefully how you're connected before doing this.)
Once at the OBP, I simply booted the server, hoping I would be lucky enough that it would come up on its own, reclaim the IP address it should have and start the IP-related services, so that I could then log in through ssh as usual and change the password.
Needless to say, I wasn't that lucky. The server required a forced fsck (to fix the errors caused by the abrupt reboot) to be run in single-user mode, which you cannot enter without the root password.
I went back to the OBP. At this point I realized I needed to boot the server from a CD or from the network in order to blank out the root password. I was at home at the time, so a network boot sounded like the best option. Since this is a 25K we are talking about, it already had a JumpStart server configured (there is plenty of documentation online on how to set one up), but I had to check a few things just to be certain:
Make sure the MAC address of the domain you're trying to fix is in the /etc/ethers file on the SC, mapped to the domain's hostname, and that this hostname resolves to the domain's internal IP address.
Check the /etc/bootparams file on the SC for the correct path to the installation image you have available (the Solaris version of the image doesn't matter much here, as long as it can boot the server and mount its filesystems).
Verify that the path to the installation image is correctly shared through NFS on the SC.
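The first two checks can be scripted with a couple of small helpers run on the SC. This is a minimal sketch, assuming the standard ethers(4) layout (MAC address, then hostname) and bootparams(4) entries that each sit on a single line with a root=server:path key. The function names are hypothetical, not part of any Solaris tool. On Solaris, set AWK=nawk, since the old /usr/bin/awk may not accept -v. For the third check, running share with no arguments on the SC lists what is currently exported over NFS.

```shell
#!/bin/sh
# Sketch: helpers for double-checking the SC's netboot config before "boot net -s".
# Hypothetical helper names. On Solaris use a POSIX awk, e.g. AWK=nawk.
AWK=${AWK:-awk}

# ethers_mac FILE HOST
# Print the MAC address that FILE (ethers(4) format: "MAC hostname") maps to
# HOST; return non-zero if there is no such entry. Comment lines are skipped.
ethers_mac() {
  "$AWK" -v h="$2" '$1 !~ /^#/ && $2 == h { print $1; found = 1 }
                    END { exit !found }' "$1"
}

# bootparams_root FILE CLIENT
# Print the server:path given as root= in CLIENT's entry of FILE
# (bootparams(4) format). Assumes each entry sits on a single line.
bootparams_root() {
  "$AWK" -v c="$2" '$1 == c {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^root=/) { sub(/^root=/, "", $i); print $i }
  }' "$1"
}
```

For a domain with the hypothetical hostname dom-a, ethers_mac /etc/ethers dom-a should print its MAC address, and the path printed by bootparams_root /etc/bootparams dom-a (after the colon) should sit under one of the directories that share reports.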
Once everything has been checked, you can net-boot the troubled server. Just run boot net -s from the server's OBP and wait; it will probably take around 10 to 20 minutes to boot up to a prompt.
Now that you have a root prompt on the server, all that's left is to blank root's password so you can log in and service the server. When I first tried to mount the root filesystem, I got a message asking me to fsck it first, so that's what I did. As this particular server had an SVM (Solaris Volume Manager) mirror, I ran fsck on the mirror disk as well (don't forget this, otherwise you'll keep getting errors).
When the fsck runs had finished, I mounted the root disk on the /a directory (a mount point available for this purpose in single-user mode) and then simply edited /a/etc/shadow to erase root's encrypted password. I saved the file, unmounted /a, and then mounted the mirror disk on /a. Again I blanked root's password, saved the shadow file and unmounted the filesystem.
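The disk work described above can be sketched as a pair of shell functions, assuming you are at the single-user shell from boot net -s and /a is free. The slice names c0t0d0s0 and c1t0d0s0 are hypothetical placeholders for the root disk and its SVM submirror; substitute your own. Solaris /usr/bin/sed does not support -i, so the edit goes through a temporary file that is then copied back over the shadow file (copying onto an existing file keeps its owner and mode), and a .bak copy of the original is left next to it.

```shell
#!/bin/sh
# Sketch: blank root's password on a root filesystem mounted at /a.
# Function names and slice names are hypothetical.

# blank_root_pw SHADOW
# Empty the second (encrypted password) field of root's line in SHADOW.
# A backup is left in SHADOW.bak; other lines are left untouched.
blank_root_pw() {
  shadow=$1
  cp -p "$shadow" "$shadow.bak" || return 1
  sed 's/^root:[^:]*:/root::/' "$shadow.bak" > "$shadow.tmp" || return 1
  # cp onto the existing file keeps its owner and mode (unlike mv).
  cp "$shadow.tmp" "$shadow" && rm -f "$shadow.tmp"
}

# fix_disk SLICE
# fsck one slice, mount it on /a, blank root's password, then unmount.
# Run it once for the root disk and once for the mirror disk.
fix_disk() {
  fsck -y "/dev/rdsk/$1"   # read fsck's output; its exit status is not checked here
  mount -F ufs "/dev/dsk/$1" /a || return 1
  blank_root_pw /a/etc/shadow
  umount /a
}

# Example (hypothetical slice names):
#   fix_disk c0t0d0s0    # root disk
#   fix_disk c1t0d0s0    # mirror disk
```

Editing both sides of the mirror matters because the submirrors are mounted directly, outside SVM, so a change made to one of them is not copied to the other.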
shutdown -g0 -y -i6 (this reboots the server; this time it will boot from its root disk)
This time it should boot without problems. When it does, just press Enter at the root password prompt and run passwd root to set a new password for root.
r4pp157 9:25 am on October 26, 2009