The adventure began when my sister gave me her notebook PC, telling me that she could not finish booting into Windows 10. The notebook was an ASUS FX553VD (GL553VD motherboard) with 8 GB of DDR4 RAM, an Intel i7-7700HQ CPU, and an NVIDIA GTX 1050.
She was right: the PC shut down every time during the boot process. I planned to take out the SSD and the HDD and try to boot from live Linux USBs.
I tried many distributions (Ubuntu, Debian, Slackware, Fedora, Solus, etc.). The ending was always the same: after a couple of ACPI-related messages, the PC shut down. First, I updated the BIOS to version 308. Nothing changed. I then suspected the BIOS chip, the ACPI tables, the TPM module, and the RAM module. I decided to boot the live Linux USBs with kernel boot parameters. The very first successful boot was with the "nolapic" parameter, but it ended with only one working CPU core, and "nolapic" was not a parameter I wanted to keep. I tried other parameters; "acpi=off" also worked, again with only one active core. I preferred to continue with "acpi=off".
I moved an SSD from another PC of mine into this notebook and added "acpi=off" to the GRUB configuration. Yes, I had the ASUS FX553VD running, but with only one core and without any ACPI features. My priority was to get more cores active. I tried "acpi=ht" instead of "acpi=off", which disables ACPI except for what is needed to enable hyperthreading and the extra CPUs, but the boot again ended with a shutdown. Since "acpi=ht" brings the other CPUs back, this might have been my first clue, but I missed it.
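Adding a parameter such as "acpi=off" to GRUB usually means editing the GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub and then regenerating the boot menu. The Python sketch below shows that edit as a pure text transformation. The file path and variable name are the usual Debian/Ubuntu conventions, assumed here rather than copied from my setup, and `add_kernel_param` is a hypothetical helper.

```python
import re

# Assumes the Debian/Ubuntu layout of /etc/default/grub, where the extra
# kernel parameters live in GRUB_CMDLINE_LINUX_DEFAULT="...".
_CMDLINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_param(grub_text, param):
    """Return grub_text with `param` appended to GRUB_CMDLINE_LINUX_DEFAULT.

    The parameter is added only once. If the variable is missing, a new
    line defining it is appended at the end of the text.
    """
    def _append(match):
        params = match.group(2).split()
        if param not in params:
            params.append(param)
        return match.group(1) + " ".join(params) + match.group(3)

    new_text, count = _CMDLINE_RE.subn(_append, grub_text, count=1)
    if count == 0:
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f'GRUB_CMDLINE_LINUX_DEFAULT="{param}"\n'
    return new_text


if __name__ == "__main__":
    sample = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'
    # Prints the sample with "acpi=off" appended after "quiet splash".
    print(add_kernel_param(sample, "acpi=off"), end="")
```

After writing the result back to /etc/default/grub as root, running `sudo update-grub` on Debian or Ubuntu regenerates the menu so the parameter is used on the next boot.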
I also tried BSD derivatives. The recent FreeBSD versions (the currently supported releases) shut down during boot. The NetBSD 10 release candidates could boot, but they crashed when opening a browser, among other things.
This week (mid-March 2024), I came across a Debian-based distro with Openbox and the latest kernel, and wanted to give it a try. It was "Dr.Parted". It also shut down during boot. A "failsafe boot" option was available, so I tried it, and the PC was running again. I dug into the kernel boot parameters in the "failsafe boot" entry. They were:
"components"
"noeject"
"memtest"
"noapic"
"noapm"
"nodma"
"nomce"
"nolapic"
"nomodeset"
"nosmp"
"nosplash"
"vga=normal"
I figured out that the successful boot came from the "nosmp" parameter, in addition to the parameters I had already found.
The "nosmp" parameter disables symmetric multiprocessing. This made me think seriously about the CPU, maybe for the first time; until then I had not really suspected a CPU failure.
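Finding which failsafe parameter really mattered is a small search problem. One way to do it, shown here as a hypothetical sketch rather than exactly what I did by hand, is to drop the parameters one at a time and keep only those whose removal breaks the boot. In the Python below, `boots` stands in for a manual reboot test, and the fake test in the comments is invented for illustration.

```python
from typing import Callable, List

# Kernel parameters from the "failsafe boot" entry listed above.
FAILSAFE_PARAMS = [
    "components", "noeject", "memtest", "noapic", "noapm", "nodma",
    "nomce", "nolapic", "nomodeset", "nosmp", "nosplash", "vga=normal",
]


def minimize_params(params: List[str], boots: Callable[[List[str]], bool]) -> List[str]:
    """Greedily drop parameters that are not needed for a successful boot.

    `boots(candidate)` is a stand-in for rebooting with exactly `candidate`
    on the kernel command line and reporting whether the boot completed.
    Each parameter is tried once; if the machine still boots without it,
    it is dropped for good. Assuming that adding parameters never breaks a
    boot that already works, removing any single remaining parameter from
    the result makes the boot fail.
    """
    if not boots(params):
        raise ValueError("the starting parameter set does not boot")
    kept = list(params)
    for p in list(kept):
        trial = [q for q in kept if q != p]
        if boots(trial):
            kept = trial
    return kept


# Example with an invented oracle: "boots" iff "nosmp" or "nolapic" is present.
# minimize_params(FAILSAFE_PARAMS, lambda ps: "nosmp" in ps or "nolapic" in ps)
# returns ["nosmp"].
```

Because both "nosmp" and "nolapic" rescued the boot on this machine, the order of the list decides which one survives the greedy pass: with the failsafe order, "nolapic" is tried first and dropped, so "nosmp" is kept.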
I wanted to boot with "maxcpus=1" instead of "nosmp" so that I could gradually increase the number of active cores.
"maxcpus=1" worked, as expected.
"maxcpus=2" also succeeded; at least I then had two cores.
"maxcpus=3" failed (shutdown during boot). The third core was failing. What about the fourth core?
So my plan was now to boot with two cores and bring the remaining cores online from inside the OS.
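Stepping "maxcpus" up one value at a time is a linear search for the first failing CPU. The Python sketch below captures that loop; `boots_with` is a hypothetical stand-in for rebooting with "maxcpus=n" and watching whether the boot completes. It assumes the kernel brings CPUs up in logical order (cpu 0, cpu 1, ...), so once one value fails, every larger value would fail too.

```python
from typing import Callable


def last_good_maxcpus(boots_with: Callable[[int], bool], total_cpus: int) -> int:
    """Try maxcpus=1, 2, ... and return the highest value that still boots.

    `boots_with(n)` is a stand-in for rebooting with "maxcpus=n" on the
    kernel command line. Assumes CPUs are brought up in logical order, so
    the search stops at the first failure. Returns 0 if even maxcpus=1
    fails.
    """
    good = 0
    for n in range(1, total_cpus + 1):
        if not boots_with(n):
            break
        good = n
    return good


# Example with an invented oracle matching this laptop's behaviour:
# last_good_maxcpus(lambda n: n <= 2, 8) returns 2.
```

A binary search would need fewer reboots on machines with many CPUs, but with at most 8 logical CPUs here the linear walk is simpler and shows exactly where things break.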
I booted with two cores, which were listed as "cpu 0" and "cpu 1". When I tried to bring "cpu 2" online, the OS crashed (the third core, as I had found earlier). Another try with "cpu 3" in addition to "cpu 0" and "cpu 1" also ended in a crash. Yes, the fourth core was failing, too. Bringing "cpu 6" or "cpu 7" online also caused crashes (these are the hyperthreads of the third and fourth cores). I brought "cpu 4" and "cpu 5" online (the hyperthreads of the first two cores). The system then showed 2 cores and 4 threads.
At last, my search ended with finding the defective part: the CPU.
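Bringing a CPU online or offline from inside Linux goes through sysfs: writing 1 or 0 to /sys/devices/system/cpu/cpuN/online, while /sys/devices/system/cpu/online lists the CPUs that are currently up. The Python sketch below wraps those files. The helper names are hypothetical, and `sibling_threads` assumes the numbering observed on this laptop (cpu 4 and cpu 5 are the hyperthreads of cpu 0 and cpu 1, i.e. sibling = cpu + 4), which other machines may not follow.

```python
from pathlib import Path
from typing import List

SYSFS_CPU = Path("/sys/devices/system/cpu")


def parse_cpu_list(text: str) -> List[int]:
    """Parse the kernel's CPU list format, e.g. "0-1,4-5" -> [0, 1, 4, 5]."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def sibling_threads(good_cores: List[int], physical_cores: int) -> List[int]:
    """Hyperthread siblings of `good_cores`.

    ASSUMPTION: sibling = cpu + physical_cores, as observed on this
    i7-7700HQ (cpu 4 pairs with cpu 0, cpu 5 with cpu 1). Not guaranteed
    on every system.
    """
    return [cpu + physical_cores for cpu in good_cores]


def set_online(cpu: int, online: bool, root: Path = SYSFS_CPU) -> None:
    """Write "1" or "0" to <root>/cpu<N>/online (requires root on a real system)."""
    (root / f"cpu{cpu}" / "online").write_text("1" if online else "0")


def online_cpus(root: Path = SYSFS_CPU) -> List[int]:
    """Return the CPUs the kernel currently reports as online."""
    return parse_cpu_list((root / "online").read_text())
```

On this laptop, after booting with "maxcpus=2", running `for cpu in sibling_threads([0, 1], 4): set_online(cpu, True)` as root would correspond to what I did by hand, and `online_cpus()` should then report [0, 1, 4, 5].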
I can run Ubuntu now, with 2 cores and 4 threads instead of 4 cores and 8 threads, which is still better than running on a single core. The other positive point is that I have all the ACPI features in this setup.
What I learned is to suspect the CPU too, even though CPU failures are reported very rarely.