Frequent shutdowns and BSODs on a usually great PC

fbnoise

Getting the hang of it
Joined
Dec 29, 2014
Messages
270
Reaction score
61
I run Blue Iris as a service on a dedicated i7-4770K (4th gen, Haswell) with 16GB DDR3 RAM. 8 assorted cams, 24/7 recording and motion detection, Direct to Disk (each cam) and HW acceleration turned on. There have been some hiccups since 2013, but it's been pretty good for years - it usually hums along at 20% CPU.

A couple of weeks ago the PC started hanging or shutting down randomly and frequently. I did a CLEAN install of Windows with the media creation tool and made sure everything was up to date: Windows 10, drivers, the latest Blue Iris, and the latest ASUS MB BIOS. None of that fixed it.

Event Viewer doesn't tell me much. Here's a dump indicating something HW related; my Google searches on it haven't helped:

crash dump file: C:\Windows\Minidump\060818-8609-02.dmp
This was probably caused by the following module: hal.dll (hal+0x3F520)
Bugcheck code: 0x124 (0x0, 0xFFFFCE05FAF66028, 0xBF800000, 0x124)
Error: WHEA_UNCORRECTABLE_ERROR
file path: C:\Windows\system32\hal.dll
product: Microsoft® Windows® Operating System
company: Microsoft Corporation
description: Hardware Abstraction Layer DLL
Bug check description: This bug check indicates that a fatal hardware error has occurred. This bug check uses the error data that is provided by the Windows Hardware Error Architecture (WHEA).
This is likely to be caused by a hardware problem.
The crash took place in the Windows kernel. Possibly this problem is caused by another driver that cannot be identified at this time.
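The four bug check parameters in the dump can be unpacked a bit further. The sketch below assumes the parameter meanings Microsoft documents for bug check 0x124 (parameter 1 is the error source, parameter 2 the address of the WHEA error record, parameters 3 and 4 the high and low halves of the MCi_STATUS register) and the IA32_MCi_STATUS bit layout and cache-hierarchy error code pattern from the Intel SDM. The function and table names are made up for illustration, and only the cache-hierarchy pattern is decoded.

```python
# Decode the bug check 0x124 parameters from the dump above.
# Assumptions: parameter meanings per Microsoft's bug check 0x124 page,
# status bits and error-code pattern per the Intel SDM machine-check chapter.
# All names below are illustrative, not part of any real tool.

ERROR_SOURCES = {0x0: "machine check exception"}  # only the value seen here

STATUS_FLAGS = {
    63: "VAL (register contents valid)",
    62: "OVER (an earlier error was overwritten)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MISC register valid)",
    58: "ADDRV (ADDR register valid)",
    57: "PCC (processor context corrupt)",
}

REQUESTS = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
            5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
TRANSACTIONS = {0: "instruction", 1: "data", 2: "generic"}
LEVELS = {0: "L0", 1: "L1", 2: "L2", 3: "generic"}


def describe_mca_code(code):
    """Decode only the cache-hierarchy pattern 000F 0001 RRRR TTLL."""
    if code & 0xEF00 == 0x0100:
        request = REQUESTS.get((code >> 4) & 0xF, "unknown")
        transaction = TRANSACTIONS.get((code >> 2) & 0x3, "reserved")
        level = LEVELS[code & 0x3]
        return f"cache hierarchy error: {request} request, {transaction}, level {level}"
    return f"not decoded by this sketch (0x{code:04X})"


def decode_bugcheck_124(p1, p2, p3, p4):
    """Combine parameters 3 and 4 into MCi_STATUS and pick out its fields."""
    status = (p3 << 32) | p4
    flags = [name for bit, name in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "source": ERROR_SOURCES.get(p1, f"other (0x{p1:X})"),
        "error_record_address": p2,
        "mci_status": status,
        "flags": flags,
        "mca_error_code": status & 0xFFFF,
        "model_specific_code": (status >> 16) & 0xFFFF,
    }


if __name__ == "__main__":
    info = decode_bugcheck_124(0x0, 0xFFFFCE05FAF66028, 0xBF800000, 0x124)
    print("source:    ", info["source"])
    print("MCi_STATUS:", hex(info["mci_status"]))
    for flag in info["flags"]:
        print("  set:", flag)
    print("MCA code:  ", describe_mca_code(info["mca_error_code"]))
```

If that reading of the layout is right, the UC and PCC bits are set and the error code falls in the cache-hierarchy family, which fits the "fatal hardware error" description but does not by itself say whether the CPU, the board, or power delivery is at fault.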

  • I wondered if it was Intel's GPU failing so I installed an Nvidia card I had on hand and used that. Same problems.
  • No CPU spikes that I have noticed, but I turned OFF HW acceleration just in case based on some recent memory leak posts here (I know those are for later gen processors but wanted to rule that out). No help.
  • Observed CPU and other temperatures across board - seem normal.
Something significant I did learn is that the PC runs just fine in Safe Mode (with networking), so that's what I've resorted to in order to keep my cameras up. That seems to indicate some service is screwing things up.

Any ideas?
 

SantiagoDraco

Getting the hang of it
Joined
Dec 8, 2017
Messages
130
Reaction score
51
Have you checked the operating temperature of the CPU? Cleaned the CPU heatsink, fans and vents? Quite often random lockups, reboots and BSODs are the result of overheating due to dirty cooling systems.

There are a number of tools out there that will help you monitor those temps. Speedfan, CPU-Z and CoreTemp to name a few.
 

fbnoise

Temps are fine, but just in case, I took the PC outside and blew it out. I cleaned the CPU fan and re-seated it with fresh thermal paste. The computer still restarts very soon after I log into Windows.

None of this happens when I run in safe mode w/ networking, and Blue Iris even runs perfectly. Does that indicate a service could be causing problems?
 

fenderman

Staff member
Joined
Mar 9, 2014
Messages
36,902
Reaction score
21,274
this might be helpful
Blue Screen 0x00000124 and HAL.dll error
 

SantiagoDraco

When you run in safe mode you run with a significantly reduced set of drivers, including no 3D hardware, just a standard VGA frame buffer. This means that in safe mode you reduce the workload on the hardware. If overheating is the problem, or you have a failing hardware component, safe mode might work fine because you are reducing the hardware components being utilized by the system.

A few questions:
- You said you reinstalled the OS... when you did so how much more than the OS did you install?
- Are you using integrated graphics or a secondary video card? I assume integrated on the 4770.
- Are you overclocking?
- Have you tried resetting the BIOS to defaults (assuming you have tweaked settings such as memory or CPU timings)?
- If you have add-on cards, have you tried removing them?
- Have you reseated your memory modules and, if you have multiple (i.e. dual channel with 4 modules), tried swapping out modules?

There are also USB-based tools you can use to help with some diagnostics, but ultimately, if you have failing hardware, the only real way to know is to swap in known good hardware and see if the problem goes away. For example, you could try to get a low-cost compatible CPU and see if the system runs with it.

Four years of 24x7 operation at 20% CPU utilization, especially if it may have been running "hot", could lead to some degradation of components. It's possible.

And how dirty was it when you cleaned it out?
 

fbnoise

Ok, good points about safe mode.

- I only installed BI Tools, Timesync, and Jump Desktop (for remote connection), plus Blue Iris of course
- I was using the integrated graphics on the Intel chip. Currently trying a separate video card with no improvement in stability
- Not overclocking. Updated the BIOS to the latest, running defaults
- No add-on cards. I temporarily tried an Intel Gigabit adapter card, but when that didn't fix it I removed it
- A normal amount of dust came out. Not a lot of visible buildup on the fans etc., as I cleaned it a year ago. The PC is in a well-ventilated room in the basement.

I haven't tried re-seating ram or running with just 1 stick and swapping. Will try that ASAP

I did just chat with ASUS. The MB is still under warranty until Feb 2019, but they didn't have any replacements in stock atm and want me to send it in for repair (7-10 day turnaround, not including shipping). I'm highly skeptical that going that route would fix it.
 

SantiagoDraco

Adding a separate video card most likely won't help if the problem is the Intel chip. If the problem was due to the GPU sections of the card while BI is in use, then you'd have to disable hardware acceleration in BI to test.

Does the system run if you just boot to Windows and do not run BI (including not allowing the service to start)?

Before sending in the MB you should try swapping out the memory modules and CPU if possible.
 

fbnoise

Good points. I did disable HW acceleration with the new graphics card put in. No luck

I also swapped the ram since my last post with no luck

The PC runs fine until I open BI or run anything taxing, such as the Intel GPU test I tried to run.
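One way to check whether plain CPU load triggers the crash without Blue Iris running at all is a small load generator. This is only a sketch, not a substitute for a proper stress tool: it assumes Python 3 is installed, the `stress` and `_worker` names are invented here, and it relies on `hashlib` releasing the GIL for large buffers so that the threads can keep several cores busy. Each thread hashes the same buffer over and over and counts any digest that differs from the reference, since a differing digest would point to a computation or memory error.

```python
# Minimal CPU load generator with a consistency check.
# Assumption: hashing a multi-megabyte buffer releases the GIL, so one
# thread per logical core produces load on all cores.
import hashlib
import os
import threading
import time


def _worker(stop_at, payload, expected, results, index):
    """Hash the payload until the deadline and count wrong digests."""
    iterations = 0
    mismatches = 0
    while True:
        if hashlib.sha256(payload).digest() != expected:
            mismatches += 1
        iterations += 1
        if time.monotonic() >= stop_at:
            break
    results[index] = (iterations, mismatches)


def stress(seconds, threads=None):
    """Load `threads` cores for `seconds`; return (iterations, mismatches)."""
    threads = threads or os.cpu_count() or 1
    payload = os.urandom(4 * 1024 * 1024)          # 4 MiB of random data
    expected = hashlib.sha256(payload).digest()    # reference digest
    stop_at = time.monotonic() + seconds
    results = [(0, 0)] * threads
    pool = [
        threading.Thread(target=_worker,
                         args=(stop_at, payload, expected, results, i))
        for i in range(threads)
    ]
    for thread in pool:
        thread.start()
    for thread in pool:
        thread.join()
    return (sum(r[0] for r in results), sum(r[1] for r in results))


if __name__ == "__main__":
    # Short default run; raise the duration for a real test.
    total, bad = stress(seconds=5)
    print(f"{total} hashes computed, {bad} mismatches")
```

If the machine blue-screens during a longer run of this with Blue Iris stopped, that points away from BI and its service and toward hardware under load. A clean run proves little, because this exercises only integer work and a small slice of memory.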

I don’t have a spare processor to test with. Looking up a replacement they seem pretty high $ for being so old. I do have a feeling it’s the processor at this point.

Thanks a lot for your help so far
 

fenderman

It's highly unlikely it's the processor... try a different Intel driver. Also, failing power supplies cause weird stuff to happen.
 

Q™

IPCT Contributor
Joined
Feb 16, 2015
Messages
4,990
Reaction score
3,991
Location
Megatroplis, USA
Bad memory module.

Oops... just read you swapped out the memory.

Never mind.
 

SantiagoDraco

If it's failing when you "tax it" it is absolutely possible it is the CPU, or the heatsink isn't seated properly the second time around.

If you have an additional PC around, you could swap out the power supply to rule that out.

If you rebuilt the OS, and the problem recurred without a full driver update, then it's unlikely it's related to drivers.

Just one of those shitty situations that aren't easy to diagnose but as I stated before it's most likely hardware not software especially considering you reinstalled the OS. Also, if you can, try those modules in different slots, if the MB supports it. Sometimes the slots can go bad as well and cause a problem.

In any case good luck :)
 

fbnoise

No luck so far, but I really appreciate the ideas. Here's what I tried:

I put RAM in other 2 slots
Installed a 2016 Intel driver (and removed separate card)
Tested with a spare power supply

All 3 ideas resulted in same BSOD when I do anything substantial (especially when I repair/regenerate the DB in Blue Iris). Also observed temps and got nothing over 70C (CPU) during those tests.

Unless there are any other ideas, I'll initiate a warranty claim. I'm HIGHLY doubtful they'll actually repair it, and I'm surprised repairing motherboards is even a thing. I wish they could give me a new one, but they don't have any.

I'll update if/when they give me a "repaired" MB. Really appreciate the help!
 

SantiagoDraco

Good luck!

The thing with temps is that if a component is already failing, it won't take high temps to make it fail, so your temps may be "ok" now but it could already be too late. There are other ways to test for failing components, such as "freezing" them during operation to force the temps to stay very low and see if they stop failing, but that's difficult to do on things like CPUs, which have large heat sinks.

If the MB is the problem it will usually fail at boot unless it's a core chipset component such as the Northbridge where it would fail more often under load, like the CPU.
 

fbnoise

@fenderman I wish that were the case but I swapped it with another and had same issues

@SantiagoDraco sounds like your bet is the CPU. I hope the MB repair fixes it, but I have my doubts as well

I've never had hardware fail on me other than hard drives, so this is a first for me
 

SantiagoDraco

Well, I'm actually not saying it IS the CPU, just that it could be. That's why I am careful to say "components". Most likely it is either the motherboard or the CPU in this case, since you said you swapped out the RAM modules for different ones (I believe?)

If you can't get your hands on another CPU to test then I'd probably just send the MB in under warranty and cross your fingers.
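The elimination done so far in the thread can be written down as simple bookkeeping. This is just a sketch: the component labels are informal, and "cleared" only means "the crash still happened with that part swapped or bypassed", which makes the part less likely rather than impossible.

```python
# Swap-and-retest results as reported in this thread.
# Labels are informal; a cleared part is less likely, not ruled out.

SUSPECTS = [
    "OS / drivers",
    "integrated graphics",
    "RAM modules",
    "RAM slots",
    "power supply",
    "cooling",
    "CPU",
    "motherboard",
]

# (what was tried, which suspect it clears); every test still crashed.
RESULTS = [
    ("clean Windows 10 install, later a 2016 Intel driver", "OS / drivers"),
    ("separate Nvidia card with HW acceleration off", "integrated graphics"),
    ("RAM swapped", "RAM modules"),
    ("RAM moved to the other two slots", "RAM slots"),
    ("spare power supply", "power supply"),
    ("cleaned, fresh thermal paste, CPU under 70C", "cooling"),
]


def remaining_suspects(suspects, results):
    """Return the suspects that no test has cleared, in original order."""
    cleared = {part for _, part in results}
    return [part for part in suspects if part not in cleared]


if __name__ == "__main__":
    for tried, part in RESULTS:
        print(f"still crashed after: {tried} -> {part} less likely")
    print("not yet tested:", ", ".join(remaining_suspects(SUSPECTS, RESULTS)))
```

That leaves the CPU and the motherboard as the two parts nobody has been able to swap, which is why the warranty claim on the board is the cheapest next test.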
 