3990X & Blue Iris thoughts?

Update 2- Amazon, in their infinite wisdom, decided to ship the replacement mobo in only the mobo box, with nothing securing the flap. They literally just slapped a UPS label on the outside of the box with an untaped flap and sent it on its way. It is evident that the board fell out of the box by the internal packaging damage, and the board itself is clearly bent. It is well known that UPS isn't necessarily gentle with packages, but this is completely on Amazon, as they didn't exercise due care in making sure it was shipped safely. Third time is the charm?

20200604_195842.jpg
 
@mrc545 Yeah I don't normally look at motherboards from that angle but I would think the back plate should be parallel with the board.

The first one I got was completely parallel with the IO panel and with the mobo backplate. This one looks like a subtle taco from both sides. With the sheer number of packages UPS is throwing in their trucks right now, I can only imagine what kind of hits this thing took, especially if it fell out of the box, which is easy to do, as the flaps don't lock into anything and the mobo is just resting underneath the top cover.
 
Small update-

-Got the new board in and got everything put together. Ran into some initial headaches with having to flash the BIOS to recognize my memory, and the primary BIOS being corrupted out of the gate. The feature where you can plug in a USB in the dedicated port in the back and push the button to flash while powered down was a lifesaver, since it wouldn't POST without recognizing the memory. The backup BIOS works fine, and I don't plan on updating it again if I don't have to, so I don't think I'll return this board. Not a great experience with Gigabyte boards on this chipset, that's for sure.

-The PCIe Gen 4 NVMe drive that I'm using as my boot drive is blazing fast. It benches at 5000 MB/s seq read, and 4200 MB/s seq write.

-Memory bandwidth is around 98 GB/s per Aida64. 64 GB quad channel DDR4-3600 CL16 via XMP.

-Current draw during CPU benchmarks/stress tests maxed out around 5.5 amps per my meter. The GPUs weren't being worked simultaneously with the CPU, so it could go higher under real-world usage.

-I played around with Precision Boost Overdrive and Core Performance Boost, and decided to disable them until I see how actual performance with BI is. Having PBO and CPB enabled netted me a 13% increase in performance, but thermals reached well over 85C, with a max of 91C. With PBO and CPB disabled, I didn't see anything higher than 62C. The Noctua cooler does well enough.

I still have more OS and supporting application configuration to do, but once I get BI up and running on it, I'll chime back in.

**Edit: Using the Memory Bandwidth benchmark tool on Sandra gave me 78 GB/s aggregate.
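For context on the two bandwidth figures above, here is a rough sketch of the theoretical peak for quad-channel DDR4-3600. It assumes the standard 64-bit (8-byte) data bus per DDR4 channel; the function name is made up for illustration, and the measured values are simply the AIDA64 and Sandra numbers from this post.

```python
def peak_bandwidth_gbs(mt_per_s: int, channels: int, bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes).

    mt_per_s  : transfer rate in MT/s (3600 for DDR4-3600)
    channels  : number of populated memory channels
    bus_bytes : bytes moved per transfer per channel (assumed 8 for DDR4)
    """
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


if __name__ == "__main__":
    peak = peak_bandwidth_gbs(3600, 4)
    # Measured results quoted in the post above.
    for tool, measured in (("AIDA64", 98.0), ("Sandra", 78.0)):
        print(f"{tool}: {measured:.0f} GB/s is {measured / peak:.0%} "
              f"of the {peak:.1f} GB/s theoretical peak")
```

Neither benchmark is expected to reach the theoretical number, and the two tools measure in different ways, so the gap between them is not necessarily a sign of a problem.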
 
Nice! I just got the first part of my RAM. I went with the CORSAIR Vengeance RGB Pro 128GB (4 x 32GB) 288-pin DDR4-3600 (PC4-28800) desktop memory kit, model CMW128GX4M4D3600C18, from Newegg.com.

I have 128 GB in hand and want to add another 128 GB, bringing it to 256 GB.
View attachment 63474

View attachment 63475

Next up: my cooling loop and NVMe drive. For the video card, I will probably just get whatever for now until the 3080s. :D

Can't wait to start building mine. :D

Very nice. I'm kind of doing the same with my memory, but on a smaller scale. I got the 4x16GB kit, which allows for quad channel, but still gives me the option of expansion in the future. Don't think I'll ever need that much memory though. Depends on what else I'm using the PC for aside from BI.

What NVMe drive are you looking at? I ended up going with the Sabrent Rocket, as it seemed like a great bang for the buck. The speeds definitely don't lie.
 
Nice! 4 x 16 GB is more than enough for anything, I think. I am just going for overkill at this point, lol. Not that I'm going to be able to push anywhere close to 256 GB. We'll see. Anyhow, I am designing the system to last me a long time.

As for the NVMe, I have not even started looking into which one is good. It sounds like you have done some research in that area, so I might consider the Sabrent Rocket drive if it has great speeds. :D Thank you! My mobo is the TRX40 AORUS XTREME by Gigabyte.
 
So far in my testing, you will hit the built-in software decoder limit between 1400 and 2000 MP/s on AMD (even with high-core, high memory channel configs) long before you hit other limits like memory bandwidth. That "wall" may be higher if you have faster cores (Ryzen 9 3950X > Threadripper 3960X > EPYC 7302P) and faster RAM. While I need to test that theory, @bp2008 hitting a wall closer to 1900 MP/s using his 3950X with only 2 fast memory channels while I'm stuck at 1450 MP/s on my 7302P with octa-channel memory could be an indication that memory bandwidth is not the problem. EDIT 06/13/2020: Turns out my "wall" was just a BIOS setting I had set up wrong that subdivided the CPU and RAM into distinct nodes, so I am back in business!

CPU usage on my EPYC 16-core starts climbing rapidly and inexplicably after hitting this "wall", and MP/s actually starts FALLING as you add more video streams. I believe that on the Intel systems you are, in effect, just "offloading" some of the decoding work onto a graphics processor that happens to be built into the CPU. I don't see any stated driver limits, so it is probably just limited by processor performance. With these high-end AMD builds, that offload won't be an option without a graphics card (more power, cost and heat).

I sent an email to Blue Iris support and got this back:
Yes it will be challenging to use so many high-MP cameras without hardware decoding, but I can see in your image you are not using dual-streaming. This was added in version 5.2.7 and will greatly reduce the demand on your CPU, please see this in the CPU Management topic in Help.
Thanks
Ken

Nothing unexpected, the recommended approach is the new sub-stream feature that has been recently released.
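To put the MP/s figures in this thread in perspective, here is a rough sketch of how the decode load adds up, assuming MP/s means megapixels per frame times frames per second, summed over all cameras. The camera count, resolutions and frame rates are hypothetical, and the sub-stream case assumes only the sub stream is decoded continuously, which is a simplification of how dual-streaming behaves.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    width: int
    height: int
    fps: float

    @property
    def mp_per_s(self) -> float:
        """Megapixels that must be decoded per second for this stream."""
        return self.width * self.height * self.fps / 1e6


def total_mp_per_s(stream: Stream, cameras: int) -> float:
    """Total decode load for `cameras` identical streams."""
    return cameras * stream.mp_per_s


# Hypothetical setup: 16 cameras, 8 MP main stream and a 640x480 sub stream.
main = Stream(3840, 2160, 15)
sub = Stream(640, 480, 15)
cameras = 16

if __name__ == "__main__":
    print(f"main streams only: {total_mp_per_s(main, cameras):.0f} MP/s")
    print(f"sub streams only:  {total_mp_per_s(sub, cameras):.0f} MP/s")
```

With these made-up numbers the main streams alone land near the top of the 1400 to 2000 MP/s range discussed above, while the sub streams are a small fraction of that, which is why support points to dual-streaming first.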

Before I move to substreams, which is obviously the solution everyone should be using once the bugs are worked out, I will be testing more extreme scenarios:
  1. testing with Quadro FX 580 (I happen to have) - as a basic test whether headroom changes with a 2009 GPU that does not support NVENC natively.
  2. testing with a GTX970 I already have - as an example of a Maxwell Gen2 mid-field graphics card, how much headroom will the limited 3 NVENC threads add to system headroom. The 3 NVENC thread limit is imposed on all consumer graphics cards, even if you put 2x or 3x of them in your system it appears. (see: Video Encode and Decode GPU Support Matrix )
  3. testing with a Quadro M4000 8GB - as an example of a Maxwell Gen2 mid-field creator card, how much headroom will the unlimited NVENC threads (limited only by memory) add to system headroom. Theoretically an 8GB card might handle 50 streams give or take, but going to test.
I believe I have discovered (after doing a ton of research into the problem) that the issue we might be facing is that consumer cards (even the really awesome ones like the 2080 Ti) are limited in the number of NVENC threads available within the NVIDIA driver. Once those threads are fully loaded, I suspect you will hit another wall where adding another camera to the GPU just sends GPU and CPU usage soaring as they try to stuff more video frames into the limited pipe. This might also explain why an old Xeon system with 3x P1000 Quadro cards is listed in Blue Iris stats with over 5000 MP/s but even the low-end Quadro cards have the 3-thread limit imposed by the driver, so possibly that system is an example of 9 NVENC threads on Pascal.

Best guess, Nvidia does this to provide product differentiation between the PRO cards (used by creators, CAD etc) and CONSUMER cards (used to game), and charge a premium for cards to do more in those spaces, even though it's primarily limited at the driver level.
 
@crw030 - Thank you for the great write-up! The sub-stream feature will definitely free up a lot of CPU overhead to run other stuff on this build. In my case, the extra horsepower isn't all for naught, as I need it for the overhead that RDP creates. My 9900k system actually did pretty well when I was logged in locally, but with the persistent RDP session, CPU usage about doubled.

I think I've run into something similar to the GPU issue you mentioned before on my previous build, but the behavior was different. The GPU would just "give up" on decoding during high load and drop down to less than 10% utilization, which would send the CPU usage skyrocketing, and the BI Tools watchdog would kick in and restart BI. I have 2x RTX 2060's that I'm going to be using for this setup, so we'll see how it goes.
 
I have 2x RTX 2060's that I'm going to be using

A good test you could perform would be how many camera MP/s a single 2060 can offload using NvidiaCuda acceleration as compared to two of them. Because if performance has nothing to do with NVENC and just the amount of CUDA cores (and speed of those cores) you can throw at the problem, then a (2 x 2060) system would perform approximately the same as a 2080TI at half the price.
 
Below is the last benchmark I did for the Sabrent Rocket 2TB. She's pretty quick.


20200609_132322.jpg
 
I noticed in the past when using the single RTX 2060, the CUDA graph in the Task Manager view would only be around 30-40% when the Video Decode graph was maxed out. How do those play with each other, as far as maxing out the CUDA acceleration goes? I'm admittedly not very sharp on the technical workings of GPUs. But that's why there are smarter people on here than me :D
 
Very nice. I'm kind of doing the same with my memory, but on a smaller scale. I got the 4x16GB kit, which allows for quad channel, but still gives me the option of expansion in the future. Don't think I'll ever need that much memory though. Depends on what else I'm using the PC for aside from BI.

Lots of memory can be hard to use. I built this VM server for the small business I work for, and it is currently running 4 Windows VMs and 8 Linux and even that isn't quite using 64 GB.

1591728453989.png
 
@crw030
Your results so far do sound very perplexing.


I think you mean NVDEC??

This might also explain why an old Xeon system with 3x P1000 Quadro cards is listed in Blue Iris stats with over 5000 MP/s but even the low-end Quadro cards have the 3-thread limit imposed by the driver, so possibly that system is an example of 9 NVENC threads on Pascal.

I'd say it is more likely that system just has a lot of cloned cameras. The helper has never been able to tell the difference between clones and discrete cameras.
 
Lots of memory can be hard to use. I built this VM server for the small business I work for, and it is currently running 4 Windows VMs and 8 Linux and even that isn't quite using 64 GB.

View attachment 63491
This is why I'm such a huge proponent of virtualization. Imagine, in the old days you would have needed 12 physical boxes :). Think of all the cost savings in terms of power, space and cooling, and the reduction in our carbon footprint.
 
This is my home cluster. I have about 25 VMs running. Some are for home-related things (pfSense, Plex, BI, etc.). Most are for testing and learning new things. There are 5 servers total, and this screenshot is from when all five are on. I usually have 3 running 24/7 (my low-power servers, because I have to pay the electric bill). The 4th one is an enterprise server, which is a power hog, so I only enable it when I have to do major testing. The 5th one is my gaming desktop, which has dual 2080 Supers and is also a power hog because the GPUs consume about 200W each. Yes, I verified with a Kill-A-Watt meter: the server (Intel 9900X) takes about 100W with no games running. When you start a game with one GPU inside, it jumps to 300W. With both GPUs in, it jumps to 500W.

I would not want to run BI on my 5th box because running 24/7 with the GPUs is going to cost a lot. I'm not saying that BI is going to kick in both GPUs, but if it did, it would draw 500W 24/7. At $0.12 per kWh, that will cost you over $500/year just for power. You guys are talking about dual 2060s, which will also crank up the watts.
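The arithmetic behind that yearly figure is easy to rerun for your own rate. This is a minimal sketch that assumes a constant draw around the clock; the wattages are the Kill-A-Watt readings from the post above, and the function name is just for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def annual_cost_usd(watts: float, usd_per_kwh: float,
                    hours: float = HOURS_PER_YEAR) -> float:
    """Electricity cost of a constant load running for `hours` hours."""
    kwh = watts / 1000 * hours
    return kwh * usd_per_kwh


if __name__ == "__main__":
    rate = 0.12  # USD per kWh, as quoted in the post
    for label, watts in (("no games running", 100),
                         ("one GPU loaded", 300),
                         ("both GPUs loaded", 500)):
        print(f"{label}: {watts} W costs about "
              f"${annual_cost_usd(watts, rate):.2f} per year")
```

Real usage will be lower than the worst case, since BI is unlikely to keep both GPUs fully loaded all day.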

With substreams, you may not need that much horsepower for BI. My current utilization is at 2-3% for 4 cameras with the substreams.

Screen Shot 2020-06-10 at 10.35.55 PM.png
 
In the old days I just would have run all the services on one Windows OS :)
Sadly, some people don't think that way. They just seem to have dedicated boxes for different functions. I have one customer who is against virtualization, so she sets up individual machines. This is the government and our tax dollars hard at work.
 
VMs are great. We support a lot of companies that have one host with a lot of VMs spinning off it. The only downside of VMs is that if the host goes down, so do all the VMs. But then again, the host has a lot of redundancy, like RAID, dual power supplies, a UPS and so on.
 
If it's architected right, you get a lot more resiliency in a virtual environment, with things like VM fault tolerance or VM HA. There doesn't have to be a single point of failure.
 