4.7.8 - August 28, 2018 - Support for the Nvidia CUDA hardware decoding

fenderman

Staff member
This is huge for those folks who want to run large numbers of high-res cameras.

4.7.8 - August 28, 2018


  • Support for the Nvidia CUDA hardware decoding drivers to offload video decoding to the GPU. Manage your hardware decoding on the Options/Cameras page as well as on each individual camera's properties Video tab. With a compatible video card you can use its GPU to decode H.264, H.265, MPEG4 and MJPEG. Use the Windows Task Manager to monitor CPU and GPU performance for BlueIris.exe.
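
If you'd rather log those numbers from a script than eyeball Task Manager, here is a minimal sketch using Python's psutil package. The process name BlueIris.exe comes from the release note above; the sampling interval and everything else are just one way to do it:

```python
# Minimal watcher for BlueIris.exe, as an alternative to Task Manager.
# Requires "pip install psutil". The process name comes from the release
# note; adjust it if your install differs.
import psutil

def find_blueiris():
    """Return the first running process named BlueIris.exe, or None."""
    for proc in psutil.process_iter(["name"]):
        if (proc.info["name"] or "").lower() == "blueiris.exe":
            return proc
    return None

proc = find_blueiris()
if proc is None:
    print("BlueIris.exe is not running")
else:
    while True:
        # cpu_percent sums across cores; divide by the core count to
        # match the whole-machine percentage Task Manager shows.
        cpu = proc.cpu_percent(interval=5) / psutil.cpu_count()
        mem_mb = proc.memory_info().rss / 2**20
        print(f"CPU {cpu:5.1f}%   memory {mem_mb:7.0f} MB")
```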
 
Oooooh, I was just wondering what to do with the rest of the day!

Update 5: Here is a table of GPU decode compatibility showing which cards support which formats: NVDEC Support Matrix. All of the 1xxx series cards support H.265, but if you "open the complete tables" you will see that most of the previous-generation 7xx and 9xx series cards do not. There's also a table for NVENC on that page; Blue Iris has been able to use NVENC to encode video since version 4.7.9.

Update 1:

Here are some preliminary test results.

This is an i7-8700K system with an Nvidia GT 1030 GPU.

HW Acceleration Mode | Blue Iris CPU Usage | Blue Iris Memory Usage
Intel                | 14%                 | 5646 MB
Nvidia CUDA          | 12%                 | 5837 MB

That doesn't look so bad until you discover that only 11 cameras were actually being hardware accelerated, while 10 others were broken. Two of my cameras are configured not to use hardware acceleration at all.



Update 2:

I configured Performance Monitor to graph various CPU, memory, and GPU usages, and then began toggling the new Nvidia CUDA hardware acceleration to analyze the results.

Again, this is an i7-8700K with Nvidia GT 1030 GPU (the one with GDDR5 memory, not the gimpy one with DDR4).

[Performance Monitor screenshot]


The red lines are performance counters from the Intel GPU. Blue and light blue lines are from the Nvidia GPU. Yellow is CPU usage, and the pink line along the top is Blue Iris's private working set (memory usage), scaled such that "50" on the Y-axis is 5 GB.

I began with all cameras configured to use Intel decoding except for the two 2MP cams tagged in the screenshot from Update 1, which had hardware acceleration explicitly disabled. I then proceeded to make changes:

Change 1) 8 MP @ 10 FPS, Intel -> Nvidia CUDA
Change 2) 4 MP @ 15 FPS, Intel -> Nvidia CUDA
Change 3) 4 MP @ 15 FPS, Intel -> Nvidia CUDA
Change 4) 2 MP @ 15 FPS, Intel -> Nvidia CUDA
Change 5) 2 MP @ 15 FPS, Intel -> Nvidia CUDA
Change 6) 2 MP @ 15 FPS, No acceleration -> Nvidia CUDA
Change 7) 2 MP @ 10 FPS, No acceleration -> Nvidia CUDA
Change 8) Revert changes 6 and 7

Since then, the graphs have held steady.



Update 3:

I decided to try pushing the limits again, and continued switching cameras over from Intel to Nvidia.

After 11 cameras, new attempts to enable CUDA acceleration resulted in the message "Signal: HW accelerated decoder not found". There is not a hard limit on the number of cameras, though. I disabled the most expensive camera I currently had on CUDA (8MP @ 10 FPS) and was able to enable two more 2MP cameras and a 1MP for a total of 13 active cameras with CUDA acceleration. The 14th camera would then fail with the same error message as before.

This chart shows that happening with plenty of apparent headroom:



However, at the same time, GPU-Z was reporting approximately 85-95% GPU usage, so I believe I simply hit the limit of what this card is able to process. Maybe a more powerful card could handle more. I'm already putting a 45-50% load on this card just rendering the Blue Iris GUI at 4K resolution.

Note: All tests I've done so far are using H.264 video.
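
For a rough sense of the decode load that was on the card, you can total up megapixels per second. This is just back-of-envelope arithmetic using the cameras from the change list in Update 2, not an official NVDEC capacity metric:

```python
# Aggregate decode load for changes 1-5 from Update 2 (the cameras
# moved from Intel to Nvidia CUDA). MP/s is a rough proxy for decoder
# load; real capacity varies by card, codec, and driver.
cameras_mp_fps = [
    (8, 10),  # change 1: 8 MP @ 10 FPS
    (4, 15),  # change 2: 4 MP @ 15 FPS
    (4, 15),  # change 3: 4 MP @ 15 FPS
    (2, 15),  # change 4: 2 MP @ 15 FPS
    (2, 15),  # change 5: 2 MP @ 15 FPS
]

total_mps = sum(mp * fps for mp, fps in cameras_mp_fps)
print(f"Aggregate decode load: {total_mps} MP/s")  # 260 MP/s
```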

Update 4:


Here are the performance counters returning to normal as I went through and changed all my hardware acceleration settings back to the way I normally run:

 
Thanks for posting this Brian. I was going to start my search for an Nvidia card based on the "What's New" pop up. Glad I checked here. Please continue to update this thread (or let me know if you create one dedicated to the acceleration) if you do any more experimentation. I'm juggling balls to keep my CPU below 90% most of the time given the number of cameras currently on my i7-4790. And I need to add more (who doesn't?)...
 
I've updated my post above several times, adding new information.

Some conclusions I am able to draw from this are:

1) Both types of hardware acceleration (Intel / Nvidia) reduce CPU usage by a similar amount.
2) The GT 1030 (2GB GDDR5) card could only handle about half of my cameras.
3) Nvidia CUDA acceleration raised memory usage more than Intel Quick Sync.

Maybe a faster graphics card would be able to handle more video before maxing out. However, faster GPUs also consume a lot of power, so it could end up costing more to run the GPU than to run without it. Modern GPUs are not especially cheap, either, so based on what I've seen today, this is not a good option for a low-budget Blue Iris build.
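
To put rough numbers on the running-cost point, here is a sketch of the yearly electricity math. The 30 W extra draw and $0.12/kWh price are placeholders I made up, not measurements:

```python
# Yearly cost of extra GPU power draw on a 24/7 Blue Iris machine.
# Both inputs are assumptions -- measure at the wall for real numbers.
extra_gpu_watts = 30      # assumed extra draw while the GPU decodes
price_per_kwh = 0.12      # assumed electricity price, $/kWh

extra_kwh_per_year = extra_gpu_watts * 24 * 365 / 1000
print(f"{extra_kwh_per_year:.0f} kWh/yr -> "
      f"${extra_kwh_per_year * price_per_kwh:.2f}/yr")  # 263 kWh -> $31.54
```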
 
I can only seem to use the Nvidia acceleration for a total of 5 cameras at a time.

I'm currently using a potato: GTX 750Ti.

I have plenty of GPU power available (hovering at about 55%) but the Dedicated GPU memory is tapped (while shared memory is nearly unused).

Could available GPU memory be the limitation?

Edit: Just saw your conclusion and updates, bp2008. Thank you for the thorough updates!
 

Good call. That might be it. Here is my card at the point it refuses to open another stream. This card has 2048 MB. This time memory is the only thing close to the limit.
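
A back-of-envelope estimate shows how quickly decoded frames could eat a 2048 MB card. A decoded H.264 frame in NV12 format takes width x height x 1.5 bytes, and the decoder keeps a pool of surfaces per stream; the pool size of 16 below is an assumption, not a measured value:

```python
# Rough per-stream decoder memory: one NV12 surface is W x H x 1.5 bytes
# (8-bit luma plus half-resolution chroma), and the decoder holds a pool
# of them (reference frames plus queued output). Pool size is a guess.
def decode_mb_per_stream(width, height, surfaces=16):
    frame_bytes = width * height * 1.5
    return frame_bytes * surfaces / 2**20

print(f"1080p: ~{decode_mb_per_stream(1920, 1080):.0f} MB/stream")  # ~47 MB
print(f"4K:    ~{decode_mb_per_stream(3840, 2160):.0f} MB/stream")  # ~190 MB
```

A dozen streams at those sizes, plus whatever the GUI itself uses, would plausibly exhaust 2 GB.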

 
I did some tests with H.265 vs H.264 and wasn't able to see a CPU usage difference either way. In my opinion the image quality of H.265 was actually worse, given the same bit rate, so I'm going to stick with H.264.
 
Hey folks...just installed this build to try it out on my system.

This is my Core i9-7980XE with 128GB RAM and Nvidia Titan X (Pascal), which according to specs has 3584 CUDA cores and 12GB RAM. My Blue Iris install has 35 cameras, ranging from some 1080p PTZs all the way up to 4K. Half of my cameras are 4K, with one of them being a 4K PTZ.

Before installing the new build, I was running at about 53% with the GPU load sitting around 15%.

I then installed the new build and only enabled Nvidia Cuda for 12 of my 4K cameras. I had to stop because the GPU load was hitting 100% at that point.

My CPU load went from 53% to 31% (on average), so a good solid 20-point decrease. Pretty nice!!!

I'm excited by the potential for this, considering how much additional headroom this might afford me. I will also be super curious to test this again when my new RTX card arrives, as the 2080 Ti has 4352 Cuda cores. As soon as I've tried it out, I will post results.

Thanks to Ken for enabling this long-time wish list item...I know there are a BUNCH of people who are super happy tonight!
 
That is pretty sad performance for a $1000+ card... it would be cheaper to use a second $400-$500 PC, which would add redundancy and lower the CPU usage by more than 20 percent...
 
Well this totally opens up AMD CPUs for Blue Iris. If one was so inclined...My QuickSync-based systems are still very solid, but it pained me to build them. Just sayin'.
 
Actually, it doesn't... as you can see, the Nvidia cards are not as powerful at decoding as we thought... it is way cheaper to buy an Intel-based system, and it will be cheaper to run as well.
 
If one was so inclined, and already had the hardware on hand, it totally opens up the possibility of using an AMD CPU. I'm not saying it's the route I went, and I'm not saying it will use less power (it won't), but it is now possible to use an AMD CPU and have hardware acceleration. That's true.
 
Sure, and it's possible to toss your money out the window... doesn't mean it's wise... you can use AMD without hardware acceleration... no one has tested whether there is actually a power benefit to using Nvidia HA vs. running without it on AMD... unless you are at the CPU limits, it would be pointless...
 
I then installed the new build and only enabled Nvidia Cuda for 12 of my 4K cameras. I had to stop because the GPU load was hitting 100% at that point.

I look forward to those results. What are the frame rates of those 12 4K cameras so we can calculate MP/s?
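
For reference, the arithmetic would go like this; the 15 FPS below is purely hypothetical, since it's exactly the number being asked for:

```python
# MP/s estimate for twelve 4K cameras. The frame rate is a placeholder.
cams = 12
mp_per_cam = 3840 * 2160 / 1e6   # ~8.3 MP for a 4K stream
fps = 15                         # hypothetical -- the number in question

print(f"{cams * mp_per_cam * fps:.0f} MP/s")  # ~1493 MP/s at 15 FPS
```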

I wonder how much power the GPU actually draws when under a full decoding load. If it is anywhere near the 250W rating for a Titan X, then it is a lot less efficient than software decoding on a CPU. But I guess that is the sacrifice you make to push the limits of what one PC can do.
 
So over on the Plex forums I read that NVIDIA hardware transcoding was limited to 2 streams, BUT on the Quadro P2000 cards streams are "unlimited", and people have gotten GREAT results for transcoding using those P2000 cards. Would the same limitations apply to the NVIDIA cards for BI hardware decoding? I have a P2000 card on order for my Plex server so I can try it out over the weekend, but was curious if anyone had any thoughts on that?
 
Plex would be using both decoding and encoding functions, and will also be handling a relatively small number of streams at a time. Blue Iris uses hardware acceleration for decoding only, and there is certainly no 2-stream limit. I was decoding 11-13 cameras of various resolutions and frame rates at the same time using a GT 1030, and Tuckerdude was using a Titan X to decode 12 4K cameras. Given this, I would not expect a Quadro to perform any better than a consumer card in Blue Iris.
 