DeepStack Case Study: Performance from CPU to GPU version

New experiment... 2 DeepStack instances open twice as many python.exe processes as 1 instance (12 vs. 6!).
Same method as post #157.

Would anyone else using a Quadro P400 V2 be willing to try to reproduce my observations?

Before restarting the BI service...
[attachment: screenshot of python.exe processes/PIDs before the restart]

After restarting the BI service - note that the above PIDs are terminated...
[attachment: screenshot of the new python.exe processes/PIDs after the restart]
 
  • Wow
Reactions: sebastiantombs
It has to be the way a Quadro handles applications versus how a gaming card does the same thing. I'm also surprised at the CPU utilization by the first few instances. 58% total is quite a bit, or does that tail off once things settle down after a reboot?
 
  • Like
Reactions: tech101
From BI Support today...

My question:
I also have a question about multiple instances of deepstack. If you configure 2 instances under the main settings, does each camera utilize one or the other automatically or do you have to manually set each camera to use a different instance?

BI Support answer:

You will need to override the default address on the trigger/AI page for each camera that does not use the first port.
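
In practice that means each instance listens on its own port. A minimal sketch using the DeepStack CLI (ports 82 and 83 here are just example values):

deepstack --VISION-DETECTION True --PORT 82
deepstack --VISION-DETECTION True --PORT 83

Cameras left at the default keep using the first instance; for the others, override the AI server address on the camera's trigger/AI page (e.g. 127.0.0.1:83).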
 
I'm still getting significantly higher times since switching to DS GPU. And DS keeps restarting. Clearly, I did something wrong.

I thought I'd give CUDA 10 another chance. During installation, it says Visual Studio is not installed, but I followed the link and installed the Community edition.

My next step is to uninstall everything and start over.

Is CUDA 10 preferred over 11?

Do I need to stop the Blue Iris service while installing CUDA, cuDNN, and DeepStack?

Can someone please post a link to the DS forum thread that covers installation details? The more I dive in, the more questions I have.
 
The installation went much more smoothly the 2nd time. I installed CUDA 10 and Visual Studio 2015.
Installing cuDNN for Windows also went smoothly until this step:
  1. Include cudnn.lib in your Visual Studio project.
    1. Open the Visual Studio project and right-click on the project name.
    2. Click Linker > Input > Additional Dependencies.
    3. Add cudnn.lib and click OK.
I don't have any Visual Studio projects. Is that step necessary?
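
For context: aside from that Visual Studio step, the Windows cuDNN install is essentially just copying the extracted cuDNN files into the CUDA toolkit folders, roughly like this (the v10.0 path is an assumption based on the CUDA 10 install above; run from the extracted cuDNN folder):

copy bin\cudnn64_*.dll "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\bin"
copy include\cudnn*.h "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\include"
copy lib\x64\cudnn*.lib "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0\lib\x64"

The linker step only matters if you're compiling your own code against cuDNN.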

Lastly, after installing DeepStack for GPU on Windows, do I need to run any of the PowerShell commands they note in the installation guide? Such as:
deepstack --VISION-DETECTION True --PORT 80
 
  • Like
Reactions: sebastiantombs
You can skip that last step about Visual Studio projects.

No need for any PowerShell commands either. BI will start and stop DeepStack automagically for you, if you select that. Otherwise it's a point and click on the AI page of the BI configuration screen.
 
  • Like
Reactions: AP-123 and MikeLud1
I uninstalled the DeepStack CPU version and followed the instructions for installing the GPU version; however, DeepStack just times out, saying "Alert Cancelled". Several restarts later it's still not working. Any assistance, please?
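
A quick sanity check for a timeout like this is to hit the detection endpoint directly and see whether the server answers at all. A rough sketch, assuming DeepStack is on port 82 (adjust to whatever port yours uses) and you have some test.jpg on hand:

curl -X POST -F image=@test.jpg http://127.0.0.1:82/v1/vision/detection

No response means the DeepStack process isn't running (or is on another port); a JSON response with predictions means DeepStack is fine and the problem is on the BI side (address, port, or timeout setting).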
 
I recently switched from CPU to GPU. The GPU can definitely help, but depending on the CPU it might not be a massive decrease in processing time. That said, I feel the GPU is more consistent/stable in how long analysis takes compared to the CPU.

Setup:
CPU: Ryzen 7 1700x
GPU: Nvidia T400
Host OS: Proxmox 7.x
VM OS: Debian Bullseye (Diet-pi) Linux 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1
VM CPU: EPYC 1 CPU 16 cores NUMA enabled
VM RAM: 9GB

Blue Iris is running as another VM on this same host with the same CPU setup, but with 12GB of RAM, on Windows Server 2019.

DeepStack CPU: 2022-01-01
DeepStack GPU: 2021-09-01

Docker Version: 20.10.12, build e91ed57
Nvidia Driver: 510.54
CUDA: 11.6
nvidia-container-cli: 1.8.1

The latest GPU version of DeepStack does not work as of this writing; it doesn't detect anything, as described here: GPU 2022.01.1 - no objects detected

When I was running on CPU, using the latest version and the environment variable THREADCOUNT=15, most analyses took around 500-900ms, with some taking up to 1200ms.
Without the THREADCOUNT variable, or on a previous version, analysis was in the 800-1800ms range.

With the GPU setup I'm seeing analysis take 500-600ms; sometimes it dips into the 400ms range, but it never goes over 600ms.
I've also created two DeepStack instances, each with THREADCOUNT=5; this didn't change performance but seemed like a better use of resources.
GPU usage is usually around 50% with some peaks to 100%.

I don't send additional images, and don't send the leading image.
Images are mostly 5MP with one camera doing 3MP. No difference in analysis time between cameras.

The GPU is not used by the Windows VM or the host.
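
For reference, a setup like this typically boils down to a docker run along these lines (the tag, host port, and THREADCOUNT are illustrative, based on the versions listed above):

docker run -d --gpus all -e VISION-DETECTION=True -e THREADCOUNT=5 -p 82:5000 deepquestai/deepstack:gpu-2021.09.1

DeepStack listens on 5000 inside the container, so the host port (82 here) is what BI should point at.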
 
I was running DeepStack CPU on a Ryzen 3400G and getting speeds of about 400-500ms, but it consumed most of the CPU.
I bought a 2nd-hand Asus GTX 970.
Now it's 45-55ms. Massive difference for a sub-$200 card. The card barely even coughs at processing the images. Your T400 seems a bit slow, tbh. Is it a cheap card? What resolution images are you processing?
 
  • Like
Reactions: gouthamravee
What OS are you running on? Curious if there's a difference between the windows and docker versions.

Yes, the T400 is the lowest-end Quadro card; it's usually in the $80-$100 range. I got it for ~$160.
It's similar to the P400 but has more memory bandwidth.

I'm sending DeepStack 5MP images; I don't know the exact resolution off the top of my head, but I can get it if you want.

Your 970 is definitely way more powerful.
Basic GPU comparison - T400 vs GeForce GTX 970 vs Quadro P400 [videocardbenchmark.net] by PassMark Software

Without the THREADCOUNT flag I wasn't seeing my CPU get utilized properly, but even with it set my CPU never hit 100%. This is running the CPU version of DeepStack.
I haven't been able to find proper documentation on this, but I feel like I should be running multiple instances of DeepStack on the same machine to get proper resource usage, something like the sketch below.
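
If that's the route you take, the usual pattern is just starting the same container twice on different host ports and splitting the BI cameras between them. Ports and THREADCOUNT values here are only examples:

docker run -d -e VISION-DETECTION=True -e THREADCOUNT=5 -p 82:5000 deepquestai/deepstack
docker run -d -e VISION-DETECTION=True -e THREADCOUNT=5 -p 83:5000 deepquestai/deepstack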

You got me curious, so I looked at 970 prices on eBay, and boy, that's not bad! I'm seeing them for $180, even $100.
Only problem is my server can't take a full-height card :\
 
Windows 11 now. I have a full ATX-sized box. Mind you, it was still a tight squeeze! I added 3 more fans to keep it cool so it doesn't need the GPU fans. Those come on at 60°C, but the GPU rarely exceeds 37°C and idles at 33°C with good airflow.
I was lucky with the seller as he was local and we tested the card at my house to ensure it was all ok.
My Ryzen CPU now idles at around 3-6% and hardly exceeds 10% during heavy GPU use.
If you can get a 970 2nd-hand in good condition, the performance per buck is excellent, especially considering the prices of new GPUs at the moment.

I can set the snapshots at 150ms intervals or less and it just flies through them!
 
Unfortunately my NVR is in a 2U server box, but I could finagle a full-sized card in there.
This has got me re-thinking what to use here; I will definitely see if I can get a cheap 970.

Are you running the latest Nvidia drivers? I was afraid the 970 might be too old.

The T400 still has use to me as a card for transcoding media.
 
I'm using Nvidia driver 30.0.14.9649 with my GTX 970 (driver date: 20/10/2021).
 
  • Like
Reactions: gouthamravee
Guys, be aware that your 5MP images are significantly downsized by BI before the snapshot is sent to DS. Using "high resolution" doesn't do any good and can actually add to detection time, because the image has to be downsized first.
 
  • Like
Reactions: gouthamravee
Good point, I didn't realize that and can't find good info on it either. Still learning a lot about BI here.
 
  • Like
Reactions: Pentagano
That's not a BI thing; it's what DS needs to see. I think it downsizes to 720 or 1080, more than likely the 720 level. Faulty old memory, so I can't say for certain.