DeepStack Case Study: Performance from CPU to GPU version

It was @sebastiantombs who first pointed me in the direction of the 1060 card as having a good balance between performance and efficiency. I found a 1060 w/3GB card new on eBay for right at $300. More recently @IReallyLikePizza2 found a Tesla P4 for about that same price. Specs on the Tesla P4 look better than the 1060, but @IReallyLikePizza2 hasn't reported back with his experience to give it a thumbs-up (or down) quite yet. I think he is still waiting for it to arrive.

My sense is you will do better with any card having 1,000+ cores, but I've not tried any card other than the 1060 personally. I am very pleased with the 1060's performance. FWIW
 

I assume that's used for $300 (the 1060). They're $600 new on Amazon.
 
All the prices I found on Amazon were very high, so I went to eBay. There I found a new 1060 card for $300. The price has now dropped to $260 ... found here --> New nVidia GeForce GTX 1060 3GB Gaming Graphics Video Card Desktop PC HDMI DVI | eBay.

Same card, I believe: two fans, 3GB 1060.
I've also seen ones with one fan, also 1060 3GB. What is the difference?
Have to keep an eye out on eBay.
 
I always go for two or three fans for the added cooling. The fans draw very little power and make sure the card's GPU stays as cool as possible. There are utilities that let you dial in the fan speed based on GPU temperature, which is pretty handy.
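As a rough illustration of what those utilities do (this is just a sketch, not any particular tool; the curve points and polling interval are made-up examples), you can poll the GPU temperature with nvidia-smi and map it to a target fan percentage. Actually applying the speed is left to whatever your platform provides, e.g. MSI Afterburner on Windows or nvidia-settings on Linux:

```python
# Minimal sketch: read the GPU temperature via nvidia-smi and pick a fan target
# from a simple curve. The curve values here are examples, not recommendations.
import subprocess
import time

FAN_CURVE = [(40, 30), (60, 50), (75, 80), (85, 100)]  # (temp in C, fan %)

def gpu_temp_c() -> int:
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=temperature.gpu", "--format=csv,noheader"]
    )
    return int(out.decode().strip())

def target_fan_percent(temp_c: int) -> int:
    percent = FAN_CURVE[0][1]
    for threshold, fan in FAN_CURVE:
        if temp_c >= threshold:
            percent = fan
    return percent

while True:
    t = gpu_temp_c()
    print(f"GPU at {t} C -> fan target {target_fan_percent(t)}%")
    time.sleep(10)  # poll every 10 seconds
```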
 
Noob question - do these GTX cards (e.g. 1050/1060) need to be powered via an extra 8-pin PCIe power connector?

My Dell tower 3420 has no extra power connectors.
My ATX Ryzen 5 build does, I believe, with its 500W PSU.
 
The PCIe slot can only deliver a max of 75 watts. The 1050 has a max wattage of 75 watts and the 1060 has a max wattage of 120 watts. So the 1050 can run from the slot alone, but the 1060 needs the extra connector.
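That rule of thumb is easy to check for any card. Here is a tiny sketch using the wattages quoted above (the Quadro P400 figure is my own addition and approximate):

```python
# PCIe x16 slots supply at most 75 W, so any card whose board power exceeds
# that needs an auxiliary 6/8-pin connector.
PCIE_SLOT_LIMIT_W = 75

CARDS_W = {"GTX 1050": 75, "GTX 1060": 120, "Quadro P400": 30}  # board power in watts

def needs_aux_power(board_power_w: int) -> bool:
    return board_power_w > PCIE_SLOT_LIMIT_W

for card, watts in CARDS_W.items():
    verdict = "needs an extra connector" if needs_aux_power(watts) else "runs from the slot alone"
    print(f"{card} ({watts} W): {verdict}")
```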
 

Thanks.
I see it explained here

I can only see a spare 4-pin connector off my 500W power supply.

I may just limit my search to a 1050 for now, then.
 
It's worth noting that the Tesla P4 does NOT need extra power, despite being quite a powerful card

EDIT: I said P400 when I meant P4
 
I tried DeepStack with an i5-6500. 4 MP images were timing out after 15 seconds. I moved to the GPU version using an RTX 3060, which brought it down to around 250 ms. That's quite an improvement. I then moved to using the sub-stream from the camera, and it's now 25-50 ms.
 
Just wanted to give my own feedback on here. I was using CPU with an i7-10700 and 14 streams. I'm new to BI, so there's a lot of testing and optimization going on. I am using substreams and 15 FPS. I have a variety of cameras, and some of them are not ideal for the purpose. I was using single-channel memory at the time. My point is that the CPU numbers below may be crap because of my own settings/other hardware, or they may just be what they are.

While on CPU, I was getting (in the DeepStack analysis details) analysis times of 1,000+ msec, often much higher; 8,000 to 9,000 wasn't uncommon, and a significant percentage were hitting 15,000 and timing out (Error 100). I was analyzing images every 500 ms for the duration of the motion event.

I purchased a PNY Quadro P400 v2, mostly because it was the cheapest card that did not require external power (refurb Dell with no spare power supply connectors).

The installation of DeepStack GPU went well apart from one small hitch; essentially my steps were (a quick sanity-check sketch follows this list):
1) Installed the regular Nvidia Quadro drivers. I wasn't sure if these were necessary or not, but figured it wouldn't hurt
2) Downloaded and installed CUDA 10.1 (per: Using DeepStack with Windows 10 (CPU and GPU) | DeepStack 2021.09.1 documentation)
3) Downloaded cuDNN; as others have said, you have to create a dev account. I clicked random checkboxes and made it through. Once downloaded, you drop the folders into the appropriate path: Installation Guide :: NVIDIA Deep Learning cuDNN Documentation
4) Installed the GPU version of DeepStack over the CPU version
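For anyone following the same steps, a quick way to confirm the driver and CUDA toolkit are actually visible before installing DeepStack GPU is something like this (just my own check, not part of the official instructions):

```python
# Sanity check: both commands should succeed if steps 1 and 2 worked.
import subprocess

for cmd in (["nvidia-smi"], ["nvcc", "--version"]):
    try:
        out = subprocess.check_output(cmd, text=True)
        print(f"{' '.join(cmd)}: OK")
        print(out.splitlines()[0])  # first line of the tool's output
    except (FileNotFoundError, subprocess.CalledProcessError) as err:
        print(f"{' '.join(cmd)}: FAILED ({err})")
```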

Step 4 is where I hit a snag. For some reason I was getting Error 100 on every event, even those that were taking way less than 15,000 msec and not timing out. I uninstalled DeepStack, rebooted, reinstalled DeepStack GPU, rebooted, and everything was working.

I'm now getting sub-100 msec analysis times, breaking into the 100+ msec range maybe a quarter of the time, and I haven't seen it go over 200 msec so far. I've increased the rate to an image every 250 ms for the duration of the event with the same results. The GPU solved all the problems; the difference is night and day, 8,000 msec down to something like 80 if I had to guess averages.

I also have my alerts going to a small RAM drive. That didn't seem to help with CPU analysis times; not sure if it's helping with the GPU version or not, but it doesn't seem to be hurting anything either :p
(Edit: moved the RAM disk back to the NVMe; it may have sped things up by some tens of msec, but not enough to notice or make a difference, so I'm erring on the side of simplification)

Lastly, I tried adding the ExDark custom model (GitHub - OlafenwaMoses/DeepStack_ExDark: A DeepStack custom model for detecting common objects in dark/night images and videos.) and turning custom models back on. There is some improvement over CPU, but I still went right back up to 8,000-9,000 msec analysis times with both turned on. I'm not sure why that model absolutely seems to choke everything, but for now I'm leaving it off.
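For reference, custom models like ExDark get their own DeepStack endpoint named after the model file, so they can be timed independently of the built-in detector. A rough sketch (host, port, image file, and the "dark" model name are my assumptions, the latter taken from the ExDark repo):

```python
# Time a single request against a DeepStack custom model endpoint.
import time
import requests

CUSTOM_MODEL_URL = "http://127.0.0.1:5000/v1/vision/custom/dark"  # assumed host/port

with open("night_snapshot.jpg", "rb") as f:  # placeholder image
    image_bytes = f.read()

start = time.perf_counter()
response = requests.post(CUSTOM_MODEL_URL, files={"image": image_bytes}, timeout=15)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{elapsed_ms:.0f} ms, predictions: {response.json().get('predictions', [])}")
```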
 
Surprised the 10th Gen Intel was so slow.
I'd like to know how the 12th Gen performs.
 
I encourage people to test all these settings.

Just like we say not to chase MP, same with this - in many instances DeepStack doesn't need main-stream images and high mode.

Obviously it is field-of-view dependent, but I tested it with mine and it makes no difference in accuracy, just a lot longer processing time running high compared to low.

My system is just as accurate with substream images and low mode for DS.

As always, YMMV.
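If you want to test this on your own system, one rough way (my own sketch, not an official tool; URL, port, and file names are placeholders) is to time DeepStack's detection endpoint against a saved main-stream snapshot and a sub-stream snapshot. Mode (High/Medium/Low) is chosen when DeepStack is launched, so restart it in each mode and rerun:

```python
# Average DeepStack detection time for a given image over several runs.
import time
import requests

DEEPSTACK_URL = "http://127.0.0.1:5000/v1/vision/detection"  # assumed host/port

def average_ms(image_path: str, runs: int = 10) -> float:
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    start = time.perf_counter()
    for _ in range(runs):
        r = requests.post(DEEPSTACK_URL, files={"image": image_bytes}, timeout=15)
        r.raise_for_status()
    return (time.perf_counter() - start) / runs * 1000

for snapshot in ("mainstream_snapshot.jpg", "substream_snapshot.jpg"):  # placeholder files
    print(f"{snapshot}: {average_ms(snapshot):.0f} ms average")
```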
 
I use mode high because one of my cameras is mounted high, and with my own dark custom model I find it is more accurate. But I'm always tweaking. I may try the low mode, as I've never used that. My times are around 50-60 ms on high, so I can't complain.
 
Yeah, at 50-60 ms go with what works.

But 9,000 ms on a 10th gen is crazy slow. My 4th gen testing on high mode was doing better than that.
 
I have an i5-6500 (early Intel QuickSync) running 8 cameras through BI + DeepStack. DS processing is 500 ms to 2 s, so I think I could benefit from even an affordable GPU boost. Would a sixth-gen i5 see improvement from an added P400?

After reading this thread, I'm confused about whether my current CPU version of DeepStack is using the QuickSync capabilities of the i5, or whether "DeepStack CPU" literally means only the CPU cores of the i5.

And, a little OT, but I also use this BI machine as an occasional Plex server. AFAIK, Plex does use the QuickSync built-in GPU. Would the P400 outperform the i5 with QuickSync?