Just wanted to give my own feedback on here. I was using CPU with an i7-10700 and 14 streams. I'm new to BI, so there's a lot of testing and optimization going on. I am using substreams and 15 FPS. I have a variety of cameras, some of them not ideal for the purpose, and I was using single-channel memory at the time. My point is that the CPU numbers below may be crap because of my own settings/other hardware, or they may just be what they are.
While on CPU, I was getting (in the DeepStack analysis details) analysis times of 1,000+ msec, often much higher; 8,000 to 9,000 msec wasn't uncommon, and a significant percentage were hitting 15,000 msec and timing out (Error 100). I was analyzing images every 500 msec for the duration of the motion event.
I purchased a PNY Quadro P400 v2, mostly because it was the cheapest card that did not require external power (refurb Dell with no additional power supply outputs).
The installation of DeepStack GPU went well apart from one small hitch; essentially my steps were:
1) Installed the regular Nvidia Quadro drivers. I wasn't sure if these were necessary or not, but figured it wouldn't hurt.
2) Downloaded and installed CUDA 10.1 (per: Using DeepStack with Windows 10 (CPU and GPU) | DeepStack 2021.09.1 documentation)
3) Downloaded cuDNN; as others have said, you have to create a dev account. I clicked random checkboxes and made it through. Once downloaded, you drop the folders into the appropriate CUDA path (per: Installation Guide :: NVIDIA Deep Learning cuDNN Documentation)
4) Installed the GPU version of DeepStack over the CPU version
Step 4 is where I hit a snag. For some reason I was getting Error 100 on every event, even those that were taking way less than 15,000 msec and not timing out. I uninstalled DeepStack, rebooted, reinstalled DeepStack GPU, rebooted, and everything was working.
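In hindsight, a quick sanity check that the pieces from steps 1-3 actually landed where DeepStack GPU expects them might have saved me the reinstall. Here's a minimal sketch in Python; the CUDA install path and the cudnn64_7.dll name are assumptions based on a default CUDA 10.1 + cuDNN 7.x setup, so adjust for your versions:

import shutil
import subprocess
from pathlib import Path

# Driver check: nvidia-smi should list the card (the Quadro P400 in my case)
if shutil.which("nvidia-smi"):
    subprocess.run(["nvidia-smi"], check=False)
else:
    print("nvidia-smi not found - driver may not be installed")

# CUDA toolkit check (default install location for CUDA 10.1 - adjust if you changed it)
cuda_bin = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin")
print("CUDA 10.1 bin folder exists:", cuda_bin.is_dir())

# cuDNN check: for cuDNN 7.x the library dropped into the CUDA bin folder is cudnn64_7.dll
print("cuDNN DLL present:", (cuda_bin / "cudnn64_7.dll").is_file())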
I'm now getting sub-100 msec analysis times, breaking into the 100s maybe a quarter of the time, and I haven't seen anything go over 200 msec so far. I've increased the analysis rate to an image every 250 msec for the duration of the event with the same results. The GPU solved all the problems; the difference is night and day, roughly 8,000 msec down to something like 80 msec if I had to guess at averages.
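If anyone wants to sanity-check their own numbers outside of Blue Iris, here's a rough sketch of timing DeepStack directly with Python and the requests library. The port (82) and the image path are assumptions; point them at whatever your DeepStack instance actually listens on and any saved alert JPEG:

import time
import requests

DEEPSTACK_URL = "http://localhost:82/v1/vision/detection"  # assumed port

# Any alert snapshot will do - this path is just an example
with open(r"C:\BlueIris\Alerts\sample.jpg", "rb") as f:
    image_data = f.read()

times = []
for _ in range(20):
    start = time.perf_counter()
    r = requests.post(DEEPSTACK_URL, files={"image": image_data}, timeout=20)
    times.append((time.perf_counter() - start) * 1000)

print(f"avg {sum(times) / len(times):.0f} msec, max {max(times):.0f} msec")
print("objects in last response:", len(r.json().get("predictions", [])))

Note this includes HTTP overhead on top of the raw analysis time, so it won't match the DeepStack analysis details exactly, but it's close enough to tell 80 msec from 8,000.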
I also have my alerts going to a small RAM drive. That didn't seem to help with CPU analysis times; I'm not sure if it's helping with the GPU version or not, but it doesn't seem to be hurting anything either.
(Edit: moved the alerts from the RAM disk back to the NVMe drive; it may have sped things up by some tens of msec, but not enough to notice or make a difference, so I'm erring on the side of simplification.)
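For anyone curious whether the storage location even matters, a crude way to compare is timing a read of the same alert JPEG from each location. This is just a sketch; the two paths are made up, and filesystem caching makes it rough at best:

import time
from pathlib import Path

def read_time_ms(path):
    # Time how long it takes to read the whole file, in milliseconds
    start = time.perf_counter()
    Path(path).read_bytes()
    return (time.perf_counter() - start) * 1000

ram_ms = read_time_ms(r"R:\Alerts\sample.jpg")             # hypothetical RAM disk path
nvme_ms = read_time_ms(r"C:\BlueIris\Alerts\sample.jpg")   # hypothetical NVMe path
print(f"RAM disk: {ram_ms:.1f} msec, NVMe: {nvme_ms:.1f} msec")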
Lastly, I tried adding the ExDark custom model (GitHub - OlafenwaMoses/DeepStack_ExDark: A DeepStack custom model for detecting common objects in dark/night images and videos) and turning on custom models again. There is some improvement over CPU, but analysis times still went right back up to 8,000 - 9,000 msec with both the default and custom models turned on. I'm not sure why that model absolutely seems to choke everything, but for now I'm leaving it off.
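If anyone wants to isolate whether ExDark itself is the slow part (versus running two models per image), here's a hedged sketch of hitting the custom model endpoint directly in Python. It assumes the model is registered as "dark" (the dark.pt file from that repo) and that DeepStack is on port 82; adjust both for your setup:

import time
import requests

CUSTOM_URL = "http://localhost:82/v1/vision/custom/dark"  # assumed port and model name

with open(r"C:\BlueIris\Alerts\night_sample.jpg", "rb") as f:  # any night-time snapshot
    image_data = f.read()

start = time.perf_counter()
r = requests.post(CUSTOM_URL, files={"image": image_data}, timeout=30)
elapsed = (time.perf_counter() - start) * 1000

print(f"custom model took {elapsed:.0f} msec")
for p in r.json().get("predictions", []):
    print(p["label"], round(p["confidence"], 2))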