Just wanted to give my own feedback on here. I was using CPU with an i7-10700 and 14 streams. I'm new to BI, so there's a lot of testing and optimization going on. I am using substreams and 15 FPS. I have a variety of cameras, some of them not ideal for the purpose, and I was using single-channel memory at the time. My point is that the CPU numbers below may be crap because of my own settings/other hardware, or they may just be what they are.
While on CPU, I was getting (in the DeepStack analysis details) analysis times of 1,000+ msec, often much higher; 8,000 to 9,000 msec wasn't uncommon, and a significant percentage were hitting 15,000 msec and timing out (Error 100). I was analyzing images every 500 msec for the duration of the motion event.
I purchased a PNY Quadro P400 v2, mostly because it was the cheapest card that did not require external power (refurb Dell with no additional power supply outputs).
The installation of DeepStack GPU went well apart from one small hitch; essentially my steps were:
1) Installed the regular Nvidia Quadro drivers. I wasn't sure if these were necessary or not, but figured it wouldn't hurt.
2) Downloaded and installed CUDA 10.1 (per: Using DeepStack with Windows 10 (CPU and GPU) | DeepStack 2021.09.1 documentation)
3) Downloaded cuDNN; as others have said, you have to create a dev account. I clicked random checkboxes and made it through. Once downloaded, you drop the folders into the appropriate CUDA path (per: Installation Guide :: NVIDIA Deep Learning cuDNN Documentation)
4) Installed the GPU version of DeepStack over the CPU version
Step 4 is where I hit a snag. For some reason I was getting Error 100 on every event, even those that were taking way less than 15,000 msec and not timing out. I uninstalled DeepStack, rebooted, reinstalled DeepStack GPU, rebooted, and everything was working.
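In hindsight, a quick sanity check that the pieces from steps 1-3 actually landed where DeepStack GPU expects them might have saved me the reinstall. Here's a minimal sketch in Python; the CUDA install path and the cudnn64_7.dll name are assumptions based on a default CUDA 10.1 + cuDNN 7.x setup, so adjust for your versions:

import shutil
import subprocess
from pathlib import Path

# Driver check: nvidia-smi should list the card (the Quadro P400 in my case)
if shutil.which("nvidia-smi"):
    subprocess.run(["nvidia-smi"], check=False)
else:
    print("nvidia-smi not found - driver may not be installed")

# CUDA toolkit check (default install location for CUDA 10.1 - adjust if you changed it)
cuda_bin = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin")
print("CUDA 10.1 bin folder exists:", cuda_bin.is_dir())

# cuDNN check: for cuDNN 7.x the library dropped into the CUDA bin folder is cudnn64_7.dll
print("cuDNN DLL present:", (cuda_bin / "cudnn64_7.dll").is_file())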
I'm now getting sub-100 msec analysis times, breaking into the 100s maybe a quarter of the time, and I haven't seen anything go over 200 msec so far. I've increased the analysis rate to an image every 250 msec for the duration of the event with the same results. The GPU solved all the problems; the difference is night and day, roughly 8,000 msec down to something like 80 msec if I had to guess at averages.
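If anyone wants to sanity-check their own numbers outside of Blue Iris, here's a rough sketch of timing DeepStack directly with Python and the requests library. The port (82) and the image path are assumptions; point them at whatever your DeepStack instance actually listens on and any saved alert JPEG:

import time
import requests

DEEPSTACK_URL = "http://localhost:82/v1/vision/detection"  # assumed port

# Any alert snapshot will do - this path is just an example
with open(r"C:\BlueIris\Alerts\sample.jpg", "rb") as f:
    image_data = f.read()

times = []
for _ in range(20):
    start = time.perf_counter()
    r = requests.post(DEEPSTACK_URL, files={"image": image_data}, timeout=20)
    times.append((time.perf_counter() - start) * 1000)

print(f"avg {sum(times) / len(times):.0f} msec, max {max(times):.0f} msec")
print("objects in last response:", len(r.json().get("predictions", [])))

Note this includes HTTP overhead on top of the raw analysis time, so it won't match the DeepStack analysis details exactly, but it's close enough to tell 80 msec from 8,000.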
I also have my alerts going to a small RAM drive. That didn't seem to help with CPU analysis times; I'm not sure if it's helping with the GPU version or not, but it doesn't seem to be hurting anything either.
(Edit: moved the alerts from the RAM disk back to the NVMe drive; it may have sped things up by some tens of msec, but not enough to notice or make a difference, so I'm erring on the side of simplification.)
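For anyone curious whether the storage location even matters, a crude way to compare is timing a read of the same alert JPEG from each location. This is just a sketch; the two paths are made up, and filesystem caching makes it rough at best:

import time
from pathlib import Path

def read_time_ms(path):
    # Time how long it takes to read the whole file, in milliseconds
    start = time.perf_counter()
    Path(path).read_bytes()
    return (time.perf_counter() - start) * 1000

ram_ms = read_time_ms(r"R:\Alerts\sample.jpg")             # hypothetical RAM disk path
nvme_ms = read_time_ms(r"C:\BlueIris\Alerts\sample.jpg")   # hypothetical NVMe path
print(f"RAM disk: {ram_ms:.1f} msec, NVMe: {nvme_ms:.1f} msec")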
Lastly, I tried adding the ExDark custom model (GitHub - OlafenwaMoses/DeepStack_ExDark: A DeepStack custom model for detecting common objects in dark/night images and videos) and turning on custom models again. There is some improvement over CPU, but analysis times still went right back up to 8,000 - 9,000 msec with both the default and custom models turned on. I'm not sure why that model absolutely seems to choke everything, but for now I'm leaving it off.
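If anyone wants to isolate whether ExDark itself is the slow part (versus running two models per image), here's a hedged sketch of hitting the custom model endpoint directly in Python. It assumes the model is registered as "dark" (the dark.pt file from that repo) and that DeepStack is on port 82; adjust both for your setup:

import time
import requests

CUSTOM_URL = "http://localhost:82/v1/vision/custom/dark"  # assumed port and model name

with open(r"C:\BlueIris\Alerts\night_sample.jpg", "rb") as f:  # any night-time snapshot
    image_data = f.read()

start = time.perf_counter()
r = requests.post(CUSTOM_URL, files={"image": image_data}, timeout=30)
elapsed = (time.perf_counter() - start) * 1000

print(f"custom model took {elapsed:.0f} msec")
for p in r.json().get("predictions", []):
    print(p["label"], round(p["confidence"], 2))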