Optimisation for a small-ish deployment

RL2018

n3wb
Joined
Sep 24, 2018
Messages
5
Reaction score
0
Location
Australia
Hi All,

Have a number of BI installs out there. Most are 5-10 cameras and work well. I have just rolled out a larger, but still small, install: 20x 4MP cameras @ 25 FPS with audio, constant recording direct to disk, no motion detection required. The NVR is virtualised; some context on the deployment:
16-port 10Gb fibre core switch
2x 10Gb as 20Gb LACP connection to the physical VMware server
2x 10Gb as 20Gb LACP connection to a 48-port PoE switch
2x 10Gb as 20Gb LACP to the NAS

The NAS comprises 8x 10TB hard drives.
The server is 2x Xeon E5-2650 v4, each 12 cores / 24 threads (24 cores / 48 threads total), with 256GB RAM.
The BI VM was 8 cores and 8GB RAM.

When we got to 15 cameras, CPU was maxed at 100%. We increased the VM to 16 cores and 12GB RAM, but it was still bouncing at 95-100% constantly.

I dropped to 15 FPS across all 15 cameras and am now sitting at 35% CPU and 3.7GB RAM.

I have followed the optimisations here: Optimizing Blue Iris's CPU Usage | IP Cam Talk

Am I expecting too much hoping for 25 FPS on all cameras? Or have I missed something?
 

bp2008

Staff member
Joined
Mar 10, 2014
Messages
12,690
Reaction score
14,061
Location
USA
20x 4MP @ 25 FPS is 2000 megapixels per second (MP/s). For comparison, the fastest i7 CPU currently available (i7-8700K) can only handle about 1500 MP/s and that is with the advantage of Quick Sync hardware acceleration. To go beyond the capabilities of i7-8700K, you need to throw more cores at the problem. You have more cores, yes, but each is about half the speed of an i7-8700K core. Your current server (all cores) has roughly double the raw CPU power of an i7-8700K so I'm guessing it would take nearly all the CPU your server has got to handle that load in Blue Iris.

Reducing the frame rates to 15 FPS cuts 40% off the load (making it 1200 MP/s). I can't explain why you saw a much larger reduction in CPU load than 40%. It could be that you made other changes, or it could be something more difficult to measure and fix, like poor optimization for dual-socket servers.

If you don't care about heat and energy costs, you could try throwing in an Nvidia GPU for hardware acceleration. Nvidia is by far the least efficient decoding option, but it does successfully offload work from the CPU. Or build another machine for Blue Iris based on AMD Ryzen Threadripper 2950X or an Intel equivalent. Such a system should be able to run 2000 MP/s with room to spare.
 

RL2018

n3wb
Joined
Sep 24, 2018
Messages
5
Reaction score
0
Location
Australia
Thank you for the insight. What is the formula to calculate MP/s? Is there a list somewhere of how many MP/s each CPU can handle? And what sort of Nvidia GPU would you recommend to handle the load comfortably?
 

bp2008

Staff member
Joined
Mar 10, 2014
Messages
12,690
Reaction score
14,061
Location
USA
To calculate MP/s for one camera, just multiply the resolution by the frame rate, e.g. 4 MP * 25 FPS = 100 MP/s. If you have 20 cameras just like it, then you have 20 x 100 = 2000 MP/s.

Sadly there is no list of how much each CPU can handle, because there are too many other variables and nobody wants to fund a series of scientific benchmarks on all kinds of different hardware and configurations. I just happen to know someone who tested the limit of his i7-8700K and found that at around 1500 MP/s the CPU usage was at 50% and GPU video decode usage was about 100%. Increasing the load further caused CPU usage to skyrocket to 100% very quickly (this tends to happen with hyper-threading: CPU usage rises faster once you are beyond 50%). Since he was using hardware acceleration we can assume all the streams were H.264. But it also matters what resolution the streams are, what bit rates were used, and even which camera did the encoding, because there are encoding tweaks that affect the difficulty of decoding the video. And there are a ton of other Blue Iris features consuming CPU time: motion detection, recording, timestamp overlays, rendering video to the screen, and encoding video for recording or remote viewing.
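To make the arithmetic concrete, here is a minimal sketch of that calculation in Python. The camera counts and the ~1500 MP/s i7-8700K figure are just the numbers from this thread, not hard limits:

```python
# Rough Blue Iris decode load estimate in megapixels per second (MP/s).
# Numbers below are the ballpark figures discussed in this thread.

def mp_per_second(megapixels: float, fps: float, count: int = 1) -> float:
    """MP/s for `count` identical cameras: resolution (MP) x frame rate x count."""
    return megapixels * fps * count

total_25fps = mp_per_second(4, 25, count=20)  # 20x 4MP @ 25 FPS
total_15fps = mp_per_second(4, 15, count=20)  # same cameras at 15 FPS

I7_8700K_CEILING = 1500  # approx. MP/s with Quick Sync, per the thread

print(f"20x 4MP @ 25 FPS: {total_25fps:.0f} MP/s")   # 2000 MP/s
print(f"20x 4MP @ 15 FPS: {total_15fps:.0f} MP/s")   # 1200 MP/s
print(f"Relative to an i7-8700K-class box: {total_25fps / I7_8700K_CEILING:.1f}x")
```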

As for a GPU recommendation, I don't think any single card available today would handle hardware-accelerated decoding for 2000 MP/s of H.264. The same guy who shared his i7-8700K results also tested a Titan X, and you can find the results here: 4.7.8 - August 28, 2018 - Support for the Nvidia CUDA hardware decoding. It looked like it was maxing out at accelerating around 1500 MP/s worth of video, making it similar to Quick Sync, except that where Quick Sync actually reduces energy consumption a small amount, Nvidia CUDA raises energy consumption by a lot. I don't know how well the acceleration scales with cheaper GPUs, though I can say my GT 1030 was maxing out in the neighborhood of 300-400 MP/s. That would hardly put a dent in your CPU load.
 

RL2018

n3wb
Joined
Sep 24, 2018
Messages
5
Reaction score
0
Location
Australia
That's excellent. Thank you very much. One final question for future deployments: does Blue Iris favour more cores or higher clock speed?
 

bp2008

Staff member
Joined
Mar 10, 2014
Messages
12,690
Reaction score
14,061
Location
USA
Neither/both. A particularly heavy stream (think 4K@30 FPS H.265 un-accelerated) wouldn't work well on particularly slow cores. But for the most part it won't make a difference.
 

RL2018

n3wb
Joined
Sep 24, 2018
Messages
5
Reaction score
0
Location
Australia
Interesting development. 15x cameras 4MP @ 15 FPS. 37% CPU usage.
16x cameras = 40%
17x cameras = 44%
18x cameras = 48%
19x cameras = 58%
20x cameras = 95%
Any obvious thoughts on the progression here (per-camera increments tallied in the sketch below)? I have double- and triple-checked the cameras and all are configured identically.
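For reference, a quick sketch of the per-camera increments implied by those readings (the percentages are simply the ones quoted above):

```python
# CPU usage reported at each camera count; print the marginal cost of each
# additional camera to show where the curve bends.
readings = {15: 37, 16: 40, 17: 44, 18: 48, 19: 58, 20: 95}

prev_cpu = None
for cams, cpu in sorted(readings.items()):
    if prev_cpu is not None:
        print(f"camera {cams}: +{cpu - prev_cpu}% CPU")
    prev_cpu = cpu

# Output: +3, +4, +4, +10, +37 percentage points -- roughly linear until
# the last two cameras, then a sharp jump.
```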
 

bp2008

Staff member
Joined
Mar 10, 2014
Messages
12,690
Reaction score
14,061
Location
USA
That does seem like an overly steep upward curve past 50%. My only guess is that the hypervisor schedules the first 50% of the load on the first CPU and the second 50% on the second CPU, causing non-uniform memory access (where one CPU accesses memory belonging to the other), which can be much slower than normal memory access (whether it is this much slower, I have no idea).

Anyway, if that theory is correct, you should be able to improve the situation somewhat by forcing the VM to exist on only one of the two CPUs: VMware NUMA affinity and hyperthreading. Assuming you are still working with a 16-core VM, hopefully you would not see the sharp upward increase in CPU cost until 75%, because 12 of your VM's 16 cores could be serviced by real physical cores.
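As a rough illustration of what that can look like (a sketch only; the parameter names below are VMware's documented NUMA advanced settings, but verify them against your ESXi version and test outside production first), the usual approach is to size the VM to fit within one socket and pin it to a single NUMA node via the VM's advanced configuration parameters:

```
numa.nodeAffinity = "0"
numa.vcpu.preferHT = "TRUE"
```

With 12 physical cores per socket, a 12-vCPU VM fits one node without leaning on hyperthreads; preferHT asks the scheduler to keep a wider VM on one node using logical processors rather than spanning both sockets.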
 
Joined
Apr 26, 2016
Messages
1,090
Reaction score
852
Location
Colorado
@bp2008 for what it is worth, I've heard hyperthreading can show this type of dramatic increase past 50% because it is not the same as having twice as many physical cores. Here's a semi-technical description from Linus (of Linus Tech Tips).
Essentially, 4 single-threaded cores will give greater top-end performance than 2 hyper-threaded cores running 4 threads, unless the workload benefits from having "two hands" feeding the processor work. You can see this in benchmarks of something simple like zipping a file, where the data can be lined up and firehosed straight to the processor, so hyper-threading yields no tangible benefit. The question is whether Blue Iris actually benefits from hyper-threading, and up to what point.
@RL2018 can you test CPU usage at 18x - 20x cameras WITH HYPERTHREADING TURNED OFF?

The second question is: if the CUDA implementation isn't as efficient, are you just trading the daily running cost of a CPU+GPU solution for the alternative (a significantly more expensive CPU with gobs more cores and higher power consumption), and generating a giant space heater in the process?
 

RL2018

n3wb
Joined
Sep 24, 2018
Messages
5
Reaction score
0
Location
Australia
Unfortunately I cannot test with hyperthreading off, as this is a production environment and I am away on leave at the moment. I don't want to change too much and break the environment.
Daily cost does not matter in the grand scheme of things for this particular environment. I'll look into adding a graphics card and passing it through to the VM.
But I'll try NUMA affinity first when I get back and report back next week.
 