CodeProject.AI Version 2.0

FWIW, the Dual TPU shouldn't use 5 W unless it's going all out. Do you have a heatsink on it? I went through a number of different iterations before finding something that would keep it under 85 °C under load. Above that temperature, it throttles itself.
See page 5 for power usage; thermal management starts on page 7.

You can monitor TPU temperatures here:
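If you're on Linux, here's a minimal sketch for reading the die temperature yourself, assuming the gasket/apex driver is loaded (the sysfs path can vary by driver version, so treat it as an assumption):

# Hedged sketch: the gasket/apex driver for PCIe/M.2 Edge TPUs reports die
# temperature in millidegrees Celsius under /sys/class/apex (path may vary).
from pathlib import Path

for node in sorted(Path("/sys/class/apex").glob("apex_*/temp")):
    millideg = int(node.read_text().strip())
    print(f"{node.parent.name}: {millideg / 1000:.1f} °C")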

I'm not currently processing through the TPU; I haven't been able to make it work reliably. Times were around 250 ms.

I will try again, but every time I install a bad version of it or BI with upgrades and something goes wrong, I end up having to reinstall Windows, as no amount of other measures will revert the PC back to a working version of BI / CPAI. I've had to do it several times now, and that includes trying installation of new versions / old versions, uninstalling via the Apps menu, Windows roll-back, DISM commands, etc. I wish CPAI produced a clean installation tool that would allow the complete removal and re-installation of CPAI as a new installation, with no remaining registry or other changes, to help avoid this. Until then, I'm waiting for some spare cash to buy cloning software so I can clone my PC before attempting beta upgrades, as having to reinstall Windows is a pain in itself but far worse when you then have to reinstall BI and go through pages of camera discovery, re-configuration, etc.
 
Yeah, I hear that. For me, rewriting it to be multithreaded is easy compared to getting it running reliably and connected to the rest of the CPAI system. That there is the hard part…
 
Running CPAI on Windows with the Coral module gives me this error code when the module is starting. After this code I just get the "interpreter is busy" error. Can't figure out what the error code means. Both TPUs show up fine in Device Manager.

Has anyone else had these error messages?

E driver/mmio_driver.cc:254] HIB Error. hib_error_status = 0000000000000010, hib_first_error_status = 0000000000000010
E driver/mmio_driver.cc:254] HIB Error. hib_error_status = 0000000000000010, hib_first_error_status = 0000000000000010
 
I am running 2.5 RC 9 on my Windows box now. I have a Dual TPU and a relatively old Xeon that has Intel decoding.

The Coral object detection is pretty good. Sending a picture from the Explorer, the inference time is 60 ms (processing 161, analysis 406).

By comparison, on my other E3 Xeon on Linux/Docker, it takes 441 ms (processing 442, round trip 595).

A couple of odd things:

1. Enabling multi-TPU SLOWS down my processing time, at least for a single image from the Explorer: that 60 ms inference time above goes to maybe 80 ms.

2. License plate detection is MUCH faster on the Docker image under Linux compared to the same Windows machine: it takes around 150 ms on Linux and 2 seconds on Windows. However, my CPAI Windows log shows me getting 'phantom' license plate reads that are on the same order. See the first entry versus the fourth one below. That first one shows "no plates found", so that probably explains the difference, but where there is a plate, it is about 10 times slower than CPAI under Linux.

I would love to try this under Docker for Windows but I would have to run a shitload of Windows updates.

I am also thinking of building another PC. I'm thinking maybe I spend the money on the Linux box, throw the TPU in there, and maybe get an NVIDIA card to try to drive down processing time, but I have to imagine doing the processing on the same machine would be slightly faster.

09:09:07:License Plate Reader: Rec'd request for License Plate Reader command 'alpr' (...9076bd) took 134ms
09:09:40:Object Detection (Coral): Rec'd request for Object Detection (Coral) command 'detect' (...9b1b45) took 290ms
09:09:44:Object Detection (Coral): Rec'd request for Object Detection (Coral) command 'custom' (...fc0fbe) took 82ms
09:09:45:License Plate Reader: Rec'd request for License Plate Reader command 'alpr' (...c9a6c7) took 1782ms
09:10:27:Object Detection (Coral): Rec'd request for Object Detection (Coral) command 'detect' (...459311) took 110ms
09:10:27:Object Detection (Coral): Rec'd request for Object Detection (Coral) command 'custom' (...9b3f46) took 114ms
09:10:27:License Plate Reader: Rec'd request for License Plate Reader command 'alpr' (...2891dc) took 168ms
09:11:12:Object Detection (Coral): Rec'd request for Object Detection (Coral) command 'detect' (...5429bf) took 314ms
09:11:15:Object Detection (Coral): Rec'd request for Object Detection (Coral) command 'custom' (...08c586) took 103ms
09:11:18:License Plate Reader: Rec'd request for License Plate Reader command 'alpr' (...53394e) took 3200ms
09:11:28:Object Detection (Coral): Rec'd request for Object Detection (Coral) command 'custom' (...05d0fe) took 82ms
09:11:31:License Plate Reader: Rec'd request for License Plate Reader command 'alpr' (...6126a4) took 2668ms
 
One thing to note is that multi-TPU isn't going to make your individual processing time faster, but it should improve your throughput. Where before you could only push 5 FPS through one Coral, now you can do 10 FPS through two. Also, in general Coral works better if you're only running one model on it; each time you load a new model there is ~15 ms of overhead.
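To illustrate the throughput-versus-latency point, here's a minimal sketch of running one interpreter per TPU and rotating requests across them with pycoral. This is not CPAI's actual dispatcher, and the model file name is just an example:

# Minimal sketch, assuming pycoral is installed and the TPUs are visible.
# Per-image latency stays the same, but total frames/second scales with
# the number of TPUs because requests are dispatched round-robin.
import itertools
from PIL import Image
from pycoral.adapters import common, detect
from pycoral.utils.edgetpu import list_edge_tpus, make_interpreter

MODEL = "ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite"  # example file

interpreters = []
for i, _ in enumerate(list_edge_tpus()):
    itp = make_interpreter(MODEL, device=f":{i}")  # pin one interpreter per TPU
    itp.allocate_tensors()
    interpreters.append(itp)

rotation = itertools.cycle(interpreters)

def detect_objects(img: Image.Image, threshold: float = 0.4):
    """Run one detection on the next TPU in the rotation."""
    itp = next(rotation)
    _, scale = common.set_resized_input(
        itp, img.size, lambda size: img.resize(size, Image.LANCZOS))
    itp.invoke()
    return detect.get_objects(itp, threshold, scale)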
 
One other thing to note is that the original code scales the camera image to fit on the 300x300px input tensor. This results in letterboxing of the image and just under half of the input pixels are unused. Instead of that, the multi-TPU code does tiling of the input image. By default it will split your 4k input image into two square 300x300px images, run both of them, and assemble the results. This will result in all input pixels being used in the tensor, but twice as much work for the TPUs.
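For concreteness, here's a hypothetical sketch of the tiling idea; the square crops and the merge step are illustrative, not the multi-TPU module's exact parameters:

# Hypothetical tiling sketch: crop two squares spanning the frame's full
# height, one anchored at each side (they overlap in the middle), so every
# pixel lands in some tile. Each tile then gets resized to the input tensor.
from PIL import Image

def tile_frame(img: Image.Image):
    w, h = img.size
    side = h  # square tiles, full frame height
    boxes = [(0, 0, side, h), (w - side, 0, w, h)]
    return [(img.crop(box), box[0]) for box in boxes]  # (tile, x offset)

def merge_detections(per_tile):
    """Shift each tile's boxes by its x offset back into frame coordinates."""
    merged = []
    for dets, x_off in per_tile:
        for (x1, y1, x2, y2, score, cls) in dets:
            merged.append((x1 + x_off, y1, x2 + x_off, y2, score, cls))
    return merged  # a real implementation would also de-duplicate the overlap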

Another thing of note is that simply rescaling the image is one of the more expensive things that CPAI does across all of its modules. Converting a 4K image to any input tensor costs ~15 ms of CPU time, regardless of the module used. If someone wants to add support for the pillow-simd package, it's AVX2-enabled and would drop that overhead by about 6x, to closer to 3 ms.
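You can get a feel for that resize cost in isolation. pillow-simd is a drop-in replacement for Pillow, so the identical code picks up the AVX2 speed-up if it's installed; the frame below is just a stand-in for a real 4K camera image:

# Timing sketch for the resize step alone. With stock Pillow this is the
# ~15 ms cost mentioned above; with pillow-simd installed the same code
# should run several times faster.
import time
from PIL import Image

frame = Image.new("RGB", (3840, 2160))  # stand-in for a 4K camera frame
t0 = time.perf_counter()
frame.resize((300, 300), Image.BILINEAR)
print(f"resize took {(time.perf_counter() - t0) * 1000:.1f} ms")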
 
So is there an optimal size image to send to CodeProject.AI from Blue Iris for those of us with TPUs? Maybe one or two?

Thanks for the explanation on the multiple TPUs. Makes sense.
 
I wouldn't worry about image sizing. Each model (and model size) will have different input tensor dimensions. In terms of the TPU's ability, its internal RAM can't handle tensors larger than around 500 px, and things seem to start falling apart in the compiler above that. But otherwise, input tensors tend to be sized anywhere from 300 px to 640 px on a side, depending on the model and model size.
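If you're curious what a particular compiled model actually expects, you can read it straight from the file rather than guessing; the model filename below is just an example:

# Sketch: inspect a compiled model's input tensor shape with pycoral.
from pycoral.utils.edgetpu import make_interpreter

itp = make_interpreter("yolov5s_edgetpu.tflite")  # hypothetical model file
h, w = itp.get_input_details()[0]["shape"][1:3]
print(f"model expects a {w}x{h} input tensor")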
 
Choosing what’s optimal for your computer is really a function of three things:
  • inference frame rate desired
  • complexity of model selected
  • acceptable CPU usage

Simple models that fit entirely on the TPU will be limited by the CPU time spent resizing images, or by the frame rate across all cameras. Time per inference is less than 4 ms for the MobileNet models, but the accuracy is low. A complex model like YOLOv5l will offload many operations to the CPU because they can't be processed on, or simply don't fit on, the TPU. The inference frame rate will be closer to 10 FPS total at best (45 ms on my computer, not counting resizing or tiling).
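A quick way to measure that per-inference number on your own hardware is to time repeated invocations of an already-loaded interpreter. A minimal sketch, assuming one of the interpreters from the earlier examples:

# Benchmark sketch: average latency over N invocations of a loaded model,
# ignoring image decode/resize, to compare models on the same TPU.
import time

def bench(interpreter, n=100):
    interpreter.invoke()  # warm-up; the first call includes one-time setup
    t0 = time.perf_counter()
    for _ in range(n):
        interpreter.invoke()
    per_call = (time.perf_counter() - t0) / n
    return per_call * 1000, 1.0 / per_call  # (ms per inference, max FPS)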

I haven’t experimented with restricting the number of TPUs on my machine, but my hunch is that there isn’t a huge benefit above 2-3. (I’ve been running 6-8 TPUs for my testing lately and disabling CPU cores. It’s rare that the TPU temperatures get very high these days, but maybe my heatsink design has improved.)

The camera frame size isn’t going to be a major factor here for CPAI. Maybe it’s a bigger factor for BI to decompress?
 
Spec is i5-11400, 16 GB DDR4, single 4 TB WD surveillance drive, CPU cooler, 2 fans with PWM that sit mostly idle. Coral Dual TPU fitted in the M.2 slot (5 W) but not currently in use.

This was the build thread, in case I missed anything out:


Maybe it's just Intel 11th gen, as later processors are supposed to be power hungry, or maybe something is amiss:

Here's a screenshot. No UI, just BI running in the background, no AV or other apps running - 45.5 watts:

View attachment 183349
No. Integrated graphics + Coral Dual TPU.
45 W is a high number at idle. I don't have an 11th gen, but I will test a 12th gen i5 and a 10th gen i5 when I get the chance... I bet it's a combo of a large, oversized power supply and gaming parts that is causing your high numbers... these things should idle at 10-20 W with just the SSD.
That said, you can easily handle AI with just the CPU, or with the Intel GPU using YOLOv5 .NET.
 
The Licence Plate Recogniser uses CPU for me; however, YOLO uses GPU CUDA. How do I enable GPU for the licence plate reader module?
The licence plate module sets my CPU to 100%.
 
Anyone else seeing this? All of a sudden everything is coming back "object not found" and times were shooting to 30,000+ ms.

This is what my display shows now. I tried stopping the service and restarting the computer, and the same problem persists: it says Lost Contact, makes a connection, and then loses contact again.

(screenshot attachment: 1705964547116.png)