YOLO v8 issue with Coral TPU

xm4rcell0x · Mar 3, 2024

Hi guys, i noticed that was added yolov8 module for the coral object detection, i tried but it doesn't work at all. No problem with MobileNet-SSD .
That's my log:

Any ideas to get this module working ?

mailseth · Mar 3, 2024

Try enabling multi-TPU

xm4rcell0x · Mar 3, 2024

It works even if don’t have the adapter for the Dual TPU!!! Thank you
Anyway I noticed that the large model size (yolov8) is far less sensitive than the medium one. Is that normal??

mailseth · Mar 3, 2024

The multi-TPU code is really just a newer version of the code. So it supports the YOLO models, but should probably be tested more before being the default.

What do you mean by sensitive? How is your performance with one TPU and the medium model?

xm4rcell0x · Mar 3, 2024

I mean sensible. To understand, if I analyze the same alert (a car in the nighttime ) with AI in BlueIris, I have “better “ results with medium model size instead the large one. Every frame with the medium is recognized with a car as an object at higher percentages of confidence . The large model size has far less percentage of confidence and at least in 3 or 4 frame is recognized the car.

For the performance I can say that the medium sized has the same performance as the MobileNet small, every inference is done in 30/40ms.
With the large sized I’m in the 220-250ms range.

mailseth · Mar 3, 2024

I would expect much better quality results with YOLOv8 than with MobileNet. It is also running faster than I would expect. So maybe there is something wrong and it is running the wrong model? Do you see anything in the logs about what model is being used?

In the next version of the code I want to print more messages about what is being used.

Chura · Mar 4, 2024

There is also EfficientDet Lite model, worth trying ?

xm4rcell0x · Mar 4, 2024

mailseth said:
I would expect much better quality results with YOLOv8 than with MobileNet. It is also running faster than I would expect. So maybe there is something wrong and it is running the wrong model? Do you see anything in the logs about what model is being used?

In the next version of the code I want to print more messages about what is being used.

Hi Seth! i've done some tests. I have changed module and model size from mobilenet to yolov8, that's the results:

mailseth · Mar 4, 2024

I'm not completely sure what I'm supposed to see in the log, but it does look like the YOLOv8 model is being used regularly by the end of the log file. It cycles from MobileNet to YOLOv8 to a few MobileNets to YOLOv8.

You're welcome to try EfficentDet also. I would expect YOLOv8 to perform the best of all of them.

mailseth · Mar 4, 2024

I also see a few YOLOv8 large model runs in there. From the log here, it looks as I would expect.

How many CPUs do you have? I notice the parallelism is only '1', which seems low.

xm4rcell0x · Mar 4, 2024

I changed model several times (from mobile to yolo and viceversa) to see if CPAI change it or not. I also changed several times the model size.
Anyway I have 1 TPU , the cpu is an intel i5 12400 but I don’t use it to analyze .
So nothing strange right? The inference time for yolov8 medium are correct?

mailseth · Mar 4, 2024

I think so? I would expect roughly a 2x difference between the medium and large models, but a much smaller % of the large model will actually fit on the TPU when you're running a single TPU. The TPU will cache ~7 MB of a model. The medium model is 24 MB while the large model is 44 MB. The most compute-intense part of the model is in the first MB of parameters for whatever reason, so it's definitely not a linear mapping in terms of time. Anyway, I don't see anything in that log that looks unreasonable.

xm4rcell0x · Mar 16, 2024

Now I have a 4 TPU working setup. I have to be honest , I don’t see a lot of difference than the previous setup with 1 TPU, inference times are equal. But I don’t mind, I spent 20€ for each TPU. I’m only waiting for Coral custom model, and even ALPR for Coral. I think that’s the goal to have more than one TPU, right?

mailseth · Mar 16, 2024

You probably aren’t going to see much inference time improvements with multi-TPU. It’s still doing the same work with the same hardware. However, you should see 4x the inferences per second.

mailseth · Mar 16, 2024

Inference time improvements will be very context dependent. Some models perform much better spread across TPUs, others do not. Medium and large models tend to, but only to a point.

xm4rcell0x · Mar 16, 2024

Thanks

Any plan for a Coral ALPR ?

mailseth · Mar 16, 2024

I’m interested, but when it comes down to it, the TPU is best at running one model. There is significant overhead swapping the internal cache from one model to another. So ideally we would have one model that does everything for object detection, and then all the remaining models are run on the CPU or GPU. The model is called “You Only Look Once” for a reason, after all, and by default uses 80 (!) classes.

Some examples:
YOLOv8 small is roughly 12 MB in size and runs at 87 ms per inference on one TPU on my computer. My testing isn't able to make much speedup when adding TPU segments. So 2 TPUs should have roughly 2x the throughput of inferences.
YOLOv8 medium is roughly 24 MB in size and runs at 287 ms per inference on one TPU on my computer. It seems to run optimally when segmented across 2-3 TPUs. It has roughly these speedups for each additional TPU:

# 113.89787337975577 ms per inference
# 68.5618184232153 ms per inference
# 52.671655846294016 ms per inference
# 42.78380761388689 ms per inference
# 32.067516354843974 ms per inference
# 29.010483619291335 ms per inference
# 26.180413575377315 ms per inference

YOLOv8 large is roughly 44 MB in size and runs at 1079ms per inference for one TPU on my computer. It seems to run optimally when segmented across 2-3 TPUs. Note that this means that most of the model is still running on the CPU, but my testing has shown that additional TPUs are best utilized running those same 2-3 segments instead of additional segments. It has roughly these speedups for each additional TPU:

# 168.47994122793898 ms per inference
# 116.75911782123148 ms per inference
# 89.93861217377707 ms per inference
# 77.7926594489254 ms per inference
# 64.36888010893017 ms per inference
# 57.39340456202626 ms per inference
# 51.6873672218062 ms per inference

@MikeLud1's ipcam-general-v8 is 12 MB in size and runs at 238 ms per inference on one TPU. Interestingly, although it's the same size as YOLOv8 small, it prefers to be split into 2-3 segments. Timing with each additional TPU:

# 51.34019722510129 ms per inference
# 24.159977784845978 ms per inference
# 19.426164492033422 ms per inference
# 15.750231327023357 ms per inference
# 15.237237785942852 ms per inference
# 12.418005119077861 ms per inference
# 11.074969850946218 ms per inference

You can see that the speedup is often non-linear and doesn't always make sense.

xm4rcell0x · Mar 16, 2024

Thank you for your explanation!!
So for Coral is best to have only one custom model with all-in-one object detection. Ideally a merge between ipcam-general and ALPR for best performance.

Another question: why I see object detection in the logs even if my cams aren’t triggered? I think it’s a little bit inefficient…Am I right?

mailseth · Mar 16, 2024

Yeah, if it were up to me I’d have a single model that runs everything. Has 80 labels for everything relevant to a IPCam setup. Probably based on the YOLOv8 medium, but that would only run well if you have > 2 TPUs. YOLO’s default labels are honestly mostly irrelevant for security cameras.

I’m actually much less familiar with the how and why BI & CPAI runs. I’ve just gone down a rabbit hole playing with making multi-TPU setups as efficient as possible.

Edit: do the math on the above numbers. As long as you have 2 TPUs or more, the IPcam-general model with two labels works just as fast per inference as the YOLOv8 small with 80 labels. It doesn’t make sense to run more than one object detector

jjj111 · Apr 8, 2024

mailseth said:
....
@MikeLud1's ipcam-general-v8 is 12 MB in size and runs at 238 ms per inference on one TPU. Interestingly, although it's the same size as YOLOv8 small, it prefers to be split into 2-3 segments. Timing with each additional TPU:

If you don't mind, how are you using ipcam-general-v8 on a TPU? Your reference above is the only place I've seen it mentioned

YOLO v8 issue with Coral TPU

n3wb

Attachments

Pulling my weight

n3wb

Pulling my weight

n3wb

Pulling my weight

Getting the hang of it

n3wb

Pulling my weight

Pulling my weight

n3wb

Pulling my weight

n3wb

Pulling my weight

Pulling my weight

n3wb

Pulling my weight

n3wb

Pulling my weight

n3wb