Some success with a coral tpu (m.2) with CPAI and BI

mailseth

Getting the hang of it
Joined
Dec 22, 2023
Messages
126
Reaction score
88
Location
California
Alright. I know there are a number of issues kicking around that I may not be able to debug anytime soon, since I’m not in a position to do so. I’ve only been working in the core of the code, and even then I’ve only been testing with particular load patterns and without any driver or hardware failures.
 

Pentagano

Getting comfortable
Joined
Dec 11, 2020
Messages
613
Reaction score
282
Location
Uruguay
Adding a 2nd TPU did not work for me, for example the M.2 together with the USB one.
I added the 2 devices in the device settings of the container: /dev/apex_0 and /dev/usb
It kept saying libraries were missing and failing in multi mode, even though both devices work independently in single mode.
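Before blaming multi mode, it may help to confirm what the container can actually see. This is a minimal sketch, not part of CPAI: the function name is hypothetical, the sysfs path may not be exposed inside every container, and the USB vendor/product IDs are the ones commonly reported for the Coral USB Accelerator rather than anything stated in this thread.

```python
import glob
import os

# Assumption: commonly reported USB IDs for the Coral USB Accelerator.
# 1a6e:089a before the runtime has initialised the stick, 18d1:9302 afterwards.
CORAL_USB_IDS = {("1a6e", "089a"), ("18d1", "9302")}


def _read(path):
    """Return the stripped contents of a small sysfs file, or '' if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip().lower()
    except OSError:
        return ""


def find_coral_devices(dev_dir="/dev", usb_sysfs="/sys/bus/usb/devices"):
    """List PCIe/M.2 apex nodes and USB Corals visible from this environment."""
    pcie = sorted(glob.glob(os.path.join(dev_dir, "apex_*")))
    usb = []
    for entry in sorted(glob.glob(os.path.join(usb_sysfs, "*"))):
        ids = (_read(os.path.join(entry, "idVendor")),
               _read(os.path.join(entry, "idProduct")))
        if ids in CORAL_USB_IDS:
            usb.append(entry)
    return {"pcie": pcie, "usb": usb}
```

Running `print(find_coral_devices())` inside the container and again on the host should show whether both devices made it through the device mapping. If one list is empty inside the container only, the problem is in the container config rather than in the TPU code.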
 

freman

n3wb
Joined
Jul 3, 2020
Messages
16
Reaction score
10
Location
Australia
Honestly, I don’t really understand what would cause you to have to reinstall to change the model size, but I haven’t been running that part of the code so I can’t speak to it. It’s an area we’ve actively been working on, however, to try to reduce the CPAI bandwidth cost. So if you’re feeling handy with code and want to dig into what’s going wrong, you may find something serious that needs fixing.
It may have just been the first time. I could not get it to run the YOLO model: it said it had downloaded it, so I tried to switch to it, and it promptly spat out a bunch of errors from the TPU. No amount of swapping models around, or even restarts, fixed it. Even weirder, that original installation was just not working from Blue Iris, but a full wipe and reinstall solved both problems.

I promise you I'm not simply nuking it for the fun of it.

I started playing with it at around 8pm and it took till 5am to get it to behave well (I was reluctant to reinstall it the second time because that's slow).

If it plays up when I try swapping again I'll try to grab some logs for you
 

mailseth

Yeah. Something like that should work in theory, but I don’t have a USB one to test with. I’m sure I made an assumption somewhere, but I don’t know where. It seems silly to spend more than $60 on flaky, slow hardware just to see how it breaks, but I may do it anyway just to make it happen.
 

mailseth

Yeah, I believe you. “It works on my machine” sounds like a poor excuse, but it really does make it hard to debug and fix issues. And, like I said, all my TPUs are busy right now working on other things and haven’t actually been running CPAI in a real-world way for months.
 

freman

“It works on my machine.”
I've said that more than once lol. I'm just not as comfortable with Python and AI as I am with other languages and concepts. (I first tried switching to Python right when the 2/3 transition was happening and it drove me mental, so I switched to Node.js and then Go. I understand the theory of our current state of "AI" well enough not to be blindly wowed by it, but not the execution.)

I'm reasonably happy to be a guinea pig if you have any experiments to run. I have a spare USB TPU and a Raspberry Pi (I didn't know if I'd get the dual PCIe working, so I grabbed both).
 

Pentagano

Me too. I've got a couple of spare Lenovo i5 mini PCs with a USB Coral and an A+E key Coral. My M.2 Coral is used in my Unraid box.
It all depends on family commitments though. My son had a sleepover last night and my wife was with her sister looking after their dad, so I had all night alone to tinker.
I have found the USB to be fairly unstable with the Coral module though. With Frigate, no issues.
 

mailseth

Part of the problem is that I suspect the logs aren't enough. There tends not to be enough logged, so someone needs to get in there and start printing out all of the intermediate values.

And the unstable parts are even harder. I've tried to put everything together in such a way that the system can easily be brought down and rebuilt at runtime whenever any problem is detected, but, well, it works on my machine, so I don't know which exceptions to catch or where. I also don't know whether that even works, or whether I'm missing something else.
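To make the "bring it down and rebuild it" idea concrete, here is a minimal sketch of the general pattern. It is not the actual TPU runner code: `ResilientRunner`, `build_runner`, and the `infer`/`close` methods are hypothetical names, and the broad `except Exception` is there precisely because it isn't yet known which exceptions real driver or hardware failures raise. Logging the full traceback on every failure is how that list would get discovered.

```python
import logging
import time

log = logging.getLogger("tpu_watchdog")


class ResilientRunner:
    """Wrap a runner so any failure tears it down and builds a fresh one."""

    def __init__(self, build_runner, max_rebuilds=3, backoff_s=0.0):
        self._build = build_runner  # hypothetical factory: () -> runner object
        self._max_rebuilds = max_rebuilds
        self._backoff_s = backoff_s
        self._runner = None

    def infer(self, image):
        last_error = None
        for attempt in range(self._max_rebuilds + 1):
            try:
                if self._runner is None:
                    self._runner = self._build()
                return self._runner.infer(image)
            except Exception as exc:  # deliberately broad: unknown failure modes
                last_error = exc
                log.exception("inference failed on attempt %d; rebuilding",
                              attempt + 1)
                self._teardown()
                time.sleep(self._backoff_s)
        raise RuntimeError("runner kept failing after rebuilds") from last_error

    def _teardown(self):
        """Drop the current runner, calling its close() if it has one."""
        runner, self._runner = self._runner, None
        close = getattr(runner, "close", None)
        if callable(close):
            try:
                close()
            except Exception:
                log.exception("error while closing runner")
```

Once the logs show which exception types actually occur on flaky hardware, the broad catch can be narrowed to those.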

Edit: Also, I have similar family commitments, except with two daughters under the age of three, one of whom sleeps next to my dev machine.
 

Pentagano

Are you running CPAI in a container? If so, what is your config to build it with 2 devices? I suspect the USB one was just causing issues in my case with multi mode. It may well work better with 2 PCIe/M.2 devices.
Thanks
 

freman

Happy to pop it in an isolated vlan and give you ssh to debug to your heart's content, it is 100% free and idle hardware for at least 6 months... It might get absorbed into my Christmas show tho after that
 

Pentagano

Thanks but I have limited fibre each month, need to conserve it for working remotely. If I start working on anything in the cloud then it consumes my monthly GB quota and I'm close to the edge each month as it is.
I try to keep everything local
 

mailseth

That's part of my debugging problem: I'm not running CPAI and don't have it set up right now. I'm just running the core TPU code outside of CPAI to work on it and test the performance. For example, the next two things I'm planning to look at are a reorg of the internals to reduce context switching between threads, and using OpenCV, since it's been benchmarking so much better for all the other operations like image resizing. Once all of this is stable, I'm planning on moving it back to my CPAI-and-BI-on-Windows partition.
 

freman

I try to keep everything local
Lol, sorry, I should have been more specific about who I was offering that to. @mailseth said all his TPUs were busy. I have one in an unopened box, a Pi on the bench, and reasonable (for Australia) bandwidth, if he is interested in another testing/debugging platform that isn't going to be missed by me for 6 months. I can even rig up a way to remote power cycle it all lol.

No pressure, just the offer is there.

Edit: could possibly even wrangle windows on a nuc too, wife hasn't touched hers since she left university lol
 

mailseth

I still have all the time limitations of two young daughters, but maybe I should take you up on that offer. Next time there’s a release of CPAI and you feel like you have the time to get everything installed and set up on your end, we could look into it. I’d probably need a walkthrough of how to get to the log files and tpu_runner.py, and how to cycle the system.
 

Pentagano

My son is approaching 15 and starting to get invites to the quinceañeras, the 15th-birthday Latin celebrations here. Formal parties until 5am!! So us poor parents end up, the whole year round, either staying up all hours waiting to collect them at 4:30am or going to bed early and setting the alarm clock.
Can spend those hours tweaking :( ;)
 

freman

I still have all the time limitations of two young daughters.
Oh I get it. I haven't got kids yet and I still don't have the free time I'd like to dedicate to this (trying to start a small weekend business alongside an actual full-time job, planning Christmas shows, etc). Just sing out with your preferred OS/install and I'll get you the rest of the way. I figure you're helping all of us, so it's the least I can do.

In the meantime, tomorrow I'll try swapping model sizes and see how I go.
 

Pentagano

Just tried the YOLOv8 large model, which takes significantly longer, and it does not detect this bird that is clearly shown in the snapshot.
I would have expected it to pick this up. Maybe contrast issues?

1713801460254.png

My gpu on yolov3.1 picked up anything and everything
 

mailseth

lol. Once you have kids (two under two until recently!) you lose control of time you didn’t even know you had, and in ways you didn’t know you were able to lose it. I swear I’ve lost twenty IQ points on top of everything else. Feel free to use whatever OS/install is most useful for you. After all, I’ll be running and debugging whatever setup is best for me anyway.

The problem with finding wildlife is that it has spent millions of years practicing not being seen and is pretty good at it by now. Humans have spent millions of years practicing finding wildlife and are also pretty decent at it. Computers are only a few years into the process.

Edit: Coincidentally just last night I was working on tiling and auto contrast code to get something started in OpenCV. So maybe it’ll fix your contrast problem, maybe the problem is something else.
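For anyone wondering what tiling means here, this is a rough sketch of the general idea and not the code I was working on: split a large frame into overlapping model-sized crops so that a small object like a bird covers more of each input. The function name and the default overlap are assumptions for illustration only.

```python
def tile_boxes(width, height, tile, overlap=0.2):
    """Return (left, top, right, bottom) boxes of size `tile` covering the image."""
    if not 0 <= overlap < 1:
        raise ValueError("overlap must be in [0, 1)")
    step = max(1, int(tile * (1 - overlap)))

    def starts(size):
        # A single tile is enough when the image is no larger than the tile.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, step))
        positions.append(size - tile)  # final tile sits flush with the edge
        return positions

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]
```

Each box would be cropped, run through the model separately, and the detections shifted back by the box's left/top offset. The overlap is there so an object sitting on a tile boundary still appears whole in at least one crop.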
 

Pentagano

One odd observation when using the dashboard:
Even if I switch models and the info says YOLOv8 or EfficientDet-Lite, for example, the dashboard only ever gives me the option to test with MobileNet SSD.

What do you observe?

1713802642224.png
 

mailseth

I didn’t write that code, but we are trying to use CPAI model configurations in a way that wasn’t originally intended, both because we are wrapping up a number of different sizes and types of models in the Coral module and because we are trying to download the models only on demand. So if the wrong model is running, there is some bug in there that’s strictly outside of the TPU module, and I’m not familiar with that code. For starters, you’d need to insert some debugging output in the logs to print out exactly what is getting downloaded and when, what files were in the archive, what is getting loaded into the TPU runner, and what is actually running. Those are all the places that look suspect to me and need to be verified.

Also, it’s an area of active development, so the bug may be fixed already. Make sure you keep track of the changes in whatever the relevant code is on GitHub.

edit: also, I wouldn’t trust file names either, so look at file sizes to make sure the model is what you think it is.
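As a starting point for that kind of check, something like the sketch below would print what is really inside a downloaded model archive. It assumes the download is a zip file, which I haven't verified, and `describe_archive` is a made-up helper, not anything in CPAI.

```python
import hashlib
import zipfile


def describe_archive(archive_path):
    """Return (name, size_in_bytes, sha256) for every file in a zip archive."""
    rows = []
    with zipfile.ZipFile(archive_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            digest = hashlib.sha256(zf.read(info.filename)).hexdigest()
            rows.append((info.filename, info.file_size, digest))
    return rows
```

Comparing the sizes and hashes between a "small" and a "large" download should show quickly whether two differently named files are actually the same model.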
 