CodeProject.AI Version 2.5

MikeLud1

IPCT Contributor
Joined
Apr 5, 2017
Messages
2,255
Reaction score
4,309
Location
Brooklyn, NY
It is Python Ultralytics. I did bump the image size to 800 since my images are 1280x720 and most of my substreams are the same resolution as well. My batch size is 22, which uses around 22GB of GPU memory.

Ultralytics YOLOv8.2.40 Python-3.12.6 torch-2.4.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3090, 24575MiB)
engine\trainer: task=detect, mode=train, model=models/ipcamv8.yaml, data=data.yaml, epochs=10, time=None, patience=100, batch=22, imgsz=800, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train2, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=None, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train2
....
Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size
1/10 21.5G 2.565 3.498 2.907 68 800: 100%|██████████| 1865/1865 [19:40<00:00, 1.58it/s]
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 75/75 [01:26<00:00, 1.15s/it]
all 3288 15101 0.15 0.114 0.0875 0.0397
When training, does your GPU utilization % go up? Also, what model size are you using? I normally use model=yolov8s.yaml. Below is what I use to start the training:

yolo train data=plate.yaml model=yolov8s.yaml epochs=300 imgsz=640 cache=True batch=-1 patience=10 workers=8
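
For anyone who prefers the Ultralytics Python API over the CLI, the same run can be started like this (a minimal sketch assuming the ultralytics package is installed; the arguments just mirror the command above):

from ultralytics import YOLO

# build the small model from its config file
model = YOLO("yolov8s.yaml")
model.train(
    data="plate.yaml",   # dataset definition file
    epochs=300,
    imgsz=640,
    cache=True,          # cache images for faster epochs
    batch=-1,            # auto-select batch size from available GPU memory
    patience=10,         # stop early after 10 epochs with no improvement
    workers=8,
)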
 

nmbgeek

n3wb
Joined
Jan 15, 2023
Messages
19
Reaction score
14
Location
Myrtle Beach, SC
When training, does your GPU utilization % go up? Also, what model size are you using? I normally use model=yolov8s.yaml. Below is what I use to start the training:

yolo train data=plate.yaml model=yolov8s.yaml epochs=300 imgsz=640 cache=True batch=-1 patience=10 workers=8
Yes, GPU utilization is 100%. I was using a custom YAML basically equivalent to the large model with an 800 image size because I had some larger pictures. Using yolov8m.yaml and going back to a 640 image size lets me set the batch size to 50, which appears to use 20GB of the 24GB and brings the epoch time down to around 6 minutes.
 

MikeLud1

IPCT Contributor
Joined
Apr 5, 2017
Messages
2,255
Reaction score
4,309
Location
Brooklyn, NY
Yes, GPU utilization is 100%. I was using a custom YAML basically equivalent to the large model with an 800 image size because I had some larger pictures. Using yolov8m.yaml and going back to a 640 image size lets me set the batch size to 50, which appears to use 20GB of the 24GB and brings the epoch time down to around 6 minutes.
For the medium model, 6 minutes looks normal. Try the settings below: cache=True will help with the speed, batch=-1 will automatically adjust the batch size to maximize your GPU memory, and patience=10 will stop the training early if the model does not improve for 10 epochs.

epochs=300 imgsz=640 cache=True batch=-1 patience=10 workers=8
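
Combined with the medium model and dataset file mentioned above, the full command would look something like this (a sketch; swap in your own data YAML):

yolo train data=data.yaml model=yolov8m.yaml epochs=300 imgsz=640 cache=True batch=-1 patience=10 workers=8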
 

hapstabu

Getting the hang of it
Joined
Aug 29, 2020
Messages
77
Reaction score
47
Location
US
I look forward to some progress on this front, as we don't need the 80 labels in the COCO dataset.
Good luck to all trying to trim this down.

Sent from my iPlay_50 using Tapatalk
 

mailseth

Getting the hang of it
Joined
Dec 22, 2023
Messages
143
Reaction score
99
Location
California
FWIW, I've been working on a model larger than @MikeLud1's for a few months, with additional classes for things like packages, license plates, and fire. I'm currently on the fourth iteration. I've been using the 'fiftyone' package to curate a dataset pulled from a number of different sources. So far it's not as accurate as I'd like, but I'm still working on it: roughly 70 GB of training images on each iteration, and I'm working on getting a better label set.

There aren't any training sets online with labeled IP cam images, so it would be great if anyone would send me theirs (or upload them to Roboflow and send me a link). The closest I have are images from hunters' trail cameras.
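
As a rough illustration of that kind of FiftyOne curation (not the poster's actual pipeline; the class list, zoo dataset, and export path are made up for the example):

import fiftyone as fo
import fiftyone.zoo as foz
from fiftyone import ViewField as F

classes = ["person", "car", "dog"]  # illustrative classes, not the real label set

# pull one public source; real curation would merge several datasets
dataset = foz.load_zoo_dataset("coco-2017", split="validation", classes=classes)

# keep only the labels of interest
view = dataset.filter_labels("ground_truth", F("label").is_in(classes))

# export in YOLO format so Ultralytics can train on it
view.export(
    export_dir="datasets/ipcam",
    dataset_type=fo.types.YOLOv5Dataset,
    label_field="ground_truth",
    classes=classes,
)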
 

nmbgeek

n3wb
Joined
Jan 15, 2023
Messages
19
Reaction score
14
Location
Myrtle Beach, SC
For the medium model, 6 minutes looks normal. Try the settings below: cache=True will help with the speed, batch=-1 will automatically adjust the batch size to maximize your GPU memory, and patience=10 will stop the training early if the model does not improve for 10 epochs.

epochs=300 imgsz=640 cache=True batch=-1 patience=10 workers=8
I added patience=10 previously. When I use batch=-1 it only uses around 60% of my GPU memory. I need 50GB of available RAM to turn on caching, so I am going to fill the other two slots; the RAM will hopefully be here Saturday. Apparently I donated 64GB of RAM from this computer to a Proxmox host and never replaced it.
 

MikeLud1

IPCT Contributor
Joined
Apr 5, 2017
Messages
2,255
Reaction score
4,309
Location
Brooklyn, NY
I added patience=10 previously. When I use batch=-1 it only uses around 60% of my GPU memory. I need 50GB of available RAM to turn on caching, so I am going to fill the other two slots; the RAM will hopefully be here Saturday. Apparently I donated 64GB of RAM from this computer to a Proxmox host and never replaced it.
You can specify the batch size as a fraction of GPU memory; I would try batch=0.85 or batch=0.90. For cache you can use the disk (cache=disk). Below are the relevant training arguments from the Ultralytics documentation.

[Screenshots of the batch and cache entries from the Ultralytics training-arguments documentation]
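
In command form that would look something like this (a sketch; fractional batch values need a reasonably recent Ultralytics release):

yolo train data=data.yaml model=yolov8m.yaml epochs=300 imgsz=640 cache=disk batch=0.90 patience=10 workers=8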

 

nmbgeek

n3wb
Joined
Jan 15, 2023
Messages
19
Reaction score
14
Location
Myrtle Beach, SC
You can specify the batch size as a fraction of GPU memory; I would try batch=0.85 or batch=0.90. For cache you can use the disk (cache=disk). Below are the relevant training arguments from the Ultralytics documentation.


Update: From what I can tell, whatever settings you start the training with are used for any resumes.

yolo task=detect mode=train model=runs/detect/train3/weights/last.pt data=data.yaml epochs=200 imgsz=640 batch=50 patience=10 device=0 cache=disk resume=True
Ultralytics YOLOv8.2.92 Python-3.12.6 torch-2.4.1+cu121 CUDA:0 (NVIDIA GeForce RTX 3090, 24575MiB)
engine\trainer: task=detect, mode=train, model=runs\detect\train3\weights\last.pt, data=data.yaml, epochs=200, time=None, patience=10, batch=50, imgsz=640, save=True, save_period=-1, cache=False, device=0, workers=8, project=None, name=train3, exist_ok=False, pretrained=True, optimizer=auto, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=10, resume=runs\detect\train3\weights\last.pt, amp=True, fraction=1.0, profile=False, freeze=None, multi_scale=False, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, vid_stride=1, stream_buffer=False, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, embed=None, show=False, save_frames=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, show_boxes=True, line_width=None, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.0, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, bgr=0.0, mosaic=1.0, mixup=0.0, copy_paste=0.0, auto_augment=randaugment, erasing=0.4, crop_fraction=1.0, cfg=None, tracker=botsort.yaml, save_dir=runs\detect\train3

For my current training run I passed cache=disk in the command, but the trainer output above still shows cache=False.
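
That matches how resume appears to work: with resume=True, Ultralytics reloads the arguments saved with the checkpoint, so overrides like cache=disk on the resume command line seem to be ignored. A minimal sketch of the resume path via the Python API (paths taken from the command above):

from ultralytics import YOLO

# load the last checkpoint from the interrupted run
model = YOLO("runs/detect/train3/weights/last.pt")

# continues with the run's originally saved arguments;
# new overrides (e.g. cache="disk") appear to be ignored on resume
model.train(resume=True)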
 