ImageToText

Posted on 2025-03-23 Edited on 2025-04-23 In Learning Notes

Relevant papers, plus some problems I ran into during experiments and how I solved them

Using the deepface library on a Mac.
I first wanted to use deepface, a library for face recognition, facial analysis and many other things. However, no matter how I installed it, the line from deepface import DeepFace failed with: cannot import name 'DeepFace' from partially initialized module 'deepface' (most likely due to a circular import). I tried installing the library with pip and also by downloading the GitHub repository and installing from it, but both failed. Then I found the cause, and it was a silly one: I had named my own Python file deepface.py, the same name as the library, so the import picked up my script instead of the package and produced the circular import error. I fixed the mistake. What a foolish problem! Then I ran into another problem: when deepface downloads its .h5 weight files, the files are so large that the download times out.
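The name clash described above can be checked for before running anything. Here is a minimal sketch using only the standard library. The helper name find_shadowing_files is my own and is not part of deepface; it simply looks for .py files whose name equals a package I intend to import, since Python searches the script's own folder before the installed packages.

```python
from pathlib import Path


def find_shadowing_files(directory, package_names):
    """Return the .py files in `directory` that share a name with a package.

    Hypothetical helper for illustration: a local deepface.py is found
    before the installed deepface package, because the script's folder
    comes first on the module search path.
    """
    wanted = set(package_names)
    return [
        path.name
        for path in sorted(Path(directory).glob("*.py"))
        # Compare the file name without ".py" against the package names.
        if path.stem in wanted
    ]
```

Calling find_shadowing_files(".", ["deepface"]) in the project folder should list deepface.py if the clash is present. Renaming that script to something else (face_demo.py is just an example name) should let the import resolve to the installed package again.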
The ideas I learned:
ViT, the basic structure of LLMs, LLM pre-training methods, the basic principles of VLMs, CLIP, BLIP, zero-shot, CogVLM, EVA-CLIP, metrics for image captioning
Pre-training, fine-tuning, ViT, CLIP, prompt engineering

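transformers.HfArgumentParser: parses command-line arguments. It builds a parser from dataclass types, turning the fields of each dataclass into command-line arguments.
mtp: multi-token prediction
vision-tower: the vision encoder
PEFT: Parameter-Efficient Fine-Tuning
lora_enable: low-rank adapter (LoRA)
Prompt learning: the task setup is overly idealized. It tries to tune only a small number of parameters at the input end, so its effect on the deeper layers is quite limited, which caps how well the final fine-tuning can work.
transformers.AutoTokenizer API: converts text input into the input format the model accepts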
tune_mm_mlp_adapter:
mm_use_im_start_end: adds the special image start and end tokens
deepspeed: ZeRO Stage 3
fsdp: PyTorch's native FSDP (FullyShardedDataParallel)
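
To make the HfArgumentParser note concrete, here is a small sketch of the underlying idea (dataclass fields become command-line arguments), written with only the standard library so it runs without transformers installed. It is not the library's implementation. The ModelArguments class is hypothetical: its field names are taken from the flags in my notes above, while the defaults and the value "my-encoder" are placeholders I made up.

```python
import argparse
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ModelArguments:
    # Hypothetical container: field names come from the notes,
    # the defaults are placeholders chosen for this sketch.
    vision_tower: Optional[str] = None
    lora_enable: bool = False
    tune_mm_mlp_adapter: bool = False
    mm_use_im_start_end: bool = False


def str_to_bool(text):
    """Turn 'True'/'false'/'1'/'0' style strings into a bool."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {text!r}")


def parse_into_dataclass(cls, argv):
    """Create one --flag per dataclass field, parse argv, return an instance."""
    parser = argparse.ArgumentParser()
    for f in fields(cls):
        # Boolean fields get a string-to-bool converter, the rest stay strings.
        convert = str_to_bool if isinstance(f.default, bool) else str
        parser.add_argument(f"--{f.name}", type=convert, default=f.default)
    namespace = parser.parse_args(argv)
    return cls(**vars(namespace))


args = parse_into_dataclass(
    ModelArguments,
    ["--lora_enable", "True", "--vision_tower", "my-encoder"],
)
```

With transformers installed, the roughly equivalent call would be HfArgumentParser(ModelArguments).parse_args_into_dataclasses(), which returns a tuple with one filled-in dataclass instance per dataclass type passed to the parser.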

TODO:
Training methods for pre-trained models: ALBERT, LoRA, Prompt Learning, Prefix-tuning, P-Tuning