# mld-FineTune

A lightweight architecture for efficient fine-tuning of Vision Transformers on image datasets.


The current update employs the Lion optimizer.
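For context, here is a minimal sketch of Lion used as a drop-in optimizer via the `lion-pytorch` package; the stand-in model, learning rate, and weight decay are illustrative assumptions, not this repository's settings.

```python
# Minimal sketch: Lion as a drop-in optimizer via the lion-pytorch package.
# The model, learning rate, and weight decay are illustrative assumptions.
import torch
from lion_pytorch import Lion

model = torch.nn.Linear(768, 11)  # stand-in for a fine-tuned classifier head
optimizer = Lion(model.parameters(), lr=1e-4, weight_decay=1e-2)

loss = model(torch.randn(4, 768)).sum()  # dummy forward pass
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

Note that Lion is typically run with a smaller learning rate (and larger weight decay) than AdamW.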

Developed as a command-line tool (CLI), mld-FineTune aims to be a lightweight architecture for efficiently fine-tuning Vision Transformers on small and medium-sized image datasets. The objective is to keep training computationally lightweight while delivering performance comparable to heavier architectures. Parameter-Efficient Fine-Tuning (PEFT) with the Low-Rank Adaptation (LoRA) technique is adopted so that large pre-trained models such as Vision Transformers can be adapted to downstream tasks with minimal computational overhead. The developed method outperformed several heavier architectures such as ResNet50 and R-CNNs.
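As a rough illustration of this PEFT/LoRA setup, here is a minimal sketch using the Hugging Face `transformers` and `peft` libraries; the rank, alpha, target modules, and label count are assumptions for illustration, not necessarily this repository's configuration.

```python
# Minimal sketch: attaching LoRA adapters to a pre-trained ViT with
# Hugging Face peft. Rank, alpha, and target modules are assumptions.
from transformers import ViTForImageClassification
from peft import LoraConfig, get_peft_model

model = ViTForImageClassification.from_pretrained(
    "google/vit-base-patch16-224",
    num_labels=11,                 # e.g. 11 classes, as in Chula-ParasiteEgg11
    ignore_mismatched_sizes=True,  # swap out the original 1000-class head
)

lora_config = LoraConfig(
    r=16,                               # low-rank dimension (assumption)
    lora_alpha=16,
    target_modules=["query", "value"],  # ViT attention projections
    lora_dropout=0.1,
    modules_to_save=["classifier"],     # train the new head in full
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a small fraction of weights train
```

Because only the low-rank adapter matrices and the classification head receive gradients, memory use and training time drop sharply compared to full fine-tuning.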

Note: The model was tested and configured for the `Chula-ParasiteEgg11` dataset from the ICIP 2022 Grand Challenge. The program is based on the `google/vit-base-patch16-224` model.

The developed method, combined with df-analyze, achieved better performance across different configurations than heavier architectural models such as ResNet50, AlexNet, and Faster R-CNN. It also outperformed the top-performing models on the ICIP 2022 Grand Challenge leaderboard across different configurations.

## Usage

- Run PEFT on the sample dataset:

```bash
python3 finetune.py --data-path sample_data --num-epochs 10 --batch-size 16
```

- Extract features of the fine-tuned sample dataset:

```bash
python3 feature_extraction.py --model_path model.pth --dataset_path sample_data
```

### Dataset Format

The dataset is expected to be in the following format:

```
sample_data
├── folder
├── class1
├── class2
└── ...
```

## Extract Features

- Utilize the feature extraction method to extract embeddings. The output is a `.csv` file (a sketch of what the extraction step can look like appears at the end of this README).
- Run the feature extraction with just a few arguments:

```bash
python feature_extraction.py --model_path <.pth path> --dataset_path <dataset path>
```

## Model

- **Model**: The `google/vit-base-patch16-224` Vision Transformer model is used, fine-tuned with LoRA.
- **Optimizer**: Lion optimizer.
- **Loss Function**: **Cross Entropy Loss**, calculated at the end of each epoch.

## Evaluation

After training, the model is evaluated on a test dataset, and the average test loss is reported. The current update uses stratified sampling (see the sketch at the end of this README).

## Output

- Outputs a `.pth` file within the same directory.

The current limitations are known, and I'm working toward improving this repository. If you identify an error, a possible fix, or an improvement, open a clearly defined pull request.
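## Appendix: Illustrative Sketches

The sketches below expand on the Extract Features and Evaluation sections. They are minimal illustrations under stated assumptions, not this repository's actual implementation. First, one way embedding extraction to a CSV could look, using the `[CLS]` token of the base ViT; the preprocessing, file names, and output layout are assumptions.

```python
# Minimal sketch of ViT embedding extraction to CSV; the real
# feature_extraction.py may differ. Paths and column layout are assumptions.
import csv
import torch
from torchvision import datasets, transforms
from transformers import ViTModel

device = "cuda" if torch.cuda.is_available() else "cpu"
model = ViTModel.from_pretrained("google/vit-base-patch16-224").to(device).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5] * 3, std=[0.5] * 3),
])
dataset = datasets.ImageFolder("sample_data", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=16)

with open("features.csv", "w", newline="") as f:
    writer = csv.writer(f)
    with torch.no_grad():
        for images, labels in loader:
            # Use the [CLS] token of the last hidden state as the embedding.
            out = model(pixel_values=images.to(device)).last_hidden_state[:, 0]
            for emb, label in zip(out.cpu(), labels):
                writer.writerow([label.item(), *emb.tolist()])
```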
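Second, a sketch of the stratified train/test split mentioned under Evaluation, using scikit-learn; the 80/20 ratio and random seed are illustrative assumptions.

```python
# Minimal sketch: stratified split so each class keeps its proportion in
# both subsets. The 80/20 ratio and seed are illustrative assumptions.
import torch
from torchvision import datasets, transforms
from sklearn.model_selection import train_test_split

dataset = datasets.ImageFolder("sample_data", transform=transforms.ToTensor())
labels = [label for _, label in dataset.samples]  # class index per image

train_idx, test_idx = train_test_split(
    list(range(len(dataset))), test_size=0.2, stratify=labels, random_state=42
)
train_set = torch.utils.data.Subset(dataset, train_idx)
test_set = torch.utils.data.Subset(dataset, test_idx)
```

Stratifying keeps the per-class proportions equal in the train and test subsets, which matters for imbalanced datasets.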