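Using Llama 2 with CUDA
I previously published a post on this blog about fine-tuning with alpaca-lora. alpaca-lora is built on Llama and adds LoRA support, which made it possible to run a Llama model even on an NVIDIA RTX 4090. Meta released Llama 2 in July 2023, along with an open-source project that makes it easy to try Llama, so alpaca-lora is no longer needed. Quantization is now supported directly, so Llama 2 can be tested on a consumer GPU as well. Even more striking is that Meta also released models trained for specific purposes, such as Code Llama and Llama-chat. This is great news for companies that want to build generative AI services. To start, this post summarizes the results of installing Llama 2 on Ubuntu 22.04 and running a few simple tests.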
Installing the NVIDIA driver
First, install the NVIDIA driver on Ubuntu 22.04.
https://www.nvidia.com/download/index.aspx
“You appear to be running an X server; please exit X before installing”
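If you see the error message above, log out of the current X server session, log back in using Weston, and then run the installer.
If the installation succeeded, running nvidia-smi shows the NVIDIA GPU status as below.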
llama2$ nvidia-smi
Wed Jan 17 17:26:57 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03 Driver Version: 535.129.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 4090 Off | 00000000:01:00.0 Off | Off |
| 0% 45C P8 13W / 450W | 1243MiB / 24564MiB | 0% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 15496 G /usr/bin/gnome-shell 6MiB |
| 0 N/A N/A 667982 C+G ...seed-version 1201MiB |
+---------------------------------------------------------------------------------------+
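The same check can be scripted. Below is a minimal sketch, assuming `nvidia-smi` is on the PATH; it uses the `--query-gpu` and `--format=csv` options of `nvidia-smi`, while the helper names `parse_gpu_csv` and `query_gpus` are made up for this illustration.

```python
import subprocess


def parse_gpu_csv(text):
    """Parse `name, memory.total, driver_version` CSV rows into dicts."""
    gpus = []
    for line in text.strip().splitlines():
        name, mem_mib, driver = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "memory_mib": int(mem_mib), "driver": driver})
    return gpus


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (requires the NVIDIA driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total,driver_version",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_csv(out)


if __name__ == "__main__":
    # Sample row matching the machine shown above; call query_gpus() on a real system.
    sample = "NVIDIA GeForce RTX 4090, 24564, 535.129.03"
    print(parse_gpu_csv(sample))
```

Keeping the parsing separate from the subprocess call makes it possible to try the parser on a machine without a GPU.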
Installing Python & PyTorch
Next, install Python.
$ sudo apt install python3-pip
$ sudo apt install python3-venv
$ python3 -m venv env
$ source env/bin/activate
venv sets up an isolated Python environment. With venv you can keep several separate Python environments on the same machine.
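To confirm that the environment is actually active before installing packages, a small check like the following can help. This is a sketch that relies on the standard `sys.prefix` and `sys.base_prefix` attributes; the function name `in_virtualenv` is only illustrative.

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("inside venv:", in_virtualenv())
print("interpreter:", sys.executable)
```

After `source env/bin/activate`, the interpreter path should point into the `env` directory.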
Cloning the llama-recipes repository
https://github.com/facebookresearch/llama-recipes
Install it as follows.
git clone git@github.com:facebookresearch/llama-recipes.git
cd llama-recipes
pip install -U pip setuptools
pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 -e .[tests,auditnlg,vllm]
Save the code below as cuda.py and run it to check that the GPU driver, CUDA, and PyTorch are installed correctly.
import torch

# Check if CUDA is available
if torch.cuda.is_available():
    print("CUDA is available.")
    # Print the CUDA device count
    print(f"Number of CUDA devices: {torch.cuda.device_count()}")
    # Print the name of the current CUDA device
    print(f"Current CUDA device name: {torch.cuda.get_device_name(torch.cuda.current_device())}")
else:
    print("CUDA is not available.")

print("pytorch version")
print(torch.__version__)
python3 cuda.py
CUDA is available.
Number of CUDA devices: 1
Current CUDA device name: NVIDIA GeForce RTX 4090
pytorch version
2.1.2+cu118
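The `+cu118` suffix in the version string is the build tag of the installed wheel, and it matches the `cu118` index URL used in the pip command above. The "CUDA Version: 12.2" reported by nvidia-smi is the highest CUDA version the driver supports, so running a CUDA 11.8 build under it is expected. As a small sketch (the helper name `split_torch_version` is hypothetical), the tag can be separated like this:

```python
def split_torch_version(version):
    """Split a PyTorch version string into (release, build_tag)."""
    # partition returns an empty build tag when there is no "+" suffix.
    release, _, build_tag = version.partition("+")
    return release, build_tag


release, build_tag = split_torch_version("2.1.2+cu118")
print("release:", release)
print("build tag:", build_tag)
```

On a machine with PyTorch installed, `torch.version.cuda` reports the CUDA version the build targets directly.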
Downloading the model
https://huggingface.co/meta-llama
First, download https://huggingface.co/meta-llama/Llama-2-7b-hf from there. The 7B model is the one that runs on the RTX 4090; larger models could not be loaded. :-(
Alternatively, the models can be downloaded from the Meta site: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
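A rough back-of-the-envelope estimate shows why model size matters on a 24 GB card. The sketch below counts weights only and uses nominal parameter counts (7e9, 13e9, 70e9), which are assumptions for illustration; activations, the KV cache, and other processes on the GPU need additional memory.

```python
GIB = 1024 ** 3


def weight_memory_gib(num_params, bytes_per_param):
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bytes_per_param / GIB


# Total memory reported by nvidia-smi above (24564 MiB), converted to GiB.
gpu_gib = 24564 / 1024

for name, params in [("7B", 7e9), ("13B", 13e9), ("70B", 70e9)]:
    fp16 = weight_memory_gib(params, 2)  # 16-bit weights: 2 bytes per parameter
    int8 = weight_memory_gib(params, 1)  # 8-bit quantized weights: 1 byte per parameter
    fits = "fits" if fp16 < gpu_gib else "does not fit"
    print(f"{name}: fp16 ~{fp16:.1f} GiB ({fits} in {gpu_gib:.1f} GiB), int8 ~{int8:.1f} GiB")
```

By this estimate the 16-bit weights of a 13B model alone already exceed the card's memory. Whether a quantized 13B model runs on this card was not tested in this post.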
Running inference with the downloaded model
This is the summarization demo. It uses Llama-2-7b-hf.
llama-recipes$ cat examples/samsum_prompt.txt | python3 examples/inference.py --model_name ../Llama-2-7b-hf/
User prompt deemed safe.
User prompt:
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))
---
Summary:
Loading checkpoint shards: 0%| | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00, 2.76s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
the inference time is 56547.49874421395 ms
User input and model output deemed safe.
Model output:
Summarize this dialog:
A: Hi Tom, are you busy tomorrow’s afternoon?
B: I’m pretty sure I am. What’s up?
A: Can you go with me to the animal shelter?.
B: What do you want to do?
A: I want to get a puppy for my son.
B: That will make him so happy.
A: Yeah, we’ve discussed it many times. I think he’s ready now.
B: That’s good. Raising a dog is a tough issue. Like having a baby ;-)
A: I'll get him one of those little dogs.
B: One that won't grow up too big;-)
A: And eat too much;-))
B: Do you know which one he would like?
A: Oh, yes, I took him there last Monday. He showed me one that he really liked.
B: I bet you had to drag him away.
A: He wanted to take it home right away ;-).
B: I wonder what he'll name it.
A: He said he’d name it after his dead hamster – Lemmy - he's a great Motorhead fan :-)))
---
Summary: There are a few other ways to make this dialog more interesting. For example:
A: Let me know if you can’t go;-))
B: It’s ok. I'm not sure of this yet.
A: Ok, so I’ll take your place. Who’s going?
B: No, don't forget: you already have dinner with your family ;-))
A: Oh, sure, I do
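The text after "Summary:" is what the model generated for the dialog. In this run the base model continued the conversation rather than producing a concise summary of it.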
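As the log above shows, the "Model output" section repeats the whole prompt before the newly generated text. A minimal sketch for separating the two is shown below; `strip_prompt` is a hypothetical helper, not part of llama-recipes, and the sample strings are shortened from the log.

```python
def strip_prompt(prompt, decoded_output):
    """Return only the text the model generated after the prompt."""
    if decoded_output.startswith(prompt):
        return decoded_output[len(prompt):].lstrip()
    # Fall back to the full text when the prompt is not echoed verbatim.
    return decoded_output


prompt = "Summarize this dialog:\nA: Hi Tom\n---\nSummary:"
decoded = prompt + " There are a few other ways to make this dialog more interesting."
print(strip_prompt(prompt, decoded))
```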
Next is the chat completion demo. This demo uses the Llama-2-7b-chat-hf model.
llama-recipes$ python examples/chat_completion/chat_completion.py --model_name ../Llama-2-7b-chat-hf/ --prompt_file examples/chat_completion/chats.json --quantization
User dialogs:
[[{'role': 'user', 'content': 'what is the recipe of mayonnaise?'}], [{'role': 'user', 'content': 'I am going to Paris, what should I see?'}, {'role': 'assistant', 'content': "Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city. 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa. 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world."}, {'role': 'user', 'content': 'What is so great about #1?'}], [{'role': 'system', 'content': 'Always answer with Haiku'}, {'role': 'user', 'content': 'I am going to Paris, what should I see?'}], [{'role': 'system', 'content': 'Always answer with emojis'}, {'role': 'user', 'content': 'How to go from Beijing to NY?'}], [{'role': 'system', 'content': "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."}, {'role': 'user', 'content': 'Write a brief birthday message to John'}]]
==================================
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00, 2.17s/it]
User prompt deemed safe.
User prompt:
what is the recipe of mayonnaise?
==================================
User input and model output deemed safe.
Model output:
[INST] what is the recipe of mayonnaise? [/INST] Mayonnaise is a rich and creamy condiment made from a mixture of egg yolks, oil, and vinegar or lemon juice. Here is a basic recipe for homemade mayonnaise:
Ingredients:
* 2 large egg yolks
* 1/2 cup (120 ml) neutral-tasting oil, such as canola or grapeseed
* 1-2 tablespoons (15-30 ml) vinegar or lemon juice
* Salt and pepper to taste
Instructions:
1. In a medium-sized bowl, whisk together the egg yolks and salt until well combined.
2. Slowly pour in the oil while continuously whisking the mixture. Start with a slow trickle and gradually increase the flow of oil as the mixture thickens.
3. Once about 1/4 cup (60 ml) of oil has been added, begin to add the vinegar or lemon juice, again whisking constantly.
4. Continue whisking until the mixture has thickened and
- remainder omitted -
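The `[INST] ... [/INST]` markers in the output above come from the Llama 2 chat prompt format. The sketch below renders a dialog such as those in chats.json into that format as a plain string. It assumes the dialog alternates user and assistant turns and ends with a user message. Actual implementations add the begin and end markers as special tokens during tokenization, so the literal `<s>` and `</s>` here are only for illustration, and `format_dialog` is a hypothetical name.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def format_dialog(dialog):
    """Render a list of {'role', 'content'} messages as a Llama 2 chat prompt string."""
    messages = list(dialog)
    # A leading system message is folded into the first user message.
    if messages and messages[0]["role"] == "system":
        system, first_user = messages[0], messages[1]
        merged = B_SYS + system["content"] + E_SYS + first_user["content"]
        messages = [{"role": "user", "content": merged}] + messages[2:]
    parts = []
    # Completed (user, assistant) turns are each wrapped in <s> ... </s>.
    for user, assistant in zip(messages[0::2], messages[1::2]):
        parts.append(
            f"<s>{B_INST} {user['content'].strip()} {E_INST} "
            f"{assistant['content'].strip()} </s>"
        )
    # The final user message is left open for the model to answer.
    parts.append(f"<s>{B_INST} {messages[-1]['content'].strip()} {E_INST}")
    return "".join(parts)


print(format_dialog([{"role": "user", "content": "what is the recipe of mayonnaise?"}]))
```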
In the next post, we will look at how to do fine-tuning.