Open Software Stories


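Loz: A Tool for Automatically Writing Git Commit Messages

AI · 2024. 1. 31. 12:21

Writing git commit messages is tedious, so when only a few people work on a project, often not even a proper subject line gets written. Still, for your future self, it pays to write commit messages in as much detail as you can. This post introduces Loz, a tool that uses an LLM to write git commit messages automatically.

The text below is adapted from the project's README.

Getting started

To get started, run the following npm command.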

sudo npm install loz -g

Or clone the repository.

git clone https://github.com/joone/loz.git

NodeJS and npm are required for this program to work. On Linux, install them with your package manager, then run the install script from the cloned repository.

$ ./install.sh

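Configuring the LLM

Loz supports the OpenAI API and Ollama, so you can easily switch between the two LLM services with the config command in prompt mode.

Ollama setup

To use Ollama on your local system, you need to install both the llama2 and codellama models. On Linux, install Ollama first and then pull each model.

For details, see https://ollama.ai/download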

OpenAI API setup

Setting up your OpenAI API credentials takes a few simple steps.

First, create a .env file in the project root and add the following variable.

OPENAI_API_KEY=YOUR_KEY

Or, if you installed Loz with the npm command, add OPENAI_API_KEY=YOUR_KEY to .bashrc:

export OPENAI_API_KEY=YOUR_KEY

If you see the following error when using loz after setting up the OpenAI API, it means you have exceeded your free quota.

Request failed with status code 429:
API request limit reached

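To keep using the API, you need to set up a payment method at the following link: https://platform.openai.com/account/billing/paid-methods

OpenAI API billing is proportional to the number of tokens you use, and prices keep coming down. For everyday development work, it will probably cost less than a cup of coffee per month.

Initial configuration

When you start Loz for the first time, you can choose your preferred LLM service.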

$ loz
Choose your LLM service: (ollama, openai) ollama

Within prompt mode, you can change your LLM service preference at any time with the config command.

> config api openai

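Interactive mode

$ loz

Once loz is running, you can start a conversation by interacting with it. loz responds with a relevant message based on your input.

Pipe mode

Loz can use Unix pipes to take the output of other command-line tools as its input.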

$ ls | loz "count the number of files"
$ 23 files
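
How a tool like this might combine the quoted instruction with the piped text can be sketched in a few lines of Python. This is only an illustration, not Loz's actual code (Loz is a Node.js program); the function names and the prompt layout are assumptions made up for this example.

```python
import sys


def build_prompt(instruction: str, piped_text: str) -> str:
    """Combine the instruction from the command line with the text read from stdin."""
    # Hypothetical layout: instruction first, then the piped input verbatim.
    return f"{instruction}\n\n{piped_text.strip()}"


def read_piped_input(stream=sys.stdin) -> str:
    """Return everything on stdin when data is piped in, or '' on an interactive terminal."""
    if stream.isatty():
        return ""
    return stream.read()
```

With `ls | tool "count the number of files"`, the instruction would come from the argument list and the file listing from `read_piped_input()`.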

Let's convert all the English letters in a text document to uppercase.

$ cat example.txt | loz "convert the input to uppercase"

AS AI TECHNLOGY ADVANCED, A SMALL TOWN IN THE COUNTRYSIDE DECIDED 
TO IMPLEMENT AN AI SYSTEM TO CONTROL TRAFFIC LIGHTS. THE SYSTEM 
WAS A SUCCESS, AND THE TOWN BECAME A MODEL FOR OTHER CITIES TO 
FOLLOW. HOWEVER, AS THE AI BECAME MORE SOPHISTCATED, IT STARTED 
TO QUESTION THE DECISIONS MADE BY THE TOWN'S RESIDENTS, LEADING 
TO SOME UNEXPECTED CONSEQUENCES.

It also easily finds misspelled English words.

$ cat example.txt | loz "list any spelling errors"

Yes, there are a few spelling errors in the given text:
1. "technlogy" should be "technology"
2. "sophistcated" should be "sophisticated"

Converting a file listing to JSON is probably not easy with traditional Unix tools alone, but with an LLM it becomes simple.

$ cd src
$ ls -l | loz "convert the input to JSON"

[
  {
    "permissions": "-rw-r--r--",
    "owner": "joone",
    "group": "staff",
    "size": 792,
    "date": "Mar 1 21:02",
    "name": "cli.ts"
  },
  {
    "permissions": "-rw-r--r--",
    "owner": "joone",
    "group": "staff",
    "size": 4427,
    "date": "Mar 1 20:43",
    "name": "index.ts"
  }
]

Writing Git commit messages automatically

If you run loz commit in your Git repository, loz automatically generates a commit message from the staged changes, like this:

$ git add --update
$ loz commit

Or copy script/prepare-commit-msg to .git/hooks:

$ chmod a+x .git/hooks/prepare-commit-msg

When the LOZ environment variable is set, Loz reads the diff of the staged files and generates a commit message.

$ LOZ=true git commit
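
The opt-in behavior of such a hook can be sketched as follows. This is not the project's prepare-commit-msg script, just a hypothetical Python version of the decision it has to make. The one thing taken from Git's documentation is that the hook receives the message source ("message", "template", "merge", "squash", or "commit") as its second argument when a message already exists; skipping those cases is an assumption of this sketch.

```python
def should_generate(env: dict, source: str = "") -> bool:
    """Decide whether the hook should write a generated commit message."""
    # Opt-in: only act when LOZ=true was set for this particular commit.
    if env.get("LOZ") != "true":
        return False
    # Git passes a source when a message already exists (-m, merge, squash,
    # amend, ...); this sketch leaves those commits untouched.
    return source == ""


# Inside a real hook this could be called as:
#   should_generate(os.environ, sys.argv[2] if len(sys.argv) > 2 else "")
```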

Reminder: if you already copied an older version, update prepare-commit-msg. The older version automatically rewrites commit messages during a rebase.

$ git diff HEAD~1 | loz -g
or
$ git diff | loz -g

The author, date, and commit ID lines are stripped from the commit message before it is sent to the OpenAI server.
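
As a rough illustration of that filtering step, here is a minimal Python sketch. It assumes the input looks like `git show` or `git log -p` output, where those header lines precede the diff; the regular expression and the function name are made up for this example and are not taken from Loz's source.

```python
import re

# Header lines that `git show` / `git log -p` print before a diff.
_METADATA = re.compile(r"^(commit [0-9a-f]{7,40}|Author:|Date:)")


def strip_commit_metadata(text: str) -> str:
    """Drop commit ID, Author, and Date lines and keep everything else."""
    kept = [line for line in text.splitlines() if not _METADATA.match(line)]
    return "\n".join(kept)
```

Lines inside the diff itself start with `+`, `-`, or a space, so they are not affected by the pattern.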

Currently the messages are written only in English, but you can get Korean output by changing the prompt in the source code.
For now, I recommend trying it in English first.


Writing Git Commit Messages Automatically with Code Llama

AI · 2024. 1. 28. 15:15

First, install Ollama. A more detailed walkthrough is in the previous blog post.

curl https://ollama.ai/install.sh | sh

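Next, install Code Llama. Released by Meta, Code Llama is a large language model (LLM) specialized for coding. It is based on the Llama 2 model and is one of the LLMs built for coding tasks. It is said to be designed to boost developer productivity and to support people who are learning to code. It seems similar to GitHub Copilot. Presumably it was trained on a larger share of programming source code, and probably on the Git commit logs of many open-source projects as well.

Move into a git repository, modify some code, and then run the following command: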

ollama run codellama "Write a git message with this diff: $(git diff)"

You will then see a Git commit message written by Code Llama, as shown below.

$ git diff
diff --git a/a.c b/a.c
index b73d93d..eda8d83 100644
--- a/a.c
+++ b/a.c
@@ -1,4 +1,6 @@
- int main() {
+#include <stdio.h>
+
+int main() {
 
    printf("hello world");
    return 0;

$ ollama run codellama "Write a git message with this diff: $(git diff)"
Here is a Git commit message with the diff you provided:
```
Diff between a.c and b.c
-------------------------

This commit adds a new line to the `main` function of `a.c`. The new line includes the header `<stdio.h>` and the body of the function
has been modified to include the call to `printf`.
```
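
The same call can also be scripted. The sketch below is a minimal Python wrapper around the two commands used above (`git diff` and `ollama run codellama "<prompt>"`); the function names are hypothetical, and it assumes both `git` and `ollama` are on the PATH.

```python
import subprocess


def build_ollama_command(diff_text: str, model: str = "codellama") -> list[str]:
    """Build the argument list for `ollama run <model> "<prompt>"`."""
    prompt = f"Write a git message with this diff: {diff_text}"
    return ["ollama", "run", model, prompt]


def suggest_commit_message() -> str:
    """Run `git diff`, hand the diff to the model, and return the model's reply."""
    diff = subprocess.run(
        ["git", "diff"], capture_output=True, text=True, check=True
    ).stdout
    result = subprocess.run(
        build_ollama_command(diff), capture_output=True, text=True, check=True
    )
    return result.stdout
```

Calling `print(suggest_commit_message())` inside a repository with unstaged changes should print a reply similar to the one shown above, though the wording differs from run to run.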

It can also write in Korean, though not very well. After running it several times, I got the following result.

$ ollama run codellama "GIT commit message를 한글어로  작성하세요: : $(git diff)"
버전 관리 시스템 GIT은 소스 코드의 변경 사항을 추적하는 데 도움이 됩니다. 위의 메시지는 "a.c"라는 파일에서 두 개의 변경 사항을 기록하고 있습니다.

1. 행 1, 2: `#include <stdio.h>`라는 새로운 코드가 추가됨.
2. 행 4, 5: `int main()`이 수정되었음.

메시지는 "a.c"라는 파일의 인덱스를 나타내고 있습니다. 이것은 GIT이 어떤 변경 사항을 추가하거나 수정해야 하는지 확인하기 위한 것입니다.

앞서 언급한 두 개의 변경 사항은 "a.c"라는 파일에 대해 수행되었으며, 버전 관리 시스템 GIT은 이러한 변경 사항을 기록하고 있습니다.


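Ollama: A Generative AI Service Installed on Your Own Computer

AI · 2024. 1. 28. 12:33

The tool I'm introducing today is Ollama. Since Llama was released, many ways have been presented to install an LLM on a local computer and build your own ChatGPT. With Ollama, anyone with a computer can install and use one easily.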

curl https://ollama.ai/install.sh | sh

Now, let's install the Llama 2 language model.

ollama pull llama2

Running it is just as easy.

$ ollama run llama2
>>> hello
Hello! It's nice to meet you. Is there something I can help you with or would you like to chat?

>>> Tell me about Steve Jobs
Steve Jobs (1955-2011) was a visionary entrepreneur, inventor, and designer who co-founded Apple Inc. and Pixar Animation Studios. He is widely 
recognized as one of the most innovative and successful business leaders of the last century. Here are some key facts about Steve Jobs:

1. Early Life: Jobs was born in San Francisco to a Syrian-American father and a Swiss-American mother. He grew up in Mountain View, California, and 
showed an early interest in electronics and design.
2. Co-founding Apple: In 1976, Jobs co-founded Apple Computer with Steve Wozniak. They launched the Apple I, one of the first personal computers, and 
later introduced the Macintosh computer, which was the first commercially successful personal computer to use a graphical user interface (GUI).
3. Pixar Animation Studios: In 1986, Jobs acquired Pixar Animation Studios from Lucasfilm and served as its CEO until it was acquired by Disney in 
2006. Under Jobs' leadership, Pixar created some of the most successful animated films ever made, including Toy Story (1995), Finding Nemo (2003), and 
Wall-E (2008).
4. Design Visionary: Jobs was known for his attention to detail and design sensibilities. He was instrumental in creating the sleek and minimalist 
aesthetic that became synonymous with Apple products, from the Macintosh computer to the iPod, iPhone, and iPad.
5. Innovator: Jobs was a master of innovation, constantly pushing the boundaries of what was possible with technology. He introduced the world to the 
iPod, which revolutionized the way people listened to music, and the iPhone, which transformed the smartphone industry.
6. Entrepreneurial Spirit: Jobs was a true entrepreneur, with a passion for creating new products and businesses. He was known for his ability to 
identify emerging trends and capitalize on them, often disrupting entire industries in the process.
7. Leadership Style: Jobs was known for his strong leadership style, which emphasized creativity, innovation, and attention to detail. He was also 
notorious for his demanding nature and ability to push his teams to achieve their best work.
8. Personal Life: Jobs was married to Laurene Powell Jobs and had four children. He was a Buddhist and a vegetarian, and he enjoyed playing pranks on 
his colleagues and friends.
9. Illness and Death: Jobs battled pancreatic cancer for several years before passing away in 2011 at the age of 56. His death was met with an 
outpouring of tributes and memories from around the world, recognizing his impact on technology, design, and entrepreneurship.

Steve Jobs' legacy continues to inspire and influence generations of innovators, entrepreneurs, and designers, and his work at Apple and Pixar 
Animation Studios remains a testament to his visionary spirit and creative genius.

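If you have a GPU, it is used for acceleration right away; if not, the CPU is used and generating text takes a while.

Besides Llama 2, Ollama supports a variety of language models that you can compare with one another, and you can load not only the 7B model but also the 13B and 70B models into CPU memory and try them. Give it a test.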


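Why Microsoft Abandoned Electron

Web · 2024. 1. 27. 07:19

Electron is a kind of runtime used to build desktop web apps. It is based on the Chromium engine and provides the Node.js API, so you can easily build desktop apps with HTML, JS, and CSS. The problem is that even a Hello World app has to run the heavy Chromium engine, and the runtime cannot be shared with other Electron apps. On top of that, two separate V8 engines had to run.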

https://techcommunity.microsoft.com/t5/microsoft-teams-blog/microsoft-teams-advantages-of-the-new-architecture/ba-p/3775704

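In the end, Microsoft took out the knife. It built Edge WebView2 and started replacing Electron with it. Microsoft Teams was the first to decide to use Edge WebView2, and the new version has probably already shipped on WebView2. What is interesting is the use of ReactJS and Apollo GraphQL. Reportedly, this improved performance by 100% and saved a lot of storage space as well.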

With this change, we are taking a major step in #MicrosoftTeams Teams architecture. We are moving away from Electron to Edge Webview2. Teams will continue to remain a hybrid app but now it will be powered by #MicrosoftEdge. Also Angular is gone. We are now 100% on reactjs
— Rish Tandon (@TandonRish) June 24, 2021

https://twitter.com/TandonRish/status/1408085784016539653

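However, Edge WebView2 does not yet officially support Linux and macOS, although it seems to be used on them internally. So it does not look like the time to drop Electron yet, but I expect that, given more time, Electron may eventually become fully deprecated.

References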
* https://dev.to/noseratio/some-thoughts-on-the-new-microsoft-teams-2-0-architecture-webview2-reactjs-1gf1

* https://techcommunity.microsoft.com/t5/microsoft-teams-blog/microsoft-teams-advantages-of-the-new-architecture/ba-p/3775704


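Building Your Own Web Browser with Edge WebView2

Web · 2024. 1. 27. 06:27

A WebView packages web browser functionality as a component that can be reused inside an app. Most operating systems provide one, which makes it easy to add web page rendering to an app. Microsoft gave up on the IE and Edge engines and switched to the Chromium engine, and accordingly the old IE control was replaced by Edge WebView2.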

https://github.com/MicrosoftEdge/WebView2Browser


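This lets anyone develop a web browser easily. Although the WebView2 code itself is not public, the MicrosoftEdge/WebView2Browser repository on GitHub provides sample web browser code. Currently only the Windows and Xbox platforms are supported, but I hope Android and macOS will be supported in the future.

More information is available on Microsoft's official website.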

https://learn.microsoft.com/en-us/microsoft-edge/webview2/



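GPU Acceleration in Chrome Headless Mode

Chromium · 2024. 1. 25. 03:51

Chrome's headless mode is mainly used for website testing and web page crawling. In that work, pages are sometimes saved as PNG or PDF, and sometimes 3D content such as WebGL needs to be tested. As usual, 3D acceleration is not enabled by default. Sometimes it works and then stops working after a Chrome version update. Naturally, bugs have already been filed, and I have made a small contribution there myself. :-)

Recently, the Google Chrome developer blog published a detailed explanation of how to enable GPU acceleration in headless mode.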

https://developer.chrome.com/blog/supercharge-web-ai-testing


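In fact, GPU acceleration is turned off in Chrome for Linux. Linux has many distros, and GPU drivers range from the official ones supported by GPU vendors to open-source versions developed by hackers, so Google disabled GPU acceleration by default. That is still the case today; some distros ship their own Chrome packages with GPU acceleration enabled, or document how to enable it.

To run the code below, a GPU driver must first be installed and working. The following is an example with an NVIDIA GPU on Ubuntu 22.04.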

$ nvidia-smi
Wed Jan 24 10:32:13 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 545.29.06              Driver Version: 545.29.06    CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0 Off |                  Off |
|  0%   45C    P8              13W / 450W |   1078MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A      4265      G   /usr/bin/gnome-shell                          6MiB |
|    0   N/A  N/A     42318      C   /usr/local/bin/ollama                      1056MiB |
+---------------------------------------------------------------------------------------+

The code below prints Chrome's GPU status page to a PDF.

/* Complete example.js */
import puppeteer from 'puppeteer';

// Configure launch parameters
const browser = await puppeteer.launch({
  headless: 'new',
  args: [
    '--no-sandbox',
    '--headless=new',
    '--use-angle=vulkan',
    '--enable-features=Vulkan',
    '--disable-vulkan-surface',
    '--enable-unsafe-webgpu',
  ]
});

const page = await browser.newPage();
await page.goto('chrome://gpu');

// Verify: log the WebGPU status or save the GPU report as PDF
const txt = await page.waitForSelector('text/WebGPU');
const status = await txt.evaluate(g => g.parentElement.textContent);
console.log(status);
await page.pdf({path: './gpu.pdf'});

await browser.close();

Open the 'chrome://gpu' page in a normally launched Chrome browser; if the PDF generated by the code above shows the same result, headless mode can be considered to support GPU acceleration.


Running Code Llama with llama.cpp

AI · 2024. 1. 24. 15:41

With llama.cpp, you can run Llama 2 on an ordinary computer using only the CPU, without a GPU. This post walks through the process.

Building

Tested on Ubuntu 22.04.

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

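Downloading the model

The site below explains how to download the Llama 2 model.

https://github.com/facebookresearch/llama

Register on the Meta website to receive a download URL by email. The URL is valid for only 24 hours, so I recommend downloading everything you need right away.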

https://ai.meta.com/resources/models-and-libraries/llama-downloads/

git clone https://github.com/facebookresearch/codellama
./download.sh  "URL received by email"

...

Enter the list of models to download without spaces (7b,13b,34b,7b-Python,13b-Python,34b-Python,7b-Instruct,13b-Instruct,34b-Instruct), or press Enter for all:

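Converting the model

To run the model on a local PC, it has to be converted.

I'll use the CodeLlama-7b model as the example.

First, convert the downloaded model to llama.cpp's model format (the output file is ggml-model-f16.gguf).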

python3 convert.py models/CodeLlama-7b/

I ran inference right away. It generated a multiply function in Java.

llama.cpp$ ./main -m ./models/CodeLlama-7b/ggml-model-f16.gguf -p "int multiply("
Log start
main: build = 1939 (57744932)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1706076174
llama_model_loader: loaded meta data with 16 key-value pairs and 291 tensors from ./models/CodeLlama-7b/ggml-model-f16.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
..
..

	repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp 
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0


 int multiply(int n1, int n2) {
        if (n1 == 0 || n2 == 0) { // base case
            return 0;
        } else {
            return add(multiply(n1, n2 - 1), n1);
        }
    }

    public static int multiply2(int n1, int n2) {
        if (n1 == 0 || n2 == 0) { // base case
            return 0;
        } else {
            int result = n1 + multiply2(n1, n2 - 1); // 折半计算
            return result;
        }
    }

    public static void main(String[] args) {
        System.out.println("0 * 1 = " + multiply(0, 1));
        System.out.println("1 * 2 = " + multiply(1, 2));
        System.out.println("1 * 3 = " + multiply(1, 3));
        System.out.println("1 * 4 = " + multiply(1, 4));
        System.out.println("2 * 3 = " + multiply(2, 3));
        System.out.println("2 * 5 = " + multiply(2, 5));
        System.out.println("3 * 10 = " + multiply(3, 10));
        System.out.println("3 * 4 = " + multiply(3, 4));
        System.out.println("4 * 8 = " + multiply(4, 8));
        System.out.println("5 * 6 = " + multiply(5, 6));
        System.out. [end of text]

llama_print_timings:        load time =   59120.92 ms
llama_print_timings:      sample time =      49.16 ms /   398 runs   (    0.12 ms per token,  8095.19 tokens per second)
llama_print_timings: prompt eval time =     427.50 ms /     4 tokens (  106.88 ms per token,     9.36 tokens per second)
llama_print_timings:        eval time =  116470.88 ms /   397 runs   (  293.38 ms per token,     3.41 tokens per second)
llama_print_timings:       total time =  117027.28 ms /   401 tokens
Log end

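However, it was slow: the log reports a load time of about 59 seconds and a total time of about 117 seconds. Now let's convert to a 4-bit model to improve performance.

Converting to a 4-bit model (quantization)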

./quantize ./models/CodeLlama-7b/ggml-model-f16.gguf ./models/CodeLlama-7b/ggml-model-q4_0.gguf q4_0

Trying inference

You can set the number of tokens to generate with -n.

llama.cpp$ ./main -m ./models/CodeLlama-7b/ggml-model-q4_0.gguf -n 256 -p "int multiply("
Log start
main: build = 1939 (57744932)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed  = 1706076680
llama_model_loader: loaded meta data with 17 key-value pairs and 291 tensors from ./models/CodeLlama-7b/ggml-model-q4_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = models
llama_model_loader: - kv   2:                       llama.context_length u32              = 16384
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  10:                       llama.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  11:                          general.file_type u32              = 2
llama_model_loader: - kv  12:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  13:                      tokenizer.ggml.tokens arr[str,32016]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  14:                      tokenizer.ggml.scores arr[f32,32016]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  15:                  tokenizer.ggml.token_type arr[i32,32016]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  16:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q4_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: mismatch in special tokens definition ( 264/32016 vs 259/32016 ).
llm_load_print_meta: format           = GGUF V3 (latest)
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32016
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 16384
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-05
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 1000000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 16384
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q4_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 3.56 GiB (4.54 BPW) 
llm_load_print_meta: general.name     = models
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'

sampling: 
	repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
	top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
	mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order: 
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp 
generate: n_ctx = 512, n_batch = 512, n_predict = 256, n_keep = 0


 int multiply(int a, int b) {
	return a * b;
}

int divide(int a, int b) {
	return a / b;
}
```

We can add some simple validation to the function to make sure the input values are sane.  We'll just ensure that `a` and `b` are greater than zero:

```csharp
[DllImport("MyMath")]
public static extern int multiply(int a, int b);

[DllImport("MyMath")]
public static extern int divide(int a, int b);

// Note that we've added the [SuppressUnmanagedCodeSecurity] attribute here.  
[SuppressUnmanagedCodeSecurity] 
unsafe private static TDelegate LoadFunction<TDelegate>(string functionName) where TDelegate : class {
    var ptr = GetProcAddress(LoadLibrary("MyMath"), functionName);
    if (ptr == IntPtr.Zero)
        throw new InvalidOperationException($"Failed to get address for: '{functionName}'");

	var fp = Marshal.GetDelegateForFunctionPointer<TDelegate>(ptr);
   
llama_print_timings:        load time =     227.93 ms
llama_print_timings:      sample time =      33.48 ms /   256 runs   (    0.13 ms per token,  7647.27 tokens per second)
llama_print_timings: prompt eval time =     144.80 ms /     4 tokens (   36.20 ms per token,    27.62 tokens per second)
llama_print_timings:        eval time =   24956.50 ms /   255 runs   (   97.87 ms per token,    10.22 tokens per second)
llama_print_timings:       total time =   25187.38 ms /   259 tokens
Log end

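This time C# code came out. Because the model has not been fine-tuned, the generated code is fairly arbitrary(?). After 4-bit quantization, generation was about three times faster (3.41 → 10.22 tokens per second in the eval timings).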

13478367200 Jan 23 19:47 ggml-model-f16.gguf
 3825898016 Jan 23 19:52 ggml-model-q4_0.gguf

The model file shrank by more than 3x (about 3.5x). That is to be expected, since the weights went from 16 bits (f16) to roughly 4.5 bits each.
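
The numbers in the file listing and the logs can be checked with a little arithmetic. The sketch below uses only values that appear above (the two file sizes, the "6.74 B" parameter count, and the two eval speeds); the variable names are mine, and the parameter count is the rounded figure from the log, so the results are approximate.

```python
F16_BYTES = 13_478_367_200  # size of ggml-model-f16.gguf
Q4_BYTES = 3_825_898_016    # size of ggml-model-q4_0.gguf
PARAMS = 6.74e9             # "model params = 6.74 B" from the log (rounded)


def bits_per_weight(size_bytes: int, params: float = PARAMS) -> float:
    """Average number of bits the file spends on each model parameter."""
    return size_bytes * 8 / params


size_ratio = F16_BYTES / Q4_BYTES  # how much smaller the quantized file is
speedup = 10.22 / 3.41             # eval tokens per second, q4_0 vs f16

print(f"size ratio: {size_ratio:.2f}x")
print(f"f16:  {bits_per_weight(F16_BYTES):.2f} bits per weight")
print(f"q4_0: {bits_per_weight(Q4_BYTES):.2f} bits per weight")
print(f"eval speedup: {speedup:.2f}x")
```

The q4_0 figure comes out at about 4.54 bits per weight, which matches the "4.54 BPW" line in the log.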

References

* https://blog.gopenai.com/how-to-run-llama-2-and-code-llama-on-your-laptop-without-gpu-3ab68dd15d4a


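Using Llama 2 with CUDA

AI · 2024. 1. 18. 16:54

On this blog I previously showed how to fine-tune with alpaca-lora. Because alpaca-lora is based on Llama and supports LoRA, it could run a Llama model even on an NVIDIA RTX 4090. Meta released Llama 2 last July along with an open-source project that makes it easy to try Llama, so alpaca-lora is no longer needed. Quantization is now supported directly as well, so Llama 2 can be tested on a consumer GPU. Even more remarkable is that models trained for specific purposes, such as Code Llama and Llama-chat, were released too. This is great news for companies looking to build generative AI services. In this post, I summarize the results of installing Llama 2 on Ubuntu Linux 22.04 and running some simple tests.

Installing the NVIDIA driver

First, install the NVIDIA driver on Ubuntu 22.04.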
https://www.nvidia.com/download/index.aspx

“You appear to be running an X server; please exit X before installing”
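If you get the error message above, log out of the current X server session, log back in with Weston, and run the installation again.

If the installation went well, running nvidia-smi shows the NVIDIA GPU status as below.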

llama2$ nvidia-smi
Wed Jan 17 17:26:57 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA GeForce RTX 4090        Off | 00000000:01:00.0 Off |                  Off |
|  0%   45C    P8              13W / 450W |   1243MiB / 24564MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|    0   N/A  N/A     15496      G   /usr/bin/gnome-shell                          6MiB |
|    0   N/A  N/A    667982    C+G   ...seed-version                            1201MiB |
+---------------------------------------------------------------------------------------+

Installing Python & PyTorch

Now, install Python.

$ sudo apt install python3-pip
$ sudo apt install python3-venv
$ python3 -m venv env
$ source env/bin/activate

venv sets up a Python runtime environment. With venv you can maintain several separate Python environments side by side.
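
A quick way to confirm that the environment is really active before installing packages is to compare two interpreter paths. This is a small sketch using only the standard library; the function name is made up for the example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-created environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```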

Cloning the llama-recipes repository

https://github.com/facebookresearch/llama-recipes

Proceed with the installation as follows.

git clone git@github.com:facebookresearch/llama-recipes.git
cd llama-recipes
pip install -U pip setuptools
pip install --extra-index-url https://download.pytorch.org/whl/test/cu118 -e .[tests,auditnlg,vllm]

Run the code below to check that the GPU driver, CUDA, and PyTorch are installed correctly.

import torch

# Check if CUDA is available
if torch.cuda.is_available():
	print("CUDA is available.")
	# Print the CUDA device count
	print(f"Number of CUDA devices: {torch.cuda.device_count()}")
	# Print the name of the current CUDA device
	print(f"Current CUDA device name: {torch.cuda.get_device_name(torch.cuda.current_device())}")
else:
	print("CUDA is not available.")

print("pytorch version")
print(torch.__version__)

python3 cuda.py 
CUDA is available.
Number of CUDA devices: 1
Current CUDA device name: NVIDIA GeForce RTX 4090
pytorch version
2.1.2+cu118

Downloading the model

https://huggingface.co/meta-llama

First, download https://huggingface.co/meta-llama/Llama-2-7b-hf from here. The 7B model is what an RTX 4090 can run; anything larger fails to load. :-(
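
A back-of-the-envelope estimate shows why. The sketch below counts only the memory needed for the weights, assumes 2 bytes per parameter (fp16) unless stated otherwise, and uses nominal parameter counts (7e9, 13e9); activations and the KV cache need additional memory on top of this. The 24564 MiB figure is the RTX 4090 total from the nvidia-smi output above.

```python
MIB = 1024 ** 2
GPU_MEMORY_MIB = 24564  # RTX 4090 total memory, from the nvidia-smi output


def weight_memory_mib(params: float, bytes_per_param: float = 2.0) -> float:
    """Memory needed just to hold the weights, in MiB."""
    return params * bytes_per_param / MIB


def fits(params: float, bytes_per_param: float = 2.0) -> bool:
    """True if the weights alone fit into GPU memory (ignores activations and KV cache)."""
    return weight_memory_mib(params, bytes_per_param) < GPU_MEMORY_MIB


for name, params in [("7B", 7e9), ("13B", 13e9)]:
    print(f"{name}: {weight_memory_mib(params):.0f} MiB in fp16, fits: {fits(params)}")
```

Under these assumptions the 7B weights take roughly 13 GiB, while the 13B weights alone already exceed the card's 24 GiB in fp16. At 1 byte per parameter (8-bit quantization) the 13B weights would fit.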

Alternatively, you can download it from the Meta site: https://ai.meta.com/resources/models-and-libraries/llama-downloads/

Running inference with the downloaded model

This is the summarization demo. Llama-2-7b-hf was used.

llama-recipes$ cat examples/samsum_prompt.txt |  python3  examples/inference.py --model_name ../Llama-2-7b-hf/
User prompt deemed safe.
User prompt:
Summarize this dialog:

A: Hi Tom, are you busy tomorrow’s afternoon?

B: I’m pretty sure I am. What’s up?

A: Can you go with me to the animal shelter?.

B: What do you want to do?

A: I want to get a puppy for my son.

B: That will make him so happy.

A: Yeah, we’ve discussed it many times. I think he’s ready now.

B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) 

A: I'll get him one of those little dogs.

B: One that won't grow up too big;-)

A: And eat too much;-))

B: Do you know which one he would like?

A: Oh, yes, I took him there last Monday. He showed me one that he really liked.

B: I bet you had to drag him away.

A: He wanted to take it home right away ;-).

B: I wonder what he'll name it.

A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))

---

Summary:
Loading checkpoint shards:   0%|                                                                         | 0/2 [00:00<?, ?it/s]
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 2/2 [00:05<00:00,  2.76s/it]
WARNING:root:Some parameters are on the meta device device because they were offloaded to the cpu.
Asking to pad to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no padding.
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
the inference time is 56547.49874421395 ms
User input and model output deemed safe.
Model output:
Summarize this dialog:

A: Hi Tom, are you busy tomorrow’s afternoon?

B: I’m pretty sure I am. What’s up?

A: Can you go with me to the animal shelter?.

B: What do you want to do?

A: I want to get a puppy for my son.

B: That will make him so happy.

A: Yeah, we’ve discussed it many times. I think he’s ready now.

B: That’s good. Raising a dog is a tough issue. Like having a baby ;-) 

A: I'll get him one of those little dogs.

B: One that won't grow up too big;-)

A: And eat too much;-))

B: Do you know which one he would like?

A: Oh, yes, I took him there last Monday. He showed me one that he really liked.

B: I bet you had to drag him away.

A: He wanted to take it home right away ;-).

B: I wonder what he'll name it.

A: He said he’d name it after his dead hamster – Lemmy  - he's  a great Motorhead fan :-)))

---

Summary: There are a few other ways to make this dialog more interesting. For example:

A: Let me know if you can’t go;-))

B: It’s ok. I'm not sure of this yet.

A: Ok, so I’ll take your place. Who’s going?

B: No, don't forget: you already have dinner with your family ;-))

A: Oh, sure, I do

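The text after Summary: at the end of the output is what the model generated as its summary. In this run, the model continued the conversation instead of condensing it.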
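Because the script prints the whole prompt again before the generated text, it can be handy to cut the prompt off and look only at what the model added. The helper below is a small sketch, not part of llama-recipes; `strip_prompt` is a hypothetical name, and it assumes the output begins with the prompt exactly as it was sent (a tokenizer round trip may change whitespace, in which case the output is returned unchanged).

```python
def strip_prompt(prompt: str, model_output: str) -> str:
    """Return only the text the model generated after the echoed prompt."""
    prompt = prompt.strip()
    output = model_output.strip()
    if output.startswith(prompt):
        # Drop the echoed prompt and any whitespace right after it.
        return output[len(prompt):].strip()
    # The prompt was not echoed verbatim, so leave the output untouched.
    return output


# Shortened, made-up example in the same shape as the demo prompt.
example_prompt = "Summarize this dialog:\n\nA: Hi Tom\n\n---\n\nSummary:"
example_output = example_prompt + " A greets Tom."
print(strip_prompt(example_prompt, example_output))
```

With the real run above, this would leave only the part starting at "There are a few other ways to make this dialog more interesting."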

Next is the chat completion demo. This demo uses the Llama-2-7b-chat-hf model.

llama-recipes$ python examples/chat_completion/chat_completion.py --model_name ../Llama-2-7b-chat-hf/ --prompt_file examples/chat_completion/chats.json  --quantization
User dialogs:
[[{'role': 'user', 'content': 'what is the recipe of mayonnaise?'}], [{'role': 'user', 'content': 'I am going to Paris, what should I see?'}, {'role': 'assistant', 'content': "Paris, the capital of France, is known for its stunning architecture, art museums, historical landmarks, and romantic atmosphere. Here are some of the top attractions to see in Paris:1. The Eiffel Tower: The iconic Eiffel Tower is one of the most recognizable landmarks in the world and offers breathtaking views of the city. 2. The Louvre Museum: The Louvre is one of the world's largest and most famous museums, housing an impressive collection of art and artifacts, including the Mona Lisa. 3. Notre-Dame Cathedral: This beautiful cathedral is one of the most famous landmarks in Paris and is known for its Gothic architecture and stunning stained glass windows.These are just a few of the many attractions that Paris has to offer. With so much to see and do, it's no wonder that Paris is one of the most popular tourist destinations in the world."}, {'role': 'user', 'content': 'What is so great about #1?'}], [{'role': 'system', 'content': 'Always answer with Haiku'}, {'role': 'user', 'content': 'I am going to Paris, what should I see?'}], [{'role': 'system', 'content': 'Always answer with emojis'}, {'role': 'user', 'content': 'How to go from Beijing to NY?'}], [{'role': 'system', 'content': "You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature. If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information."}, {'role': 'user', 'content': 'Write a brief birthday message to John'}]]

==================================

Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████| 2/2 [00:04<00:00,  2.17s/it]
User prompt deemed safe.
User prompt:
 what is the recipe of mayonnaise?

==================================

User input and model output deemed safe.
Model output:
[INST] what is the recipe of mayonnaise? [/INST]  Mayonnaise is a rich and creamy condiment made from a mixture of egg yolks, oil, and vinegar or lemon juice. Here is a basic recipe for homemade mayonnaise:

Ingredients:

* 2 large egg yolks
* 1/2 cup (120 ml) neutral-tasting oil, such as canola or grapeseed
* 1-2 tablespoons (15-30 ml) vinegar or lemon juice
* Salt and pepper to taste

Instructions:

1. In a medium-sized bowl, whisk together the egg yolks and salt until well combined.
2. Slowly pour in the oil while continuously whisking the mixture. Start with a slow trickle and gradually increase the flow of oil as the mixture thickens.
3. Once about 1/4 cup (60 ml) of oil has been added, begin to add the vinegar or lemon juice, again whisking constantly.
4. Continue whisking until the mixture has thickened and

- rest omitted -
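The `[INST] ... [/INST]` markers in the model output come from the Llama 2 chat prompt format. As a rough illustration, the sketch below flattens one dialog from the `User dialogs` list into that text form. It follows the publicly described Llama 2 template as I understand it and leaves out the begin/end-of-sequence tokens that the tokenizer adds; `format_dialog` is a hypothetical helper, not the function the demo script actually calls.

```python
B_INST, E_INST = "[INST]", "[/INST]"
B_SYS, E_SYS = "<<SYS>>\n", "\n<</SYS>>\n\n"


def format_dialog(dialog: list[dict]) -> str:
    """Flatten one dialog (a list of role/content dicts) into prompt text."""
    messages = list(dialog)
    # An optional system message is folded into the first user turn.
    if len(messages) > 1 and messages[0]["role"] == "system":
        merged = B_SYS + messages[0]["content"] + E_SYS + messages[1]["content"]
        messages = [{"role": "user", "content": merged}] + messages[2:]

    parts = []
    for message in messages:
        text = message["content"].strip()
        if message["role"] == "user":
            # User turns are wrapped in instruction markers.
            parts.append(f"{B_INST} {text} {E_INST}")
        else:
            # Assistant turns are appended as plain text.
            parts.append(text)
    return " ".join(parts)


print(format_dialog([{"role": "user", "content": "what is the recipe of mayonnaise?"}]))
print(format_dialog([
    {"role": "system", "content": "Always answer with Haiku"},
    {"role": "user", "content": "I am going to Paris, what should I see?"},
]))
```

The first call reproduces the `[INST] what is the recipe of mayonnaise? [/INST]` prefix seen at the start of the model output above.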

In the next post, we will look at how to do fine-tuning.
