ML study

[Naver AI class] Week 1 - Development Environment Setup, Pandas, NumPy

mlslly 2024. 5. 5. 12:32

* Written with reference to material from the Naver AI Engineer Boost Class

 

NumPy and Pandas basic exercises

 

1. Matrix multiplication

>>> import numpy as np

>>> arr1 = np.random.rand(5,3)
>>> arr2 = np.random.rand(3,2)
>>> arr1 @ arr2

array([[0.30803948, 0.94545996],
       [0.22873815, 0.3066217 ],
       [0.33170786, 0.60242841],
       [0.3039172 , 0.5035964 ],
       [0.28638591, 0.98754071]])
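The shape rule behind `@` can be checked directly: a (5, 3) array times a (3, 2) array gives a (5, 2) array. A minimal sketch, assuming a seeded generator for reproducibility (the names `rng`, `a`, `b`, `product` and the seed value 0 are illustrative, not from the class material):

```python
import numpy as np

# Seeded generator so repeated runs give the same arrays (seed 0 is arbitrary).
rng = np.random.default_rng(0)
a = rng.random((5, 3))
b = rng.random((3, 2))

# The inner dimensions (3 and 3) must match; the result takes the outer ones.
product = a @ b
print(product.shape)                           # (5, 2)

# np.matmul is the function form of the @ operator for 2-D arrays.
print(np.allclose(product, np.matmul(a, b)))   # True
```

Swapping the operands (`b @ a`) would raise a `ValueError`, because the inner dimensions (2 and 5) no longer match.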

 

2. Concatenate operation

>>> import numpy as np

>>> arr1 = [[5,7], [9,11]] 
>>> arr2 = [[2,4], [6,8]]

>>> print(np.concatenate([arr1, arr2], axis=0))
[[ 5  7]
 [ 9 11]
 [ 2  4]
 [ 6  8]]
 
>>> print(np.concatenate([arr1, arr2], axis=1))
[[ 5  7  2  4]
 [ 9 11  6  8]]
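The effect of `axis` is easiest to see from the result shapes: `axis=0` grows the number of rows, `axis=1` the number of columns. A small sketch using the same values as above (the names `rows` and `cols` are illustrative); for 2-D inputs, `np.vstack` and `np.hstack` should give the same results:

```python
import numpy as np

arr1 = np.array([[5, 7], [9, 11]])
arr2 = np.array([[2, 4], [6, 8]])

rows = np.concatenate([arr1, arr2], axis=0)   # join along rows
cols = np.concatenate([arr1, arr2], axis=1)   # join along columns

print(rows.shape)   # (4, 2)
print(cols.shape)   # (2, 4)

# Shorthand helpers that match the two calls above for 2-D arrays.
print(np.array_equal(rows, np.vstack([arr1, arr2])))   # True
print(np.array_equal(cols, np.hstack([arr1, arr2])))   # True
```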

 

3. pandas - dictionary, Series

>>> import pandas as pd

>>> idx = ["HDD", "SSD", "USB", "CLOUD"]
>>> data = [19, 11, 5, 97]
>>> dic = dict(zip(idx, data))

>>> series = pd.Series(dic) 
>>> filtered_series = series[(series >= 10) & (series <= 20)]
>>> filtered_series

HDD    19
SSD    11
dtype: int64
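The two-sided comparison can also be written with `Series.between`, which is inclusive on both ends by default. A minimal sketch with the same data (the name `in_range` is illustrative):

```python
import pandas as pd

series = pd.Series({"HDD": 19, "SSD": 11, "USB": 5, "CLOUD": 97})

# between(10, 20) keeps values v with 10 <= v <= 20.
in_range = series[series.between(10, 20)]

print(in_range.index.tolist())   # ['HDD', 'SSD']
print(in_range.tolist())         # [19, 11]
```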

 

4. pandas - building a table and extracting data

>>> import pandas as pd

>>> df1 = {
...     'Name' : ['cherry','mango','potato','onion'],
...     'Type' : ['fruit','fruit','vegetable','vegetable'],
...     'Price' : [100,110,60,80]
... }
>>> df2 = {
...     'Name': ['pepper','carrot','banana','kiwi'],
...     'Type': ['vegetable','vegetable','fruit','fruit'],
...     'Price': [50,70,90,120]
... }
>>> df1 = pd.DataFrame(df1)
>>> df2 = pd.DataFrame(df2)

>>> df = pd.concat([df1, df2], axis=0)
>>> df.sort_values(by='Type', inplace=True)
>>> df.reset_index(drop=True, inplace=True)
>>> df

     Name       Type  Price
0  cherry      fruit    100
1   mango      fruit    110
2  banana      fruit     90
3    kiwi      fruit    120
4  potato  vegetable     60
5   onion  vegetable     80
6  pepper  vegetable     50
7  carrot  vegetable     70

>>> max_fruit_price = df.loc[df['Type'] == 'fruit', 'Price'].max()
>>> max_vegetable_price = df.loc[df['Type'] == 'vegetable', 'Price'].max()

>>> print(max_fruit_price + max_vegetable_price)

200
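The per-type maximum can also be computed in one pass with `groupby`, instead of filtering each type separately. A sketch under the assumption that the table holds the same eight rows as above (the name `max_by_type` is illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["cherry", "mango", "potato", "onion",
             "pepper", "carrot", "banana", "kiwi"],
    "Type": ["fruit", "fruit", "vegetable", "vegetable",
             "vegetable", "vegetable", "fruit", "fruit"],
    "Price": [100, 110, 60, 80, 50, 70, 90, 120],
})

# One maximum per Type; the result is a Series indexed by Type.
max_by_type = df.groupby("Type")["Price"].max()

print(max_by_type["fruit"])       # 120
print(max_by_type["vegetable"])   # 80
print(max_by_type.sum())          # 200
```

This scales to any number of types without adding a new filter line for each one.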

 

5. pandas - DataFrame, describe()

>>> import pandas as pd

>>> df = {
...     'sue':[55, 65, 60, 66, 57],
...     'ryan':[64, 77, 71, 79, 67],
...     'jay':[88, 81, 79, 89, 77],
...     'jane':[45, 35, 30, 46, 47],
...     'anna':[91, 96, 90, 97, 99]
... }
>>> df = pd.DataFrame(df)
>>> df.columns = ['round_1', 'round_2', 'round_3', 'round_4', 'round_5']
>>> df
   round_1  round_2  round_3  round_4  round_5
0       55       64       88       45       91
1       65       77       81       35       96
2       60       71       79       30       90
3       66       79       89       46       97
4       57       67       77       47       99

>>> df.describe().loc[['mean','max','min']]
      round_1  round_2  round_3  round_4  round_5
mean     60.6     71.6     82.8     40.6     94.6
max      66.0     79.0     89.0     47.0     99.0
min      55.0     64.0     77.0     30.0     90.0
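When only a few statistics are needed, `DataFrame.agg` with a list of function names computes just those rows rather than the full `describe()` table. A minimal sketch, assuming the same values as above (the names `scores` and `summary` are illustrative):

```python
import pandas as pd

scores = pd.DataFrame({
    "round_1": [55, 65, 60, 66, 57],
    "round_2": [64, 77, 71, 79, 67],
    "round_3": [88, 81, 79, 89, 77],
    "round_4": [45, 35, 30, 46, 47],
    "round_5": [91, 96, 90, 97, 99],
})

# Each name in the list becomes one row of the result.
summary = scores.agg(["mean", "max", "min"])
print(summary)
```

The result should match `scores.describe().loc[['mean', 'max', 'min']]` without computing the count, standard deviation, and quartiles.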