Pandas Concepts for Deep Learning

ParkIsComing 2022. 7. 25. 20:18

Main components of Pandas

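DataFrame: a two-dimensional dataset made up of columns and rows
Series: a one-dimensional dataset made up of the values of a single column
Index: identifies each row, much like a primary key (PK) in a DBMS; unlike a PK, however, index labels do not have to be unique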
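A minimal sketch of how the three components relate, using a made-up three-row dataset (the column names, values, and index labels are hypothetical). Selecting one column of a DataFrame yields a Series, and that Series carries the same index.

```python
import pandas as pd

# Hypothetical toy data: a 2-D DataFrame with three rows and two columns
df = pd.DataFrame(
    {"name": ["Jay", "Katie", "Jackson"], "age": [12, 18, 20]},
    index=["a", "b", "c"],  # these row labels form the index
)

ages = df["age"]  # selecting a single column returns a 1-D Series

print(type(df).__name__)    # DataFrame
print(type(ages).__name__)  # Series
print(ages.index.tolist())  # ['a', 'b', 'c']: the Series keeps the DataFrame's index
print(ages.loc["b"])        # 18, looked up by index label

# Unlike a DBMS primary key, index labels are allowed to repeat
dup = pd.Series([1, 2], index=["x", "x"])
print(dup.index.is_unique)  # False
```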

# Basic setup
import pandas as pd
titanic_df = pd.read_csv('titanic.csv')  # load the file (assumes titanic.csv is in the working directory)

head() and tail()

head() returns the first few rows of a DataFrame (5 by default)
titanic_df.head()
tail() returns the last few rows of a DataFrame (5 by default)
titanic_df.tail()
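Because titanic.csv may not be at hand, this sketch uses a hypothetical 10-row stand-in to show the default row count of head() and an explicit count passed to tail().

```python
import pandas as pd

# Hypothetical stand-in for titanic_df: 10 rows, one column
df = pd.DataFrame({"passenger_id": range(1, 11)})

first = df.head()  # first 5 rows by default
last = df.tail(3)  # last 3 rows

print(first["passenger_id"].tolist())  # [1, 2, 3, 4, 5]
print(last["passenger_id"].tolist())   # [8, 9, 10]
```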

# Set display options
pd.set_option('display.max_rows', 100)      # maximum number of rows shown
pd.set_option('display.max_colwidth', 100)  # maximum number of characters shown per column value
pd.set_option('display.max_columns', 100)   # maximum number of columns shown

# display() is provided by IPython/Jupyter; in a plain Python script use print() instead
display(titanic_df)
display(titanic_df.head(3))  # first 3 rows
display(titanic_df.tail(3))  # last 3 rows
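A short sketch of reading options back and changing them only temporarily. pd.get_option, pd.option_context, and pd.reset_option are standard pandas functions; the tiny DataFrame is a hypothetical example. option_context can be handy when a larger row limit is wanted for one printout only.

```python
import pandas as pd

# Set a global display option, then read it back
pd.set_option("display.max_rows", 100)
print(pd.get_option("display.max_rows"))  # 100

# option_context changes options only inside the with-block
with pd.option_context("display.max_rows", 5):
    inside = pd.get_option("display.max_rows")
after = pd.get_option("display.max_rows")
print(inside, after)  # 5 100

# Outside a notebook, print() renders a DataFrame using the same options
df = pd.DataFrame({"value": range(3)})
print(df)

# reset_option restores the library default
pd.reset_option("display.max_rows")
```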

shape

Returns the size of the DataFrame as a (number of rows, number of columns) tuple
titanic_df.shape
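A small sketch with a hypothetical 3x2 DataFrame showing that shape is a plain tuple that can be unpacked, and that a Series has a one-element shape.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

n_rows, n_cols = df.shape  # shape is a (rows, columns) tuple
print(df.shape)            # (3, 2)
print(df["a"].shape)       # (3,): a Series is one-dimensional
```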

How to create a DataFrame

Convert a dictionary into a DataFrame
Add a new column name
Assign new values to the index

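# Named data rather than dict, which would shadow Python's built-in dict
data = {
    'name': ['Jay', 'Katie', 'Jackson'],
    'age': [12, 18, 20],
    'gender': ['male', 'female', 'male']
}

# Dictionary -> DataFrame (the keys become column names)
student_df = pd.DataFrame(data)

# Add a new column name; 'grade' has no data in the dictionary, so it is filled with NaN
student_df = pd.DataFrame(data, columns=['name', 'age', 'gender', 'grade'])

# Assign new values to the index (the default index is 0, 1, 2)
student_df = pd.DataFrame(data, index=['uno', 'dos', 'tres'])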
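To make the effect of the columns and index arguments visible, this sketch rebuilds the same hypothetical student data and inspects the results. A column requested in columns but absent from the dictionary comes back as all NaN, and custom index labels can then be used with .loc.

```python
import pandas as pd

data = {
    "name": ["Jay", "Katie", "Jackson"],
    "age": [12, 18, 20],
    "gender": ["male", "female", "male"],
}

# 'grade' is not a key in data, so pandas fills that column with NaN
with_grade = pd.DataFrame(data, columns=["name", "age", "gender", "grade"])
print(with_grade.columns.tolist())      # ['name', 'age', 'gender', 'grade']
print(with_grade["grade"].isna().all())  # True

# Custom index labels replace the default 0, 1, 2
labeled = pd.DataFrame(data, index=["uno", "dos", "tres"])
print(labeled.loc["dos", "name"])  # Katie
```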

Getting a DataFrame's column names and index

# Example
print("columns: ", student_df.columns)
print("index: ", student_df.index)
print("index value: ", student_df.index.values)  # .values returns the index as an ndarray
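A self-contained sketch of the same calls on hypothetical data, converting the Index objects to lists for readable output and confirming that .values gives a NumPy ndarray.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Jay", "Katie", "Jackson"], "age": [12, 18, 20]},
    index=["uno", "dos", "tres"],
)

print(df.columns.tolist())    # ['name', 'age']
print(df.index.tolist())      # ['uno', 'dos', 'tres']
print(type(df.index.values))  # <class 'numpy.ndarray'>
```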

info()

Shows a DataFrame's column names, data types, non-null count per column, total number of rows, and related information
student_df.info()
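info() prints its report rather than returning it. This sketch, on a hypothetical DataFrame with one missing name, captures the report through the buf parameter and computes the same per-column non-null counts directly.

```python
import io
import pandas as pd

# Hypothetical data with one missing value in 'name'
df = pd.DataFrame({"name": ["Jay", "Katie", None], "age": [12, 18, 20]})

# info() prints by default; buf= redirects the report into a text buffer
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print(report)

# The non-null count per column can also be computed directly
print(df.notna().sum().tolist())  # [2, 3]
```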

describe()

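Shows summary statistics for the numeric columns: count, mean, standard deviation, min, max, and the quartiles (25%, 50%, 75%)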
student_df.describe()
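A sketch on hypothetical data showing that describe() skips non-numeric columns by default, which rows it returns, and how include="all" adds statistics for text columns.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Jay", "Katie", "Jackson"], "age": [12, 18, 20]})

# By default only numeric columns are summarized ('name' is skipped)
stats = df.describe()
print(stats.columns.tolist())  # ['age']
print(stats.index.tolist())
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']
print(stats.loc["mean", "age"])  # about 16.67

# include="all" also summarizes object columns (count, unique, top, freq)
print(df.describe(include="all").loc["unique", "name"])  # 3
```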

value_counts()

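Counts how many times each distinct value appears.
The dropna parameter sets whether null values are counted as well. The default is True, which ignores nulls and counts only the non-null values.
The result is sorted by count in descending order.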
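A sketch on a hypothetical Series with one missing value, comparing the default dropna=True with dropna=False.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing one missing value
s = pd.Series(["male", "female", "male", np.nan, "male"])

counts = s.value_counts()                  # NaN is ignored (dropna=True by default)
counts_all = s.value_counts(dropna=False)  # NaN is counted as its own value

print(counts)      # male 3, female 1 (most frequent first)
print(counts_all)  # male 3 plus one entry each for female and NaN
```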

Converting between DataFrame / list / dictionary / NumPy ndarray

(The examples assume import numpy as np and import pandas as pd.)

Conversion	Method
list -> ndarray	ndarray_ex = np.array(list_ex)
list -> DataFrame	df_1 = pd.DataFrame(list_ex, columns=['col1', 'col2'])
ndarray -> DataFrame	df_2 = pd.DataFrame(ndarray_ex, columns=['col1', 'col2'])
dictionary -> DataFrame	df_3 = pd.DataFrame({'col1': [1, 11], 'col2': [2, 22], 'col3': [3, 33]}) (the dictionary keys become column names; each column's values are given as a list)
DataFrame -> ndarray	df_example.values (or df_example.to_numpy())
DataFrame -> list	df_example.values.tolist()
DataFrame -> dictionary	df_example.to_dict()
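A runnable round trip through the conversions in the table, with hypothetical values. One detail worth knowing: the default to_dict() nests values under their index labels ({'col1': {0: 1, 1: 11}, ...}), while orient="list" returns the flat column lists used to build the DataFrame.

```python
import numpy as np
import pandas as pd

list_ex = [[1, 2, 3], [11, 22, 33]]
col_names = ["col1", "col2", "col3"]

ndarray_ex = np.array(list_ex)                      # list -> ndarray, shape (2, 3)
df_1 = pd.DataFrame(list_ex, columns=col_names)     # list -> DataFrame
df_2 = pd.DataFrame(ndarray_ex, columns=col_names)  # ndarray -> DataFrame
dict_ex = {"col1": [1, 11], "col2": [2, 22], "col3": [3, 33]}
df_3 = pd.DataFrame(dict_ex)                        # dictionary -> DataFrame

arr = df_3.to_numpy()               # DataFrame -> ndarray (same data as df_3.values)
as_list = df_3.values.tolist()      # DataFrame -> list of row lists
nested = df_3.to_dict()             # DataFrame -> {column: {index: value}}
as_dict = df_3.to_dict(orient="list")  # DataFrame -> {column: list of values}

print(as_list)  # [[1, 2, 3], [11, 22, 33]]
print(as_dict)
```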

Creating and modifying a DataFrame's column data

# Example
# Create column data
df_example['height'] = 150  # creates a 'height' column and fills every row with 150
df_example['future_height'] = df_example['height'] + 10  # adds 10 to every value in 'height' and stores the result in a new 'future_height' column
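A self-contained version on a hypothetical three-row DataFrame. It also shows the "modify" half: assigning to an existing column name overwrites that column. Note that future_height was computed before height changed, so it keeps the old result.

```python
import pandas as pd

df_example = pd.DataFrame({"name": ["Jay", "Katie", "Jackson"]})

df_example["height"] = 150                               # a scalar is broadcast to every row
df_example["future_height"] = df_example["height"] + 10  # element-wise arithmetic on a column

# Assigning to an existing column name overwrites (modifies) that column
df_example["height"] = df_example["height"] + 5

print(df_example["height"].tolist())         # [155, 155, 155]
print(df_example["future_height"].tolist())  # [160, 160, 160]
```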

drop()

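Form: df_ex.drop(column(s) or row(s) to delete)
To delete several at once, pass the column names or index labels to drop() as a list.
Set axis to choose whether rows or columns are deleted:
- axis=0 -> delete rows
- axis=1 -> delete columns
Set inplace to decide whether the original DataFrame is modified:
- inplace=False (the default): the original DataFrame is kept, and drop() returns a new DataFrame with the labels removed, which can be assigned to a new variable
- inplace=True: the deletion is applied to the original DataFrame, and drop() returns None
- df_ex = df_ex.drop('column_name', axis=1, inplace=False) also works. Do not combine inplace=True with reassignment, or df_ex becomes None.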

# Example
df_ex = df_ex.drop([1,2,3], axis=0, inplace=False)
df_ex = df_ex.drop(['name','age'], axis=1, inplace=False)
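A self-contained sketch on a hypothetical 5-row DataFrame. It contrasts the default (a new object is returned and the original is untouched) with inplace=True (the original changes and the return value is None), and shows the columns= keyword as an alternative to axis=1.

```python
import pandas as pd

# Hypothetical 5-row DataFrame with the default index 0..4
df_ex = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "age": [10, 20, 30, 40, 50],
    "score": [1, 2, 3, 4, 5],
})

rows_dropped = df_ex.drop([1, 2, 3], axis=0)        # returns a new DataFrame
cols_dropped = df_ex.drop(["name", "age"], axis=1)
same_cols = df_ex.drop(columns=["name", "age"])     # keyword form, no axis needed

print(rows_dropped.index.tolist())    # [0, 4]
print(cols_dropped.columns.tolist())  # ['score']
print(df_ex.shape)                    # (5, 3): the original is untouched

# inplace=True modifies df_ex itself and returns None
result = df_ex.drop("score", axis=1, inplace=True)
print(result)                  # None
print(df_ex.columns.tolist())  # ['name', 'age']
```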
