[Datacamp] Importing data in Python

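As a runner in the 5th cohort of Pseudo Lab (가짜연구소), I'm taking DataCamp's Data Scientist course. I started a while ago, but it overlapped with Year-Dream School (이어드림스쿨) and several projects, so I never had enough time to write posts. Year-Dream School wrapped up in December, and I think DataCamp's classes are very good for studying the basics, so I'm going to post brief notes on the parts I didn't know 👏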

Reading a file with a context manager, so no separate close call is needed

# open a file; the with block closes it automatically on exit
with open('file_name.txt', 'r') as file:
    print(file.read())
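
To see what the context manager does, here is a minimal sketch that writes a throwaway file and then checks `file.closed` after the `with` block. The temporary directory and the `sample.txt` name are assumptions for illustration only.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained (hypothetical name)
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, 'sample.txt')
with open(path, 'w') as file:
    file.write('hello\n')

# Read it back; the file is open only inside the with block
with open(path, 'r') as file:
    content = file.read()
    open_inside = not file.closed

closed_after = file.closed  # True: the with block closed the file on exit
print(content, open_inside, closed_after)
```

The same guarantee holds if an exception is raised inside the block, which is the main reason to prefer `with` over a manual `file.close()`.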

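Flat files: plain-text files that store records as a table (e.g. CSV and TXT files)
SAS: Statistical Analysis System, used mainly in business analytics and biostatistics
HDF5 files: well suited to large data; can handle datasets of hundreds of gigabytes or even terabytes; maintained by the HDF Group
HDF5 files have a hierarchical structure; the example file in the course has 3 top-level keys: meta, quality, strain
MATLAB data: an industry standard in engineering and science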
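
As a small illustration of the flat-file idea in the list above, the sketch below parses CSV text with Python's built-in `csv` module. The column names and values are made up for the example; for real files, `pandas.read_csv` is a common alternative.

```python
import csv
import io

# Made-up CSV content: the first line is the header, each later line is one record
raw = "name,score\nAnn,90\nBob,85\n"

# DictReader maps each row's fields to the header names
rows = list(csv.DictReader(io.StringIO(raw)))

for row in rows:
    print(row['name'], row['score'])
```

Note that every value comes back as a string, so numeric columns need an explicit conversion when read this way.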

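# load a .mat file
import scipy.io
mat = scipy.io.loadmat(filename)  # filename: path to the .mat file

# type(mat) is dict: keys are MATLAB variable names, values are the objects assigned to them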
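
Since a real .mat file may not be at hand, here is a hedged sketch that first writes one with `scipy.io.savemat` and then loads it back. The variable name `x` and the temporary path are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import scipy.io

# Write a small .mat file to a temporary location (hypothetical variable 'x')
path = os.path.join(tempfile.mkdtemp(), 'example.mat')
scipy.io.savemat(path, {'x': np.array([1, 2, 3])})

mat = scipy.io.loadmat(path)

print(type(mat))           # the loaded object is a plain dict
print(sorted(mat.keys()))  # the saved variable name plus metadata keys such as '__header__'
print(mat['x'])
```

The keys that start with double underscores are metadata added by the loader, so they are usually skipped when looking for the actual variables.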

Relational databases
- Each table needs a primary key to identify its rows
- The tables in a relational database are linked to one another
- Relational database systems: PostgreSQL, MySQL, SQLite (simple and fast)
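
The first two points can be sketched with Python's built-in `sqlite3` module. The `Customers` and `Orders` tables and their columns are hypothetical, chosen only to show a primary key and a link between tables.

```python
import sqlite3

con = sqlite3.connect(':memory:')  # in-memory database, nothing is written to disk

# Each table has a primary key; Orders.CustomerID links back to Customers
con.execute('CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Name TEXT)')
con.execute('CREATE TABLE Orders ('
            'OrderID INTEGER PRIMARY KEY, '
            'CustomerID INTEGER REFERENCES Customers(CustomerID), '
            'Item TEXT)')
con.execute("INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob')")
con.execute("INSERT INTO Orders VALUES (10, 1, 'book'), (11, 2, 'pen'), (12, 1, 'lamp')")

# Follow the link between the two tables with a JOIN
rows = con.execute(
    'SELECT Orders.OrderID, Customers.Name, Orders.Item '
    'FROM Orders JOIN Customers ON Orders.CustomerID = Customers.CustomerID '
    'ORDER BY Orders.OrderID'
).fetchall()
con.close()

print(rows)
```
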
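Steps for running a SQL query

1. Import the required packages and functions
2. Create the database engine
3. Connect to the engine
4. Query the database
5. Save the query results to a DataFrame
6. Close the connection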

# create the engine
import pandas as pd
from sqlalchemy import create_engine, text
engine = create_engine('sqlite:///Chinook.sqlite')

# connect
con = engine.connect()

# query; SQLAlchemy 2.0 requires raw SQL strings to be wrapped in text()
rs = con.execute(text('SELECT * FROM Orders'))  # assign the result to rs

# fetch all rows into a DataFrame
df = pd.DataFrame(rs.fetchall())
df.columns = list(rs.keys())  # set column names

# close the connection
con.close()

# with pandas, the connect / query / fetch / close steps collapse into one line
df = pd.read_sql_query('SELECT * FROM Orders', engine)
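
The connection can also be managed with a context manager, like the file example earlier in this post, so that it is closed even if the query fails. This is a sketch that assumes SQLAlchemy and pandas are installed; the in-memory database and the two-column `Orders` table are made up so the code runs without `Chinook.sqlite`.

```python
import pandas as pd
from sqlalchemy import create_engine, text

# In-memory SQLite database (assumption: used instead of a database file on disk)
engine = create_engine('sqlite://')

with engine.connect() as con:
    # Build a tiny hypothetical table only so the sketch is self-contained
    con.execute(text('CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, Customer TEXT)'))
    con.execute(text("INSERT INTO Orders (Customer) VALUES ('Ann'), ('Bob')"))

    # Query and load the result into a DataFrame while the connection is open
    rs = con.execute(text('SELECT * FROM Orders'))
    df = pd.DataFrame(rs.fetchall(), columns=list(rs.keys()))

# No con.close() needed: leaving the with block closed the connection
print(df)
print(con.closed)
```
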

