Road to Data Analyst
-
[matplotlib] Merge two graphs into one | Road to Data Analyst/Python | 2023. 3. 1. 14:24
twinx()
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
f, ax = plt.subplots(1,1)
x_values = [0,1,2,3,4,5]
y_values = [0,1,4,9,16,25]
ax.plot(x_values, y_values, color='red', linestyle='dashed', linewidth=1, marker="o")
x2_values = [0,1,2,3,4,5]
y2_values = [1000,2000,3000,4000,5000,6000]
ax2=ax.twinx()
ax2.plot(x2_values, y2_values, color='darkblue', linestyle='dashed', l..
-
[pandas] merge/concat data-frame | Road to Data Analyst/Python | 2023. 1. 21. 11:38
Sometimes two datasets need to be merged to provide more accurate information. Merging DataFrames comes in two forms: vertical and horizontal merging.
1) Vertical merging
import pandas as pd
df1 = pd.DataFrame({'A' : [1, 2, 3], 'B' : [11, 12, 13], 'C' : [21, 22, 23]})
df2 = pd.DataFrame({'A' : [4, 5, 6], 'B' : [14, 15, 16], 'C' : [24, 25, 26]})
# concat is a top-level pandas function, so it is called as pd.concat
print(pd.concat([df1, df2]))
print(pd.concat..
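The excerpt cuts off before the horizontal case, so here is a minimal sketch of both directions with pd.concat. The third frame df3 and the ignore_index=True option are my own additions for illustration, not taken from the post.

```python
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [11, 12, 13]})
df2 = pd.DataFrame({'A': [4, 5, 6], 'B': [14, 15, 16]})

# Vertical merging: stack the rows of df2 under df1.
# ignore_index=True renumbers the result 0..5 instead of repeating 0..2.
vertical = pd.concat([df1, df2], ignore_index=True)

# Horizontal merging: put frames side by side, aligned on the index.
# df3 is a hypothetical frame with a new column.
df3 = pd.DataFrame({'C': [21, 22, 23]})
horizontal = pd.concat([df1, df3], axis=1)

print(vertical)
print(horizontal)
```

Without ignore_index=True the vertical result keeps the original index labels, so 0, 1, 2 would appear twice.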
-
[pandas] Data transformation - using current data to classify | Road to Data Analyst/Python | 2023. 1. 21. 10:53
import pandas as pd
df = pd.DataFrame({'a':[1,2,3,4,5]})
# First method. create a 'b' column by filtering rows on 'a'
# (a < 2 -> 's2', 2 <= a < 4 -> 's4', a >= 4 -> 'b4')
df['b'] = ''
a = df[df['a'] < 2]
df.loc[a.index, 'b'] = 's2'
a = df[(df['a'] < 4) & (df['a'] >= 2)]
df.loc[a.index, 'b'] = 's4'
a = df[df['a'] >= 4]
df.loc[a.index, 'b'] = 'b4'
print(df)
# Second method. using 'Apply with function'
def apply_function(a):
    if a < 2:
        return 's2'
    elif a < 4:
        return 's4'
    else:
        return 'b4'
df['b'] = df['a'].apply(apply_function)
# Map - when there are certain categories without conditio..
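The excerpt is cut off right at the Map part. Assuming it refers to Series.map with a dictionary, a minimal sketch could look like this. The 'grade' column and the labels dictionary are hypothetical names made up for the example.

```python
import pandas as pd

# Hypothetical data: a column that already holds fixed categories.
df = pd.DataFrame({'grade': ['a', 'b', 'c', 'a']})

# Hypothetical lookup table from each category to its label.
labels = {'a': 'high', 'b': 'mid', 'c': 'low'}

# map replaces each value with its dictionary entry;
# values missing from the dictionary become NaN.
df['level'] = df['grade'].map(labels)

print(df)
```

Compared with apply and a function, map fits best when each category has one fixed label and no comparison logic is needed.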
-
[pandas] add new column / delete column | Road to Data Analyst/Python | 2023. 1. 21. 10:37
1. Column
import pandas as pd
df = pd.DataFrame({'a': [1,1,3,4,5], 'b': [2,3,2,3,4], 'c': [3,4,7,6,4]})
# 1. simply add a new column from a list
df['d'] = [1,3,6,4,8]
# 2. assign a single number (every row of the column is filled with it)
df['e'] = 1
# +) a calculated result can also be stored as a new column.
# check the data types first
print(df.dtypes)
df['f'] = df['a'] + df['b'] - df['c']
# delete a column d..
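The delete part is truncated in the excerpt, so the exact method used in the post is unknown. As a sketch, these are two standard ways to remove a column in pandas: drop, which returns a new DataFrame, and del, which changes the DataFrame in place.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 3], 'b': [2, 3, 2], 'c': [3, 4, 7]})

# drop returns a new DataFrame without column 'c'; df itself is unchanged.
dropped = df.drop(columns=['c'])

# del removes column 'b' from df in place.
del df['b']

print(dropped.columns.tolist())
print(df.columns.tolist())
```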
-
[pandas] Data type conversion (astype etc...) | Road to Data Analyst/Python | 2023. 1. 21. 10:12
import pandas as pd
df = pd.DataFrame({'date' : ['5/11/21', '5/12/21', '5/13/21', '5/14/21', '5/15/21'],
                   'sales' : ['10', '15', '20', '25', '30'],
                   'visitors' : ['10', '-', '17', '23', '25'],
                   'temp.' : ['24.1', '24.3', '24.8', '25', '25.4']})
# we need to check the data type of each column (to change/edit data)
print(df.dtypes)
# try to change the value without data conversion
df['edited sales'] = d..
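A minimal sketch of the conversion step, on a shortened copy of the data above. Using pd.to_numeric with errors='coerce' for the 'visitors' column is my assumption about how to handle the '-' entry; the post may do it differently.

```python
import pandas as pd

df = pd.DataFrame({'sales': ['10', '15', '20'],
                   'visitors': ['10', '-', '17'],
                   'temp.': ['24.1', '24.3', '25']})

# Strings of digits convert cleanly with astype.
df['sales'] = df['sales'].astype(int)
df['temp.'] = df['temp.'].astype(float)

# '-' is not a number, so astype(int) would raise an error here.
# errors='coerce' turns unparseable values into NaN instead.
df['visitors'] = pd.to_numeric(df['visitors'], errors='coerce')

print(df.dtypes)
print(df['sales'] + 1)  # arithmetic now works on numbers, not strings
```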
-
[pandas] Missing value | Road to Data Analyst/Python | 2023. 1. 18. 10:30
Table of contents
0) Intro
1) Check whether the dataset has any missing values
2) Delete the rows / columns that have missing values
3) Replace missing values with a neighboring value / the average
0) Intro
A dataset is likely to contain some missing values. To analyze the data, it is important to decide how to handle them. To make an example, we need NumPy.
import pa..
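The three steps in the table of contents can be sketched as follows. The small DataFrame is made-up example data, and the specific calls (isna, dropna, ffill, fillna) are the usual pandas tools for these steps, not necessarily the ones the full post uses.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value per column.
df = pd.DataFrame({'a': [1.0, np.nan, 3.0],
                   'b': [4.0, 5.0, np.nan]})

# 1) Check: count the missing values in each column.
missing_per_column = df.isna().sum()

# 2) Delete: drop rows (default) or columns (axis=1) containing NaN.
rows_dropped = df.dropna()
cols_dropped = df.dropna(axis=1)

# 3) Replace: use the previous row's value, or the column average.
filled_prev = df.ffill()
filled_mean = df.fillna(df.mean())

print(missing_per_column)
print(filled_prev)
print(filled_mean)
```

Dropping is the simplest option but throws data away; in this tiny example dropna(axis=1) removes every column, which shows why filling is often preferred.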
-
[pandas] Sort | Road to Data Analyst/Python | 2023. 1. 18. 09:50
pandas has two basic ways to sort: (1) sort by index and (2) sort by value.
1) Sort by index
import pandas as pd
df = pd.DataFrame({'a': [2,3,2,7,4], 'b': [2,1,3,5,3], 'c': [1,1,2,3,5]})
# ascending
print(df.sort_index())
# descending
print(df.sort_index(ascending=False))
If you want the DataFrame itself to keep the sorted order, sort in place:
df.sort_index(ascending=False, inplace=True) # (1)
print(df..
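The excerpt stops before the second way, sorting by value. A minimal sketch with sort_values on the same data; sorting by two columns with mixed directions is an extra I added for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [2, 3, 2, 7, 4],
                   'b': [2, 1, 3, 5, 3],
                   'c': [1, 1, 2, 3, 5]})

# Sort rows by the values in column 'a' (ascending by default).
by_a = df.sort_values(by='a')

# Sort by 'a' ascending, then break ties with 'b' descending.
by_a_then_b = df.sort_values(by=['a', 'b'], ascending=[True, False])

print(by_a)
print(by_a_then_b)
```

The original index labels travel with their rows; reset_index(drop=True) can renumber them afterwards if needed.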
-
[pandas] value_counts(): sorting a variable by least occurrence | Road to Data Analyst/Python | 2022. 9. 24. 08:23
value_counts()
import pandas as pd
df = pd.read_csv(url, sep='|') # split the fields on the '|' delimiter
users = df
users.set_index('user_id', inplace=True) # inplace=True: overwrite the existing variable; user_id then comes along as the index whenever other columns are viewed
users.age.value_counts(ascending=True)
value_counts gives the frequency (counts) of a specific variable (a specific column). Here we can see that the ages 7, 10, 11, 66, and 73 each appear in only one record.
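Since the post reads its data from a url that is not shown here, this is a small self-contained sketch of the same idea with made-up ages. Filtering for counts equal to 1 is my own addition to pull out the values that occur only once.

```python
import pandas as pd

# Made-up ages standing in for the users data set.
users = pd.DataFrame({'age': [25, 30, 25, 41, 30, 25]})

# ascending=True lists the least frequent ages first.
counts = users['age'].value_counts(ascending=True)

# Ages that occur exactly once.
once = counts[counts == 1].index.tolist()

print(counts)
print(once)
```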