[pandas] Data type conversion (astype etc...)
Road to Data Analyst/Python 2023. 1. 21. 10:12
import pandas as pd

df = pd.DataFrame({'date': ['5/11/21', '5/12/21', '5/13/21', '5/14/21', '5/15/21'],
                   'sales': ['10', '15', '20', '25', '30'],
                   'visitors': ['10', '-', '17', '23', '25'],
                   'temp.': ['24.1', '24.3', '24.8', '25', '25.4']})

# we need to check data type for each column (to change/edit data)
print(df.dtypes)

# try to change the value without data conversion
df['edited sales'] = df['sales'] + 1
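The last line of the code does not return a calculated result; it raises a TypeError. This is because the values in df['sales'] are strings (the column has object dtype), and an integer cannot be added to a string. To calculate with the column, it must first be converted to a numeric type (int, float, etc.).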
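To make the failure concrete, here is a minimal sketch on a small stand-in column (the Series name `sales` and its three values are made up for illustration, and the dtype is set to object explicitly so that the behavior does not depend on the pandas version):

```python
import pandas as pd

# Stand-in column: the numbers are stored as strings (object dtype).
sales = pd.Series(['10', '15', '20'], dtype=object)

# Each element is a Python str, so adding an int is not defined.
print(type(sales.iloc[0]))

try:
    sales + 1
    failed = False
except TypeError as exc:
    failed = True
    print('TypeError:', exc)

# After converting to int, the same operation works.
converted = sales.astype('int') + 1
print(converted.tolist())  # [11, 16, 21]
```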
df = df.astype({'sales': 'int'})

# now we can calculate/edit/change the data
df['edited sales'] = df['sales'] + 1
print(df)

# However, astype raises an error when a column contains a value that cannot
# be parsed as a number (here '-' marks a missing value).
# Therefore, we tell pd.to_numeric to coerce such values to NaN.
df['visitors'] = pd.to_numeric(df['visitors'], errors='coerce')
As the result shows, the missing value has changed from '-' to NaN. Because NaN is a float value, the column now has a float dtype rather than int.
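The effect of errors='coerce' can be checked on a small example. This is only a sketch: the Series `visitors` below is a shortened, made-up copy of the column, and it compares the coerced result with the default behavior of pd.to_numeric, which raises an error instead.

```python
import pandas as pd

visitors = pd.Series(['10', '-', '17'], dtype=object)

# Values that cannot be parsed become NaN instead of raising an error.
coerced = pd.to_numeric(visitors, errors='coerce')
print(coerced.dtype)         # float64, because NaN is a float value
print(coerced.isna().sum())  # how many values could not be parsed

# With the default errors='raise', the same input is rejected.
try:
    pd.to_numeric(visitors)
    raised = False
except ValueError as exc:
    raised = True
    print('ValueError:', exc)
```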
To convert the column to int, we need one more step, because NaN cannot be stored in a regular int column.
# replace the missing values with 0 (or another value) first
# note: this fills NaN in every column of df, not only 'visitors'
df.fillna(0, inplace=True)

# then convert the data type of the column to int
df = df.astype({'visitors': 'int'})
print(df)

df['visitors+1'] = df['visitors'] + 1
print(df)
As you can see, once the data type has been converted, we can calculate with and edit the data.
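The three steps above (coerce to numeric, fill the missing values, cast to int) can also be wrapped in one small function. The sketch below is only one possible arrangement: `to_int_column` and its `fill_value` parameter are hypothetical names introduced here, not part of pandas.

```python
import pandas as pd

def to_int_column(series, fill_value=0):
    """Hypothetical helper: parse strings as numbers, fill gaps, cast to int."""
    numeric = pd.to_numeric(series, errors='coerce')  # unparseable entries become NaN
    return numeric.fillna(fill_value).astype('int')   # NaN -> fill_value, then float -> int

df = pd.DataFrame({'visitors': ['10', '-', '17', '23', '25']})
df['visitors'] = to_int_column(df['visitors'])
df['visitors+1'] = df['visitors'] + 1
print(df)
```

Unlike df.fillna(0, inplace=True), this fills only the one column it is given. Keep in mind that replacing a missing value with 0 changes sums and averages, so a different fill value may suit your data better.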
Extra) Converting a column to datetime format
df['date'] = pd.to_datetime(df['date'], format="%m/%d/%y")
print(df)
The 'date' column is then converted from strings to a datetime64 dtype, so it can be used for date calculations.
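As a quick check of what the conversion gives you, here is a minimal sketch on two of the dates (the Series names `dates` and `parsed` are made up for illustration). Once parsed, the values expose their parts through the .dt accessor and can be subtracted from each other.

```python
import pandas as pd

dates = pd.Series(['5/11/21', '5/12/21'])

# %m = month, %d = day, %y = two-digit year
parsed = pd.to_datetime(dates, format='%m/%d/%y')

print(parsed.dtype)             # a datetime64 dtype
print(parsed.dt.year.tolist())  # [2021, 2021]
print(parsed.dt.day.tolist())   # [11, 12]

# Subtracting two dates gives a Timedelta.
gap_days = (parsed.iloc[1] - parsed.iloc[0]).days
print(gap_days)                 # 1
```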