  • [pandas] Data type conversion (astype etc...)
    Road to Data Analyst/Python 2023. 1. 21. 10:12
    import pandas as pd
    df = pd.DataFrame({'date' : ['5/11/21', '5/12/21', '5/13/21', '5/14/21', '5/15/21'],
                       'sales' : ['10', '15', '20', '25', '30'], 'visitors' : ['10', '-', '17', '23', '25'],
                       'temp.' : ['24.1', '24.3', '24.8', '25', '25.4']})
    
    # check the data type of each column before editing or calculating with the data
    print(df.dtypes)
    
    # try to do arithmetic without converting the data type first
    # (this raises a TypeError because 'sales' holds strings)
    df['edited sales'] = df['sales'] + 1

    The last statement does not return a calculated result. It raises a TypeError because df['sales'] has the object dtype: its values are strings, not numbers. Before any calculation, the column has to be converted to a numeric type (int, float, etc.).
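As a minimal sketch of why this fails, the snippet below rebuilds only the 'sales' values as a standalone Series. The variable names (`sales`, `error_name`, `result`) are made up for illustration, and the object dtype is set explicitly so that the example matches the column described above.

```python
import pandas as pd

# Standalone copy of the 'sales' values; dtype is forced to object
# so the Series behaves like the column described in the post.
sales = pd.Series(['10', '15', '20', '25', '30'], dtype='object', name='sales')

# Every element is a Python str, so "+ 1" is string + int.
print(type(sales.iloc[0]))

try:
    sales + 1
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)

# After converting to int, the same arithmetic works.
result = sales.astype('int') + 1
print(result.tolist())
```

Catching the exception here is only for demonstration; in a real script the conversion would simply come before the calculation.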

    df = df.astype({'sales':'int'})
    # now we can calculate/edit/change the data
    df['edited sales'] = df['sales'] + 1
    print(df)
    
    # However, when a column contains a non-numeric placeholder for a missing value ('-' here),
    # astype cannot convert it and raises an error.
    # errors='coerce' tells pandas to turn values it cannot parse into NaN instead of failing.
    df['visitors'] = pd.to_numeric(df['visitors'], errors='coerce')

     

    In the result, the missing value has changed from '-' to NaN. Because NaN is a float, the column is now float64 rather than int.

    One more step is needed before the column can be converted to int and used in calculations.
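The difference between a plain astype call and pd.to_numeric with errors='coerce' can be sketched as follows. This uses a standalone copy of the 'visitors' values, and the names `visitors`, `astype_failed`, `coerced` and `n_missing` are illustrative only.

```python
import pandas as pd

# Standalone copy of the 'visitors' values, including the '-' placeholder.
visitors = pd.Series(['10', '-', '17', '23', '25'], dtype='object')

# astype('int') fails because '-' cannot be parsed as an integer.
try:
    visitors.astype('int')
    astype_failed = False
except ValueError as exc:
    astype_failed = True
    print('astype failed:', exc)

# errors='coerce' replaces unparseable values with NaN instead of raising.
coerced = pd.to_numeric(visitors, errors='coerce')
print(coerced)

# Count how many values were turned into NaN.
n_missing = int(coerced.isna().sum())
print(coerced.dtype, n_missing)
```

Checking the number of NaN values after coercion is a quick way to see how many entries could not be parsed.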

    # replace NaN with 0 (or another suitable value) first
    df['visitors'] = df['visitors'].fillna(0)
    # then convert the data type of the column to int
    df = df.astype({'visitors':'int'})
    print(df)
    
    df['visitors+1'] = df['visitors'] + 1
    print(df)

    As you can see, once the data type has been converted, the data can be edited and used in calculations.
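Filling with 0 makes the missing day look like a day with zero visitors. If that is not wanted, one possible alternative (not the approach used in this post) is pandas' nullable integer dtype 'Int64', which keeps the value missing while still allowing integer arithmetic. A minimal sketch, with illustrative variable names:

```python
import pandas as pd

# Standalone copy of the 'visitors' values, including the '-' placeholder.
visitors = pd.Series(['10', '-', '17', '23', '25'], dtype='object')

# Coerce to numbers first, then cast to the nullable integer dtype.
# Note the capital "I": 'Int64' is the nullable type, 'int64' is not.
nullable = pd.to_numeric(visitors, errors='coerce').astype('Int64')
print(nullable)

# Arithmetic works; the missing entry stays missing (<NA>).
plus_one = nullable + 1
print(plus_one)
```

Whether to fill with 0 or keep the value missing depends on what the data means, so this is a design choice rather than a fix.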

     

    Extra) Converting a column to datetime format

    df['date'] = pd.to_datetime(df['date'], format="%m/%d/%y")
    print(df)

    The 'date' column is then converted from strings to a datetime (datetime64) type.
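To check that the conversion worked, the dtype and the date parts can be inspected through the .dt accessor. The sketch below uses a shortened standalone copy of the 'date' values; the names `dates` and `parsed` are illustrative.

```python
import pandas as pd

# Shortened standalone copy of the 'date' values (month/day/two-digit year).
dates = pd.Series(['5/11/21', '5/12/21', '5/13/21'])

# %m = month, %d = day, %y = two-digit year
parsed = pd.to_datetime(dates, format='%m/%d/%y')
print(parsed.dtype)

# Once the column is datetime, the .dt accessor exposes date parts.
print(parsed.dt.year.tolist())
print(parsed.dt.month.tolist())
print(parsed.dt.day.tolist())
print(parsed.dt.day_name().tolist())
```

Passing an explicit format avoids ambiguity between month-first and day-first dates such as '5/11/21'.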