  • [pandas] add new column / delete column
    Road to Data Analyst/Python 2023. 1. 21. 10:37

    1. Column

    import pandas as pd
    df = pd.DataFrame({'a': [1,1,3,4,5], 'b': [2,3,2,3,4], 'c': [3,4,7,6,4]})
    
    # 1. add a new column from a list (its length must match the number of rows)
    df['d'] = [1,3,6,4,8]
    
    # 2. assign a single value (every row of the new column is filled with that value)
    df['e'] = 1
    
    # +) the result of a calculation can also be stored as a new column.
    # check the dtypes first to make sure the columns are numeric
    print(df.dtypes)
    df['f'] = df['a'] + df['b'] - df['c']
    
    # delete a column
    df.drop(['d', 'e', 'f'], axis=1, inplace=True)

    When you delete a column, you need to pass 'axis=1' so that drop looks for the labels among the columns. With the default 'axis=0', drop looks for them in the row index instead.
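
    The effect of the axis argument can be sketched as below. This is a minimal example that is not part of the original notes: it assumes a reasonably recent pandas version and uses the `columns=` keyword of `drop`, which is an equivalent way of writing `axis=1`. It also returns a new DataFrame rather than using `inplace=True`.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 3, 4, 5], 'b': [2, 3, 2, 3, 4], 'c': [3, 4, 7, 6, 4]})
df['d'] = [1, 3, 6, 4, 8]

# axis=1 tells drop() to look for the label among the columns
by_axis = df.drop(['d'], axis=1)

# columns= is an equivalent spelling that needs no axis argument
by_keyword = df.drop(columns=['d'])

# with the default axis=0, drop() searches the row index instead,
# so the column name is not found and a KeyError is raised
try:
    df.drop(['d'])
    raised = False
except KeyError:
    raised = True

print(list(by_axis.columns))
print(by_axis.equals(by_keyword))
print(raised)
```

    Because neither call uses `inplace=True`, the original df still contains column 'd' afterwards.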

     

    2. Row

    import pandas as pd
    df = pd.DataFrame({'a': [1,1,3,4,5], 'b': [2,3,2,3,4], 'c': [3,4,7,6,4]})
    
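    # simply add a new row - DataFrame.append was removed in pandas 2.0, so use pd.concat;
    # 'ignore_index=True' renumbers the index so the new row gets the next label
    df = pd.concat([df, pd.DataFrame([{'a': 6, 'b': 7, 'c': 8}])], ignore_index=True)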
    
    # add a new row by using loc with a label that does not exist yet (it is added after the last row)
    df.loc[6] = [7,8,9]
    # however, if the label n in loc[n] already exists in the dataframe,
    # the existing row is overwritten with the new data
    df.loc[1] = [7,8,9]
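
    The difference between adding and overwriting with loc can be checked by watching the row count. The sketch below is an illustration under the assumption that the DataFrame has the default 0..n-1 index; the variable names are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 3, 4, 5], 'b': [2, 3, 2, 3, 4], 'c': [3, 4, 7, 6, 4]})

rows_before = len(df)

# label 5 does not exist yet, so loc adds a new row
df.loc[5] = [6, 7, 8]
rows_after_new_label = len(df)

# label 1 already exists, so loc overwrites that row and the length stays the same
df.loc[1] = [7, 8, 9]
rows_after_existing_label = len(df)

# using the current length as the label always adds a row,
# but only when the index is the default 0..n-1 range
df.loc[len(df)] = [10, 11, 12]

print(rows_before, rows_after_new_label, rows_after_existing_label)
print(df.loc[1].tolist())
print(df.index.tolist())
```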

    There are several ways to delete (drop) rows from the data:

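    # each example below is independent: drop() returns a new DataFrame,
    # so the results are printed instead of being assigned back to df
    
    # delete a certain row - drop() matches index labels, not positions.
    # df.drop(1) removes the row labeled 1 (the second row with a default index)
    print(df.drop(1))
    
    # delete several rows at a time
    print(df.drop([0, 1]))
    
    # delete a range of rows at a time
    # limitation: every label in the range must exist, otherwise a KeyError is raised
    print(df.drop(range(4)))
    
    # more flexible way: drop the rows that match a condition
    print(df.drop(df[df['a'] < 4].index))
    # more than one condition: wrap each one in parentheses and combine them with &
    print(df.drop(df[(df['a'] < 3) & (df['c'] == 4)].index))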
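
    Two related patterns are sketched below as an addition to the notes above, assuming the same sample data: keeping the rows that do not match a condition (which gives the same result as dropping the ones that do), and dropping rows by position by translating positions into labels through `df.index`.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 1, 3, 4, 5], 'b': [2, 3, 2, 3, 4], 'c': [3, 4, 7, 6, 4]})

# drop the rows where the condition holds ...
dropped = df.drop(df[df['a'] < 4].index)

# ... or keep the rows where it does not hold (~ negates the boolean mask)
kept = df[~(df['a'] < 4)]

# drop by position instead of label: turn positions 1 and 2 into labels first
by_position = df.drop(df.index[1:3])

print(dropped.equals(kept))
print(dropped.index.tolist())
print(by_position.index.tolist())
```

    The `df.index[1:3]` form keeps working after earlier drops have left gaps in the labels, because it selects by position rather than by label value.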