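    # Dependencies needed to read Excel files
    # (recent pandas versions read .xlsx files with openpyxl; xlrd 2.0+ handles only .xls)
    
    !pip install xlrd openpyxl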
    
    
    # Import required library
    
    import pandas as pd
    
     
    
  • After the import command, we now have access to a large number of pre-built classes and functions.


  • This assumes the library is installed. One way pandas lets you work with data is through a dataframe.


  • Let's go through the process to go from a comma separated values (.csv) file to a dataframe.


  • The variable csv_path stores the path of the .csv file, which is passed as an argument to the read_csv function.


  • The result is stored in the object df, a common short form for a variable that refers to a pandas dataframe.


    
    # Read data from CSV file
    
    csv_path = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Datasets/TopSellingAlbums.csv'
    df = pd.read_csv(csv_path)
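
  • If the URL cannot be reached, the same call can be tried on in-memory text. The following is a minimal sketch, assuming a small hand-written sample; the rows and the names csv_text and sample_df are made up for illustration and are not the course dataset:

```python
import io

import pandas as pd

# Hypothetical stand-in for the contents of a .csv file (made-up rows).
csv_text = """Artist,Album,Released,Length
Artist A,Album A,1982,00:42:19
Artist B,Album B,1980,00:42:11
Artist C,Album C,1973,00:42:49
"""

# read_csv accepts a path, a URL, or a file-like object such as StringIO.
sample_df = pd.read_csv(io.StringIO(csv_text))

print(type(sample_df))           # the result is a pandas DataFrame
print(list(sample_df.columns))   # the header row becomes the column names
print(sample_df.shape)           # (number of rows, number of columns)
```

  • The header line of the text supplies the column names, and each following line becomes one row.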
    
    
    
     
    
  • We can use the method head() to examine the first five rows of a dataframe:


    # Print first five rows of the dataframe
    
    df.head()
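
  • head() also takes an optional row count. A short sketch on a made-up dataframe (the name sample_df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical dataframe with eight rows of made-up values.
sample_df = pd.DataFrame({'Track': range(1, 9), 'Plays': range(10, 90, 10)})

first_five = sample_df.head()    # default: the first five rows
first_two = sample_df.head(2)    # an explicit row count

print(len(first_five), len(first_two))
```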
    
    
     
    
  • To read an Excel file, we use the path of the file and the function read_excel. The result is a dataframe, as before:


    
    # Read data from Excel File and print the first five rows
    
    xlsx_path = 'https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/PY0101EN/Chapter%204/Datasets/TopSellingAlbums.xlsx'
    
    df = pd.read_excel(xlsx_path)
    df.head()
    
     
    
  • We can access the column Length and assign it to a new dataframe x:


    # Access to the column Length
    
    x = df[['Length']]
    x
    
    
     
    
  • The process is shown in the figure:


  • [Figure: Read data from Excel File and print the first five rows in pandas]


  • Viewing Data and Accessing Data


  • You can also get a column as a series. You can think of a pandas series as a 1-D dataframe. Just use a single pair of brackets:


    # Get the column as a series
    
    x = df['Length']
    x
    
    
     
    
  • You can also get a column as a dataframe by using double brackets. For example, we can select the column Artist and check the type of the result:


    # Get the column as a dataframe and check its type
    
    x = df[['Artist']]
    type(x)
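
  • The difference between single and double brackets can be checked directly. A minimal sketch on a made-up dataframe (sample_df and its values are hypothetical, not the course data):

```python
import pandas as pd

# Hypothetical dataframe with made-up values.
sample_df = pd.DataFrame({
    'Artist': ['Artist A', 'Artist B', 'Artist C'],
    'Length': ['00:42:19', '00:42:11', '00:42:49'],
})

as_series = sample_df['Length']      # single brackets give a Series
as_frame = sample_df[['Length']]     # double brackets give a DataFrame

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```

  • The series has a one-element shape (rows only), while the dataframe keeps a second dimension even with a single column.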
    
    
     
    
  • You can do the same thing for multiple columns; we just put the dataframe name, in this case df, followed by the names of the column headers enclosed in double brackets. The result is a new dataframe made up of the specified columns:


    # Access to multiple columns
    
    y = df[['Artist','Length','Genre']]
    y
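
  • The inner brackets are an ordinary Python list, so the list can be built separately, and the columns of the result follow the order of that list. A small sketch with hypothetical names (sample_df, wanted) and made-up values:

```python
import pandas as pd

# Hypothetical dataframe with made-up values.
sample_df = pd.DataFrame({
    'Artist': ['Artist A', 'Artist B', 'Artist C'],
    'Length': ['00:42:19', '00:42:11', '00:42:49'],
    'Genre': ['pop', 'rock', 'rock'],
})

wanted = ['Genre', 'Artist']     # any list of existing column names
subset = sample_df[wanted]       # same as sample_df[['Genre', 'Artist']]

print(list(subset.columns))
print(subset.shape)
```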
    
    
     
    
  • The process is shown in the figure:


  • [Figure: Access to multiple columns]


  • One way to access individual elements is the iloc indexer, which selects by integer position. You can access the 1st row and the 1st column as follows:


    # Access the value on the first row and the first column
    
    df.iloc[0, 0]
    
    
    
     
    
  • You can access the 2nd row and the 1st column as follows:


    # Access the value on the second row and the first column
    
    df.iloc[1,0]
    
     
    
  • You can access the 1st row and the 3rd column as follows:


    # Access the value on the first row and the third column
    
    df.iloc[0,2]
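
  • The position-based lookups above can be reproduced on a small made-up dataframe. This sketch uses hypothetical values and also shows that negative positions count from the end, as with Python lists:

```python
import pandas as pd

# Hypothetical dataframe with made-up values.
sample_df = pd.DataFrame({
    'Artist': ['Artist A', 'Artist B', 'Artist C'],
    'Album': ['Album A', 'Album B', 'Album C'],
    'Released': [1982, 1980, 1973],
})

first_cell = sample_df.iloc[0, 0]     # first row, first column
second_row = sample_df.iloc[1, 0]     # second row, first column
third_col = sample_df.iloc[0, 2]      # first row, third column
last_cell = sample_df.iloc[-1, -1]    # last row, last column

print(first_cell, second_row, third_col, last_cell)
```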
    
    
     
    
  • You can also access a value using the column name with loc. The first three calls below return the same values as the iloc examples above:


    # Access the first row of the column Artist by name
    
    df.loc[0, 'Artist']
    
    
    # Access the second row of the column Artist by name
    
    df.loc[1, 'Artist']
    
    
    # Access the first row of the column Released by name
    
    df.loc[0, 'Released']
    
    
    # Access the second row of the column Released by name
    
    df.loc[1, 'Released']
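
  • With the default index, row labels and row positions happen to be the same numbers. The distinction shows up once the index has other labels. A minimal sketch, assuming a made-up dataframe with the hypothetical row labels 'a', 'b' and 'c':

```python
import pandas as pd

# Hypothetical dataframe with made-up values and string row labels.
sample_df = pd.DataFrame(
    {'Artist': ['Artist A', 'Artist B', 'Artist C'],
     'Released': [1982, 1980, 1973]},
    index=['a', 'b', 'c'],
)

by_label = sample_df.loc['b', 'Released']   # row label and column name
by_position = sample_df.iloc[1, 1]          # row position and column position

print(by_label, by_position)
```

  • Both lookups reach the same cell: loc goes through the labels, iloc through the integer positions.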
     
    
  • You can perform slicing using both the index and the name of the column:


    # Slicing the dataframe
    
    df.iloc[0:2, 0:3]
    
     
    
    
    
    
    
    # Slicing the dataframe using name
    
    df.loc[0:2, 'Artist':'Released']
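
  • The two slices are not equivalent: iloc excludes the end position, as in ordinary Python slicing, while loc includes the end label. A short sketch on a made-up dataframe (sample_df and its values are hypothetical):

```python
import pandas as pd

# Hypothetical dataframe with made-up values and the default integer index.
sample_df = pd.DataFrame({
    'Artist': ['Artist A', 'Artist B', 'Artist C', 'Artist D'],
    'Album': ['Album A', 'Album B', 'Album C', 'Album D'],
    'Released': [1982, 1980, 1973, 1992],
    'Length': ['00:42:19', '00:42:11', '00:42:49', '00:57:44'],
})

# Position-based: rows 0 and 1, columns 0 to 2 (end positions excluded).
by_position = sample_df.iloc[0:2, 0:3]

# Label-based: rows 0, 1 and 2, columns Artist through Released (ends included).
by_label = sample_df.loc[0:2, 'Artist':'Released']

print(by_position.shape)
print(by_label.shape)
```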