Introduction to Pandas

Pandas is an open-source Python library for working with relational or labeled data. Its development was started by Wes McKinney in 2008.

It provides various data structures, operations, and functions for analyzing, cleaning, exploring, and manipulating data, including time-series data.

Pandas can be used to analyze large data sets, clean messy data, make it readable, and draw relevant conclusions based on statistical theory in data science.

Data Science is a branch of computer science that studies how to store, use, and analyze data in order to derive information.

Pandas is built on top of NumPy.

The data produced by Pandas is often used as input to plotting functions in Matplotlib, statistical analysis in SciPy, and machine learning algorithms in Scikit-learn.
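As a small sketch of how Pandas data can be handed to NumPy-based code, the example below converts a Series to a NumPy array and also passes the Series straight to a NumPy function. The variable names and the temperature values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, made up for illustration.
temps = pd.Series([21.5, 23.0, 19.5, 22.0], name="temperature")

# The values of a Series can be exported as a NumPy array.
values = temps.to_numpy()

# NumPy functions also accept the Series directly.
mean_temp = np.mean(temps)

print(type(values))
print(mean_temp)
```

Libraries such as Matplotlib, SciPy, and Scikit-learn generally work with NumPy arrays, which is why this conversion is often the bridge between them and Pandas.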

Pandas programs can be written in any text editor, but Jupyter Notebook is recommended because it provides an easy way to visualize data frames and plots.

The Pandas source code is hosted on GitHub (https://github.com/pandas-dev/pandas).

 

Advantages 

·         Fast and efficient for manipulating and analyzing data.

·         Data can be loaded from different file objects.

·         Easy handling of missing data (represented as NaN).

·         Size mutability: columns can be inserted into and deleted from a DataFrame and higher-dimensional objects.

·         Flexible reshaping, pivoting, merging, and joining of data sets.

·         Provides time-series functionality.

·         Supports split-apply-combine operations on data sets.
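Several of the points above can be sketched in one short example. It is only an illustration: the CSV text, the column names, and the regions table are hypothetical and are not taken from any real data set.

```python
import io
import pandas as pd

# Hypothetical CSV content, made up for illustration; a file path would work too.
csv_text = """city,product,sales
Delhi,pen,10
Delhi,book,
Mumbai,pen,7
Mumbai,book,5
"""

# Loading: read data from a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Missing data: the empty sales field is read as NaN, then filled with 0.
missing_count = int(df["sales"].isna().sum())
df["sales"] = df["sales"].fillna(0)

# Size mutability: insert a column, then delete it again.
df["sales_doubled"] = df["sales"] * 2
del df["sales_doubled"]

# Merging: join with a second table on the shared "city" column.
regions = pd.DataFrame({"city": ["Delhi", "Mumbai"], "region": ["North", "West"]})
merged = df.merge(regions, on="city")

# Split-apply-combine: split by city, sum the sales, combine into a Series.
sales_per_city = df.groupby("city")["sales"].sum()

# Time series: daily values indexed by date, resampled to 2-day totals.
dates = pd.date_range("2024-01-01", periods=4, freq="D")
daily = pd.Series([1, 2, 3, 4], index=dates)
two_day_totals = daily.resample("2D").sum()

print(merged)
print(sales_per_city)
print(two_day_totals)
```

Each step uses a single Pandas call, so the steps can be tried one at a time in a Jupyter Notebook cell.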

 

After installing Pandas, import the library as follows:

 

import pandas as pd

(pd is an alias for Pandas.)

 

Pandas provides two main data structures for data manipulation:

·         Series

·         DataFrame
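A minimal sketch of both structures follows. A Series is a one-dimensional labeled array, and a DataFrame is a two-dimensional table whose columns are Series. The names and marks used here are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
marks = pd.Series([85, 92, 78], index=["Asha", "Ravi", "Meena"], name="marks")

# A DataFrame is a two-dimensional table built from named columns.
students = pd.DataFrame(
    {"name": ["Asha", "Ravi", "Meena"], "marks": [85, 92, 78]}
)

print(marks["Ravi"])             # access a value by its label
print(students.shape)            # (rows, columns)
print(type(students["marks"]))   # selecting one column gives a Series
```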
