Introduction to Pandas
Pandas is an open-source Python library for working with relational or labeled data, developed by Wes McKinney in 2008.
It provides data structures, operations, and functions for analyzing, cleaning, exploring, and manipulating data, including time series.
Pandas can be used to analyze large data sets, clean messy data, make it readable, and draw relevant conclusions based on statistical theory in data science.
Data science is a branch of computer science that studies how to store, use, and analyze data to derive information.
Pandas is built on top of NumPy.
The data produced by Pandas is often used as input to plotting functions in Matplotlib, statistical analysis in SciPy, and machine learning algorithms in Scikit-learn.
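To illustrate the link between Pandas and NumPy, here is a minimal sketch. The column names and values are made up for the example. It shows that a DataFrame can be converted to a plain NumPy array, which is the kind of input that numerical libraries typically accept, and that a NumPy function can be applied to a Pandas column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, invented for illustration only.
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0],
    "weight_kg": [50.0, 60.0, 70.0],
})

# Convert the DataFrame to a plain NumPy array (rows x columns).
values = df.to_numpy()
print(type(values))
print(values.shape)

# NumPy functions can be applied to a Pandas column directly.
mean_height = np.mean(df["height_cm"])
print(mean_height)
```
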
Pandas programs can be written in any text editor, but Jupyter Notebook is recommended because it provides an easy way to visualize data frames and plots.
The Pandas source code is located on GitHub (https://github.com/pandas-dev/pandas).
Advantages
· Fast and efficient manipulation and analysis of data.
· Data can be loaded from different file formats.
· Easy handling of missing data (represented as NaN).
· Size mutability: columns can be inserted into and deleted from DataFrame and higher-dimensional objects.
· Flexible reshaping, pivoting, merging, and joining of data sets.
· Provides time-series functionality.
· Performs split-apply-combine operations on data sets.
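Several of the points above can be sketched in a few lines. The data, column names, and dates below are hypothetical and chosen only to keep the example small. It is one possible way to show missing-data handling, column insertion and deletion, split-apply-combine with groupby, and a basic time-series resample.

```python
import numpy as np
import pandas as pd

# Hypothetical sales data; names and values are invented.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10.0, np.nan, 30.0, 20.0],
})

# Missing data: NaN marks the missing entry; fillna replaces it.
n_missing = int(sales["units"].isna().sum())
sales["units"] = sales["units"].fillna(0.0)

# Size mutability: insert a new column, then delete it again.
sales["units_doubled"] = sales["units"] * 2
sales = sales.drop(columns=["units_doubled"])

# Split-apply-combine: split by region, sum each group, combine the results.
totals = sales.groupby("region")["units"].sum()
print(totals)

# Time series: daily values summed into two-day bins.
days = pd.date_range("2024-01-01", periods=4, freq="D")
daily = pd.Series([1, 2, 3, 4], index=days)
two_day = daily.resample("2D").sum()
print(two_day)
```
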
After installing Pandas, import the library as follows:
import pandas as pd
(pd is an alias for Pandas.)
Pandas provides two data structures for manipulating data:
· Series
· DataFrame
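As a rough illustration of the two structures, the sketch below builds a Series (a one-dimensional labeled array) and a DataFrame (a two-dimensional labeled table). The labels and values are hypothetical.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
ages = pd.Series([25, 32, 47], index=["ann", "bob", "cy"], name="age")

# A DataFrame is a two-dimensional table; each column is a Series.
people = pd.DataFrame(
    {
        "age": [25, 32, 47],
        "city": ["Pune", "Delhi", "Mumbai"],
    },
    index=["ann", "bob", "cy"],
)

print(ages["bob"])                # look up a Series value by label
print(people.loc["bob", "city"])  # look up a DataFrame cell by row and column label
print(type(people["age"]))        # selecting one column gives a Series
```

Both structures carry an index, so values are looked up by label rather than only by position.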