pandas (software)

[5] The development of Pandas brought to Python many features for working with DataFrames that are comparable to those already established in the R programming language.

In 2015, Pandas signed on as a fiscally sponsored project of NumFOCUS, a 501(c)(3) nonprofit charity in the United States.

Data for these structures (Series and DataFrames) can be imported from various file formats and sources, such as comma-separated values (CSV), JSON, Parquet, SQL database tables or queries, and Microsoft Excel.
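A minimal sketch of two of these import paths, using only in-memory sources so it runs without external files: `pd.read_csv` reads CSV text, and `pd.read_sql` runs a query against a standard-library `sqlite3` connection. The table name `stock` and its columns are hypothetical. Parquet and Excel use `pd.read_parquet` and `pd.read_excel`, which need optional dependencies and are not shown.

```python
import io
import sqlite3

import pandas as pd

# Illustrative data only: an in-memory CSV stands in for a file on disk.
csv_text = "item,qty\napple,3\npear,5\n"
from_csv = pd.read_csv(io.StringIO(csv_text))

# The same rows loaded from a SQL query against an in-memory SQLite database.
# The table "stock" is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (item TEXT, qty INTEGER)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [("apple", 3), ("pear", 5)])
from_sql = pd.read_sql("SELECT item, qty FROM stock ORDER BY item", conn)
conn.close()

print(from_csv)
print(from_sql["qty"].sum())  # 8
```

Either way the result is an ordinary DataFrame, so later operations do not depend on where the data came from.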

[4]: 177–182 Pandas implements a subset of relational algebra, and supports one-to-one, many-to-one, and many-to-many joins.
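As an illustration, the sketch below performs a many-to-one join with `DataFrame.merge`. The `orders` and `customers` tables are hypothetical. The optional `validate` argument (`"one_to_one"`, `"one_to_many"`, `"many_to_one"`, `"many_to_many"`) makes pandas raise `pandas.errors.MergeError` if the key columns do not have the stated relationship.

```python
import pandas as pd

# Hypothetical tables: several orders can refer to the same customer.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer_id": [10, 10, 20]})
customers = pd.DataFrame({"customer_id": [10, 20], "name": ["Ana", "Ben"]})

# Inner join on the shared key column; validate= checks that each
# customer_id appears at most once in the right-hand table.
joined = orders.merge(customers, on="customer_id", how="inner",
                      validate="many_to_one")
print(joined)
```

Passing `validate="one_to_one"` here would fail, because `customer_id` 10 appears twice in `orders`.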

[9]: 115 Pandas also includes built-in operations for arithmetic, string manipulation, and summary statistics such as mean, median, and standard deviation.
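A short sketch of these three kinds of operation on a small, made-up DataFrame: column-wise arithmetic, string cleanup through the `.str` accessor, and summary statistics. Note that `Series.std` computes the sample standard deviation (`ddof=1`) by default.

```python
import pandas as pd

# Made-up product data for illustration.
df = pd.DataFrame({
    "product": [" widget", "Gadget ", "gizmo"],
    "price": [2.0, 4.0, 9.0],
    "qty": [3, 1, 2],
})

# Vectorized arithmetic: multiply two columns element by element.
df["revenue"] = df["price"] * df["qty"]

# String manipulation via the .str accessor: trim whitespace, lowercase.
df["product"] = df["product"].str.strip().str.lower()

# Summary statistics on a numeric column.
print(df["price"].mean())    # 5.0
print(df["price"].median())  # 4.0
print(df["price"].std())     # sample standard deviation, sqrt(13)
```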

[4]: 253–259 Pandas includes support for time series, such as the ability to interpolate values [4]: 316–317 and to filter by a range of timestamps (e.g. for data indexed by date, data['1/1/2023':'2/2/2023'] returns all rows dated from January 1 through February 2, 2023, inclusive).
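The sketch below shows both features on a hypothetical daily series. It uses `.loc` with ISO-format date strings, which avoids the month/day ambiguity of strings like `'1/1/2023'`; label-based slicing on a `DatetimeIndex` includes both endpoints. `Series.interpolate` defaults to linear interpolation that treats the points as equally spaced.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: 60 days starting 2022-12-25, values 0..59.
idx = pd.date_range("2022-12-25", periods=60, freq="D")
s = pd.Series(np.arange(60, dtype=float), index=idx)

# Range filter: January 1 through February 2, 2023, both ends included.
window = s.loc["2023-01-01":"2023-02-02"]
print(len(window))  # 33 (31 days of January + 2 days of February)

# Remove two values, then fill the gap by linear interpolation.
gappy = s.copy()
gappy.iloc[[10, 11]] = np.nan
filled = gappy.interpolate()
print(filled.iloc[10], filled.iloc[11])  # 10.0 11.0
```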

[4]: 295 Pandas represents missing time series data using a special NaT (Not a Timestamp) object, instead of the NaN (Not a Number) value it uses for missing numeric data.
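A small sketch of the difference: the same missing input (`None`) becomes `NaT` when parsed as datetimes and `NaN` in a float Series. `pd.isna` detects both, so missing-value checks do not need to distinguish between them.

```python
import pandas as pd

# None becomes NaT in datetime data and NaN in float data.
stamps = pd.to_datetime(["2023-01-01", None])
values = pd.Series([1.5, None])

print(stamps[1] is pd.NaT)    # True: NaT is a singleton
print(values.dtype)           # float64
print(pd.isna(stamps[1]), pd.isna(values[1]))  # True True
```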

A hierarchical index of this kind, called a "MultiIndex", allows a single DataFrame to represent multiple dimensions, similar to a pivot table in Microsoft Excel.
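As a sketch, the example below builds a two-level `MultiIndex` (region, year) for some made-up sales figures, selects all rows under one outer key, and then uses `unstack` to move the inner level into columns, giving a layout much like a spreadsheet pivot table.

```python
import pandas as pd

# Made-up sales figures indexed by two levels: region and year.
index = pd.MultiIndex.from_product(
    [["north", "south"], [2022, 2023]], names=["region", "year"]
)
sales = pd.DataFrame({"units": [10, 12, 7, 9]}, index=index)

# Select every row under one outer-level key; the inner level remains.
north = sales.loc["north"]
print(north)

# Pivot the "year" level into columns: one row per region.
wide = sales["units"].unstack("year")
print(wide)
```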

Pandas can require 5 to 10 times as much memory as the size of the underlying data, and the entire dataset must be loaded into RAM.
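One way to check this overhead for a particular dataset is `DataFrame.memory_usage(deep=True)`, which, for object-dtype columns, counts the Python string objects themselves and not only the pointers to them. The sketch below compares that figure with the size of the CSV text it was parsed from. The data is synthetic, and the resulting ratio depends on column dtypes and the pandas version, so it is not meant to reproduce the 5 to 10 times figure.

```python
import io

import pandas as pd

# Synthetic data: 1,000 rows of an integer id and a short string label.
csv_text = "id,label\n" + "".join(f"{i},x{i}\n" for i in range(1000))
df = pd.read_csv(io.StringIO(csv_text))

# In-memory size (including string contents) vs. raw CSV size in bytes.
in_memory = int(df.memory_usage(deep=True).sum())
on_disk = len(csv_text.encode("utf-8"))
print(on_disk, in_memory, round(in_memory / on_disk, 1))
```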

Wes McKinney, the creator of Pandas, has recommended Apache Arrow as an alternative to address these performance concerns and other limitations.