
Python | An Introduction to pandas (a powerful tool for tabular data)

2017-09-23  ccccfys

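pandas is a third-party library that you will almost certainly use when doing data analysis in Python, so you need to get to know it. This article is a translation of the introductory section of the official pandas documentation. When learning to program, reading and using the official documentation is the most direct and most reliable way to solve problems. If you really want to get somewhere in programming, read the official documentation; otherwise you will always be eating what someone else has already chewed.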

Official pandas documentation: http://pandas.pydata.org/pandas-docs/stable/

Note: my English is limited, so mistakes are likely. If you find one, please leave a comment.

pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. It is already well on its way toward this goal.

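pandas is a third-party Python library that provides fast, flexible, and expressive data structures, which make handling relational and labeled data easier and more intuitive. It aims to be the basic high-level building block for practical, real-world data analysis in Python. Beyond that, it has a more ambitious goal: to become the most powerful and most flexible open source tool for data analysis and manipulation in any programming language. It is already well on its way toward that goal.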

pandas is well suited for many different kinds of data:

pandas suits many different kinds of data.

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything that R’s data.frame provides and much more. pandas is built on top of NumPy and is intended to integrate well within a scientific computing environment with many other 3rd party libraries.

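The two most important data structures in pandas are Series (1-dimensional) and DataFrame (2-dimensional). Between them they cover the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame offers everything R’s data.frame does, plus more. pandas is built on top of NumPy and is designed to work well alongside many other third-party libraries in a scientific computing environment.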
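To make the two structures concrete, here is a minimal sketch. The labels and numbers are made-up sample data, not taken from the pandas documentation. It builds a Series and a DataFrame and shows that the values are backed by NumPy arrays.

```python
import numpy as np
import pandas as pd

# A Series is a 1-dimensional labeled array (hypothetical sample values).
prices = pd.Series([10.5, 11.0, 9.8], index=["a", "b", "c"], name="price")

# A DataFrame is a 2-dimensional labeled table; columns may have different dtypes.
df = pd.DataFrame(
    {
        "name": ["x", "y", "z"],    # string column
        "score": [1.5, 2.5, 3.5],   # float column
    }
)

# Selecting a single column of a DataFrame gives back a Series.
col = df["score"]

# The values of a Series can be obtained as a NumPy array.
values = prices.to_numpy()

print(type(col).__name__)     # Series
print(type(values).__name__)  # ndarray
print(df.shape)               # (3, 2)
```

Values are looked up by label rather than by position alone, for example prices["b"] returns 11.0.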

Here are just a few of the things that pandas does well:

Here are some of the things pandas is good at:

Many of these principles are here to address the shortcomings frequently experienced using other languages / scientific research environments. For data scientists, working with data is typically divided into multiple stages: munging and cleaning data, analyzing / modeling it, then organizing the results of the analysis into a form suitable for plotting or tabular display. pandas is the ideal tool for all of these tasks.

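Many of the pandas features mentioned above exist to address shortcomings commonly found in other languages and scientific research environments. For data scientists, working with data usually breaks down into several stages: cleaning and munging the data, analyzing and modeling it, and then organizing the results into a form suitable for plotting or tabular display. pandas is an ideal tool for all of these tasks. (In other words, pandas is used all the way through, from cleaning the data to presenting the final results.)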
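The three stages described above can be sketched in a few lines. This is only an illustration under assumptions: the data frame `raw`, its column names, and the choice of a per-group mean as the "analysis" are all hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data containing a missing value and a duplicated row.
raw = pd.DataFrame(
    {
        "group": ["a", "a", "b", "b", "b"],
        "value": [1.0, np.nan, 3.0, 5.0, 5.0],
    }
)

# Stage 1: munging and cleaning.
# Drop rows with missing values, then drop exact duplicate rows.
clean = raw.dropna().drop_duplicates()

# Stage 2: analysis.
# Compute the mean of "value" within each group.
means = clean.groupby("group")["value"].mean()

# Stage 3: organize the result into a table suitable for display.
summary = means.reset_index().rename(columns={"value": "mean_value"})

print(summary)
```

Each stage returns a new object, so the intermediate results (clean, means) stay available for inspection.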

Some other notes

Some key points:

Note: This documentation assumes general familiarity with NumPy. If you haven’t used NumPy much or at all, do invest some time in learning about NumPy first.

Note: the pandas documentation assumes you are already familiar with NumPy. If you have not used NumPy, spend some time learning it first.
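A small sketch of why NumPy familiarity carries over, using made-up values: NumPy functions, boolean masks, and vectorized arithmetic behave on a Series much as they do on an array, with the added benefit that the labels are kept.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with string labels.
s = pd.Series([1.0, 4.0, 9.0], index=["x", "y", "z"])

# NumPy universal functions apply element-wise and keep the index labels.
roots = np.sqrt(s)

# Boolean-mask selection works the same way as on a NumPy array.
large = s[s > 2]

# Arithmetic is vectorized, with no explicit loop.
doubled = s * 2

print(roots)
print(large)
print(doubled)
```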
