Python Data Analysis and Data Mining for Beginners, Chapter 2: pandas, Section 2: Python Language Basics, IPython, and Jupyter Notebooks

Posted 2021-02-15 跨界混子


Python Language Basics, IPython, and Jupyter Notebooks

In [5]:

import numpy as np  # import NumPy
np.random.seed(12345)  # seed the random number generator so results are reproducible
np.set_printoptions(precision=4, suppress=True)  # configure how arrays are printed

Signature: np.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, suppress=None, nanstr=None, infstr=None, formatter=None, sign=None, floatmode=None, **kwarg)
Docstring: Set printing options.

These options determine the way floating point numbers, arrays and other NumPy objects are displayed.

The Python Interpreter

$ python
Python 3.6.0 | packaged by conda-forge | (default, Jan 13 2017, 23:17:12)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 5
>>> print(a)
5

print('Hello world')

$ python hello_world.py
Hello world

$ ipython
Python 3.6.0 | packaged by conda-forge | (default, Jan 13 2017, 23:17:12)
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: %run hello_world.py
Hello world

In [2]:

IPython Basics

Running the IPython Shell

In [6]:

import numpy as np 
data = {i : np.random.randn() for i in range(7)}
data

Out[6]:

{0: -0.20470765948471295,
 1: 0.47894333805754824,
 2: -0.5194387150567381,
 3: -0.55573030434749,
 4: 1.9657805725027142,
 5: 1.3934058329729904,
 6: 0.09290787674371767}

Return a sample (or samples) from the "standard normal" distribution. The cell above draws 7 standard normal random numbers and stores them in the dictionary data under the keys 0 to 6.

from numpy.random import randn
data = {i : randn() for i in range(7)}
print(data)
{0: -1.5948255432744511, 1: 0.10569006472787983, 2: 1.972367135977295, 3: 0.15455217573074576, 4: -0.24058577449429575, 5: -1.2904897053651216, 6: 0.3308507317325902}

Running the Jupyter Notebook

$ jupyter notebook
[I 15:20:52.739 NotebookApp] Serving notebooks from local directory:
/home/wesm/code/pydata-book
[I 15:20:52.739 NotebookApp] 0 active kernels
[I 15:20:52.739 NotebookApp] The Jupyter Notebook is running at:
http://localhost:8888/
[I 15:20:52.740 NotebookApp] Use Control-C to stop this server and shut down
all kernels (twice to skip confirmation).
Created new window in existing browser session.

Tab Completion

In [ ]:

Pressing the Tab key shows completion suggestions for what you are typing.

In [1]: an_apple = 27

In [2]: an_example = 42

In [3]: an

In [3]: b = [1, 2, 3]

In [4]: b.

In [1]: import datetime

In [2]: datetime.

In [7]: datasets/movielens/

Introspection

Appending a question mark to an object displays help information about it.

In [8]: b = [1, 2, 3]

In [9]: b?
Type:       list
String Form:[1, 2, 3]
Length:     3
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items

In [10]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file:  a file-like object (stream); defaults to the current sys.stdout.
sep:   string inserted between values, default a space.
end:   string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type:      builtin_function_or_method

def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b

In [11]: add_numbers?
Signature: add_numbers(a, b)
Docstring:
Add two numbers together

Returns
-------
the_sum : type of arguments
File:      <ipython-input-9-6a548a216e27>
Type:      function

In [12]: add_numbers??
Signature: add_numbers(a, b)
Source:
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b
File:      <ipython-input-9-6a548a216e27>
Type:      function

In [13]: np.*load*?
np.__loader__
np.load
np.loads
np.loadtxt
np.pkgload

In [ ]:

np.*load*? searches the top-level NumPy namespace for every name that contains load.
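Outside IPython, roughly the same information can be gathered with the standard library. The sketch below is an approximation, not what IPython does internally: inspect stands in for ? and fnmatch for the wildcard search, and the json module is used only as a convenient stand-in namespace.

```python
import fnmatch
import inspect
import json


def add_numbers(a, b):
    """Add two numbers together."""
    return a + b


# Rough equivalent of `add_numbers?`: the signature and the docstring.
signature = str(inspect.signature(add_numbers))
doc = inspect.getdoc(add_numbers)
print(signature)
print(doc)

# Rough equivalent of `json.*load*?`: a wildcard search over a module's names.
matches = fnmatch.filter(dir(json), "*load*")
print(matches)
```

The wildcard search also returns private names such as __loader__, just as the np.*load*? listing does.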

The %run Command

def f(x, y, z):
    return (x + y) / z

a = 5
b = 6
c = 7.5

result = f(a, b, c)

In [14]: %run ipython_script_test.py

In [15]: c
Out[15]: 7.5

In [16]: result
Out[16]: 1.4666666666666666

>>> %load ipython_script_test.py

    def f(x, y, z):
        return (x + y) / z

    a = 5
    b = 6
    c = 7.5

    result = f(a, b, c)

Interrupting running code: Ctrl-C

Executing code from the clipboard

x = 5
y = 7
if x > 5:
    x += 1

    y = 8

In [17]: %paste
x = 5
y = 7
if x > 5:
    x += 1

    y = 8
## -- End pasted text --

In [18]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:x = 5
:y = 7
:if x > 5:
:    x += 1
:
:    y = 8
:--

Terminal Keyboard Shortcuts

[Image: IPython terminal keyboard shortcuts]

About Magic Commands

In [20]: a = np.random.randn(100, 100)

In [20]: %timeit np.dot(a, a)
10000 loops, best of 3: 20.9 µs per loop

The %timeit magic command measures the execution time of a statement.
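In a plain script, where magic commands are unavailable, the standard library timeit module gives a comparable measurement. This is a minimal sketch; the statement being timed and the loop counts are arbitrary choices for illustration.

```python
import timeit

# Time a small statement; `number` is how many times it runs per measurement.
elapsed = timeit.timeit("sum(range(100))", number=1000)

# repeat() returns one total per run; the minimum is the least noisy figure.
runs = timeit.repeat("sum(range(100))", number=1000, repeat=3)
best_per_loop = min(runs) / 1000

print(f"total: {elapsed:.6f} s, best per loop: {best_per_loop:.3e} s")
```

Taking the best of several runs mirrors the "best of 3" figure that %timeit reports.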

In [21]: %debug?
Docstring:
::

  %debug [--breakpoint FILE:LINE] [statement [statement ...]]

Activate the interactive debugger.

This magic command support two ways of activating debugger.
One is to activate debugger before executing code.  This way, you
can set a break point, to step through the code from the point.
You can use this mode by giving statements to execute and optionally
a breakpoint.

The other one is to activate debugger in post-mortem mode.  You can
activate this mode simply running %debug without any argument.
If an exception has just occurred, this lets you inspect its stack
frames interactively.  Note that this will always work only on the last
traceback that occurred, so you must call this quickly after an
exception that you wish to inspect has fired, because if another one
occurs, it clobbers the previous one.

If you want IPython to automatically do this on every exception, see
the %pdb magic for more details.

positional arguments:
  statement             Code to run in debugger. You can omit this in cell
                        magic mode.

optional arguments:
  --breakpoint <FILE:LINE>, -b <FILE:LINE>
                        Set break point at LINE in FILE.

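The %debug magic command activates the interactive debugger.

It supports two ways of activating the debugger. The first is to activate it before executing code, which lets you set a breakpoint and step through the code from that point. To use this mode, pass the statements to execute and, optionally, a breakpoint.

The second is post-mortem mode, which you activate by running %debug without any arguments. If an exception has just occurred, this lets you inspect its stack frames interactively. It only ever works on the last traceback, so call it promptly after the exception you want to inspect, because a later exception overwrites the earlier one.

If you want IPython to do this automatically on every exception, see the %pdb magic.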

In [22]: %pwd
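Out[22]: '/home/wesm/code/pydata-book'

In [23]: foo = %pwd

In [24]: foo
Out[24]: '/home/wesm/code/pydata-book'

The %pwd magic command returns the current working directory, and its result can be assigned to a variable.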

Matplotlib Integration

In [26]: %matplotlib
Using matplotlib backend: Qt4Agg

In [26]: %matplotlib inline

%matplotlib inline makes matplotlib plots display inside the notebook.

Python Language Basics

Language Semantics

Indentation defines the logical structure of code

for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)

a = 5; b = 6; c = 7

Everything is an object

Text after a hash mark (#) on a line is not executed

results = []
for line in file_handle:
    # keep the empty lines for now
    # if len(line) == 0:
    #   continue
    results.append(line.replace('foo', 'bar'))

print("Reached this line")  # Simple status report

Function and object method calls

result = f(x, y, z)
g()

obj.some_method(x, y, z)

The general form of a method call is object.method(arg1, arg2, arg3).

result = f(a, b, c, d=5, e='foo')

Variables and argument passing

In [8]:

a = [1, 2, 3]

In [9]:

b = a

In [12]:

a.append(4)
b  # b refers to the same list, so it now contains 4 as well

Out[12]:

[1, 2, 3, 4]

In [13]:

b.append(5)
a

Out[13]:

[1, 2, 3, 4, 5]

A custom append function

def append_element(some_list, element):
    some_list.append(element)

In [27]: data = [1, 2, 3]

In [28]: append_element(data, 4)

In [29]: data
Out[29]: [1, 2, 3, 4]
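The difference between mutating the object a parameter refers to and rebinding the parameter name can be sketched as follows. The helper replace_list is a hypothetical name introduced only for this illustration.

```python
def append_element(some_list, element):
    # Mutates the list object the caller passed in.
    some_list.append(element)


def replace_list(some_list):
    # Rebinds only the local name; the caller's list is untouched.
    some_list = [0]
    return some_list


data = [1, 2, 3]
append_element(data, 4)
new_list = replace_list(data)

print(data)      # the caller's list, changed by append_element only
print(new_list)  # a separate list object
```

Assignment inside a function creates a new local binding, which is why only the in-place append is visible to the caller.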

Dynamic references, strong types

In [14]:

a = 5
type(a)
a = 'foo'
type(a)

Out[14]:

str

In [15]:

'5' + 5

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-4dd8efb5fac1> in <module>()
----> 1 '5' + 5

TypeError: must be str, not int
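Because Python will not convert between str and int implicitly, the conversion has to be spelled out. A minimal sketch of both directions, with variable names chosen only for illustration:

```python
text = '5'
number = 5

as_text = text + str(number)    # string concatenation
as_number = int(text) + number  # integer addition

# Mixing the two types directly raises TypeError.
try:
    text + number
except TypeError as exc:
    error_message = str(exc)

print(as_text, as_number)
print(error_message)
```

The exact wording of the error message varies between Python versions, so the sketch prints it rather than assuming it.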

In [16]:

a = 4.5
b = 2
# String formatting, to be visited later
print('a is {0}, b is {1}'.format(type(a), type(b)))
a / b

a is <class 'float'>, b is <class 'int'>

Out[16]:

2.25

In [17]:

a = 5
isinstance(a, int)

Out[17]:

True

In [18]:

a = 5; b = 4.5
isinstance(a, (int, float))
isinstance(b, (int, float))

Out[18]:

True

Attributes and methods

In [1]: a = 'foo'

In [2]: a.<Press Tab to list attributes and methods>
a.capitalize  a.format      a.isupper     a.rindex      a.strip
a.center      a.index       a.join        a.rjust       a.swapcase
a.count       a.isalnum     a.ljust       a.rpartition  a.title
a.decode      a.isalpha     a.lower       a.rsplit      a.translate
a.encode      a.isdigit     a.lstrip      a.rstrip      a.upper
a.endswith    a.islower     a.partition   a.split       a.zfill
a.expandtabs  a.isspace     a.replace     a.splitlines
a.find        a.istitle     a.rfind       a.startswith

In [19]:

a = 'foo'

In [21]:

getattr(a, 'split')

Out[21]:

<function str.split>

Docstring: getattr(object, name[, default]) -> value

Get a named attribute from an object; getattr(x, 'y') is equivalent to x.y. When a default argument is given, it is returned when the attribute doesn't exist; without it, an exception is raised in that case.
Type: builtin_function_or_method

Duck typing

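Duck typing is used widely in Python. The Python glossary defines duck typing as follows:

Pythonic programming style that determines an object's type by inspection of its method or attribute signature rather than by explicit relationship to some type object ("If it looks like a duck and quacks like a duck, it must be a duck.") By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution. Duck-typing avoids tests using type() or isinstance(). Instead, it typically employs the EAFP (Easier to Ask Forgiveness than Permission) style of programming.

It's easier to ask forgiveness than it is to get permission. Variant: If it's a good idea, go ahead and do it. It is much easier to apologize than it is to get permission. (Grace Hopper, quoted on Wikiquote)

The most typical example of duck typing in Python is the family of file-like classes. These classes can implement some or all of the methods of file and can be used wherever file is normally used. For example, GzipFile implements a file-like object for accessing gzip-compressed data, and cStringIO (io.StringIO in Python 3) lets a Python string be treated as a file. Sockets also share many methods with files. However, a socket lacks the tell() method, so it cannot be used everywhere a GzipFile can. This shows the flexibility of duck typing: a file-like object implements only the methods it is able to implement, and it can be used only in the situations where that makes sense.

The EAFP principle describes the use of exception handling. For example, rather than checking whether an object that claims to be duck-like has a quack() method (using if hasattr(mallard, "quack"): ...), people usually prefer to wrap the attempted call to quack in exception handling:

try:
    mallard.quack()
except (AttributeError, TypeError):
    print("mallard has no quack() method")

The advantage of this style is that it encourages structured handling of other errors raised by the class. For example, a Duck subclass that cannot quack can raise a QuackException, which can simply be added to the wrapping code without affecting the rest of the logic. It also handles naming conflicts caused by incompatible members on objects of unrelated classes. For example, if a medical professional Mallard has a Boolean attribute that classifies him as quack=True, an attempt to call Mallard.quack() raises a TypeError.

In more practical examples of implementing file-like behavior, people prefer to use Python's exception handling to deal with the many I/O errors that can arise from environmental and operating system problems outside the programmer's control. Here, the exceptions produced by duck typing can be caught in their own clauses, separately from operating system, I/O and other possible errors, which avoids complicated detection and error-checking logic.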
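The contrast between checking first and asking forgiveness can be sketched with two small classes. Duck, Robot and both helper functions are hypothetical names made up for this illustration.

```python
class Duck:
    def quack(self):
        return "Quack!"


class Robot:
    def beep(self):
        return "Beep"


def quack_lbyl(thing):
    # "Look before you leap": test for the attribute before calling it.
    if hasattr(thing, "quack"):
        return thing.quack()
    return None


def quack_eafp(thing):
    # EAFP: just try the call and handle the failure.
    try:
        return thing.quack()
    except (AttributeError, TypeError):
        return None


results = [(quack_lbyl(obj), quack_eafp(obj)) for obj in (Duck(), Robot())]
print(results)
```

Neither helper asks what class the object belongs to; any object with a callable quack attribute is accepted.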

In [22]:

def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:  # not iterable
        return False

Docstring: iter(iterable) -> iterator iter(callable, sentinel) -> iterator

Get an iterator from an object. In the first form, the argument must supply its own iterator, or be a sequence. In the second form, the callable is called until it returns the sentinel.
Type: builtin_function_or_method

In [25]:

isiterable('a string')

Out[25]:

True

In [26]:

isiterable([1, 2, 3])

Out[26]:

True

In [27]:

isiterable(5)

Out[27]:

False

if not isinstance(x, list) and isiterable(x):
    x = list(x)
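The conversion above is typically wrapped in a function that accepts any kind of sequence. A minimal sketch, where as_list is a hypothetical helper name:

```python
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:  # not iterable
        return False


def as_list(x):
    # Convert any non-list iterable to a list; leave everything else alone.
    if not isinstance(x, list) and isiterable(x):
        x = list(x)
    return x


original = [1, 2]
from_tuple = as_list((1, 2, 3))
from_string = as_list('ab')
from_int = as_list(5)
same_list = as_list(original)
print(from_tuple, from_string, from_int, same_list)
```

Note that a string is iterable, so it is split into characters, while a non-iterable value such as 5 is returned unchanged.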

Imports

# some_module.py
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b

import some_module
result = some_module.f(5)
pi = some_module.PI

from some_module import f, g, PI
result = g(5, PI)

import some_module as sm
from some_module import PI as pi, g as gf

r1 = sm.f(pi)
r2 = gf(6, pi)
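Since some_module only exists if you create the file yourself, the same import forms can be tried with the standard library math module instead. This is a sketch; the alias names m and round_down are arbitrary.

```python
import math                              # plain import
import math as m                         # import under an alias
from math import pi, sqrt                # import selected names
from math import floor as round_down     # import a name under an alias

r1 = math.sqrt(16)
r2 = m.sqrt(16)
r3 = sqrt(16)
r4 = round_down(pi)
print(r1, r2, r3, r4)
```

All four forms reach the same module object; they differ only in which names end up in the current namespace.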

Binary operators and comparisons

In [30]:

5 - 7

Out[30]:

-2

In [31]:

12 + 21.5

Out[31]:

33.5

In [32]:

5 <= 2

Out[32]:

False

In [34]:

a = [1, 2, 3]
b = a
c = list(a)
a is b

Out[34]:

True

In [35]:

a is not c

Out[35]:

True

In [36]:

a == c

Out[36]:

True

In [39]:

a = None

In [40]:

a is None

Out[40]:

True

Mutable and immutable objects

In [41]:

a_list = ['foo', 2, [4, 5]]
a_list[2] = (3, 4)
a_list

Out[41]:

['foo', 2, (3, 4)]

In [43]:

a_tuple = (3, 5, (4, 5))  # tuple elements cannot be reassigned
a_tuple[1] = 'four'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-2c9bddc8679c> in <module>()
      1 a_tuple = (3, 5, (4, 5))
----> 2 a_tuple[1] = 'four'

TypeError: 'tuple' object does not support item assignment

In [45]:

a_tuple = (3, 5, [4, 5])  # but a mutable object inside a tuple can be modified in place
a_tuple[2][0] = 'four'
a_tuple

Out[45]:

(3, 5, ['four', 5])
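The three cases can be put side by side: reassigning a tuple slot fails, changing a list held inside the tuple succeeds, and "extending" a tuple really builds a new one. A minimal sketch with illustrative variable names:

```python
a_tuple = (3, 5, [4, 5])

# Reassigning a slot of the tuple is not allowed.
try:
    a_tuple[1] = 'four'
    assignment_failed = False
except TypeError:
    assignment_failed = True

# The list stored inside the tuple is still mutable.
a_tuple[2].append(6)

# Concatenation creates a new tuple and leaves the original object as it was.
longer = a_tuple + (7,)

print(assignment_failed, a_tuple, longer)
```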

Scalar Types

Numeric types

In [46]:

ival = 17239871
ival ** 6

Out[46]:

26254519291092456596965462913230729701102721

In [47]:

fval = 7.243
fval2 = 6.78e-5

In [48]:

3 / 2

Out[48]:

1.5

In [49]:

3 // 2  # "//" is floor division: it returns the integer part of the quotient, rounded down

Out[49]:

1
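"Rounded down" matters for negative operands, where floor division and truncation give different answers. A short sketch of the distinction:

```python
quotient = 7 // 2          # floor division
remainder = 7 % 2          # modulo
both = divmod(7, 2)        # quotient and remainder in one call

negative_floor = -7 // 2   # rounds toward negative infinity
truncated = int(-7 / 2)    # int() truncates toward zero instead

print(quotient, remainder, both, negative_floor, truncated)
```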

Strings

a = 'one way of writing a string'
b = "another way"

In [ ]:

c = """
This is a longer string that
spans multiple lines
"""

In [50]:

c.count('\n')  # count the newline characters in the string c

Out[50]:

3

In [54]:

a = 'this is a string'
a[10] = 'f'

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-54-85038afe6a01> in <module>()
      1 a = 'this is a string'
----> 2 a[10] = 'f'
      3 b = a.replace('string', 'longer string')
      4 b

TypeError: 'str' object does not support item assignment

In [55]:

b = a.replace('string', 'longer string')
b

Out[55]:

'this is a longer string'

Docstring: S.replace(old, new[, count]) -> str

Return a copy of S with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

In [53]:

a

Out[53]:

'this is a string'

In [57]:

a = 5.6
s = str(a)
print(s)
print(type(s))

5.6
<class 'str'>

In [59]:

s = 'python'
l = list(s)

s[:3]
print(type(l))
print(type(s))

<class 'list'>
<class 'str'>

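Escape characters and raw strings

In [63]:

s = '12\\34'    # an escaped backslash yields one literal backslash
s1 = '12\34'    # \34 is an octal escape for the control character \x1c
s2 = r'12\34'   # a raw string keeps the backslash as written
print(s)
print(s1)
print(s2)

12\34
12
12\34

In [62]:

s = r'this\has\no\special\characters'
s

Out[62]:

'this\\has\\no\\special\\characters'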
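The effect of the r prefix is easiest to see by comparing string lengths and contents. A minimal sketch, with variable names chosen for illustration:

```python
escaped = '12\\34'   # backslash written as an escape sequence
raw = r'12\34'       # raw string: the backslash is taken literally
octal = '12\34'      # without r, \34 is read as an octal escape

print(len(escaped), len(raw), len(octal))
print(escaped == raw)
print(repr(octal))
```

The escaped and raw spellings produce the same five-character string, while the unescaped version collapses to three characters, the last being a control character.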

In [ ]:

String concatenation

In [64]:

a = 'this is the first half '
b = 'and this is the second half'
a + b

Out[64]:

'this is the first half and this is the second half'

Formatted output

In [66]:

template = '{0:.2f} {1:s} are worth US${2:d}'

In [67]:

template.format(4.5560, 'Argentine Pesos', 1)

Out[67]:

'4.56 Argentine Pesos are worth US$1'

S.format(*args, **kwargs) -> str

Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces ('{' and '}').
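The same format specifications also work inside f-strings, available from Python 3.6. The sketch below compares the two spellings; the variable names amount, currency and dollars are made up for the example.

```python
amount, currency, dollars = 4.5560, 'Argentine Pesos', 1

# str.format with positional fields: .2f = two decimals, s = string, d = integer
template = '{0:.2f} {1:s} are worth US${2:d}'
with_format = template.format(amount, currency, dollars)

# The same specifications written inline in an f-string
with_fstring = f'{amount:.2f} {currency:s} are worth US${dollars:d}'

print(with_format)
print(with_fstring)
```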

Bytes and Unicode

In [71]:

val = "español"
print(val)
print(type(val))

español
<class 'str'>

In [72]:

val_utf8 = val.encode('utf-8')
val_utf8
type(val_utf8)

Out[72]:

bytes

In [73]:

val_utf8.decode('utf-8')

Out[73]:

'español'

In [ ]:

val.encode('latin1')
val.encode('utf-16')
val.encode('utf-16le')

In [74]:

bytes_val = b'this is bytes'
bytes_val
decoded = bytes_val.decode('utf8')
decoded  # this is str (Unicode) now

Out[74]:

'this is bytes'
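One way to see what the different encodings do is to compare how many bytes each one produces for the same text. A small sketch; the ñ is written as an escape so the example does not depend on the file's own encoding.

```python
val = "espa\u00f1ol"  # 'español', 7 characters

encodings = ('utf-8', 'latin1', 'utf-16', 'utf-16le')
sizes = {enc: len(val.encode(enc)) for enc in encodings}
print(sizes)

# Decoding with the same encoding restores the original string.
round_trips = all(val.encode(enc).decode(enc) == val for enc in encodings)
print(round_trips)
```

UTF-8 needs two bytes for ñ, latin1 needs one, and plain utf-16 adds a two-byte byte order mark that utf-16le omits.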

Booleans

In [ ]:

True and True
False or True

Type casting

In [77]:

s = '3.14159'
fval = float(s)

In [78]:

type(fval)

Out[78]:

float

In [79]:

int(fval)

Out[79]:

3

In [80]:

bool(fval)

Out[80]:

True

In [81]:

bool(0)

Out[81]:

False
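A few casting edge cases are worth trying out: int() rejects a string that holds a decimal number, and bool() treats empty or zero values as False. A minimal sketch with illustrative names:

```python
s = '3.14159'
fval = float(s)
ival = int(fval)  # truncates toward zero

# int() cannot parse a decimal string directly.
try:
    int(s)
    direct_ok = True
except ValueError:
    direct_ok = False

# Zero and empty containers are falsy; everything else here is truthy.
values = (0, 0.0, '', '0', [], [0])
truthiness = [bool(v) for v in values]

print(ival, direct_ok, truthiness)
```

The non-empty string '0' is truthy even though the number 0 is not.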

None

In [ ]:

a = None
a is None
b = 5
b is not None

def add_and_maybe_multiply(a, b, c=None):
    result = a + b

    if c is not None:
        result = result * c

    return result
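Calling the function with and without the optional argument shows why the test is written as is not None rather than a plain truthiness check. The result variable names below are arbitrary.

```python
def add_and_maybe_multiply(a, b, c=None):
    result = a + b

    if c is not None:
        result = result * c

    return result


without_c = add_and_maybe_multiply(2, 3)
with_c = add_and_maybe_multiply(2, 3, 4)
with_zero = add_and_maybe_multiply(2, 3, 0)
print(without_c, with_c, with_zero)
```

Had the condition been written as if c:, passing c=0 would have skipped the multiplication.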

In [82]:

type(None)

Out[82]:

NoneType

Dates and times

In [84]:

from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
dt.day
dt.minute

Out[84]:

30

Init signature: datetime(self, /, *args, **kwargs)
Docstring:
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an instance of a tzinfo subclass. The remaining arguments may be ints.

https://docs.python.org/3/library/datetime.html

In [86]:

dt.date()

Out[86]:

datetime.date(2011, 10, 29)

In [87]:

dt.time()

Out[87]:

datetime.time(20, 30, 21)

In [88]:

dt.strftime('%m/%d/%Y %H:%M')  # format the datetime as a string

Out[88]:

'10/29/2011 20:30'

In [89]:

datetime.strptime('20091031', '%Y%m%d')

Out[89]:

datetime.datetime(2009, 10, 31, 0, 0)

In [90]:

dt.replace(minute=0, second=0)

Out[90]:

datetime.datetime(2011, 10, 29, 20, 0)

In [95]:

dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
delta

Out[95]:

datetime.timedelta(17, 7179)

In [96]:

type(delta)

Out[96]:

datetime.timedelta

A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.

In [100]:

dt
dt + delta

Out[100]:

datetime.datetime(2011, 11, 15, 22, 30)
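The pieces of a timedelta can be inspected directly, which makes the datetime.timedelta(17, 7179) result easier to read. A sketch using the same two timestamps:

```python
from datetime import datetime, timedelta

dt = datetime(2011, 10, 29, 20, 30, 21)
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt

# A timedelta stores whole days plus the leftover seconds.
print(delta.days, delta.seconds, delta.total_seconds())

# Adding the difference back to dt recovers dt2.
recovered = dt + delta

# strftime and strptime are inverses when given the same format string.
fmt = '%m/%d/%Y %H:%M'
parsed = datetime.strptime(dt.strftime(fmt), fmt)
print(recovered, parsed)
```

Because the format string has no seconds field, the parsed value matches dt only after its seconds are set to zero.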

Control Flow

if, elif, and else

if x < 0:
    print("It's negative")

if x < 0:
    print("It's negative")
elif x == 0:
    print('Equal to zero')
elif 0 < x < 5:
    print('Positive but smaller than 5')
else:
    print('Positive and larger than or equal to 5')
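To try each branch of the chain, it can be wrapped in a function and called with one value per case. The function name describe is a hypothetical choice for this sketch.

```python
def describe(x):
    # Only the first condition that is true has its branch executed.
    if x < 0:
        return "It's negative"
    elif x == 0:
        return 'Equal to zero'
    elif 0 < x < 5:
        return 'Positive but smaller than 5'
    else:
        return 'Positive and larger than or equal to 5'


labels = [describe(x) for x in (-1, 0, 3, 5)]
for label in labels:
    print(label)
```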

In [101]:

a = 5; b = 7
c = 8; d = 4
if a < b or c > d:
    print('Made it')

Made it

In [102]:

4 > 3 > 2 > 1

Out[102]:

True

for loops

for value in collection:
    # do something with value

sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue
    total += value

sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break
    total_until_5 += value
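The two loops above can be packaged as functions to check what continue and break each do to the running total. The function names are hypothetical.

```python
def sum_skipping_none(sequence):
    total = 0
    for value in sequence:
        if value is None:
            continue  # skip this element and move on to the next
        total += value
    return total


def sum_until(sequence, stop_value):
    total = 0
    for value in sequence:
        if value == stop_value:
            break  # leave the loop entirely
        total += value
    return total


total = sum_skipping_none([1, 2, None, 4, None, 5])
total_until_5 = sum_until([1, 2, 0, 4, 6, 5, 2, 1], 5)
print(total, total_until_5)
```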

In [103]:

for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j))

(0, 0)
(1, 0)
(1, 1)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)
(3, 3)

Init signature: range(self, /, *args, **kwargs) Docstring:
range(stop) -> range object range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive) to stop (exclusive) by step. range(i, j) produces i, i+1, i+2, ..., j-1. start defaults to 0, and stop is omitted! range(4) produces 0, 1, 2, 3. These are exactly the valid indices for a list of 4 elements. When step is given, it specifies the increment (or decrement).

for a, b, c in iterator:
    # do something

while loops

x = 256
total = 0
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2

pass

if x < 0:
    print('negative!')
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print('positive!')

range

In [104]:

range(10)
list(range(10))

Out[104]:

[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

Init signature: list(self, /, *args, **kwargs) Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items

In [ ]:

list(range(0, 20, 2))
list(range(5, 0, -1))

seq = [1, 2, 3, 4]
for i in range(len(seq)):
    val = seq[i]

total = 0  # named total rather than sum to avoid shadowing the built-in sum()
for i in range(100000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i
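The same total can be computed with the built-in sum() over a generator expression, which is one reason to avoid reusing sum as a variable name. A sketch comparing both forms:

```python
# Explicit loop over a range object
loop_total = 0
for i in range(100000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        loop_total += i

# Equivalent one-liner using the built-in sum() and a generator expression
generator_total = sum(i for i in range(100000) if i % 3 == 0 or i % 5 == 0)

print(loop_total, generator_total)
```

In both forms range produces the integers lazily, so no list of 100,000 elements is ever built.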

Ternary expressions

value = true_expr if condition else false_expr

In [ ]:

x = 5
'Non-negative' if x >= 0 else 'Negative'
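A ternary expression is an expression, so it can be used wherever a value is expected, for example inside a list comprehension. The sketch below also shows the equivalent if/else block; the function name sign_label is hypothetical.

```python
def sign_label(x):
    # Equivalent multi-line form of the ternary expression
    if x >= 0:
        return 'Non-negative'
    else:
        return 'Negative'


numbers = (-2, 0, 3)
with_ternary = ['Non-negative' if x >= 0 else 'Negative' for x in numbers]
with_function = [sign_label(x) for x in numbers]
print(with_ternary)
```

Only the branch that is selected gets evaluated, which keeps the expression safe when one side would be expensive or invalid.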
