Python Data Analysis and Data Mining for Beginners, Chapter 2: pandas, Section 2: Python Language Basics, IPython, and Jupyter Notebooks

Posted by 跨界混子


Preface: This article was compiled by the editors of 小常识网 (cha138.com). It covers Python Language Basics, IPython, and Jupyter Notebooks (Python Data Analysis and Data Mining for Beginners, Chapter 2, Section 2). We hope you find it useful.

Python Language Basics, IPython, and Jupyter Notebooks

In [5]:
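import numpy as np  # import NumPy
np.random.seed(12345)  # fix the random seed so the random numbers are reproducible
np.set_printoptions(precision=4, suppress=True)  # print floats with 4 decimals, no scientific notation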
 

Signature: np.set_printoptions(precision=None, threshold=None, edgeitems=None, linewidth=None, suppress=None, nanstr=None, infstr=None, formatter=None, sign=None, floatmode=None, **kwarg)
Docstring: Set printing options.

These options determine the way floating point numbers, arrays and other NumPy objects are displayed.
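The following sketch shows what the two options used above actually change. The array values are made up for illustration. It uses `np.array2string` together with the `np.printoptions` context manager, which is available in NumPy 1.15 and later, so the global print settings are not modified:

```python
import numpy as np

# Illustrative values: one ordinary float and one very small float
arr = np.array([1.23456789, 0.000001])

# NumPy's defaults (precision=8, suppress=False): the tiny value makes
# NumPy switch the whole array to scientific notation
with np.printoptions(precision=8, suppress=False):
    default_repr = np.array2string(arr)

# The same options as the cell above, applied only inside this block
with np.printoptions(precision=4, suppress=True):
    compact_repr = np.array2string(arr)

print(default_repr)
print(compact_repr)
```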

 

The Python Interpreter

 
$ python
Python 3.6.0 | packaged by conda-forge | (default, Jan 13 2017, 23:17:12)
[GCC 4.8.2 20140120 (Red Hat 4.8.2-15)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> a = 5
>>> print(a)
5
 
print('Hello world')
 
$ python hello_world.py
Hello world
 
$ ipython
Python 3.6.0 | packaged by conda-forge | (default, Jan 13 2017, 23:17:12)
Type "copyright", "credits" or "license" for more information.

IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: %run hello_world.py
Hello world

In [2]:
 

IPython Basics

 

Running the IPython Shell

 


In [6]:
import numpy as np 
data = {i : np.random.randn() for i in range(7)}
data
Out[6]:
{0: -0.20470765948471295,
 1: 0.47894333805754824,
 2: -0.5194387150567381,
 3: -0.55573030434749,
 4: 1.9657805725027142,
 5: 1.3934058329729904,
 6: 0.09290787674371767}
 

np.random.randn: "Return a sample (or samples) from the "standard normal" distribution." Here it produces 7 standard normal random numbers and stores them in the dict data under the keys 0 to 6.

 

from numpy.random import randn
data = {i : randn() for i in range(7)}
print(data)

{0: -1.5948255432744511, 1: 0.10569006472787983, 2: 1.972367135977295, 3: 0.15455217573074576, 4: -0.24058577449429575, 5: -1.2904897053651216, 6: 0.3308507317325902}
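The first cell ran right after np.random.seed(12345). The randn() call just above instead draws from whatever state the generator happens to be in, so its numbers differ. The sketch below shows that reseeding reproduces the same dict. `random_dict` is a hypothetical helper written for this illustration:

```python
import numpy as np

def random_dict(seed, n=7):
    # Hypothetical helper: reseed, then build {0: x0, ..., n-1: x_(n-1)}
    np.random.seed(seed)
    return {i: np.random.randn() for i in range(n)}

first = random_dict(12345)
second = random_dict(12345)
other = random_dict(54321)

print(first == second)  # True: same seed, same numbers
print(first == other)   # False: a different seed gives different numbers
```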

 

Running the Jupyter Notebook

 
$ jupyter notebook
[I 15:20:52.739 NotebookApp] Serving notebooks from local directory:
/home/wesm/code/pydata-book
[I 15:20:52.739 NotebookApp] 0 active kernels
[I 15:20:52.739 NotebookApp] The Jupyter Notebook is running at:
http://localhost:8888/
[I 15:20:52.740 NotebookApp] Use Control-C to stop this server and shut down
all kernels (twice to skip confirmation).
Created new window in existing browser session.
 

Tab Completion

In [ ]:
Pressing the Tab key after a partially typed name shows the matching completions:
 
In [1]: an_apple = 27

In [2]: an_example = 42

In [3]: an
 
In [3]: b = [1, 2, 3]

In [4]: b.
 
In [1]: import datetime

In [2]: datetime.
 
In [7]: datasets/movielens/
 

Introspection

 

Putting a question mark (?) before or after a variable displays information about the object:

 
In [8]: b = [1, 2, 3]

In [9]: b?
Type:       list
String Form:[1, 2, 3]
Length:     3
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items

In [10]: print?
Docstring:
print(value, ..., sep=' ', end='\n', file=sys.stdout, flush=False)

Prints the values to a stream, or to sys.stdout by default.
Optional keyword arguments:
file:  a file-like object (stream); defaults to the current sys.stdout.
sep:   string inserted between values, default a space.
end:   string appended after the last value, default a newline.
flush: whether to forcibly flush the stream.
Type:      builtin_function_or_method
 
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b
 
In [11]: add_numbers?
Signature: add_numbers(a, b)
Docstring:
Add two numbers together

Returns
-------
the_sum : type of arguments
File:      <ipython-input-9-6a548a216e27>
Type:      function
 
In [12]: add_numbers??
Signature: add_numbers(a, b)
Source:
def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b
File:      <ipython-input-9-6a548a216e27>
Type:      function
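`?` and `??` are IPython features. In a plain Python session, the standard-library `inspect` module provides similar information. The sketch below applies it to the `add_numbers` function defined above. For what `??` shows, `inspect.getsource()` returns the source code as long as the function was defined in a file that `inspect` can locate.

```python
import inspect

def add_numbers(a, b):
    """
    Add two numbers together

    Returns
    -------
    the_sum : type of arguments
    """
    return a + b

# Roughly what add_numbers? shows: the signature and the cleaned-up docstring
signature = str(inspect.signature(add_numbers))
doc = inspect.getdoc(add_numbers)

print(signature)            # (a, b)
print(doc.splitlines()[0])  # Add two numbers together
```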
 
In [13]: np.*load*?
np.__loader__
np.load
np.loads
np.loadtxt
np.pkgload
In [ ]:
np.*load*? uses wildcards to list every name in the top-level NumPy namespace that contains "load".
 

The %run Command

 
def f(x, y, z):
    return (x + y) / z

a = 5
b = 6
c = 7.5

result = f(a, b, c)
 
In [14]: %run ipython_script_test.py
 
In [15]: c
Out[15]: 7.5

In [16]: result
Out[16]: 1.4666666666666666
 
>>> %load ipython_script_test.py

    def f(x, y, z):
        return (x + y) / z

    a = 5
    b = 6
    c = 7.5

    result = f(a, b, c)
 

Interrupting running code: press Ctrl-C.

 

Executing code from the clipboard (%paste and %cpaste)

 
x = 5
y = 7
if x > 5:
    x += 1

    y = 8
 
In [17]: %paste
x = 5
y = 7
if x > 5:
    x += 1

    y = 8
## -- End pasted text --
 
In [18]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:x = 5
:y = 7
:if x > 5:
:    x += 1
:
:    y = 8
:--
 

Terminal Keyboard Shortcuts

 

(Image: table of standard IPython keyboard shortcuts)

 

About Magic Commands

 
In [20]: a = np.random.randn(100, 100)

In [20]: %timeit np.dot(a, a)
10000 loops, best of 3: 20.9 µs per loop
 

Magic command %timeit: measures how long a statement takes to execute.
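Outside IPython, the standard-library `timeit` module does the same job. The sketch below times the same `np.dot(a, a)` call. The counts (1000 calls per run, 3 runs) are arbitrary choices, and the timings depend on the machine:

```python
import timeit

import numpy as np

a = np.random.randn(100, 100)

# Run np.dot(a, a) 1000 times per measurement, and take 3 measurements
runs = timeit.repeat(lambda: np.dot(a, a), number=1000, repeat=3)

# Like %timeit, report the best run, converted to microseconds per loop
best_us = min(runs) / 1000 * 1e6
print('best of 3: {:.1f} µs per loop'.format(best_us))
```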

 
In [21]: %debug?
Docstring:
::

  %debug [--breakpoint FILE:LINE] [statement [statement ...]]

Activate the interactive debugger.

This magic command support two ways of activating debugger.
One is to activate debugger before executing code.  This way, you
can set a break point, to step through the code from the point.
You can use this mode by giving statements to execute and optionally
a breakpoint.

The other one is to activate debugger in post-mortem mode.  You can
activate this mode simply running %debug without any argument.
If an exception has just occurred, this lets you inspect its stack
frames interactively.  Note that this will always work only on the last
traceback that occurred, so you must call this quickly after an
exception that you wish to inspect has fired, because if another one
occurs, it clobbers the previous one.

If you want IPython to automatically do this on every exception, see
the %pdb magic for more details.

positional arguments:
  statement             Code to run in debugger. You can omit this in cell
                        magic mode.

optional arguments:
  --breakpoint <FILE:LINE>, -b <FILE:LINE>
                        Set break point at LINE in FILE.
 

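Magic command %debug: activates the interactive debugger. It supports two ways of starting the debugger. The first activates it before executing code: you pass the statements to run, optionally with a breakpoint, and can then step through the code from that point.

The second is post-mortem mode, which you activate by running %debug with no arguments. If an exception has just occurred, this lets you inspect its stack frames interactively. It only works on the most recent traceback, so call it right after the exception you want to inspect. If another exception occurs first, it overwrites the previous one.

To have IPython do this automatically on every exception, see the %pdb magic.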

 
In [22]: %pwd
Out[22]: '/home/wesm/code/pydata-book'

In [23]: foo = %pwd

In [24]: foo
Out[24]: '/home/wesm/code/pydata-book'
 

Magic command %pwd: shows the current working directory. Its result can also be assigned to a variable.

 

Matplotlib Integration

 
In [26]: %matplotlib
Using matplotlib backend: Qt4Agg
 
In [26]: %matplotlib inline
 

%matplotlib inline makes matplotlib plots display inside the notebook.

 

Python Language Basics

 

Language Semantics

 

Python uses indentation (whitespace), not braces, to structure blocks of code:

 
for x in array:
    if x < pivot:
        less.append(x)
    else:
        greater.append(x)
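The loop above assumes that `array`, `pivot`, `less` and `greater` already exist. Here is a runnable sketch with made-up input values, wrapped in a hypothetical `partition` function:

```python
def partition(array, pivot):
    # Split array into values smaller than pivot and all the rest
    less = []
    greater = []
    for x in array:
        if x < pivot:
            less.append(x)
        else:
            greater.append(x)
    return less, greater

less, greater = partition([3, 1, 4, 1, 5, 9, 2, 6], 4)
print(less)     # [3, 1, 1, 2]
print(greater)  # [4, 5, 9, 6]
```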
 
a = 5; b = 6; c = 7
 

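Everything is an object: numbers, strings, data structures, functions, classes and modules are all Python objects.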

 

Comments: any text after a hash mark (#) on a line is ignored by the interpreter.

 
results = []
for line in file_handle:
    # keep the empty lines for now
    # if len(line) == 0:
    #   continue
    results.append(line.replace('foo', 'bar'))
 
print("Reached this line")  # Simple status report
 

Function and object method calls

 

result = f(x, y, z)
g()

 
obj.some_method(x, y, z)
 

That is, object.method(arg1, arg2, arg3). Functions can take both positional and keyword arguments:

 
result = f(a, b, c, d=5, e='foo')
 

Variables and argument passing

In [8]:
a = [1, 2, 3]
In [9]:
b = a
In [12]:
a.append(4)
b  # b also contains 4: a and b refer to the same list
Out[12]:
[1, 2, 3, 4]
In [13]:
b.append(5)
a
Out[13]:
[1, 2, 3, 4, 5]
 

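A homemade append: when a mutable object such as a list is passed to a function, the function receives a reference to the same object, so changes made inside the function are visible to the caller.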

 
def append_element(some_list, element):
    some_list.append(element)
 
In [27]: data = [1, 2, 3]

In [28]: append_element(data, 4)

In [29]: data
Out[29]: [1, 2, 3, 4]
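The reverse case is also worth seeing. If a function rebinds its parameter to a new object instead of mutating it, the caller's variable is unaffected. In this small sketch, `append_copy` is a hypothetical name:

```python
def append_element(some_list, element):
    # Mutates the list object the caller passed in
    some_list.append(element)

def append_copy(some_list, element):
    # Rebinds the local name to a brand-new list; the caller's list is untouched
    some_list = some_list + [element]
    return some_list

data = [1, 2, 3]
append_element(data, 4)
new_data = append_copy(data, 5)

print(data)      # [1, 2, 3, 4]
print(new_data)  # [1, 2, 3, 4, 5]
```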
 

Dynamic references, strong types

In [14]:
a = 5
type(a)
a = 'foo'
type(a)
Out[14]:
str
In [15]:
'5' + 5
 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-15-4dd8efb5fac1> in <module>()
----> 1 '5' + 5

TypeError: must be str, not int
In [16]:
a = 4.5
b = 2
# String formatting, to be visited later
print('a is {0}, b is {1}'.format(type(a), type(b)))
a / b
 
a is <class 'float'>, b is <class 'int'>
Out[16]:
2.25
In [17]:
a = 5
isinstance(a, int)
Out[17]:
True
In [18]:
a = 5; b = 4.5
isinstance(a, (int, float))
isinstance(b, (int, float))
Out[18]:
True
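Python is strongly typed, so `'5' + 5` raises an error rather than guessing a conversion. You convert explicitly instead. The sketch below also includes a hypothetical `to_number` helper that uses `isinstance` with a tuple of types, as in In [18]:

```python
def to_number(value):
    # Hypothetical helper: pass int/float through, convert anything else with float()
    if isinstance(value, (int, float)):
        return value
    return float(value)

print(int('5') + 5)          # 10
print('5' + str(5))          # 55
print(to_number('4.5') + 1)  # 5.5
print(to_number(3) * 2)      # 6
```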
 

Attributes and methods

 
In [1]: a = 'foo'

In [2]: a.<Tab>   (press Tab to list the attributes and methods)
a.capitalize  a.format      a.isupper     a.rindex      a.strip
a.center      a.index       a.join        a.rjust       a.swapcase
a.count       a.isalnum     a.ljust       a.rpartition  a.title
a.decode      a.isalpha     a.lower       a.rsplit      a.translate
a.encode      a.isdigit     a.lstrip      a.rstrip      a.upper
a.endswith    a.islower     a.partition   a.split       a.zfill
a.expandtabs  a.isspace     a.replace     a.splitlines
a.find        a.istitle     a.rfind       a.startswith
In [19]:
a = 'foo'
In [21]:
getattr(a, 'split')
Out[21]:
<function str.split>
 

Docstring: getattr(object, name[, default]) -> value

Get a named attribute from an object; getattr(x, 'y') is equivalent to x.y. When a default argument is given, it is returned when the attribute doesn't exist; without it, an exception is raised in that case.
Type: builtin_function_or_method
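`getattr` is the functional form of attribute access, so the attribute name can come from a variable. This small sketch demonstrates the default argument described above, along with the related built-ins `hasattr` and `callable`:

```python
a = 'foo'

method_name = 'upper'
upper = getattr(a, method_name)  # same as a.upper
print(upper())                   # FOO

# With a default, a missing attribute returns the default instead of raising
print(getattr(a, 'no_such_attr', None))  # None

print(hasattr(a, 'split'))             # True
print(callable(getattr(a, 'split')))   # True
```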

 

Duck typing

 

Duck typing is used widely in Python. The Python glossary defines it as follows:

Pythonic programming style that determines an object's type by inspection of its method or attribute signature rather than by explicit relationship to some type object ("If it looks like a duck and quacks like a duck, it must be a duck.") By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution. Duck-typing avoids tests using type() or isinstance(). Instead, it typically employs the EAFP (Easier to Ask Forgiveness than Permission) style of programming.

"It's easier to ask forgiveness than it is to get permission." Variant: "If it's a good idea, go ahead and do it. It is much easier to apologize than it is to get permission." (Grace Hopper, via Wikiquote)

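In Python, the most typical example of duck typing is file-like classes. Such classes can implement some or all of the methods of a file and can be used wherever a file would normally be used. For example, GzipFile implements a file-like object for accessing gzip-compressed data, and cStringIO (io.StringIO in Python 3) lets a Python string be treated as a file. Sockets also share many methods with files. However, sockets lack a tell() method, so they cannot be used everywhere a GzipFile can. This shows the flexibility of duck typing: a file-like object implements the methods it is able to, and it can only be used where those methods make sense.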

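The EAFP principle relies on exception handling. Suppose you have an object that claims to be Duck-like. Instead of checking whether it has a quack() method (with if hasattr(mallard, "quack"): ...), people usually prefer to wrap the attempted call to quack() in exception handling: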

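try:
    mallard.quack()
except (AttributeError, TypeError):
    print("mallard has no quack() method")

The advantage of this style is that it encourages structured handling of other errors raised by the class. For example, a Duck subclass that cannot quack could raise a "QuackException". That exception can simply be added to the wrapping code without affecting the logic of the rest of the code. The style also handles name clashes caused by objects of other classes that have incompatible members. For example, suppose a medical expert named Mallard has a boolean attribute quack=True that classifies him as a quack; trying to execute Mallard.quack() then raises a TypeError.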
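The try/except pattern above can be run end to end. In this minimal sketch, `Duck`, `Robot` and `Mallard` are hypothetical classes. `Mallard` mirrors the "medical expert" whose `quack` is a boolean rather than a method:

```python
class Duck:
    def quack(self):
        return 'Quack!'

class Robot:
    # Not a duck, but it has a quack() method, so it works just as well
    def quack(self):
        return 'Beep-quack'

class Mallard:
    # quack is a bool attribute here, so calling it raises TypeError
    quack = True

def make_it_quack(obj):
    # EAFP: just try the call and handle the failure cases
    try:
        return obj.quack()
    except (AttributeError, TypeError):
        return '{} cannot quack'.format(type(obj).__name__)

for thing in [Duck(), Robot(), Mallard(), 42]:
    print(make_it_quack(thing))
```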

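In more practical examples of file-like behavior, people prefer Python's exception handling for the many I/O errors that can arise from environment and operating-system problems outside the programmer's control. Exceptions raised by "duck typing" can be caught in their own clause. They are then handled separately from operating-system, I/O and other errors, which avoids complicated checking and error-handling logic.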
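This sketch makes the file-like idea concrete. The hypothetical `count_lines` function only relies on being able to iterate over lines. It accepts an in-memory `io.StringIO` as well as a `gzip.GzipFile` wrapped around compressed bytes, with no type checks:

```python
import gzip
import io

def count_lines(file_like):
    # Hypothetical helper: only relies on iterating over the object line by line
    return sum(1 for _ in file_like)

text = 'first\nsecond\nthird\n'

# 1. An in-memory text "file"
in_memory = io.StringIO(text)

# 2. gzip-compressed bytes, read back through GzipFile, another file-like object
compressed = io.BytesIO(gzip.compress(text.encode('utf-8')))
gz_file = gzip.GzipFile(fileobj=compressed)

n_plain = count_lines(in_memory)
n_gzip = count_lines(gz_file)
print(n_plain, n_gzip)  # 3 3
```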

In [22]:
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:  # not iterable
        return False
 

Docstring:
iter(iterable) -> iterator
iter(callable, sentinel) -> iterator

Get an iterator from an object. In the first form, the argument must supply its own iterator, or be a sequence. In the second form, the callable is called until it returns the sentinel.
Type: builtin_function_or_method

In [25]:
isiterable('a string')
Out[25]:
True
In [26]:
isiterable([1, 2, 3])
Out[26]:
True
In [27]:
isiterable(5)
Out[27]:
False
 

if not isinstance(x, list) and isiterable(x):
    x = list(x)
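This pattern can be wrapped in a hypothetical helper `ensure_list`, which reuses `isiterable` from above. It accepts tuples, strings or other iterables and hands back a list. Anything that is not iterable is returned unchanged:

```python
def isiterable(obj):
    try:
        iter(obj)
        return True
    except TypeError:  # not iterable
        return False

def ensure_list(x):
    # Hypothetical helper: convert any non-list iterable into a list
    if not isinstance(x, list) and isiterable(x):
        x = list(x)
    return x

print(ensure_list((1, 2, 3)))  # [1, 2, 3]
print(ensure_list('ab'))       # ['a', 'b']
print(ensure_list([4, 5]))     # [4, 5] (the same list object, unchanged)
print(ensure_list(5))          # 5 (not iterable, returned as-is)
```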

 

Imports

 
# some_module.py
PI = 3.14159

def f(x):
    return x + 2

def g(a, b):
    return a + b
 

import some_module
result = some_module.f(5)
pi = some_module.PI

 

from some_module import f, g, PI
result = g(5, PI)

 

import some_module as sm
from some_module import PI as pi, g as gf

r1 = sm.f(pi)
r2 = gf(6, pi)

 

Binary operators and comparisons

In [30]:
5 - 7
Out[30]:
-2
In [31]:
12 + 21.5
Out[31]:
33.5
In [32]:
5 <= 2
Out[32]:
False
In [34]:
a = [1, 2, 3]
b = a
c = list(a)
a is b
Out[34]:
True
In [35]:
a is not c
Out[35]:
True
In [36]:
a == c
Out[36]:
True
In [39]:
a = None
In [40]:
a is None
Out[40]:
True
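Besides `+`, `-` and comparisons, Python has the other usual binary operators. Here is a short sketch with the arbitrary operands 7 and 3:

```python
a, b = 7, 3

print(a // b)  # 2    floor division
print(a % b)   # 1    remainder
print(a ** b)  # 343  power
print(a & b)   # 3    bitwise AND: 0b111 & 0b011 = 0b011
print(a | b)   # 7    bitwise OR:  0b111 | 0b011 = 0b111
print(a ^ b)   # 4    bitwise XOR: 0b111 ^ 0b011 = 0b100
print(a != b)  # True
print(True & False, True | False)  # False True
```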
 

Mutable and immutable objects

In [41]:
a_list = ['foo', 2, [4, 5]]
a_list[2] = (3, 4)
a_list
Out[41]:
['foo', 2, (3, 4)]
In [43]:
a_tuple = (3, 5, (4, 5))  # tuple elements cannot be reassigned
a_tuple[1] = 'four'
 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-43-2c9bddc8679c> in <module>()
      1 a_tuple = (3, 5, (4, 5))
----> 2 a_tuple[1] = 'four'

TypeError: 'tuple' object does not support item assignment
In [45]:
a_tuple = (3, 5, [4, 5])  # but a mutable object inside a tuple can still be modified in place
a_tuple[2][0] = 'four'
a_tuple
Out[45]:
(3, 5, ['four', 5])
 

Scalar Types

 

Numeric types

In [46]:
ival = 17239871
ival ** 6
Out[46]:
26254519291092456596965462913230729701102721
In [47]:
fval = 7.243
fval2 = 6.78e-5
In [48]:
3 / 2
Out[48]:
1.5
In [49]:
3 // 2  # "//" is floor division: it returns the integer part of the quotient (rounded down)
Out[49]:
1
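`//` rounds toward negative infinity. This matters for negative numbers, because it is not the same as truncating with `int()`. A quick check, with values chosen only for illustration:

```python
print(7 // 2)        # 3
print(-7 // 2)       # -4  (rounded down, toward negative infinity)
print(int(-7 / 2))   # -3  (int() truncates toward zero)
print(-7 % 2)        # 1   (the remainder takes the sign of the divisor)
print(divmod(7, 2))  # (3, 1): quotient and remainder in one call
```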
 

Strings

 

a = 'one way of writing a string'
b = "another way"

In [ ]:
c = """
This is a longer string that
spans multiple lines
"""
In [50]:
c.count('\n')  # number of newline characters in c
Out[50]:
3
In [54]:
a = 'this is a string'
a[10] = 'f'
 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-54-85038afe6a01> in <module>()
      1 a = 'this is a string'
----> 2 a[10] = 'f'
      3 b = a.replace('string', 'longer string')
      4 b

TypeError: 'str' object does not support item assignment
In [55]:
b = a.replace('string', 'longer string')
b
Out[55]:
'this is a longer string'
 

Docstring: S.replace(old, new[, count]) -> str

Return a copy of S with all occurrences of substring old replaced by new. If the optional argument count is given, only the first count occurrences are replaced.

In [53]:
a
Out[53]:
'this is a string'
In [57]:
a = 5.6
s = str(a)
print(s)
print(type(s))
 
5.6
<class 'str'>
In [59]:
s = 'python'
l = list(s)

print(s[:3])
print(type(l))
print(type(s))
 
pyt
<class 'list'>
<class 'str'>
 

Escape characters and raw strings

In [63]:
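s = '12\\34'   # \\ is an escaped backslash, so s contains 12\34
s1 = '12\34'   # unescaped: \34 is read as an octal escape (an invisible control character)
s2 = r'12\34'  # raw string: the backslash is kept as-is
print(s)
print(s1)
print(s2)
 
12\34
12
12\34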
In [62]:
s = r'this\has\no\special\characters'
s
Out[62]:
'this\\has\\no\\special\\characters'
In [ ]:
String concatenation
In [64]:
a = 'this is the first half '
b = 'and this is the second half'
a + b
Out[64]:
'this is the first half and this is the second half'
 

String formatting

In [66]:
template = '{0:.2f} {1:s} are worth US${2:d}'
In [67]:
template.format(4.5560, 'Argentine Pesos', 1)
Out[67]:
'4.56 Argentine Pesos are worth US$1'
 

S.format(*args, **kwargs) -> str

Return a formatted version of S, using substitutions from args and kwargs. The substitutions are identified by braces ('{' and '}').
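On Python 3.6 and later, f-strings accept the same format specifications inline. This short sketch reproduces the template above. The variable names are illustrative:

```python
template = '{0:.2f} {1:s} are worth US${2:d}'

amount, currency, usd = 4.5560, 'Argentine Pesos', 1

with_format = template.format(amount, currency, usd)
with_fstring = f'{amount:.2f} {currency:s} are worth US${usd:d}'

print(with_format)                  # 4.56 Argentine Pesos are worth US$1
print(with_format == with_fstring)  # True
```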

 

Bytes and Unicode

In [71]:
val = "español"
print(val)
print(type(val))
 
español
<class 'str'>
In [72]:
val_utf8 = val.encode('utf-8')
val_utf8
type(val_utf8)
Out[72]:
bytes
In [73]:
val_utf8.decode('utf-8')
Out[73]:
'español'
In [ ]:
val.encode('latin1')
val.encode('utf-16')
val.encode('utf-16le')
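Each encoding turns the same text into a different byte sequence. The sketch below compares the byte lengths of `val = 'español'`, which has 7 characters. In UTF-8, "ñ" takes two bytes. The 'utf-16' codec prepends a 2-byte byte-order mark, which 'utf-16le' omits:

```python
val = 'español'

lengths = {}
for encoding in ['utf-8', 'latin1', 'utf-16', 'utf-16le']:
    encoded = val.encode(encoding)
    # Decoding with the same encoding always gives the original text back
    assert encoded.decode(encoding) == val
    lengths[encoding] = len(encoded)

print(lengths)  # {'utf-8': 8, 'latin1': 7, 'utf-16': 16, 'utf-16le': 14}
```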
In [74]:
bytes_val = b'this is bytes'
bytes_val
decoded = bytes_val.decode('utf8')
decoded  # this is str (Unicode) now
Out[74]:
'this is bytes'
 

Booleans

In [ ]:
True and True
False or True
 

Type casting

In [77]:
s = '3.14159'
fval = float(s)
In [78]:
type(fval)
Out[78]:
float
In [79]:
int(fval)
Out[79]:
3
In [80]:
bool(fval)
Out[80]:
True
In [81]:
bool(0)
Out[81]:
False
 

None

In [ ]:
a = None
a is None
b = 5
b is not None
 

def add_and_maybe_multiply(a, b, c=None):
    result = a + b

    if c is not None:
        result = result * c

    return result
In [82]:
type(None)
Out[82]:
NoneType
 

Dates and times

In [84]:
from datetime import datetime, date, time
dt = datetime(2011, 10, 29, 20, 30, 21)
dt.day
dt.minute
Out[84]:
30
 

Init signature: datetime(self, /, *args, **kwargs)
Docstring:
datetime(year, month, day[, hour[, minute[, second[, microsecond[,tzinfo]]]]])

The year, month and day arguments are required. tzinfo may be None, or an instance of a tzinfo subclass. The remaining arguments may be ints.

In [86]:
dt.date()
Out[86]:
datetime.date(2011, 10, 29)
In [87]:
dt.time()
Out[87]:
datetime.time(20, 30, 21)
In [88]:
dt.strftime('%m/%d/%Y %H:%M')  # format the datetime as a string
Out[88]:
'10/29/2011 20:30'
In [89]:
datetime.strptime('20091031', '%Y%m%d')
Out[89]:
datetime.datetime(2009, 10, 31, 0, 0)
In [90]:
dt.replace(minute=0, second=0)
Out[90]:
datetime.datetime(2011, 10, 29, 20, 0)
In [95]:
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt
delta
Out[95]:
datetime.timedelta(17, 7179)
In [96]:
type(delta)
Out[96]:
datetime.timedelta
 

A duration expressing the difference between two date, time, or datetime instances to microsecond resolution.

In [100]:
dt
dt + delta
Out[100]:
datetime.datetime(2011, 11, 15, 22, 30)
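A timedelta stores days, seconds and microseconds separately. `total_seconds()` combines them, and a timedelta can also be constructed directly. This short sketch reuses the `dt` and `dt2` values from above:

```python
from datetime import datetime, timedelta

dt = datetime(2011, 10, 29, 20, 30, 21)
dt2 = datetime(2011, 11, 15, 22, 30)
delta = dt2 - dt

print(delta.days, delta.seconds)  # 17 7179
print(delta.total_seconds())      # 1475979.0  (17 * 86400 + 7179)
print(dt + timedelta(hours=12))   # 2011-10-30 08:30:21

# strftime and strptime are inverses when given the same format string
fmt = '%Y-%m-%d %H:%M:%S'
assert datetime.strptime(dt.strftime(fmt), fmt) == dt
```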
 

Control Flow

 

if, elif, and else

 

if x < 0:
    print("It's negative")

 

if x < 0:
    print("It's negative")
elif x == 0:
    print('Equal to zero')
elif 0 < x < 5:
    print('Positive but smaller than 5')
else:
    print('Positive and larger than or equal to 5')
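To try the if/elif/else chain on several values at once, you can wrap it in a function that returns the message instead of printing it. `describe` is a hypothetical name. Only the first branch whose condition is true runs:

```python
def describe(x):
    if x < 0:
        return "It's negative"
    elif x == 0:
        return 'Equal to zero'
    elif 0 < x < 5:
        return 'Positive but smaller than 5'
    else:
        return 'Positive and larger than or equal to 5'

for x in [-3, 0, 2, 7]:
    print(x, describe(x))
```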

In [101]:
a = 5; b = 7
c = 8; d = 4
if a < b or c > d:
    print('Made it')
 
Made it
In [102]:
4 > 3 > 2 > 1
Out[102]:
True
 

for loops

 

for value in collection:
    # do something with value
 

sequence = [1, 2, None, 4, None, 5]
total = 0
for value in sequence:
    if value is None:
        continue
    total += value

 

sequence = [1, 2, 0, 4, 6, 5, 2, 1]
total_until_5 = 0
for value in sequence:
    if value == 5:
        break
    total_until_5 += value
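Running the two loops above on their sample sequences shows what `continue` and `break` do. The loops are wrapped in hypothetical functions so they can be called:

```python
def sum_skipping_none(sequence):
    total = 0
    for value in sequence:
        if value is None:
            continue  # skip this element and go on with the next one
        total += value
    return total

def sum_until_5(sequence):
    total_until_5 = 0
    for value in sequence:
        if value == 5:
            break  # leave the loop entirely
        total_until_5 += value
    return total_until_5

print(sum_skipping_none([1, 2, None, 4, None, 5]))  # 12
print(sum_until_5([1, 2, 0, 4, 6, 5, 2, 1]))        # 13
```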

In [103]:
for i in range(4):
    for j in range(4):
        if j > i:
            break
        print((i, j))
 
(0, 0)
(1, 0)
(1, 1)
(2, 0)
(2, 1)
(2, 2)
(3, 0)
(3, 1)
(3, 2)
(3, 3)
 

Init signature: range(self, /, *args, **kwargs)
Docstring:
range(stop) -> range object
range(start, stop[, step]) -> range object

Return an object that produces a sequence of integers from start (inclusive) to stop (exclusive) by step. range(i, j) produces i, i+1, i+2, ..., j-1. start defaults to 0, and stop is omitted! range(4) produces 0, 1, 2, 3. These are exactly the valid indices for a list of 4 elements. When step is given, it specifies the increment (or decrement).

 

for a, b, c in iterator:
    # do something
 

while loops

 

x = 256
total = 0
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2
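Tracing the while loop: `total` goes 256, 384, 448, 480, 496, 504. Once it exceeds 500, the `break` fires before `x` reaches 0. Below is a runnable version that also records each value of `total`:

```python
x = 256
total = 0
history = []
while x > 0:
    if total > 500:
        break
    total += x
    x = x // 2
    history.append(total)

print(history)   # [256, 384, 448, 480, 496, 504]
print(total, x)  # 504 4
```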

 

pass

 

if x < 0:
    print('negative!')
elif x == 0:
    # TODO: put something smart here
    pass
else:
    print('positive!')

 

range

In [104]:
range(10)
list(range(10))
Out[104]:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
 

Init signature: list(self, /, *args, **kwargs)
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items

In [ ]:
list(range(0, 20, 2))
list(range(5, 0, -1))
 

seq = [1, 2, 3, 4]
for i in range(len(seq)):
    val = seq[i]

 

total = 0  # named total rather than sum, so the built-in sum() is not shadowed
for i in range(100000):
    # % is the modulo operator
    if i % 3 == 0 or i % 5 == 0:
        total += i
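`range(len(seq))` works, but `enumerate` is the more idiomatic way to get both the index and the value. The modulo sum can also be written as a generator expression. The sketch below checks that the forms agree. The closed-form cross-check uses inclusion-exclusion, and `sum_of_multiples` is a hypothetical helper added only for that check:

```python
seq = [1, 2, 3, 4]

# Index and value together, without range(len(seq))
pairs = [(i, val) for i, val in enumerate(seq)]
print(pairs)  # [(0, 1), (1, 2), (2, 3), (3, 4)]

n = 100000

# Loop version from the text
total = 0
for i in range(n):
    if i % 3 == 0 or i % 5 == 0:
        total += i

# The same sum with a generator expression and the built-in sum()
total_gen = sum(i for i in range(n) if i % 3 == 0 or i % 5 == 0)

def sum_of_multiples(k, limit):
    # Sum of k, 2k, ..., m*k below limit: k * m * (m + 1) / 2, m = (limit - 1) // k
    m = (limit - 1) // k
    return k * m * (m + 1) // 2

# Inclusion-exclusion: multiples of 15 are counted once for 3 and once for 5
total_formula = sum_of_multiples(3, n) + sum_of_multiples(5, n) - sum_of_multiples(15, n)

print(total == total_gen == total_formula)  # True
```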
 

Ternary expressions

 

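value = true-expr if condition else false-expr

This is equivalent to the more verbose:

if condition:
    value = true-expr
else:
    value = false-expr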

In [ ]:
x = 5
'Non-negative' if x >= 0 else 'Negative'
