Tag Archives: python

Install PyTorch in Anaconda environment

1 Download

Download address: https://download.pytorch.org/whl/torch_stable.html

My setup is the CPU-only build, Python 3.7, on a 64-bit Windows 10 system, so I downloaded the wheel that matches it.

2 Switch directory

Copy the downloaded WHL file to the site-packages directory under Anaconda's Lib directory,
for example: E:\Anaconda3\Lib\site-packages.

3 Installation

Open the Anaconda Prompt and switch to the site-packages directory:
(base) C:\Users\Lenovo> E:
(base) E:> cd Anaconda3\Lib\site-packages
(base) E:\Anaconda3\Lib\site-packages>

Then install the wheel with pip install followed by the WHL file name. For example, my WHL file name is:
torch-1.3.0+cpu-cp37-cp37m-win_amd64.whl

Test: run import torch in Python.
If no error is reported, the installation worked.
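
The import test above can also be scripted. The following is a minimal sketch, not part of the original post: the helper name `is_installed` is made up for illustration, and it only checks that Python can locate the package, not that the package works.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the module, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "torch" is the package installed above; "json" is a standard-library
    # module that should always be found.
    for name in ("torch", "json"):
        status = "found" if is_installed(name) else "missing"
        print(f"{name}: {status}")
```

If torch is reported as missing, the wheel was probably installed into a different environment than the one running the script.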

KeyError: b'TEST' problem when converting Faster-RCNN code to Python 3

One of the Faster R-CNN codebases that we have been using is written in Python 2, and we decided to change it to Python 3.

Many of the errors involve the print function and the xrange function, which are easy to solve.

The last error had me stuck for a day, and I could not find a solution on the Internet, so I record it here:

The error is as follows:

Caused by op 'PyFunc', defined at:
File "/home/q/yd/Faster-rcnn-21/tools/demo.py", line 118, in <module>
net = get_network(args.demo_net)
File "/home/q/yd/Faster-rcnn-21/tools/../lib/networks/factory.py", line 29, in get_network
return network.vggnet_test()
File "/home/q/yd/Faster-rcnn-21/tools/../lib/networks/vggnet_test.py", line 18, in __init__
self.setup()
File "/home/q/yd/Faster-rcnn-21/tools/.. (proposal_layer(_feat_stride, anchor_scales, str('TEST'), name='rois'))
File "/home/q/yd/Faster-rcnn-21/tools/. Floatal_layer [proposal_layer]), [-1, 5], proposal_layer [proposal_layer], [proposal_layer] name=name)
File "/home/q/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/script_ops.py", line 212, in py_func
input=inp, token=token, Tout=Tout, name=name)
File "/home/q/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/ops/gen_script_ops.py", line 50, in _py_func
"PyFunc", input=input, token=token, Tout=Tout, name=name)
File "/home/q/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/op_def_library.py", line 787, in _apply_op_helper
op_def=op_def)
File "/home/q/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 2956, in create_op
op_def=op_def)
File "/home/q/anaconda3/envs/py35/lib/python3.5/site-packages/tensorflow/python/framework/ops.py", line 1470, in __init__
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access

UnknownError (see above for traceback): KeyError: b'TEST'
[[Node: PyFunc = PyFunc[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_STRING, DT_INT32, DT_INT32], Tout=[DT_FLOAT], token="pyfunc_0", _device="/job:localhost/replica:0/task:0/device:CPU:0"](rpn_cls_prob_reshape/_95, rpn_bbox_pred/rpn_bbox_pred/_97, _arg_Placeholder_1_0_1, PyFunc/input_3, PyFunc/input_4, PyFunc/input_5)]]
[[Node: PyFunc/_99 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/device:GPU:0", send_device="/job:localhost/replica:0/task:0/device:CPU:0", send_device_incarnation=1, tensor_name="edge_218_PyFunc", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/device:GPU:0"]()]]


KeyError: b'TEST'

After analysis, the cause was finally located: the 'TEST' string passed in when the TensorFlow graph is initialized cannot be resolved as a key.

The error comes from the vgg_test.py file:

(self.feed('rpn_cls_prob_reshape', 'rpn_bbox_pred', 'im_info')
 .proposal_layer(_feat_stride, anchor_scales, str('TEST'), name='rois'))

There is a 'TEST' argument here that is passed into the tf.py_func() function.

Why it cannot be resolved is unknown.

The final solution is to assign 'TEST' directly to cfg_key in the proposal_layer_tf.py file:

cfg_key = 'TEST'
pre_nms_topN  = cfg[cfg_key].RPN_PRE_NMS_TOP_N
post_nms_topN = cfg[cfg_key].RPN_POST_NMS_TOP_N
nms_thresh    = cfg[cfg_key].RPN_NMS_THRESH
min_size      = cfg[cfg_key].RPN_MIN_SIZE

With the cfg_key = 'TEST' line added (marked in red in the original post), it works.

The original code passes a syntax check; only manually assigning 'TEST' to cfg_key first solves this problem.

Many posts on the web say that Python 2 and Python 3 handle string encoding differently and suggest adding a utf-8 declaration; I tested that and it does not work.
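
The error message shows the failing key as b'TEST', which suggests that under Python 3 the string reaches the wrapped function as bytes rather than str. That reading is an assumption; the original post did not verify it. The sketch below reproduces the lookup with a plain dictionary. `cfg`, the value 6000 and `normalize_key` are hypothetical stand-ins, not the project's code.

```python
# Hypothetical stand-in for the project's configuration object.
cfg = {"TEST": {"RPN_PRE_NMS_TOP_N": 6000}}  # 6000 is a placeholder value


def normalize_key(cfg_key):
    """Decode a bytes key to str so it matches the str keys of cfg."""
    if isinstance(cfg_key, bytes):
        return cfg_key.decode("utf-8")
    return cfg_key


key_from_py_func = b"TEST"  # what the traceback shows arriving as the key

# In Python 3, b"TEST" and "TEST" are different dictionary keys.
try:
    cfg[key_from_py_func]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# After decoding, the lookup succeeds.
pre_nms_top_n = cfg[normalize_key(key_from_py_func)]["RPN_PRE_NMS_TOP_N"]
```

Unlike hard-coding cfg_key = 'TEST', decoding the key would also keep a 'TRAIN' key working, if the same function is used for training.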

The differences and connections between torch.view(), transpose(), and permute()

Having recently been confused by several of PyTorch's Tensor dimension transformations, I dug into them and summarize the process and the results as follows:

Note: torch.__version__ == '1.2.0'

torch.transpose() and torch.permute()

Both are used to exchange the contents of different dimensions. However, transpose() exchanges the contents of two specified dimensions, while permute() can rearrange any number of dimensions at once. Here is the code:

transpose(): exchanges two dimensions

 >>> a = torch.Tensor([[[1,2,3,4,5], [6,7,8,9,10], [11,12,13,14,15]], 
                  [[-1,-2,-3,-4,-5], [-6,-7,-8,-9,-10], [-11,-12,-13,-14,-15]]])
 >>> a.shape
 torch.Size([2, 3, 5])
 >>> print(a)
 tensor([[[  1.,   2.,   3.,   4.,   5.],
         [  6.,   7.,   8.,   9.,  10.],
         [ 11.,  12.,  13.,  14.,  15.]],

        [[ -1.,  -2.,  -3.,  -4.,  -5.],
         [ -6.,  -7.,  -8.,  -9., -10.],
         [-11., -12., -13., -14., -15.]]])
 >>> b = a.transpose(1,2)  # Use transpose to swap dimensions 1 and 2. This is easy to understand. The resulting tensor and its shape are shown below.
 >>> print(b, b.shape)
 (tensor([[[  1.,   6.,  11.],
         [  2.,   7.,  12.],
         [  3.,   8.,  13.],
         [  4.,   9.,  14.],
         [  5.,  10.,  15.]],

        [[ -1.,  -6., -11.],
         [ -2.,  -7., -12.],
         [ -3.,  -8., -13.],
         [ -4.,  -9., -14.],
         [ -5., -10., -15.]]]),
 torch.Size([2, 5, 3]))

permute(): rearranges any number of dimensions at once

 >>> c = a.permute(2, 0, 1)
 >>> print(c, c.shape)  # This changes the order of the original dimensions 0,1,2 to 2,0,1, so the shape changes accordingly.
 (tensor([[[  1.,   6.,  11.],
          [ -1.,  -6., -11.]],
 
         [[  2.,   7.,  12.],
          [ -2.,  -7., -12.]],
 
         [[  3.,   8.,  13.],
          [ -3.,  -8., -13.]],
 
         [[  4.,   9.,  14.],
          [ -4.,  -9., -14.]],
 
         [[  5.,  10.,  15.],
          [ -5., -10., -15.]]]),
 torch.Size([5, 2, 3]))

Converting between transpose() and permute():

>>> b = a.permute(2,0,1)
>>> c = a.transpose(1,2).transpose(0,1)
>>> print(b == c, b.shape)
(tensor([[[True, True, True],
          [True, True, True]],
 
         [[True, True, True],
          [True, True, True]],
 
         [[True, True, True],
          [True, True, True]],
 
         [[True, True, True],
          [True, True, True]],
 
         [[True, True, True],
          [True, True, True]]]),
 torch.Size([5, 2, 3]))

As the code shows, if you first swap dimensions 1 and 2 of Tensor a, and then swap dimensions 0 and 1 of the result, you get the same result as permute(2, 0, 1).


transpose() and view()

view() is a very common function in PyTorch. It also changes a Tensor's dimensions, but in a very different way from transpose()/permute(). Where transpose() faithfully exchanges the Tensor's original dimensions, view() is much more direct and simple: conceptually, it first flattens all of the Tensor's dimensions into one, and then rebuilds a Tensor according to the dimension information passed in. The code is as follows:

# Still the Tensor a from above
 >>> print(a.shape)
 torch.Size([2, 3, 5])
 >>> print(a.view(2,5,3))
 tensor([[[  1.,   2.,   3.],
         [  4.,   5.,   6.],
         [  7.,   8.,   9.],
         [ 10.,  11.,  12.],
         [ 13.,  14.,  15.]],

        [[ -1.,  -2.,  -3.],
         [ -4.,  -5.,  -6.],
         [ -7.,  -8.,  -9.],
         [-10., -11., -12.],
         [-13., -14., -15.]]])
 >>> c = a.transpose(1,2)
 >>> print(c, c.shape)
(tensor([[[  1.,   6.,  11.],
          [  2.,   7.,  12.],
          [  3.,   8.,  13.],
          [  4.,   9.,  14.],
          [  5.,  10.,  15.]],
 
         [[ -1.,  -6., -11.],
          [ -2.,  -7., -12.],
          [ -3.,  -8., -13.],
          [ -4.,  -9., -14.],
          [ -5., -10., -15.]]]),
 torch.Size([2, 5, 3]))

As the code shows, even though view() and transpose() end up with the same shape, their contents are not the same. view() simply takes the elements in their original order and fills them into a Tensor of shape (2,5,3), whereas transpose() really does transpose dimensions 1 and 2.


Moreover, there are cases where view() cannot be called on a Tensor after transpose(), because the transposed Tensor is no longer "contiguous" in memory. The issue of contiguous arrays is the same in NumPy.
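
The difference between refilling elements in order and really swapping axes can be reproduced without PyTorch. The following is a plain-Python sketch on nested lists, using the first 3x5 block of Tensor a. The helper names `reshape_flat` and `transpose_2d` are made up for illustration and only mimic what view() and transpose() do.

```python
def reshape_flat(flat, rows, cols):
    """Refill a flat list into rows x cols in row-major order (mimics view())."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def transpose_2d(matrix):
    """Swap the two axes of a nested list (mimics transpose())."""
    return [list(column) for column in zip(*matrix)]


# The first 3x5 block of Tensor a from the examples above.
original = [
    [1, 2, 3, 4, 5],
    [6, 7, 8, 9, 10],
    [11, 12, 13, 14, 15],
]

# Flatten in storage order, then refill as 5x3: element order is unchanged.
flat = [value for row in original for value in row]
viewed = reshape_flat(flat, 5, 3)

# Transposing moves elements: each original column becomes a row.
transposed = transpose_2d(original)

print(viewed[:2])      # first two rows after the view-style refill
print(transposed[:2])  # first two rows after the real transpose
```

Both results are 5x3, but only the transposed one turns columns of the original into rows, which matches the two outputs shown above.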

Using the PO design pattern for automated testing in Python

In web automation testing, PO stands for Page Object: the operation flow is separated from the page elements. The test case contains no element locating, only business logic; locating page elements is written as methods encapsulated in the page object.

The PO pattern basically breaks down into 3 steps:
1. Prepare the base class BasePage, which contains the element operation methods shared by all pages.
2. Define a class A that inherits from BasePage; it encapsulates, as methods, all the element locating and actions that the page under test needs.
3. Define a class B that inherits from unittest.TestCase and write the test cases in it (just call the corresponding wrapper methods in A to express the business logic).

Following the three steps above, here is a login test for Suning:

bs_page.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.action_chains import ActionChains


class BasePage(object):

    def __init__(self, br, url):
        self.br = br
        self.url = url

    # Open the page
    def open_url(self):
        self.br.get(self.url)

    # Find an element
    def find_element(self, *loc):
        return self.br.find_element(*loc)

    # Type text
    def input_txt(self, loc, text):
        self.find_element(*loc).send_keys(text)

    # Clear text
    def clear(self, loc):
        self.find_element(*loc).clear()

    # Click
    def click(self, loc):
        self.find_element(*loc).click()

    # Select from a drop-down list
    def select_term(self, loc, index):
        # Instantiate Select
        Select(self.find_element(*loc)).select_by_index(int(index))
        # s1.select_by_index(1)  # select the second option: o1
        # s1.select_by_value("o2")  # select the option with value="o2"
        # s1.select_by_visible_text("o3")  # select the option with text "o3", i.e. the text visible in the drop-down

    # Hover the mouse
    def mouse_hover(self, loc):
        move = self.find_element(*loc)
        ActionChains(self.br).move_to_element(move).perform()

    # Get the title
    def get_title(self):
        return self.br.title

    # Get the text of an element
    def get_txt(self, loc):
        return self.find_element(*loc).text

    # Get the URL of the current page
    def get_page_url(self):
        return self.br.current_url






web_login.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from selenium.webdriver.common.by import By
from bs_page import BasePage


class LoginPage(BasePage):

    def __init__(self, br, url):
        BasePage.__init__(self, br, url)

    # Switch the login method
    def switch_login_type(self):
        login_type = (By.XPATH, '/html/body/div[2]/div/div/div[1]/a[2]')
        self.click(login_type)

    # Enter the user name
    def send_user_name(self, username):
        name_input = (By.ID, "userName")
        self.clear(name_input)
        self.input_txt(name_input, username)

    # Enter the password
    def send_user_pwd(self, userpwd):
        pwd_input = (By.ID, "password")
        self.clear(pwd_input)
        self.input_txt(pwd_input, userpwd)

    # Click the login button
    def click_login_btn(self):
        login_btn = (By.ID, 'submit')
        self.click(login_btn)

    # Get the prompt message
    def get_emg_txt(self):
        emg_txt = (By.XPATH, '/html/body/div[2]/div/div/div[2]/div[1]/div[1]/div[2]/span')
        return self.get_txt(emg_txt)


ex_login.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time
import ddt
import unittest
from pages.web_login import LoginPage
from common import html_report, read_yaml, journal, set_browser


@ddt.ddt
class TestBiliLogin(unittest.TestCase):

    # Logging
    log = journal.bs_log()
    # Read the test cases
    login_data = read_yaml.get_yaml_data("login_data.yaml")
    # Browser initialization
    br = set_browser.base_browser()
    # Address of the site's home page
    web_url = set_browser.base_url()

    @classmethod
    def setUpClass(cls):
        pass

    @classmethod
    def tearDownClass(cls):
        # Quit the browser
        cls.br.quit()

    # Set-up work
    def setUp(self):
        time.sleep(1)

    # Clean-up work
    def tearDown(self):
        time.sleep(1)

    @ddt.data(*login_data)
    def test_login(self, user_data):
        br = self.br
        p_url = self.web_url+"/login"

        b_page = LoginPage(br, p_url)
        # Open the login page
        b_page.open_url()
        # Switch the login method
        b_page.switch_login_type()
        # Enter the user name
        b_page.send_user_name(user_data[0])
        # Enter the password
        b_page.send_user_pwd(user_data[1])
        # Click login
        b_page.click_login_btn()
        # Get the address of the current page
        page_url = b_page.get_page_url()
        # Check whether the page address has changed
        if page_url == p_url:
            # Get the prompt message
            emg = b_page.get_emg_txt()
            self.assertEqual(user_data[2], emg)
        else:
            # Get the page title
            p_title = b_page.get_title()
            self.assertEqual(user_data[2], p_title)


if __name__ == '__main__':
    suite = unittest.TestLoader().loadTestsFromTestCase(TestBiliLogin)
    # Call the method that defines the HTMLTestRunner
    runner = html_report.re_html_cn()
    # Run the test suite
    runner.run(suite)
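
The listings above need a browser, Selenium and the project's own common package. To see the three-layer structure in isolation, here is a minimal offline sketch. `FakeDriver`, its `type_into` method and the example.com URL are hypothetical stand-ins invented for this illustration; they are not Selenium APIs.

```python
import unittest


class FakeDriver:
    """Hypothetical stand-in for a browser driver so the sketch runs offline."""

    def __init__(self):
        self.current_url = None
        self.typed = {}

    def get(self, url):
        self.current_url = url

    def type_into(self, locator, text):
        # Remember what was typed into which element.
        self.typed[locator] = text


class BasePage:
    """Step 1: generic element operations shared by every page."""

    def __init__(self, driver, url):
        self.driver = driver
        self.url = url

    def open_url(self):
        self.driver.get(self.url)

    def input_txt(self, locator, text):
        self.driver.type_into(locator, text)


class LoginPage(BasePage):
    """Step 2: locators and actions of one concrete page."""

    USER_NAME = ("id", "userName")

    def send_user_name(self, username):
        self.input_txt(self.USER_NAME, username)


class TestLogin(unittest.TestCase):
    """Step 3: the test holds only business logic, no locators."""

    def test_user_name_is_typed(self):
        driver = FakeDriver()
        page = LoginPage(driver, "https://example.com/login")
        page.open_url()
        page.send_user_name("andy")
        self.assertEqual(driver.current_url, "https://example.com/login")
        self.assertEqual(driver.typed[("id", "userName")], "andy")
```

Saved as a module, the sketch can be run with python -m unittest. If a locator changes, only LoginPage needs to be edited, which is the point of the pattern.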


A full explanation of Python's sys module

The sys module has many functions. Here we introduce some of the more practical ones; I believe you will like them. Let's step into the sys module together!

Common functions in the sys module

  • sys.argv: passes parameters from outside the program into the program.

  • sys.exit([arg]): exits in the middle of the program; arg=0 means a normal exit.

  • sys.getdefaultencoding(): gets the current default encoding of the system, which is generally ASCII by default (this is the Python 2 behavior; Python 3 returns 'utf-8').

  • sys.setdefaultencoding(): sets the default encoding of the system. This method does not show up when dir(sys) is executed, and calling it directly in the interpreter fails (see "set the system default encoding"). It exists only in Python 2.

  • sys.getfilesystemencoding(): gets the encoding used by the file system; it returns 'mbcs' on Windows and 'utf-8' on Mac.

  • sys.path: gets the list of strings that specifies the module search path. You can place a module you have written in one of these paths and it will be found correctly when imported in your program.

  • sys.platform: gets the current system platform.

  • sys.stdin, sys.stdout, sys.stderr: the stdin, stdout, and stderr variables contain the stream objects corresponding to the standard I/O streams. If you need more control over the output than print gives you, these are what you need. You can also replace them, to redirect output and input to other devices or to process them in a non-standard way.

sys.argv

Function: pass parameters from outside into the program
Example: sys.py

#!/usr/bin/env python

import sys
print(sys.argv[0])  # the script name
print(sys.argv[1])  # the first argument after the script name

Run:

# python sys.py argv1
sys.py
argv1

Try it yourself to see which argument each index corresponds to.

sys.exit(n)

Function: when execution reaches the end of the main program, the interpreter exits automatically. If you need to exit earlier, call sys.exit with an optional integer argument, which is returned to the calling program as the exit status. Because sys.exit raises SystemExit, the main program can catch the call. (0 means a normal exit; any other value signals an abnormal exit.)

Example: exit.py

#!/usr/bin/env python

import sys

def exitfunc(value):
    # Print the exit status that was caught, then really exit.
    print(value)
    sys.exit(0)

print("hello")

try:
    sys.exit(1)
except SystemExit as value:
    exitfunc(value)

# Never reached: exitfunc() has already exited the program.
print("come?")

run:

# python exit.py
hello
1
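
The example works because sys.exit does not stop the interpreter directly; it raises SystemExit, and the requested status is available as the exception's code attribute. Here is a small Python 3 sketch of catching it. The helper names are made up for illustration.

```python
import sys


def exit_code_of(func):
    """Call func and return the status it passed to sys.exit, or None."""
    try:
        func()
    except SystemExit as exc:
        return exc.code
    return None


def stop_with_error():
    sys.exit(1)


def finish_normally():
    pass


print(exit_code_of(stop_with_error))
print(exit_code_of(finish_normally))
```

Catching SystemExit like this is mainly useful in tests or wrappers; ordinary code should normally let it propagate so the process really exits.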

sys.path

Function: gets the list of strings that specifies the module search path. Put a module you have written under one of these paths and it will be found correctly when imported in a program.

Example:

>>> import sys
>>> sys.path
['', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PILcompat', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/python2.7/dist-packages/ubuntu-sso-client']

sys.path.append("custom module path")

sys.modules

sys.modules is a global dictionary that is loaded into memory when Python starts. Every time a new module is imported, sys.modules records it automatically. When the module is imported a second time, Python looks it up directly in the dictionary, which speeds up the program. It supports every operation a dictionary has.

Example: modules.py

#!/usr/bin/env python

import sys

print(sys.modules.keys())

print(sys.modules.values())

print(sys.modules["os"])

Run:

python modules.py
['copy_reg', 'sre_compile', '_sre', 'encodings', 'site', '__builtin__',......
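
The caching behavior can be checked directly: once a module has been imported, a second import returns the same object that sys.modules holds. A minimal Python 3 sketch, using json only as a convenient standard-library module:

```python
import importlib
import sys

import json  # first import: the module is loaded and recorded in sys.modules

cached = sys.modules["json"]                  # the entry recorded by the import
imported_again = importlib.import_module("json")  # second import hits the cache

print(cached is json)
print(imported_again is json)
```

Both checks compare object identity, so they show that no second copy of the module is created.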

sys.stdin / sys.stdout / sys.stderr

Function: the stdin, stdout, and stderr variables contain the stream objects corresponding to the standard I/O streams. If you need more control over the output than print gives you, these are what you need. You can also replace them, to redirect output and input to other devices or to process them in a non-standard way.
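
As one example of replacing a stream, the sketch below temporarily redirects sys.stdout into an in-memory buffer. It assumes Python 3.4 or later, where contextlib.redirect_stdout is available; `capture_output` and `greet` are names made up for illustration.

```python
import io
import sys
from contextlib import redirect_stdout


def capture_output(func, *args):
    """Run func with sys.stdout redirected to a buffer and return the text."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        func(*args)
    return buffer.getvalue()


def greet(name):
    print("Hi, %s!" % name)       # print writes to the current sys.stdout
    sys.stdout.write("done\n")    # so does a direct write


captured = capture_output(greet, "andy")
print(repr(captured))
```

After the with block ends, sys.stdout is restored, so the final print goes to the real console again.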


An introduction to the sys module in Python and how packages are imported and used

1. sys.argv: passes parameters from outside the program into the program.
Write the following code in the file F:\Pywork\temp.py:

import sys
print(sys.argv[0])  # file name
print(sys.argv[1])  # external argument passed in

Command line:
F:\Pywork> python temp.py "andy"
temp.py
andy

2. sys.modules.keys() returns a list of all imported modules
3. sys.exit(n)

Function: when execution reaches the end of the main program, the interpreter exits automatically. If you need to exit earlier, call sys.exit with an optional integer argument, which is returned to the calling program as the exit status. Because sys.exit raises SystemExit, the main program can catch the call. (0 means a normal exit; any other value signals an abnormal exit.)

4. sys.path

Function: gets the list of strings that specifies the module search path. Put a module you have written under one of these paths and it will be found correctly when imported in a program.

sys.path.append("custom module path")

5. sys.modules returns the dictionary of imported modules; the key is the module name and the value is the module.
6. sys.stdin, sys.stdout, sys.stderr: the stdin, stdout, and stderr variables contain the stream objects corresponding to the standard I/O streams. If you need more control over the output than print gives you, these are what you need. You can also replace them, to redirect output and input to other devices or to process them in a non-standard way.

import sys
sys.stdout.write('HelloWorld!')
print('Please enter your name:')
name = sys.stdin.readline()[:-1]  # drop the trailing newline
print('Hi, %s!' % name)

Importing and using packages in Python
There are three ways to import a package. Take importing the sys package as an example:

import sys
print("-----------python modules-----------")
print("Command-line arguments:")
for x in sys.argv:  # sys.argv is the list of command-line arguments passed to the Python script; argv[0] is the script name
    # argv[1] is the first argument passed in on the command line, argv[2] the second
    print(x)
print("Python's path is:", sys.path)

This is the normal import method: after importing the sys package, use sys.<name> to access what the package provides.

from sys import argv,path
print("-----------python modules-----------")
print("Command-line arguments:")
for x in argv:
    print(x)
print("Python's path is:", path)

This imports the specified names from sys, which can then be used directly, without the sys. prefix.

import sys as s
print("-----------python modules-----------")
print("Command-line arguments:")
for x in s.argv:
    print(x)
print("Python's path is:", s.path)

This uses as to give the sys package an alias for later use.

Python time module: timestamps, time string formatting and conversion (13-digit timestamps)

Python's built-in modules for processing time and timestamps include time and datetime.

Several concepts about timestamps

  • Timestamp: the offset in seconds from 00:00:00 on January 1, 1970 (UTC).
  • Time tuple (struct_time): contains 9 elements, for example time.struct_time(tm_year=2017, tm_mon=10, tm_mday=1, tm_hour=14, tm_min=21, tm_sec=57, tm_wday=6, tm_yday=274, tm_isdst=0)
  • Time format string: the time in string form.

Important functions in the time module related to timestamps and time

  • time.time() generates the current timestamp as a floating point number whose integer part has 10 digits.
  • time.strftime() generates a formatted time string from a time tuple.
  • time.strptime() generates a time tuple from a formatted time string. time.strptime() and time.strftime() are inverse operations.
  • time.localtime() generates the time tuple for the current time zone from a timestamp.
  • time.mktime() generates a timestamp from a time tuple.

Example
A simple example of converting between timestamps and formatted strings:

import time

# Generate a timestamp for the current time. The only parameter is the number of digits (10 by default);
# pass another value to get a timestamp with that many digits, e.g. the commonly used 13-digit timestamp.
def now_to_timestamp(digits = 10):
    time_stamp = time.time()
    digits = 10 ** (digits -10)
    time_stamp = int(round(time_stamp*digits))
    return time_stamp

# Normalize a timestamp to a 10-digit timestamp
def timestamp_to_timestamp10(time_stamp):
    time_stamp = int (time_stamp* (10 ** (10-len(str(time_stamp)))))
    return time_stamp

# Convert the current time to a time string, in the format 2017-10-01 13:37:04 by default
def now_to_date(format_string="%Y-%m-%d %H:%M:%S"):
    time_stamp = int(time.time())
    time_array = time.localtime(time_stamp)
    str_date = time.strftime(format_string, time_array)
    return str_date

# Convert a 10-digit timestamp to a time string, in the format 2017-10-01 13:37:04 by default
def timestamp_to_date(time_stamp, format_string="%Y-%m-%d %H:%M:%S"):
    time_array = time.localtime(time_stamp)
    str_date = time.strftime(format_string, time_array)
    return str_date

# Convert a time string to a 10-digit timestamp; the string is in the format 2017-10-01 13:37:04 by default
def date_to_timestamp(date, format_string="%Y-%m-%d %H:%M:%S"):
    time_array = time.strptime(date, format_string)
    time_stamp = int(time.mktime(time_array))
    return time_stamp

# Convert between time strings in different formats
def date_style_transfomation(date, format_string1="%Y-%m-%d %H:%M:%S",format_string2="%Y-%m-%d %H-%M-%S"):
    time_array  = time.strptime(date, format_string1)
    str_date = time.strftime(format_string2, time_array)
    return str_date

Test

print(now_to_timestamp(13))
print(now_to_date())
print(timestamp_to_date(1506816572))
print(date_to_timestamp('2017-10-01 08:09:32'))
print(timestamp_to_timestamp10(1506816572546))
print(date_style_transfomation('2017-10-01 08:09:32'))

The output is:

1506836224000
2017-10-01 13:37:04
2017-10-01 08:09:32
1506816572
1506816572
2017-10-01 08-09-32
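
The functions above use local time, so their output depends on the machine's time zone (the sample output corresponds to UTC+8). As a complement, here is a sketch of the same round trip for 13-digit millisecond timestamps pinned to UTC, which gives the same result everywhere. The function names are made up for illustration; time.gmtime and calendar.timegm are the UTC counterparts of time.localtime and time.mktime.

```python
import calendar
import time

DEFAULT_FORMAT = "%Y-%m-%d %H:%M:%S"


def ms_timestamp_to_utc_string(ms_timestamp, format_string=DEFAULT_FORMAT):
    """Convert a 13-digit millisecond timestamp to a UTC time string."""
    seconds = ms_timestamp // 1000          # drop the millisecond part
    return time.strftime(format_string, time.gmtime(seconds))


def utc_string_to_ms_timestamp(date, format_string=DEFAULT_FORMAT):
    """Convert a UTC time string to a 13-digit millisecond timestamp."""
    seconds = calendar.timegm(time.strptime(date, format_string))
    return seconds * 1000


utc_string = ms_timestamp_to_utc_string(1506816572546)
print(utc_string)
print(utc_string_to_ms_timestamp(utc_string))
```

The milliseconds are lost on the way to the string, so the round trip returns the timestamp truncated to whole seconds.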

Converting Python time tuples to timestamps and strings

1. Time tuple -> timestamp

import time
timeArray = time.localtime(time.time())
timeStamp = time.mktime(timeArray)  # timestamp: 1600858387.0

2. Time tuple -> string

timeDate = time.strftime("%Y-%m-%d %H:%M:%S", timeArray)  # date and time string: '2020-09-23 18:53:07'
timeDate = time.strftime("%Y-%m-%d", timeArray)  # date string: '2020-09-23'
Python: shuffling data with pandas

Using pandas itself:

shuffle_data = df.sample(frac=1).reset_index(drop=True)

Using sklearn:

import numpy as np
import pandas as pd
from sklearn.utils import shuffle as reset


def train_test_split(data, test_size=0.3, shuffle=True, random_state=None):
    '''Split DataFrame into random train and test subsets

    Parameters
    ----------
    data : pandas DataFrame
        The dataset to split.

    test_size : float
        Should be between 0.0 and 1.0 and represent the
        proportion of the dataset to include in the test split.

    random_state : int, RandomState instance or None, optional (default=None)
        If int, random_state is the seed used by the random number generator;
        If RandomState instance, random_state is the random number generator;
        If None, the random number generator is the RandomState instance used
        by `np.random`.

    shuffle : boolean, optional (default=True)
        Whether or not to shuffle the data before splitting.
    '''

    if shuffle:
        data = reset(data, random_state=random_state)

    # The first test_size share of the rows becomes the test set, the rest the train set.
    train = data[int(len(data)*test_size):].reset_index(drop=True)
    test  = data[:int(len(data)*test_size)].reset_index(drop=True)

    return train, test
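
The same shuffle-then-slice logic can be tried without pandas or sklearn. This is a standard-library sketch on a plain list, not the function above; `shuffle_split` and its `seed` parameter are names made up for illustration.

```python
import random


def shuffle_split(rows, test_size=0.3, seed=None):
    """Shuffle a copy of rows, then slice it into (train, test) lists."""
    rows = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(rows)    # seeded generator for repeatable shuffles
    cut = int(len(rows) * test_size)     # number of rows that go to the test set
    return rows[cut:], rows[:cut]


train, test = shuffle_split(range(10), test_size=0.3, seed=0)
print(len(train), len(test))
```

Every row ends up in exactly one of the two lists, and passing the same seed reproduces the same split.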

 

Python: How to reshape data in a Pandas DataFrame

Contents

Pivoting a Pandas DataFrame

Grouping data in a Pandas DataFrame

Summary

After working with our dataset, we'll take a quick look at visualizations that can easily be created from it using a popular Python library, and then walk through an example of visualization.

  • Download CSV and database file - 127.8 KB
  • Download the source code - 122.4 KB

This article is part of a series on data cleaning with Python and Pandas. The series aims to get developers up and running quickly with data science tools and techniques.

If you would like to see the other articles in this series, they are:

  • Part 1 - Introduction to Pandas
  • Part 2 - Loading CSV and SQL data into Pandas
  • Part 3 - Correcting missing data
  • Part 4 - Merging multiple data sets in Pandas
  • Part 5 - Cleaning up data in Pandas
  • Part 6 - Reshaping data in a Pandas DataFrame
  • Part 7 - Using Seaborn and Pandas to visualize data

Even after a data set has been cleaned up, it often needs to be reshaped in Pandas to make the most of the data. Reshaping is the term used for manipulating the table structure to form different data sets, such as turning a "wide" data table into a "long" one.

You will be familiar with this if you have used pivot tables, or the pivot and crosstab support built into many relational databases.

For example, the table above (from the Pandas documentation) has been adjusted by pivoting, stacking, or unstacking the table.

  • The stack method takes a table with multiple indexes and groups them.
  • The unstack method takes a table with multiple unique columns and ungroups them.

In this phase, we will look at a variety of methods for reshaping the data. We'll see how to use pivoting and stacking of DataFrames to get different views of the data.

Please note that we have created a complete set of source data files for this series of modules, which you can download from the head of this article and install.

Pivoting a Pandas DataFrame

We can use the pivot function to create a new DataFrame from an existing one. So far, our table has been indexed by purchase ID, but let's convert the previously created combinedData table into a more interesting table.

First, let's try the pivot method by starting a new code block and adding:

productsByState = combinedData.pivot(index='product_id', columns='company', values='paid')

The result looks something like this:

Running this command results in a duplicate index error, because pivot only applies to a DataFrame in which each index and column combination is unique.

But there's another way to get to a solution. pivot_table works similarly to pivot, but it aggregates duplicate values instead of raising an error.

Let's use this method with its default settings:

productsByState = combinedData.pivot_table(index=['product_id', 'product'], columns='state', values='paid')

You can view the results here:

This produces a DataFrame containing the list of products, with the average amount paid in each state in its own column. This isn't really that useful, so let's change the aggregation method:

reshapedData = combinedData.pivot_table(index=['product_id', 'product'], columns='state', values='paid', aggfunc=np.sum)
reshapedData = reshapedData.fillna(0)
print(reshapedData.head(10))

Now this generates a product table that contains the total sales of each product in each state. The second line also replaces the NaN values with 0, on the assumption that the product is not sold in that state.
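
To make the aggregation step concrete without the article's data files, here is a standard-library sketch of what a sum-aggregated pivot computes. The rows and the helper `pivot_sum` are made up for illustration; this is not how pandas implements pivot_table.

```python
from collections import defaultdict

# Made-up purchase rows; note the two widget purchases in CA.
rows = [
    {"product": "widget", "state": "CA", "paid": 10.0},
    {"product": "widget", "state": "CA", "paid": 5.0},
    {"product": "widget", "state": "NY", "paid": 7.0},
    {"product": "gadget", "state": "NY", "paid": 3.0},
]


def pivot_sum(rows, index, columns, values):
    """Build {index value: {column value: summed values}} from flat rows."""
    table = defaultdict(lambda: defaultdict(float))
    for row in rows:
        # Duplicate (index, column) pairs are added together instead of failing.
        table[row[index]][row[columns]] += row[values]
    return {key: dict(cells) for key, cells in table.items()}


result = pivot_sum(rows, index="product", columns="state", values="paid")
print(result)
```

The duplicate (widget, CA) pair is what a plain pivot cannot represent in a single cell; summing it is the choice that aggfunc=np.sum makes. The missing (gadget, CA) cell is the kind of gap that fillna(0) fills in the article's code.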

Grouping data in a Pandas DataFrame

Another reshaping activity that we'll look at is grouping data elements together. Let's go back to the original large DataFrame and create a new DataFrame.

  • The groupby method takes the large data set and groups it according to column values.

Start a new code block and add:

volumesData = combinedData.groupby(by='customer_id')
print(volumesData.head(10))

The results are as follows:

This doesn't seem to really do anything, because our DataFrame was indexed on purchase_id.

Let's add an aggregation function to summarize the data so that our grouping works as expected:

volumesData = combinedData.groupby(by='customer_id').sum()
print(volumesData.head(10))

Again, this is the result:

This groups the data set the way we expected, but we seem to be missing some columns, and the purchase_id column doesn't make any sense here, so let's extend the groupby method and trim the purchase_id column:

volumesData = combinedData.groupby(by=['customer_id','first_name','last_name','product_id','product']).sum()
volumesData.drop(columns='purchase_id', inplace=True)
print(volumesData.head(10))

This is our new result:

The end result looks good and gives us a good idea of what each customer is buying, the amounts involved, and how much they are paying.

Finally, we will make one more change to the groupby data set. Add the following to create a DataFrame of totals for each state:

totalsData = combinedData.groupby(by='state').sum().reset_index()
totalsData.drop(columns=['purchase_id','customer_id','product_id'], inplace=True)

The key change here is that we added a reset_index after the sum. This ensures that the generated DataFrame has a usable index for our visualization work.

Summary

We have taken a complete, clean data set and reshaped it in several different ways to give us more insight into the data.

Next, we'll look at visualizations and see how they can be an important tool for presenting our data and ensuring that the results are clean.

Python + Pandas: how ratings of musical instruments changed over the years (notes)

Judging from the title, this lesson explores how something changes over time, so the core content is exploring trends in time series data. Details are as follows:

1. Fetch data

Data source: http://jmcauley.ucsd.edu/data/amazon/links.html

The page provides data resources from the Amazon e-commerce website: product reviews from May 1996 to July 2014, roughly 18 years. The "ratings only" data has the columns "user, item, rating, timestamp".
We download the "Musical Instruments" ratings file. (The download is very slow and takes almost a day; the file already downloaded by the teacher can be used instead: time analysis/https://www.njcie.com/python/2)
2. Processing data

1. Read data

[script]

import pandas as pd

# Column names for the header-less ratings file
rnames = ['uid', 'pid', 'rating', 'timestamp']
ratings = pd.read_csv('D:\\ratings_Musical_Instruments.csv', header=None, names=rnames)

2. Processing the timestamp

[script]

from datetime import datetime

ratings['date'] = ratings['timestamp'].apply(datetime.fromtimestamp)
ratings['year'] = ratings['date'].dt.year
ratings['month'] = ratings['date'].dt.month
# Make the date column the index, then convert the index to monthly periods
ratings = ratings.set_index('date').to_period(freq='M')
print(ratings)

[result]
date     uid             pid         rating  timestamp   year  month
2014-03  A1YS9MDZP93857  00028320    3.0     1394496000  2014  3
2013-06  A3TS466QBAWB9D  0014072149  5.0     1370476800  2013  6

[description]

  1. A timestamp is an integer: the number of seconds from 1970-01-01 00:00:00 to the time being recorded.
  2. datetime.fromtimestamp() converts timestamp data to datetime data.
  3. to_period() converts datetime data to period data. Note that this requires setting the date column as the index and assigning the converted DataFrame back to a variable, namely: ratings = ratings.set_index('date').to_period(freq='M')
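
The timestamp-to-month step can be checked on the two sample rows with the standard library alone. The sketch below is not the pandas code above: the helper names are made up, and it interprets timestamps in UTC so the result does not depend on the local time zone, whereas datetime.fromtimestamp without a tz argument uses local time.

```python
from collections import defaultdict
from datetime import datetime, timezone

# The two sample rows from the result above: (uid, pid, rating, timestamp)
rows = [
    ("A1YS9MDZP93857", "00028320", 3.0, 1394496000),
    ("A3TS466QBAWB9D", "0014072149", 5.0, 1370476800),
]


def month_of(timestamp):
    """Return the 'YYYY-MM' month of a Unix timestamp, interpreted in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc).strftime("%Y-%m")


def mean_rating_by_month(rows):
    """Group ratings by month and average them."""
    buckets = defaultdict(list)
    for _uid, _pid, rating, timestamp in rows:
        buckets[month_of(timestamp)].append(rating)
    return {month: sum(vals) / len(vals) for month, vals in buckets.items()}


monthly = mean_rating_by_month(rows)
print(monthly)
```

The months match the date column in the result table, and the grouping is the same idea as the groupby('date').mean() call used in the analysis that follows.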
3. Analysis data

    1. Mean score of each month

    [script 1]

    import matplotlib.pyplot as plt

    pingFen = ratings['rating'].groupby('date').mean()
    plt.plot(pingFen)
    plt.show()
    

    [script 2]

    pingFen = ratings['rating'].groupby('date').mean()
    plt.plot(pingFen.to_timestamp())
    plt.show()
    

    Script 1 fails with: TypeError: float() argument must be a string or a number, not 'Period'. The index of pingFen holds Period values, which plt.plot() could not convert to numbers in my environment. Script 2 avoids the error by converting the index with to_timestamp() first.

    The result of script 2 is as follows:

    [description]
    1) This is not the score of any particular item, only the overall score of the "Musical Instruments" category.
    2) The score of Musical Instruments shows a downward trend from 1998 to 2004 and tends to be stable after 2004.

    2. The number of participants per month

    The result above ignores one factor: the number of participants in the scoring. In the extreme case, the fewer the participants, the less stable the mean score and the less representative it is. Let's take a look at the number of ratings:

    [script]

    pingFenR = ratings['rating'].groupby('date').count()
    plt.plot(pingFenR.to_timestamp())
    plt.show()
    

    [result]

    From the figure we can see that the number of participants was very small before 2010 and increased rapidly after 2010. This is why, in analysis 1, the mean score after 2010 is more stable; the data from this period is also more meaningful.
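
Why a mean over few ratings is unstable can be seen in a tiny sketch with made-up numbers. This is plain Python with hypothetical data, not the Amazon file: with a single rating per month the monthly mean jumps between extremes, while a month with several ratings settles near the middle.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (month, rating) pairs; the real data has far more rows.
rows = [
    ("1998-05", 5.0),
    ("1998-06", 1.0),
    ("2013-06", 4.0), ("2013-06", 5.0), ("2013-06", 4.0), ("2013-06", 5.0),
]

# Collect the ratings of each month.
by_month = defaultdict(list)
for month, rating in rows:
    by_month[month].append(rating)

# One (count, mean) pair per month.
summary = {m: (len(r), mean(r)) for m, r in sorted(by_month.items())}
for month, (count, avg) in summary.items():
    print(month, count, avg)
```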

    3. Observe the score together with the number of ratings in each period

    How can three dimensions of data (time, number of participants and mean score) be presented in the same graph?
    With a scatter plot whose point size varies!

    [script]

    pingFen = ratings[['rating']].groupby('date').agg(['count', 'mean'])
    # x: month, y: mean score, s: point size taken from the number of ratings
    plt.scatter(pingFen.index.to_timestamp(), pingFen['rating']['mean'], s=pingFen['rating']['count'])
    plt.show()
    

    [result]

    [description]
    Compare plt.scatter() with plt.plot(): the period index is converted to timestamps in two different ways.
    plt.plot(): pingFen.to_timestamp()
    plt.scatter(): pingFen.index.to_timestamp()
    == I don't understand this point; anyone who knows is welcome to leave a message ~ ==
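
A likely explanation for the difference: plt.plot() is handed a single Series, and matplotlib takes the x values from its index, so the whole Series can be converted with to_timestamp(). plt.scatter() takes x and y as separate arguments, so only the index is converted and passed as x. The sketch below uses a small hypothetical Series (the names scores, as_series and x_only are made up) and pandas only, to show what each call returns.

```python
import pandas as pd

# Hypothetical monthly means indexed by period, like pingFen in the text.
idx = pd.period_range("2014-01", periods=3, freq="M")
scores = pd.Series([4.2, 4.4, 4.3], index=idx)

# Series.to_timestamp() returns a Series whose index is a DatetimeIndex.
# plt.plot(as_series) would read x from the index and y from the values.
as_series = scores.to_timestamp()

# PeriodIndex.to_timestamp() returns only the DatetimeIndex.
# plt.scatter(x_only, scores.values) would need x on its own like this.
x_only = scores.index.to_timestamp()

print(type(as_series).__name__, type(x_only).__name__)
print(as_series.index.equals(x_only))
```

Both routes end up with the same timestamps on the x axis; they only differ in whether the values travel along with the index.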

    4. Adjust the image display effect

    1) resize (enlarge or shrink)

    Sometimes the points of the scatter plot differ too little or too much in size. This can be adjusted by multiplying or dividing the point-size parameter by a constant.
    For example, in the graph above some points are too big. They can be scaled down as follows:

    plt.scatter(pingFen.index.to_timestamp(), pingFen['rating']['mean'], pingFen['rating']['count']/100)
    

    The result is as follows:

    2) resize (normalized)

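    Map all data to the range 0 to 1, using (n-min)/(max-min).
    I first tried the following script and got an error:
    (pingFen['count'] - pingFen['count'].min()) / (pingFen['count'].max() - pingFen['count'].min())
    KeyError: 'count'
    The reason is that the table pingFen has no top-level column called 'count'. After agg(['count', 'mean']) the columns have two levels, ('rating', 'count') and ('rating', 'mean'), so the counts are only reachable as pingFen['rating']['count'].
    The solution is to give the aggregated columns their own names: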

    pingFen = ratings.groupby('date').agg(cnt=('rating', 'count'), avg=('rating', 'mean'))
    # min-max normalization: (n - min) / (max - min)
    pingFen['sl'] = (pingFen['cnt'] - pingFen['cnt'].min()) / (pingFen['cnt'].max() - pingFen['cnt'].min())
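
The normalization formula (n-min)/(max-min) can be checked on a handful of numbers. The following is a minimal sketch in plain Python with hypothetical counts; min_max is a made-up helper name, not a pandas function.

```python
def min_max(values):
    """Scale numbers linearly so the smallest becomes 0.0 and the largest 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are equal: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


counts = [10, 20, 60, 110]  # hypothetical monthly rating counts
scaled = min_max(counts)
print(scaled)  # [0.0, 0.1, 0.5, 1.0]
```

Dividing by max alone instead of (max-min) would leave the largest value below 1 whenever the minimum is greater than 0.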
    

    The agg() function deserves a note of its own here; it took me a lot of work to understand it.
    See: 【Notes on agg() in pandas】

    3) color

    For details of the parameters, see https://www.jb51.net/article/127806.htm
    Here I use the following script:

    # ratings is already indexed by monthly periods (see "Process the timestamp" above)
    pingFen = ratings.groupby('date').agg(cnt=('rating', 'count'), avg=('rating', 'mean'))
    pingFen['sl'] = (pingFen['cnt'] - pingFen['cnt'].min()) / (pingFen['cnt'].max() - pingFen['cnt'].min())
    plt.scatter(pingFen.index.to_timestamp(), pingFen['avg'], s=pingFen['sl']*1500, c=pingFen['cnt']/pingFen['cnt'].mean(), alpha=0.3)
    plt.show()
    

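    The arguments passed to scatter() are interpreted as follows:
    x: pingFen.index.to_timestamp(), the month
    y: pingFen['avg'], the mean score
    s: pingFen['sl']*1500, the point size (the normalized number of ratings, scaled up)
    c: pingFen['cnt']/pingFen['cnt'].mean(), the point color (the number of ratings relative to its mean)
    alpha: 0.3, the transparency of the points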

    The result after the adjustment is shown below. This is the picture I was hoping for, although it is not the best possible.

    [description]
     The x axis of the figure is time.
     The y axis is the mean score. The graph shows that after 2006 the mean score is concentrated between 4.1 and 4.5.
     The point size is the number of ratings. The graph shows that after 2012 the number of raters is far larger than before 2004, so the scores from that period are more valuable.
     What the color represents is configurable. Since this case does not introduce any further data dimensions, I again use the number of ratings for the color: purple is small, yellow is large, and green is in between.

    [end]

Designing a multi-level menu directory with Python + MySQL

1. Database design

For a multi-level directory, the simplest and most direct design is to put the directory name, content and other related information into a single table. The key to the design is how to distinguish top-level directories from subdirectories.

Idea:

You can think of the directories as a tree structure with multiple root nodes: subdirectories are children, parent directories are parents, and each subdirectory has exactly one parent. In the design, the parent_ID of a top-level directory is therefore set to NULL, so all top-level directories can be found directly by searching for NULL. How are child nodes added? Each subdirectory has a parent_ID whose value is the ID of another record, so parent_ID and ID together form a tree. First get the root nodes, then look for the children of each node; every child found is hung beneath its parent. For example, the record with id 5 has parent_id 1, so it is placed under the record with id 1; the record with id 20 has parent_id 5, so it is hung under the record with id 5. The JSON structure is as follows:

           
{
    "id": 1,
    "text": "Top-level directory",
    "parent_menu_tree_item": null,
    "menu_type": 0,
    "data": [
        {
            "id": 5,
            "text": "Second-level directory",
            "parent_menu_tree_item": 1,
            "menu_type": 0,
            "data": {
                "id": 20,
                "text": "Third-level directory",
                "parent_menu_tree_item": 5,
                "menu_type": 0
            }
        }
    ]
}
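
The single-table design described above can be tried out with a few lines of Python. This is only a sketch: SQLite (from the standard library) stands in for MySQL, and the table name menu_item and its columns are hypothetical, not the author's schema.

```python
import sqlite3

# SQLite stands in for MySQL here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE menu_item (
        id        INTEGER PRIMARY KEY,
        text      TEXT NOT NULL,
        parent_id INTEGER REFERENCES menu_item(id)  -- NULL for top-level items
    )
    """
)
conn.executemany(
    "INSERT INTO menu_item (id, text, parent_id) VALUES (?, ?, ?)",
    [
        (1, "Top-level directory", None),
        (5, "Second-level directory", 1),
        (20, "Third-level directory", 5),
    ],
)

# Top-level directories are the rows whose parent is NULL.
roots = conn.execute(
    "SELECT id, text FROM menu_item WHERE parent_id IS NULL"
).fetchall()

# The children of a given record are found through parent_id.
children_of_5 = conn.execute(
    "SELECT id, text FROM menu_item WHERE parent_id = ?", (5,)
).fetchall()

print(roots)
print(children_of_5)
```

Note that NULL has to be matched with IS NULL rather than = NULL; the same holds in MySQL.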

2. Programming

Idea: recursion. Select all records whose parent_ID is NULL, that is, the root nodes, and then look up the children of each one. If there are any, recurse into them, each time hanging the children on i["data"] of the parent node, and then move on to the next root node.

The core code is as follows:

1) Get all root nodes:

items = HandsOnCaseMenuTreeItem.objects.using("admin") \
    .filter(parent_menu_tree_item=None, resource_id=resource_id) \
    .values("id", "text", "parent_menu_tree_item", "menu_type")
datas = []
for i in items:
    # attach the whole subtree to each root, then collect the root
    datas.append(find_child(i))
r.data = datas

2) Recursively attach the child nodes:

def find_child(i):
    """Attach the descendants of menu item i (a dict) under i["data"]."""
    p = list(
        HandsOnCaseMenuTreeItem.objects
        .filter(parent_menu_tree_item=i["id"])
        .values("id", "text", "parent_menu_tree_item", "menu_type")
    )
    for j in p:
        find_child(j)
    if len(p) > 1:
        i["data"] = p
    elif len(p) == 1:
        # a single child is stored as one object, as in the JSON above
        i["data"] = p[0]
    return i
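
The recursive approach issues one query per node. As an alternative sketch (not the author's code), the tree can also be assembled in memory from a flat list of rows fetched once. build_tree is a hypothetical helper; unlike the JSON above, it always stores children as a list, and it assumes that every parent id is present in the rows.

```python
def build_tree(rows):
    """Nest flat menu rows under their parents and return the root nodes."""
    # Copy each row so the input list is left untouched.
    nodes = {row["id"]: dict(row) for row in rows}
    roots = []
    for node in nodes.values():
        parent_id = node["parent_menu_tree_item"]
        if parent_id is None:
            roots.append(node)
        else:
            # Children are always kept in a list under "data".
            nodes[parent_id].setdefault("data", []).append(node)
    return roots


# Hypothetical rows, as one query for all items of a resource might return them.
rows = [
    {"id": 1, "text": "Top-level directory", "parent_menu_tree_item": None, "menu_type": 0},
    {"id": 5, "text": "Second-level directory", "parent_menu_tree_item": 1, "menu_type": 0},
    {"id": 20, "text": "Third-level directory", "parent_menu_tree_item": 5, "menu_type": 0},
]
tree = build_tree(rows)
print(tree)
```

Because every node is looked up by id in a dictionary, the rows do not have to arrive sorted by level.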