Python switch / case statement implementation method

Unlike Java, C/C++ and other languages, Python (before version 3.10, which added the match statement) does not provide a switch/case statement, which struck me as strange. We can implement switch/case behavior in the following ways.

Using if/elif/else to implement switch/case

You can replace a switch/case statement with an if/elif/.../else chain; this is the most obvious approach. However, as the number of branches grows and they are modified more often, this approach becomes hard to debug and maintain.
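A minimal sketch of the if/elif/else approach, using the same mapping as the num_to_string dictionary example below, so the two can be compared side by side:

```python
def num_to_string(num):
    # Branches are tested in order until one matches.
    if num == 0:
        return "zero"
    elif num == 1:
        return "one"
    elif num == 2:
        return "two"
    elif num == 3:
        return "three"
    else:
        # The final else plays the role of the "default" case.
        return None


if __name__ == "__main__":
    print(num_to_string(2))  # two
    print(num_to_string(5))  # None
```

Every new case means another elif branch, which is why this form gets harder to maintain as the number of cases grows.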

Implementing switch/case with a dictionary

Switch/case can be implemented with a dictionary, which is easy to maintain and reduces the amount of code. Here is a switch/case simulated with a dictionary:


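def num_to_string(num):
    # Each key is a "case"; the dictionary maps it to its result.
    numbers = {
        0: "zero",
        1: "one",
        2: "two",
        3: "three",
    }

    # get() returns the default (None) when no case matches.
    return numbers.get(num, None)

if __name__ == "__main__":
    print(num_to_string(2))
    print(num_to_string(5))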

The results are as follows:

two
None

A Python dictionary can also hold functions or lambda expressions as its values. The code is as follows:

def success(msg):
    print(msg)

def debug(msg):
    print(msg)

def error(msg):
    print(msg)

def warning(msg):
    print(msg)

def other(msg):
    print(msg)

def notify_result(num, msg):
    # Each "case" maps to the function that handles it.
    numbers = {
        0: success,
        1: debug,
        2: warning,
        3: error,
    }

    # other() plays the role of the default branch.
    method = numbers.get(num, other)
    method(msg)

if __name__ == "__main__":
    notify_result(0, "success")
    notify_result(1, "debug")
    notify_result(2, "warning")
    notify_result(3, "error")
    notify_result(4, "other")

The results are as follows:

success
debug
warning
error
other

The examples above show that switch/case can be fully implemented with a Python dictionary, and that this approach is flexible. In particular, it is easy to add or remove a case in the dictionary at runtime.
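A minimal sketch of that runtime flexibility, which also uses lambdas as values. It is a variation on notify_result above: the dictionary is passed in so it can be changed, and the handlers return strings instead of printing. The handler names and messages are illustrative.

```python
def notify_result(handlers, num, msg, default=print):
    # Look up the handler for this "case"; fall back to the default.
    handler = handlers.get(num, default)
    return handler(msg)


# Lambdas work just as well as named functions as dictionary values.
handlers = {
    0: lambda msg: "success: " + msg,
    1: lambda msg: "debug: " + msg,
}

# Add a new case at runtime...
handlers[2] = lambda msg: "warning: " + msg
# ...and remove an existing one.
del handlers[1]

if __name__ == "__main__":
    print(notify_result(handlers, 2, "disk almost full"))  # warning: disk almost full
    print(notify_result(handlers, 1, "x", default=lambda m: "other: " + m))  # other: x
```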

Implementing switch/case with a dispatch method in a class

If you don't know until runtime which method of a class should be called, you can use a dispatch method to choose it. The code is as follows:

class switch_case(object):

    def case_to_function(self, case):
        # Build the method name from the case value, e.g. 1 -> "case_fun_1".
        fun_name = "case_fun_" + str(case)
        # Fall back to case_fun_other (the default) when no such method exists.
        method = getattr(self, fun_name, self.case_fun_other)
        return method

    def case_fun_1(self, msg):
        print(msg)

    def case_fun_2(self, msg):
        print(msg)

    def case_fun_other(self, msg):
        print(msg)


if __name__ == "__main__":
    cls = switch_case()
    cls.case_to_function(1)("case_fun_1")
    cls.case_to_function(2)("case_fun_2")
    cls.case_to_function(3)("case_fun_other")

The results are as follows:

case_fun_1
case_fun_2
case_fun_other

Summary

Personally, I find the dictionary approach the most flexible, although it can be harder to understand than a plain if/elif chain.

The difference between LSTM and GRU

First, some conclusions

GRU and LSTM perform about equally well on many tasks. GRU has fewer parameters and tends to converge more easily, while LSTM tends to perform better when the dataset is large. Structurally, GRU has only two gates (update and reset), whereas LSTM has three (forget, input and output). GRU passes its hidden state directly to the next unit, while LSTM wraps the hidden state together with a separate memory cell.

1. Basic structure

1.1 GRU

GRU is designed to better capture long-term dependencies. Given the input $x^{(t)}$ and the previous hidden state $h^{(t-1)}$, GRU computes the output $h^{(t)}$ as follows:

Reset gate: $r^{(t)}$ decides how important $h^{(t-1)}$ is to the new memory $\hat{h}^{(t)}$. If $r^{(t)}$ is close to 0, $h^{(t-1)}$ is not passed on to the new memory $\hat{h}^{(t)}$.

New memory: $\hat{h}^{(t)}$ summarizes the new input $x^{(t)}$ and the previous hidden state $h^{(t-1)}$. The resulting vector contains both the earlier context and the new input $x^{(t)}$.

Update gate: $z^{(t)}$ decides how much of $h^{(t-1)}$ is carried forward. If $z^{(t)}$ is close to 1, $h^{(t-1)}$ is copied almost directly to $h^{(t)}$; conversely, if $z^{(t)}$ is close to 0, the new memory $\hat{h}^{(t)}$ is passed almost directly to $h^{(t)}$.

Hidden state: $h^{(t)}$ is computed from $h^{(t-1)}$ and $\hat{h}^{(t)}$, and the weighting between the two is controlled by the update gate $z^{(t)}$.
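To make these four steps concrete, here is a minimal NumPy sketch of a single GRU step. It assumes a common textbook formulation with biases omitted, in which $h^{(t)} = (1 - z^{(t)}) \circ \hat{h}^{(t)} + z^{(t)} \circ h^{(t-1)}$. The weight names (W_r, U_r, and so on) and the random initialisation are illustrative choices, not part of the original text.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gru_step(x_t, h_prev, p):
    """One GRU step. p maps names to input (W_*) and recurrent (U_*) matrices; no biases."""
    r = sigmoid(p["W_r"] @ x_t + p["U_r"] @ h_prev)  # reset gate
    z = sigmoid(p["W_z"] @ x_t + p["U_z"] @ h_prev)  # update gate
    # New memory: the reset gate scales how much of h_prev is used.
    h_hat = np.tanh(r * (p["U_h"] @ h_prev) + p["W_h"] @ x_t)
    # Hidden state: z close to 1 keeps h_prev, z close to 0 takes h_hat.
    return (1.0 - z) * h_hat + z * h_prev


def init_params(n_in, n_hidden, rng):
    shapes = {"W": (n_hidden, n_in), "U": (n_hidden, n_hidden)}
    return {f"{k}_{g}": rng.standard_normal(shapes[k]) * 0.1
            for k in ("W", "U") for g in ("r", "z", "h")}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = init_params(n_in=4, n_hidden=3, rng=rng)
    h = np.zeros(3)
    for x in rng.standard_normal((5, 4)):  # a sequence of 5 input vectors
        h = gru_step(x, h, p)
    print(h.shape)  # (3,)
```

A quick sanity check: with all weights set to zero, both gates output 0.5 and the new memory is tanh(0) = 0, so one step simply halves h_prev.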

1.2 LSTM

LSTM is also designed to better capture long-term dependencies, but its structure is different and more complex. The computation proceeds as follows:

New memory cell: this step is similar to the new memory in GRU. The output vector $\hat{c}^{(t)}$ summarizes the new input $x^{(t)}$ and the previous hidden state $h^{(t-1)}$.

Input gate: $i^{(t)}$ decides whether the information in the input $x^{(t)}$ is worth keeping.

Forget gate: $f^{(t)}$ decides how important the past memory cell $c^{(t-1)}$ is to $c^{(t)}$.

Final memory cell: $c^{(t)}$ is computed from $c^{(t-1)}$ and $\hat{c}^{(t)}$, weighted by the forget gate and the input gate respectively.

Output gate: this gate has no counterpart in GRU. It decides which parts of $c^{(t)}$ are passed to the hidden state $h^{(t)}$.
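A matching NumPy sketch of one LSTM step. It assumes a common textbook formulation with biases omitted, and the weight names and initialisation are illustrative. Each line corresponds to one of the steps above.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step. p maps names to input (W_*) and recurrent (U_*) matrices; no biases."""
    i = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev)  # input gate
    f = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev)  # forget gate
    o = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev)  # output gate
    c_hat = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev)  # new memory cell
    # Final memory: forget gate weights the old cell, input gate the new one.
    c = f * c_prev + i * c_hat
    # Output gate decides which parts of the cell reach the hidden state.
    h = o * np.tanh(c)
    return h, c


def init_params(n_in, n_hidden, rng):
    shapes = {"W": (n_hidden, n_in), "U": (n_hidden, n_hidden)}
    return {f"{k}_{g}": rng.standard_normal(shapes[k]) * 0.1
            for k in ("W", "U") for g in ("i", "f", "o", "c")}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    p = init_params(n_in=4, n_hidden=3, rng=rng)
    h, c = np.zeros(3), np.zeros(3)
    for x in rng.standard_normal((5, 4)):
        h, c = lstm_step(x, h, c, p)
    print(h.shape, c.shape)  # (3,) (3,)
```

With zero weights every gate is 0.5 and the new memory cell is 0, so c becomes 0.5 * c_prev and h becomes 0.5 * tanh(c). Note that the cell update c = f * c_prev + i * c_hat is additive.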

2. Differences

2.1 Control of the memory content

LSTM: the memory content is controlled by the output gate before it is passed to the next unit.

GRU: the memory content is passed directly to the next unit without any such control.

2.2 The input gate and the reset gate act at different positions

LSTM: when computing the new memory $\hat{c}^{(t)}$, the information from the previous time step is not gated; the forget gate handles that independently.

GRU: when computing the new memory $\hat{h}^{(t)}$, the reset gate controls how much information from the previous time step is used.

3. Similarities

The biggest similarity is that both models introduce an additive update when going from t-1 to t.

This additive update helps prevent vanishing gradients, which is a main reason LSTM and GRU perform better than a plain RNN.


Generating an (x, y, z) 3D coordinate sequence in Python

For an (x, y, z) three-dimensional space, generate a DataFrame with columns ['x', 'y', 'z'] that lists every coordinate:

import pandas as pd
import numpy as np

# Grid size: x = 30, y = 20, z = 5
_x_size_temp = 30
_y_size_temp = 20
_z_size_temp = 5

# x changes slowest: each x value repeats once per (y, z) pair.
_x_se = []
for i in range(_x_size_temp):
    _x_se += [i] * (_y_size_temp * _z_size_temp)

# y repeats once per z value, and the whole y pattern repeats for every x.
_y_se = []
for j in range(_y_size_temp):
    _y_se += [j] * _z_size_temp
_y_se *= _x_size_temp

# z changes fastest: 0..z-1 repeated for every (x, y) pair.
_z_se = np.arange(0, _z_size_temp).tolist() * (_x_size_temp * _y_size_temp)

cargo_state_3d = pd.DataFrame(data={
    'x': _x_se,
    'y': _y_se,
    'z': _z_se,
})
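The same DataFrame can also be built without explicit loops. Below is a minimal sketch using np.meshgrid with indexing="ij", so x varies slowest and z fastest, matching the loop version. It also includes an itertools.product variant for comparison. The names make_grid and make_grid_product are hypothetical.

```python
import itertools

import numpy as np
import pandas as pd


def make_grid(x_size, y_size, z_size):
    """All integer (x, y, z) coordinates, x slowest-varying and z fastest."""
    # indexing="ij" keeps the axes in (x, y, z) order; ravel() then walks the
    # grid in C order, so z changes fastest, as in the loop version.
    xs, ys, zs = np.meshgrid(np.arange(x_size), np.arange(y_size),
                             np.arange(z_size), indexing="ij")
    return pd.DataFrame({"x": xs.ravel(), "y": ys.ravel(), "z": zs.ravel()})


def make_grid_product(x_size, y_size, z_size):
    # itertools.product also varies the last axis fastest.
    rows = itertools.product(range(x_size), range(y_size), range(z_size))
    return pd.DataFrame(list(rows), columns=["x", "y", "z"])


if __name__ == "__main__":
    grid = make_grid(30, 20, 5)
    print(len(grid))  # 3000
    print(grid.head(3))
```

The meshgrid version stays vectorised, so it scales better than building Python lists when the grid is large.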

Several ways to view Spark task logs

Spark jobs are usually inspected through the web UI. However, while a Spark Streaming job is running its logs are often very large, which makes them inconvenient to view on the web, so they need to be located on the server. Below are two ways to view the driver-side and executor-side logs.

1. Viewing logs in the web UI:

The YARN web UI lists the Spark applications it is scheduling (four in this example).

Click the first application, application_1509845442132_3866, to open its details page. The log shown in the lower-right corner is actually the driver-side log, and the page also shows which node the driver runs on.

In addition, we can view the logs on the executor nodes: on the same page, open the ApplicationMaster link to jump to Spark's own job UI.

After clicking Executors, you can see four executors and one driver, with their logs on the right. stdout contains the output of println, and stderr contains Spark's standard log output.

2. Viewing logs on the server

Spark Streaming task logs are often very large, so they are inconvenient to view on the web and we need to look at them on the server. The web UI shows which node runs the driver. On that node, the driver-side log is usually under the YARN container log directory, for example /yarn/container-logs/ (the location is configured by yarn.nodemanager.log-dirs).

If you don't know which directory it is, you can search for it directly: find / -name "application_1509845442132_3866"

The corresponding executor logs can be found on the server in the same way.
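If you would rather script the search, here is a small Python sketch that walks a directory tree the way find / -name does and then collects the stdout and stderr files under each match. The function names are hypothetical. The sketch assumes the typical YARN layout, in which each application directory holds per-container subdirectories that contain stdout and stderr, so check that against your cluster.

```python
import os


def find_app_log_dirs(root, app_id):
    """Return every path under root whose name equals app_id (like find -name)."""
    matches = []
    # os.walk silently skips directories it cannot read (onerror=None).
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name == app_id:
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def list_log_files(app_dir, names=("stdout", "stderr")):
    """List stdout/stderr files in the container subdirectories of app_dir."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(app_dir):
        for name in filenames:
            if name in names:
                found.append(os.path.join(dirpath, name))
    return sorted(found)
```

For example, find_app_log_dirs("/", "application_1509845442132_3866") mirrors the find command above. Starting from a narrower root, such as the container log directory, is much faster.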

It's time to upgrade your Parquet: IOException: totalValueCount == 0

This article is from the Huawei Cloud community post "Your Parquet should be upgraded: a journey locating the IOException: totalValueCount == 0 problem"; original author: wzhfy.

1. Problem description

When running an ETL job with Spark SQL, reading a table fails with "IOException: totalValueCount == 0", even though no exception was raised when the table was written.

2. Preliminary analysis

The table is produced by joining two other tables. Analysis showed that the join result was skewed, with null as the skew key. After the join, each task writes one file, so the task whose partition key is null wrote a huge number of null values into a single file: 2.2 billion of them.

The figure of 2.2 billion is suspicious: it just exceeds the maximum int value, 2147483647 (about 2.1 billion). We therefore suspected that Parquet has a problem when more than Int.MaxValue values are written.

[Note] This article only covers the read error caused by writing a huge number of null values into the same file. Whether it is reasonable for this column to contain so many nulls is beyond its scope.

3. Deep dive into Parquet (version 1.8.3; some parts are easier to follow alongside the Parquet source code)

Entry point: Spark (Spark 2.3) -> Parquet

Parquet is invoked from Spark, so we trace the call stack starting from Spark.

InsertIntoHadoopFsRelationCommand.run() / SaveAsHiveFile.saveAsHiveFile() -> FileFormatWriter.write()

There are several steps:

(1) Before the job starts, an OutputWriterFactory is created: ParquetFileFormat.prepareWrite(). This sets a series of configuration options for writing Parquet files. The main one is the WriteSupport class, set with ParquetOutputFormat.setWriteSupportClass(job, classOf[ParquetWriteSupport]); ParquetWriteSupport is a class defined by Spark itself.
(2) In executeTask() -> writeTask.execute(), the OutputWriter (ParquetOutputWriter) is first created through the OutputWriterFactory: outputWriterFactory.newInstance().
(3) Each row is written to the Parquet file in turn with ParquetOutputWriter.write(InternalRow).
(4) Before the task ends, ParquetOutputWriter.close() is called to release resources.

3.1 Write process

In ParquetOutputWriter, a RecordWriter (ParquetRecordWriter) is constructed through ParquetOutputFormat.getRecordWriter. It includes:

the WriteSupport set during prepareWrite()

In ParquetRecordWriter, the write operation is actually delegated to an internalWriter (an InternalParquetRecordWriter, constructed with the WriteSupport and a ParquetFileWriter).

Here is the overall flow so far:

SingleDirectoryWriteTask/DynamicPartitionWriteTask.execute
-> ParquetOutputWriter.write -> ParquetRecordWriter.write -> InternalParquetRecordWriter.write

Next, InternalParquetRecordWriter.write does three things:

(1) writeSupport.write, i.e. ParquetWriteSupport.write, which itself has three steps

(2) Increment the counter recordCount (of type long)

(3) Check the block size to decide whether flushRowGroupToStore is needed (checkpoint 2)

Since all the written values are null, the memSize checked at checkpoints 1 and 2 is 0, so neither the page nor the row group is flushed. As a result, null values keep being added to the same page. ColumnWriterV1's counter valueCount is an int, so once it exceeds Int.MaxValue it overflows to a negative number.
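Python integers never overflow, so to see the arithmetic behind this, a small sketch can emulate Java's 32-bit int wrap-around. The helper to_java_int is hypothetical. The 2.2 billion figure is the null count from this case (rounded).

```python
INT_MAX = 2**31 - 1  # Integer.MAX_VALUE


def to_java_int(n):
    """Wrap an arbitrary Python int into Java's 32-bit signed int range."""
    return (n + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    # Incrementing an int counter n times from 0 leaves to_java_int(n) in it,
    # because two's-complement addition wraps modulo 2**32.
    print(to_java_int(INT_MAX))        # 2147483647
    print(to_java_int(INT_MAX + 1))    # -2147483648
    print(to_java_int(2_200_000_000))  # -2094967296
```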

Therefore, flushRowGroupToStore is executed only when close() is called (at the end of the task):
ParquetOutputWriter.close -> ParquetRecordWriter.close
-> InternalParquetRecordWriter.close -> flushRowGroupToStore
-> ColumnWriteStoreV1.flush -> for each column ColumnWriterV1.flush

No page is written here, because valueCount has overflowed to a negative number.

Because writePage has never been called, totalValueCount stays 0. It is only updated along this path:
ColumnWriterV1.writePage -> ColumnChunkPageWriter.writePage -> totalValueCount accumulates the page's value count

At the end of the write: InternalParquetRecordWriter.close -> flushRowGroupToStore -> ColumnChunkPageWriteStore.flushToFileWriter -> for each column ColumnChunkPageWriter.writeToFileWriter:

ParquetFileWriter.startColumn: totalValueCount is assigned to currentChunkValueCount
ParquetFileWriter.writeDataPages
ParquetFileWriter.endColumn: a ColumnChunkMetaData is built from currentChunkValueCount (0) and other metadata, and this information is eventually written to the file.

3.2 Read process

Again, we start from Spark.
Initialization phase: ParquetFileFormat.buildReaderWithPartitionValues -> VectorizedParquetRecordReader.initialize -> ParquetFileReader.readFooter -> ParquetMetadataConverter.readParquetMetadata -> fromParquetMetadata -> ColumnChunkMetaData.get, which contains valueCount (0).

When reading: VectorizedParquetRecordReader.nextBatch -> checkEndOfRowGroup:
(1) ParquetFileReader.readNextRowGroup -> for each chunk, currentRowGroup.addColumn(chunk.descriptor.col, chunk.readAllPages())

Since getValueCount returns 0, pagesInChunk is empty.

(2) Construct the ColumnChunkPageReader:

Because the page list is empty, totalValueCount is 0, which makes the construction of VectorizedColumnReader fail with the error above.

4. Solution: upgrade Parquet (to version 1.11.1)

In the new version, ParquetWriteSupport.write ->
MessageColumnIO.MessageColumnIORecordConsumer.endMessage ->
ColumnWriteStoreV1(ColumnWriteStoreBase).endRecord:

In endRecord, a maximum number of records per page (20,000 by default) and a corresponding check have been added. When the limit is exceeded, writePage is called, so ColumnWriterV1's valueCount no longer overflows (it is reset after each writePage).
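The difference between the two versions can be illustrated with a toy Python model. This is not Parquet's code: write_column and its parameters are hypothetical, and the counter is shrunk to 8 bits instead of 32, so the overflow appears after 200 values rather than 2.2 billion. The model follows the behaviour described above. Without a per-page record limit, the all-null page is flushed only at close, and only if its counter is positive. With the limit, a page is written and the counter reset whenever the limit is reached.

```python
def to_signed(n, bits):
    """Wrap n to a signed integer of the given width (bits=32 behaves like Java int)."""
    half = 1 << (bits - 1)
    return (n + half) % (1 << bits) - half


def write_column(n_values, page_row_limit=None, bits=32):
    """Toy column writer for all-null data; returns the chunk's total value count."""
    value_count = 0  # per-page counter (a Java int in Parquet)
    total = 0        # chunk total, updated only when a page is written

    def write_page():
        nonlocal value_count, total
        total += value_count
        value_count = 0  # reset after each page

    for _ in range(n_values):
        value_count = to_signed(value_count + 1, bits)
        # New behaviour: a per-page record limit forces a page out.
        if page_row_limit is not None and value_count >= page_row_limit:
            write_page()
    # Final flush on close: a page is written only for a positive count.
    if value_count > 0:
        write_page()
    return total


if __name__ == "__main__":
    print(write_column(200, bits=8))                      # 0: nothing recorded
    print(write_column(200, page_row_limit=100, bits=8))  # 200
```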

By contrast, in the old version 1.8.3, ColumnWriteStoreV1.endRecord is empty.

Appendix: a small trick in Parquet

In Parquet, to save space, a long value that falls within a certain range is stored as an int. The method is as follows:

Determine whether the value can be stored as an int:

If it can, IntColumnChunkMetaData is used instead of LongColumnChunkMetaData, and the value is converted at construction time:

When the value is used, it is converted back, in IntColumnChunkMetaData.getValueCount -> intToPositiveLong():

The usual int range is -2^31 to 2^31 - 1. Since metadata such as valueCount is a non-negative integer, a plain int could only store numbers in the range 0 to 2^31 - 1. With this conversion, numbers in the range 0 to 2^32 - 1 can be represented, doubling the range.
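Here is a Python sketch of the range-doubling idea. It assumes the conversion simply shifts values by Integer.MIN_VALUE (-2^31) on the way in and back on the way out. The function names mirror the description above but are written here for illustration; they are not copied from Parquet's source.

```python
INT_MIN = -(2**31)  # Integer.MIN_VALUE
INT_MAX = 2**31 - 1  # Integer.MAX_VALUE


def to_java_int(n):
    """Wrap to a 32-bit signed value, as a Java (int) cast would."""
    return (n + 2**31) % 2**32 - 2**31


def positive_long_fits_in_int(value):
    # Non-negative values up to 2**32 - 1 survive the shift below.
    return value >= 0 and value + INT_MIN <= INT_MAX


def positive_long_to_int(value):
    # Shift [0, 2**32 - 1] down into the signed range [-2**31, 2**31 - 1].
    return to_java_int(value + INT_MIN)


def int_to_positive_long(value):
    # Undo the shift when the metadata is read back.
    return value - INT_MIN
```

For example, 0 is stored as -2147483648 and 2**32 - 1 as 2147483647, so every stored value still fits in a signed int.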

Appendix: test case code to reproduce the issue (it depends on some Spark classes and can be run inside the Spark project)



original_keras_version = f.attrs['keras_version'].decode('utf8') AttributeError: 'str' object has no attribute 'decode'

Solution:
Uninstall the existing h5py module and install version 2.10. (h5py 3.x can return this attribute as a str rather than bytes, and str has no decode() method.)

pip install h5py==2.10 -i https://pypi.tuna.tsinghua.edu.cn/simple/
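If downgrading h5py is not an option, another approach is to make the decode step tolerant of both types. Here is a minimal sketch. It assumes you can edit the code that reads the attribute; in the error above, that line lives in Keras's model-loading code. attr_to_str is a hypothetical helper name.

```python
def attr_to_str(value, encoding="utf8"):
    """Return an HDF5 string attribute as str, whether h5py gave bytes or str."""
    # bytes (including numpy.bytes_) need decoding; str is already text.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Hypothetical use in the loading code, where f is an open h5py.File:
# original_keras_version = attr_to_str(f.attrs["keras_version"])
```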

Flink 1.11 error: No ExecutorFactory found to execute the application

Error reported after migrating Flink to 1.11:

No ExecutorFactory found to execute the application

After investigation, the cause is that starting with Flink 1.11, flink-streaming-java no longer depends on flink-clients, so the flink-clients dependency must be added manually.

Add the flink-clients_2.12 dependency to the POM file, and the problem is solved.

Keytool error: java.io.FileNotFoundException: MyAndroidKey.keystore (access denied)

I ran into this problem this afternoon while using Java's keytool to generate a certificate.

Error:

Keytool error: java.io.FileNotFoundException: MyAndroidKey.keystore (access denied)

Analysis:

When the certificate was generated, there was not enough permission to write to drive C: all directories on drive C are read-only for the current user.

Solutions:

Method 1:
The JDK is installed on drive C; you can move the JDK to another drive.
Method 2:
The simplest method is to open CMD as an administrator, so that writing the file no longer fails.

IntelliJ IDEA compilation error: Error:java: Compilation failed: internal java compiler error

When developing Java in IntelliJ IDEA, the following error is occasionally reported:

IntelliJ IDEA compilation error:
Error:java: Compilation failed: internal java compiler error

The solution is as follows:

Confirm the project's JDK

File > Project Structure > Project Settings > Project: check the project's Project SDK and Project language level. My settings are Project SDK = 1.8 and Project language level = 8.

Modify the compiler settings

File > Settings > Build, Execution, Deployment > Compiler > Java Compiler: change the target bytecode version to 1.8 so that it matches the project's JDK, then recompile the project; the error disappears.

When starting a Vue project: Cannot find module 'webpack-cli/bin/config-yargs' error resolution

An error is reported when starting the Vue project: Cannot find module 'webpack-cli/bin/config-yargs'.

The cause of this error is that the installed versions of webpack and webpack-dev-server are incompatible. The solution is as follows:

In the Vue project directory, run:
npm uninstall -g webpack
npm uninstall -g webpack-dev-server
Then reinstall them:
npm install [email protected] --save-dev
npm install [email protected] --save-dev

Restart the project, and it now starts successfully.

Vue: How to fix npm install errors that keep occurring

In one project, installing the dependencies always failed with errors.

Solution:

First run npm install --ignore-scripts,
then run npm install.

Error Running Context: The server unexpectedly closed the connection

SVN error:

Error running context: The server unexpectedly closed the connection

Error analysis:
The message means that the remote service closed the connection. Check the following:

1. Check the firewall (firewalld and iptables)
firewall-cmd --state
If it returns "not running", firewalld is not enabled.
iptables -nL
Check whether there is a DROP rule for port 3690.
If iptables is not otherwise used in your environment, you can clear all rules with iptables -F. Use with caution!
2. Check whether the svn service is running
ps aux | grep svn && netstat -lnpt | grep 3690
This checks both the process and its listening port.
3. If you use an nginx reverse proxy, check the TCP forwarding configuration
This is easy to forget, especially if you use iF.SVNAdmin.
stream {
	server {
		listen xxx so_keepalive=on;
		proxy_pass svn_server:3690;
	}
}
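To confirm from the client side that the SVN port is reachable at all (through the firewall and any proxy), a small Python sketch can attempt a plain TCP connection. port_is_open is a hypothetical helper. 3690 is svnserve's default port, and the host below is a placeholder to replace with your SVN server or proxy address.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False


if __name__ == "__main__":
    # Replace 127.0.0.1 with the SVN server (or nginx proxy) address.
    print(port_is_open("127.0.0.1", 3690))
```

A True here with the error still occurring points at the proxy or svnserve configuration rather than the firewall.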

There are always more solutions than problems. Keep going!

ProgrammerAH

Programmer Guide, Tips and Tutorial
