Tag Archives: algorithm

[Solved] Grid Search Error (GridSearchCV): 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

Grid Search Error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

E:\DLstudy\Scripts\python.exe E:/PycharmProjects/DLstudy/run/train_model.py
[INFO] tuning hyperparameters...
Traceback (most recent call last):
  File "E:\PycharmProjects\DLstudy\run\train_model.py", line 22, in <module>
    model.fit(trainX, trainY)
  File "E:\DLstudy\lib\site-packages\sklearn\model_selection\_search.py", line 820, in fit
    with parallel:
  File "E:\DLstudy\lib\site-packages\joblib\parallel.py", line 725, in __enter__
    self._initialize_backend()
  File "E:\DLstudy\lib\site-packages\joblib\parallel.py", line 735, in _initialize_backend
    n_jobs = self._backend.configure(n_jobs=self.n_jobs, parallel=self,
  File "E:\DLstudy\lib\site-packages\joblib\_parallel_backends.py", line 494, in configure
    self._workers = get_memmapping_executor(
  File "E:\DLstudy\lib\site-packages\joblib\executor.py", line 20, in get_memmapping_executor
    return MemmappingExecutor.get_memmapping_executor(n_jobs, **kwargs)
  File "E:\DLstudy\lib\site-packages\joblib\executor.py", line 42, in get_memmapping_executor
    manager = TemporaryResourcesManager(temp_folder)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 531, in __init__
    self.set_current_context(context_id)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 535, in set_current_context
    self.register_new_context(context_id)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 560, in register_new_context
    self.register_folder_finalizer(new_folder_path, context_id)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 590, in register_folder_finalizer
    resource_tracker.register(pool_subfolder, "folder")
  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 191, in register
    self._send('REGISTER', name, rtype)
  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 204, in _send
    msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

Process finished with exit code 1

Solution:

Original error code:

model = GridSearchCV(LogisticRegression(), params, cv=3, n_jobs=-1)

The n_jobs=-1 parameter can simply be deleted, changing the call to:

model = GridSearchCV(LogisticRegression(), params, cv=3)

Looking it up, this parameter specifies how many processors to use:

n_jobs : int, default=None
        Number of jobs to run in parallel.
        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
        for more details.

If n_jobs=-1 is specified, joblib's multiprocessing backend eventually has to encode a temporary folder path as ASCII, and that encoding fails whenever the path contains non-ASCII characters.
Therefore, if we do not specify this parameter, one processor is used by default and the failing step is never reached.

If you really want to use multiple processors,

then we need to modify the library file at the path shown in the error message. For example, in our traceback:

  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 204, in _send
    msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

Note that my error occurs in E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py, line 204, in the _send method.
Source code of the _send function:

  def _send(self, cmd, name, rtype):
        msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
        if len(name) > 512:
            # posix guarantees that writes to a pipe of less than PIPE_BUF
            # bytes are atomic, and that PIPE_BUF >= 512
            raise ValueError('name too long')
        nbytes = os.write(self._fd, msg)
        assert nbytes == len(msg)

Change
msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
to
msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('utf8')
That is, the encoding is changed to utf-8, and the changed code is as follows.

  def _send(self, cmd, name, rtype):
        msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('utf8')
        if len(name) > 512:
            # posix guarantees that writes to a pipe of less than PIPE_BUF
            # bytes are atomic, and that PIPE_BUF >= 512
            raise ValueError('name too long')
        nbytes = os.write(self._fd, msg)
        assert nbytes == len(msg)

Then run the code again. Don't worry, it will still report an error, because we have only changed the encoding side, not the decoding side. The error information is as follows:

     .............(...)
    splitted = line.strip().decode('ascii').split(':')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)
Traceback (most recent call last):
  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 253, in main
    splitted = line.strip().decode('ascii').split(':')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)

Similarly, find the file from the error message: E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py. The error is at line 253 of that file; we locate the corresponding code:

......(...)
       with open(fd, 'rb') as f:
            while True:
                line = f.readline()
                if line == b'':  # EOF
                    break
                try:
                    splitted = line.strip().decode('ascii').split(':')
                    # name can potentially contain separator symbols (for
                    # instance folders on Windows)
                    cmd, name, rtype = (
                        splitted[0], ':'.join(splitted[1:-1]), splitted[-1])
......(...)

We just need to replace
line.strip().decode('ascii').split(':')
with
line.strip().decode('utf8').split(':')
Run the file again and it succeeds.
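
As an alternative to patching library source, another workaround worth trying (a sketch, under the assumption that the root cause is non-ASCII characters in joblib's temporary folder path, e.g. a Chinese Windows user name) is to point joblib's temp folder at an ASCII-only path before fitting; joblib reads the JOBLIB_TEMP_FOLDER environment variable for this. The folder path and the dummy data below are only examples.

import os
os.environ["JOBLIB_TEMP_FOLDER"] = "C:\\joblib_tmp"  # example ASCII-only folder; create it beforehand

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Dummy data and parameter grid just to demonstrate the call with n_jobs=-1
trainX, trainY = make_classification(n_samples=200, n_features=10, random_state=0)
params = {"C": [0.1, 1.0, 10.0]}

model = GridSearchCV(LogisticRegression(), params, cv=3, n_jobs=-1)
model.fit(trainX, trainY)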

Error: com/fasterxml/jackson/databind/JavaType

While integrating with RabbitMQ, errors such as jar version mismatches and missing methods were reported at compile time after the jar packages were imported.

Solution: add the missing jar dependencies:

<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-annotations</artifactId>
  <version>2.9.7</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-databind</artifactId>
  <version>2.9.8</version>
</dependency>
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-core</artifactId>
  <version>2.9.8</version>
</dependency>

Leetcode error: address sanitizer: detailed analysis and solution of deadlysignal

Contents: problem description, problem analysis, case analysis, error source code, source code analysis, and the corrected code after the solution.

For more, see: C LeetCode practice: a summary of common pitfalls and bugs.

Problem description


Error: AddressSanitizer: DEADLYSIGNAL; details are as follows:

===42====ERROR:AddressSanitizer: SEGV on unknown address xx. 
The signal is caused by a READ memory access.

Problem analysis


In general, one of the following problems is likely:

1. Out of bounds: an array reference exceeds its left or right boundary
2. Infinite recursion: the code cannot terminate normally
3. Incorrect handling of the function's input/output parameters or return value

According to the above ideas, the example code is analyzed to solve the bug.

Case analysis


The problem comes from LeetCode problem 46; see the problem-analysis blog post for details. Below is the first-version code that produced the error above:

Error source code

Key code of the main (outer) layer:

/**
 * Return an array of arrays of size *returnSize.
 * The sizes of the arrays are returned as *returnColumnSizes array.
 * Note: Both returned array and *columnSizes array must be malloced, assume caller calls free().
 */
int g_trackNum; // Used for temporary stacking during recursive calls
int g_rowPos;

// Subfunction declaration
int isContanin(int *nums, int len, int val);
void backtrack(int *nums, int numsSize, int **returnColumnSizes, int *track);

// Main call function
int** permute(int* nums, int numsSize, int* returnSize, int** returnColumnSizes)
{
    // Calculate all possible totals
    int row = 1, i;
    for (i = numsSize; i > 0; i--) {
        row *= i;
    }
    *returnSize = row;

    printf("row = %d\n", row);

    // Request a two-dimensional array of the corresponding size and allocate space
    returnColumnSizes = (int **)malloc((row + 10) * sizeof(int*));
    if (returnColumnSizes == NULL) {
        return NULL;
    }
    int *p;
    for (i = 0; i < row; i++) {
        p = (int*)malloc((numsSize + 10) * sizeof(int));
        if (p == NULL) {
            return NULL;
        }
        returnColumnSizes[i] = p;
    }
    p = (int*)malloc(numsSize * sizeof(int));
    if (p == NULL) {
        return NULL;
    }
    int *track = p;

    // Backtrack to exhaust all possible permutations
    g_trackNum = 0;
    g_rowPos = 0;
    backtrack(nums, numsSize, returnColumnSizes, track); // Put the result from line 0

    // Returns returnSize and a two-dimensional pointer
    return returnColumnSizes;
}

Recursive code for backtracking implementation:

void backtrack(int *nums, int numsSize, int **returnColumnSizes, int *track)
{
    // Leaf node reached: the recorded path length equals the array length; copy track into returnColumnSizes and stop
    int i;
    if (g_trackNum == numsSize) {
        // printf("back: g_rowPos = %d\n", g_rowPos);
        for (i = 0; i < numsSize; i++) {
            // printf("back: g_rowPos = %d\n", g_rowPos);
            returnColumnSizes[g_rowPos][i] = track[i];
        }
        g_rowPos++;
        return;
    }

    // Recursive traversal
    for (i = 0; i < numsSize; i++) {
        // check if the current value is in the track
        if (isContanin(track, g_trackNum, nums[i])) {
            continue;
        }

        // If not, join track
        // printf("back: g_trackNum = %d\n", g_trackNum);
        track[g_trackNum++] = nums[i];
        
        // Continue traversing backwards
        backtrack(nums, numsSize, returnColumnSizes, track);
        // After the node returns, retrieve the value in the track
        g_trackNum--;
    }

    return;
}

Helper function to determine whether the current value has already been used:

int isContanin(int *nums, int len, int val)
{
    int flag = 0;
    int i;
    for (i = 0; i < len; i++) {
        if (nums[i] == val) {
            flag = 1;
            break;
        }
    }
    return flag;
}

Source code analysis

Check possible problem 1: out of bounds, an array reference exceeds its bounds

There are two main ideas:
1. Try allocating more space than necessary first, to see whether the problem is insufficient space
2. Print the subscript value in advance wherever it might go out of bounds, to see whether it overflows. Because AddressSanitizer aborts at runtime, printf can be used to record the state just before the abort.

Method 1

At each possibly out-of-bounds reference, print the subscript in advance and record the values printed before the program crashes. Example print statements:

printf("row = %d\n", row);
printf("back: g_rowPos = %d\n", g_rowPos);
printf("back: g_trackNum = %d\n", g_trackNum);

Method 2

When allocating the arrays during initialization, deliberately allocate more space than needed to make sure it is sufficient. If the error disappears after increasing the space, an array reference was going out of bounds.

After running the code, it was found that subscript printing was normal, and no problem was found, so we continued to investigate possible problem 2.

Check possible problem 2: infinite recursion, the code cannot terminate and return normally

Add a print statement directly inside the termination condition of the recursive function backtrack and observe whether the recursion proceeds as expected. If nothing is printed, the function never terminates and the recursion runs forever.

After running the code, it is found that there is no such problem.

Check possible problem 3: incorrect handling of the function's input/output parameters
After carefully re-reading the input/output comment at the top of the code and comparing with other C implementations online, it turned out that the output parameter had been misunderstood.

The double pointer returnColumnSizes is an output parameter that stores the number of columns of each output row. Although the number of columns is fixed in this problem, it still has to be filled in. I had originally understood it as the returned pointer to the two-dimensional result array, but that array is returned through the function's return value, so it is enough to return the first address of the allocated two-dimensional array directly.

After modifying the above problems, the code output is normal and no error is reported.

OK code after solution

// Determine if an element has been traversed
int isContain(int *nums, int len, int val)
{
    int flag = 0;
    int i;
    for (i = 0; i < len; i++) {
        if (nums[i] == val) {
            flag = 1;
            break;
        }
    }
    return flag;
}

// Note that this global variable is best defined by declaration only and initialized before backtrack
// in case the LeetCode judge does not re-initialize it, resulting in a miscalculation
int g_trackNum; // Used for temporary stacking during recursive calls
int g_rowPos; // Record each row

void backtrack(int *nums, int numsSize, int **returnColumnSizes, int *track)
{
    // Leaf node reached: the recorded path length equals the array length; copy track into returnColumnSizes and stop
    int i;
    if (g_trackNum == numsSize) {
        for (i = 0; i < numsSize; i++) {
            returnColumnSizes[g_rowPos][i] = track[i];
        }
        g_rowPos++;
        return;
    }

    // Recursive traversal
    for (i = 0; i < numsSize; i++) {
        // check if the current value is in the track
        if (isContain(track, g_trackNum, nums[i])) {
            continue;
        }

        // If not, add track
        track[g_trackNum++] = nums[i];
        // continue traversing backwards
        backtrack(nums, numsSize, returnColumnSizes, track);
        // After the node returns, retrieve the value in track
        g_trackNum--;
    }

    return;
}

int** permute(int* nums, int numsSize, int* returnSize, int** returnColumnSizes)
{
    // Calculate the total number of all possible n!
    int row = 1, i;
    for (i = numsSize; i > 0; i--) {
        row *= i;
    }
    *returnSize = row;

    // Calculate the number of columns in each row of the returned array
    *returnColumnSizes = (int *)malloc(sizeof(int) * (*returnSize));
    if (*returnColumnSizes == NULL) {
        return NULL;
    }
    for (int i = 0; i < row; i++) {
        returnColumnSizes[0][i] = numsSize;
    }

    // Request a two-dimensional array of the corresponding size and allocate space
    int **res = (int **)malloc((row + 10) * sizeof(int*));
    if (res == NULL) {
        return NULL;
    }
    int *p;
    for (i = 0; i < row; i++) {
        p = (int*)malloc((numsSize + 10) * sizeof(int));
        if (p == NULL) {
            return NULL;
        }
        res[i] = p;
    }
    p = (int*)malloc(numsSize * sizeof(int));
    if (p == NULL) {
        return NULL;
    }
    int *track = p;

    // Backtrack to exhaust all possible permutations
    g_trackNum = 0;
    g_rowPos = 0;
    backtrack(nums, numsSize, res, track); // put the result from row 0

    return res;
}

Reading the wannier90 output file hr.dat to construct the real-space Hamiltonian and calculate the Fermi surface

import numpy as np

# Build a uniform k-point mesh over the cell spanned by the reciprocal vectors rec[0] and rec[1]
# (rec is computed in the script further below)
def get_uniformity_kpoint(n):
    kPoints = []
    for i in range(n):
        for j in range(n):
            kPoints.append(i/n * rec[0] + j/n * rec[1])
    np.save("kPoints.npy", np.array(kPoints))
    return kPoints

Sample points uniformly over the reciprocal-lattice parallelogram and carry out the band calculation described earlier on this mesh.

The whole Brillouin zone is then restored by translation.
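
Before the plotting script below, the band energies in Ek_mesh.npy have to be produced from the wannier90 output. Here is a minimal sketch of that step, assuming the standard wannier90 _hr.dat layout (comment line, number of Wannier functions, number of Wigner-Seitz points, degeneracy factors, then one line per matrix element: R1 R2 R3 m n Re Im); the file name and helper names are illustrative, not from the original script.

import numpy as np

def read_hr(filename="wannier90_hr.dat"):
    # Parse the real-space Hamiltonian H(R) from a wannier90 _hr.dat file
    with open(filename) as f:
        f.readline()                              # comment/date line
        num_wann = int(f.readline())
        nrpts = int(f.readline())
        deg = []
        while len(deg) < nrpts:                   # degeneracy factors, several per line
            deg += [int(x) for x in f.readline().split()]
        R = np.zeros((nrpts, 3))
        HR = np.zeros((nrpts, num_wann, num_wann), dtype=complex)
        for ir in range(nrpts):
            for _ in range(num_wann * num_wann):
                v = f.readline().split()
                R[ir] = [int(v[0]), int(v[1]), int(v[2])]
                m, n = int(v[3]) - 1, int(v[4]) - 1
                HR[ir, m, n] = float(v[5]) + 1j * float(v[6])
    return R, HR, np.array(deg), num_wann

def hamiltonian_k(k, R, HR, deg, basis_vector):
    # H(k) = sum_R exp(i k . R_cart) H(R) / deg(R), with R given in lattice coordinates
    R_cart = R @ np.array(basis_vector)
    phases = np.exp(1j * R_cart @ np.array(k))
    return np.tensordot(phases / deg, HR, axes=(0, 0))

# Ek_mesh.npy can then be built from the uniform k mesh generated above:
# R, HR, deg, num_wann = read_hr()
# Ek_mesh = np.array([np.linalg.eigvalsh(hamiltonian_k(k, R, HR, deg, basis_vector)) for k in kPoints])
# np.save("Ek_mesh.npy", Ek_mesh)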

import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.axes3d import Axes3D
from matplotlib.ticker import LinearLocator, FormatStrFormatter

basis_vector = [[1.37287871,1.37287871,-2.74575742],[-2.74575742,1.37287871,1.37287871],[13.36629497,13.36629497,13.36629497]]
V = np.dot(basis_vector[0], np.cross(basis_vector[1], basis_vector[2]) )
rec = [np.cross(basis_vector[1], basis_vector[2]) * 2 * np.pi/V,
       np.cross(basis_vector[2], basis_vector[0]) * 2 * np.pi/V,
       np.cross(basis_vector[0], basis_vector[1]) * 2 * np.pi/V]

print(rec)
nk = 200
Ek = np.load("Ek_mesh.npy")[:, 0]
Ek = Ek.reshape(nk, nk)
print(Ek.shape)

kPoints = np.load("kPoints.npy")
print(kPoints.shape)
kx = kPoints[:,0]
ky = kPoints[:,1]
print(kx.shape)
kx = kx.reshape(nk, nk)
ky = ky.reshape(nk, nk)



plt.contour(kx, ky, Ek, [0.0])
plt.contour(kx + rec[0][0] - rec[1][0], ky + rec[0][1] - rec[1][1], Ek, [0.0])
plt.contour(kx - rec[0][0], ky - rec[0][1], Ek, [0.0])
plt.contour(kx - rec[1][0], ky - rec[1][1], Ek, [0.0])
plt.contour(kx + rec[0][0], ky + rec[0][1], Ek, [0.0])
plt.show()

Solving the problem of GPU memory growing while training a network (PyTorch)

Method 1

torch.backends.cudnn.enabled = True
torch.backends.cudnn.benchmark = True

Principle:

cuDNN follows this rule:

    If the dimensions and types of the network input change little, setting torch.backends.cudnn.benchmark = True lets cuDNN benchmark the available convolution algorithms once and reuse the fastest one, which improves efficiency. If the input changes at every iteration, cuDNN searches for the optimal configuration every time, which instead lowers efficiency.

     
Method 2

When accumulating the loss, take the Python scalar so that the computation graph is not kept alive:

train_loss += loss.item()
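
For context, a minimal sketch of the difference (the model, data, and loop here are hypothetical, not from the original post): accumulating the loss tensor itself keeps each iteration's autograd graph reachable so memory grows, while .item() stores only a Python float.

import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 1).to(device)
opt = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

train_loss = 0.0
for step in range(100):
    x = torch.randn(32, 10, device=device)
    y = torch.randn(32, 1, device=device)
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
    # train_loss += loss          # wrong: keeps the graph of every iteration alive
    train_loss += loss.item()     # right: only a Python float is stored
print(train_loss / 100)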

MongoDB Java driver 3.x: prompt "XXX (an encryption algorithm) is not available" when a username and password are used

First, the project environment: Maven + mongodb-java-driver-3.2.2.

The original project was written with the Mongo 2.x driver and then upgraded to 3.x. The first big change is username/password authentication.

Here are two ways to obtain a MongoClient.


import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

import com.mongodb.MongoClient;
import com.mongodb.MongoClientURI;
import com.mongodb.MongoCredential;
import com.mongodb.ServerAddress;

/**
 * MongoDBUtils
 * @author administration
 *
 */
public class DBUtil {

	/**
	 * Connect to mongoDB with username password (MongoCredential method)
	 * @param mongodbAddr
	 * @param databaseName
	 * @return
	 */
	public static MongoClient getMongoClientByCredent(String mongodbAddr, String databaseName){
		MongoClient mongoClient;
		Properties p = TDTFile.getProperAddr("db.properties");
		String user = p.getProperty("username");
		String pswd = p.getProperty("password");
		List<ServerAddress> serverAddrList = new ArrayList<ServerAddress>();
		ServerAddress serverAddress = new ServerAddress(mongodbAddr);
		serverAddrList.add(serverAddress);
		List<MongoCredential> credentialList = new ArrayList<MongoCredential>();
		MongoCredential credential = MongoCredential.createCredential(user, databaseName, pswd.toCharArray());
		credentialList.add(credential);
		mongoClient = new MongoClient(serverAddrList, credentialList);
		return mongoClient;
	}
	
	/**
	 * Connect to mongoDB with username password (URI method)
	 * @param mongodbAddr
	 * @param databaseName
	 * @return
	 */
	public static MongoClient getMongoClientByURI(String mongodbAddr, String databaseName){
		MongoClient mongoClient;
		Properties p = TDTFile.getProperAddr("db.properties");
		String user = p.getProperty("username");
		String pswd = p.getProperty("password");
		//System.out.println(user + "," + pswd);
		String uri = String.format("mongodb://%s:%s@%s/%s", user, pswd, mongodbAddr, databaseName);
		System.out.println(uri);
		MongoClientURI mongoClientURI = new MongoClientURI(uri);
		mongoClient = new MongoClient(mongoClientURI);
		return mongoClient;
	}
	
}

Both ways work fine and run well in Eclipse. But!! After packaging with mvn package, the program no longer works. After a day and a night of analysis, I traced the root cause to the command line used to launch the program, which was:

java -Djava.ext.dirs=../lib <other parameters>

The problem lies in the -D parameter. A quick search shows that -Djava.ext.dirs overrides the extension class path. Since my main method depends on many third-party jars, this seemed harmless, but it ignores one thing:

after java.ext.dirs is pointed at another directory, Java no longer loads the jars under %JAVA_HOME%\jre\lib\ext!! I searched the usage of the -D parameter for a long time and nobody mentioned this, and it cost me a whole day.

 

Solution:

Use Maven to package all required external dependencies into lib, including some of the JDK's own encryption packages. Problem solved. By the way, the algorithm jar required by mongodb 3.x is %JAVA_HOME%\jre\lib\ext\sunjce_provider.jar.

 

Causes and solutions of errors in “insufficient ram for flash algorithms”

The "Insufficient RAM for Flash Algorithms" error usually comes with a "Cannot Load Flash Programming Algorithm!" prompt window (figure omitted).

“Insufficient ram for flash algorithms” literally means “insufficient RAM space for loading flash algorithm”.

This error usually occurs after adding a new flash burning algorithm.

Reason: the flash programming algorithm is itself a small program that the chip executes while the program is being flashed from J-Link to flash, so the algorithm needs RAM during programming. Open the settings under the Utilities tab to see its configuration options.

In that dialog (figure omitted), the field indicated by the arrow is the size of the RAM area used to hold the programming algorithm. If this allocation is too small, the error message above is produced.

The problem can be solved by changing its size to a larger one.

For reference only.


Twitter's distributed auto-increment ID algorithm Snowflake (Java version)

summary

In distributed systems, there are scenarios where a globally unique ID is needed. To prevent ID conflicts, a 36-character UUID can be used, but UUIDs have some disadvantages: they are relatively long, and they are generally unordered.

Sometimes we want to use a simpler ID, and we want the ID to be generated in time order.

Twitter’s snowflake solved this problem. At first, twitter migrated the storage system from Mysql to Cassandra. Because Cassandra had no sequential ID generation mechanism, it developed such a global unique ID generation service.

structure

The structure of each part is as follows:

0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000

The first bit is unused; the next 41 bits are the millisecond timestamp (41 bits last 69 years); then come a 5-bit datacenter ID and a 5-bit worker ID (10 bits support deployment of at most 1024 nodes); and the last 12 bits are the count within a millisecond (a 12-bit sequence number supports 4096 IDs per node per millisecond).

A total of 64 bits, a long type. (length of converted string is 18)

The IDs generated by Snowflake are ordered by time overall, no ID collisions occur across the whole distributed system (IDs are distinguished by datacenter ID and worker ID), and generation is efficient: it is said that Snowflake can generate about 260,000 IDs per second.
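
As a quick check of these widths, here is a small Python sketch reproducing the constants used in the Java source that follows:

worker_id_bits = 5
datacenter_id_bits = 5
sequence_bits = 12
timestamp_bits = 41

max_worker_id = -1 ^ (-1 << worker_id_bits)          # 31 -> 32 workers per datacenter
max_datacenter_id = -1 ^ (-1 << datacenter_id_bits)  # 31 -> 1024 nodes together with the worker id
sequence_mask = -1 ^ (-1 << sequence_bits)           # 4095 -> 4096 ids per node per millisecond
years = (1 << timestamp_bits) // (1000 * 60 * 60 * 24 * 365)

print(max_worker_id, max_datacenter_id, sequence_mask, years)  # 31 31 4095 69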

Source code

(Java version of the source)

/**
 * Twitter_Snowflake<br>
 * The structure of SnowFlake is as follows (each part is separated by -):<br>
 * 0 - 0000000000 0000000000 0000000000 0000000000 0 - 00000 - 00000 - 000000000000 <br>
 * 1-bit identifier, as the long basic type is signed in Java, the highest bit is the sign bit, positive numbers are 0, negative numbers are 1, so the id is generally positive, the highest bit is 0 <br>
 * 41-bit timestamp (in milliseconds). Note that these 41 bits do not store the current timestamp itself but the difference (current timestamp - start timestamp),
 * where the start timestamp is generally the time our ID generator starts to be used, as specified by our program (the twepoch property of the IdWorker class below). 41 bits of milliseconds can be used for 69 years: T = (1L << 41) / (1000L * 60 * 60 * 24 * 365) = 69<br>
 * 10-bit data machine bits that can be deployed in 1024 nodes, including 5-bit datacenterId and 5-bit workerId<br>
 * 12-bit sequential, millisecond counting, 12-bit counting sequence number supports 4096 ID sequential numbers per node per millisecond (same machine, same time cutoff)<br>
 * Add up to exactly 64 bits for a Long type. <br>
 * The advantage of SnowFlake is that the overall self-increasing sorting by time and no ID collision within the whole distributed system (distinguished by data center ID and machine ID), and high efficiency, tested, SnowFlake can generate about 260,000 IDs per second.
 */
public class SnowflakeIdWorker {

    // ==============================Fields===========================================
    /** Start time cutoff (2015-01-01) */
    private final long twepoch = 1420041600000L;

    /* The number of bits occupied by the machine id */
    private final long workerIdBits = 5L;

    /* The number of bits occupied by the data identifier id */
    private final long datacenterIdBits = 5L;

    /* The maximum machine id supported, resulting in 31 (this shift algorithm can quickly calculate the maximum number of decimal digits that can be represented by a few binary digits) */
    private final long maxWorkerId = -1L ^ (-1L << workerIdBits);

    /* The maximum supported data identifier id, resulting in 31 */
    private final long maxDatacenterId = -1L ^ (-1L << datacenterIdBits);

    /** The number of bits in the id that the sequence occupies */
    private final long sequenceBits = 12L;

    /** The machine ID is shifted 12 bits to the left */
    private final long workerIdShift = sequenceBits;

    /** The data identifier id is shifted to the left by 17 bits (12+5) */
    private final long datacenterIdShift = sequenceBits + workerIdBits;

    /** The timestamp is shifted to the left by 22 bits (5+5+12) */
    private final long timestampLeftShift = sequenceBits + workerIdBits + datacenterIdBits;

    /** Generate the mask for the sequence, here 4095. (0b111111111111=0xfff=4095) */
    private final long sequenceMask = -1L ^ (-1L << sequenceBits);

    /** Work machine ID(0~31) */
    private long workerId;

    /** Data center ID(0~31) */
    private long datacenterId;

    /** Intra-millisecond sequence (0~4095) */
    private long sequence = 0L;

    /** Time cutoff of the last generated ID */
    private long lastTimestamp = -1L;

    //==============================Constructors=====================================
    /**
     * Constructor
     * @param workerId Job ID (0~31)
     * @param datacenterId datacenterId (0~31)
     */
    public SnowflakeIdWorker(long workerId, long datacenterId) {
        if (workerId > maxWorkerId || workerId < 0) {
            throw new IllegalArgumentException(String.format("worker Id can't be greater than %d or less than 0", maxWorkerId));
        }
        if (datacenterId > maxDatacenterId || datacenterId < 0) {
            throw new IllegalArgumentException(String.format("datacenter Id can't be greater than %d or less than 0", maxDatacenterId));
        }
        this.workerId = workerId;
        this.datacenterId = datacenterId;
    }

    // ==============================Methods==========================================
    /**
     * Get the next ID (this method is thread-safe)
     * @return SnowflakeId
     */
    public synchronized long nextId() {
        long timestamp = timeGen();

        //If the current time is less than the timestamp of the last ID generation, it means that the system clock is backed off and an exception should be thrown at this time
        if (timestamp < lastTimestamp) {
            throw new RuntimeException(
                    String.format("Clock moved backwards.  Refusing to generate id for %d milliseconds", lastTimestamp - timestamp));
        }

        //If it was generated at the same time, then perform a sequence within milliseconds
        if (lastTimestamp == timestamp) {
            sequence = (sequence + 1) & sequenceMask;
            // sequence overflow in milliseconds
            if (sequence == 0) {
                //Block until the next millisecond, get the new timestamp
                timestamp = tilNextMillis(lastTimestamp);
            }
        }
        //Timestamp change, sequence reset in milliseconds
        else {
            sequence = 0L;
        }

        //Time cutoff of the last generated ID
        lastTimestamp = timestamp;

        //Shift the pieces and OR them together to form the 64-bit ID
        return ((timestamp - twepoch) << timestampLeftShift) //
                | (datacenterId << datacenterIdShift) //
                | (workerId << workerIdShift) //
                | sequence;
    }

    /**
     * Block to the next millisecond until the new timestamp is obtained
     * @param lastTimestamp Time cutoff of the last generated ID
     * @return currentTimestamp
     */
    protected long tilNextMillis(long lastTimestamp) {
        long timestamp = timeGen();
        while (timestamp <= lastTimestamp) {
            timestamp = timeGen();
        }
        return timestamp;
    }

    /**
     * Returns the current time in milliseconds
     * @return current time in milliseconds
     */
    protected long timeGen() {
        return System.currentTimeMillis();
    }

    //==============================Test=============================================
    /** TEST */
    public static void main(String[] args) {
        SnowflakeIdWorker idWorker = new SnowflakeIdWorker(0, 0);
        for (int i = 0; i < 1000; i++) {
            long id = idWorker.nextId();
            System.out.println(Long.toBinaryString(id));
            System.out.println(id);
        }
    }
}


Implementation of TupleDesc and Tuple in MIT 6.830 Lab 1 / Exercise 1

TupleDesc is the schema of a Tuple. For example, for the table student:

id(int)  name(string)  sex(string)
1           xxx         m
2           yyy         f

Then (1, xxx, m) is a Tuple, and the TupleDesc is (id(int), name(string), sex(string)).

 

Estimated time: 2 hours

Difficulty: easy

 

Key and difficult points

1. Make clear the data structure of tuple and tupledesc, and implement them with ArrayList

2. Java programming language details

(a) Quick generation of get and set methods

(b) Automatic generation of equals and hashCode methods with the IDE (IDEA)

(c) Some pitfalls of ArrayList

3. Run through all unit tests.

 

 

Numpy realizes the forward propagation process of CNN

Thanks to numpy's natural handling of tensors, the implementation is very concise and has few parameters.

In this version only a small number of operations are vectorized with numpy and most of the work is done with for loops; the goal is simply to understand the algorithm.

import numpy as np

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image, 
    as illustrated in Figure 1.
    
    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions
    
    Returns:
    X_pad -- padded image of shape (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """

    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant', constant_values=0)


def conv_single_step(a_slice_prev, W, b):
    """
    Apply one filter defined by parameters W on a single slice (a_slice_prev) of the output activation 
    of the previous layer.
    
    Arguments:
    a_slice_prev -- slice of input data of shape (f, f, n_C_prev)
    W -- Weight parameters contained in a window - matrix of shape (f, f, n_C_prev)
    b -- Bias parameters contained in a window - matrix of shape (1, 1, 1)
    
    Returns:
    Z -- a scalar value, result of convolving the sliding window (W, b) on a slice x of the input data
    """
    s = np.multiply(a_slice_prev, W) + b
    Z = np.sum(s)
    return Z

def conv_forward(A_prev, W, b, hparameters):
    """
    Implements the forward propagation for a convolution function
    
    Arguments:
    A_prev -- output activations of the previous layer, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    W -- Weights, numpy array of shape (f, f, n_C_prev, n_C)
    b -- Biases, numpy array of shape (1, 1, 1, n_C)
    hparameters -- python dictionary containing "stride" and "pad"
        
    Returns:
    Z -- conv output, numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache of values needed for the conv_backward() function
    """
    
    # Retrieve dimensions from A_prev's shape  
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve dimensions from W's shape
    (f, f, n_C_prev, n_C) = W.shape

    # Retrieve information from "hparameters"
    stride = hparameters['stride']
    pad = hparameters['pad']
    
    # Compute the dimensions of the CONV output volume using the formula given above.
    n_H = int((n_H_prev - f + 2 * pad) / stride) + 1
    n_W = int((n_W_prev - f + 2 * pad) / stride) + 1
    
    # Initialize the output volume Z with zeros.
    Z = np.zeros((m, n_H, n_W, n_C))
    
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad)
    
    for i in range(m):                                 
        a_prev_pad = A_prev_pad[i]                    
        for h in range(n_H):                           
            for w in range(n_W):                       
                for c in range(n_C):                  
                    # Find the corners of the current "slice"
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    # Use the corners to define the (3D) slice of a_prev_pad
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron.
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[...,c], b[...,c])
                                        
    # Making sure your output shape is correct
    assert(Z.shape == (m, n_H, n_W, n_C))
    
    # Save information in "cache" for the backprop
    cache = (A_prev, W, b, hparameters)
    
    return Z, cache

def relu(Z):
    return np.maximum(0, Z)


def pool_forward(A_prev, hparameters, mode = "max"):
    """
    Implements the forward pass of the pooling layer
    
    Arguments:
    A_prev -- Input data, numpy array of shape (m, n_H_prev, n_W_prev, n_C_prev)
    hparameters -- python dictionary containing "f" and "stride"
    mode -- the pooling mode you would like to use, defined as a string ("max" or "average")
    
    Returns:
    A -- output of the pool layer, a numpy array of shape (m, n_H, n_W, n_C)
    cache -- cache used in the backward pass of the pooling layer, contains the input and hparameters 
    """
    
    # Retrieve dimensions from the input shape
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve hyperparameters from "hparameters"
    f = hparameters["f"]
    stride = hparameters["stride"]
    
    # Define the dimensions of the output
    n_H = int(1 + (n_H_prev - f) / stride)
    n_W = int(1 + (n_W_prev - f) / stride)
    n_C = n_C_prev
    
    # Initialize output matrix A
    A = np.zeros((m, n_H, n_W, n_C))              
    
    for i in range(m):                           
        for h in range(n_H):                     
            for w in range(n_W):                
                for c in range (n_C):            
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = A_prev[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    
                    # Compute the pooling operation on the slice. Use an if statment to differentiate the modes. Use np.max/np.mean.
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_prev_slice)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_prev_slice)
    
    # Store the input and hparameters in "cache" for pool_backward()
    cache = (A_prev, hparameters)
    
    # Making sure your output shape is correct
    assert(A.shape == (m, n_H, n_W, n_C))
    
    return A, cache


def conv_relu_pooling_forward(A_prev, W, b, hparameters_conv, hparameters_pool, mode_pool = "max"):
    # Retrieve dimensions from A_prev's shape  
    (m, n_H_prev, n_W_prev, n_C_prev) = A_prev.shape
    
    # Retrieve dimensions from W's shape
    (f_conv, f_conv, n_C_prev, n_C) = W.shape  # conv filter size (distinct from the pooling window f below)

    # Retrieve information from "hparameters"
    stride_conv = hparameters_conv['stride']
    pad_conv = hparameters_conv['pad']

    stride = hparameters_pool['stride']
    f = hparameters_pool['f']
    
    # Compute the dimensions of the CONV output volume using the formula given above.
    n_H_conv = int((n_H_prev - f_conv + 2 * pad_conv) / stride_conv) + 1
    n_W_conv = int((n_W_prev - f_conv + 2 * pad_conv) / stride_conv) + 1
    
    # Initialize the output volume Z with zeros.
    Z = np.zeros((m, n_H_conv, n_W_conv, n_C))
    
    # Create A_prev_pad by padding A_prev
    A_prev_pad = zero_pad(A_prev, pad_conv)

    n_H = int(1 + (n_H_conv - f) / stride)
    n_W = int(1 + (n_W_conv - f) / stride)
    # n_C stays as the number of conv filters; pooling preserves the channel count

    Z_out = np.zeros((m, n_H, n_W, n_C))  


    for i in range(m):                                 
        a_prev_pad = A_prev_pad[i]

        for h in range(n_H_conv):                           
            for w in range(n_W_conv):                       
                for c in range(n_C):                  
                    # Find the corners of the current "slice"
                    vert_start = h * stride_conv
                    vert_end = vert_start + f_conv
                    horiz_start = w * stride_conv
                    horiz_end = horiz_start + f_conv
                    # Use the corners to define the (3D) slice of a_prev_pad
                    a_slice_prev = a_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    # Convolve the (3D) slice with the correct filter W and bias b, to get back one output neuron.
                    Z[i, h, w, c] = conv_single_step(a_slice_prev, W[...,c], b[...,c])
                    Z[i, h, w, c] = max(0.0, Z[i, h, w, c])

        for h in range(n_H):                     
            for w in range(n_W):                
                for c in range (n_C):            
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    
                    # Use the corners to define the current slice on the ith training example of A_prev, channel c. (≈1 line)
                    a_prev_slice = Z[i, vert_start:vert_end, horiz_start:horiz_end, c]
                    
                    # Compute the pooling operation on the slice. Use an if statment to differentiate the modes. Use np.max/np.mean.
                    if mode_pool == "max":
                        Z_out[i, h, w, c] = np.max(a_prev_slice)
                    elif mode_pool == "average":
                        Z_out[i, h, w, c] = np.mean(a_prev_slice)
                
    # Making sure your output shape is correct
    assert(Z_out.shape == (m, n_H, n_W, n_C))
    
    # Save information in "cache" for the backprop
    # cache = (A_prev, W, b, hparameters)
    
    return Z_out
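
A quick usage check of the functions above on random data (the shapes below are illustrative, not from the original post):

np.random.seed(0)
A_prev = np.random.randn(2, 8, 8, 3)     # batch of two 8x8 inputs with 3 channels
W = np.random.randn(3, 3, 3, 4)          # four 3x3 filters
b = np.random.randn(1, 1, 1, 4)

Z, _ = conv_forward(A_prev, W, b, {"stride": 1, "pad": 1})
A = relu(Z)
P, _ = pool_forward(A, {"f": 2, "stride": 2}, mode="max")

print(Z.shape)  # (2, 8, 8, 4) with pad=1, stride=1
print(P.shape)  # (2, 4, 4, 4) after 2x2 max pooling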