Lock wait timeout exceeded — transaction and index

Problem description
There are two errors reported in the project, one is the task script execution time is too long, the other is the lock wait time is too long, as follows

Lock wait timeout exceeded; try restarting transaction


This task script is used to synchronize data between multiple tables, and there are multiple scripts and other code in the project that may manipulate table data and generate locks. There are two tables with millions of data levels and a large number of items.
If you are not familiar with deadlocks and lock waits, you can quickly jump to the links below to understand what a lock wait is and why it might be causing it. Deadlocks and lock waits
Problem analysis
First of all, let’s put out the pseudocode for this part.

void method(){
	The transaction begins;
	while(true) {
		List<ObjectA> list1 = Get 100 data from table A according to the criteria;
		if(list1 is empty){ // if it is empty, it means the data has been processed.
			return;
		}
		
		// manipulate ObjectA data sequentially
		for(ObjectA objectA: list1) {
			Delete table A data;
			Merge with table B data and update the corresponding data;
			Synchronize the table C data;
		}
	}
	That's it;

Translated with www.DeepL.com/Translator (free version)
}

The main reason is that there is a problem with the location of the transaction addition, and the data table has no corresponding index, resulting in the update time is too long.
Tasks take too long to execute
This is mainly because there is a lack of index when updating the data. When the data volume is relatively large, it takes a long time to find the corresponding data. This can be analyzed through the database SQL statement execution time, because our unit has a big data platform, through the analysis of the big data platform, find the longest execution time of several SQL analysis. If there is no relevant platform, it is suggested to add logging before and after the query and update operation of some tables with large data volume, so as to facilitate the analysis of SQL execution.
Lock wait times are too long
This is the result of both transactions and indexes.
During the execution of a transaction, all tables involved in updating/adding in the transaction are locked and wait when other processes request data from this part of the table. The missing index leads to long loops in the code and transactions are added to the entire loop, so the lock wait time will time out, causing the problem at the beginning of the article. In addition to adding indexes at the database level, you need to adjust the content of the code, remove transactions, or optimize transaction locations.
Problem solving
1. Database: We built the index in Table B according to the query conditions. The average execution time was 10min before, but it was shortened to 1min after addition.
2. Code: Due to the characteristics of our business, this code, even if there is an exception in the update, as long as the data in Table A is still retained, it will not have an impact on the overall business. The next time the script executes, we can add this data back, so we remove the transaction, restructure the code, and add an exception handler. The resulting code looks like this.

void method(){
	while(true) {
		List<ObjectA> list1 = Get 100 data from table A according to the conditions;
		if(list1 is empty){ // if it is empty, the data has been processed, the end
			return;
		}
		
		// manipulate ObjectA data sequentially
		for(ObjectA objectA: list1) {
			try{
				Merge with table B data and update the corresponding data;
				Synchronize table C data;
				Delete the data from table A;
			} catch(Exception e) {
				Logging exceptions;
			} finally {
				Logging execution time;

Translated with www.DeepL.com/Translator (free version)
			}
		}
	}
}

Of course, if the business scenario requires a transaction, you can also pull out the code inside the loop and do the transaction separately.

Read More: