Background
Recently, when running Spark SQL jobs, I frequently saw warnings like the following in the logs, and the job failed partway through execution:
WARN TaskMemoryManager: Failed to allocate a page (104876 bytes), try again.
WARN TaskMemoryManager: Failed to allocate a page (104876 bytes), try again.
…
Reason
The tasks do not have enough memory, so Spark has to keep reclaiming memory and retrying the allocation.
Solution:
1. Optimize the SQL script. (Preferred; this is how I solved it at the time.)
2. Increase driver memory, e.g. --driver-memory 6g
A simplified version of my SQL at the time:
select name
from stu
where id in (select id from in_stu);
The stu table has about 8 million rows (800W) and in_stu has about 12 million rows (1.2kW); W here means 10,000.
Optimized as:
select name
from stu
where id in (select distinct id from in_stu);
After optimization, the number of ids coming from in_stu drops to about 110,000 (11W) because the duplicates are removed, and the problem is solved.
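To see why adding distinct is safe, here is a minimal sketch that runs both forms of the query on a toy dataset. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL, and the rows are made up for illustration, so it only shows the logic (same result, fewer ids in the subquery), not Spark's memory behavior.

```python
import sqlite3

# Assumption: SQLite stands in for Spark SQL; the rows below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stu (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE in_stu (id INTEGER)")
conn.executemany(
    "INSERT INTO stu VALUES (?, ?)",
    [(1, "ann"), (2, "bob"), (3, "cat"), (4, "dan")],
)
# in_stu repeats the same ids many times, as in the original table.
conn.executemany(
    "INSERT INTO in_stu VALUES (?)",
    [(1,), (1,), (1,), (3,), (3,), (5,)],
)

# Original query: the subquery returns every row of in_stu, duplicates included.
original = conn.execute(
    "SELECT name FROM stu WHERE id IN (SELECT id FROM in_stu) ORDER BY name"
).fetchall()

# Optimized query: the subquery returns each id once.
optimized = conn.execute(
    "SELECT name FROM stu WHERE id IN (SELECT DISTINCT id FROM in_stu) ORDER BY name"
).fetchall()

# Compare how many ids the subquery yields with and without deduplication.
total_ids, distinct_ids = conn.execute(
    "SELECT COUNT(id), COUNT(DISTINCT id) FROM in_stu"
).fetchone()

print(original == optimized)
print(total_ids, distinct_ids)
```

Running the same COUNT(id) versus COUNT(DISTINCT id) check on the real in_stu table is a quick way to tell in advance whether deduplicating the subquery is likely to help.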