Tag Archives: Search – recommendation

Fixing special characters being automatically converted to Unicode escapes when data is read from MySQL and wrapped in JSON

    @Test
    public void unescapeUnicodeInJson() throws Exception {
        ArrayList<JSONObject> list = new ArrayList<>();
        // The inner quotes are typographic (curly) quotes; straight quotes here would end the string literal
        String s = "Appliances jerry-built, poor quality clothing ...... still believe that “e-commerce custom products” more affordable";
        JSONObject json = new JSONObject();
        json.put("title", s);
        JSONObject json1 = new JSONObject();
        json1.put("title", s);
        list.add(json);
        list.add(json1);
        System.out.println("old:" + list.toString());
        // unescapeJava turns Unicode escape sequences in the serialized text back into the characters themselves
        System.out.println("new:" + StringEscapeUtils.unescapeJava(list.toString()));
    }

Output:
before transformation: [{"title": "home appliances cut corners and poor clothing quality"}]
after transformation: [{"title": "home appliances cut corners and poor clothing quality"}]
after transformation: [{"title": "home appliances cut corners, poor quality of clothing ...... also believe that “e-commerce customized products” are more affordable"}, {"title": "home appliances cut corners, poor quality of clothing ...... also believe that “e-commerce customized products” are more affordable"}]
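For readers without Commons Lang on the classpath, the core of the conversion can be sketched with the standard library alone. This is only an illustration, not what StringEscapeUtils does internally: the class name UnicodeUnescape is hypothetical, and the sketch handles only backslash-u sequences with four hex digits. Unlike unescapeJava it leaves other escapes such as \" untouched, and it does not special-case an escaped backslash in front of a u.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts Unicode escape sequences back into characters.
class UnicodeUnescape {
    // Matches a backslash, one or more 'u' characters, then exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u+([0-9a-fA-F]{4})");

    static String unescape(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text before the escape, then the decoded character.
            out.append(text, last, m.start());
            out.append((char) Integer.parseInt(m.group(1), 16));
            last = m.end();
        }
        out.append(text, last, text.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // Serialized JSON text in which the curly quotes were escaped.
        String escaped = "[{\"title\":\"\\u201ce-commerce custom products\\u201d\"}]";
        System.out.println(unescape(escaped));
    }
}
```

Because the JSON string delimiters stay escaped-aware (only the Unicode sequences change), the result of this sketch can still be handed to a JSON parser.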

Web crawler: fetching page data while spoofing request headers to pose as a browser, so that repeated visits do not get the IP blocked

User-Agent: an identification string that provides information such as browser type, operating system and version, CPU type, browser rendering engine, browser language, and browser plug-ins. The UA string is sent to the server every time the browser makes an HTTP request.

Referer: the HTTP Referer is part of the request header. When a browser sends a request to a web server, it usually includes a Referer telling the server which page the request was linked from, so that the server can use this information in its processing.

	public static String getHtmls(String url) throws IOException {
		RequestConfig globalConfig = RequestConfig.custom().setCookieSpec(CookieSpecs.IGNORE_COOKIES).build();
		String html = "";
		CloseableHttpClient httpClient = HttpClients.custom().setDefaultRequestConfig(globalConfig).build();
		HttpGet httpget = new HttpGet(url);
		// UA layout: browser identifier (OS identifier; encryption level; browser language), rendering engine identifier, version information
		httpget.setHeader("User-Agent","Mozilla/5.0 (Linux; U; Android 2.3.6; zh-cn; GT-S5660 Build/GINGERBREAD) AppleWebKit/533.1 (KHTML, like Gecko) Version/4.0 Mobile Safari/533.1 MicroMessenger/4.5.255");
		// Spoof the Referer header
		httpget.setHeader("Referer", "https://mp.weixin.qq.com");

		try {
			HttpResponse response = httpClient.execute(httpget);
			int statusCode = response.getStatusLine().getStatusCode();
			if (statusCode == HttpStatus.SC_OK) {
				HttpEntity entity = response.getEntity();
				if (entity != null) {
					html = EntityUtils.toString(entity); // Get the HTML source
				}
			}
		} catch (Exception e) {
			System.out.println("request " + url + " error!");
			e.printStackTrace();
		} finally {
			// Release the client and its connections
			httpClient.close();
		}
		return html;
	}
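Since the goal is to visit many times without every request looking identical, one possible extension is to rotate the User-Agent between requests. The sketch below is an assumption layered on top of the method above, not part of the original code: the class name UserAgentPool is hypothetical, and the strings passed to it are whatever UA values you choose to collect.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper: hands out a randomly chosen User-Agent string per request.
class UserAgentPool {
    private final List<String> agents;

    UserAgentPool(List<String> agents) {
        if (agents == null || agents.isEmpty()) {
            throw new IllegalArgumentException("at least one User-Agent is required");
        }
        // Defensive copy so later changes to the caller's list have no effect.
        this.agents = new ArrayList<>(agents);
    }

    // Returns one of the configured User-Agent strings, picked uniformly at random.
    String next() {
        return agents.get(ThreadLocalRandom.current().nextInt(agents.size()));
    }
}
```

With such a pool in place, the fixed header in getHtmls could become httpget.setHeader("User-Agent", pool.next()), where pool is a shared UserAgentPool instance.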

java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z

When developing an embedded Jetty 9 system, it starts normally, but browsing a page reports the following error:

java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
Reason: Jetty 9 relies on Servlet API 3.x. If another third-party open source library in the project implicitly depends on Servlet API 2.x, the older HttpServletRequest (which has no isAsyncStarted() method, since asynchronous support arrived in Servlet 3.0) can be loaded instead, and this error is reported.
Reprinted: https://www.cnblogs.com/yjmyzz/p/5090990.html
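To confirm which jar actually supplies a class at run time, a small diagnostic like the following can help. It is a sketch using only the standard library; the class name ClassOrigin and the placeholder text for JDK classes are my own choices, not something from the original post.

```java
import java.security.CodeSource;

// Hypothetical diagnostic: reports where a class was loaded from.
class ClassOrigin {
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        // Classes from the JDK runtime itself usually have no code source.
        if (source == null || source.getLocation() == null) {
            return "(bootstrap / JDK runtime)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) throws Exception {
        // Pass a fully qualified class name; defaults to a JDK class.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " loaded from " + locate(Class.forName(name)));
    }
}
```

Run inside the affected application's classpath with javax.servlet.http.HttpServletRequest as the argument; if the printed location is a Servlet 2.x jar, that dependency is the one to exclude.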

Parsing JSON whose keys lack double quotation marks

Parse this JSON data:

{"manifest":{ Version:"3.0"}}

If you look carefully, this string is not valid JSON: the key Version lacks double quotation marks. It should be:

{"manifest":{ "Version": "3.0"}}

Reprinted: https://www.cnblogs.com/afluy/p/4023838.html

If you parse it with

JSONObject mainfestObject = object.getJSONObject("manifest");

an error is reported, but if you instead use

String mainfestStr = object.optString("manifest", "");

JSONObject mainfestObject = new JSONObject(mainfestStr);

parsing succeeds!
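If the text has to go through a strict parser, another option is to repair the bare keys first. The following is a rough sketch under stated assumptions, not the approach from the post: the class name BareKeyQuoter is hypothetical, and the regular expression only looks for identifier-like keys directly after { or a comma, so it can misfire on string values that happen to contain that pattern.

```java
import java.util.regex.Pattern;

// Hypothetical helper: wraps unquoted object keys in double quotation marks.
class BareKeyQuoter {
    // Group 1: '{' or ',' plus whitespace. Group 2: an identifier-like key before ':'.
    private static final Pattern BARE_KEY =
            Pattern.compile("([{,]\\s*)([A-Za-z_][A-Za-z0-9_]*)\\s*:");

    static String quoteBareKeys(String json) {
        return BARE_KEY.matcher(json).replaceAll("$1\"$2\":");
    }

    public static void main(String[] args) {
        System.out.println(quoteBareKeys("{\"manifest\":{ Version:\"3.0\"}}"));
    }
}
```

Keys that are already quoted are left alone, because the character after { or the comma is then a quotation mark rather than a letter.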

Solution for text that cannot be split because of a special separator

Today, while processing text data, we came across text that regular expressions matching spaces and tabs could not split. After asking colleagues, I found that "\\uf8f5" can be used to match the separator.

Text to process:

A	abbr.安 
A-10IInone.美空军主力近距离空中支援攻击机,无愧为“坦克杀手”。
A-12none.夭折的美海军第一种隐形舰载攻击机。
A-4  none.54年服役的单座轻型舰载攻击机,现仍被多国使用。
A-6none.双座重型全天候舰载攻击机,主要用于低空突防,可进行核打击。
A-7IInone.离开沙场的单座亚音速攻击机,曾是美海空军主力。
A-OKnone.极好, 妙极, 完美的
A-Znone.无所不包的
A-boilern.原子反应器加热用的锅炉
A-bombn.原子弹
A-certificatenone.儿童不宜n.A级
A-controln.原子能管制
A-energyn.原子能
A-framen.金字塔形建筑物
A-lovelnone.英语学校里某一课程结束时举行的高深考试, 高深级考试及格
A-oneadj.第一等的, 第一流的
A-roadnone.A级公路, 主车道
A-siden.A面
A-testn.原子爆炸试验
A-weaponn.原子武器

Separation processing:

	public static void main(String[] args) throws Exception {
		String dic = util.Directory.GetAppPath("steamData") + "dic.txt.bak";
		BufferedReader br = util.MyFileTool.GetBufferReader(dic);
		String line;
		while ((line = br.readLine()) != null) {
			// Split on U+F8F5, the private-use character that separates the fields
			String[] words = line.split("\\uf8f5");
			System.out.println("size: " + words.length);
			System.out.println(words[0]);
		}
		br.close();
	}
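The same idea as a self-contained sketch that does not depend on the util helpers. The class name SeparatorSplit is hypothetical, and the sample line is built by hand on the assumption that each record has the form word, part of speech, definition with U+F8F5 between the fields.

```java
// Hypothetical demo: splitting on the private-use character U+F8F5.
class SeparatorSplit {
    static String[] fields(String line) {
        // The regex engine reads the backslash-u sequence as the code point U+F8F5.
        return line.split("\\uf8f5");
    }

    public static void main(String[] args) {
        // Assumed record layout: word, part of speech, definition.
        String line = "A-bomb" + '\uf8f5' + "n." + '\uf8f5' + "原子弹";
        System.out.println("whitespace split: " + line.split("\\s+").length);
        System.out.println("U+F8F5 split: " + fields(line).length);
        System.out.println(fields(line)[0]);
    }
}
```

The whitespace pattern finds nothing to split on because U+F8F5 is not a whitespace character, which matches the behaviour described above.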

A special JSON case: arrays in square brackets

	@Test
	public void demo93() throws Exception {
		String str = "[\"a\", \"b\", \"c\"]";
		//Generate json arrays
		JSONArray createArray = new JSONArray();
		createArray.put("a");
		createArray.put("b");
		createArray.put("c");
		System.out.println("createJSONArray: " + createArray);
		//Parsing json arrays
		JSONArray parseArray = new JSONArray(str);
		System.out.println("parseJSONArray: " + parseArray);
		for(int i = 0; i < parseArray.length(); i++) {
			System.out.print(parseArray.get(i) + " ");
		}
	}

Output:

createJSONArray: ["a","b","c"]
parseJSONArray: ["a","b","c"]
a b c

Crawler: crawling a news website that requires cookies to get the page content

If the request is made with a plain HTTP client, the following error is reported:

Error: SSL handshake error. When the client connects to the server, it must complete a handshake over the SSL protocol.

(2) Fix: subclass DefaultHttpClient so that it supports the SSL protocol

package httpsParse;
import java.security.cert.CertificateException;  
import java.security.cert.X509Certificate;  
import javax.net.ssl.SSLContext;  
import javax.net.ssl.TrustManager;  
import javax.net.ssl.X509TrustManager;  
import org.apache.http.conn.ClientConnectionManager;  
import org.apache.http.conn.scheme.Scheme;  
import org.apache.http.conn.scheme.SchemeRegistry;  
import org.apache.http.conn.ssl.SSLSocketFactory;  
import org.apache.http.impl.client.DefaultHttpClient;  
//HttpClient used to make Https requests  
public class SSLClient extends DefaultHttpClient{  
    public SSLClient() throws Exception{  
        super();
        // Pick the protocol version according to what the target server supports
        SSLContext ctx = SSLContext.getInstance("TLSv1.2");
        // Caution: this trust manager accepts every certificate without checking it
        X509TrustManager tm = new X509TrustManager() {
                @Override  
                public void checkClientTrusted(X509Certificate[] chain,  
                        String authType) throws CertificateException {  
                }  
                @Override  
                public void checkServerTrusted(X509Certificate[] chain,  
                        String authType) throws CertificateException {  
                }  
                @Override  
                public X509Certificate[] getAcceptedIssuers() {  
                    return new X509Certificate[0]; // the interface contract expects a non-null array
                }  
        };  
        ctx.init(null, new TrustManager[]{tm}, null);  
        SSLSocketFactory ssf = new SSLSocketFactory(ctx,SSLSocketFactory.ALLOW_ALL_HOSTNAME_VERIFIER);  
        ClientConnectionManager ccm = this.getConnectionManager();  
        SchemeRegistry sr = ccm.getSchemeRegistry();  
        sr.register(new Scheme("https", 443, ssf));  
    }  
}

(Pitfall) Then use the HTTP client to request the source code of the web page:


    public static void main(String[] args) throws Exception {
    	HttpClientUtil httpClientUtil = new HttpClientUtil();
    	String url = "https://www.yidaiyilu.gov.cn/zchj.htm";
		String html = httpClientUtil.doGet(url);
		System.out.println(html);
	}

In the end, the response is a piece of JavaScript rather than the page itself:

<script>var x="@catch@@@d@@toString@@String@@36@pathname@if@@toLowerCase@var@855@captcha@@Array@@@1@@for@1500@@document@@@@chars@attachEvent@addEventListener@substr@Expires@@false@f@0@fromCharCode@innerHTML@@@@8@@@@@@@split@parseInt@createElement@g@new@16@search@May@@https@@reverse@@RegExp@@while@@@charCodeAt@rOm9XFMtA3QKV7nYsPGT4lifyWwkq5vcjH2IdxUoCbhERLaz81DNB6@@10@JgSe0upZ@else@match@0xFF@@@07@length@@e@eval@@@19@@@Path@a@div@setTimeout@cookie@3@5@@0xEDB88320@@GMT@challenge@@@Tue@@@window@@href@return@try@@@@@location@onreadystatechange@function@1557242170@DOMContentLoaded@@firstChild@replace@__jsl_clearance@charAt@join@".replace(/@*$/,"").split("@"),y="g 3b=3q(){31('3o.3h=3o.c+3o.1s.40(/[\\?|&]i-39/,\\'\\')',q);s.32='41=3r.h|19|'+(3q(){g 1i=[3q(3b){3i 2n('9.1a('+3b+')')},(3q(){g 3b=s.1o('30');3b.1b='<2u 3h=\\'/\\'>3l</2u>';3b=3b.3u.3h;g 1i=3b.2f(/20?:\\/\\//)[19];3b=3b.14(1i.2k).f();3i 3q(1i){p(g 3l=19;3l<1i.2k;3l++){1i[3l]=3b.42(1i[3l])};3i 1i.43('')}})()],3l=[[([(-~[]<<-~[])]*(((+!+{})+[(-~[]<<-~[])]>>(-~[]<<-~[])))+[])+[-~~~!{}+[~~[]]-(-~~~!{})],(-~{}+[]+[[]][19])+[~~'']+[-~(+!+{})],[34]+(-~[-~{}-~{}]+[[]][19]),[-~{}-~[-~{}-~{}]]+(((-~[]<<-~[])<<(-~[]<<-~[]))+[[]][19]),(-~{}+[]+[[]][19])+(-~{}+[]+[[]][19])+[-~(+!+{})],(-~{}+[]+[[]][19])+(-~{}+[]+[[]][19])+[-~{}-~[-~{}-~{}]],[33-~(+!+{})-~(+!+{})]+(-~[-~{}-~{}]+[[]][19]),[34]+[-~(+!+{})],[-~~~!{}+[~~[]]-(-~~~!{})]+(((-~[]<<-~[])<<(-~[]<<-~[]))+[[]][19]),[33-~(+!+{})-~(+!+{})]+(-~[-~{}-~{}]+[[]][19]),(-~{}+[]+[[]][19])+[~~'']+[33-~(+!+{})-~(+!+{})]],[(-~{}+[]+[[]][19])+(((-~[]<<-~[])<<(-~[]<<-~[]))+[[]][19]),[33-~(+!+{})-~(+!+{})]],[[34]+[-~{}-~[-~{}-~{}]],(-~[-~{}-~{}]+[[]][19])+[33-~(+!+{})-~(+!+{})],[34]+[~~''],([(-~[]<<-~[])]*(((+!+{})+[(-~[]<<-~[])]>>(-~[]<<-~[])))+[])+([(-~[]<<-~[])]*(((+!+{})+[(-~[]<<-~[])]>>(-~[]<<-~[])))+[]),([(-~[]<<-~[])]*(((+!+{})+[(-~[]<<-~[])]>>(-~[]<<-~[])))+[])+(((-~[]<<-~[])<<(-~[]<<-~[]))+[[]][19]),(-~{}+[]+[[]][19])+(-~{}+[]+[[]][19])+[34]],[(-~{}+[]+[[]][19])+[-~(+!+{})],([(
-~[]<<-~[])]*(((+!+{})+[(-~[]<<-~[])]>>(-~[]<<-~[])))+[])],[[34]+(-~{}+[]+[[]][19]),(((-~[]<<-~[])<<(-~[]<<-~[]))+[[]][19])+[~~''],[34]+[34],[34]+([(-~[]<<-~[])]*(((+!+{})+[(-~[]<<-~[])]>>(-~[]<<-~[])))+[]),[-~{}-~[-~{}-~{}]]+(((-~[]<<-~[])<<(-~[]<<-~[]))+[[]][19])],[(-~{}+[]+[[]][19])+(((-~[]<<-~[])<<(-~[]<<-~[]))+[[]][19]),(-~{}+[]+[[]][19])+[-~(+!+{})]],[([(-~[]<<-~[])]*(((+!+{})+[(-~[]<<-~[])]>>(-~[]<<-~[])))+[])+[-~~~!{}+[~~[]]-(-~~~!{})],(-~[-~{}-~{}]+[[]][19])+[33-~(+!+{})-~(+!+{})],[34]+(-~{}+[]+[[]][19]),([(-~[]<<-~[])]*(((+!+{})+[(-~[]<<-~[])]>>(-~[]<<-~[])))+[])+(((-~[]<<-~[])<<(-~[]<<-~[]))+[[]][19])]];p(g 3b=19;3b<3l.2k;3b++){3l[3b]=1i.22()[(-~{}+[]+[[]][19])](3l[3b])};3i 3l.43('')})()+';15=3c, 2j-1t-2q 1r:1r:2c 38;2t=/;'};d((3q(){3j{3i !!3f.13;}2(2m){3i 17;}})()){s.13('3s',3b,17)}2e{s.12('3p',3b)}",f=function(x,y){var a=0,b=0,c=0;x=x.split("");y=y||99;while((a=x.shift())&&(b=a.charCodeAt(0)-77.5))c=(Math.abs(b)<13?(b+48.5):parseInt(a,36))+y*c;return c},z=f(y.match(/\w/g).sort(function(x,y){return f(x)-f(y)}).pop());while(z++)try{eval(y.replace(/\b\w+\b/g, function(y){return x[f(y,z)-1]||("_"+y)}));break}catch(_){}</script>

At first I suspected that cookies were the cause, so I copied the cookie from the browser into the request, and the request finally returned the page. However, the cookie has a validity period and becomes invalid after a while, so this approach does not hold up. Further analysis showed that when a browser visits the site, it first loads the JS, which generates the cookie, and then requests the page again with the generated cookie in the request header. That is why the request above returned JS code. Because the cookie is generated dynamically by JS, we need Java to simulate a browser. The final implementation uses HtmlUnit:

package cn.server;


import org.openqa.selenium.htmlunit.HtmlUnitDriver;


public class GFDynamicWeb {
	public static HtmlUnitDriver driver = new HtmlUnitDriver();
	public static boolean isGetCookie = false;
//	public static boolean isRepeatExec = false;
	public static String GetContent(String url) {
		if(!isGetCookie) {
			driver.setJavascriptEnabled(true);
			// First load: run the JS so that it sets the cookie
			driver.get(url);
		}
		driver.setJavascriptEnabled(false);
		// Second load: fetch the page source with the cookie in place
		driver.get(url);
        String pageSource = driver.getPageSource();
        isGetCookie = true;
		return pageSource;
	}
	public static void renewIsGetCookie() {
		isGetCookie = false;
	}
	public static void closeDriver() {
		driver.close();
	}
    public static void main(String[] args) {
    	long s = System.currentTimeMillis();
    	for(int i = 0; i < 100; i ++) {
        	String url = "https://www.yidaiyilu.gov.cn/";
    		String content = GetContent(url);
    		System.out.println(content);
    	}
    	long e = System.currentTimeMillis();
    	System.out.println((e - s)/1000 + " seconds");
    	renewIsGetCookie();
    	closeDriver();
    }
}
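Because the cookie expires, a crawler needs some way to notice that it has been handed the JS challenge again and that renewIsGetCookie() should be called. One possible check is sketched below. The class name ChallengeDetector is hypothetical, and the heuristic (status 521, or a body that starts with a script tag and mentions __jsl_clearance) is inferred from the response shown earlier, so it is an assumption about this particular site rather than a general rule.

```java
// Hypothetical helper: guesses whether a response is the JS cookie challenge.
class ChallengeDetector {
    static boolean isJsChallenge(int statusCode, String body) {
        if (statusCode == 521) {
            return true;
        }
        if (body == null) {
            return false;
        }
        String trimmed = body.trim();
        // The challenge page is a bare script that sets the __jsl_clearance cookie.
        return trimmed.startsWith("<script>") && trimmed.contains("__jsl_clearance");
    }
}
```

HtmlUnitDriver does not expose the status code directly, so with the class above one would likely pass 200 and rely on the body check: when isJsChallenge(200, pageSource) is true, reset the flag and load the page again.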

Websites used along the way:

Online interface testing

What the 521 status code does

Solving the 521 error

java.lang.IllegalArgumentException: resolving the URLDecoder exception

Exception:

Exception in thread "main" java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "u9"
    at java.net.URLDecoder.decode(URLDecoder.java:194)
    at com.hbzx.controller.PayResultController.main(PayResultController.java:253)

Reason:

When Java calls URLDecoder.decode(str, "UTF-8"), the exception above is thrown mainly because % is a special character in a URL: it must be followed by two hexadecimal digits, so a literal % needs to be escaped.

Solution: replace every % in the string that is not followed by two hexadecimal digits with %25:

    url = url.replaceAll("%(?![0-9a-fA-F]{2})", "%25");
    String urlStr = URLDecoder.decode(url, "UTF-8");

Original article: https://blog.csdn.net/afgasdg/article/details/40304817
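The two lines of the fix can be wrapped into a small helper and tried on a few inputs. This is a minimal sketch using only the standard library; the class name SafeUrlDecoder is hypothetical. Note that URLDecoder still turns + into a space, which the replacement does not change.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical wrapper around the fix: escape stray '%' signs, then decode.
class SafeUrlDecoder {
    static String decode(String raw) {
        // A '%' not followed by two hex digits is treated as a literal percent sign.
        String patched = raw.replaceAll("%(?![0-9a-fA-F]{2})", "%25");
        try {
            return URLDecoder.decode(patched, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("a%u9b"));  // input that made plain decode() throw
        System.out.println(decode("a%20b"));  // valid escapes are still decoded
    }
}
```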