Crawling products is getting harder because the Internet is becoming ever more fragmented. In the past, the Internet dissolved physical boundaries and let people everywhere live in an increasingly flat world. Since the start of the mobile Internet era in 2010, however, the Chinese Internet has become more and more fragmented: the ocean of the Internet has broken up into islands, and each of these silos builds its own barriers to keep competitors out.
The Internet used to be built on websites, which could link to one another and be indexed by search engines. Today it is built on apps, each with its own built-in search that refuses indexing by outside search engines. As early as 2008, Taobao blocked Baidu, China's largest search engine, from indexing its pages, and in 2015 Taobao blocked WeChat, China's largest messaging app. WeChat quickly retaliated by blocking Taobao links from being shared in WeChat. That is why, when someone shares a Taobao product with you, you see an odd string like this:
1.0 xixi:/!SfZcXOaZ4QG嘻 口罩收纳盒
Strings like this look meaningless. For an e-commerce platform, product data is critical, so most companies build anti-crawling barriers to protect it. For our service, this makes it increasingly difficult to crawl the items our customers want to buy.
How to share Taobao app products
- When you browse products in the Taobao mobile app, there are two ways to get a product link: the ellipsis menu at the top right, or the share button at the bottom right. Suppose you tap the share button.
- A panel pops up at the bottom offering many ways to share the product: copy the product link, save the product as an image with a QR code, share it to WeChat or QQ, and so on. Suppose you tap the first button, Copy Link.
- A new window pops up telling you that the code has been copied. The wording alone hints at the difference: in the second screenshot you tapped the Copy Link button, yet the third screenshot reports that a code was copied successfully. Notice that what you get is a code, not a link:
9.0哈Br6QXMykNtT啊 魔道祖师动画周边魏无羡蓝忘机手办人偶
Why a code rather than a link? Because most of Taobao's users are Chinese, and almost every Chinese person uses WeChat. Since Taobao and WeChat have blocked each other since 2015, as mentioned above, Taobao's clever engineers came up with a workaround: generate a string with no apparent pattern, so that WeChat cannot recognize it as a share from Taobao.
When a friend receives this string on WeChat, they copy it and open the Taobao app. Because the Taobao app reads the phone's clipboard every time it opens, it picks up the string, recognizes it, and jumps to the corresponding product page. But when you submit the same string to our system, we cannot recognize it, because it is not a real link.
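The app's side of this exchange can be sketched in a few lines. Taobao does not document its share-code format, so the pattern below is an assumption inferred only from the sample codes on this page: each contains an 11-character alphanumeric token immediately followed by a Chinese character. The function name `find_share_token` is hypothetical. Spotting the token is only half the job; turning it into a product page requires Taobao's own servers, which is why a bare code is of no use to an outside system like ours.

```python
import re
from typing import Optional

# Hypothetical pattern, inferred only from the sample codes in this article:
# an 11-character alphanumeric token immediately followed by a non-ASCII
# (in the samples, Chinese) character. Taobao's real format is not public.
TOKEN_RE = re.compile(r"[A-Za-z0-9]{11}(?=[^\x00-\x7F])")


def find_share_token(clipboard_text: str) -> Optional[str]:
    """Return the first candidate share token in the clipboard text, or None."""
    match = TOKEN_RE.search(clipboard_text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(find_share_token("9.0哈Br6QXMykNtT啊 魔道祖师动画周边魏无羡蓝忘机手办人偶"))
```

Applied to the code from the Copy Link example, it returns `Br6QXMykNtT`; text without such a token returns `None`.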
Note that the text each person gets from the share button can differ, and sometimes you will see something like this:
4.0哈zdg8XMyPq3p信 https://m.tb.cn/h.fbq2A38?sm=a0d2db 魔道祖师动画周边魏无羡蓝忘机手办人偶
In this case the string includes a link to the product, so we can crawl and analyze it like any normal link.
To summarize: if your share code does not contain a link, we cannot crawl it. In that case, copy the code, open the Taobao app so that it takes you to the product page, and tap the share button again to try to get a share code that includes a link.
| Example share code | Crawl supported |
| --- | --- |
| 9.0哈Br6QXMykNtT啊 魔道祖师动画周边魏无羡蓝忘机手办人偶 | No |
| 4.0哈zdg8XMyPq3p信 https://m.tb.cn/h.fbq2A38?sm=a0d2db 魔道祖师动画周边魏无羡蓝忘机手办人偶 | Yes |
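The rule in the table is easy to check mechanically: a share text can be crawled only if it contains a link. The sketch below assumes, based on the sample above, that the link is an ordinary http(s) URL ending at the first whitespace character; `extract_share_link` and `is_crawlable` are hypothetical names, not part of any 42agent API.

```python
import re
from typing import Optional

# Assumption: any http(s) URL in the share text is the product link, as with
# the m.tb.cn short link in the sample; the URL ends at the first whitespace.
URL_RE = re.compile(r"https?://\S+")


def extract_share_link(share_text: str) -> Optional[str]:
    """Return the first link found in a Taobao share text, or None."""
    match = URL_RE.search(share_text)
    return match.group(0) if match else None


def is_crawlable(share_text: str) -> bool:
    """Apply the table's rule: only share texts that carry a link can be crawled."""
    return extract_share_link(share_text) is not None


if __name__ == "__main__":
    with_link = "4.0哈zdg8XMyPq3p信 https://m.tb.cn/h.fbq2A38?sm=a0d2db 魔道祖师动画周边魏无羡蓝忘机手办人偶"
    without_link = "9.0哈Br6QXMykNtT啊 魔道祖师动画周边魏无羡蓝忘机手办人偶"
    print(extract_share_link(with_link))  # https://m.tb.cn/h.fbq2A38?sm=a0d2db
    print(is_crawlable(without_link))     # False
```

If a link were ever followed directly by a Chinese character with no space, this simple pattern would swallow that character too; the samples on this page always have a space after the link.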
How we solve the crawl failure problem
We are a very small startup, and our technical capabilities are no match for a tech giant like Taobao. Although we are actively working against anti-crawling measures, crawl failures still cause problems for many customers.
If you submit an order and find that the pictures, prices, and other product details could not be fetched, don't worry: our staff will fill in the product data manually.
If you want to fix the problem right away, you can upload the content of the product page through the HTML button we provide, so that we can get the product data immediately.
Step 1: Get product page content
When the crawl fails, we need your help. You can help us get product information by uploading product page content.
Get product page content in Chrome
Right-click anywhere on the product page and choose View page source (or press Ctrl+U on Windows and Linux, Option+Command+U on macOS). This shows the page's HTML code. Copy all of it, then go back to 42agent and paste it.
Alternatively, type view-source: in front of the product URL in the browser's address bar to open the page's HTML.
Get product page content in Safari
Right-click anywhere on the product page and choose Show Page Source. This option appears only when Safari's Develop menu is enabled in Safari's Advanced settings. Copy all of the HTML it shows, then go back to 42agent and paste it.
The procedure is similar in other browsers. Whichever you use, make sure the source you copy comes from the page of the product you want to buy, that is, the same URL you submitted.
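Because it is easy to copy only part of the source by accident, it can help to sanity-check it before pasting. The sketch below is a hypothetical helper, not part of 42agent: it treats the text as complete only if it contains a closing `</html>` tag (a heuristic, not a guarantee), then pulls out the page's `<title>` with Python's standard `html.parser` module so you can compare it with the product you meant to buy.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleFinder(HTMLParser):
    """Collects the text of the first <title> element in a document."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True: entities arrive decoded
        self.title: Optional[str] = None
        self._in_title = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag names in lowercase.
        if tag == "title" and self.title is None:
            self._in_title = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.title = "".join(self._parts).strip()


def check_page_source(html: str) -> Optional[str]:
    """Return the page title if `html` looks complete and has a <title>, else None.

    Heuristic (an assumption): a full copy of the page source contains a
    closing </html> tag, while an accidental partial selection usually does not.
    """
    if "</html>" not in html.lower():
        return None
    finder = TitleFinder()
    finder.feed(html)
    finder.close()
    return finder.title
```

A `None` result means either the copy looks incomplete or the page has no title; in both cases, copying the source again is the safe choice.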
Step 2: Upload product data via the HTML button
Upload the HTML content you copied in Step 1 using the HTML button.
Get the content of the Xianyu product page
Xianyu product pages are more complex, and their content cannot be captured with the method in Step 1. Instead, install a Chrome plugin and use it to get the page's HTML as follows:
1. Download and install a Chrome plugin.
2. On the open product page, click the plugin icon to get the page's HTML content.
3. Click the Copy button to copy the content.
4. Upload the data via our HTML button.