First and foremost, do not touch web crawlers unless you are 100% sure that you want to do this.
The target website is Enlightent, a third-party data website that has data on my company and its competitors.
We need to do some background research first. The problems I encountered are listed below:
- Logging in requires scanning a QR code with a WeChat account.
- Clicks have to be simulated (with the selenium package).
- It is a dynamic website, so you need to wait for its content to load (with the time package).
- The data has to be written into MySQL (with the PyMySQL package).
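Because the page loads its data dynamically, and the QR-code login only completes after you scan it with WeChat, the script has to wait before reading anything. Below is a minimal sketch of a polling helper built on the time package, assuming you pass in any zero-argument check, such as a lambda around Selenium's `driver.find_elements`. The name `wait_until`, the timeouts and the XPath in the comment are illustrative placeholders, not taken from the original script.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until(check: Callable[[], T], timeout: float = 30.0, interval: float = 0.5) -> T:
    """Call `check` repeatedly until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper: a single fixed time.sleep() is either too long or too short,
    so polling in short steps is usually more robust on a dynamic page.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Example usage with Selenium (not executed here; `driver` and the XPath are placeholders):
# from selenium.webdriver.common.by import By
# rows = wait_until(lambda: driver.find_elements(By.XPATH, "//table//tr"), timeout=60)
```

Selenium also provides `WebDriverWait` together with `expected_conditions` (under `selenium.webdriver.support`), which performs the same kind of polling; a plain loop like this one simply sticks to the time package mentioned in the list.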
Step 1: Find the pattern in the HTML. In Chrome, press Ctrl+U to view the page source or Ctrl+Shift+I to open DevTools. Finding the element you want takes patience; if you get the pattern wrong, you cannot extract the information you want.
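Once you have spotted the pattern in the page source, you can check it offline before involving the browser at all. Here is a minimal sketch using Python's built-in html.parser, assuming the items carry `data-name` and `data-channeltype` attributes, as the slicing code later in this post suggests. The class name `DataNameCollector` and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class DataNameCollector(HTMLParser):
    """Collect the data-name attribute of every tag whose data-channeltype matches.

    Hypothetical helper; the attribute names mirror the pattern used later in this post.
    """

    def __init__(self, channel_type: str = "tv") -> None:
        super().__init__()
        self.channel_type = channel_type
        self.names: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag
        attributes = dict(attrs)
        if attributes.get("data-channeltype") == self.channel_type and "data-name" in attributes:
            self.names.append(attributes["data-name"])


# Made-up HTML standing in for the real page source
sample_html = """
<ul>
  <li class="item" data-name="Show A" data-channeltype="tv">Show A</li>
  <li class="item" data-name="Film B" data-channeltype="movie">Film B</li>
  <li class="item" data-name="Show C" data-channeltype="tv">Show C</li>
</ul>
"""

collector = DataNameCollector()
collector.feed(sample_html)
print(collector.names)  # ['Show A', 'Show C']
```

If the list comes out empty or wrong, the pattern you picked in the page source is probably off, which is cheaper to discover here than inside a Selenium session.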
Step 2: Choose the locator function: by_xpath or by_class_name. The tricky point is that by_class_name is fine when only one element has that class, but when several elements share it, Selenium returns only the first match as your output. As a result, I chose by_xpath.
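A hedged illustration of the difference: in Selenium 4 the old `find_element_by_xpath` / `find_element_by_class_name` helpers were removed (in 4.3) in favour of `find_element(By.XPATH, ...)` and `find_element(By.CLASS_NAME, ...)`. `find_element` returns only the first match, while `find_elements` returns a list of every match. The helper `class_xpath` and the class name `rank-item` below are hypothetical; the helper builds an XPath that matches one class token exactly, even when an element carries several classes.

```python
def class_xpath(tag: str, class_name: str) -> str:
    """Build an XPath selecting <tag> elements that have `class_name` among their classes.

    Hypothetical helper. A plain @class="..." test fails when an element has several
    classes, so the class attribute is padded with spaces and searched for the token.
    """
    return f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), ' {class_name} ')]"


xpath = class_xpath("div", "rank-item")
print(xpath)

# With a live browser (not executed here; `driver` is assumed to exist):
# from selenium.webdriver.common.by import By
# first_only = driver.find_element(By.XPATH, xpath)   # first matching element only
# every_match = driver.find_elements(By.XPATH, xpath)  # list of all matching elements
```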
Step 3: Install the ChromeDriver version that matches your Chrome version. Be sure to download it into /anaconda3/lib/site-packages.
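ChromeDriver's major version has to match Chrome's. Below is a hedged sketch with a small helper that compares the major versions in two version strings (for example, the output of `chromedriver --version` and the version shown on chrome://version), plus one way to point Selenium 4 at a specific driver file through `Service`. The function names, the example version strings and the driver path are placeholders. Also note that recent Selenium 4 releases (4.6 and later) include Selenium Manager, which can fetch a matching driver automatically, so the manual download may not be needed there.

```python
import re


def major_version(version_text: str) -> int:
    """Return the major version from text such as 'ChromeDriver 114.0.5735.90 (...)'."""
    match = re.search(r"(\d+)\.\d+\.\d+\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_text: str, driver_text: str) -> bool:
    """ChromeDriver should share Chrome's major version."""
    return major_version(chrome_text) == major_version(driver_text)


def make_driver(driver_path: str):
    """Start Chrome with an explicit driver file (Selenium 4 API; needs selenium and Chrome)."""
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    return webdriver.Chrome(service=Service(executable_path=driver_path))


# Example strings only; read the real ones from your own machine
print(versions_match("Google Chrome 114.0.5735.198", "ChromeDriver 114.0.5735.90 (abc)"))  # True

# Not executed here; the path is a placeholder:
# driver = make_driver("/anaconda3/lib/site-packages/chromedriver")
```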
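```python
# Choose daily
# Keep the part of the j-th HTML snippet from 'data-name=' up to (not including) 'data-channeltype="tv"'
album_separately = string_list[j][string_list[j].find('data-name='):string_list[j].find('data-channeltype="tv"')]
# DB is a database helper class that is not defined in this snippet
db = DB('your database')
```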
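The slice above relies on `str.find`, which returns -1 when a marker is missing, so a missing marker silently yields a wrong substring instead of an error. The sketch below offers a defensive version of the same slice (it keeps the start marker, as the original slice does) and one way to write the results with PyMySQL. Because the `DB` class from the snippet is not defined here, the sketch talks to a PyMySQL connection directly. `between`, `save_names`, the `album` table and its `name` column are all hypothetical.

```python
from typing import Iterable, Optional


def between(text: str, start: str, end: str) -> Optional[str]:
    """Return text from `start` up to (not including) `end`, or None if a marker is missing."""
    i = text.find(start)
    j = text.find(end, i + len(start)) if i != -1 else -1
    if i == -1 or j == -1:
        return None
    return text[i:j]


def save_names(conn, names: Iterable[str]) -> int:
    """Insert each name into a hypothetical `album` table through a PyMySQL connection.

    The %s placeholders let PyMySQL escape the values, which is safer than
    building the SQL string by hand.
    """
    rows = [(name,) for name in names]
    with conn.cursor() as cursor:
        cursor.executemany("INSERT INTO album (name) VALUES (%s)", rows)
    conn.commit()
    return len(rows)


snippet = '<li data-name="Show A" data-channeltype="tv">'
print(repr(between(snippet, 'data-name=', 'data-channeltype="tv"')))  # 'data-name="Show A" '

# Connecting (not executed here; every value is a placeholder):
# import pymysql
# conn = pymysql.connect(host="localhost", user="user", password="...",
#                        database="your_database", charset="utf8mb4")
# found = (between(s, 'data-name=', 'data-channeltype="tv"') for s in string_list)
# save_names(conn, [name for name in found if name])
```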