Ever thought about putting the crawlers you write into a phone?
For example:
- Want music? A background crawler fetches the audio source and plays it.
- Want pictures? A background crawler hits an HD-image endpoint and downloads them.
- Want to look someone up? One tap aggregates user search across social platforms.
Today we'll build one hands-on: MoonMusic. Its core is not the UI but a powerful asynchronous data-collection layer.
🔧 Core Tech Stack
- Data collection (Crawler): httpx (async HTTP requests), BeautifulSoup4 (HTML parsing)
- Concurrency control: asyncio (coroutine scheduling)
- Data visualization (GUI): Flet (a Flutter-based Python UI framework)
- Deployment: Android APK / iOS IPA
Part 1: Hardcore Crawler Design
1. Reverse-Engineering and Wrapping the APIs
By capturing traffic in the browser devtools (F12 → Network), we worked out each platform's search endpoint. To call them all in a uniform way, we define a CrawlerService class.
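Conceptually, each platform gets one async method, and every method normalizes its response into the same dict shape, so the UI layer never cares which site a row came from. A condensed skeleton (the full class, with error handling, appears in the listing at the end):

```python
import httpx

class CrawlerService:
    def __init__(self, helper):
        self.helper = helper  # DataHelper: supplies per-platform Cookies/Headers

    async def search_netease(self, keyword: str) -> list[dict]:
        """Every platform method returns the same normalized shape:
        {"name", "artist", "id", "media_id", "pic", "url", "source"}"""
        url = "https://music.163.com/api/search/get/web"
        params = {"s": keyword, "type": 1, "offset": 0, "limit": 10}
        async with httpx.AsyncClient() as client:
            resp = await client.post(url, headers=self.helper.get_headers("netease"), data=params)
            songs = resp.json()["result"]["songs"]
        return [{
            "name": s["name"],
            "artist": s["artists"][0]["name"],
            "id": s["id"],
            "media_id": s["id"],
            "pic": s.get("album", {}).get("picUrl", ""),
            "url": f"http://music.163.com/song/media/outer/url?id={s['id']}.mp3",
            "source": "网易",
        } for s in songs]
```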
2. True Concurrency with asyncio.gather
This is the project's technical highlight: the user enters one keyword, and we fire requests at 3-5 platforms at the same time.
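The fan-out itself is only a few lines; here is a condensed sketch of the search_all method from the full listing below:

```python
import asyncio

async def search_all(self, keyword, platform="all"):
    tasks = []
    if platform in ("all", "netease"): tasks.append(self.search_netease(keyword))
    if platform in ("all", "qq"):      tasks.append(self.search_qq(keyword))
    if platform in ("all", "kugou"):   tasks.append(self.search_kugou(keyword))
    # All requests are in flight simultaneously: total latency is roughly
    # that of the slowest platform, not the sum of all of them.
    per_platform = await asyncio.gather(*tasks)
    # Interleave results (netease[0], qq[0], kugou[0], netease[1], ...)
    merged = []
    for i in range(max((len(r) for r in per_platform), default=0)):
        merged.extend(r[i] for r in per_platform if i < len(r))
    return merged
```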
3. Anti-Crawling and Cookie Management
To cope with the big platforms' anti-crawling measures, we design a DataHelper class dedicated to managing Cookies and Headers (a sketch follows this list):
- Reads Cookies dynamically from config.json (e.g. a VIP account).
- Generates random User-Agents.
- Sets the Referer to get past hotlink protection.
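The DataHelper internals are not part of the abridged listing below, so this is an illustrative sketch of the three bullets above. Only get_headers and qq_uin are names the real code depends on; the config keys and UA pool here are assumptions:

```python
import json
import random

class DataHelper:
    """Sketch: Cookie/Header management as described above (details assumed)."""

    UA_POOL = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
    ]
    REFERERS = {  # anti-hotlinking: each platform expects its own Referer
        "netease": "https://music.163.com/",
        "qq": "https://y.qq.com/",
        "kugou": "https://www.kugou.com/",
    }

    def __init__(self, config_path="config.json"):
        try:
            with open(config_path, encoding="utf-8") as f:
                # Hypothetical layout, e.g. {"netease_cookie": "...", "qq_uin": "0"}
                self.config = json.load(f)
        except FileNotFoundError:
            self.config = {}
        self.qq_uin = self.config.get("qq_uin", "0")

    def get_headers(self, platform: str) -> dict:
        headers = {
            "User-Agent": random.choice(self.UA_POOL),  # random UA per request
            "Referer": self.REFERERS.get(platform, ""),
        }
        cookie = self.config.get(f"{platform}_cookie")  # e.g. a VIP account's Cookie
        if cookie:
            headers["Cookie"] = cookie
        return headers
```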
📱 Part 2: Flet Visualization
With a powerful crawler backend in place, we need a "skin" to present the data. Flet lets us write Flutter-style native interfaces in pure Python.
1. List Rendering (ListView)
The JSON the crawler returns maps directly onto a list of Flet UI controls.
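A minimal sketch of that mapping. The control names are standard Flet APIs; render_results is a hypothetical helper, and the dict keys match the crawler's normalized schema:

```python
import flet as ft

def render_results(page: ft.Page, results: list[dict]):
    # One crawler dict -> one ListTile; the ListView scrolls the whole set.
    lv = ft.ListView(expand=True, spacing=8)
    for song in results:
        lv.controls.append(
            ft.ListTile(
                leading=ft.Image(src=song["pic"], width=48, height=48, border_radius=8),
                title=ft.Text(song["name"]),
                subtitle=ft.Text(f"{song['artist']} · {song['source']}"),
            )
        )
    page.add(lv)
    page.update()
```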
2. Audio Streaming on Mobile
For the scraped .mp3 / .m4a links we skip Pygame (poor mobile compatibility) and call Flet's ft.Audio directly; on Android it is backed by ExoPlayer and supports streaming playback.
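A minimal usage sketch. Note that ft.Audio is a non-visual control that must go into page.overlay, and on recent Flet releases it ships in the separate flet-audio package, so check your version:

```python
import flet as ft

def play_stream(page: ft.Page, stream_url: str):
    # Non-visual control: append it to page.overlay, not the layout tree.
    audio = ft.Audio(src=stream_url, autoplay=True)
    page.overlay.append(audio)
    page.update()  # playback starts once the control reaches the page
```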
📦 Part 3: From Python Script to Android APK
This is the skill crawler engineers want most: how do you get your script to run off the computer?
- Environment: install the flet package.
- Command: run `flet build apk` in the project root.
- How it works: Flet automatically pulls the Flutter engine, compiles your Python crawler code into bytecode, and packs it into the APK.
📥 Running It & Source Code
This project is an excellent "crawler + GUI" practice case, covering reverse engineering, concurrency, UI design, and mobile packaging.
Screenshots: *(images omitted)*
Code (reading it on GitHub is recommended: the layering there is clearer and more complete; the listing below is abridged):
```python
import warnings

warnings.filterwarnings("ignore", category=UserWarning, module="pygame")
warnings.filterwarnings("ignore", category=DeprecationWarning)

import flet as ft
import httpx
import asyncio
import json
import base64
import os
import random
import re
import time
import uuid
import urllib.parse
from bs4 import BeautifulSoup
import pygame
from mutagen.mp3 import MP3


# ==========================================
# 4. Service layer
# ==========================================

class CrawlerService:
    def __init__(self, helper):
        self.helper = helper

    async def search_netease(self, keyword):
        url = "https://music.163.com/api/search/get/web"
        params = {"s": keyword, "type": 1, "offset": 0, "total": "true", "limit": 10}
        async with httpx.AsyncClient(verify=False) as client:
            try:
                headers = self.helper.get_headers("netease")
                resp = await client.post(url, headers=headers, data=params)
                data = resp.json()
                songs = data['result']['songs']
                results = []
                for s in songs:
                    pic_url = s.get('album', {}).get('picUrl', '')
                    if not pic_url and s.get('artists'):
                        pic_url = s['artists'][0].get('img1v1Url', '')
                    results.append({
                        "name": s['name'],
                        "artist": s['artists'][0]['name'],
                        "id": s['id'],
                        "media_id": s['id'],
                        "pic": pic_url,
                        "url": f"http://music.163.com/song/media/outer/url?id={s['id']}.mp3",
                        "source": "网易"
                    })
                return results
            except Exception:
                return []

    async def get_qq_purl(self, songmid, media_id=None):
        if not media_id:
            media_id = songmid
        guid = str(random.randint(1000000000, 9999999999))
        file_types = [{"prefix": "M500", "ext": "mp3", "mid": media_id},
                      {"prefix": "C400", "ext": "m4a", "mid": media_id}]
        url = "https://u.y.qq.com/cgi-bin/musicu.fcg"
        data = {
            "req": {"module": "CDN.SrfCdnDispatchServer", "method": "GetCdnDispatch",
                    "param": {"guid": guid, "calltype": 0, "userip": ""}},
            "req_0": {
                "module": "vkey.GetVkeyServer",
                "method": "CgiGetVkey",
                "param": {
                    "guid": guid,
                    "songmid": [songmid] * 2,
                    "songtype": [0] * 2,
                    "uin": self.helper.qq_uin,
                    "loginflag": 1,
                    "platform": "20",
                    # NOTE: loop variable renamed from `ft` to `ftype` so it no
                    # longer shadows the `flet as ft` import.
                    "filename": [f"{ftype['prefix']}{ftype['mid']}.{ftype['ext']}" for ftype in file_types]
                }
            }
        }
        async with httpx.AsyncClient(verify=False) as client:
            try:
                headers = self.helper.get_headers("qq")
                resp = await client.get(url, params={"data": json.dumps(data)}, headers=headers)
                js = resp.json()
                midurlinfos = js.get('req_0', {}).get('data', {}).get('midurlinfo', [])
                sip = js.get('req_0', {}).get('data', {}).get('sip', [])
                for info in midurlinfos:
                    if info.get('purl'):
                        base = sip[0] if sip else "http://ws.stream.qqmusic.qq.com/"
                        return f"{base}{info['purl']}"
                return ""
            except Exception:
                return ""

    async def search_qq(self, keyword):
        search_url = f"https://c.y.qq.com/soso/fcgi-bin/client_search_cp?p=1&n=10&w={keyword}&format=json"
        async with httpx.AsyncClient(verify=False) as client:
            try:
                headers = self.helper.get_headers("qq")
                resp = await client.get(search_url, headers=headers)
                text = resp.text
                # The endpoint may wrap the JSON in a JSONP callback; strip it.
                if text.startswith("callback("):
                    text = text[9:-1]
                elif text.endswith(")"):
                    text = text[text.find("(") + 1:-1]
                data = json.loads(text)
                songs = data['data']['song']['list']
                results = []
                for s in songs:
                    songmid = s['songmid']
                    media_mid = s.get('media_mid', s.get('strMediaMid', songmid))
                    albummid = s['albummid']
                    pic = f"https://y.gtimg.cn/music/photo_new/T002R300x300M000{albummid}.jpg" if albummid else ""
                    results.append({
                        "name": s['songname'],
                        "artist": s['singer'][0]['name'],
                        "id": songmid,
                        "media_id": media_mid,
                        "pic": pic,
                        "url": "",
                        "source": "QQ"
                    })
                return results
            except Exception:
                return []

    async def search_kugou(self, keyword):
        search_url = f"http://mobilecdn.kugou.com/api/v3/search/song?format=json&keyword={keyword}&page=1&pagesize=6"
        async with httpx.AsyncClient(verify=False) as client:
            try:
                headers = self.helper.get_headers("kugou")
                resp = await client.get(search_url, headers=headers)
                data = resp.json()
                songs = data['data']['info']
                tasks = []
                for s in songs:
                    tasks.append(client.get(
                        f"http://www.kugou.com/yy/index.php?r=play/getdata&hash={s['hash']}&album_id={s.get('album_id', '')}",
                        headers=headers))
                detail_resps = await asyncio.gather(*tasks, return_exceptions=True)
                results = []
                for r in detail_resps:
                    if isinstance(r, httpx.Response):
                        try:
                            d = r.json()['data']
                            if d['play_url']:
                                results.append({
                                    "name": d['audio_name'],
                                    "artist": d['author_name'],
                                    "id": d['hash'],
                                    "media_id": d['hash'],
                                    "pic": d['img'],
                                    "url": d['play_url'],
                                    "source": "酷狗"
                                })
                        except Exception:
                            pass
                return results
            except Exception:
                return []

    async def search_all(self, keyword, platform="all"):
        tasks = []
        if platform in ["all", "netease"]:
            tasks.append(self.search_netease(keyword))
        if platform in ["all", "qq"]:
            tasks.append(self.search_qq(keyword))
        if platform in ["all", "kugou"]:
            tasks.append(self.search_kugou(keyword))
        results = await asyncio.gather(*tasks)
        # Interleave the per-platform lists so sources are mixed in the UI.
        merged = []
        if results:
            max_len = max(len(r) for r in results)
            for i in range(max_len):
                for r in results:
                    if i < len(r):
                        merged.append(r[i])
        return merged

    async def search_images_bing(self, keyword):
        url = f"https://www.bing.com/images/search?q={keyword}&form=HDRSC2&first=1"
        async with httpx.AsyncClient(verify=False, follow_redirects=True) as client:
            try:
                headers = {
                    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
                    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8",
                    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
                    "Referer": "https://www.bing.com/"
                }
                resp = await client.get(url, headers=headers, timeout=8)
                soup = BeautifulSoup(resp.text, 'html.parser')

                results = []
                iusc_links = soup.select('a.iusc')
                for link in iusc_links:
                    try:
                        m_str = link.get('m')
                        if m_str:
                            m_data = json.loads(m_str)
                            img_url = m_data.get('turl') or m_data.get('murl')
                            full_url = m_data.get('murl')
                            if img_url:
                                results.append({"url": full_url, "thumb": img_url})
                    except Exception:
                        continue

                # Fallback: plain <img> tags when the JSON attributes are missing
                if not results:
                    imgs = soup.select('img.mimg')
                    for img in imgs:
                        src = img.get('src') or img.get('data-src')
                        if src and src.startswith('http'):
                            results.append({"url": src, "thumb": src})

                random.shuffle(results)
                return results[:24]
            except Exception as e:
                print(f"Image search failed: {e}")
                return []

    async def search_social_users(self, keyword, platform="all"):
        results = []

        # Inner helper: Bilibili user search
        async def fetch_bili(client):
            try:
                bili_url = f"https://api.bilibili.com/x/web-interface/search/type?search_type=bili_user&keyword={urllib.parse.quote(keyword)}"
                headers = self.helper.get_headers("bilibili")
                if "Cookie" not in headers:
                    headers["Cookie"] = "buvid3=infoc;"
                resp = await client.get(bili_url, headers=headers)
                data = resp.json()
                local_res = []
                if data.get('code') == 0 and data.get('data') and data['data'].get('result'):
                    for user in data['data']['result'][:4]:
                        local_res.append({
                            "platform": "Bilibili",
                            "name": user['uname'],
                            "desc": f"粉丝: {user.get('fans', 0)} | {user.get('usign', '')[:20]}...",
                            "pic": user['upic'].replace("http://", "https://"),
                            "url": f"https://space.bilibili.com/{user['mid']}"
                        })
                return local_res
            except Exception:
                return []

        # Inner helper: Weibo user search
        async def fetch_weibo(client):
            try:
                encoded_q = urllib.parse.quote(keyword)
                weibo_url = f"https://m.weibo.cn/api/container/getIndex?containerid=100103type%3D3%26q%3D{encoded_q}&page_type=searchall"
                headers = {
                    "User-Agent": "Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Mobile Safari/537.36",
                    "Referer": "https://m.weibo.cn/"
                }
                resp = await client.get(weibo_url, headers=headers)
                data = resp.json()
                local_res = []
                cards = data.get('data', {}).get('cards', [])
                count = 0
                for card in cards:
                    if count >= 3:
                        break
                    if 'card_group' in card:
                        for item in card['card_group']:
                            if item.get('card_type') == 11 and 'user' in item:
                                u = item['user']
                                local_res.append({
                                    "platform": "微博",
                                    "name": u.get('screen_name'),
                                    "desc": f"粉丝: {u.get('followers_count', 0)} | {u.get('description', '')[:20]}",
                                    "pic": u.get('profile_image_url', ''),
                                    "url": f"https://m.weibo.cn/u/{u.get('id')}"
                                })
                                count += 1
                return local_res
            except Exception:
                return []

        # Short 4s timeout so a slow platform cannot block the UI for long
        async with httpx.AsyncClient(verify=False, timeout=4.0) as client:
            tasks = []
            if platform in ["all", "bilibili"]:
                tasks.append(fetch_bili(client))
            if platform in ["all", "weibo"]:
                tasks.append(fetch_weibo(client))

            # Run the lookups in parallel
            if tasks:
                task_results = await asyncio.gather(*tasks, return_exceptions=True)
                for tr in task_results:
                    if isinstance(tr, list):
                        results.extend(tr)

        # "Deep link" cards for platforms we only jump into
        if platform in ["all", "douyin"]:
            results.append({
                "platform": "抖音",
                "name": f"搜索: {keyword}",
                "desc": "点击直接跳转抖音网页版搜索",
                "pic": "https://lf1-cdn-tos.bytegoofy.com/goofy/ies/douyin_web/public/favicon.ico",
                "url": f"https://www.douyin.com/search/{urllib.parse.quote(keyword)}"
            })

        if platform in ["all", "xiaohongshu"]:
            results.append({
                "platform": "小红书",
                "name": f"搜索: {keyword}",
                "desc": "点击直接跳转小红书搜索页",
                "pic": "https://ci.xiaohongshu.com/fd579468-69cb-4190-8457-377eb60c1d68",
                "url": f"https://www.xiaohongshu.com/search_result?keyword={urllib.parse.quote(keyword)}"
            })

        return results


if __name__ == "__main__":
    ft.run(main)  # `main` (the Flet UI entry point) is defined in the full GitHub source
```
GitHub repo: MoonPointer-Byte/MoonMusic
Key files:
- services/crawler.py: core crawler logic (recommended starting point)
- core/player.py: player logic
- main.py: UI logic
⚠️ Disclaimer
- Technical scope: this article discusses only httpx-based async request techniques and the app packaging workflow.
- For learning and exchange only: everything here that touches on scraping NetEase Cloud Music data was written purely for personal study and exchange. The aim is to share the exploration process and help beginners understand how network data retrieval works; there is no commercial intent whatsoever. If you put these techniques to commercial use, all resulting legal liability and financial disputes are yours and have nothing to do with the author.
- Data-use restrictions: NetEase Cloud Music data obtained with the methods described here must be handled in strict accordance with the source platform's rules and applicable laws and regulations. Using it for illegal purposes such as infringing others' intellectual property, malicious redistribution, or unfair competition is prohibited; otherwise you bear the legal consequences yourself.
- Technical risk: the techniques described may stop working as NetEase Cloud Music updates its platform or adjusts its anti-crawling measures. While reproducing them you may run into all kinds of technical problems, possibly including account restrictions or device issues. The author can offer no guarantees and accepts no responsibility for these risks; proceed carefully.
- **Legal responsibility is your own:** web scraping raises many legal questions, and rules on data collection and use differ across jurisdictions. Before applying anything in this article, make sure you fully understand and comply with your local laws. If your actions break the law, the author bears no legal responsibility; all consequences are yours.
- Accuracy and timeliness: although the author has tried hard to keep the content accurate, technology evolves quickly and platforms keep changing, so information here may be outdated or inaccurate. If you find errors or things that need updating, please point them out, but the author is not liable for any loss caused by inaccurate or outdated content.
⚠️ Copyright Notice
This article and all of its content belong to MoonPointer-Byte, the author. Others may study it, but commercial use is not allowed, and the author assumes no responsibility for any issues that arise!
