Commit 0be81c2
Fix broken Facebook profile parsing (#217)
* Initial plan
* Fix Facebook parsing: use meta tags + facebookexternalhit UA
Facebook changed its page structure: the __bbox JSON blob containing
"complete" and "sequence_number", which the old regex matched, no longer
exists, and the <title>Facebook</title> flag doesn't match public profiles
(their titles are user-specific). In addition, the default Chrome User-Agent
makes Facebook redirect to the login page.
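The User-Agent issue can be illustrated with a minimal standard-library sketch. It builds (but does not send) a request that identifies as Facebook's link-preview crawler, and uses a simple heuristic to spot a login redirect. Assumptions: the UA string below is Facebook's publicly documented crawler UA, not necessarily the exact string used in this commit; `build_profile_request` and `looks_like_login_redirect` are hypothetical helpers, not socid-extractor APIs.

```python
import urllib.parse
import urllib.request

# Assumption: Facebook's documented crawler UA; the commit's exact string may differ.
FACEBOOK_CRAWLER_UA = (
    "facebookexternalhit/1.1 "
    "(+http://www.facebook.com/externalhit_uatext.php)"
)


def build_profile_request(url: str) -> urllib.request.Request:
    """Hypothetical helper: build (not send) a GET request using the crawler UA."""
    return urllib.request.Request(url, headers={"User-Agent": FACEBOOK_CRAWLER_UA})


def looks_like_login_redirect(final_url: str) -> bool:
    """Hypothetical heuristic: a final URL under /login means no public content."""
    return urllib.parse.urlparse(final_url).path.startswith("/login")
```

To fetch for real, pass the request to `urllib.request.urlopen(req)` and check `response.geturl()` with the heuristic; with a browser UA the final URL would be expected to land on `/login`.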
Fix by:
- Switching to BeautifulSoup meta tag extraction (og:title, og:url,
og:image, og:description, al:android:url for uid)
- Updating flags to match public profile pages
- Adding url_mutations with facebookexternalhit User-Agent so
Facebook serves actual page content
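The meta-tag extraction described above can be sketched as follows. The commit uses BeautifulSoup; this sketch swaps in the standard library's `html.parser.HTMLParser` so it runs without extra dependencies. Assumptions: `al:android:url` has the form `fb://profile/<numeric id>`, and the output keys (`fullname`, `url`, `image`, `bio`, `uid`) are hypothetical names chosen for illustration.

```python
import re
from html.parser import HTMLParser

# Meta properties named in the commit message.
WANTED = {"og:title", "og:url", "og:image", "og:description", "al:android:url"}


class MetaTagParser(HTMLParser):
    """Collect <meta property=... content=...> values for the wanted keys."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> also reaches here via handle_startendtag.
        if tag != "meta":
            return
        attr = dict(attrs)
        key = attr.get("property") or attr.get("name")
        if key in WANTED and attr.get("content") is not None:
            # Keep the first occurrence so later duplicates don't overwrite it.
            self.meta.setdefault(key, attr["content"])


def extract_profile(html: str) -> dict[str, str]:
    """Hypothetical helper: map Open Graph / App Links tags to profile fields."""
    parser = MetaTagParser()
    parser.feed(html)
    meta = parser.meta
    result = {}
    for src, dst in (
        ("og:title", "fullname"),
        ("og:url", "url"),
        ("og:image", "image"),
        ("og:description", "bio"),
    ):
        if src in meta:
            result[dst] = meta[src]
    # Assumption: al:android:url looks like "fb://profile/<numeric id>".
    match = re.search(r"fb://profile/(\d+)", meta.get("al:android:url", ""))
    if match:
        result["uid"] = match.group(1)
    return result
```

Reading meta tags rather than embedded JSON ties the parser to the Open Graph and App Links conventions, which are intended for external consumers and therefore tend to change less often than internal page state.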
Agent-Logs-Url: https://github.com/soxoj/socid-extractor/sessions/e66dfc76-d822-40b9-ad17-6af6d2e4e19b
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
---------
Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: soxoj <31013580+soxoj@users.noreply.github.com>
1 parent 7d7f654
3 files changed
Lines changed: 66 additions & 23 deletions
@@ -67,22 +67,22 @@ (15 deletions, 15 additions)
@@ -218,17 +218,16 @@ (8 deletions, 7 additions)
@@ -1245,3 +1245,47 @@ (44 additions)