Harvesting google profiles

May 19, 2011 ilektrojohn

A few minutes ago I saw an interesting tweet from Mikko H. Hypponen saying that he had found out that all (yes, as in ALL: 35,513,445) Google profile addresses can be retrieved from a single XML file. I looked through it and, yep, he was quite right.

Well, all this information is going to be useful somehow, right? Right. In case it gets removed, here is a simple way to harvest the addresses before that happens:

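#!/usr/bin/env python
# Written for Python 2 with BeautifulSoup 3.

import urllib
from BeautifulSoup import BeautifulStoneSoup as bs

# Fetch the sitemap index and parse it as XML.
xml = bs(urllib.urlopen('http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml').read())
# Each <loc> element holds the URL of one file to download.
for i in xml.findAll('loc'):
    # Drop the 35-character 'http://www.gstatic.com/s2/sitemaps/' prefix to get the local file name.
    name = i.text[35:]
    try:
        urllib.urlretrieve(i.text, name)
        print 'Downloaded %s' % name
    except Exception, err:
        print '%s could not be retrieved (%s)' % (i.text, err)
print 'All done'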
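For anyone on Python 3, where the print statements and the BeautifulSoup 3 import above will not run, here is a rough sketch of the same idea using only the standard library. The helper names (extract_locs, local_name, main) are my own and not part of the original script, and the sketch assumes the index is still served at the same URL, which it does not check.

```python
#!/usr/bin/env python3
"""Python 3 sketch of the sitemap downloader, standard library only."""
import xml.etree.ElementTree as ET
from urllib.request import urlopen, urlretrieve

INDEX_URL = 'http://www.gstatic.com/s2/sitemaps/profiles-sitemap.xml'


def extract_locs(xml_bytes):
    """Return the text of every <loc> element, ignoring XML namespaces."""
    root = ET.fromstring(xml_bytes)
    return [el.text.strip() for el in root.iter()
            if el.tag.rsplit('}', 1)[-1] == 'loc' and el.text]


def local_name(url):
    """Use the last path component of the URL as the local file name."""
    return url.rsplit('/', 1)[-1]


def main():
    # Download the index, then fetch every file it lists.
    with urlopen(INDEX_URL) as response:
        locs = extract_locs(response.read())
    for url in locs:
        name = local_name(url)
        try:
            urlretrieve(url, name)
            print('Downloaded %s' % name)
        except OSError as err:  # URLError and HTTPError are OSError subclasses
            print('%s could not be retrieved (%s)' % (url, err))
    print('All done')


if __name__ == '__main__':
    main()
```

Taking the last path component instead of slicing at character 35 is a deliberate change: it keeps working if the URL prefix ever has a different length.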

That’s it: save it, run it and wait 🙂 Not that I used it, but I calculate that you get around 1.7 GB worth of profile links.
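The post does not say how the 1.7 GB figure was worked out, so the following is only a sanity check, not the original calculation. It divides the quoted size by the quoted number of profiles to see what average link length that would imply, assuming one link per profile and 1 GB = 10**9 bytes (both assumptions of mine).

```python
PROFILES = 35513445   # profile count quoted in the post
TOTAL_BYTES = 1.7e9   # "around 1.7 GB", assuming 1 GB = 10**9 bytes


def bytes_per_link(total_bytes, links):
    """Average size of one link, assuming one link per profile."""
    return total_bytes / links


print('%.1f bytes per link' % bytes_per_link(TOTAL_BYTES, PROFILES))
```

That comes out at a little under 50 bytes per link, which is plausible for a profile URL plus a line break.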

Well, the juicy part is obviously harvesting the information from the profiles themselves. People on Twitter are mentioning that Google has been aware of this for a long time, or at least should have been. Thoughts about the potential implications of that harvesting in a blog post to come.

Categories: privacy, programming, python, Uncategorized Tags: google, harvesting, profiles

Comments (7)

Sidney de Koning

May 23, 2011 at 1:45 pm

At least they are transparent 😉
Tom Νee

June 8, 2011 at 6:12 pm

For people getting a message that BeautifulSoup is missing:

sudo apt-get install python-beautifulsoup

Just sayin!
K. R.

February 13, 2012 at 6:17 pm

Sorry, new to Linux. How do I “save” and then “run” the script?
- ilektrojohn
  
  March 6, 2012 at 10:03 pm
  
  Hey,
  
  Sorry, I had somehow missed your comment. The instructions from daneelrsixth are valid. You will of course need to have Python installed, plus BeautifulSoup (either via easy_install or via your distribution’s package manager).
daneelrsixth

March 6, 2012 at 9:51 pm

KR: copy the source code, open a text editor, paste it in and save the file as “google.py”. Then open the terminal, go to the directory where you saved the file and type “python google.py”. (PS: I hope you are using a *nix system.)

Btw the script doesn’t work now, they kinda fixed the issue.
- ilektrojohn
  
  March 6, 2012 at 10:04 pm
  
  Just ran it out of curiosity, and it seems to still work fine. From what I gather, from Google’s perspective this is not an “issue”.
  - daneelrsixth
    
    March 13, 2012 at 9:41 pm
    
    Well silly me, silly silly me… I had some problems with the BeautifulSoup module… I can confirm that the script works. 🙂