
Pulling data from the LEGO product website


As I mentioned in a tweet, I worked with my son in real time to pull data from the LEGO website. He wanted to see the piece counts for all the products, so we built this Ruby script together to pull the data and parse it with Hpricot. It took about 10 minutes.


require 'rubygems'
require 'open-uri'
require 'hpricot'

# URL below is for 7-9 year-old full listing of products
LEGOURL = "http://shop.lego.com/ByAge/Leaf.aspx?cn=100005&d=100001&va=1"
# Might need these to change country and language
#/xt/ChangeCountry.aspx?ShipTo=US&return=' + returnPage;
#/xt/ChangeLanguage.aspx?LangId=2057&return=' + returnPage;

# Basic structure of a LEGO product listing
# div.class = ThumbText -> ul -> li -> p.class=underline   (product name)
# div.class = ThumbText -> ul -> li -> p.class=itemsText2  (details, e.g. "Pieces: 123")

# URI.open is open-uri's entry point (Ruby 2.5+); on Ruby 3, plain open() no longer fetches URLs
doc = Hpricot(URI.open(LEGOURL))
i = 1
doc.search("div.ThumbText").each do |description_box|
  description_box.search("p.underline").each do |title|
    item_name = title.to_plain_text.delete("\n").strip
    # Use the first itemsText2 paragraph that mentions "Pieces" and capture the number after
    # the label; products without one get 0, and every product prints exactly one line
    pieces_text = description_box.search("p.itemsText2").map(&:to_plain_text).find { |t| t =~ /Pieces/ }
    count = (pieces_text && pieces_text[/Pieces:\s*([\d,]+)/, 1]) || 0
    puts "#{i}\t#{item_name}\t#{count}"
    i += 1
  end
end
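Stripping the label with `String#delete("Pieces:")` is a trap: `delete` treats its argument as a set of characters (P, i, e, c, s, :), not as a substring, so it also eats those letters from anything else in the paragraph. Below is a minimal, stdlib-only sketch of pulling the number out with a regex instead. `piece_count` is a hypothetical helper name, and the `Pieces: 123` text format is assumed from what the script matches, not verified against the live page.

```ruby
# Hypothetical helper: given the plain-text contents of a product's
# p.itemsText2 paragraphs, return its piece count (0 if none is listed).
# Assumes the count appears as "Pieces: <number>".
def piece_count(paragraph_texts)
  line = paragraph_texts.find { |text| text.include?("Pieces") }
  return 0 unless line

  # Capture the digits after "Pieces:", allowing a thousands separator like "1,023"
  match = line.match(/Pieces:\s*([\d,]+)/)
  match ? match[1].delete(",").to_i : 0
end

# String#delete removes every listed character, wherever it appears
p "Pieces: 123".delete("Pieces:")                    # " 123"
p "Pieces: 123 (Special edition)".delete("Pieces:")  # " 123 (Spal dton)"

p piece_count(["Price: $19.99", "Pieces: 123"])  # 123
p piece_count(["Pieces: 1,023"])                 # 1023
p piece_count(["Price: $9.99"])                  # 0
```

Keeping the parsing in a plain function like this also lets you check it against sample strings without hitting the website.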
