As I mentioned in a tweet, my son and I worked together in real time to pull data from the LEGO website. He wanted to see the piece counts for all the products, so we built this Ruby script to fetch the listing page and parse it with Hpricot. It took about 10 minutes.
require 'rubygems'
require 'open-uri'
require 'hpricot'
# URL below is for 7-9 year-old full listing of products
LEGOURL = "http://shop.lego.com/ByAge/Leaf.aspx?cn=100005&d=100001&va=1"
# Might need these to change country and language
#/xt/ChangeCountry.aspx?ShipTo=US&return=' + returnPage;
#/xt/ChangeLanguage.aspx?LangId=2057&return=' + returnPage;
# Basic structure of lego product listing
# div.class = ThumbText -> ul -> li -> p.class=underline
# div.class = ThumbText -> ul -> li -> p.class=itemsText2
#
# On Ruby 3.0+, open-uri needs URI.open(LEGOURL) instead of open(LEGOURL)
doc = Hpricot(open(LEGOURL))
i = 1
doc.search("div.ThumbText").each do |description_box|
  description_box.search("p.underline").each do |title|
    item_name = title.to_plain_text.delete("\n")
    # Find the paragraph that mentions the piece count, if there is one
    pieces = description_box.search("p.itemsText2").find { |para| para.to_plain_text =~ /Pieces/ }
    # Use sub, not delete: String#delete strips every listed character, not the substring
    count = pieces ? pieces.to_plain_text.delete("\n").sub("Pieces:", "").strip : "0"
    # One tab-separated row per product: index, name, piece count
    puts "#{i}\t#{item_name}\t#{count}"
    i += 1
  end
end
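One detail in the script is worth a closer look: `String#delete` treats its argument as a set of characters, not as a substring, so `delete("Pieces:")` removes every `P`, `i`, `e`, `c`, `s` and `:` anywhere in the text. That happens to leave the digits alone, but it is fragile. Here is a minimal sketch of a sturdier way to pull out the number. The `piece_count` helper is hypothetical (it is not in the original script), and the handling of thousands separators is an assumption about how the site might format large counts.

```ruby
# Hypothetical helper, not part of the original script: extract the number
# that follows "Pieces" in text such as "Pieces: 120". Returns 0 if absent.
def piece_count(text)
  # Assumption: large counts may be written with commas, e.g. "1,254"
  match = text.delete(",").match(/Pieces:?\s*(\d+)/)
  match ? match[1].to_i : 0
end

# String#delete removes each listed character wherever it occurs,
# so letters outside the word "Pieces" are stripped too.
puts "Pieces: 120 cars".delete("Pieces:").inspect

# The regex-based helper only touches the number after "Pieces".
puts piece_count("Pieces: 120")
puts piece_count("Ages 7-9")
```

Returning an integer instead of a string also makes it easy to sort the products by piece count later.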