Quote:
Originally Posted by Michael Hill
I'm impressed! Yesterday I worked up the following Python code which harvests the FC offerings from the website and outputs them as a comma-separated-values list. But it's really nice to see these items compared to their commercial counterparts. The one column in my list that is not on your spreadsheet is "Max Qty" for those items that are quantity-limited.
I imported a run from this morning into Google Sheets at
https://docs.google.com/spreadsheets...it?usp=sharing
Code:
# -*- coding: utf-8 -*-
"""
Created on Wed Nov 26 21:39:26 2014
@author: John
"""
from __future__ import division # for python 2
import urllib # needed for urllib.urlopen below (python 2)
import nltk, re, pprint
from nltk import word_tokenize
url = "http://www.firstchoicebyandymark.com/everything/"
html=urllib.urlopen(url).read()
#print len(html)," characters read from ",url
#raw=nltk.clean_html(html)
#print len(raw)," characters in cleaned html"
# cut out the useless stuff at the start and end
#istart=raw.find('<div class="product-item"')
#iend=raw.find('<div class="footer">')
#raw = raw[istart:iend]
# turn into tokens
tokens = word_tokenize(html)
text = nltk.Text(tokens)
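def findy(searchFor,searchIn):
    # index of the first token equal to searchFor, or 0 if there is none
    # (so a match at index 0 looks the same as "not found")
    for num,value in enumerate(searchIn):
        if value==searchFor:
            return num
    return 0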
qx=[]
tokens=tokens[findy('product-item',tokens):] # Advance to first item
while findy('data-productid=',tokens)>0:
    prodid=tokens[findy('data-productid=',tokens)+2]
    link="http://www.firstchoicebyandymark.com"+tokens[findy('href=',tokens)+2]
    # description runs from just after 'Show' up to the closing quote token
    if1=findy('Show',tokens)+3
    if2=findy("''",tokens[if1:])
    descrip=' '.join(tokens[if1:if1+if2])
    if1=findy('actual-price',tokens)+3
    price=tokens[if1]
    if2=findy('maxqty',tokens)
    # a 'maxqty' found after this item's price belongs to a later item
    if if2==0 or if2>if1:
        maxqty='none'
    else:
        maxqty=tokens[if2+11]
    end=findy('/div',tokens[if1:])+if1
    qx.append([prodid,descrip,price,maxqty,link])
    tokens=tokens[end:] # Advance to next item
print 'product id,description,price,max qty,link'
for x in qx:
    print x[0],',"',x[1],'",',x[2],",",x[3],",",x[4]
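
A note on the output step: in Python 2, a print statement with comma-separated items puts a space between each item, so the rows above come out with stray spaces around the fields, and a description that itself contains a double quote would not be escaped. Below is a minimal sketch of the same output step using the standard csv module instead. It assumes Python 3 (the script above is Python 2), the helper name rows_to_csv is made up for illustration, and the sample rows are invented placeholders, not real catalog data.

```python
import csv
import io

def rows_to_csv(rows):
    """Return CSV text (with header) for [prodid, descrip, price, maxqty, link] rows."""
    buf = io.StringIO()
    # csv.writer quotes a field only when it contains a comma, quote, or newline,
    # and doubles any embedded double quotes
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["product id", "description", "price", "max qty", "link"])
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical sample rows in the same shape as the entries appended to qx
sample = [
    ["1001", 'Widget, 4" wheel', "$5.00", "none", "http://example.com/item-1"],
    ["1002", "Plain bracket", "$2.50", "2", "http://example.com/item-2"],
]

csv_text = rows_to_csv(sample)
print(csv_text)
```

With this approach the description column survives commas and quotes intact when the file is imported into a spreadsheet, at the cost of one extra import.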