Tuesday, 27 August 2013

Using Scrapy to scrape an ASP.NET page with AJAX requests

I'm fairly new to Python and I've been using Scrapy to scrape an ASP.NET
page that makes AJAX requests. I tried to follow this tutorial as a rough
guide to what I want to achieve. The main difference between the tutorial's
page and the one I'm trying to scrape is the page navigation link. In the
tutorial, we have: Next >> and therefore it tells us to pass the name of the
anchor to __EVENTTARGET. The page I want to scrape does not call the
JavaScript function __doPostBack explicitly. Instead it does the following:
Since the call is different, I don't know what values should be passed to
__EVENTTARGET and __EVENTARGUMENT to navigate between pages, or how to pass
them correctly.
Here's my code:
import re

# Assumes br is a mechanize.Browser() and response is the result of an
# earlier br.open(...) call; neither is shown in this snippet.
for i in range(10):
    html = response.read()
    print("Page %d :" % i)
    br.select_form("aspnetForm")
    print(br.form)
    br.set_all_readonly(False)  # make the hidden fields writable
    mnext = re.search('<input type="button" class="next" name="next" '
                      'onclick="return false;">', html)
    if not mnext:
        print("button not found, breaking\n\n")
        break
    # this was changed. It's probably wrong...
    br["__EVENTTARGET"] = mnext.group(0)
    br["__EVENTARGUMENT"] = ""
    #br.find_control("btnSearch").disabled = True #commented
    response = br.submit()
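
For comparison, here is a minimal sketch of what a hand-built postback
usually looks like: copy every hidden field of the form (__VIEWSTATE and
friends) and set __EVENTTARGET to the name of a control rather than to the
matched HTML. It uses only the standard library and is independent of the
mechanize loop above. The sample HTML, the Search.aspx action, the field
values, and the helper names HiddenFieldParser and build_postback are all
made up for illustration. Using "next" as the target is also only an
assumption: since the button's onclick just returns false, the real request
is probably sent by script attached elsewhere, so the actual values should
be read from the request the browser sends.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback(html, event_target, event_argument=""):
    """Return the form fields for a simulated __doPostBack call.

    event_target is the name of the control that triggers the postback
    (an assumption here, not something taken from the real page).
    """
    parser = HiddenFieldParser()
    parser.feed(html)
    data = dict(parser.fields)  # __VIEWSTATE, __EVENTVALIDATION, ...
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    return data


# Hypothetical page fragment, for illustration only.
SAMPLE_HTML = """
<form name="aspnetForm" method="post" action="Search.aspx">
  <input type="hidden" name="__VIEWSTATE" value="abc123" />
  <input type="hidden" name="__EVENTVALIDATION" value="xyz789" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="hidden" name="__EVENTARGUMENT" value="" />
  <input type="button" class="next" name="next" onclick="return false;">
</form>
"""

payload = build_postback(SAMPLE_HTML, "next")
body = urlencode(payload)  # URL-encoded body for the POST request
```

The resulting payload (or body) is what would be POSTed back to the form's
action URL. If the page updates through an ASP.NET UpdatePanel, the request
may need extra fields and headers, which this sketch does not cover.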
