Using Scrapy to scrape an ASP.NET page with AJAX requests
I'm fairly new to Python and I've been using Scrapy to scrape an ASP.NET
page with AJAX requests. I tried to follow this tutorial as a guideline
for what I want to achieve. The main difference between the tutorial's page
and the one I'm trying to scrape is the page navigation link. In the tutorial,
we have: Next >> and therefore it tells us to pass the name of the anchor
to __EVENTTARGET. The page I want to scrape does not call the JavaScript
function __doPostBack explicitly. Instead it does the following:
Since the call is different, I don't know which values should be passed to
__EVENTTARGET and __EVENTARGUMENT to navigate between pages, or how to pass
them correctly.
Here's my code:
import re

# `br` is assumed to be a mechanize.Browser that has already opened the page,
# and `response` the response it returned (setup not shown).
for i in range(10):
    html = response.read()
    print("Page %d :" % i)
    br.select_form("aspnetForm")
    print(br.form)
    br.set_all_readonly(False)  # hidden ASP.NET fields are read-only by default
    mnext = re.search(
        '<input type="button" class="next" name="next" onclick="return false;">',
        html)
    if not mnext:
        print("button not found, breaking\n\n")
        break
    br["__EVENTTARGET"] = mnext.group(0)  # this was changed. It's probably wrong...
    br["__EVENTARGUMENT"] = ""
    # br.find_control("btnSearch").disabled = True  # commented
    response = br.submit()
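
For reference, __doPostBack(eventTarget, eventArgument) only copies its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and submits the form, so the server expects a control name there, not the HTML of the element as in mnext.group(0) above. The sketch below shows one way to assemble such a POST payload from a page's hidden fields using only the standard library. The helper names HiddenFieldParser and build_postback are made up for illustration, the sample HTML is invented, and using "next" as the event target is an assumption: the real value has to be read from the JavaScript the page attaches to the button.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


def build_postback(html, event_target, event_argument=""):
    """Return the form data a __doPostBack(event_target, event_argument) call would send.

    Hypothetical helper: it keeps hidden state such as __VIEWSTATE untouched
    and overrides only the two event fields.
    """
    parser = HiddenFieldParser()
    parser.feed(html)
    data = dict(parser.fields)
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    return data


# Invented sample page; a real page carries a much longer __VIEWSTATE value.
sample = """
<form name="aspnetForm" method="post">
<input type="hidden" name="__VIEWSTATE" value="abc123" />
<input type="hidden" name="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" value="" />
<input type="button" class="next" name="next" onclick="return false;">
</form>
"""

# "next" is an assumed target name, used here only to show the mechanics.
payload = build_postback(sample, "next")
```

In a Scrapy spider, a dictionary like this could be passed as formdata to scrapy.FormRequest.from_response(response, formdata=payload, dont_click=True), which reads the hidden fields from the response itself and lets formdata override them. Whether the server accepts the request still depends on finding the right target name in the page's scripts.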