I am scraping a web page and want to extract "Shinjuku" from the data below.
<td>1</td>,<td class="stationName"><a href="http://www.jreast.co.jp/estation/station/info.aspx?StationCD=866">Shinjuku</a></td>,>355,778>/td>41>>>60>>
I tested the regular expression I wrote in an online regular expression checker. After confirming that it matched, I ran the following code:
import re
data='<td>1</td>,<td class="stationName"><a href="http://www.jreast.co.jp/estation/station/info.aspx?StationCD=866">Shinjuku</a>/td>;,<355,778>/td>,>41>>60;'
# The lookbehind contains ".*", which can match any number of characters (variable width)
r=re.findall('(?<=(<td class="stationName"><a href=".*">))(.*?)(?=</a>)', data)
findall raises the following error:
------------------------------------------------------------------------------------------------
error                                     Traceback (most recent call last)
<ipython-input-24-a625cd10ed2c> in <module>()
      1 # station_name_list=[ ]
----> 2 r=re.findall('(?<=(<td class="stationName"><a href=".*">))(.*?)(?=</a>)', data2[0])
      3 
      4 for num in r:
      5     station_name_list.append(num[1])

4 frames
/usr/lib/python3.7/sre_compile.py in _compile(code, pattern, flags)
    180         lo, hi = av[1].getwidth()
    181         if lo != hi:
--> 182             raise error("look-behind requires fixed-width pattern")
    183         emit(lo) # look behind
    184         _compile(code, av[1], flags)

error: look-behind requires fixed-width pattern
I'm not used to regular expressions. I understand that a lookbehind needs a fixed width, but I don't know how to write one here.
If anyone knows how to fix this, please let me know. Thank you in advance.
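Python's re module computes the width of each lookbehind when the pattern is compiled and rejects any lookbehind whose width can vary; the ".*" inside the lookbehind is what triggers the error. The sketch below checks that directly and shows one fixed-width alternative, (?<=">), applied to a shortened copy of the row above. The compiles() helper is hypothetical, written only for this illustration, and the sketch assumes the only '">' sequences before "</a>" are the ones in this snippet; on a full page the same pattern may also pick up other links.

```python
import re

def compiles(pattern):
    """Hypothetical helper: True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Shortened copy of the row from the question (trailing cells omitted).
data = ('<td>1</td>,<td class="stationName">'
        '<a href="http://www.jreast.co.jp/estation/station/info.aspx?StationCD=866">'
        'Shinjuku</a></td>')

# Variable width: ".*" can match any number of characters, so re rejects it
# with "look-behind requires fixed-width pattern".
variable_width = r'(?<=<a href=".*">)(.*?)(?=</a>)'
print(compiles(variable_width))  # False

# Fixed width: '">' is always exactly two characters, so this compiles.
# [^<]* cannot cross a tag, so the position right after class="stationName">
# (which is followed by "<a", not "</a>") produces no match.
fixed_width = r'(?<=">)([^<]*)(?=</a>)'
print(compiles(fixed_width))          # True
print(re.findall(fixed_width, data))  # ['Shinjuku']
```

The third-party regex package does accept variable-width lookbehinds, but with the standard re module a plain capture group, with no lookbehind at all, is usually the simpler fix.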
Tags: python, regular-expression
It turned out I didn't need a lookbehind at all: re.findall returns only the text captured by the group, so I changed the code to the following and it worked.
import re
data='<td>1</td>,<td class="stationName"><a href="http://www.jreast.co.jp/estation/station/info.aspx?StationCD=866">Shinjuku</a>/td>;,<355,778>/td>,>41>>60;'
# The lazy ".*?" skips over the URL; the single capture group (.*?) holds the station name
r=re.findall('<td class="stationName"><a href=".*?">(.*?)</a>', data)
print(r)  # ['Shinjuku']
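Two details make the working version behave as it does. First, re.findall returns a list of strings when the pattern has exactly one capture group, but a list of tuples when it has several; the original pattern also had a capturing group inside its lookbehind, which is presumably why the loop read num[1]. Second, the href part uses the lazy ".*?" rather than ".*". The sketch below illustrates both on a hypothetical line holding two station cells; the second cell (the example.com link and "StationB") is invented purely for illustration.

```python
import re

# Hypothetical input: two station cells on one line. The first mirrors the
# question; the second (example.com link, "StationB") is invented for illustration.
html = ('<td class="stationName"><a href="http://www.jreast.co.jp/estation/station/info.aspx?StationCD=866">Shinjuku</a></td>'
        '<td class="stationName"><a href="http://example.com/b">StationB</a></td>')

# One capture group: findall returns the group's text for every match.
lazy = re.findall(r'<td class="stationName"><a href=".*?">(.*?)</a>', html)
print(lazy)  # ['Shinjuku', 'StationB']

# Greedy ".*" in href runs to the LAST '">' on the line, swallowing the first cell.
greedy = re.findall(r'<td class="stationName"><a href=".*">(.*?)</a>', html)
print(greedy)  # ['StationB']

# Two capture groups: findall returns one tuple per match, one element per group.
pairs = re.findall(r'<a href="(.*?)">(.*?)</a>', html)
print(pairs)
# [('http://www.jreast.co.jp/estation/station/info.aspx?StationCD=866', 'Shinjuku'),
#  ('http://example.com/b', 'StationB')]
```

With a single row per string the greedy and lazy versions happen to agree, so the difference only shows up once several rows end up on one line.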
© 2023 OneMinuteCode. All rights reserved.