Python - Web Crawling

1. How to use WebDriver

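element = driver.find_element_by_id("q")
=> Select the element where the search term is entered
driver.find_element_by_id("q").click()
=> Find the element with id "q" on the current page and click it
element.send_keys("날씨")
=> Type "날씨" ("weather") into the element with id "q"
element.send_keys("\n")
=> Send a newline, which acts as the Enter key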
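
The `find_element_by_*` helpers were deprecated in Selenium 4 and removed from later 4.x releases, where `driver.find_element(by, value)` is the supported form. The following is a minimal sketch of the same search flow in that style. It assumes a page whose search box has the id "q", and `search`, `box_id`, and `BY_ID` are hypothetical names for this illustration. The strategy string "id" is the value of `By.ID` in `selenium.webdriver.common.by`, written out so the sketch needs no imports.

```python
# Locator strategy string; selenium's By.ID constant has this same value.
BY_ID = "id"


def search(driver, query, box_id="q"):
    """Type `query` into the search box and press Enter.

    `driver` is assumed to be a Selenium WebDriver (or any object with a
    compatible find_element method). `search` and `box_id` are hypothetical
    names used only for this illustration.
    """
    box = driver.find_element(BY_ID, box_id)  # locate the search box
    box.click()                               # focus it
    box.send_keys(query)                      # type the search term
    box.send_keys("\n")                       # the newline acts as Enter
    return box
```

With a real browser, `driver` would be something like `webdriver.Chrome()` after `driver.get(...)` has loaded the search page, and the call would be `search(driver, "날씨")`.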

1.1. Ways to access a specific element on a web page

- find_element_by_id('html_id')
- find_element_by_name('html_name')
- find_element_by_xpath('/html/body/some/xpath')
- find_element_by_css_selector('#css > div.selector')
- find_element_by_class_name('class_name')
- find_element_by_tag_name('h1')
- find_element_by_link_text('link text')
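
In Selenium 4 the general form is `driver.find_element(by, value)`, and each helper in the list above corresponds to one strategy string. The mapping below is a sketch: `LEGACY_TO_STRATEGY` and `find` are hypothetical names, and the strings are the values of the `By` constants (`By.ID`, `By.NAME`, `By.XPATH`, `By.CSS_SELECTOR`, `By.CLASS_NAME`, `By.TAG_NAME`, `By.LINK_TEXT`).

```python
# Legacy helper name -> locator strategy string (the value of the matching
# selenium.webdriver.common.by.By constant).
LEGACY_TO_STRATEGY = {
    "find_element_by_id": "id",
    "find_element_by_name": "name",
    "find_element_by_xpath": "xpath",
    "find_element_by_css_selector": "css selector",
    "find_element_by_class_name": "class name",
    "find_element_by_tag_name": "tag name",
    "find_element_by_link_text": "link text",
}


def find(driver, legacy_name, value):
    """Look up an element via find_element, given a legacy helper name."""
    return driver.find_element(LEGACY_TO_STRATEGY[legacy_name], value)
```

For example, `find(driver, "find_element_by_tag_name", "h1")` makes the same lookup as `driver.find_element(By.TAG_NAME, "h1")`.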

2. BeautifulSoup

from bs4 import BeautifulSoup
ex1 = '''
<html>
  <head>
    <title> HTML TEST </title>
  </head>
  <body>
    <p align="center"> text1 </p>
    <p align="right"> text2 </p>
    <img src="C:\\temp\\test.png">
  </body>
</html>'''

soup = BeautifulSoup(ex1, 'html.parser')
    => Pass the HTML to BeautifulSoup for parsing and store the result in the variable soup
soup.find('p')
    => Use find on the parsed result in soup to look for a 'p' tag. Only the first match is returned.
soup.find_all('p')
    => Use find_all on the parsed result in soup to return every 'p' tag
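
The difference between the two calls is easier to see with their return values side by side. This is a small sketch that reuses the two paragraphs from the example document. The variable names (`first`, `every`, and so on) are made up for illustration.

```python
from bs4 import BeautifulSoup

html = """
<html>
  <body>
    <p align="center"> text1 </p>
    <p align="right"> text2 </p>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

first = soup.find("p")       # a single Tag, or None when nothing matches
every = soup.find_all("p")   # a list-like ResultSet with all matches

first_text = first.get_text(strip=True)
all_texts = [p.get_text(strip=True) for p in every]
alignments = [p["align"] for p in every]   # attributes are read like dict keys

print(first_text)   # text1
print(all_texts)    # ['text1', 'text2']
print(alignments)   # ['center', 'right']
```

Because `find` returns `None` when there is no match, code that chains onto its result (for example `soup.find("table").get_text()`) should check for `None` first.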

2.1. select()

soup.select('p')
soup.select('.class_name')

soup.select('parent_tag > child_tag > child_tag')
=> Put a space on each side of the > sign; older BeautifulSoup releases required it
soup.select('div > p > span')
soup.select('div > p > span')[0]
soup.select('div > p > span')[1]
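
The post does not show the page these selectors run against, so the sketch below uses hypothetical markup to illustrate what `>` does. It matches direct children only, while a plain space matches descendants at any depth. `select` returns a list, which is why the results can be indexed with `[0]` and `[1]`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for this illustration.
html = """
<div>
  <p><span>apple</span><span>banana</span></p>
  <section><p><span>nested</span></p></section>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# '>' requires each step to be a direct child of the previous one.
direct = [s.get_text() for s in soup.select("div > p > span")]

# A space allows any depth, so the span under <section> also matches.
anywhere = [s.get_text() for s in soup.select("div p span")]

first = soup.select("div > p > span")[0].get_text()
second = soup.select("div > p > span")[1].get_text()

print(direct)     # ['apple', 'banana']
print(anywhere)   # ['apple', 'banana', 'nested']
print(first)      # apple
print(second)     # banana
```

Indexing past the end of the list raises `IndexError`, so checking the length first is safer when the page structure is not guaranteed.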

soup.select('parent_tag.class_name > child_tag.class_name')
soup.select('p.name > span.store')

soup.select('#id_name')
soup.select('#fruits')

soup.select('#id_name > tag_name.class_name')
soup.select('#fruits > span.store')

soup.select('tag_name[attribute]')
soup.select('a[href]')
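
To show what the class, id, and attribute selectors above return, here is a sketch against hypothetical markup. The id `fruits` and the classes `name` and `store` come from the selectors in this section. The markup itself and the text inside it are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup built around the selectors used in this section.
html = """
<div id="fruits">
  <span class="store">market</span>
  <p class="name">apple <span class="store">corner shop</span></p>
  <a href="/list">list</a>
  <a>no link</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

by_class = [t.get_text() for t in soup.select(".store")]
by_id = soup.select("#fruits")
id_then_child = [t.get_text() for t in soup.select("#fruits > span.store")]
class_then_child = [t.get_text() for t in soup.select("p.name > span.store")]
with_href = [a["href"] for a in soup.select("a[href]")]

print(by_class)          # ['market', 'corner shop']
print(len(by_id))        # 1
print(id_then_child)     # ['market']
print(class_then_child)  # ['corner shop']
print(with_href)         # ['/list']
```

`a[href]` matches only the anchors that carry an `href` attribute, so the second `<a>` is skipped.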
