Web Scrappoing usin Python

This are the parctice scrips used to practice web scapping

bold italics

REGEX Cheatsheet

Character	Example	Definition
*	ab	Matches the previous character 0 or more times
+	a+b+	Matches the previous character 1 or more times
[ ]	[a-z]	Matches any character from a to z
[^ ]]	[a-z]	Does not matches any character from a to z
()	(ab)	A grouped subexpression, this are executed first
`\|`	`(foo\|foot)s`	or Matches one of the other expression
{m,n}	a{2,3}	Matches the preceding character, m to n
.	b.d	Matches any charater
^	^a	Indicates an expression at the begining of the sting
\	^	An escape charater
$	[A-Z]*$	Often at the of the expression it matches the end of the string
?!	^((?![A-Z]).)*$	Does not contain seomthing?? expand
?	(swimming )? pool	makes the previous expression optional
??	(swimming )? pool	lazy
(?=)	A(?=B)	look ahead Matches an A followed by a B: AB, ABC,
(?!)	A(?!B)	look ahead negatice find a expression A where B does not follows
(?<=)	(?<=B)A	look behind Find Expresion A where B preceds it
(?<!)	(?<!B)A	look behind negatice find expression A where expression B does not precced
(?>)	`(?>foo\|foot)s`	atomic groups a groupe which trows away altenative patterns if the first alternative does not match

###BeautifulSoup4

It is a Python libraby used for scrapping websites

It probably might have to be installed. I used pip-3.6 install beautifulsoup4

The beautifulSoup librabry creates a data structure out of the html document, enabiling the user to maniputale HTML tags a data objs. This is very useful if one is looking traverse links.

One can create a beautifulSoup object by passing the the html document and a parser.

soup = BaautifulSoup(html_doc, 'html_parser')

one can see the html page with:

print(soup.prettify())

GoranTopic/Web-Scrapping-with-Python

Web Scrappoing usin Python

This are the parctice scrips used to practice web scapping

REGEX Cheatsheet