Title: Easy Web Scraping
Description: The goal of 'ralger' is to facilitate web scraping in R.
Authors: Mohamed El Fodil Ihaddaden [aut, cre], Ezekiel Ogundepo [ctb], Romain François [ctb]
Maintainer: Mohamed El Fodil Ihaddaden <[email protected]>
License: MIT + file LICENSE
Version: 2.2.4
Built: 2024-11-13 05:15:14 UTC
Source: https://github.com/feddelegrand7/ralger

This function scrapes attributes from HTML elements.

Usage:
attribute_scrap(link, node, attr, askRobot = FALSE)

Arguments:
link: the link of the web page to scrape
node: the HTML element to consider
attr: the attribute to scrape
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: a character vector.

Examples:
# Scraping the class attributes of all anchor elements on the rOpenSci home page
link <- "https://ropensci.org/"
attribute_scrap(link = link, node = "a", attr = "class")

Scrape and download CSV files from a Web Page
Usage:
csv_scrap(link, path = getwd(), askRobot = FALSE)

Arguments:
link: the link of the web page
path: the path where the CSV files should be saved. Defaults to the current directory.
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: called for the side effect of downloading CSV files from a website.

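A minimal usage sketch (not from the package documentation; the URL is a placeholder, and the call is wrapped in a Not-run block because it downloads files):

## Not run:
# Hypothetical page that links to CSV files; save them to a temporary directory
csv_scrap(link = "https://example.com/datasets", path = tempdir())
## End(Not run)
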
Scrape image URLs that don't have an 'alt' attribute
Usage:
images_noalt_scrap(link, askRobot = FALSE)

Arguments:
link: the URL of the web page
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: a character vector of the URLs of images that have no "alt" attribute.

Examples:
images_noalt_scrap(link = "https://www.r-consortium.org/")

Scrape Image URLs
Usage:
images_preview(link, askRobot = FALSE)

Arguments:
link: the link of the web page
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: a character vector of image URLs.

Examples:
images_preview(link = "https://posit.co/")

Scrape Images from a Web Page
Usage:
images_scrap(link, imgpath = getwd(), extn, askRobot = FALSE)

Arguments:
link: the link of the web page
imgpath: the path where the images should be saved. Defaults to the current directory.
extn: the extension of the images to download (e.g. png, jpeg)
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: called for the side effect of downloading images.

Examples:
## Not run:
images_scrap(link = "https://posit.co/", extn = "jpg")
## End(Not run)

This function scrapes text paragraphs from a website.

Usage:
paragraphs_scrap(link, contain = NULL, case_sensitive = FALSE, collapse = FALSE, askRobot = FALSE)

Arguments:
link: the link of the web page to scrape
contain: filter the paragraphs according to the character string provided
case_sensitive: logical. Should the contain argument be case sensitive? Defaults to FALSE.
collapse: logical. If TRUE, the paragraphs are collapsed into one element and the contain argument is ignored.
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: a character vector.

Examples:
# Extracting the paragraphs displayed on the health page of the New York Times
link <- "https://www.nytimes.com/section/health"
paragraphs_scrap(link)

Scrape and download PDF files from a Web Page
Usage:
pdf_scrap(link, path = getwd(), askRobot = FALSE)

Arguments:
link: the link of the web page
path: the path where the PDF files should be saved. Defaults to the current directory.
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: called for the side effect of downloading PDF files from a website.

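A minimal usage sketch (not from the package documentation; the URL is a placeholder, and the call is wrapped in a Not-run block because it downloads files):

## Not run:
# Hypothetical page that links to PDF reports; save them to a temporary directory
pdf_scrap(link = "https://example.com/reports", path = tempdir())
## End(Not run)
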
This function scrapes one element from a website.

Usage:
scrap(link, node, clean = FALSE, askRobot = FALSE)

Arguments:
link: the link of the web page to scrape
node: the HTML or CSS selector of the element to consider; the SelectorGadget tool is highly recommended
clean: logical. Should the function clean the extracted vector? Defaults to FALSE.
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: a character vector.

Examples:
# Extracting the IMDb Top 250 movie titles
link <- "https://www.imdb.com/chart/top/"
node <- "h3.ipc-title__text"
scrap(link, node)

This function scrapes an HTML table from a website.

Usage:
table_scrap(link, choose = 1, header = TRUE, askRobot = FALSE)

Arguments:
link: the link of the web page containing the table to scrape
choose: an integer indicating which table to scrape
header: logical. Should the first row be treated as the table header? Defaults to TRUE.
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: a data frame object.

Examples:
# Extracting the Premier League 2019/2020 top scorers
link <- "https://www.topscorersfootball.com/premier-league"
table_scrap(link)

This function scrapes several elements from a website and returns them as a tibble.

Usage:
tidy_scrap(link, nodes, colnames, clean = FALSE, askRobot = FALSE)

Arguments:
link: the link of the web page to scrape
nodes: the vector of HTML or CSS selectors of the elements to consider; the SelectorGadget tool is highly recommended
colnames: the names of the expected columns
clean: logical. Should the function clean the extracted tibble? Defaults to FALSE.
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: a tidy data frame.

Examples:
# Extracting IMDb movie titles and ratings
link <- "https://www.imdb.com/chart/top/"
my_nodes <- c("a > h3.ipc-title__text", "span.ratingGroup--imdb-rating")
names <- c("title", "rating")
tidy_scrap(link, my_nodes, names)

This function scrapes titles (h1, h2 and h3 HTML tags) from a website. Useful for scraping the headlines of daily electronic newspapers.

Usage:
titles_scrap(link, contain = NULL, case_sensitive = FALSE, askRobot = FALSE)

Arguments:
link: the link of the web page to scrape
contain: filter the titles according to the character string provided
case_sensitive: logical. Should the contain argument be case sensitive? Defaults to FALSE.
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: a character vector.

Examples:
# Extracting the current titles of the New York Times
link <- "https://www.nytimes.com/"
titles_scrap(link)

This function scrapes web links from a website.

Usage:
weblink_scrap(link, contain = NULL, case_sensitive = FALSE, askRobot = FALSE)

Arguments:
link: the link of the web page to scrape
contain: filter the web links according to the character string provided
case_sensitive: logical. Should the contain argument be case sensitive? Defaults to FALSE.
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: a character vector.

Examples:
# Extracting the web links within the World Bank research and publications page
link <- "https://www.worldbank.org/en/research"
weblink_scrap(link)

Scrape and download Excel xls files from a Web Page
Usage:
xls_scrap(link, path = getwd(), askRobot = FALSE)

Arguments:
link: the link of the web page
path: the path where the Excel xls files should be saved. Defaults to the current directory.
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: called for the side effect of downloading Excel xls files from a website.

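A minimal usage sketch (not from the package documentation; the URL is a placeholder, and the call is wrapped in a Not-run block because it downloads files):

## Not run:
# Hypothetical page that links to xls workbooks; save them to a temporary directory
xls_scrap(link = "https://example.com/workbooks", path = tempdir())
## End(Not run)
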
Scrape and download Excel xlsx files from a Web Page
Usage:
xlsx_scrap(link, path = getwd(), askRobot = FALSE)

Arguments:
link: the link of the web page
path: the path where the Excel xlsx files should be saved. Defaults to the current directory.
askRobot: logical. Should the function consult robots.txt to check whether scraping the page is allowed? Defaults to FALSE.

Value: called for the side effect of downloading Excel xlsx files from a website.

Examples:
## Not run:
xlsx_scrap(
  link = "https://www.rieter.com/investor-relations/results-and-presentations/financial-statements"
)
## End(Not run)