Channel: configuration - Forum - FlexGet

New User looking for help with RSS and HTML scraping setup


@pgod wrote:

I am looking for advice on how to configure FlexGet for the following. My source is an RSS feed whose entries link to webpages. Each webpage lists multiple qualities for a particular episode of a series, with links to one or more hosters for each quality. Ideally, I would like to accept the best quality for each entry, pick the hoster based on some sort of priority list, and pass the result to a list for manual approval. Once approved, the link should be output to a crawljob file for JDownloader 2.
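To make the selection step concrete, here is a minimal sketch of what I mean by "best quality, then preferred hoster". It is plain Python and does not use any FlexGet API; the `Link` class, the two priority lists, and the example hoster names are all made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical priority lists: earlier entries are preferred.
QUALITY_PRIORITY = ["1080p", "720p", "480p"]
HOSTER_PRIORITY = ["hoster-a.example", "hoster-b.example"]


@dataclass(frozen=True)
class Link:
    """One scraped download link (illustrative structure, not a FlexGet type)."""
    url: str
    quality: str
    hoster: str


def _rank(value, priority):
    """Return the position of value in priority; unknown values sort last."""
    try:
        return priority.index(value)
    except ValueError:
        return len(priority)


def pick_best(links):
    """Pick the link with the best quality, breaking ties by hoster priority."""
    if not links:
        return None
    return min(
        links,
        key=lambda link: (
            _rank(link.quality, QUALITY_PRIORITY),
            _rank(link.hoster, HOSTER_PRIORITY),
        ),
    )
```

Quality is compared first and the hoster only breaks ties, so a 1080p link on a lower-priority hoster still beats a 720p link on the preferred one.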

With the above in mind, I have written Python code that takes the link from the RSS feed and scrapes the links from the HTML of the associated webpage. I adapted the code from the HTML plugin, and I now have two versions: one that functions like the HTML plugin, and one adapted into a urlrewriter plugin.

This is where I hit a bit of a snag. When an item appears on the RSS feed, the webpage may or may not have all of the hoster/quality URLs populated yet. What I would like to do is have one task that pulls in the links from the RSS feed and saves them to multiple lists, and then a task for each list (each running on a different schedule) that processes and reprocesses the webpage looking for new links.

This is where I am stuck:

If I use my urlrewriter plugin, is there any way to prevent it from processing the URLs from the RSS feed in the initial scraping task?

If I use the HTML plugin equivalent, is there a way to feed it URLs from a list or from another task (or a way to modify the plugin so that it accepts links piped from a source)?

Additionally, if I am not using URL rewriting, is there a way to make the RSS plugin stop complaining that it is finding links to a webpage?
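For reference, the "reprocess the webpage looking for new links" step boils down to something like the sketch below. This is not my actual plugin code and it does not touch FlexGet at all; it is a standard-library illustration, and the function names and the `seen` set are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html_text):
    """Return all anchor hrefs found in html_text, in document order."""
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return parser.links


def new_links(html_text, seen):
    """Return http(s) links not in seen, then record them in seen.

    seen is a set of URLs found on earlier runs (assumed to be persisted
    somewhere between runs; that part is not shown here).
    """
    candidates = [
        url for url in extract_links(html_text)
        if urlparse(url).scheme in ("http", "https")
    ]
    # dict.fromkeys removes duplicates while keeping document order.
    fresh = [url for url in dict.fromkeys(candidates) if url not in seen]
    seen.update(fresh)
    return fresh
```

Each scheduled run would call `new_links` with the freshly downloaded page and the stored set, so only hoster links that appeared since the previous run come back.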

Posts: 1

Participants: 1
