Channel: configuration - Forum - FlexGet

How do I call a sub-page from the main page?


@quekky wrote:

Some sites keep their links several pages deep. After fetching the main page, I get a list of pages that contain the links. How do I crawl those second-level pages?

I tried this, but it gives me the error "unknown url type: '{{url}}'":

tasks:
  feed:
    rss: http://somewordpresssite.com/feed/
    accept_all: yes
    exec: echo "Got wordpress page {{title}} - {{url}}"
    template: crawlpage

templates:
  crawlpage:
    html: "{{url}}"
    regexp:
      accept:
        - sometext
      from: title
    exec:
      - echo "Got link {{title}} - {{url}}"
      - my_own_script.sh "{{url}}"

Another configuration I tried:

tasks:
  feed:
    html: https://somesite.com/
    accept_all: yes
    exec: echo "Got page {{title}} - {{url}}"
    list_add:
      - entry_list: pages

  crawlpage:
    entry_list: pages
    html: "{{url}}"
    regexp:
      accept:
        - sometext
      from: title
    exec:
      - echo "Got link {{title}} - {{url}}"
      - my_own_script.sh "{{url}}"

There are a few sites I want to process that work in a similar way.
On some sites the links are 3 or 4 levels deep.
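
To make the desired behavior concrete, the multi-level crawl described above can be sketched in plain Python, outside FlexGet. This is only an illustration of the idea, not a FlexGet feature or configuration: the names `crawl`, `extract_links`, `fetch_over_http`, and the `levels` parameter are hypothetical, and it assumes the final links can be picked out by matching a regular expression against their link text (like `sometext` in the configs above).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs from the <a> tags of one page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html, base_url):
    """Return absolute (url, text) pairs for every link in the page."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return [(urljoin(base_url, href), text) for href, text in parser.links]


def fetch_over_http(url):
    """Default fetcher (hypothetical helper): download a page as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch, levels, pattern):
    """Follow links `levels` pages below start_url, then return the
    (text, url) pairs on those pages whose link text matches `pattern`."""
    pages = [start_url]
    seen = {start_url}
    for _ in range(levels):
        next_pages = []
        for page in pages:
            for url, _text in extract_links(fetch(page), page):
                if url not in seen:  # avoid fetching the same page twice
                    seen.add(url)
                    next_pages.append(url)
        pages = next_pages
    matches = []
    for page in pages:
        for url, text in extract_links(fetch(page), page):
            if re.search(pattern, text):
                matches.append((text, url))
    return matches
```

Under these assumptions, `crawl("https://somesite.com/", fetch_over_http, 1, "sometext")` would cover the two-level case, and a site with links 3 or 4 levels deep would use a larger `levels` value. The `fetch` argument is passed in so the logic can be tried against canned HTML without network access.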
