Scraping specific html content

I keep testing your application and I really like it so far :). Now I do have a question!

I have a site that I wish to read:

The HTML is as follows:

<span id="AjaxCon96" class="segment">
<a onclick="DoEdit('OSG016536');" class="blue">
<span>There is some text here</span>
</a>
</span>

I can easily use the JQuery Selector reader on AjaxCon96 to retrieve “There is some text here”.
However, I need to access the onclick value in the HTML.

The HTML Data Attribute brick won’t give me the HTML either.
The HTML element brick reader accesses the entire body of the page.
The HTML Reader brick pulls the entire source code of the site.

What are my options for accessing this specific part of the page’s HTML?
If I were to store the entire HTML in a variable, which brick could I use to transform it so that I end up with “OSG016536”?

Thanks in advance for your help!

Hey @Sufian ,
Thanks for asking this question. A lot of people wonder the same thing:
“How can I grab some specific value from within the HTML of a page”

My question to you:
Is the DoEdit() function always present on the page’s HTML?

We could use RegEx to extract the string inside DoEdit('OSG016536');

Is there a publicly available page that I could consult to provide you with a better answer and run some tests?

Thanks

There is no publicly available website. However, the source code of my page may be a lot (around 10 KB of text), so it’s a big variable :)

This specific string within span AjaxCon96 only occurs once altogether.
Yes, it is always there as DoEdit('OSG01XXXX'). The last four digits change and are an identifier.

Would that help you point me towards the right regular expression?

Absolutely!

Use this value in the RegEx brick if you can - otherwise I am happy to send you a video walkthrough:
DoEdit\('(?<id>.*)'\)
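
For anyone who wants to try the pattern outside the brick, here is a minimal sketch in plain JavaScript. It assumes the RegEx brick uses JavaScript-style regular expressions, which is what the `(?<id>...)` named group suggests. The `html` string is the snippet from the question, and the `extractId` helper is a hypothetical name used only for illustration. The sketch also swaps `.*` for `[^']*`, a slightly stricter variation: it stops at the first closing quote, so the match cannot run on if another `')` appears later on the same line.

```javascript
// Sample markup copied from the question above.
const html = `<span id="AjaxCon96" class="segment">
<a onclick="DoEdit('OSG016536');" class="blue">
<span>There is some text here</span>
</a>
</span>`;

// Hypothetical helper: returns the identifier passed to DoEdit(),
// or null when the page source contains no such call.
function extractId(source) {
  // [^']* matches everything up to the first closing quote,
  // whereas the greedy .* would match up to the last ') on the line.
  const match = source.match(/DoEdit\('(?<id>[^']*)'\)/);
  return match ? match.groups.id : null;
}

console.log(extractId(html)); // prints "OSG016536"
```

With a single DoEdit() call on the page, as described above, both variants return the same identifier. The stricter character class only matters if a second call ever ends up on the same line.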

Orlando, that worked like a charm! Thank you so much!

That’s great to hear @Sufian - if you have other questions, feel free to open a new thread or ask directly on our community Slack.