HackerRank Build a Stack Exchange Scraper Solution

Hello Programmers! In this post, you will learn how to solve the HackerRank problem Build a Stack Exchange Scraper. This problem is part of the HackerRank Regex series.

One more thing: don't look at the solutions straight away. First try to solve the problem yourself, and turn to the solutions only if you are still stuck after several attempts. We will solve this Regex problem using C++, Java, Python, JavaScript, and PHP.



Problem

Stack Exchange is an information powerhouse that contains libraries of crowdsourced problems (with answers) across a large number of topics as diverse as electronics, cooking, and programming.

We are greatly interested in crawling and scraping as many questions as we can from Stack Exchange. This is an example of a question library page from Stack Exchange.

Your task is to scrape the questions from each library page, in the order in which they are listed. You will be provided with the markup of question listing pages, from which you need to detect:
(1) Identifier (2) Question text (the text of the hyperlink to the question) (3) How long ago the question was asked.
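
As a rough illustration of the three fields listed above, the following Python sketch pulls each one out of a single question block with its own regular expression. The names `ID_RE`, `TEXT_RE`, `TIME_RE`, and `extract_fields` are hypothetical, and the patterns assume the attribute layout seen in the sample markup (the `class` attribute closes the opening tag); they are not part of the problem statement.

```python
import re

# Hypothetical pattern names; each captures one of the three required fields.
# Assumption: attributes appear as in the sample markup.
ID_RE = re.compile(r'id="question-summary-(\d+)"')
TEXT_RE = re.compile(r'class="question-hyperlink"[^>]*>(.*?)</a>', re.DOTALL)
TIME_RE = re.compile(r'class="relativetime"[^>]*>(.*?)</span>', re.DOTALL)


def extract_fields(block):
    """Return (identifier, question text, relative time) from one question block."""
    ident = ID_RE.search(block).group(1)
    text = TEXT_RE.search(block).group(1).strip()
    when = TIME_RE.search(block).group(1).strip()
    return ident, text, when


# A shortened block built from the second question of the sample markup.
snippet = (
    '<div class="question-summary" id="question-summary-80405">'
    '<h3><a href="/questions/80405/5v-regulator-power-dissipation" '
    'class="question-hyperlink">5V Regulator Power Dissipation</a></h3>'
    'asked <span title="2013-08-27 21:39:31Z" class="relativetime">11 hours ago</span>'
    '</div>'
)
print(extract_fields(snippet))
# prints ('80405', '5V Regulator Power Dissipation', '11 hours ago')
```

Note that `search` returns `None` when a field is missing, so this sketch would raise an error on a malformed block; a real submission may want to guard against that.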

The markup in the test cases will be similar to the sample fragment shown below. Please note that, since this is real markup from the website, it is likely to contain some stray control and escape characters, unexpected whitespace, and newlines.

Sample Markup Fragment

        <div class="question-summary" id="question-summary-80407">
        <div class="statscontainer">
            <div class="statsarrow"></div>
            <div class="stats">
                <div class="vote">
                    <div class="votes">
                        <span class="vote-count-post "><strong>2</strong></span>
                        <div class="viewcount">votes</div>
                    </div>
                </div>
                <div class="status answered">
                    <strong>1</strong>answer
                </div>
            </div>



    <div class="views " title="60 views">
                        60 views
    </div>
        </div>
        <div class="summary">
            <h3><a href="/questions/80407/about-power-supply-of-opertional-amplifier" class="question-hyperlink">about power supply of opertional amplifier</a></h3>
            <div class="excerpt">
                I am constructing an operational amplifier as shown in the following figure. I use a batter as supplier for the OP Amp and set it up as a non-inverting amp circuit. I saw that the output was clipped ...
            </div>

            <div class="tags t-op-amp">
                <a href="/questions/tagged/op-amp" class="post-tag" title="show questions tagged 'op-amp'" rel="tag">op-amp</a>

            </div>
            <div class="started fr">


        <div class="user-info ">
            <div class="user-action-time">


                        asked <span title="2013-08-27 21:49:14Z" class="relativetime">11 hours ago</span>
            </div>
            <div class="user-gravatar32">
                <a href="/users/17060/user1285419"><div class=""><img src="https://www.gravatar.com/avatar/08ee68b20a4eceff26f7eee99b708c08?s=32&d=identicon&r=PG" alt="" width="32" height="32"></div></a>
            </div>
            <div class="user-details">
                <a href="/users/17060/user1285419">user1285419</a><br>
                <span class="reputation-score" title="reputation score" dir="ltr">165</span><span title="5 bronze badges"><span class="badge3"></span><span class="badgecount">5</span></span>
            </div>
        </div>

            </div>
        </div>
    </div>

    <div class="question-summary" id="question-summary-80405">
        <div class="statscontainer">
            <div class="statsarrow"></div>
            <div class="stats">
                <div class="vote">
                    <div class="votes">
                        <span class="vote-count-post "><strong>4</strong></span>
                        <div class="viewcount">votes</div>
                    </div>
                </div>
                <div class="status answered-accepted">
                    <strong>2</strong>answers
                </div>
            </div>



    <div class="views " title="64 views">
                        64 views
    </div>
        </div>
        <div class="summary">
            <h3><a href="/questions/80405/5v-regulator-power-dissipation" class="question-hyperlink">5V Regulator Power Dissipation</a></h3>
            <div class="excerpt">
                I am using a 5V regulator (LP2950) from ON Semiconductor. I am using this for USB power and I'm feeding in 9V from an adapter. USB requires maximum of 500mA right? So the maximum power dissipation in ...
            </div>

            <div class="tags t-voltage-regulator t-surface-mount t-heatsink t-5v t-power-dissipation">
                <a href="/questions/tagged/voltage-regulator" class="post-tag" title="show questions tagged 'voltage-regulator'" rel="tag">voltage-regulator</a> <a href="/questions/tagged/surface-mount" class="post-tag" title="show questions tagged 'surface-mount'" rel="tag">surface-mount</a> <a href="/questions/tagged/heatsink" class="post-tag" title="show questions tagged 'heatsink'" rel="tag">heatsink</a> <a href="/questions/tagged/5v" class="post-tag" title="show questions tagged '5v'" rel="tag">5v</a> <a href="/questions/tagged/power-dissipation" class="post-tag" title="show questions tagged 'power-dissipation'" rel="tag">power-dissipation</a>

            </div>
            <div class="started fr">


        <div class="user-info ">
            <div class="user-action-time">


                        asked <span title="2013-08-27 21:39:31Z" class="relativetime">11 hours ago</span>
            </div>
            <div class="user-gravatar32">
                <a href="/users/10082/david-norman"><div class=""><img src="https://www.gravatar.com/avatar/8b073417e471077280b3fc5ff2eaf1f7?s=32&d=identicon&r=PG" alt="" width="32" height="32"></div></a>
            </div>
            <div class="user-details">
                <a href="/users/10082/david-norman">David Norman</a><br>
                <span class="reputation-score" title="reputation score" dir="ltr">322</span><span title="3 silver badges"><span class="badge2"></span><span class="badgecount">3</span></span><span title="10 bronze badges"><span class="badge3"></span><span class="badgecount">10</span></span>
            </div>
        </div>

            </div>
        </div>
    </div>

Output Format
The output file should contain N lines, where N is the number of questions you have identified in the provided fragment. Each line contains the identifier, the question text, and the (relative) time when the question was asked, separated by semicolons, with no leading or trailing spaces around any field. The questions in the output must appear in the same order as in the original markup.
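
A minimal Python sketch of this output format is shown here. The helper name `format_line` is hypothetical. Collapsing internal runs of whitespace is an assumption on top of the stated rule, which only requires that no field has leading or trailing spaces.

```python
import re


def format_line(identifier, text, when):
    """Join the three fields with semicolons after cleaning each one."""
    fields = (identifier, text, when)
    # Assumption: collapse internal whitespace runs (including newlines)
    # to one space, then trim both ends of every field.
    cleaned = [re.sub(r"\s+", " ", field).strip() for field in fields]
    return ";".join(cleaned)


print(format_line("80405", "  5V Regulator Power\n Dissipation ", " 11 hours ago "))
# prints 80405;5V Regulator Power Dissipation;11 hours ago
```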

Sample Output

80407;about power supply of opertional amplifier;11 hours ago
80405;5V Regulator Power Dissipation;11 hours ago

Explanation
The given markup fragment points to two questions on electronics.stackexchange.com (at the time the markup was noted).
The first question has ID 80407, its text is “about power supply of opertional amplifier” (spelled exactly as in the markup), and it was asked “11 hours ago” (relative to the time when this markup was noted). Search for these values in the given markup fragment to see where each one comes from. The second question has ID 80405, its text is “5V Regulator Power Dissipation”, and it was asked “11 hours ago” (relative to the time when this markup was noted).

A Note Regarding the Test Cases
The markup in the test cases will resemble the markup fragment provided above; however, each markup fragment might contain a larger number of questions. A markup fragment will have no more than 100 questions embedded in it.
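
Since a fragment can hold many questions, one way to keep the three fields of each question together is a single pattern that spans a whole question block, similar in spirit to the PHP solution further down. The Python sketch below is only an illustration under the assumption that every block contains all three fields in the order id, link, time; if a field were missing, the lazy match could run into the next question. The names `QUESTION_RE` and `scrape` are hypothetical, and the sample is shortened with made-up question texts.

```python
import re

# One pattern per question block. DOTALL lets '.' cross newlines, and the
# lazy '.*?' stops at the first following hyperlink and time span.
QUESTION_RE = re.compile(
    r'id="question-summary-(\d+)"'
    r'.*?class="question-hyperlink"[^>]*>(.*?)</a>'
    r'.*?class="relativetime"[^>]*>(.*?)</span>',
    re.DOTALL,
)


def scrape(markup):
    """Return one 'id;text;time' line per question, in document order."""
    return [
        f"{ident};{text.strip()};{when.strip()}"
        for ident, text, when in QUESTION_RE.findall(markup)
    ]


# Shortened two-question sample with made-up texts.
sample = """
<div class="question-summary" id="question-summary-80407">
  <h3><a href="/questions/80407/x" class="question-hyperlink">First question</a></h3>
  asked <span title="2013-08-27 21:49:14Z" class="relativetime">11 hours ago</span>
</div>
<div class="question-summary" id="question-summary-80405">
  <h3><a href="/questions/80405/y" class="question-hyperlink">Second question</a></h3>
  asked <span title="2013-08-27 21:39:31Z" class="relativetime">12 hours ago</span>
</div>
"""
for line in scrape(sample):
    print(line)
```

Because `findall` scans from left to right, the lines come out in the same order as the questions appear in the markup.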

HackerRank Build a Stack Exchange Scraper Solution in Cpp

#include <stdio.h>
#include <string.h>
const char * p1 = "question-summary-";
const char * p2 = "question-hyperlink";
const char * p3 = "relativetime";

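// setPalavra ("set word") copies the word that starts at in[k] into out.
void setPalavra(char * in, char * out, int k, int size);
// letra ("letter") reports whether a is a word character: a letter or '-'.
bool letra(char a);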

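int main() {
	// ent holds one input line, aux the current word,
	// saida ("output") the three fields: id, question text, relative time.
	static char ent[8192], aux[8192];
	int size;
	static char saida[3][8192];
	
	// fgets replaces gets, which cannot bound its writes and was removed in C++14.
	while( fgets(ent, sizeof(ent) - 1, stdin) != NULL ) {
		size = (int)strlen(ent);
		// Drop the trailing line break so it does not end up in a field.
		while(size > 0 && (ent[size-1] == '\n' || ent[size-1] == '\r')) size--;
		// Sentinel: a non-letter at the end guarantees setPalavra terminates aux.
		ent[size++] = '.';
		ent[size] = 0;
		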
		
		for(int i=0; i<size; i++) if(letra(ent[i])) {
			setPalavra(ent, aux, i, size);
			
			if(!strcmp(aux, p1)) {
				int a = 0;
				for(int j=i+17; ent[j] != '\"'; j++) {
					saida[0][a++] = ent[j];
				}
				saida[0][a] = 0;
			} else if(!strcmp(aux, p2)) {
				int a = 0;
				for(int j=i+20; ent[j] != '<'; j++) {
					saida[1][a++] = ent[j];
				}
				saida[1][a] = 0;
			} else if(!strcmp(aux, p3)) {
				int a = 0;
				for(int j=i+14; ent[j] != '<'; j++) {
					saida[2][a++] = ent[j];
				}
				saida[2][a] = 0;
				
				printf("%s;%s;%s\n", saida[0], saida[1], saida[2]);
			}
			
			i += strlen(aux)-1;
		}
	}
}

void setPalavra(char * in, char * out, int k, int size) {
	int a = 0;
	
	for(int i=k; i<size; i++) {
		if(letra(in[i])) {
			out[a++] = in[i];
		} else {
			out[a] = 0;
			return;
		}
	}
}

bool letra(char a) {
	if((a >= 'a' && a <= 'z') || (a >= 'A' && a <= 'Z') || a == '-')
		return true;
	else
		return false;
}

HackerRank Build a Stack Exchange Scraper Solution in Java

import java.io.*;
import java.util.*;
import java.text.*;
import java.math.*;
import java.util.regex.*;

public class Solution {

    public static void main(String[] args) {
        /* Enter your code here. Read input from STDIN. Print output to STDOUT. Your class should be named Solution. */
        Scanner in = new Scanner(new BufferedInputStream(System.in));
        // Group 1: numeric ID from the question link; group 2: the link text.
        // Tag links (/questions/tagged/...) do not match because digits must follow "/questions/".
        String format1 = "<a href=\"/questions/([0-9]+).*>(.*)</a>";
        // Group 1: the relative time inside the "relativetime" span.
        String format2 = ".*class=\"relativetime\">(.*)</span>";
        Pattern pattern1 = Pattern.compile(format1);
        Pattern pattern2 = Pattern.compile(format2);
        ArrayList<String> ID = new ArrayList<String>();
        ArrayList<String> question = new ArrayList<String>();
        ArrayList<String> time = new ArrayList<String>();
        // hasNextLine() is the check that matches nextLine().
        while (in.hasNextLine()) {
            String assessed = in.nextLine();
            Matcher match = pattern1.matcher(assessed);
            Matcher match2 = pattern2.matcher(assessed);
            while (match.find()) {
                ID.add(match.group(1));
                question.add(match.group(2).trim());
            }
            while (match2.find()) {
                time.add(match2.group(1).trim());
            }
        }
        // The three lists are filled in document order, so index j describes one question.
        for (int j = 0; j < ID.size(); j++) {
            System.out.println(ID.get(j) + ";" + question.get(j) + ";" + time.get(j));
        }
    }
}

HackerRank Build a Stack Exchange Scraper Solution in Python

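import sys
import re

s = sys.stdin.read()
pQ = []  # question texts
pI = []  # question identifiers
pT = []  # relative times

# Question text: take each question-hyperlink anchor, then strip its tags.
patternQuestion = '<a.* class="question-hyperlink">.*</a>'
for x in re.findall(patternQuestion, s):
    pQ.append(re.sub("<[^>]*>", "", x).strip())

# Identifier: capture the digits that follow "question-summary-" in the id attribute.
patternId = 'id="question-summary-([0-9]+)'
for x in re.findall(patternId, s):
    pI.append(x)

# Relative time: take the line from its first tag onward, then strip the tags.
patternTime = '<.*relativetime.*'
for x in re.findall(patternTime, s):
    pT.append(re.sub("<[^>]*>", "", x).strip())

# The three lists are in document order, so index x describes one question.
# Python 3 syntax: range() and print() replace xrange and the print statement.
for x in range(len(pT)):
    print(pI[x] + ";" + pQ[x] + ";" + pT[x])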

HackerRank Build a Stack Exchange Scraper Solution in JavaScript

'use strict';


function processData(input) {
    var lines = input.split('\n').join(' ');

    var questionREStr = '<\\s*a[^>]+href="/questions/([0-9]+)/[^"]*"[^>]*>([^<]*)<';
    var timeREStr     = '<\\s*span[^>]+class="relativetime"[^>]*>([^<]*)<';

    var re = new RegExp('(?:' + questionREStr + '|' + timeREStr + ')', 'ig');

    var res = [];
    var arr = null;
    while ((arr = re.exec(lines)) != null) {
        if (arr[1] !== undefined && arr[2] !== undefined) {
            res.push({id: arr[1], text: arr[2].trim() });
        }
        if (arr[3] !== undefined) {
            res[res.length - 1].time = arr[3].trim();
        }
    }

    res.forEach(function (o) {
        console.log(o.id + ';' + o.text + ';' + o.time);
    });
}


process.stdin.resume();
process.stdin.setEncoding("ascii");
var _input = "";
process.stdin.on("data", function (input) { _input += input; });
process.stdin.on("end", function () { processData(_input); });

HackerRank Build a Stack Exchange Scraper Solution in PHP

<?php
	$f = fopen( 'php://stdin', 'r' );
	$markup = "";
    while( $line = fgets( $f ) ) $markup .= $line;
	fclose( $f );
	
	$matches = array();
    $regEx = '/class="question-summary" id="question-summary-([0-9]*)">.*class="question-hyperlink">(.*)<\/a>.*class="relativetime">(.*)<\/span>/siU';
	preg_match_all( $regEx, $markup, $matches );
	foreach( $matches[ 1 ] as $key => $id ) print $id . ";" . $matches[ 2 ][ $key ] . ";" . $matches[ 3 ][ $key ] . "\n";
?>

Disclaimer: This problem (Build a Stack Exchange Scraper) is generated by HackerRank but the solution is provided by Chase2learn. This tutorial is only for Educational and Learning purposes.

FAQ:

1. How do you solve the first question in HackerRank?

To solve your first question on HackerRank, first decide which programming language you want to practice (for example C, C++, or Java), then start with the first program, Hello World.

2. How do I find my HackerRank ID?

You will receive an email from HackerRank to confirm your access to the ID. Once you have confirmed your email, the entry will show up as verified on the settings page. You will also have an option to “Make primary”. Click on that option.

3. Does HackerRank detect cheating?

Yes, HackerRank uses a powerful tool to detect plagiarism in the candidates’ submitted code. The Test report of a candidate highlights any plagiarized portions in the submitted code and helps evaluators to verify the integrity of answers provided in the Test.

4. Does HackerRank use camera?

No. HackerRank does not use the camera for coding practice, but it uses the camera during code submission for company interviews.

5. Should I put HackerRank certificate on resume?

These certificates are useless, and you should not put them on your resume. The experience you gained from getting them is not useless. Use it to build a portfolio, and link to it on your resume. 

6. Can I retake HackerRank test?

The company which sent you the HackerRank Test invite owns your Test submissions and results. It’s their discretion to permit a reattempt for a particular Test. If you wish to retake the test, we recommend that you contact the concerned recruiter who invited you to the Test and request a re-invite. 

7. What is HackerRank?

HackerRank is a tech company that focuses on competitive programming challenges for both consumers and businesses. Developers compete by writing programs according to provided specifications.


Note: I compile all programs before posting. If a program does not work or shows an error, please let me know in the comment section.
