Solved: Identify the key properties of a web crawler describe in, JAVA Programming

Identify the key properties of a web crawler describe in

Use Crawler Java Assignment

Review, fix and run the crawler.

Add code for additional requiments.

Make sure you crawler does the following.

Test your crawler only on the data in:

https://lyle.smu.edu/~fmoore

Make sure that your crawler is not allowed to get out of this directory!!! Yes, there is a robots.txt file that must be used. Note that it is in a non-standard location.

The required input to your program is N, the limit on the number of pages to retrieve and a list of stop words (of your choosing) to exclude.

Perform case insensitive matching.

You can assume that there are no errors in the input. Your code should be robust under errors in the Web pages you're searching. If an error is encountered, feel free, if necessary, just to skip the page where it is encountered.

1. Identify the key properties of a web crawler. Describe in detail how each of these properties is implemented in your code.

2. Use your crawler to list the URL of all pages in the test data and report all out-going links of the test data. [10 points] display the contents of the tag</p> <p style="text-align: justify;">3. Implement duplicate detection, and report if any URLs refer to already seen content.</p> <p style="text-align: justify;">4. Use your crawler to list all broken links within the test data.</p> <p style="text-align: justify;">5. How many graphic files are included in the test data?</p> <p style="text-align: justify;">6. Have your crawler save the words from each page of type (.txt, .htm, .html). Make sure that you do not save HTML markup. Explain your definition of "word". In this process, give each page a unique document ID.</p> <p style="text-align: justify;">Implement Stemming</p> <p style="text-align: justify;">7. Report the 20 most common words with its document frequency. words or stemmed words?</p> <p><strong>Attachment:-</strong> <a href="https://secure.tutorsglobe.com/Atten_files/409_crawler_project.zip" target="_blank">crawler_project.zip</a></p></p> </div> <div id="viewreadmore" class="link"> <a id="readmore" href="javascript:void(0);" class="read-more-trigger mar_top10" onclick="changeheight(this)">View Complete Question</a> </div> <div id="DivSolution"> <h4> Solution Preview : </h4> <div class="seprator"> </div> <p> </p> <div class="downloadfiles"> <h5> Prepared by a verified Expert</h5> <h6> JAVA Programming: Identify the key properties of a web crawler describe in</h6> <h5> Reference No:- TGS02238162</h5> <input type="submit" name="getPaid" value="Purchase Solution File" id="getPaid" class="btn btn-success btn-lg btn-block-sm mar_btm20" /> <p> Now Priced at $70 (50% Discount)</p> </div> <div style="text-align: justify"></div> </div> </div> <div class="row"> <div class="col-sm-12 reviewbox"> <div id="PlnRated"> <div class="row recomded"> <div class="recomdedbox col-sm-2 col-xs-12"> <p class="inner"><i class="fa fa-thumbs-o-up"></i> Recommended <b>(90%)</b></p> </div> <div class="recomdedbox col-sm-2 col-xs-12"> <p class="inner rating"><i class="fa fa-star"></i> Rated <b>(4.3/5)</b></p> </div> </div> </div> <div class="row "> <div class="panel-group review" id="accordion" role="tablist" aria-multiselectable="true"> <div class="panel-heading" role="tab" id="headingTwo"> <h4 class="panel-title"> <a class="collapsed" role="button" data-toggle="collapse" data-parent="#accordion" href="#collapseTwo" aria-expanded="false" aria-controls="collapseTwo"> Have a Question? (oR Write a Review) </a> </h4> </div> <div id="collapseTwo" class="panel-collapse collapse" role="tabpanel" aria-labelledby="headingTwo"> <div class="panel-body"> <div class="col-sm-12"> <div class="row search searchbg message"> <span id="RequiredFieldValidator1" style="visibility:hidden;">Write atleast 100 words!!</span> <textarea name="txtcomments" id="txtcomments" maxlength="1000" ValidationGroup="Review" placeholder="Write your review" class="form-control" rows="6"></textarea> <div class="pull-right mar_top20"> <input type="submit" name="btnReviewSubmit" value="Submit" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("btnReviewSubmit", "", true, "Review", "", false, false))" id="btnReviewSubmit" class="btn btn-primary pull-right" /> </div> </div> </div> </div> </div> </div> </div> </div> </div> <div class="user-comments-area hidden-xs"> <h4 class="text-uppercase mar_btm20"> <i class="fa fa-question-circle"></i> Recent Questions Asked JAVA Programming</h4> <ul class="user-comments-list"> <table id="dlMaterials" cellspacing="0" style="width:100%;border-collapse:collapse;"> <tr> <td> <li> <div class="comment-box"> <h5> <span class="mar_lft5">Q :</span> <a id="dlMaterials_hypermaterial_0" class="studenthdname" href="https://www.tutorsglobe.com/question/according-to-a-recent-study93--of-high-school-dropouts-are-52238158.aspx">According to a recent study93 of high school dropouts are</a></h5> <p class="answer"> <span id="dlMaterials_lblQuestion_0">according to a recent study93 of high school dropouts are 16- to 17-year-olds in addition65 of high school dropouts</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <span class="mar_lft5">Q :</span> <a id="dlMaterials_hypermaterial_1" class="studenthdname" href="https://www.tutorsglobe.com/question/why-would-it-be-important-to-occasionally-check-your-52238159.aspx">Why would it be important to occasionally check your</a></h5> <p class="answer"> <span id="dlMaterials_lblQuestion_1">assignment web scenerio foruminstructions discuss the following below1 the role of css in htmla advantages of style</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <span class="mar_lft5">Q :</span> <a id="dlMaterials_hypermaterial_2" class="studenthdname" href="https://www.tutorsglobe.com/question/design-a-database-diagram-for-a-database-that-stores-52238160.aspx">Design a database diagram for a database that stores</a></h5> <p class="answer"> <span id="dlMaterials_lblQuestion_2">sql server 2012 assingment1 design a database diagram for a database that stores information about the downloads</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <span class="mar_lft5">Q :</span> <a id="dlMaterials_hypermaterial_3" class="studenthdname" href="https://www.tutorsglobe.com/question/write-an-essay-on-the-effects-of-internet-usage-or-lack-52238161.aspx">Write an essay on the effects of internet usage or lack</a></h5> <p class="answer"> <span id="dlMaterials_lblQuestion_3">write an essay on the effects of internet usage or lack thereof on your daily life following the steps diane wood took</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <span class="mar_lft5">Q :</span> <a id="dlMaterials_hypermaterial_4" class="studenthdname" href="https://www.tutorsglobe.com/question/identify-the-key-properties-of-a-web-crawler-describe-in-52238162.aspx">Identify the key properties of a web crawler describe in</a></h5> <p class="answer"> <span id="dlMaterials_lblQuestion_4">use crawler java assignmentreview fix and run the crawleradd code for additional requimentsmake sure you crawler does</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <span class="mar_lft5">Q :</span> <a id="dlMaterials_hypermaterial_5" class="studenthdname" href="https://www.tutorsglobe.com/question/we-toss-an-unfair-coin-100-times-in-a-row-we-play-according-52238163.aspx">We toss an unfair coin 100 times in a row we play according</a></h5> <p class="answer"> <span id="dlMaterials_lblQuestion_5">we toss an unfair coin 100 times in a row we play according to following rules if tail 1 if head -145 p head04 estimate</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <span class="mar_lft5">Q :</span> <a id="dlMaterials_hypermaterial_6" class="studenthdname" href="https://www.tutorsglobe.com/question/based-on-the-answer-from-question-9-calculate-90-confidence-52238164.aspx">Based on the answer from question 9 calculate 90 confidence</a></h5> <p class="answer"> <span id="dlMaterials_lblQuestion_6">hollow is proud of their energy saving program a sample of 29 houses reveals an average saving of 475 kilowatt hours</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <span class="mar_lft5">Q :</span> <a id="dlMaterials_hypermaterial_7" class="studenthdname" href="https://www.tutorsglobe.com/question/psyc-164--please-watch-the-following-ted-talk-there-is-some-52238165.aspx">Psyc 164 please watch the following ted talk there is some</a></h5> <p class="answer"> <span id="dlMaterials_lblQuestion_7">assignmentplease watch the following ted talk there is some overlap with my module - wish id known that before i</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <span class="mar_lft5">Q :</span> <a id="dlMaterials_hypermaterial_8" class="studenthdname" href="https://www.tutorsglobe.com/question/you-are-skeptical-of-the-business-school-claim-and-decide-52238166.aspx">You are skeptical of the business school claim and decide</a></h5> <p class="answer"> <span id="dlMaterials_lblQuestion_8">a local business school claims that its graduating seniors get higher-paying jobs than the national average for</span></p> </div>  </li> </td> </tr> </table> </ul>  </div> </div> <div class="col-md-4 col-xs-12 login-area innerpage"> <div class="row"> <div class="details col-md-12"> <div class="col-md-4"> <div class="circle orange"> <i class="fa fa-question"></i> </div> <p> 1939725 </p> <p> Questions<br /> Asked</p> </div> <div class="col-md-4"> <div class="circle yellow"> <i class="fa fa-user-secret"></i> </div> <p> 3,689</p> <p> Active Tutors</p> </div> <div class="col-md-4"> <div class="circle green"> <i class="fa fa-thumbs-o-up"></i> </div> <p> 1429257</p> <p> Questions<br /> Answered</p> </div> <p><b> Start Excelling in your courses, Ask a tutor for help and get answers for your problems !! </b></p> <a href="https://www.tutorsglobe.com/post-your-job-for-free.aspx" class="btn btn-primary btn-lg mar_top10">ask Question</a> </div> </div> <div class="row"> <div class="user-comments-area hidden-xs"> <hr /> <h4 class="text-uppercase mar_btm20"> <i class="fa fa-question-circle"></i> Asked Questions</h4> <hr /> <ul class="user-comments-list"> <table id="dlNewReviews" cellspacing="0" style="width:100%;border-collapse:collapse;"> <tr> <td> <li> <div class="comment-box"> <h5> <a id="dlNewReviews_hyperQues_0" class="studenthdname" href="https://www.tutorsglobe.com/question/identify-the-addiction-that-client-presents-with-53493203.aspx">Identify the addiction that client presents with</a></h5> <p> <span id="dlNewReviews_lblReviews_0">Identify the client (gender, age, race, profession). Identify the addiction that this client presents with. Does she have a substance addiction like alcohol, </span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <a id="dlNewReviews_hyperQues_1" class="studenthdname" href="https://www.tutorsglobe.com/question/what-are-the-perspectives-and-outcomes-of-older-adults-53493204.aspx">What are the perspectives and outcomes of older adults</a></h5> <p> <span id="dlNewReviews_lblReviews_1"> The research question: "What are the perspectives and outcomes of older adults participating in a dog walking program?"</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <a id="dlNewReviews_hyperQues_2" class="studenthdname" href="https://www.tutorsglobe.com/question/what-laboratory-study-on-aggression-tightly-controls-53493205.aspx">What laboratory study on aggression tightly controls</a></h5> <p> <span id="dlNewReviews_lblReviews_2"> A laboratory study on aggression tightly controls all variables but uses only college males. Which conclusion is MOST justified? </span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <a id="dlNewReviews_hyperQues_3" class="studenthdname" href="https://www.tutorsglobe.com/question/what-adjusted-self-concept-reflects-the-basic-concepts-of-53493206.aspx">What adjusted self-concept reflects the basic concepts of</a></h5> <p> <span id="dlNewReviews_lblReviews_3"> Aaruna believes that all fans of the Toronto Maple Leafs are strong, tough people who are loyal in any circumstance. Since he has become a fan</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <a id="dlNewReviews_hyperQues_4" class="studenthdname" href="https://www.tutorsglobe.com/question/write-a-short-rationale-for-persistent-depressive-disorder-53493207.aspx">Write a short rationale for persistent depressive disorder</a></h5> <p> <span id="dlNewReviews_lblReviews_4">Write a short rationale for Persistent depressive disorder as a differential diagnoses with rationale for L.P. and also why MDD is a better diagnoses.</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <a id="dlNewReviews_hyperQues_5" class="studenthdname" href="https://www.tutorsglobe.com/question/prioritizing-enhanced-screening-to-the-social-determinants-53493208.aspx">Prioritizing enhanced screening to the social determinants</a></h5> <p> <span id="dlNewReviews_lblReviews_5">CONDENSE: Prioritizing enhanced screening and response to the social determinants of mental health reflects their substantial impact on vulnerable populations</span></p> </div>  </li> </td> </tr><tr> <td> <li> <div class="comment-box"> <h5> <a id="dlNewReviews_hyperQues_6" class="studenthdname" href="https://www.tutorsglobe.com/question/problem-about-inequitable-social-conditions-53493209.aspx">Problem about inequitable social conditions</a></h5> <p> <span id="dlNewReviews_lblReviews_6"> Condense more: Extensive evidence indicates that inequitable social conditions significantly increase the risk of mental health disorders, particularly among</span></p> </div>  </li> </td> </tr> </table> </ul> </div> </div> </div> </div> </div> </div> </div> <script> var url = 'https://www.tutorsglobe.com/include/javascript/watiWidget.js'; var s = document.createElement('script'); s.type = 'text/javascript'; s.async = true; s.src = url; var options = { "enabled":true, "chatButtonSetting":{ "backgroundColor":"#00e785", "ctaText":"Whatsapp Support!!", "borderRadius":"25", "marginLeft": "0", "marginRight": "20", "marginBottom": "20", "ctaIconWATI":false, "position":"left" }, "brandSetting":{ "brandName":"Tutorsglobe", "brandSubTitle":"Trusted Since 2005", "brandImg":"https://www.tutorsglobe.com/include/images/chat-logo.svg", "welcomeText":"Hi there!\nDo you Need help?", "messageText":"Hello, Tutorsglobe !! I have a question!", "backgroundColor":"#00e785", "ctaText":"Chat with Whatsapp", "borderRadius":"25", "autoShow":false, "phoneNumber":"441416286080" } }; s.onload = function() { CreateWhatsappChatWidget(options); }; var x = document.getElementsByTagName('script')[0]; x.parentNode.insertBefore(s, x); </script> <footer class="site-footer"> <div class="container"> <div class="footerlinks"> <a href="https://www.tutorsglobe.com/">Home</a> | <a href="https://www.tutorsglobe.com/about-us.aspx">Company Overview</a> | <a href="https://www.tutorsglobe.com/services.aspx">Services</a> | <a href="https://www.tutorsglobe.com/library/">Discover Q&A</a> | <a href="https://www.tutorsglobe.com/sitemap.aspx">Sitemap</a> | <a href="https://www.tutorsglobe.com/contact-us.aspx">Contact Us</a> | <a href="https://www.tutorsglobe.com/terms-and-conditions.aspx">T & C</a> | <a href="https://www.tutorsglobe.com/refundcancelpolicy.aspx">Refund Policy</a> | <a href="https://www.tutorsglobe.com/copyright-infringement-policy.aspx">Copyright Policy</a> | <a href="https://www.tutorsglobe.com/blog/archive/">Blog</a> | <a href="https://www.tutorsglobe.com/library/archive.aspx">Q&A</a> | <a href="https://www.tutorsglobe.com/education-directory.aspx">Directory</a> </div> <p>Â©TutorsGlobe</a> All rights reserved 2022-2023. </p> <script type="application/ld+json"> { "@context": "http://schema.org/", "@type": "product", "name": "Tutorsglobe", "image": "https://www.tutorsglobe.com/IncludeLib/Images/logo.png", "description": "elearning Platform - Tutor Service", "brand": { "@type": "elearning", "name": "Tutorsglobe" }, "aggregateRating": { "@type": "AggregateRating", "ratingValue": "4.9", "ratingCount": "37128" } } </script> <a href="#" class="settings"><i class="fa fa-angle-up"></i></a> <ul class="social-icons"> <li><a href="https://www.facebook.com/TutorsGlobe" rel="nofollow" target="_blank"><i class="fa fa-facebook-square"></i></a></li> <li><a href="https://twitter.com/Tutorsglobe" rel="nofollow" target="_blank"><i class="fa fa-twitter-square"></i></a></li> <li><a href="#" rel="nofollow"><i class="fa fa-youtube-square"></i></a></li> <li><a href="https://www.linkedin.com/company/tutorsglobe" target="_blank" rel="nofollow"><i class="fa fa-linkedin-square"></i></a></li> </ul> </div> <script type="text/javascript"> var _gaq = _gaq || []; _gaq.push(['_setAccount', 'UA-32333066-1']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.tutorsglobe.com/IncludeLib/js/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); </script> <script async src="https://www.googletagmanager.com/gtag/js?id=G-5E9QFMFDJR"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-5E9QFMFDJR'); </script> </footer> </div>  <div class="overlay"> </div>  <script type="text/javascript" src="../IncludeLib/js/jquery-1.11.2.min.js"></script> <script type="text/javascript" src="../IncludeLib/js/bootstrap.min.js"></script> <script type="text/javascript" src="../IncludeLib/js/jquery.mCustomScrollbar.concat.min.js"></script> <script type="text/javascript" src="../IncludeLib/js/script.js"></script> <script type="text/javascript" src="../IncludeLib/js/ie10-viewport-bug-workaround.js"></script> <script type="text/javascript"> //<![CDATA[ var Page_Validators = new Array(document.getElementById("RequiredFieldValidator1")); //]]> </script> <script type="text/javascript"> //<![CDATA[ var RequiredFieldValidator1 = document.all ? document.all["RequiredFieldValidator1"] : document.getElementById("RequiredFieldValidator1"); RequiredFieldValidator1.controltovalidate = "txtcomments"; RequiredFieldValidator1.errormessage = "Write atleast 100 words!!"; RequiredFieldValidator1.validationGroup = "Review"; RequiredFieldValidator1.evaluationfunction = "RequiredFieldValidatorEvaluateIsValid"; RequiredFieldValidator1.initialvalue = ""; //]]> </script> <script type="text/javascript"> //<![CDATA[ var Page_ValidationActive = false; if (typeof(ValidatorOnLoad) == "function") { ValidatorOnLoad(); } function ValidatorOnSubmit() { if (Page_ValidationActive) { return ValidatorCommonOnSubmit(); } else { return true; } } //]]> </script> </form> </body> </html>