Identify the key properties of a web crawler describe in


Use Crawler Java Assignment

Review, fix and run the crawler.

Add code for additional requirements.

Make sure your crawler does the following.

Test your crawler only on the data in:

https://lyle.smu.edu/~fmoore

Make sure that your crawler is not allowed to get out of this directory!!! Yes, there is a robots.txt file that must be used. Note that it is in a non-standard location.
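
One way to enforce both constraints is sketched below: a scope check that rejects any URL outside the test directory, and a simplified parser for Disallow rules. The class and method names are illustrative assumptions, not part of the supplied crawler project, and the sketch deliberately does not guess where the robots.txt file lives; it only parses the file's text once you have fetched it.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Sketch (hypothetical names): keep the crawl inside one directory and honor Disallow rules. */
class CrawlScope {
    // Host and root directory come from the assignment text.
    static final String HOST = "lyle.smu.edu";
    static final String ROOT_PATH = "/~fmoore";

    /** True if the URL is on the test host and at or below the root directory. */
    static boolean inScope(String url) {
        try {
            URI uri = URI.create(url).normalize(); // collapses "." and ".." path segments
            String host = uri.getHost();
            String path = uri.getPath();
            if (host == null || path == null) return false;
            return host.equalsIgnoreCase(HOST)
                    && (path.equals(ROOT_PATH) || path.startsWith(ROOT_PATH + "/"));
        } catch (IllegalArgumentException e) {
            return false; // malformed URL: treat as out of scope
        }
    }

    /**
     * Collects Disallow prefixes listed under "User-agent: *".
     * Simplification: groups that name several user agents are not handled.
     */
    static List<String> parseDisallows(String robotsTxt) {
        List<String> disallowed = new ArrayList<>();
        boolean appliesToUs = false;
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceFirst("#.*$", "").trim(); // drop comments
            int colon = line.indexOf(':');
            if (colon < 0) continue;
            String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                appliesToUs = value.equals("*");
            } else if (field.equals("disallow") && appliesToUs && !value.isEmpty()) {
                disallowed.add(value);
            }
        }
        return disallowed;
    }

    /** True if no Disallow prefix matches the start of the path. */
    static boolean isAllowed(String path, List<String> disallowed) {
        for (String prefix : disallowed) {
            if (path.startsWith(prefix)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(inScope("https://lyle.smu.edu/~fmoore/"));
    }
}
```

Normalizing the URL before the prefix test matters: without it, a link containing ".." could pass the check while actually pointing outside the directory.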

The required inputs to your program are N, the limit on the number of pages to retrieve, and a list of stop words (of your choosing) to exclude.
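
The page limit N can be enforced in the crawl loop itself. The following is a minimal sketch, assuming a breadth-first frontier; `BoundedCrawl` and the `linksOf` callback (which stands in for fetching a page and extracting its links) are hypothetical names, not taken from the supplied project.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

/** Sketch (hypothetical names): breadth-first crawl that stops after retrieving N pages. */
class BoundedCrawl {
    /**
     * @param seed    starting URL
     * @param limit   N, the maximum number of pages to retrieve
     * @param linksOf stand-in for "fetch this page and return its out-links"
     * @return URLs in the order they were retrieved
     */
    static List<String> crawl(String seed, int limit, Function<String, List<String>> linksOf) {
        List<String> retrieved = new ArrayList<>();
        Set<String> seen = new HashSet<>();       // URLs already queued, to avoid revisits
        Deque<String> frontier = new ArrayDeque<>();
        frontier.add(seed);
        seen.add(seed);
        while (!frontier.isEmpty() && retrieved.size() < limit) {
            String url = frontier.poll();
            retrieved.add(url);
            for (String link : linksOf.apply(url)) {
                if (seen.add(link)) frontier.add(link); // add() is false for known URLs
            }
        }
        return retrieved;
    }

    public static void main(String[] args) {
        // Tiny in-memory link graph with made-up page names, used instead of real HTTP.
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("a", Arrays.asList("b", "c"));
        graph.put("b", Arrays.asList("a", "d"));
        System.out.println(crawl("a", 3, u -> graph.getOrDefault(u, Collections.emptyList())));
    }
}
```

Passing the fetch step in as a function keeps the limit logic testable without network access.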

Perform case-insensitive matching.
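
A simple way to get case-insensitive matching is to lower-case both the stop-word list and every token before comparing them. The sketch below assumes this approach; `StopWordFilter` is a hypothetical name and the sample words in `main` are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Sketch (hypothetical name): case-insensitive stop-word removal. */
class StopWordFilter {
    private final Set<String> stopWords = new HashSet<>();

    StopWordFilter(List<String> words) {
        for (String w : words) stopWords.add(normalize(w));
    }

    /** Locale.ROOT avoids locale-specific surprises such as the Turkish dotless i. */
    static String normalize(String word) {
        return word.toLowerCase(Locale.ROOT);
    }

    /** Returns the lower-cased tokens that are not stop words, in their original order. */
    List<String> filter(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            String n = normalize(t);
            if (!stopWords.contains(n)) kept.add(n);
        }
        return kept;
    }

    public static void main(String[] args) {
        StopWordFilter f = new StopWordFilter(Arrays.asList("The", "and"));
        System.out.println(f.filter(Arrays.asList("THE", "Crawler", "And", "Java")));
    }
}
```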

You can assume that there are no errors in the input. Your code should be robust under errors in the Web pages you're searching. If an error is encountered, feel free, if necessary, just to skip the page where it is encountered.
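
Skipping a bad page can be as simple as wrapping each per-page step so that a failure is logged and the crawl moves on. This is a minimal sketch under that assumption; `SafeFetch.attempt` is a hypothetical helper, not part of the supplied project.

```java
import java.util.Optional;
import java.util.concurrent.Callable;

/** Sketch (hypothetical name): run one page-processing step without letting it abort the crawl. */
class SafeFetch {
    /** Returns the step's result, or empty if the step threw or returned null. */
    static <T> Optional<T> attempt(String url, Callable<T> step) {
        try {
            return Optional.ofNullable(step.call());
        } catch (Exception e) {
            System.err.println("Skipping " + url + ": " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<Integer> length = attempt("page-1", () -> "some page text".length());
        System.out.println(length.isPresent());
    }
}
```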

1. Identify the key properties of a web crawler. Describe in detail how each of these properties is implemented in your code.

2. Use your crawler to list the URLs of all pages in the test data and report all outgoing links of the test data. [10 points] display the contents of the tag

3. Implement duplicate detection, and report if any URLs refer to already seen content.

4. Use your crawler to list all broken links within the test data.

5. How many graphic files are included in the test data?

6. Have your crawler save the words from each page of type (.txt, .htm, .html). Make sure that you do not save HTML markup. Explain your definition of "word". In this process, give each page a unique document ID.

Implement Stemming

7. Report the 20 most common words with their document frequencies. Words or stemmed words?

Attachment: crawler_project.zip (https://secure.tutorsglobe.com/Atten_files/409_crawler_project.zip)
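
Items 3, 6, and 7 of the assignment (duplicate detection, word extraction with document IDs, and document frequency) could be sketched together as follows. Everything here is an assumption made for illustration: the class name `PageIndex`, the choice to define a "word" as a run of letters or digits, the regex-based tag stripping, and the use of a SHA-256 hash over the extracted words to detect exact duplicates. Stemming, broken-link checks, and graphic-file counting are not covered.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch (hypothetical names): document IDs, exact-duplicate detection, document frequency. */
class PageIndex {
    private final Map<String, Integer> docIdByHash = new HashMap<>();  // content hash -> first doc ID
    private final Map<String, Integer> docFrequency = new HashMap<>(); // word -> number of docs containing it
    private int nextDocId = 1;

    /** Regex-based tag removal; entities are not decoded, and a real HTML parser would be more robust. */
    static String stripMarkup(String html) {
        return html.replaceAll("(?is)<(script|style)\\b.*?</\\1>", " ")
                   .replaceAll("(?s)<[^>]*>", " ");
    }

    /** Assumed definition of "word": a maximal run of letters or digits, lower-cased. */
    static List<String> words(String text) {
        List<String> out = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!w.isEmpty()) out.add(w);
        }
        return out;
    }

    static String sha256(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is a required algorithm on the Java platform
        }
    }

    private static String fingerprint(String html) {
        return sha256(String.join(" ", words(stripMarkup(html))));
    }

    /** Returns the ID of an earlier page with the same word sequence, or -1 if there is none. */
    int duplicateOf(String html) {
        Integer id = docIdByHash.get(fingerprint(html));
        return id == null ? -1 : id;
    }

    /** Assigns the next document ID and counts each distinct word once for this page. */
    int add(String html) {
        int id = nextDocId++;
        docIdByHash.putIfAbsent(fingerprint(html), id);
        for (String w : new HashSet<>(words(stripMarkup(html)))) {
            docFrequency.merge(w, 1, Integer::sum);
        }
        return id;
    }

    /** The k words with the highest document frequency; ties are broken alphabetically. */
    List<Map.Entry<String, Integer>> mostCommon(int k) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFrequency.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }

    public static void main(String[] args) {
        PageIndex index = new PageIndex();
        index.add("<p>Web crawler test</p>");
        index.add("<p>Another crawler page</p>");
        System.out.println(index.mostCommon(20));
    }
}
```

Hashing the extracted words rather than the raw bytes means two pages that differ only in markup or letter case count as duplicates; whether that is the intended notion of "already seen content" is a design decision to state in your write-up. A crawler would call `duplicateOf` before `add` and report the URL when the result is not -1.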