ICS211 Introduction To Computer Science
Task
Submission Instructions:
1. For each problem, submit 1) your script/program, and 2) one or two screenshots in jpg format showing that your program really works.
2. Name your script and program using the pattern NJITID#_1.sh and NJITID#_2.py. Name your screenshots using the pattern NJITID#_Problem#_Index.jpg. NJITID# is the eight-digit NJIT ID (not your UCID; Rutgers students also have NJIT IDs). Problem# is the problem number (e.g., 1 or 2). Index is the index number showing the order of the screenshots (e.g., 1 or 2).
3. Submit individual files. Do not submit a Zip File.
A Big-Data Processing Task:
We need to find the 15 most frequently used words on a set of Wikipedia pages. Specifically, we need a list of these words together with the number of occurrences of each word on the pages. The list should be sorted in descending order by number of occurrences.
The following is a sample of output generated for 4 Wikipedia pages.
375 advertising
Since Wikipedia has a huge number of pages, it is not realistic to analyze all of them in a short time on one machine. Here we only need to analyze the pages for the Wikipedia entries whose names consist of two capital letters (for example, AC).
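The set of entries can be enumerated programmatically. The following is a minimal sketch, assuming that "two capital letters" means every combination from AA to ZZ; the helper names two_letter_entries and entry_url are hypothetical, and the base URL is the one used in the wget example in Problem 1.

```python
from itertools import product
from string import ascii_uppercase

# Base URL taken from the wget example in Problem 1.
BASE_URL = "https://en.wikipedia.org/wiki/"


def two_letter_entries():
    """Return every two-capital-letter entry name, from AA to ZZ (an assumption)."""
    return ["".join(pair) for pair in product(ascii_uppercase, repeat=2)]


def entry_url(entry):
    """Build the page URL for one entry name."""
    return BASE_URL + entry


entries = two_letter_entries()
print(len(entries), entries[0], entries[-1])
print(entry_url("AC"))
```

Under this assumption there are 26 x 26 = 676 pages to download, which is small enough to process on one machine.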
Problem 1:
Write a bash script that combines a few Linux tools to finish the above big-data processing task. You can use wget to download and save a page. For example, the following command downloads the AC wiki page and saves it into the file AC.html:
wget https://en.wikipedia.org/wiki/AC -O AC.html
An HTML page has HTML tags, which should be removed before the analysis. (Open a .html file using vi and a web browser, and you will see the differences.) You can use lynx to extract the text content into a text file. For example, the following command extracts the content for entry “AC” into AC.txt:
lynx -dump -nolist AC.html > AC.txt
After the contents for all the required entries have been extracted, you need to find all the words using grep. You need to use a regular expression to guide grep in the search. All the words found by grep should be saved into the same file, which is then used to find the most frequently used words. Note that you need to find distinct words and count the number of times that each distinct word appears in the file. Using the -o option (i.e., grep -o) will simplify the processing, since it prints only the matching parts. You may need sort, cut and uniq in this step. Read the man pages of sort, cut and uniq to understand how this can be achieved.
Hint: You don’t need to write code to count the number of occurrences of each distinct word. Use sort and uniq smartly: sort groups identical words together, and uniq (with its -c option) counts the occurrences in each group.
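To see why sorting first makes the counting trivial, the idea behind the hint can be sketched in Python. This is only an illustration of the technique, not the bash solution itself; the function names are hypothetical, and the sample word list is made up.

```python
from itertools import groupby


def count_like_sort_uniq(words):
    """Sort so equal words become adjacent, then count each run of equal words."""
    counts = []
    for word, run in groupby(sorted(words)):
        counts.append((sum(1 for _ in run), word))
    return counts


def top_n(counts, n=15):
    """Order (count, word) pairs by count, largest first, and keep the first n."""
    return sorted(counts, key=lambda pair: pair[0], reverse=True)[:n]


# Made-up sample input standing in for the words found by grep.
sample = ["the", "of", "the", "ad", "of", "the"]
for count, word in top_n(count_like_sort_uniq(sample)):
    print(count, word)
```

Like uniq, groupby only merges adjacent equal items, which is why the input has to be sorted before counting.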
Problem 2:
Write a Python program to finish the above big-data processing task. Use the urllib or urllib2 module to download a page. Use the re module to search for words.
An HTML page has HTML tags, which should be removed before the analysis. Use Beautiful Soup to convert a page from HTML format to plain text.
Check the attached slides for how to use urllib/urllib2 and BeautifulSoup.
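The pieces named above could fit together roughly as follows. This is a minimal sketch, not the expected solution: it assumes Python 3, where the download function lives in urllib.request (urllib2 exists only in Python 2), it assumes the third-party beautifulsoup4 package is installed, and it assumes a "word" is a run of ASCII letters compared case-insensitively. All function names are hypothetical.

```python
import re
from collections import Counter
from urllib.request import urlopen


def download(url):
    """Fetch a page and return its HTML as text (makes a network request)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def html_to_text(html):
    """Strip HTML tags with Beautiful Soup and return the plain text."""
    # Imported here so the counting helpers below run without bs4 installed.
    from bs4 import BeautifulSoup
    return BeautifulSoup(html, "html.parser").get_text(separator=" ")


def count_words(text, counter=None):
    """Add every word found in text to a running Counter and return it."""
    counter = Counter() if counter is None else counter
    # Assumption: a word is a run of ASCII letters, lowercased before counting.
    counter.update(match.lower() for match in re.findall(r"[A-Za-z]+", text))
    return counter


def report(counter, n=15):
    """Format the n most frequent words as 'count word' lines."""
    return [f"{count} {word}" for word, count in counter.most_common(n)]


# Made-up text standing in for the output of html_to_text(download(url)).
totals = count_words("Advertising pays. Advertising works; the advertising budget, the rest.")
print("\n".join(report(totals, n=2)))
```

For the real task, the same Counter would be passed to count_words once per page so that the totals accumulate across all pages. The download and html_to_text functions are defined but not called here because they need network access and bs4.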
Note: It is possible that the list generated by the Python program and the list generated by the bash script differ slightly. For example, the word “Wikipedia” may be included on one list but not on the other; for a word on both lists, the count may be slightly larger on one list than on the other. This is caused by the different tools used in bash and Python to convert HTML into text. For the same HTML page, the text files generated by lynx and BeautifulSoup may be slightly different.
But the results generated by your bash script and Python program should not differ significantly. If the words on these two lists differ by more than 50%, or the counts of the same word differ by more than 50%, there is probably a bug in your script/program.
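This sanity check can be automated. The sketch below is one possible reading of the 50% rule, labeled here as an assumption: word mismatch is the fraction of all listed words that appear on only one list, and a count is suspicious when the two values differ by more than 50% of the larger one. The function name and the sample numbers are hypothetical.

```python
def compare_lists(bash_counts, python_counts, threshold=0.5):
    """Compare two {word: count} dicts and flag large differences."""
    bash_words, python_words = set(bash_counts), set(python_counts)
    all_words = bash_words | python_words
    if not all_words:
        return 0.0, []
    shared = bash_words & python_words
    # Fraction of all listed words that appear on only one of the two lists.
    word_mismatch = 1 - len(shared) / len(all_words)
    # Shared words whose counts differ by more than threshold of the larger count.
    suspicious = sorted(
        word
        for word in shared
        if abs(bash_counts[word] - python_counts[word])
        > threshold * max(bash_counts[word], python_counts[word])
    )
    return word_mismatch, suspicious


# Made-up counts for illustration only.
bash_result = {"the": 900, "of": 500, "wikipedia": 120}
python_result = {"the": 880, "of": 200, "retrieved": 110}
mismatch, suspicious = compare_lists(bash_result, python_result)
print(mismatch, suspicious)
```

A mismatch above the threshold, or any word in the suspicious list, would be a reason to recheck the regular expression and the HTML-to-text step in both programs.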