Scraper application

$500-5000 USD

Cancelled
Posted over 15 years ago

Paid on delivery
Scraper application that collects data from Google blogs and Google Images and updates a database. Backend admin functionality is also needed. A Word doc describing the requirements is attached.

## Deliverables

RAC won't accept a RAR file, so I'm embedding the requirements here.

Requirements

Need an application that will read each record from a CSV input file and capture Google's "blog search" results ([login to view URL]) for each input record's "keyword" field, which is the first field in the CSV input file. It will also need to capture images from Google's "Image Search". The data will be used for a subsequent website that will present the data based on the "keyword" that the user selects.

The application would have two purposes. First, it would need to prime the database (assume SQL) with up to 500 entries found from Google blog search and 3 thumbnails from Google's "image search" for each keyword in the input CSV file. The other purpose is to collect new blog entries on a daily basis and add those to the database. At the same time the program would purge the same number of old entries that were added. For example, if the program found 5 new entries that day, it would delete the 5 oldest entries. Google blog search has a feature to search for last day, last week, last month, etc.

If there are no results for a keyword, a database record still needs to be built during the "priming phase" that says "There is no information at this time". This equates to having one entry in the database versus the maximum of 500.

An example for illustrative purposes: assume there is only one record in the input CSV file (the application will need to handle an unlimited number of records in the input file) and the "keyword" field is HDTV. The database priming phase would capture the latest 500 entries for HDTV and add them to the database. It would also add 3 images.
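The priming and purge rules above can be sketched at the database level. This is only an illustration under stated assumptions: the `blog_entries` table, its columns, and the function names are hypothetical and not part of the posting, SQLite stands in for the unspecified SQL database, and entries are assumed to arrive as `(title, url, published)` tuples from some other component. No search or scraping is shown. When fewer old rows exist than new ones, the sketch deletes only the rows that were there before the update, which is one possible reading of the requirement.

```python
import sqlite3

MAX_ENTRIES = 500  # cap per keyword, taken from the requirements
PLACEHOLDER = "There is no information at this time"

INSERT_SQL = (
    "INSERT INTO blog_entries (keyword, title, url, published) "
    "VALUES (?, ?, ?, ?)"
)


def open_db(path=":memory:"):
    """Open the database and create the hypothetical blog_entries table."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS blog_entries (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               keyword TEXT NOT NULL,
               title TEXT NOT NULL,
               url TEXT,
               published TEXT NOT NULL  -- ISO 8601 date, sortable as text
           )"""
    )
    return conn


def prime(conn, keyword, entries):
    """Priming phase: store up to MAX_ENTRIES rows, or one placeholder row."""
    rows = list(entries)[:MAX_ENTRIES]
    if not rows:
        # No results: still create a single record for the keyword.
        rows = [(PLACEHOLDER, None, "")]
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT_SQL, [(keyword, t, u, p) for (t, u, p) in rows])


def daily_update(conn, keyword, new_entries):
    """Update phase: add new rows and delete as many of the oldest old rows."""
    new_entries = list(new_entries)
    if not new_entries:
        return 0
    with conn:
        # Pick the rows to purge before inserting, so new rows are never purged.
        oldest = conn.execute(
            "SELECT id FROM blog_entries WHERE keyword = ? "
            "ORDER BY published ASC, id ASC LIMIT ?",
            (keyword, len(new_entries)),
        ).fetchall()
        conn.executemany(
            INSERT_SQL, [(keyword, t, u, p) for (t, u, p) in new_entries]
        )
        conn.executemany("DELETE FROM blog_entries WHERE id = ?", oldest)
    return len(oldest)
```

Because the placeholder row is stored with an empty `published` value, it sorts first and is the first row purged once real entries arrive for that keyword.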
During the daily update phase the program would find the new posts for HDTV since the last day, add them to the database, and at the same time delete the same number of old records.

The program must run undetected, using some type of human emulation technique so as not to get banned by Google for excessive searches. The program must also have some type of checkpoint process so it can resume where it left off after a failure. It should also provide an admin interface to allow starting it or restarting it where it left off.

The admin interface must be able to:

- Start the application for the priming phase.
- Start the application to run the update phase.
- Restart from the last good checkpoint in the event of a failure.
- Add or delete database records based on keyword; when a new keyword is added, prime the database for it so it is ready for the update phase.
- Add or delete thumbnail images.

The database schema must be clearly defined so any programmer would be able to write code for website presentation or report writing. The programmer who writes the solution described above will be used to develop the follow-on website if they want the work. The source code must be fully documented (e.g. "this section of code adds a new post and then purges an old post").

RAWC CANCELLED THIS BID SAYING IT WAS AGAINST GOOGLE TERMS OF SERVICE.
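The checkpoint requirement in the description can be illustrated with a short sketch. It assumes a JSON checkpoint file holding the count of completed input records; the file layout and the function names `load_checkpoint`, `save_checkpoint`, and `run` are hypothetical, and `process` is a caller-supplied function standing in for whatever work is done per keyword.

```python
import csv
import json
import os


def load_checkpoint(path):
    """Return the number of input records already completed (0 if none)."""
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as f:
        return json.load(f)["done"]


def save_checkpoint(path, done):
    """Write the checkpoint via a temp file so a crash cannot truncate it."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump({"done": done}, f)
    os.replace(tmp, path)  # atomic rename over the old checkpoint


def run(csv_path, checkpoint_path, process):
    """Call process(keyword) for each unprocessed record, then checkpoint."""
    done = load_checkpoint(checkpoint_path)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for index, row in enumerate(csv.reader(f)):
            if index < done or not row:
                continue  # already completed, or a blank line
            process(row[0])  # the keyword is the first CSV field
            save_checkpoint(checkpoint_path, index + 1)
```

If `process` raises partway through the file, the checkpoint still points at the last record that finished, so calling `run` again with the same paths resumes at the failed record rather than at the start.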
Project ID: 3463127

About the project

Remote project
Active 15 yrs ago

About the client

Flag of UNITED STATES
United States
4.9
8
Member since Aug 21, 2002
