
Web Crawling Text Mining Specialist

$5000-25000 USD

Cancelled
Posted about 15 years ago

Paid on delivery
Our firm is looking for an experienced text mining developer/programmer to help develop our Journal Monitoring Project. We are developing this tool for a major biotech company to search for their products in PubMed. The search results will be downloaded and parsed to extract certain features, including: Author, Article Title, Publication Date, Journal Name (volume, pages), PubMed DOI, and Methods. The results of the searches and extraction will be stored in a database to be used for some minor BI. We will also need to include links to the abstract and to the full-text article where it is available without a subscription. The entire project will be built in the cloud using Amazon EC2/S3 and a LAMP stack. We want to use open-source software wherever possible, customizing only where essential to deliver superior results.

## Deliverables

Here is an outline of the architecture. We are open to suggestions and would prefer to work with someone who has experience in the area and can add value to the architecture team, not only on the development side. Experience in searching PubMed journal articles would be a plus.

•Project Overview
•We are looking for a text mining specialist to assist with the development of automated data monitoring and analytics of product citation references in peer-reviewed journal articles in PubMed.
•Deliverables: The project involves creating a database of journal articles that contain references to the client's antibodies and related products (a list of products to be searched will be provided). Our client's products are referenced in thousands of journal articles, with the more popular products appearing hundreds of times.
•PubMed contains peer-reviewed journal articles, which will be searched for our client's products. There are a couple of ways to access PubMed journal articles.
During this first phase of the project, we can use [login to view URL] or [login to view URL] to do the searches. More advanced search methods are available, and domain expertise in searching peer-reviewed journal articles is a plus. We do have the domain expertise to assist in this area, and we will work closely with you to make sure that we are getting the best possible results from the searches with our client.
•Scope - Journal Monitoring
•We will search across all scholarly journals indexed by PubMed to track references to our client's products and extract the data listed below.

Phase I of the project will include the following deliverables:
•Set up a development/staging environment in the cloud using Amazon EC2. We will work together on the stack to be used (e.g. LAMP/Perl).
•Architecture - develop a proof of concept.
•Create Search Query Parser - (unique solution for each site being searched?)
•Web Crawling - Search for the product and company name in [login to view URL] and/or [login to view URL] to find matches. The search results will need some qualification to make sure that they come from journal articles that reference the client's product and name together (e.g. the product and client name are within 5 words of each other). We will work together on qualifying the results.

Sample of Product list
| Anti-Rhodopsin |
| Anti-RhoG, clone 1F3 B3 E5 |
| Anti-Riboflavin |
| Anti-Ribonucleotide Reductase, M1 subunit, clone AD203 |
| Anti-Rig1/Robo3 |
| Anti-MDR1b, ATP Binding Cassette Sub family B |
| Anti-RNA Polymerase II, clone ARNA-3 |
| Anti-RO52 |
| Anti-ROCK-1 |

•Screen Scraper
•Cache SERP (search engine results pages)
•Search Results parser
•Download to database
•Document parsing - tokenizing?
•Store links to abstract and full text
•Feature Extraction: We will search across all scholarly journals indexed by PubMed to track references to our client's products and extract the following data:
Phase 1, Phase 2
•Product Names
•Product Catalog Part Numbers (if available)
•Journal Name, Volume Number and Pages
•PubMed Digital Object Identifier (DOI citation number)
•Author
•Article Title
•Publication Date
•Methods
•Applications
•Hyperlink to Abstract
•Hyperlink to Full Text
•Targets - Pathways or Diseases
•In Silico Pathway Analysis
•Contact Mining - the process will extract name, title, employer, postal address, phone, email and article hyperlink from articles where the data is available

•Download all features to MySQL database
•Indexing to allow for some simple searching
•Provide XML file back to the client
•Integrated testing and analysis

Phase II - not included in this initial proposal
•Monthly monitoring and feature extraction
•Integrate added feature extraction
•Advanced search and qualifying techniques
•Expanded product synonym search
•Additional search of data for:
•Targets - Pathways or Disease
•In Silico Pathway analysis
•Additional contact mining

* * *

This broadcast message was sent to all bidders on Thursday Mar 12, 2009 4:26:13 PM:

I wanted to give more specifics on what the first phase of this project entails. Below is a description of what is needed. Please forward your proposals with these requirements in mind. I welcome the opportunity to discuss this over the phone, but I would ask that you first respond with something (the most difficult part, basic architecture, risks, etc.) that shows you understand the scope of the project and are able to deliver the desired results.

•Project Overview
•We are looking for a text mining specialist to assist with the development of automated data monitoring and analytics of product citation references in peer-reviewed journal articles in PubMed.
•Deliverables: The project involves creating a database of journal articles that contain references to the client's antibodies and related products (a list of products to be searched will be provided).
Our client's products are referenced in thousands of journal articles, with the more popular products appearing hundreds of times.
•PubMed contains peer-reviewed journal articles, which will be searched for our client's products. There are a couple of ways to access PubMed journal articles. During this first phase of the project, we can use [login to view URL] or [login to view URL] to do the searches. More advanced search methods are available, and domain expertise in searching peer-reviewed journal articles is a plus. We do have the domain expertise to assist in this area, and we will work closely with you to make sure that we are getting the best possible results from the searches with our client.
•Scope - Journal Monitoring
•We will search across all scholarly journals indexed by PubMed to track references to our client's products and extract the data listed below.

Phase I of the project will include the following deliverables:
•Set up a development/staging environment in the cloud using Amazon EC2. We will work together on the stack to be used (e.g. LAMP/Perl).
•Architecture - develop a proof of concept.
•Create Search Query Parser - (unique solution for each site being searched?)
•Web Crawling - Search for the product and company name in [login to view URL] and/or [login to view URL] to find matches. The search results will need some qualification to make sure that they come from journal articles that reference the client's product and name together (e.g. the product and client name are within 5 words of each other). We will work together on the qualifying of [login to view URL] the use of PubMed MeSH, synonym citations, and UMLS for more advanced search methods.
(We have domain expertise in this area to help, but domain knowledge would be helpful.)

Sample of Product list
| Anti-Rhodopsin |
| Anti-RhoG, clone 1F3 B3 E5 |
| Anti-Riboflavin |
| Anti-Ribonucleotide Reductase, M1 subunit, clone AD203 |
| Anti-Rig1/Robo3 |
| Anti-MDR1b, ATP Binding Cassette Sub family B |
| Anti-RNA Polymerase II, clone ARNA-3 |
| Anti-RO52 |
| Anti-ROCK-1 |

•Screen Scraper
•Cache SERP (search engine results pages)
•Download to database
•Search Results parser
•Document parsing - tokenizing?
•Download to database
•Store links to abstract and full text
•Feature Extraction: We will search across all scholarly journals indexed by PubMed to track references to our client's products and extract the following data:

Phase 1, Phase 2
•Product Names
•Product Catalog Part Numbers (if available)
•Journal Name, Volume Number and Pages
•PubMed Digital Object Identifier (DOI citation number)
•Author
•Article Title
•Publication Date
•Methods
•Applications
•Hyperlink to Abstract
•Hyperlink to Full Text
•Targets - Pathways or Diseases
•In Silico Pathway Analysis
•Contact Mining - the process will extract name, title, employer, postal address, phone, email and article hyperlink from articles where the data is available

Screen shot of a journal page showing most of the required features to [login to view URL] all features to MySQL database
•Indexing to allow for some simple searching
•Provide XML file back to the client
•Integrated testing and analysis

Phase II - not included in this initial proposal
•Monthly monitoring and feature extraction
•Integrate added feature extraction
•Advanced search and qualifying techniques
•Expanded product synonym search
•Additional search of data for:
•Targets - Pathways or Disease
•In Silico Pathway analysis
•Additional contact mining
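
The qualification rule mentioned in the description (product and client name within 5 words of each other) could be sketched roughly as follows. This is only an illustration under stated assumptions: the tokenizer, the function names, and the company name "Acme Biotech" are all hypothetical, and "within 5 words" is read here as "at most 5 words between the two names", which is one of several possible readings to be agreed with the client.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; hyphens, slashes and punctuation split words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_positions(tokens, phrase):
    """Start indexes at which the token sequence `phrase` occurs in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase]


def within_n_words(text, product, company, max_gap=5):
    """True if some mention of `product` and some mention of `company`
    have at most `max_gap` words between them (assumed reading of the rule)."""
    tokens = tokenize(text)
    p, c = tokenize(product), tokenize(company)
    if not p or not c:
        return False
    for i in phrase_positions(tokens, p):
        for j in phrase_positions(tokens, c):
            if i < j:
                gap = j - (i + len(p))  # words between product end and company start
            else:
                gap = i - (j + len(c))  # words between company end and product start
            if 0 <= gap <= max_gap:
                return True
    return False


# Hypothetical sentence; "Acme Biotech" stands in for the unnamed client.
sentence = "Sections were stained with Anti-ROCK-1 (Acme Biotech, 1:500) overnight."
qualified = within_n_words(sentence, "Anti-ROCK-1", "Acme Biotech")
```

A word-window check like this is cheap enough to run over every downloaded abstract or full text before anything is stored, and the window size can be tuned per product once real search results are reviewed.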
Project ID: 3717085
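
The Phase I deliverables in the description above include storing the extracted features and providing an XML file back to the client. A minimal sketch of that last step, using only the Python standard library, might look like the following. The record fields are a subset of the listed features, and the class name, element names, and sample values are all hypothetical placeholders, not a format specified in the posting.

```python
from dataclasses import dataclass, asdict
import xml.etree.ElementTree as ET


@dataclass
class ArticleRecord:
    """One qualified article hit (hypothetical subset of the listed features)."""
    product_name: str
    author: str
    article_title: str
    journal: str
    publication_date: str
    doi: str
    abstract_url: str


def records_to_xml(records):
    """Serialize records as <articles><article>...</article></articles>."""
    root = ET.Element("articles")
    for rec in records:
        node = ET.SubElement(root, "article")
        for field_name, value in asdict(rec).items():
            # One child element per feature; ElementTree escapes special characters.
            ET.SubElement(node, field_name).text = value
    return ET.tostring(root, encoding="unicode")


# Made-up placeholder values, for illustration only.
sample = ArticleRecord(
    product_name="Anti-RO52",
    author="Example Author",
    article_title="Example title",
    journal="Example Journal 1: 1-10",
    publication_date="2009-01-01",
    doi="10.0000/example",
    abstract_url="https://example.org/abstract",
)
xml_text = records_to_xml([sample])
```

Keeping one flat record type for both the database row and the XML element keeps the two outputs in step; the same field list could drive the MySQL column definitions.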

About the project

14 proposals
Remote project
Active 15 yrs ago

14 freelancers are bidding on average $9,332 USD for this job
| $6,800 USD in 14 days | 4.8 (55 reviews) | 6.8 |
| $4,250 USD in 14 days | 4.9 (134 reviews) | 6.7 |
| $4,250 USD in 14 days | 4.6 (28 reviews) | 6.7 |
| $4,250 USD in 14 days | 4.3 (5 reviews) | 6.3 |
| $4,250 USD in 14 days | 5.0 (11 reviews) | 4.4 |
| $8,500 USD in 14 days | 4.9 (13 reviews) | 4.1 |
| $4,250 USD in 14 days | 4.9 (14 reviews) | 4.0 |
| $4,250 USD in 14 days | 5.0 (6 reviews) | 3.6 |
| $4,250 USD in 14 days | 5.0 (2 reviews) | 1.3 |
| $4,250 USD in 14 days | 0.0 (0 reviews) | 0.0 |
| $5,270 USD in 14 days | 0.0 (0 reviews) | 0.0 |
| $7,650 USD in 14 days | 0.0 (0 reviews) | 0.0 |
| $63,750 USD in 14 days | 0.0 (0 reviews) | 3.4 |
| $4,675 USD in 14 days | 0.0 (0 reviews) | 0.0 |

About the client

United States
0.0
0
Member since Jul 22, 2008
