
Process 100 million lines of domains and output the UNIQUE lines.

$30-250 USD

Closed
Posted over 9 years ago

Paid on delivery
We have a list of about 100,000,000 domains, of which I estimate roughly 50% are duplicates. We need to process the entire list and remove the duplicates. The final output should be a list of UNIQUE domains. I have been processing the list in EmEditor for the past 48 hours on an i7 PC with 16GB of RAM, and it is still nowhere near finished. We need some massive power to process this. Please do not bid unless you have worked with data of this size before. Thanks!
Project ID: 6841247
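For reference, the task described above can be sketched as a single streaming pass that remembers every domain already seen. This is only an illustration, not the client's or any bidder's method. It assumes one domain per line, that domains should be compared case-insensitively, and that the set of unique domains fits in RAM; the function name and file paths are hypothetical.

```python
def write_unique_lines(src_path, dst_path):
    """Copy each distinct domain from src_path to dst_path, keeping first-seen order."""
    seen = set()  # assumption: all unique domains fit in memory
    kept = 0
    with open(src_path, "r", encoding="utf-8", errors="replace") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            # Assumption: one domain per line, compared case-insensitively.
            domain = line.strip().lower()
            if not domain or domain in seen:
                continue  # skip blank lines and duplicates
            seen.add(domain)
            dst.write(domain + "\n")
            kept += 1
    return kept
```

Because the file is read line by line, only the set of unique domains is held in memory, never the whole input.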

About the project

42 proposals
Remote project
Active 9 yrs ago

42 freelancers are bidding on average $147 USD for this job
Hello, this is Vaishali from "Hire WordPress Experts", and I am here to help you process 100 million lines of domains and output the unique lines. We have gone through the information you provided, and I can assure you that we are experts in Data Processing, Excel, Microsoft SQL Server, MySQL, and PHP. We are ready to process the entire list and remove the duplicates, and the final output will be a list of unique domains. We understand your requirements and can do this easily. Give us an opportunity to prove ourselves. I can promise the following: 1. You will have an exceptional working experience. 2. We do not hesitate if specifications change for the better. 3. The project will be completed in far less than 25 days, and you will receive free support for up to 25 days. 4. We will be available on Skype. 5. We do not take any upfront payments. Please select us right away so that we can get started without wasting a day! Thanks. P.S. Be assured you will have a nice time, and this is a placeholder. Vaishali, Business Analyst
$99 USD in 25 days
4.9 (587 reviews)
8.9
Hi, I am ready to work with you. If you need a good worker, you can hire me. Thanks.
$277 USD in 3 days
4.9 (483 reviews)
8.2
I can remove the duplicates from the list of 100 million domains and output the unique lines in SQL or text format. I will complete this work in 2 days. Looking forward to your reply so I can start this work immediately.
$79 USD in 2 days
5.0 (1337 reviews)
8.3
Hi, I'd like to help you with this. I have had to accomplish similar goals before and am ready to provide you with a custom, repeatable process in the form of a bespoke .NET application that uses all the power your PC has (all cores, in parallel) to cleanse the domain list. I will also clean the data for you if you provide the file. Regards, Janos
$389 USD in 3 days
5.0 (90 reviews)
7.9
I have experience working with 20 million rows. I have a dedicated server located in Texas and can process your data within 24 hours. I have several alternative ways to process the data. I will not ask for a penny if I fail to do so (0.00001% chance of failure).
$50 USD in 1 day
5.0 (249 reviews)
7.3
Hello, removing the duplicates won't take more than 48 hours. Regards.
$200 USD in 3 days
5.0 (89 reviews)
7.1
Hello Sir, I can create this list for you, but it will need a different approach than the one you were using. Check my profile to see that people who have worked with me are extremely satisfied with the results and speed. I have a 100% completion rate and can start right away. Best regards, Dusan
$99 USD in 3 days
5.0 (153 reviews)
7.2
Hello, I can do it. Please provide me the domains list. Best Regards, Bill Lee
$100 USD in 3 days
5.0 (61 reviews)
6.9
A proposal has not yet been provided
$200 USD in 3 days
5.0 (61 reviews)
6.5
Hi there, I'm an expert in Database Management. I can do this. Please PM me to discuss further. Thank you, FARZANA PINKY.
$222 USD in 2 days
4.8 (118 reviews)
6.3
Hi, I'm an expert in databases and data processing with very good feedback and completion rate. I'm very interested in your project and willing to do it for $100. I have processed large databases of up to 24 million records and believe I can do your project in 3 days. My method is to import the 100 million lines into a database, use the database's built-in functions to remove duplicates (much faster than a text editor like EmEditor), and then export the unique records to the result file in the original format. Best, svteam.
$100 USD in 3 days
5.0 (62 reviews)
6.2
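The database method proposed in the bid above could look roughly like the following. This is a minimal sketch, not the bidder's actual implementation: it assumes SQLite (via Python's standard `sqlite3` module) stands in for whatever database the bidder had in mind, that there is one domain per line, and that comparison is case-insensitive. The function name, table name, and batch size are hypothetical.

```python
import sqlite3

def dedupe_with_sqlite(src_path, dst_path, db_path=":memory:", batch_size=100_000):
    """Load domains into a table with a unique key, then export the distinct rows."""
    conn = sqlite3.connect(db_path)  # pass a file path to keep the table on disk
    conn.execute("CREATE TABLE IF NOT EXISTS domains (name TEXT PRIMARY KEY)")
    insert = "INSERT OR IGNORE INTO domains (name) VALUES (?)"
    batch = []
    with open(src_path, encoding="utf-8", errors="replace") as src:
        for line in src:
            name = line.strip().lower()  # assumption: case-insensitive comparison
            if name:
                batch.append((name,))
            if len(batch) >= batch_size:
                # The primary key makes the database reject repeated names.
                conn.executemany(insert, batch)
                batch.clear()
    if batch:
        conn.executemany(insert, batch)
    conn.commit()
    with open(dst_path, "w", encoding="utf-8") as dst:
        for (name,) in conn.execute("SELECT name FROM domains ORDER BY name"):
            dst.write(name + "\n")
    count = conn.execute("SELECT COUNT(*) FROM domains").fetchone()[0]
    conn.close()
    return count
```

Note that this sketch exports the domains in sorted order rather than in their original order; preserving input order would need an extra sequence column.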
What is the file format of the data? I propose a custom 3-step approach: 1. filter the data into separate files for each letter; 2. sort each file alphabetically; 3. process each file and keep the unique records. Filtering is fast, sorting takes some time, and processing is much faster on sorted data. Can you please send me a small sample file?
$149 USD in 5 days
5.0 (216 reviews)
5.7
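The three steps in the bid above can be sketched as follows. This is an illustration under stated assumptions, not the bidder's code: it assumes one domain per line, case-insensitive comparison, and that each per-letter bucket fits in memory for sorting. The function name, the `bucket_*.txt` file names, and the `_` catch-all bucket for non-alphanumeric first characters are hypothetical.

```python
import os

def dedupe_by_buckets(src_path, dst_path, work_dir):
    """Split domains into per-first-character files, sort each, keep unique records."""
    os.makedirs(work_dir, exist_ok=True)
    handles = {}
    # Step 1: filter the data into one bucket file per leading character.
    try:
        with open(src_path, encoding="utf-8", errors="replace") as src:
            for line in src:
                domain = line.strip().lower()
                if not domain:
                    continue
                first = domain[0]
                key = first if first.isascii() and first.isalnum() else "_"
                if key not in handles:
                    path = os.path.join(work_dir, f"bucket_{key}.txt")
                    handles[key] = open(path, "w", encoding="utf-8")
                handles[key].write(domain + "\n")
    finally:
        for handle in handles.values():
            handle.close()

    kept = 0
    with open(dst_path, "w", encoding="utf-8") as dst:
        for key in sorted(handles):
            path = os.path.join(work_dir, f"bucket_{key}.txt")
            # Step 2: sort one bucket (assumed small enough to fit in memory).
            with open(path, encoding="utf-8") as bucket:
                domains = sorted(row.rstrip("\n") for row in bucket)
            # Step 3: in sorted data duplicates are adjacent, so compare neighbours.
            previous = None
            for domain in domains:
                if domain != previous:
                    dst.write(domain + "\n")
                    kept += 1
                previous = domain
            os.remove(path)
    return kept
```

Duplicates always share a first character, so they land in the same bucket and no cross-bucket comparison is needed.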
Hi, I am an experienced systems administrator. I worked for a company processing large amounts of similar data (traffic logs for a telco), where I performed analysis and reporting of that data in both databases and flat files. What this project needs is to analyze the input file(s) and transform the data into a form optimized for processing. I can handle this task and deliver the required unique domain list (along with the number of occurrences in the input file(s), if wished for). I can also deliver the procedure for analysing the data in this case.
$80 USD in 3 days
4.8 (15 reviews)
4.4
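The optional occurrence count mentioned in the bid above could be produced along these lines. This is a small sketch, not the bidder's procedure; it assumes one domain per line, case-insensitive comparison, and that the tally fits in memory. The function name is hypothetical.

```python
from collections import Counter

def count_domains(paths):
    """Return a Counter mapping each domain to its number of occurrences."""
    counts = Counter()
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as src:
            for line in src:
                domain = line.strip().lower()  # assumption: case-insensitive
                if domain:
                    counts[domain] += 1
    return counts
```

The keys of the returned counter are the unique domains, and `counts.most_common()` lists them from most to least frequent.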
Hello, I can help you with your project. If it's OK with you, could you share the file containing the 100M domains? I can then give you the exact number of unique domains. Thank you.
$222 USD in 2 days
4.6 (2 reviews)
3.4
Software developer since 2000, specializing in Visual Basic, SQL Server, Crystal Reports, and VBA for MS Office. Please send a sample of your list of domains.
$100 USD in 3 days
5.0 (4 reviews)
3.3
Just tell me where these "100 million lines of domains" reside: in a text file or in some other format (please specify). I will make a program/macro that reads this file and removes the duplicate domain names. We can discuss further after you accept my bid. Thanks.
$250 USD in 3 days
4.9 (13 reviews)
3.5
Hi, expert web/data scraper here with over 17 years of experience in programming and RDBMS. Please see my reviews. I use Perl for this kind of job, and I can finish fast.
$244 USD in 3 days
5.0 (5 reviews)
3.5
Hi! I am interested in your project. I am a software developer working in data and web analysis, so I strongly believe that my abilities fit your requirements. I look forward to working with you!
$150 USD in 3 days
4.8 (1 review)
3.1
Hi, I have worked on a US health care dataset of 20 million rows, but not yet on 100 million rows. If you let me work on the file now, I can confirm whether my solution works prior to award. If it is not successful, there is no charge. KC
$166 USD in 3 days
5.0 (6 reviews)
3.1
Greetings! Software developer with expert knowledge of data processing, at your service. Let me code a script for you that will go through your input and output a list of unique domains. I'm ready to start working now, so let's get the job done! Thank you.
$30 USD in 1 day
5.0 (2 reviews)
3.0

About the client

Melbourne, Australia
5.0
216
Payment method verified
Member since Apr 20, 2012

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)