
Apache Spark using PySpark ETL help

$30-50 USD

Cancelled
Posted about 4 years ago

Paid on delivery
Basically, I have an ETL with 2 updates and I want to write the same updates in PySpark.

table_a:

+---+----------+-------+--------------+
|key| col_a    | col_b | current_flag |
+---+----------+-------+--------------+
|001| Value1   | T123  | Y            |
|002| oth_val1 | T123  | N            |
|003| oth_val2 | T123  | N            |
|004| oth_val3 | T123  | N            |
|005| Value2   | T123  | Y            |
|006| oth_val4 | T789  | N            |
|007| Value2   | T789  | Y            |
|008| Value1   | T789  | N            |
+---+----------+-------+--------------+

UPDATE table_abc
SET col_a = 'Value1'
WHERE col_b IN (
    SELECT col_b FROM table_abc
    WHERE col_a = 'Value1' and current_flag = 'Y'
)
AND current_flag = 'N'
COMMIT;

+---+----------+-------+--------------+
|key| col_a    | col_b | current_flag |
+---+----------+-------+--------------+
|001| Value1   | T123  | Y            |
|002| Value1   | T123  | N            | -- updated
|003| Value1   | T123  | N            | -- updated
|004| Value1   | T123  | N            | -- updated
|005| Value2   | T123  | Y            |
|006| oth_val4 | T789  | N            |
|007| Value2   | T789  | Y            |
|008| Value1   | T789  | N            |
+---+----------+-------+--------------+

UPDATE table_abc
SET col_a = 'Value2'
WHERE col_b IN (
    SELECT col_b FROM table_abc
    WHERE col_a = 'Value2' and current_flag = 'Y'
)
AND current_flag = 'N'
COMMIT

+---+----------+-------+--------------+
|key| col_a    | col_b | current_flag |
+---+----------+-------+--------------+
|001| Value1   | T123  | Y            |
|002| Value1   | T123  | N            |
|003| Value1   | T123  | N            |
|004| Value1   | T123  | N            |
|005| Value2   | T123  | Y            |
|006| Value2   | T789  | N            | -- updated
|007| Value2   | T789  | Y            |
|008| Value2   | T789  | N            | -- updated
+---+----------+-------+--------------+

---------------------------------------------------------

#pyspark code to reproduce the updates
#initial dataframe is "table_a"
tval1 = [login to view URL](
    col("col_a") == lit("Value1") & col("current_flag") == lit("Y")
)
t = [login to view URL]("t1").join(
    [login to view URL]("tval1"),
    col("t1.col_b") == col("tval1.col_b"),
    "left-outer"
).select(
    col("[login to view URL]"),
    when(
        col("tval1.col_b").isNotNull(), lit("Value1")
    ).otherwise(col("t1.col_a")).alias("col_a"),
    col("t1.col_b"),
    col("t1.current_flag")
)

#use data frame t from above
tval2 = [login to view URL](
    col("col_a") == lit("Value2") & col("current_flag") == lit("Y")
)
t_new = [login to view URL]("t1").join(
    [login to view URL]("tval2"),
    col("t1.col_b") == col("tval2.col_b"),
    "left-outer"
).select(
    col("[login to view URL]"),
    when(
        col("tval2.col_b").isNotNull(), lit("Value2")
    ).otherwise(col("t1.col_a")).alias("col_a"),
    col("t1.col_b"),
    col("t1.current_flag")
)

But what really happens in PySpark is this:

t_new:

+---+----------+-------+--------------+
|key| col_a    | col_b | current_flag |
+---+----------+-------+--------------+
|001| Value1   | T123  | Y            |
|002| Value2   | T123  | N            |
|003| Value2   | T123  | N            |
|004| Value2   | T123  | N            |
|005| Value2   | T123  | Y            |
|006| Value2   | T789  | N            |
|007| Value2   | T789  | Y            |
|008| Value2   | T789  | N            |
+---+----------+-------+--------------+
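One way to check what the two UPDATE statements do when run in order is to replay them against the sample rows. The sketch below is only an illustration and rests on several assumptions: it uses Python's built-in sqlite3 module rather than Spark or the poster's actual database, and the helper names (replay_sequential_updates, apply_first_match) are hypothetical. Under standard SQL semantics the second UPDATE also matches the T123 rows, because row 005 is a current ('Y') Value2 row in T123, so replaying the statements as written gives the same table the post reports from PySpark. The second function shows one possible reading of the expected table, in which every rule is evaluated against the original rows and the first matching value wins; whether that is what the poster intended is a guess.

```python
import sqlite3

# Sample rows from the post: (key, col_a, col_b, current_flag)
ROWS = [
    ("001", "Value1", "T123", "Y"),
    ("002", "oth_val1", "T123", "N"),
    ("003", "oth_val2", "T123", "N"),
    ("004", "oth_val3", "T123", "N"),
    ("005", "Value2", "T123", "Y"),
    ("006", "oth_val4", "T789", "N"),
    ("007", "Value2", "T789", "Y"),
    ("008", "Value1", "T789", "N"),
]

# The UPDATE from the post, parameterised on the value being propagated.
UPDATE_SQL = """
UPDATE table_abc
SET col_a = :value
WHERE col_b IN (
    SELECT col_b FROM table_abc
    WHERE col_a = :value AND current_flag = 'Y'
)
AND current_flag = 'N'
"""


def replay_sequential_updates(rows):
    """Run the Value1 update, then the Value2 update, in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE table_abc ("key" TEXT, col_a TEXT, col_b TEXT, current_flag TEXT)'
    )
    conn.executemany("INSERT INTO table_abc VALUES (?, ?, ?, ?)", rows)
    for value in ("Value1", "Value2"):
        conn.execute(UPDATE_SQL, {"value": value})
    conn.commit()
    result = conn.execute('SELECT "key", col_a FROM table_abc ORDER BY "key"').fetchall()
    conn.close()
    return dict(result)


def apply_first_match(rows, values=("Value1", "Value2")):
    """Hypothetical alternative: test every rule against the ORIGINAL rows and
    let the first matching value win, so a later rule cannot overwrite it."""
    # For each value, the col_b groups that contain a current ('Y') row with it.
    groups = {v: {b for _, a, b, f in rows if a == v and f == "Y"} for v in values}
    out = {}
    for key, col_a, col_b, flag in rows:
        if flag == "N":
            col_a = next((v for v in values if col_b in groups[v]), col_a)
        out[key] = col_a
    return out


if __name__ == "__main__":
    sequential = replay_sequential_updates(ROWS)
    first_match = apply_first_match(ROWS)
    for key in sorted(sequential):
        print(key, sequential[key], first_match[key])
```

If the first-match reading is the intended one, the same idea would carry over to PySpark by deriving both sets of col_b values from the original DataFrame before changing col_a, rather than from the output of the first step.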
Project ID: 25337503

About the project

23 proposals
Remote project
Active 4 yrs ago

23 freelancers are bidding on average $82 USD for this job
Hi, I have more than a year of experience working with PySpark ETL jobs. I have also written big data ETL jobs with complex operations. Ping me to discuss it.
$50 USD in 1 day
5.0 (30 reviews)
5.1
Hello, I need just 2 to 3 hours at most to get this job done. I am waiting for your reply and am ready to start work right away.
$55 USD in 1 day
4.8 (17 reviews)
5.0
Hi, I have 8 years of experience working with Hadoop, Spark, NoSQL, Java, BI tools (Tableau, Power BI), and cloud platforms (Amazon, Google, Microsoft Azure). I have done end-to-end data warehouse management projects on the AWS cloud with Hadoop, Hive, Spark, and PrestoDB. I have worked on multiple ETL projects involving Spring Boot, Angular, Node, PHP, Kafka, NiFi, Flume, MapReduce, Spark with XML/JSON, Cassandra, MongoDB, HBase, Redis, Oracle, SAP HANA, ASE, and many more. Let's discuss the requirements in detail. I am committed to getting work done and am strong at resolving issues as well. Thanks
$56 USD in 1 day
5.0 (6 reviews)
4.2
Hi, I have used PySpark for data cleaning and updates in previous projects. I would need some sample data to help you with the issue. I am a Data Scientist with 9+ years of experience and expertise in machine learning using tools like R, Python, SQL and Excel. I am new to freelancing and I want to make sure my clients get the best work from me and choose me again in the future. I keep to deadlines and make sure they are well tracked and communicated. Let me know if you have time to discuss the project so you know I am the PERSON for the job. Thanks, Md Irfaan Meah
$50 USD in 1 day
4.9 (3 reviews)
3.4
Hi, I am a certified big data developer and have used PySpark extensively. Let's connect and discuss your requirements further.
$111 USD in 5 days
5.0 (4 reviews)
3.2
Hello there! I am a Python expert. I work mainly with Python and the Django framework because it's my major skill. I can complete your project in a short time. Happy day :)
$100 USD in 1 day
5.0 (5 reviews)
3.0
Hey, Let me know if you agree with the price and I can resolve it ASAP. I have a lot of experience with Spark :) I will provide unit-tests on top of the code for free.
$170 USD in 1 day
5.0 (1 review)
2.8
Hi there, I have about 16 years of experience in Java, Python, big data and associated frameworks like Spring, Hadoop, MapReduce, Spark, etc. I have reviewed your problem and it looks like a quick fix. Please feel free to review the feedback I have received on other projects on Freelancer. Kindly consider my proposal. Regards, Rabiya
$56 USD in 1 day
5.0 (5 reviews)
3.0
Hello, it's late to bid on this project, but if it's still open then I am interested. Let me know if you will consider my proposal. Thanks.
$356 USD in 2 days
4.1 (5 reviews)
1.8
Hi, I work at an MNC as a Data Engineer, currently in the big data field using PySpark and Hadoop frameworks. I have more than 4 years of production experience in big data and have done freelance work as a PySpark and Hadoop developer. Please share the details so we can start. I am a certified PySpark developer. Thanks, Rahul.
$40 USD in 1 day
5.0 (2 reviews)
1.2
Hi, rows 2, 3 and 4 are wrongly updated by the PySpark code. Where is your solution hosted on the cloud? I can help you fix this issue and will require access to the cloud. Looking forward to your reply.
$50 USD in 2 days
5.0 (3 reviews)
1.1
Hello, I'm a Python expert with 6+ years of experience. I'd like to know the details of the project. Thank you for your cooperation.
$299 USD in 1 day
0.0 (0 reviews)
0.0
Hi, I've been working as a data engineer for almost two years. I currently work with Scala and Spark, but I can work in PySpark as well, since it is pretty similar. I've seen your issue and understood it, and there are a couple of ways to solve it. P.S. I've already found one way to solve the first issue. The second issue is pretty much the same, just with other parameters. Kind regards, Danilo
$50 USD in 1 day
0.0 (0 reviews)
0.0
Hi, I have more than 4 years of experience in PySpark ETL, which lets me complete the work more efficiently.
$30 USD in 7 days
0.0 (0 reviews)
0.0
Hi, I am experienced in Python and SQL. Let me know if you still need help with this task. I could do this within 1 hour. Thanks.
$50 USD in 1 day
0.0 (0 reviews)
0.0
I am an expert in PySpark, working on big data and building ETL jobs with PySpark. I can do this task easily!
$35 USD in 1 day
0.0 (0 reviews)
0.0
I am good with the following: PySpark and Spark Streaming. I have worked on large datasets and larger tables.
$30 USD in 7 days
0.0 (0 reviews)
0.0
I am a software engineer who has been working with big data technologies like PySpark for the last year, so I can achieve the results well by using the SQL equivalents there, keeping the queries as they are. Connect to discuss further.
$40 USD in 1 day
0.0 (0 reviews)
0.0
Hi, I have 12 years of experience in Spark with Python and Scala. I've done similar work in the past and I am confident I can complete this work in the given time. It is just a one-hour job for me. Please hire me. You will not be disappointed and will re-hire me for sure.
$40 USD in 1 day
0.0 (0 reviews)
0.0
Hi, I am a Databricks and Azure certified professional Data Engineer with expertise in:
- Big data architecture
- Azure cloud architecture
- Spark/Scala/ETL
- Hadoop
- MySQL, MongoDB
Completed around 4 projects in end-to-end development and data pipeline implementation.
$50 USD in 1 day
0.0 (0 reviews)
0.0

About the client

Bear, United States
5.0
28
Payment method verified
Member since Sep 15, 2005
