Wednesday, October 28, 2020

PySpark - Map values

 Sometimes you want to encode a hierarchy in your data source, or rank how important certain values are. You can use the create_map() function:

  from itertools import chain
  import pyspark.sql.functions as F

  # Plain Python dict mapping each label to its rank
  src_type = {"first_important": 1, "second_important": 2, "third_important": 3}

  # Flatten the dict into key, value, key, value, ... literals for create_map()
  src_type = F.create_map([F.lit(x) for x in chain(*src_type.items())])

Then the resulting map column looks like this:

  +----------------------------------------------------------------------+
  |map(first_important -> 1, second_important -> 2, third_important -> 3)|
  +----------------------------------------------------------------------+
When you use it, you can index into the map with a column to look up each row's rank, for example:

  df = df.withColumn(
      "is_type",
      F.when(src_type[F.col("tgt_type")] > src_type[F.col("src_type")], "True")
       .otherwise("False"),
  )
