How To: Remove Directory Marker Folders in S3 NativeFS


S3 is an object store, so everything in it is an object. When Hadoop uses S3 as a filesystem, it has to organize those objects so that they appear as a file system tree. To do this, it creates special objects that mark a path as a directory.

For example, S3://abc/def is treated as a directory if a marker object named def_$folder$ exists at the same level as def.


However, these marker objects are sometimes created in the wrong location, such as S3://abc/def/_$folder$, where the _$folder$ object comes after a "/". This confuses the S3FS code and leads to unexpected issues in Hive. For example, Hive thinks there is a folder inside S3://abc/def/ when it expected only files there, because that is your table location.
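To make the difference between a normal marker and a misplaced one concrete, here is a minimal sketch that classifies object keys by name only. It assumes the NativeS3 convention in which the marker for directory def is the key def_$folder$. The function name classify_key is hypothetical, and the function inspects strings only without contacting S3.

```shell
# Hypothetical helper: classify an S3 object key (relative to the bucket)
# by how it relates to the "_$folder$" directory-marker suffix.
classify_key() {
  case "$1" in
    */'_$folder$') echo "misplaced-marker" ;;  # marker sits after a "/", e.g. def/_$folder$
    *'_$folder$')  echo "directory-marker" ;;  # normal marker, e.g. def_$folder$
    *)             echo "regular-object"   ;;  # anything else
  esac
}

classify_key 'def_$folder$'
classify_key 'def/_$folder$'
classify_key 'def/part-00000'
```

The keys are single-quoted so that the shell does not try to expand $folder as a variable.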

Because _$folder$ is a special object, it cannot be deleted directly with hadoop dfs or s3cmd. It has to be deleted in the following way.

How To:

First, move/rename the object with the command below:

s3cmd -c /usr/lib/hustler/s3cfg mv s3://<bucket>/ETL/hive_runtime/_$folder* s3://<bucket>/ETL/hive_runtime/badfile

Then remove the renamed object. A direct rm does not work on the original object, even though 'ls' lists it under the same name.

s3cmd -c /usr/lib/hustler/s3cfg rm s3://<bucket>/ETL/hive_runtime/badfile
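
The two steps above can be previewed for a given bucket and prefix before anything is run. The following is a dry-run sketch only: the function name print_marker_cleanup and the bucket name my-bucket are hypothetical, and the function prints the commands rather than executing them, so it does not need s3cmd or access to S3.

```shell
# Dry-run sketch: print the move and remove commands for one location.
# Arguments: bucket name, then the prefix that contains the stray marker.
print_marker_cleanup() {
  bucket=$1
  prefix=$2
  cfg=/usr/lib/hustler/s3cfg   # s3cmd config path used in this article
  # Step 1: rename the stray marker to an ordinary key name.
  printf 's3cmd -c %s mv %s %s\n' "$cfg" \
    "s3://$bucket/$prefix/_\$folder*" "s3://$bucket/$prefix/badfile"
  # Step 2: delete the renamed object.
  printf 's3cmd -c %s rm %s\n' "$cfg" "s3://$bucket/$prefix/badfile"
}

print_marker_cleanup my-bucket ETL/hive_runtime
```

Review the printed commands and run them by hand once the bucket and prefix look right.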
