HDFSFileTransfer


Developer(s): Daniel Kozlowski
Stable release: 1.0 / August 31, 2014
Written in: Bash
Operating system: Linux
License: MIT License, Boost Software License
Website: http://sourceforge.net/projects/hdfsfiletransfer/

    HDFSFileTransfer is a tool written as a Bash script for quickly transferring files into HDFS. It lets users copy files within a single physical machine as well as between two machines, that is, from the Linux file system of a machine that hosts one HDFS cluster into another HDFS cluster. For example, with two separate Hadoop clusters, each installed on its own Linux machine, the script can copy files from one machine into the Hadoop installation on the other. HDFSFileTransfer is dual-licensed under the MIT License and the Boost Software License, and its source code is freely available.

    Features

    * Local Copy

    The script runs on the same machine where HDFS is installed. It performs the following steps:

    - Copies files from the local file system

    - Puts the files into HDFS

    - Archives the copied files on the local file system under the archive folder
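
The three steps above can be sketched in Bash roughly as follows. This is a minimal illustration, not the tool's actual code: the function name `transfer_local` and its three arguments are hypothetical, and the only Hadoop command assumed is the standard `hdfs dfs -put`.

```bash
#!/usr/bin/env bash
# Sketch of one local transfer cycle: put each file into HDFS, then archive it.
# transfer_local and its arguments are hypothetical names, not taken from the tool.

transfer_local() {
    local source_dir="$1"    # local folder the files are picked up from
    local hdfs_dir="$2"      # destination folder in HDFS
    local archive_dir="$3"   # local folder that keeps the already-copied files
    local file

    mkdir -p "$archive_dir" || return 1

    for file in "$source_dir"/*; do
        # Skip sub-folders (including the archive folder) and unmatched globs.
        [ -f "$file" ] || continue
        # Upload the file; stop and leave it in place if the upload fails.
        hdfs dfs -put "$file" "$hdfs_dir" || return 1
        # Move the uploaded file out of the source folder into the archive.
        mv "$file" "$archive_dir"/ || return 1
    done
}

# Example call (paths are placeholders):
# transfer_local /home/testUser/incoming /user/hduser/temp /home/testUser/incoming/archive
```

Moving each file only after its upload succeeds means a failed cycle leaves the remaining files in the source folder for the next run.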

    * Copy From One Machine To Another With A Different HDFS Cluster

    The script can copy files from the Linux file system of a machine that has an HDFS cluster installed into another HDFS cluster. It does not, however, copy directly from one HDFS cluster to another. For example, with two separate Hadoop installations on two different Linux machines (the author's own setup used virtual machines), the script can copy files from one Linux machine into the Hadoop installation on the other.
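
The article does not say how the script reaches the second machine. One possible approach, shown here purely as an assumption, is to stream the local file over SSH into `hdfs dfs -put -`, which reads its input from standard input. The function name `transfer_remote` and the host and path values are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: copy a local file into the HDFS of another machine over SSH.
# Assumption: SSH access to the remote host and `hdfs` on its PATH;
# this is not necessarily how HDFSFileTransfer does it.

transfer_remote() {
    local file="$1"       # local file on the source machine
    local remote="$2"     # user@host of the machine with the target HDFS
    local hdfs_dir="$3"   # destination folder in the remote HDFS
    local name

    name=$(basename "$file")
    # `hdfs dfs -put - <dst>` reads the file content from stdin,
    # so the file is streamed through the SSH connection.
    ssh "$remote" "hdfs dfs -put - '$hdfs_dir/$name'" < "$file"
}

# Example call (host and paths are placeholders):
# transfer_remote /home/testUser/data.txt hduser@node2 /user/hduser/temp
```

The single quotes protect the destination path on the remote shell, but a file name that itself contains a single quote would still need extra escaping.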

    * Email errors encountered during the process

    The process includes several validations. If a validation fails, the process enters an error state. The error handling:

    - terminates the transfer process

    - sends an email containing the error message to a user/group of people
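
A minimal sketch of such an error handler is shown below. The function name `fail`, the variables `LOG_FILE` and `MAIL_TO`, and the address are hypothetical; the sketch assumes a `mail` command (for example from mailx) is installed and configured on the machine.

```bash
#!/usr/bin/env bash
# Sketch of an error handler: log the message, email it, stop the process.
# LOG_FILE and MAIL_TO are hypothetical settings, not taken from the tool.

LOG_FILE="${LOG_FILE:-/tmp/hdfs_transfer.log}"
MAIL_TO="${MAIL_TO:-hadoop-admins@example.com}"

fail() {
    local message="$1"
    # Record the error with a timestamp in the log file.
    echo "$(date '+%Y-%m-%d %H:%M:%S') ERROR: $message" >> "$LOG_FILE"
    # Send the error message to the configured user or group.
    echo "$message" | mail -s "HDFS transfer failed" "$MAIL_TO"
    # Terminate the transfer process with a non-zero status.
    exit 1
}

# Example use inside the transfer script:
# [ -d "$SOURCE_DIR" ] || fail "Source folder $SOURCE_DIR does not exist"
```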

    Process Diagram

    The process is broken down into two parts:

    - initial validations: three checks are performed at the very beginning, each time the script starts up

    - a 'while' loop: performs all the copying and archiving. Although the work is done in a loop (so the script can run as a daemon), there is also an option to run the process once on demand
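
The two-part structure can be sketched as follows. The names `run_transfer`, `run_initial_validations` and `transfer_cycle`, as well as the `once`/`daemon` switch, are hypothetical placeholders chosen for illustration; the real script may organise this differently.

```bash
#!/usr/bin/env bash
# Sketch of the overall flow: validate once at start-up, then loop.
# All function names and the mode switch are hypothetical.

run_initial_validations() {
    # Placeholder for the start-up checks (local folders, DataNode, HDFS folders).
    return 0
}

transfer_cycle() {
    # Placeholder for one copy-and-archive pass.
    echo "transfer cycle finished"
}

run_transfer() {
    local mode="$1"       # "once" for a single pass, anything else loops
    local interval="$2"   # seconds to wait between passes when looping

    # Part 1: the initial validations run each time the script starts.
    run_initial_validations || return 1

    # Part 2: the 'while' loop that does the copying and archiving.
    while true; do
        transfer_cycle || return 1
        # Single-run mode leaves the loop after the first pass.
        [ "$mode" = "once" ] && break
        sleep "$interval"
    done
}

# Example calls:
# run_transfer once 0        # run a single pass on demand
# run_transfer daemon 60     # keep running, one pass per minute
```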

    Validations

    The list of all validations carried out within the copying process:

    • Local Folders - checks that all local folders the files are copied from (the SOURCE side) exist. If any folder does not exist, the error is written to a log file, an email is sent to the user, and the process is terminated.
    • Hadoop DataNode - checks that the DataNode process is up and running, using the jps command. If the process is down, the error is written to a log file, an email is sent to the user, and the process is terminated.
    • HDFS Folders - checks that all HDFS folders the files are copied into (the DESTINATION side) exist. If any folder does not exist, the error is written to a log file, an email is sent to the user, and the process is terminated.
    • Number Of Copied Files - this validation is carried out every time files are copied from SOURCE to DESTINATION. The process counts the files picked up from the SOURCE folder and the files put into the DESTINATION folder and compares the two numbers. If they do not match, the error is written to a log file, an email is sent to the user, and the process is terminated.
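
The four checks can be sketched as small Bash functions. This is an illustration under assumptions: the function names are hypothetical, the DataNode check assumes `jps` prints lines of the form `<pid> DataNode`, and the HDFS folder check uses the standard `hdfs dfs -test -d` command. Logging, emailing and termination are left to the caller.

```bash
#!/usr/bin/env bash
# Sketch of the validations. Each function returns 0 when the check passes
# and 1 otherwise. All function names are hypothetical.

check_local_folders() {
    # Arguments: the local SOURCE folders.
    local dir
    for dir in "$@"; do
        [ -d "$dir" ] || { echo "Local folder missing: $dir" >&2; return 1; }
    done
}

check_datanode() {
    # jps lists running JVMs, one "<pid> <name>" line per process.
    jps | grep -q ' DataNode$' || { echo "DataNode is not running" >&2; return 1; }
}

check_hdfs_folders() {
    # Arguments: the HDFS DESTINATION folders.
    local dir
    for dir in "$@"; do
        hdfs dfs -test -d "$dir" || { echo "HDFS folder missing: $dir" >&2; return 1; }
    done
}

check_file_count() {
    local source_count="$1"   # number of files picked up from SOURCE
    local dest_count="$2"     # number of files put into DESTINATION
    [ "$source_count" -eq "$dest_count" ] || {
        echo "File count mismatch: $source_count picked up, $dest_count copied" >&2
        return 1
    }
}

# Example use (paths are placeholders):
# check_local_folders /home/testUser/incoming && check_datanode \
#     && check_hdfs_folders /user/hduser/temp
```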

    Limitations

    * Copying files whose names contain blank space(s) - Hadoop 2.4.0

    Problem: Copying fails for files whose names contain spaces.

    For example, running the following command: hdfs dfs -put /home/testUser/New filename.txt /user/hduser/temp

    returns the following error: put: unexpected URISyntaxException

    Solution: See the project documentation available on the SourceForge website.
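
The project's own solution is not reproduced here. As a generic way to sidestep the problem, and only as an assumption about what may be acceptable in a given setup, file names can be cleaned before the upload by replacing spaces with underscores. The function name `sanitize_filenames` is hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: rename files in a folder so that no name contains a space.
# This is a generic workaround, not the solution documented by the project.

sanitize_filenames() {
    local dir="$1"
    local file name clean

    # The glob only matches entries whose name contains at least one space.
    for file in "$dir"/*\ *; do
        [ -f "$file" ] || continue
        name=$(basename "$file")
        clean=$(printf '%s' "$name" | tr ' ' '_')
        # Do not overwrite an existing file that already has the clean name.
        if [ -e "$dir/$clean" ]; then
            echo "Skipping $name: $clean already exists" >&2
            continue
        fi
        mv "$file" "$dir/$clean"
    done
}

# Example call (path is a placeholder):
# sanitize_filenames /home/testUser
```

Renaming changes the file name that ends up in HDFS, so this only fits cases where the original name does not have to be preserved.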


    This article "HDFSFileTransfer" is from Wikipedia. The list of its authors can be seen in its page history. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace rather than in the main one.