sh [Data migration and cleanup tools] Scripts for handling problems after migrating a storage system or switching to a stricter protocol

#!/bin/sh
# MIT License
#
# Copyright (c) 2016 RackTop Systems / Sam Zaydel.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#
###############################################################################
# Script: invalidrename.sh
###############################################################################
# Description:
# The purpose of this tool is to identify and rename any directories / files
# whose names use characters known to be incompatible with the SMB protocol
# specification from Microsoft.
# The tool has two modes of operation: a dry run, selected when the `-n`
# argument comes first after the name of the program, and normal mode, where
# the first argument is a valid path to a directory. An optional `-d{0-9}`
# argument limits how deep `find` descends (default is 10 levels).
#
# In dry-run, i.e. `-n` mode, nothing is actually done, other than the
# resulting changes being printed to screen. The point is to learn what
# changes will be made when the command is executed without the dry-run flag.
#
# In normal mode, the only required argument is the path to a directory. The
# scan is done with find, which is recursive and lists parents before their
# children, so if a path has characters which need to be changed at multiple
# levels, some of these `mv` operations will not succeed. It is best to run
# this script close to where problems are suspected, instead of running it
# against deep hierarchies.

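PROGNAME=`basename "$0"`
SUB_CHAR="_"
# The hyphen is last in the bracket expression so it is taken literally
# instead of forming the invalid range `_-/`; `|` is the sed delimiter.
SED_CHARSET="s|[^A-Za-z0-9._/-]|${SUB_CHAR}|g"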
PREFIX=
DEBUG=0

[ ${DEBUG} -gt 0 ] && set -o xtrace

function die {
  printf "Error: %s\n" \
  "Missing path or not a directory." >&2
  printf "Usage: %s [-n -d{0-9}] /path/to/directory\n" "${PROGNAME}" >&2
  exit 1
}

function err_exit {
    printf "Command partially or completely failed.\n" >&2
    exit 1
}

function dry_run {
  printf "Dry Run Mode: No changes will be made!\n" >&2
}

function process_args {
  [ ${DEBUG} -gt 0 ] && set -o xtrace
  case $1 in
    "-d"[0-9]) recurse=${1//-d} ; shift 1 ; process_args "$@"
    ;;
    # -n is a dry-run, i.e. tell me what you are going to do, but don't do it.
    "-n") PREFIX="echo" ; dry_run ; shift 1 ; process_args "$@"
    ;;
    "") die # If there are no more arguments, but we got here, die.
    ;;
    # If path is anything other than a directory, it is an error, die.
    *)  if [ ! -d "$1" ] ; then die ; else directory="$1" ; fi
    ;;
  esac
}

function main {
  [ ${DEBUG} -gt 0 ] && set -o xtrace
  while IFS= read -r p ; do
    filtered_p=`printf '%s\n' "${p}" | sed -e "${SED_CHARSET}"`
    # If after being passed through filter the path does not match
    # original, we have a path with characters which we consider illegal.
    # In which case we want to rename to filtered name, which replaces
    # any invalid/illegal characters with $SUB_CHAR.
    if [ "${filtered_p}" != "${p}" ]; then
      ${PREFIX} mv "${p}" "${filtered_p}"
    fi
  done <<< "$(find "${directory}" -maxdepth "${recurse:-10}" -print)"
}

# Process Command-line Arguments before running body.
process_args "$@"
main || err_exit
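
The character filter that invalidrename.sh applies to each path can also be illustrated outside the shell. What follows is a minimal Python sketch, not part of the original tool: the function name `sanitize_path` is hypothetical, and the allow-list (ASCII letters, digits, `.`, `_`, `-` and `/`) is an assumption read off `SED_CHARSET`.

```python
import re

# Characters outside this allow-list are replaced, mirroring SED_CHARSET.
# The hyphen sits last in the class so it is a literal, not a range.
SUB_CHAR = "_"
DISALLOWED = re.compile(r"[^A-Za-z0-9._/-]")


def sanitize_path(path):
    """Return path with every disallowed character replaced by SUB_CHAR."""
    return DISALLOWED.sub(SUB_CHAR, path)


def needs_rename(path):
    """True when filtering changes the path, i.e. a rename would happen."""
    return sanitize_path(path) != path


if __name__ == "__main__":
    # Made-up sample paths, for illustration only.
    for sample in ("/data/My Docs/report:v2.txt", "/data/clean-name_1.txt"):
        print(sample, "->", sanitize_path(sample), needs_rename(sample))
```

Because the whole path is filtered at once, a rename touches every offending component, which is why the shell version can fail when parents and children both need changes.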
#!/usr/bin/env python
"""
MIT License

Copyright (c) 2017 RackTop Systems / Sam Zaydel.

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Script bulk-renames files and directories whose names contain characters
which we believe cause issues with SMB.

With deeply nested directories this may need to be run multiple times,
because as soon as we make changes to a subdirectory, we can no longer
traverse it by its old path, thereby requiring another walk to descend
into that directory.

To run the script, simply issue the command with up to two arguments.
If only a preview is desired, run the command with the `-n` argument, which
avoids making changes; instead, changes are only displayed. The second
argument is the path to the directory being scanned. If no path is supplied,
the script assumes ".", i.e. the current working directory.
"""
import os
import sys
import re
import shutil
import time

# This is not an exhaustive list of characters, and we may have to add more
# as we identify other issue causing chars.
PAT = r'^[\s]+|[\|\%\#\?\*\~\$\,\:\;\\/\=\\\[\]\'\"]|[\s]+$'


def process_cmdl_args():
    """
    process_cmdl_args:
    Skip first argument, which is the name of the program and
    process all subsequent strings. We expect to get out of this the path
    as well as whether or not dry run is used.
    """
    s = {
        "path": ".",
        "nochange": False,
    }
    for arg in sys.argv[1:]:
        if arg.startswith("-"):
            # Slice instead of index, so a bare "-" does not raise IndexError.
            if arg[1:2] == "n":
                print("Dry Run Mode - No changes are made!")
                s["nochange"] = True
            else:
                print("Encountered unsupported argument %s" % arg)
        else:
            # Anything that is not an option is taken to be the path.
            s["path"] = arg
    return s


def rename_filedir(pairs, dryrun):
    """
    rename_filedir:
    Takes list of tuples and assumes first element is current path, while
    second element is desired path, and attempts to rename accordingly.
    """
    for pair in pairs:
        current, new = pair
        # Check if the rename-to would clobber an existing path.
        # If so, just break current iteration and move onto next pair.
        if os.path.exists(new):
            print("Skip [non-unique]: %s" % current)
            continue
        else:
            print("Move: %s -> %s" % pair)
            if not dryrun:
                shutil.move(current, new)

if len(sys.argv) > 3:
    sys.stderr.write("Too many positional arguments.\n")
    sys.exit(1)

# filepairs and dirpairs end-up storing pairs of current name and new name
# combinations for each file or directory that we need to rename.
filepairs   = []
dirpairs    = []

SETTINGS = process_cmdl_args() # Process command line arguments.

# Recursively walk all directories and subdirectories and rename files and
# subdirectories inside.
# Instead of renaming in place we build two lists, one of all files and one
# of all directories that we need to rename, since they matched our rule.
for directory, subdirs, files in os.walk(SETTINGS["path"], topdown=True):
    t = time.localtime()
    if directory == ".":
        print("[%s] Base Directory" % time.asctime(t))
    else:
        print("[%s] %s" % (time.asctime(t), directory))
    # Process all files in *all* sub-directories in this tree.
    for fn in files:
        # Skip files that start with `~$`, because these are likely lock
        # files created by Office applications, and will exist anywhere
        # that users save and edit office documents.
        if fn.startswith(r'~$'):
            continue
        if re.search(PAT, fn):
            newfn = re.sub(PAT, "", fn)
            oldpath = os.path.join(directory, fn)
            newpath = os.path.join(directory, newfn)
            filepairs.append((oldpath, newpath))

    # Repeat above process for subdirectories as well.
    for dn in subdirs:
        # If we are dealing with a trailing period, only act if this is a dir.
        if dn != "." and dn.endswith("."):
            newdn = dn.rstrip(".")
            oldpath = os.path.join(directory, dn)
            newpath = os.path.join(directory, newdn)
            dirpairs.append((oldpath, newpath))
        elif re.search(PAT, dn):
            newdn = re.sub(PAT, "", dn)
            oldpath = os.path.join(directory, dn)
            newpath = os.path.join(directory, newdn)
            dirpairs.append((oldpath, newpath))

# We process all files first, because files are leaf objects and renaming
# them does not affect the rest of the path.
# Next, we reverse the list of directories and work backwards from leaf
# nodes to the top of the tree.
rename_filedir(filepairs, SETTINGS["nochange"])
rename_filedir(reversed(dirpairs), SETTINGS["nochange"])
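
The docstring above notes that deep trees may need several runs. One way to avoid that, sketched here under stated assumptions rather than taken from the original script, is to walk the tree leaf-first with `os.walk(..., topdown=False)`. The function name `rename_bottom_up` is hypothetical, and `BAD` is a deliberately reduced stand-in for `PAT`.

```python
import os
import re
import tempfile

# Reduced stand-in for PAT; the real pattern covers more characters.
BAD = re.compile(r"[#%~]")


def rename_bottom_up(root, dryrun=True):
    """Walk root leaf-first and rename entries whose names match BAD."""
    renamed = []
    # topdown=False yields a directory only after all of its children,
    # so renaming an entry never invalidates a path still to be visited.
    for directory, subdirs, files in os.walk(root, topdown=False):
        for name in files + subdirs:
            new = BAD.sub("", name)
            if new == name or not new:
                continue  # nothing to change, or nothing would be left
            old_path = os.path.join(directory, name)
            new_path = os.path.join(directory, new)
            if os.path.exists(new_path):
                continue  # would clobber an existing entry; skip it
            renamed.append((old_path, new_path))
            if not dryrun:
                os.rename(old_path, new_path)
    return renamed


if __name__ == "__main__":
    # Build a throwaway tree with offending names at three levels.
    with tempfile.TemporaryDirectory() as tmp:
        os.makedirs(os.path.join(tmp, "a#b", "c%d"))
        open(os.path.join(tmp, "a#b", "c%d", "e~.txt"), "w").close()
        for old, new in rename_bottom_up(tmp, dryrun=False):
            print(old, "->", new)
```

Each directory is renamed only when its parent is visited, by which point everything beneath it has already been handled, so a single pass is enough in this sketch.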
#!/bin/sh
# MIT License
#
# Copyright (c) 2016 RackTop Systems / Sam Zaydel.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#
###############################################################################
# Script: droptrailwhite.sh
###############################################################################
# Script is used to sanitize directory names by removing trailing white space.
# We are assuming that script is run against a directory which has one
# or more children, which themselves may have children. The rename operation
# does not rename all items in the pathname, instead only the last one will be
# renamed. But, if multiple items in the path contain trailing space we have
# to essentially rename those individual directories in the pathname.
# What we have to do is run this script multiple times, potentially over the
# same parent and each time we will likely have some failures on rename,
# which will be caught and fixed on next pass.
# This is less than ideal to be sure, and if need be will be improved upon.
#
# Running this script is easy. The only required argument is path to dir.
# which we are scanning. Avoid scanning really deep directories, instead
# consider switching into the children, or giving a full path of the child.
# # /path/to/droptrailwhite.sh /path/to/directory
#
PROGNAME=`basename "$0"`
DEBUG=0
PREFIX=

[ ${DEBUG} -gt 0 ] && set -o xtrace

function die {
  printf "Error: %s\n" \
  "Missing path or not a directory." >&2
  printf "Usage: %s [-n] /path/to/directory\n" "${PROGNAME}" >&2
  exit 1
}

function err_exit {
    printf "Command partially or completely failed.\n" >&2
    exit 1
}

function dry_run {
  printf "Dry Run Mode: No changes will be made!\n" >&2
}

function process_args {
  [ ${DEBUG} -gt 0 ] && set -o xtrace
  case $1 in
    # -n is a dry-run, i.e. tell me what you are going to do, but don't do it.
    "-n") PREFIX="echo" ; dry_run ; shift 1 ; process_args "$@"
    ;;
    "") die # If there are no more arguments, but we got here, die.
    ;;
    # If path is anything other than a directory, it is an error, die.
    *)  if [ ! -d "$1" ] ; then die ; else directory="$1" ; fi
    ;;
  esac
}

# Scan through directories under `$directory` and identify any whose
# names have one or more trailing spaces. Trim trailing spaces on
# all such directories. On macs it was observed that trailing spaces are
# actually followed by another character, which does not belong to the
# same class as whitespace(s), hence the use of *$ to signal something
# else after the [[:blank:]].

function main {
  [ ${DEBUG} -gt 0 ] && set -o xtrace
  rc=0
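  # The loop reads from a here-string rather than a pipe, so that `rc` is set
  # in the current shell even where pipeline stages run in subshells.
  # A commented-out `-e` line used to sit inside the continued sed command;
  # the comment swallowed the line continuation, so it has been removed.
  while IFS= read -r name; do
    new_name=`echo "${name}" | sed -e 's/[[:space:]]*$//'`

    [[ "${name}" != "${new_name}" ]] && {
      ${PREFIX} mv "${name}" "${new_name}" || rc=1
    }
  done <<< "$(find "${directory}" -print)"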
  return ${rc}
}

# Process Command-line Arguments before running body.
process_args "$@"
main || err_exit
#!/bin/sh
# MIT License
#
# Copyright (c) 2017 RackTop Systems / Sam Zaydel.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#
# set -o xtrace
###############################################################################
# Script: cleanuphomedir.sh
###############################################################################
# Description: The purpose of the script is to clean-up homedirs, giving them
# proper ACLs and downcasing them as necessary.
#

# String Constants
OWNER_ACL=owner@:rwxpd-aARWc--s:fd-----:allow
SUPER_ACL=group:2147541000:rwxpdDaARWcCos:fd-----:allow
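# Dry run by default: every command is only echoed. Set PREFIX to an empty
# value to actually apply the changes.
PREFIX=echo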
#
# downcase: sends a string to stdout with all uppercase letters down-cased.
#
function downcase {
    echo "$@" | /usr/bin/awk '{print tolower($1)}'
}

[ -z "$1" ] && {
    printf "Error: %s\n" "Please enter working directory." >&2 ;
    exit 1
}

[ ! -d "$1" ] && {
    printf "Error: %s\n" "Path is not valid, please make sure path exists." >&2 ;
    exit 1
}

# We need to operate inside of the requested working directory.
cd "$1" || exit 1
WD=`pwd`
# Process each home directory, checking for proper owner and assigning root
# as group owner.
for d in * ; do
    record=`getent passwd "${d}"` || \
    {
        printf "%s Failed to lookup owner of directory.\n" "${d}" >&2;
        continue
    }
    # Split the passwd record on `:` and restore IFS right away, so the
    # separator does not leak into the rest of the loop.
    oldifs="${IFS}" ; IFS=: ; passwd_record=(${record}) ; IFS="${oldifs}"
    printf "Name: %s Username: %s Uid: %s Dir: %s\n" \
        "${passwd_record[4]}" \
        "${passwd_record[0]}" \
        "${passwd_record[2]}" \
        "${WD}/${d}"
    username=`downcase "${passwd_record[0]}"`
    ${PREFIX} chown -R "${username}:root" "${d}" || \
    {
        printf "%s Failed to change owner:group of directory.\n" "${d}" >&2;
    }
    user_acl="user:${username}:rwxpd-aARWc--s:fd-----:allow"
    ${PREFIX} chmod -R \
        "A=${OWNER_ACL},${SUPER_ACL},${user_acl}" "${d}" || \
        printf "%s Failed to 'chmod' directory.\n" "${d}" >&2;
    # If the mixed-case dirname does not match the lower-cased username,
    # we want to rename the directory to match the username.
    if [[ "${username}" != "${d}" ]]; then
        [ -d "${username}" ] && {
            printf \
                "%s Unable to downcase, without clobbering.\n" "${username}" >&2
            continue
        }
        ${PREFIX} mv "${d}" "${username}" || \
            printf "%s Failed to rename home directory.\n" "${username}" >&2
    fi
done
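
The loop in cleanuphomedir.sh relies on the field order of a passwd record: index 0 is the username, 2 the uid and 4 the full name. The Python sketch below shows that parsing and the downcase comparison in isolation. It is an illustration only; the function names and the sample record are made up, and it does not call `getent`.

```python
def parse_passwd(record):
    """Split one passwd line into the fields the script reports."""
    fields = record.rstrip("\n").split(":")
    return {
        "username": fields[0],
        "uid": int(fields[2]),
        "name": fields[4],
    }


def rename_target(dirname, record):
    """Return the lower-cased username if dirname differs from it, else None."""
    username = parse_passwd(record)["username"].lower()
    return username if username != dirname else None


if __name__ == "__main__":
    # Made-up record in the usual seven-field passwd layout.
    sample = "JDoe:x:1001:100:Jane Doe:/export/home/JDoe:/bin/sh"
    print(parse_passwd(sample))
    print(rename_target("JDoe", sample))
```

Splitting on `:` keeps empty fields in place, so the indexes stay stable even when, for example, the full-name field is blank.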
#!/bin/sh
# Run this script with the following `find` command to generate
# a list of all files which are considered incompatible with SMB
# on Windows clients.

# find /path/to/directory -depth -exec /path/to/this/script.sh '{}' \;
# Replace `/path/to/directory` with actual directory name.
# Consider placing this in /bp/bin, naming it checksmbcompat.sh .
# Replace `/path/to/this/script.sh` with real path to this script.
#
filename=`basename "$1"`
testfilename=`echo "${filename}" | sed -e 's/[^A-Za-z0-9._-]/_/g'`
# If the two variables match exactly, we have a compatible filename.
if [ "${testfilename}" = "${filename}" ]; then
  exit 0
else
  printf "%s\n" "$1"
fi
#!/bin/sh
# MIT License
#
# Copyright (c) 2017 RackTop Systems / Sam Zaydel.
#
# Permission is hereby granted, free of charge, to any person obtaining a copy
# of this software and associated documentation files (the "Software"), to deal
# in the Software without restriction, including without limitation the rights
# to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
# copies of the Software, and to permit persons to whom the Software is
# furnished to do so, subject to the following conditions:
#
# The above copyright notice and this permission notice shall be included in all
# copies or substantial portions of the Software.
#
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
# IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
# FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
# AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
# LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
# OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
# SOFTWARE.
#
# set -o xtrace
###############################################################################
# Script: acl_from_parent.sh
###############################################################################
# Description: The purpose of the script is to modify ACL and ownership on a
# directory which may have ACL no longer consistent with its parent, resulting
# in access issues experienced by clients.
#
# If the directory path contains spaces, either escape them with backslashes
# (`\`), or enclose the path in double-quotes.
#
PROGNAME=$0
typeset -a aces
parent=
ug=

function usage {
    printf "Usage: ${PROGNAME} </path/to/target/directory>\n"
}

function print_err {
    printf "ERROR: %s\n" "$@" >&2
}

#
# get_aces: Obtain ACL from parent directory.
#
function get_aces {
    dir="$1"
    aces=(`ls -luVd "${dir}" | nawk '/(^\ .*:allow)/ {print $0}'`)
    return $?
}

#
# apply_acl: Apply ACL obtained from parent over target to match parent.
#
function apply_acl {
    if [ ${#aces[@]} -eq 0 ]; then
        return 1 # There are no ACEs in this list.
    else
        chmod_args=`tr ' ' ',' <<< "${aces[*]}"`
        chmod -R "A=${chmod_args}" "$1" ; return $?
    fi
}

#
# get_owner: Obtain information about user and group of parent dir.
#
function get_owner {
    set -o xtrace
    ug=(`stat -c '%u %g' "$1"`)
}

#
# set_owner: Apply owner and group information over target to match parent.
#
function set_owner {
    set -o xtrace
    chown -R "${ug[0]}:${ug[1]}" "$1"
}

if [ -z "$1" ] ; then 
    print_err \
        "Path to directory is a required positional argument."
    usage ; exit 1
else
    parent=`dirname "$1"`
    target="$1"
    if [ ! -d "${parent}" ] || [ "${parent}" = "." ] ; then
        print_err \
            "Supplied argument is not a valid path to directory."
        usage ; exit 1
    fi
fi

# Collect and update ownership on the target directory.
get_owner "${parent}" && set_owner "${target}"

[ $? -ne 0 ] && {
    print_err \
        "Failed to obtain or set ownership information." ; exit 1 ;
}

# Collect ACL from parent of this directory. We are assuming
# parent is in good shape.
get_aces "${parent}" || {
    print_err \
        "Failed to read ACL from parent directory." ; exit 1 ;
}

# Apply ACL over the target directory, and all its children, recursively.
apply_acl "${target}" || {
    print_err \
        "Failed to apply ACL from parent to target directory." ; exit 1 ;
}
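
The two ACL helpers in acl_from_parent.sh boil down to a filter and a join: keep the indented entries marked `:allow`, then glue them together with commas behind `A=`. The Python sketch below mirrors that on a made-up sample in the compact layout printed by `ls -V`. The function names are hypothetical, the filter only approximates the `nawk` pattern, and nothing here runs `chmod`.

```python
def allow_aces(ls_output):
    """Collect ACE lines: indented entries ending in ':allow'."""
    return [
        line.strip()
        for line in ls_output.splitlines()
        if line.startswith(" ") and line.rstrip().endswith(":allow")
    ]


def chmod_acl_spec(aces):
    """Join ACEs with commas into the `A=...` argument passed to chmod."""
    if not aces:
        raise ValueError("no ACEs found on parent")
    return "A=" + ",".join(aces)


if __name__ == "__main__":
    # Made-up listing: one header line followed by two allow entries.
    sample = (
        "drwxr-xr-x+  2 root     root           2 Jan  1 00:00 /tank/share\n"
        "            owner@:rwxpdDaARWcCos:fd-----:allow\n"
        "            group@:r-x---a-R-c--s:fd-----:allow\n"
    )
    print(chmod_acl_spec(allow_aces(sample)))
```

As in the shell script, only `allow` entries are carried over, so any `deny` entries on the parent are not copied to the target.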
