Starting from How to Delete Files with Garbled Names in Java

Posted 2022-12-05 raintungli

1. Files with garbled names

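Why do files with garbled names get created? There are many reasons: the name was encoded incorrectly during upload, the operating system does not support that encoding, and so on. If you try to delete such a file with Java's File object, or even just check whether it exists, every call returns false: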

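```java
String[] entries = file.list();          // file is the directory being cleaned
if (entries != null) {                   // list() returns null if file is not a directory or on I/O error
    for (String s : entries) {
        File currentFile = new File(file.getPath(), s);
        currentFile.delete();            // returns false for a garbled name
    }
}
```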

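The reason is that the File object is built from file.getPath() and from the names returned by file.list(), and both are Strings. A Java String is made of chars, so going from the file system's bytes to chars and back again only works with the correct character encoding. Let's first look at how File deletes a file: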

```c
JNIEXPORT jboolean JNICALL
Java_java_io_UnixFileSystem_delete0(JNIEnv *env, jobject this,
                                    jobject file)
{
    jboolean rv = JNI_FALSE;

    WITH_FIELD_PLATFORM_STRING(env, file, ids.path, path) {
        if (remove(path) == 0) {
            rv = JNI_TRUE;
        }
    } END_PLATFORM_STRING(env, path);
    return rv;
}

#define WITH_PLATFORM_STRING(env, strexp, var)                                \
    if (1) {                                                                  \
        const char *var;                                                      \
        jstring _##var##str = (strexp);                                       \
        if (_##var##str == NULL) {                                            \
            JNU_ThrowNullPointerException((env), NULL);                       \
            goto _##var##end;                                                 \
        }                                                                     \
        var = JNU_GetStringPlatformChars((env), _##var##str, NULL);           \
        if (var == NULL) goto _##var##end;

#define WITH_FIELD_PLATFORM_STRING(env, object, id, var)                      \
    WITH_PLATFORM_STRING(env,                                                 \
                         ((object == NULL)                                    \
                          ? NULL                                              \
                          : (*(env))->GetObjectField((env), (object), (id))), \
                         var)
```

It calls the C function remove(). Here path is the String path field of the File object, which JNU_GetStringPlatformChars converts into a C char array.

In essence this comes down to String.getBytes()/getBytes(String charsetName) in String.java: the String's char array is encoded into a byte array, and those bytes are then copied into the C char array.

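When the charset is wrong or unsupported, the conversion breaks in one of two places. It can break when Java turns the bytes into chars (while listing the directory), or when it turns the chars back into bytes (while deleting). Either way, the file cannot be deleted.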
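The mechanism can be reproduced without touching the file system. The sketch below assumes the JVM encodes file names as UTF-8 and that the name was written by a GBK client. The bytes `D6 D0 CE C4` are the GBK encoding of "中文". The class `NameEncodingCheck` and its methods are made up for illustration.

- `roundTrip` mimics what the File-based code does: bytes to String, then String to bytes.
- `decodesCleanly` uses a strict `CharsetDecoder` to detect names that cannot survive that trip.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of the JDK.
final class NameEncodingCheck {

    // Simulates File.list() (bytes -> String) followed by File.delete() (String -> bytes).
    static byte[] roundTrip(byte[] raw, Charset cs) {
        String name = new String(raw, cs); // malformed input is silently replaced by U+FFFD
        return name.getBytes(cs);
    }

    // Strict decode: true only if the bytes contain no malformed or unmappable input for cs.
    static boolean decodesCleanly(byte[] raw, Charset cs) {
        try {
            cs.newDecoder()
              .onMalformedInput(CodingErrorAction.REPORT)
              .onUnmappableCharacter(CodingErrorAction.REPORT)
              .decode(ByteBuffer.wrap(raw));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // sun.jnu.encoding is an OpenJDK implementation property for file-name encoding;
        // it may be null on other JVMs.
        System.out.println("sun.jnu.encoding = " + System.getProperty("sun.jnu.encoding"));

        // "中文" as written by a GBK client (D6 D0 CE C4): not valid UTF-8
        byte[] gbkName = {(byte) 0xD6, (byte) 0xD0, (byte) 0xCE, (byte) 0xC4};
        byte[] back = roundTrip(gbkName, StandardCharsets.UTF_8);

        System.out.println("original:  " + Arrays.toString(gbkName));
        System.out.println("rebuilt:   " + Arrays.toString(back));
        System.out.println("identical: " + Arrays.equals(gbkName, back));
        System.out.println("valid UTF-8: " + decodesCleanly(gbkName, StandardCharsets.UTF_8));
    }
}
```

A clean strict decode is a good sign, but depending on the charset it is not a guarantee; comparing the bytes with `roundTrip` is the direct check. ISO-8859-1 maps all 256 byte values to chars, so under that charset both checks always pass.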

2. How can the file be deleted?

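From the analysis above, the root problem is the encoding conversion between the C char array and the Java char array. The obvious way to delete the file reliably is therefore to skip transcoding entirely and use the C char array as-is. On the Java side, that corresponds to a byte array.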

Looking around, NIO offers the new Path object, and Files.delete takes a Path instead of a File.

Inside the Path implementation we can see how the path is stored:

```java
class UnixPath
    extends AbstractPath
{
    private static ThreadLocal<SoftReference<CharsetEncoder>> encoder =
        new ThreadLocal<SoftReference<CharsetEncoder>>();

    // FIXME - eliminate this reference to reduce space
    private final UnixFileSystem fs;

    // internal representation
    private final byte[] path;

    // ...
}
```

The path is no longer File's String path. Now look at Files.delete, which ends up calling the implDelete method in UnixFileSystemProvider.java:


```java
    @Override
    boolean implDelete(Path obj, boolean failIfNotExists) throws IOException {
        UnixPath file = UnixPath.toUnixPath(obj);
        file.checkDelete();

        // need file attributes to know if file is directory
        UnixFileAttributes attrs = null;
        try {
            attrs = UnixFileAttributes.get(file, false);
            if (attrs.isDirectory()) {
                rmdir(file);
            } else {
                unlink(file);
            }
            return true;
        } catch (UnixException x) {
            // no-op if file does not exist
            if (!failIfNotExists && x.errno() == ENOENT)
                return false;

            // DirectoryNotEmptyException if not empty
            if (attrs != null && attrs.isDirectory() &&
                (x.errno() == EEXIST || x.errno() == ENOTEMPTY))
                throw new DirectoryNotEmptyException(file.getPathForExceptionMessage());

            x.rethrowAsIOException(file);
            return false;
        }
    }

    // UnixNativeDispatcher.java
    static void unlink(UnixPath path) throws UnixException {
        NativeBuffer buffer = copyToNativeBuffer(path);
        try {
            unlink0(buffer.address());
        } finally {
            buffer.release();
        }
    }
```

unlink copies the path byte array of the UnixPath into off-heap memory and passes the address of that memory to the JNI method unlink0:

```c
JNIEXPORT void JNICALL
Java_sun_nio_fs_UnixNativeDispatcher_unlink0(JNIEnv* env, jclass this,
    jlong pathAddress)
{
    const char* path = (const char*)jlong_to_ptr(pathAddress);

    /* EINTR not listed as a possible error */
    if (unlink(path) == -1) {
        throwUnixException(env, errno);
    }
}
```

No decoding happens anywhere along this path; everything operates on byte arrays.
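A side benefit of the NIO route is better error reporting. `File.delete()` only returns a boolean. `implDelete` above, by contrast, turns the errno into an exception, so a name that does not exist (for example, one rebuilt from a garbled String) shows up as `NoSuchFileException`.

The sketch below compares the two on an ordinary temp file; the class and method names are illustrative. `Files.deleteIfExists` appears to be the `failIfNotExists == false` variant of the same call.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;

// Hypothetical demo class: compares how the two APIs report a missing target.
final class DeleteReporting {

    // java.io.File: only a boolean, the cause (ENOENT, EACCES, ...) is lost.
    static boolean legacyDelete(Path p) {
        return p.toFile().delete();
    }

    // java.nio.file.Files: failures surface as typed exceptions.
    static String nioDelete(Path p) {
        try {
            Files.delete(p);
            return "deleted";
        } catch (NoSuchFileException e) {
            return "no such file: " + e.getFile();
        } catch (IOException e) {
            return "failed: " + e;       // e.g. DirectoryNotEmptyException, AccessDeniedException
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("demo", ".txt");
        System.out.println(nioDelete(tmp));              // deleted
        System.out.println(nioDelete(tmp));              // no such file: <path>
        System.out.println(legacyDelete(tmp));           // false
        System.out.println(Files.deleteIfExists(tmp));   // false
    }
}
```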

3. File.toPath() still cannot delete it

Since Path operates on byte arrays, deletion should now work. So can we simply take the File and get its Path with File.toPath()? No.

Let's look at File.toPath():

```java
    // java.io.File
    public Path toPath() {
        Path result = filePath;
        if (result == null) {
            synchronized (this) {
                result = filePath;
                if (result == null) {
                    result = FileSystems.getDefault().getPath(path);
                    filePath = result;
                }
            }
        }
        return result;
    }

    // sun.nio.fs.UnixFileSystem
    @Override
    public final Path getPath(String first, String... more) {
        String path;
        if (more.length == 0) {
            path = first;
        } else {
            StringBuilder sb = new StringBuilder();
            sb.append(first);
            for (String segment: more) {
                if (segment.length() > 0) {
                    if (sb.length() > 0)
                        sb.append('/');
                    sb.append(segment);
                }
            }
            path = sb.toString();
        }
        return new UnixPath(this, path);
    }
```

The resulting Path is constructed from the String path. In the UnixPath implementation:


```java
    UnixPath(UnixFileSystem fs, String input) {
        // removes redundant slashes and checks for invalid characters
        this(fs, encode(fs, normalizeAndCheck(input)));
    }
```

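It calls encode to turn the String back into bytes, so this is bound to fail. The String was already decoded lossily, and re-encoding it cannot recover the original bytes.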
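One practical consequence: the Path objects handed out by a directory listing keep the raw bytes, while any path rebuilt from a String goes through `encode` again. The sketch below (the class `GarbledNameFinder` is hypothetical) uses that difference as a detector.

It lists a directory and reports every entry that can no longer be found once its name is turned into a String and resolved again. The reported Path objects themselves remain usable, for example with `Files.delete(entry)`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: finds entries reachable through the listing's Path
// but not through a Path rebuilt from the entry's String name.
final class GarbledNameFinder {

    static List<Path> findUnreachableByName(Path dir) throws IOException {
        List<Path> unreachable = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // entry was built from the raw readdir bytes; its String form may be lossy
                String decodedName = entry.getFileName().toString();
                try {
                    // rebuilding from the String re-encodes it, like File.toPath() would
                    Path rebuilt = dir.resolve(decodedName);
                    if (!Files.exists(rebuilt, LinkOption.NOFOLLOW_LINKS)) {
                        unreachable.add(entry);
                    }
                } catch (InvalidPathException e) {
                    // the decoded name cannot even be encoded back with the platform charset
                    unreachable.add(entry);
                }
            }
        }
        return unreachable;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path entry : findUnreachableByName(dir)) {
            // printing is lossy, but entry itself can still be passed to Files.delete(entry)
            System.out.println("not reachable by name: " + entry);
        }
    }
}
```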

4. Deleting with NIO

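Files provides walkFileTree, which visits the files and directories under a given directory, down to a given depth. Deleting inside the visitor looks like this: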

```java
// path: the root directory to delete
Files.walkFileTree(path, new FileVisitor<Path>() {
    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
            throws IOException {
        System.out.println("preVisitDirectory: " + dir);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
            throws IOException {
        System.out.println("visitFile: " + file);
        Files.delete(file);        // file comes from the directory listing, no String round trip
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc)
            throws IOException {
        System.out.println("visitFileFailed: " + file + " (" + exc + ")");
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc)
            throws IOException {
        System.out.println("postVisitDirectory: " + dir);
        Files.delete(dir);         // its entries were visited (and deleted) first
        return FileVisitResult.CONTINUE;
    }
});
```

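During walkFileTree, directories are kept on a stack, and postVisitDirectory is called when a directory is finally popped off it. This structure is exactly what recursive deletion needs.

A directory that still contains files cannot be deleted, so deleting it on first visit would fail. Instead, the directory is pushed onto the stack when it is first visited. Only after every entry under it has been visited (and, here, deleted) is it popped, and postVisitDirectory deletes it at that point. This avoids making two passes, one to delete all the files and another to delete all the directories.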
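To make the push/pop idea concrete, here is a hand-rolled sketch with an explicit `ArrayDeque`. The class and method names are made up; this is not the JDK's FileTreeWalker, just the same idea in a few lines. Each frame holds a directory and its open listing. A directory is deleted only when its listing is exhausted and it is popped, so one pass removes the whole tree.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;

// Hypothetical sketch of single-pass, post-order deletion with an explicit stack.
final class StackDelete {

    // One stack frame: a directory plus its open listing.
    private static final class Frame {
        final Path dir;
        final DirectoryStream<Path> stream;
        final Iterator<Path> entries;

        Frame(Path dir) throws IOException {
            this.dir = dir;
            this.stream = Files.newDirectoryStream(dir);
            this.entries = stream.iterator();
        }
    }

    static void deleteTree(Path root) throws IOException {
        if (!Files.isDirectory(root, LinkOption.NOFOLLOW_LINKS)) {
            Files.delete(root);                        // plain file or symlink: nothing to walk
            return;
        }
        Deque<Frame> stack = new ArrayDeque<>();
        stack.push(new Frame(root));                   // first visit: push
        try {
            while (!stack.isEmpty()) {
                Frame top = stack.peek();
                if (top.entries.hasNext()) {
                    Path entry = top.entries.next();   // Path built from the listing bytes
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        stack.push(new Frame(entry));  // descend; delete it later
                    } else {
                        Files.delete(entry);           // files are deleted immediately
                    }
                } else {
                    stack.pop();                       // every entry has been handled
                    top.stream.close();
                    Files.delete(top.dir);             // should now be empty
                }
            }
        } finally {
            // if an exception interrupted the walk, close the listings still open
            for (Frame f : stack) {
                try {
                    f.stream.close();
                } catch (IOException ignored) {
                    // keep the original exception
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path root = Files.createTempDirectory("stack-delete");
        Files.createDirectories(root.resolve("a").resolve("b"));
        Files.createFile(root.resolve("a").resolve("b").resolve("f.txt"));
        Files.createFile(root.resolve("g.txt"));
        deleteTree(root);
        System.out.println("exists after delete: " + Files.exists(root)); // false
    }
}
```

`Files.walkFileTree` with the visitor above remains the simpler choice; the manual version only shows why the stack makes a single pass possible.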

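When walking the tree, the JDK's FileTreeWalker does the following. walk starts the traversal, next pops finished directories, and visit pushes newly opened ones: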

```java
    Event walk(Path file) {
        if (closed)
            throw new IllegalStateException("Closed");

        Event ev = visit(file,
                         false,   // ignoreSecurityException
                         false);  // canUseCached
        assert ev != null;
        return ev;
    }

    Event next() {
        DirectoryNode top = stack.peek();
        if (top == null)
            return null;      // stack is empty, we are done

        // continue iteration of the directory at the top of the stack
        Event ev;
        do {
            Path entry = null;
            IOException ioe = null;

            // get next entry in the directory
            if (!top.skipped()) {
                Iterator<Path> iterator = top.iterator();
                try {
                    if (iterator.hasNext()) {
                        entry = iterator.next();
                    }
                } catch (DirectoryIteratorException x) {
                    ioe = x.getCause();
                }
            }

            // no next entry so close and pop directory, creating corresponding event
            if (entry == null) {
                try {
                    top.stream().close();
                } catch (IOException e) {
                    // the source as pasted tested "ioe != null" here, which would call
                    // addSuppressed on null; the condition is corrected to "ioe == null"
                    if (ioe == null) {
                        ioe = e;
                    } else {
                        ioe.addSuppressed(e);
                    }
                }
                stack.pop();
                return new Event(EventType.END_DIRECTORY, top.directory(), ioe);
            }

            // visit the entry
            ev = visit(entry,
                       true,   // ignoreSecurityException
                       true);  // canUseCached

        } while (ev == null);

        return ev;
    }

    private Event visit(Path entry, boolean ignoreSecurityException, boolean canUseCached) {
        // need the file attributes
        BasicFileAttributes attrs;
        try {
            attrs = getAttributes(entry, canUseCached);
        } catch (IOException ioe) {
            return new Event(EventType.ENTRY, entry, ioe);
        } catch (SecurityException se) {
            if (ignoreSecurityException)
                return null;
            throw se;
        }

        // at maximum depth or file is not a directory
        int depth = stack.size();
        if (depth >= maxDepth || !attrs.isDirectory()) {
            return new Event(EventType.ENTRY, entry, attrs);
        }

        // check for cycles when following links
        if (followLinks && wouldLoop(entry, attrs.fileKey())) {
            return new Event(EventType.ENTRY, entry,
                             new FileSystemLoopException(entry.toString()));
        }

        // file is a directory, attempt to open it
        DirectoryStream<Path> stream = null;
        try {
            stream = Files.newDirectoryStream(entry);
        } catch (IOException ioe) {
            return new Event(EventType.ENTRY, entry, ioe);
        } catch (SecurityException se) {
            if (ignoreSecurityException)
                return null;
            throw se;
        }

        // push a directory node to the stack and return an event
        stack.push(new DirectoryNode(entry, attrs.fileKey(), stream));
        return new Event(EventType.START_DIRECTORY, entry, attrs);
    }
```


visit() opens each directory with Files.newDirectoryStream, which on Unix is implemented in UnixFileSystemProvider:

```java
    @Override
    public DirectoryStream<Path> newDirectoryStream(Path obj, DirectoryStream.Filter<? super Path> filter)
        throws IOException
    {
        UnixPath dir = UnixPath.toUnixPath(obj);
        dir.checkRead();
        if (filter == null)
            throw new NullPointerException();

        // can't return SecureDirectoryStream on kernels that don't support openat
        // or O_NOFOLLOW
        if (!openatSupported() || O_NOFOLLOW == 0) {
            try {
                long ptr = opendir(dir);
                return new UnixDirectoryStream(dir, ptr, filter);
            } catch (UnixException x) {
                if (x.errno() == ENOTDIR)
                    throw new NotDirectoryException(dir.getPathForExceptionMessage());
                x.rethrowAsIOException(dir);
            }
        }

        // open directory and dup file descriptor for use by
        // opendir/readdir/closedir
        int dfd1 = -1;
        int dfd2 = -1;
        long dp = 0L;
        try {
            dfd1 = open(dir, O_RDONLY, 0);
            dfd2 = dup(dfd1);
            dp = fdopendir(dfd1);
        } catch (UnixException x) {
            if (dfd1 != -1)
                UnixNativeDispatcher.close(dfd1);
            if (dfd2 != -1)
                UnixNativeDispatcher.close(dfd2);
            if (x.errno() == UnixConstants.ENOTDIR)
                throw new NotDirectoryException(dir.getPathForExceptionMessage());
            x.rethrowAsIOException(dir);
        }
        return new UnixSecureDirectoryStream(dir, dp, dfd2, filter);
    }
```


```java
        @Override
        public synchronized Path next() {
            Path result;
            if (nextEntry == null && !atEof) {
                result = readNextEntry();
            } else {
                result = nextEntry;
                nextEntry = null;
            }
            if (result == null)
                throw new NoSuchElementException();
            return result;
        }

        private Path readNextEntry() {
            assert Thread.holdsLock(this);

            for (;;) {
                byte[] nameAsBytes = null;

                // prevent close while reading
                readLock().lock();
                try {
                    if (isOpen()) {
                        nameAsBytes = readdir(dp);
                    }
                } catch (UnixException x) {
                    IOException ioe = x.asIOException(dir);
                    throw new DirectoryIteratorException(ioe);
                } finally {
                    readLock().unlock();
                }

                // EOF
                if (nameAsBytes == null) {
                    atEof = true;
                    return null;
                }

                // ignore "." and ".."
                if (!isSelfOrParent(nameAsBytes)) {
                    Path entry = dir.resolve(nameAsBytes);

                    // return entry if no filter or filter accepts it
                    try {
                        if (filter == null || filter.accept(entry))
                            return entry;
                    } catch (IOException ioe) {
                        throw new DirectoryIteratorException(ioe);
                    }
                }
            }
        }
```
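When openat is supported, newDirectoryStream returns a UnixSecureDirectoryStream. Iterating it ends up in the readNextEntry method of UnixDirectoryStream shown above, which gets each name from the native readdir: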

```java
static native byte[] readdir(long dir) throws UnixException;
```

```c
JNIEXPORT jbyteArray JNICALL
Java_sun_nio_fs_UnixNativeDispatcher_readdir(JNIEnv* env, jclass this, jlong value) {
    struct dirent64* result;
    struct {
        struct dirent64 buf;
        char name_extra[PATH_MAX + 1 - sizeof result->d_name];
    } entry;
    struct dirent64* ptr = &entry.buf;
    int res;
    DIR* dirp = jlong_to_ptr(value);

    /* EINTR not listed as a possible error */
    /* TDB: reentrant version probably not required here */
    res = readdir64_r(dirp, ptr, &result);

#ifdef _AIX
    /* On AIX, readdir_r() returns EBADF (i.e. '9') and sets 'result' to NULL for the */
    /* directory stream end. Otherwise, 'errno' will contain the error code. */
    if (res != 0) {
        res = (result == NULL && res == EBADF) ? 0 : errno;
    }
#endif

    if (res != 0) {
        throwUnixException(env, res);
        return NULL;
    } else {
        if (result == NULL) {
            return NULL;
        } else {
            jsize len = strlen(ptr->d_name);
            jbyteArray bytes = (*env)->NewByteArray(env, len);
            if (bytes != NULL) {
                (*env)->SetByteArrayRegion(env, bytes, 0, len, (jbyte*)(ptr->d_name));
            }
            return bytes;
        }
    }
}
```

That is UnixNativeDispatcher's native readdir method. It calls the C function readdir64_r to get the file name as a char array and copies that array into a Java byte array. The Path is then built from those bytes with `Path entry = dir.resolve(nameAsBytes);`:

```java
    @Override
    public UnixPath resolve(Path obj) {
        byte[] other = toUnixPath(obj).path;
        if (other.length > 0 && other[0] == '/')
            return ((UnixPath)obj);
        byte[] result = resolve(path, other);
        return new UnixPath(getFileSystem(), result);
    }

    UnixPath resolve(byte[] other) {
        return resolve(new UnixPath(getFileSystem(), other));
    }
```

The byte array goes straight into a new UnixPath; there is no conversion from bytes to chars anywhere.
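Because `newDirectoryStream` returns a `UnixSecureDirectoryStream` when openat is available, the listing can also delete entries relative to the already-open directory. This goes through the public `SecureDirectoryStream` interface.

The sketch below (class and method names are illustrative) deletes the non-directory entries of one directory that way. It passes each entry's `getFileName()`, which in the Unix implementation should be cut from the same readdir bytes. When the stream is not a `SecureDirectoryStream`, it falls back to `Files.delete`.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.nio.file.SecureDirectoryStream;

// Hypothetical helper: deletes the non-directory entries directly inside dir.
final class SecureDirDelete {

    static int deleteFilesIn(Path dir) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> ds = Files.newDirectoryStream(dir)) {
            SecureDirectoryStream<Path> sds =
                (ds instanceof SecureDirectoryStream) ? (SecureDirectoryStream<Path>) ds : null;
            for (Path entry : ds) {
                if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                    continue;                         // subdirectories are left alone in this sketch
                }
                if (sds != null) {
                    // a relative path is resolved against the directory the stream holds open
                    sds.deleteFile(entry.getFileName());
                } else {
                    Files.delete(entry);              // e.g. platforms without openat support
                }
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("sds");
        Files.createFile(dir.resolve("a.txt"));
        Files.createFile(dir.resolve("b.txt"));
        Files.createDirectory(dir.resolve("keep"));
        try (DirectoryStream<Path> probe = Files.newDirectoryStream(dir)) {
            System.out.println("secure stream: " + (probe instanceof SecureDirectoryStream));
        }
        System.out.println("deleted " + deleteFilesIn(dir)); // deleted 2
    }
}
```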

Conclusion:

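A Path built the NIO way, taken from a directory listing rather than rebuilt from a String, never goes through a byte-to-char encoding conversion. It therefore avoids the wrong file names caused by character encoding.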

Deleting the file directly by its inode is restricted by the operating system, and Java offers no native API for it either.
