Starting from how to delete files with garbled names in Java
Posted by raintungli
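1. Files with garbled names
Why do files with garbled names appear? There are many possible causes: the name was encoded incorrectly during upload, the operating system does not support that encoding, and so on. If you try to delete such a file through Java's File object, or even just check whether it exists, you will find that every call returns false.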
String[] entries = file.list();
for (String s : entries) {
    File currentFile = new File(file.getPath(), s);
    currentFile.delete();
}
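The File object is built from file.getPath(), which is a String. A Java String is a sequence of char values, so it is only correct when the original bytes were decoded with the right character encoding. Let us first look at how File deletes a file: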
JNIEXPORT jboolean JNICALL
Java_java_io_UnixFileSystem_delete0(JNIEnv *env, jobject this,
                                    jobject file)
{
    jboolean rv = JNI_FALSE;

    WITH_FIELD_PLATFORM_STRING(env, file, ids.path, path) {
        if (remove(path) == 0) {
            rv = JNI_TRUE;
        }
    } END_PLATFORM_STRING(env, path);
    return rv;
}
#define WITH_PLATFORM_STRING(env, strexp, var) \
    if (1) { \
        const char *var; \
        jstring _##var##str = (strexp); \
        if (_##var##str == NULL) { \
            JNU_ThrowNullPointerException((env), NULL); \
            goto _##var##end; \
        } \
        var = JNU_GetStringPlatformChars((env), _##var##str, NULL); \
        if (var == NULL) goto _##var##end;

#define WITH_FIELD_PLATFORM_STRING(env, object, id, var) \
    WITH_PLATFORM_STRING(env, \
                         ((object == NULL) \
                          ? NULL \
                          : (*(env))->GetObjectField((env), (object), (id))), \
                         var)
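delete0 calls the C function remove(). The path argument is taken from the String path field of the File.java object and converted to a C char array by JNU_GetStringPlatformChars:
- Ultimately this calls getBytes() / getBytes(String charsetName) in String.java, which converts the String's char array into a byte array
- The contents of that byte array are then copied into a C char array
When the charset is wrong or unsupported, an error is introduced either when Java decodes the bytes into char values or when the char values are encoded back into bytes, so the file cannot be deleted correctly.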
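The lossy round trip described above can be reproduced without touching the file system. The following is a minimal sketch, not the JDK's own code path: the class name LossyNameDemo and the sample bytes are assumptions chosen for illustration, and UTF-8 stands in for whatever platform encoding the JVM actually uses.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class LossyNameDemo {
    /** Decodes raw name bytes into a String and encodes them back with the same charset. */
    static byte[] roundTrip(byte[] rawName, Charset charset) {
        return new String(rawName, charset).getBytes(charset);
    }

    public static void main(String[] args) {
        // Assumed sample: the character U+4E2D encoded in GBK is the two bytes D6 D0,
        // which do not form a valid UTF-8 sequence.
        byte[] gbkName = {(byte) 0xD6, (byte) 0xD0};

        byte[] viaUtf8 = roundTrip(gbkName, StandardCharsets.UTF_8);
        byte[] viaLatin1 = roundTrip(gbkName, StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 round trip keeps the bytes: "
                + Arrays.equals(gbkName, viaUtf8));
        System.out.println("ISO-8859-1 round trip keeps the bytes: "
                + Arrays.equals(gbkName, viaLatin1));
    }
}
```

The first line reports false and the second true: decoding with a charset that cannot represent the original bytes replaces them, so the name handed back to the operating system no longer matches the name on disk.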
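2. How can the file be deleted?
From the analysis above, the root problem is the encoding conversion between the C char array and the Java char array. The obvious way to delete the file reliably is therefore to skip the conversion and work with the C char array directly, which in Java corresponds to a byte array.
A search of the JDK shows that NIO provides a new Path object, and that Files.delete takes a Path rather than a File.
Inside the Path implementation we can see how the path is stored: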
class UnixPath
    extends AbstractPath
{
    private static ThreadLocal<SoftReference<CharsetEncoder>> encoder =
        new ThreadLocal<SoftReference<CharsetEncoder>>();

    // FIXME - eliminate this reference to reduce space
    private final UnixFileSystem fs;

    // internal representation
    private final byte[] path;
    // (remaining members omitted)
}
The path is no longer the String path used by File. Now look at Files.delete, which ends up calling implDelete in UnixFileSystemProvider.java:
@Override
boolean implDelete(Path obj, boolean failIfNotExists) throws IOException {
    UnixPath file = UnixPath.toUnixPath(obj);
    file.checkDelete();

    // need file attributes to know if file is directory
    UnixFileAttributes attrs = null;
    try {
        attrs = UnixFileAttributes.get(file, false);
        if (attrs.isDirectory()) {
            rmdir(file);
        } else {
            unlink(file);
        }
        return true;
    } catch (UnixException x) {
        // no-op if file does not exist
        if (!failIfNotExists && x.errno() == ENOENT)
            return false;

        // DirectoryNotEmptyException if not empty
        if (attrs != null && attrs.isDirectory() &&
            (x.errno() == EEXIST || x.errno() == ENOTEMPTY))
            throw new DirectoryNotEmptyException(file.getPathForExceptionMessage());

        x.rethrowAsIOException(file);
        return false;
    }
}

static void unlink(UnixPath path) throws UnixException {
    NativeBuffer buffer = copyToNativeBuffer(path);
    try {
        unlink0(buffer.address());
    } finally {
        buffer.release();
    }
}
The path byte array held by UnixPath is copied into off-heap memory, and the address of that memory is passed to the JNI method unlink0:
JNIEXPORT void JNICALL
Java_sun_nio_fs_UnixNativeDispatcher_unlink0(JNIEnv* env, jclass this,
    jlong pathAddress)
{
    const char* path = (const char*)jlong_to_ptr(pathAddress);

    /* EINTR not listed as a possible error */
    if (unlink(path) == -1) {
        throwUnixException(env, errno);
    }
}
No decoding takes place anywhere in this process; every step operates on the byte array.
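To picture the copy into off-heap memory, here is a rough sketch that uses a direct ByteBuffer. It is only an illustration under stated assumptions: the JDK uses its internal NativeBuffer class rather than ByteBuffer, and the class name NativePathBufferSketch and the method toCString are hypothetical.

```java
import java.nio.ByteBuffer;

final class NativePathBufferSketch {
    /** Copies raw path bytes into off-heap memory and appends the NUL terminator C expects. */
    static ByteBuffer toCString(byte[] path) {
        ByteBuffer buffer = ByteBuffer.allocateDirect(path.length + 1);
        buffer.put(path);       // bytes are copied verbatim, no charset is involved
        buffer.put((byte) 0);   // C string terminator
        buffer.flip();
        return buffer;
    }

    public static void main(String[] args) {
        // Assumed sample path whose last two bytes are not valid UTF-8.
        byte[] rawName = {'/', 't', 'm', 'p', '/', (byte) 0xD6, (byte) 0xD0};
        ByteBuffer buffer = toCString(rawName);
        System.out.println(buffer.isDirect() + " " + buffer.remaining());
    }
}
```

Because the bytes are never turned into char values, a name that is invalid in the platform encoding reaches the native call unchanged.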
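3. File.toPath still cannot delete the file
Since Path operates on a byte array, deletion should now work. So can we simply obtain the Path from File.toPath? No.
Let us look at the File.toPath method: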
public Path toPath() {
    Path result = filePath;
    if (result == null) {
        synchronized (this) {
            result = filePath;
            if (result == null) {
                result = FileSystems.getDefault().getPath(path);
                filePath = result;
            }
        }
    }
    return result;
}

@Override
public final Path getPath(String first, String... more) {
    String path;
    if (more.length == 0) {
        path = first;
    } else {
        StringBuilder sb = new StringBuilder();
        sb.append(first);
        for (String segment: more) {
            if (segment.length() > 0) {
                if (sb.length() > 0)
                    sb.append('/');
                sb.append(segment);
            }
        }
        path = sb.toString();
    }
    return new UnixPath(this, path);
}
The Path is built from the String path that is passed in. In the UnixPath implementation class:
UnixPath(UnixFileSystem fs, String input) {
    // removes redundant slashes and checks for invalid characters
    this(fs, encode(fs, normalizeAndCheck(input)));
}
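The constructor calls encode to convert the String into bytes, so the same encoding problem is bound to appear.
4. Deleting with NIO
Files provides walkFileTree, which visits the files and directories under a given directory down to a given depth.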
Files.walkFileTree(path, new FileVisitor<Path>() {
    @Override
    public FileVisitResult preVisitDirectory(Path dir, BasicFileAttributes attrs)
            throws IOException {
        System.out.println("preVisitDirectory: " + dir);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
            throws IOException {
        System.out.println("visitFile: " + file);
        Files.delete(file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult visitFileFailed(Path file, IOException exc)
            throws IOException {
        System.out.println("visitFileFailed: " + file);
        return FileVisitResult.CONTINUE;
    }

    @Override
    public FileVisitResult postVisitDirectory(Path dir, IOException exc)
            throws IOException {
        System.out.println("postVisitDirectory: " + dir);
        Files.delete(dir);
        return FileVisitResult.CONTINUE;
    }
});
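During the tree walk, directories are kept on a stack and taken off again when postVisitDirectory is called. This is a very useful structure for deleting a directory recursively: if you try to delete a directory as soon as you reach it, the deletion fails while the directory still contains files. With the stack, a directory is pushed the first time it is visited, and only after every entry under it has been visited (and deleted) is it popped and deleted itself. This avoids two separate passes, one to delete all the files and a second to delete all the directories.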
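The push-on-entry, pop-after-children idea can also be written out by hand. The sketch below is a simplified illustration and not how FileTreeWalker is implemented: it reopens a directory each time it returns to it instead of keeping the stream open, and the class name StackDelete and method deleteTree are hypothetical. It assumes root is a directory.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;
import java.util.ArrayDeque;
import java.util.Deque;

final class StackDelete {
    /** Deletes the directory root and everything under it, children before parents. */
    static void deleteTree(Path root) throws IOException {
        Deque<Path> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            Path dir = stack.peek();
            Path subdirectory = null;
            try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
                for (Path entry : entries) {
                    if (Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                        subdirectory = entry;   // handle this child before its parent
                        break;
                    }
                    Files.delete(entry);        // plain file or symbolic link
                }
            }
            if (subdirectory != null) {
                stack.push(subdirectory);       // first visit: push
            } else {
                stack.pop();                    // all children gone: pop
                Files.delete(dir);              // directory is empty, so this succeeds
            }
        }
    }
}
```

Every Path used here comes from the directory stream, so the entry names stay as raw bytes from listing to deletion.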
/**
 * Start walking from the given file.
 */
Event walk(Path file) {
    if (closed)
        throw new IllegalStateException("Closed");

    Event ev = visit(file,
                     false,   // ignoreSecurityException
                     false);  // canUseCached
    assert ev != null;
    return ev;
}
The core of the directory traversal is:
Event next() {
    DirectoryNode top = stack.peek();
    if (top == null)
        return null;      // stack is empty, we are done

    // continue iteration of the directory at the top of the stack
    Event ev;
    do {
        Path entry = null;
        IOException ioe = null;

        // get next entry in the directory
        if (!top.skipped()) {
            Iterator<Path> iterator = top.iterator();
            try {
                if (iterator.hasNext()) {
                    entry = iterator.next();
                }
            } catch (DirectoryIteratorException x) {
                ioe = x.getCause();
            }
        }

        // no next entry so close and pop directory, creating corresponding event
        if (entry == null) {
            try {
                top.stream().close();
            } catch (IOException e) {
                // keep the first exception and attach later ones as suppressed
                if (ioe == null) {
                    ioe = e;
                } else {
                    ioe.addSuppressed(e);
                }
            }
            stack.pop();
            return new Event(EventType.END_DIRECTORY, top.directory(), ioe);
        }

        // visit the entry
        ev = visit(entry,
                   true,   // ignoreSecurityException
                   true);  // canUseCached

    } while (ev == null);

    return ev;
}
private Event visit(Path entry, boolean ignoreSecurityException, boolean canUseCached) {
    // need the file attributes
    BasicFileAttributes attrs;
    try {
        attrs = getAttributes(entry, canUseCached);
    } catch (IOException ioe) {
        return new Event(EventType.ENTRY, entry, ioe);
    } catch (SecurityException se) {
        if (ignoreSecurityException)
            return null;
        throw se;
    }

    // at maximum depth or file is not a directory
    int depth = stack.size();
    if (depth >= maxDepth || !attrs.isDirectory()) {
        return new Event(EventType.ENTRY, entry, attrs);
    }

    // check for cycles when following links
    if (followLinks && wouldLoop(entry, attrs.fileKey())) {
        return new Event(EventType.ENTRY, entry,
                         new FileSystemLoopException(entry.toString()));
    }

    // file is a directory, attempt to open it
    DirectoryStream<Path> stream = null;
    try {
        stream = Files.newDirectoryStream(entry);
    } catch (IOException ioe) {
        return new Event(EventType.ENTRY, entry, ioe);
    } catch (SecurityException se) {
        if (ignoreSecurityException)
            return null;
        throw se;
    }

    // push a directory node to the stack and return an event
    stack.push(new DirectoryNode(entry, attrs.fileKey(), stream));
    return new Event(EventType.START_DIRECTORY, entry, attrs);
}
@Override
public DirectoryStream<Path> newDirectoryStream(Path obj, DirectoryStream.Filter<? super Path> filter)
    throws IOException
{
    UnixPath dir = UnixPath.toUnixPath(obj);
    dir.checkRead();
    if (filter == null)
        throw new NullPointerException();

    // can't return SecureDirectoryStream on kernels that don't support openat
    // or O_NOFOLLOW
    if (!openatSupported() || O_NOFOLLOW == 0) {
        try {
            long ptr = opendir(dir);
            return new UnixDirectoryStream(dir, ptr, filter);
        } catch (UnixException x) {
            if (x.errno() == ENOTDIR)
                throw new NotDirectoryException(dir.getPathForExceptionMessage());
            x.rethrowAsIOException(dir);
        }
    }

    // open directory and dup file descriptor for use by
    // opendir/readdir/closedir
    int dfd1 = -1;
    int dfd2 = -1;
    long dp = 0L;
    try {
        dfd1 = open(dir, O_RDONLY, 0);
        dfd2 = dup(dfd1);
        dp = fdopendir(dfd1);
    } catch (UnixException x) {
        if (dfd1 != -1)
            UnixNativeDispatcher.close(dfd1);
        if (dfd2 != -1)
            UnixNativeDispatcher.close(dfd2);
        if (x.errno() == UnixConstants.ENOTDIR)
            throw new NotDirectoryException(dir.getPathForExceptionMessage());
        x.rethrowAsIOException(dir);
    }
    return new UnixSecureDirectoryStream(dir, dp, dfd2, filter);
}
@Override
public synchronized Path next() {
    Path result;
    if (nextEntry == null && !atEof) {
        result = readNextEntry();
    } else {
        result = nextEntry;
        nextEntry = null;
    }
    if (result == null)
        throw new NoSuchElementException();
    return result;
}

private Path readNextEntry() {
    assert Thread.holdsLock(this);

    for (;;) {
        byte[] nameAsBytes = null;

        // prevent close while reading
        readLock().lock();
        try {
            if (isOpen()) {
                nameAsBytes = readdir(dp);
            }
        } catch (UnixException x) {
            IOException ioe = x.asIOException(dir);
            throw new DirectoryIteratorException(ioe);
        } finally {
            readLock().unlock();
        }

        // EOF
        if (nameAsBytes == null) {
            atEof = true;
            return null;
        }

        // ignore "." and ".."
        if (!isSelfOrParent(nameAsBytes)) {
            Path entry = dir.resolve(nameAsBytes);

            // return entry if no filter or filter accepts it
            try {
                if (filter == null || filter.accept(entry))
                    return entry;
            } catch (IOException ioe) {
                throw new DirectoryIteratorException(ioe);
            }
        }
    }
}
What is returned is a UnixSecureDirectoryStream, and the iterator's next() ends up calling the readNextEntry method of UnixDirectoryStream.
static native byte[] readdir(long dir) throws UnixException;

JNIEXPORT jbyteArray JNICALL
Java_sun_nio_fs_UnixNativeDispatcher_readdir(JNIEnv* env, jclass this, jlong value) {
    struct dirent64* result;
    struct {
        struct dirent64 buf;
        char name_extra[PATH_MAX + 1 - sizeof result->d_name];
    } entry;
    struct dirent64* ptr = &entry.buf;
    int res;
    DIR* dirp = jlong_to_ptr(value);

    /* EINTR not listed as a possible error */
    /* TDB: reentrant version probably not required here */
    res = readdir64_r(dirp, ptr, &result);

#ifdef _AIX
    /* On AIX, readdir_r() returns EBADF (i.e. '9') and sets 'result' to NULL for the */
    /* directory stream end. Otherwise, 'errno' will contain the error code. */
    if (res != 0) {
        res = (result == NULL && res == EBADF) ? 0 : errno;
    }
#endif

    if (res != 0) {
        throwUnixException(env, res);
        return NULL;
    } else {
        if (result == NULL) {
            return NULL;
        } else {
            jsize len = strlen(ptr->d_name);
            jbyteArray bytes = (*env)->NewByteArray(env, len);
            if (bytes != NULL) {
                (*env)->SetByteArrayRegion(env, bytes, 0, len, (jbyte*)(ptr->d_name));
            }
            return bytes;
        }
    }
}
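The native method readdir of UnixNativeDispatcher:
- The JNI implementation of readdir calls the C function readdir64_r to obtain the file name as a char array, and copies that char array into a Java byte array
- The Path is then built with Path entry = dir.resolve(nameAsBytes);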
@Override
public UnixPath resolve(Path obj) {
    byte[] other = toUnixPath(obj).path;
    if (other.length > 0 && other[0] == '/')
        return ((UnixPath)obj);
    byte[] result = resolve(path, other);
    return new UnixPath(getFileSystem(), result);
}

UnixPath resolve(byte[] other) {
    return resolve(new UnixPath(getFileSystem(), other));
}
As we can see, the byte array is passed straight into the UnixPath constructor, with no byte-to-char conversion in between.
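Putting this together, the File-based loop from section 1 can be rewritten so that entry names never pass through a String. This is a minimal sketch under the assumption that only the files directly inside one directory need to be removed; the class name DeleteEntries and method deleteFilesIn are hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.LinkOption;
import java.nio.file.Path;

final class DeleteEntries {
    /** Deletes every non-directory entry directly inside dir and returns how many were removed. */
    static int deleteFilesIn(Path dir) throws IOException {
        int deleted = 0;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                // entry was resolved from the bytes returned by the directory listing
                if (!Files.isDirectory(entry, LinkOption.NOFOLLOW_LINKS)) {
                    Files.delete(entry);
                    deleted++;
                }
            }
        }
        return deleted;
    }
}
```

The directory path itself may still come from a String, which is fine as long as that part of the path is valid in the platform encoding; only the entry names returned by the listing need to stay as raw bytes.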
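Conclusion:
A Path built the NIO way, obtained from directory iteration rather than from a String, never goes through a byte-to-char encoding conversion, so it avoids file names being corrupted by character encoding.
Deleting directly by inode is subject to operating-system restrictions, and Java does not provide a native API for it.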