如何从 Fortran 模式匹配的行开始读取数据?

Posted

技术标签:

【中文标题】如何从 Fortran 模式匹配的行开始读取数据?【英文标题】:How to READ data starting from a pattern-matched line with Fortran? 【发布时间】:2020-11-02 10:00:30 【问题描述】:

我有一个文件pos.xyz,格式如下,其中i = 6等代表帧索引。 (这里,第一帧有i = 6。一般情况下,第一帧的索引可以是i = 0i = 1,或者i = 2,...) 我想实现一个函数:对于 任意两个给定整数 ab,(a<b,例如 7 和 9),请阅读 将帧索引 7 到 9 的数据放入一个数组中。你能给我一个关于如何实现这个想法的建议吗?

    4
i =    6, time =      3.000, E =     -205.1846561900
O         2.6028572470        4.1666579520       12.7865910725
O         6.5415232423        8.8963227363       17.7533721708
O        15.6020396800       11.9022808314       15.2930838049
O        11.2843786793       13.2653367176       13.8186352548
    4
i =    7, time =    3.500, E =     -205.1845561905
O         5.1072569275       11.9945026418        4.1254340934
O         2.5299942732       11.4124710424        9.5495912455
O        14.8837181647       12.6571252157        7.8905997802
O        15.1684493877       10.7315923081        2.6631494700
       4
i =    8, time =    4.000, E =     -205.1846261900
O         2.6028572470        4.1666579520       12.7865910725
O         6.5415232423        8.8963227363       17.7533721708
O        15.6020396800       11.9922808314       15.2930838049
O        11.2843786793       13.2653367176       13.8186352548
    4
i =    9, time =    4.500, E =     -205.1846561805
O         5.1072569375       11.9945026418        4.1258340934
O         2.5299942732       11.4124710424        9.5495912455
O        14.8837181647       12.6570252157        7.8905997802
O        15.1684493877       10.7310923081        2.6630494700
    4
i =   10, time =    5.000, E =     -205.1846551805
O         5.1072569275       11.9945026418        4.1254340934
O         2.5299932732       11.4129710424        9.5495912455
O        14.8837181647       12.6571252157        7.8905997802
O        15.1684473877       10.7313923081        2.6631494700

我做了什么:对于以i = 0 作为第一帧的特殊案例。例如,如果我想从第 3 帧开始读取,我可以先跳过(m+2)*(3-1) 行,然后读取数据,m=4。函数如下。

  SUBROUTINE skip_lines(indx, i_input)
    ! Purpose: 
    ! To skip lines when read data from the input
    IMPLICIT NONE
    INTEGER :: i
    INTEGER,INTENT(IN) :: i_input,indx
    do i=1,i_input
       read(indx,*) !Neglect (nat+2)*(ns-1) lines
    enddo    
 END SUBROUTINE skip_lines

但是对于一般情况,如果第一帧有一个非零帧数,这个想法是没有效率的。我希望找到更好的方法来实现它。

【问题讨论】:

有几种方法可以做到这一点(包括读入line buffer and testing),但像read(unit, '(A3,I6)') x, y; if (x=='i ='.and.(y>a.and.y<b) ...) 这样简单的方法可能就足够了。请研究这些方法,看看您是否需要我们的进一步帮助。 文件的格式是否像样本所暗示的那样固定和规则?在文件的第二行中找到i 的值是唯一(或主要)问题吗? @HighPerformanceMark 是的,如示例所示,格式是常规的;是的,如果我能找到i的位置,那么我认为可以使用BACKSPACE两次读取i = a所在的块。 所以您阅读并忽略第一行,然后按照@francescalus 的评论建议阅读第二行,正如他们所说,鲍勃是您母亲的兄弟。 它读取一行然后测试它是否看起来像一行" i = a"(如果我们认为条件为y>=a)。如果您的行确实看起来像那样,那么将接下来的几行作为匹配框架读取;如果没有,请转到下一行并重新测试。就其本身而言,一条语句并不能解决您的问题,但没有一条语句可以解决您的问题。 【参考方案1】:
gfortran -Wall -fcheck=all parameter_shared.f95 atom_module.f95 traj.f95 sample.f95 test.f95 -o test.x
! test.f95
PROGRAM test 
  ! Purpose: To read data starting from any block.
  USE atom_module
  IMPLICIT NONE
  !==========
  !parameters
  !==========
  INTEGER :: ns  ! Get one sample from the trajectory every ns step.
  INTEGER :: nmo_start
  INTEGER :: nmo_end
  INTEGER :: nat  ! number of atoms
  REAL(kind=4) :: delta_t0   ! For reading data
  character(LEN=200) :: pos_filename
  !===============
  ! Initialization
  delta_t0 = 0.0005; ns = 2
  nmo_start = 7; nmo_end = 10 
  nat = 4; pos_filename="pos.xyz"
  !========================
  ! Sampling the trajectory
  CALL sample(pos_filename,nmo_start,nmo_end,nat,ns)
END PROGRAM test
! sample.f95
SUBROUTINE sample(pos_filename,nmo_start,nmo_end,nat,ns)
  USE parameter_shared
  USE atom_module, ONLY: atom_info
  USE traj
  IMPLICIT NONE
  !==========
  !Parameters
  !==========
  character(LEN=*), INTENT(IN) :: pos_filename
  INTEGER, INTENT(IN) :: nmo_start  
  INTEGER, INTENT(IN) :: nmo_end  
  INTEGER, INTENT(IN) :: nat ! number of atoms
  INTEGER, INTENT(IN) :: ns  ! Get one sample from the trajectory every ns step.
  !Local varables
  INTEGER :: n_samples  !n_samples = INT(nmo/ns)
  INTEGER :: iatom,imovie,i
  !Initialization
  iatom = 0; imovie =0; i =0
  ! Obatin n_samples
  n_samples = sampling_number(nmo_start,nmo_end,ns)
  allocate(sampled_movie(n_samples))
  allocate(sampled_time(n_samples))
  allocate(sampled_energy(n_samples))
  !=======================
  !read in trajectory file 
  !=======================
  open(10,file=trim(pos_filename))
  CALL read_traj(10,nmo_start,nmo_end,ns,nat,n_samples) 
  close(10)
  write(6,*) 'End of trajectory reading.'
  !=============
  !write in file
  !=============
  sampled_pos_filename = 'pos_sampled.xyz'
  open(10,file=sampled_pos_filename)
  do i =1,n_samples
    write (10,'(I8)') nat
    WRITE(10,100) 'i =',i-1,', time =',sampled_time(i),', E =',sampled_energy(i)
    100 FORMAT (1X,A3,I10,A8,F10.3,A5,F20.10)
    DO iatom = 1, nat
       WRITE(10,*) TRIM(atom_info(iatom, i)%atom_name), &
        atom_info(iatom,i)%coord(1), &
        atom_info(iatom,i)%coord(2), &
        atom_info(iatom,i)%coord(3)
    ENDDO
  enddo
  write(6,*)'Sampled trajectory is written in: ', sampled_pos_filename
  close(10)
  deallocate(sampled_movie, sampled_time,sampled_energy)
END SUBROUTINE sample
MODULE traj
IMPLICIT NONE
CONTAINS
  INTEGER FUNCTION sampling_number(nmo_start,nmo_end,ns)
    !To calculate the total numbers of samples one want to include 
    INTEGER,INTENT(IN) :: ns  ! Get one sample from the trajectory every ns step.
    INTEGER,INTENT(IN) :: nmo_start, nmo_end  

    write(*,*) 'In function sampling_number: nmo_end = ', nmo_end
    positive: IF (nmo_end <0 .OR. nmo_start < 0 .OR. ns <0) THEN
      write(*,*) 'Please enter non-negative values for the ns, starting step and ending step.'
    ELSE IF (nmo_end < nmo_start) THEN
      write(*,*) 'Please note that starting step shoud not larger than  ending step.'
    ELSE IF (ns ==0) THEN
      sampling_number = nmo_end-(nmo_start-1)
    ELSE IF (nmo_end-(nmo_start-1) <= ns) THEN
      sampling_number = INT((nmo_end-(nmo_start-1))/ns + 1)
    ELSE IF (nmo_end-(nmo_start-1) > ns) THEN
      sampling_number = INT((nmo_end-(nmo_start-1))/ns)
    END IF positive
  END FUNCTION sampling_number

  SUBROUTINE read_traj(indx,nmo_start,nmo_end,ns,nat,n_samples)
    ! Purpose: to READ data starting from a pattern-matched line.
    USE atom_module, ONLY: atom_info
    USE parameter_shared, ONLY: sampled_movie, sampled_time, sampled_energy

    INTEGER :: iatom,i_sample
    INTEGER, INTENT(IN) :: nat
    INTEGER, INTENT(IN) :: n_samples  !n_samples = INT(nmo/ns)
    INTEGER, INTENT(IN) :: indx
    INTEGER, INTENT(IN) :: ns  ! Get one sample from the trajectory every ns step.
    INTEGER, INTENT(IN) :: nmo_start, nmo_end  ! To get the total number of moves
    CHARACTER(LEN=4) :: head_char
    INTEGER :: y

    allocate(atom_info(nat,n_samples))
    i_sample = 1
    write(*,*) "read_traj(): New total time steps (n_samples):", n_samples
    DO WHILE (i_sample < n_samples+1) ! +1 means i_sample can take the value of n_samples 
        read(indx, '(A4)') head_char
        PRE_CHECK:IF (head_char=="i = ") THEN
            BACKSPACE(UNIT=indx) ! Because I am not able to read other lines with the format '(A4,I8)', and have not find any good way, so I try to read it in '(A4)' first 
            read(indx, '(A4,I8)') head_char, y
            CHECK_HEAD:IF (head_char=="i = " .AND. (y>nmo_start-1 .and. y<nmo_end+1) .AND. MOD(y-(nmo_start-1),ns) == 1) THEN
                WRITE(*,*)"read_traj():", head_char, y
                BACKSPACE(UNIT=indx) ! Because we have to read the whole line with ' i = ' line.
                read(indx,130) sampled_movie(i_sample), sampled_time(i_sample), sampled_energy(i_sample)
                130 FORMAT (4X,I8,9X,F12.3,6X,F20.10)
                131 FORMAT (A4,3F20.10)
                inner: do iatom= 1,nat
                  read (indx,131) atom_info(iatom, i_sample)%atom_name, atom_info(iatom,i_sample)%coord(1), & 
                    atom_info(iatom,i_sample)%coord(2), atom_info(iatom,i_sample)%coord(3)
                enddo inner
                i_sample = i_sample + 1 
            ENDIF CHECK_HEAD
        ENDIF PRE_CHECK
    END DO
  END SUBROUTINE read_traj
END MODULE traj
MODULE atom_module
! To define the derived data type for atom
IMPLICIT NONE
TYPE :: atom
  CHARACTER(LEN=2) :: atom_name
  INTEGER :: atom_id 
  INTEGER :: host_id  ! For O atom in water, host_id = atom_id
  REAL :: mass
  REAL, DIMENSION(3) :: coord 
END TYPE atom

! The array atom_info can be shared by subroutines  
TYPE(atom), ALLOCATABLE, DIMENSION(:,:) :: atom_info
END MODULE atom_module
MODULE parameter_shared
!
! Purpose:
!   To declare data to share between routines.
IMPLICIT NONE
!SAVE 
character(LEN=200) :: sampled_pos_filename
INTEGER, ALLOCATABLE, DIMENSION(:) :: sampled_movie
REAL, ALLOCATABLE, DIMENSION(:) :: sampled_time, sampled_energy
END MODULE parameter_shared

【讨论】:

【参考方案2】:

感谢@francescalus 和@High Performance Mark 的建议。我使用 DO WHILE 循环,并实现了我的想法。我把我的子程序的一个简化版本放在这里。它包括在模块中定义的一些类型,这不是这里重要的事情。现在,它可以 (1) 读取任意步a到任意步b的轨迹文件,其中ab由用户给出; (2) 每ns步读取数据。

    SUBROUTINE read_traj(indx,nmo_start,nmo_end,ns,nat,n_samples)
    ! goal:
    ! read info from the trajectory file (format: ***.xyz)
    ! read data from frame a to frame b
    USE atom_module
    USE parameter_shared
    INTEGER :: iatom, i_sample
    INTEGER, PARAMETER:: nat = 4
    INTEGER :: n_samples  !n_samples = INT((a-b)/ns)
    INTEGER, PARAMETER :: indx = 10
    INTEGER, PARAMETER :: ns = 2  ! read one sample from the trajectory every ns step.
    INTEGER, PARAMETER :: a =7 
    INTEGER, PARAMETER :: b=10  
    CHARACTER(LEN=4) :: x
    INTEGER :: y

    allocate(atom_info(nat,n_samples))
    i_sample = 1
    DO WHILE (i_sample < n_samples) 
        read(indx, '(A3,I5)') x, y
        CHECK: IF (head_char=="i = " .AND. (y>a-1 .and. y<b+1) .AND. MOD(y-(a-1),ns) == 1) THEN
          WRITE(*,*)"head_char and y:", x, y
          BACKSPACE(UNIT=indx) ! we have to read the whole line with ' i = ' line.
          read(indx,120) sampled_movie(i_sample), sampled_time(i_sample), sampled_energy(i_sample)
          120 FORMAT (3X,I5,8X,F9.3,5X,F20.10)
          inner: do iatom= 1,nat
            read (indx,*) atom_info(iatom, i_sample)%atom_name, atom_info(iatom,i_sample)%coord(1), & 
              atom_info(iatom,i_sample)%coord(2), atom_info(iatom,i_sample)%coord(3)
          enddo inner
          i_sample = i_sample + 1 
        ENDIF CHECK
    END DO
  END SUBROUTINE read_traj

【讨论】:

我很惊讶该代码可以编译 - 我原以为无法声明常规参数(例如 indxnatetc),在例程中,为parameters。即使您以某种方式说服编译器接受该代码,在例程中声明参数并(尝试)将它们也作为参数传递也没有任何意义。 确实,我承认这段代码不完整,无法编译。因为我一开始只是想展示我解决问题的想法。它也不包括模块。我添加了另一个答案,其中包含所有详细信息并且可以编译。主要部分是模块traj中的函数read_traj()中的DO WHILE循环。

以上是关于如何从 Fortran 模式匹配的行开始读取数据?的主要内容,如果未能解决你的问题,请参考以下文章

Fortran 无法从文件中读取

从正则表达式模式返回不匹配的行

在 Unix 提示符下,如何从与模式匹配的文件中提取可变数量的行(可能包括空行)?

文本处理三剑客之 Sed ——高级编辑命令

fortran读取一行数据

从fortran文件中读取变量